# Reversing Malicious Code

* **Goal is to understand common malware characteristics at a code level**
* May include potential branches of execution with code analysis
* **Overview of the code lifecycle**
* Source code is translated into object code by a compiler
* Object code is then combined with libraries and an executable file is created
* To run the file, the operating system reads various information from the executable file, allocates memory, and loads required libraries into memory
* Control is transferred to the code to execute
* This final stage is where we examine the code with a debugger
* Note: Libraries may be loaded during the program's execution

### Ghidra

* Developed by NSA
* Its decompiler produces a C representation of the code to speed up analysis
* Includes support for writing Java and Python scripts to automate analysis
* Help is accessed via F1 key
* Ghidra v10 includes a debugger
* <https://ghidra-sre.org/>

### **Create a new project**

* `File --> New Project`
* Choose the project type
* Click `Finish`
* Drag and drop the specimen into the project window
* Accept the defaults in the `Imports` window and click `OK`

### **Launch the code browser and begin the auto-analysis**

* Make sure to enable `WindowsPE x86 Propagate External Parameters` option
* Finally click the Analyze button and wait for Ghidra to finish
* Once auto analysis is completed an Auto Analysis summary will show any warnings or issues encountered during the process
* A common warning is that the file does not `contain debug information`
* This is common and not an issue

**Before Proceeding save the project and take a snapshot**

### Ghidra Overview

* Main window is the `Listing View` which presents the target program's code and data
* Will initially bring you to the beginning of the file in the `Listing View` --> notice the `MZ` string
* If you scroll down from there you can examine the program's header

**Program Tree**

* Window is in the top left and shows the different sections and headers
* Section names are typically:

```
.text - Contains executable code
.rdata - Contains read-only data
.data - Contains data 
.reloc - Contains relocation data to fix up addresses in the file if it is not loaded at the preferred address
```

* To jump to the `.text` section double click the `.text` node
* <https://docs.microsoft.com/en-us/windows/win32/debug/pe-format>
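
The section names shown in the Program Tree come from the section table in the PE header. The sketch below reads those names directly from raw bytes, following the layout in the Microsoft PE format documentation linked above. `list_sections` and `build_fake_pe` are hypothetical helper names, and the synthetic image exists only so the example runs without a real specimen.

```python
import struct

def list_sections(data: bytes) -> list[str]:
    """Return the section names found in a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C) holds the file offset of the PE signature
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF header follows the 4-byte signature
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)
    table = pe_off + 24 + opt_size          # section table follows the optional header
    names = []
    for i in range(num_sections):
        raw = data[table + i * 40: table + i * 40 + 8]   # 40-byte entries, 8-byte name
        names.append(raw.rstrip(b"\x00").decode("ascii", errors="replace"))
    return names

def build_fake_pe(section_names):
    """Build a minimal, non-executable header set for demonstration only."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    coff = struct.pack("<HHIIIHH", 0x14C, len(section_names), 0, 0, 0, 0, 0)
    table = b"".join(n.encode("ascii").ljust(40, b"\x00") for n in section_names)
    return bytes(dos) + b"PE\x00\x00" + coff + table

print(list_sections(build_fake_pe([".text", ".rdata", ".data", ".reloc"])))
# ['.text', '.rdata', '.data', '.reloc']
```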

### **FUN in Ghidra**

* In Ghidra the `FUN_` prefix generically refers to a function while the numeric value refers to the address where the function is loaded into memory
* Original name of the function is normally lost during compilation
* Execution occurs linearly one instruction after the next
* On the far left you will have a 32 bit address such as `00401007` (hex)
* This address represents the location of code in memory after the program is loaded, not the address of a location on disk i.e. within a file hex editor
* On the right there are x86 assembly instructions
* **Note:** This is the beginning of the `.text` section, not the beginning of the program; execution begins at the entry point
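
Because the addresses in the listing are memory addresses, finding the same bytes in a hex editor requires a translation through the section table. The following is a minimal sketch of that arithmetic. The image base and section layout are made-up values for illustration, and `va_to_file_offset` is a hypothetical helper name.

```python
def va_to_file_offset(va, image_base, sections):
    """Translate a virtual address to a file offset.

    sections: list of (virtual_address, virtual_size, raw_offset) tuples,
    where virtual_address is relative to the image base (an RVA).
    """
    rva = va - image_base
    for virt_addr, virt_size, raw_off in sections:
        if virt_addr <= rva < virt_addr + virt_size:
            return raw_off + (rva - virt_addr)
    return None  # the address is not backed by any section

# Hypothetical layout: .text mapped at RVA 0x1000, stored at file offset 0x400
sections = [(0x1000, 0x2000, 0x400)]
print(hex(va_to_file_offset(0x00401007, 0x00400000, sections)))  # 0x407
```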

### **Function graph view provides a visual perspective on code**

* Click on the function you want i.e. `FUN_00401007`
* Browse to `Window --> Function Graph` menu item
* Helpful for visualizing loops and complex conditionals within a function but the `Listing view` is more compact and easier for some people to navigate
* **The color of the arrows symbolize code flow**
* If the code block ends in a conditional jump green arrows indicate the path where execution will continue if the condition is met
* If the condition is not met a red arrow will show where execution continues
* If the arrow is blue the code ends in an unconditional jump
* **View Imports to review a program's external dependencies**
* The import address table (IAT) helps direct code analysis
* You can view imports in the Symbol Tree window but we will access this information via `Window --> Symbol References`
* Filter symbols by "`Imported`" to focus on dependencies

### **Look for API call patterns associated with malware behavior**

* We can examine imports to identify potential functionality associated with common malware characteristics
* **Learn more about an API call at microsoft.com**
* Types of API Calls:

```
A --> (ANSI)
W --> (Wide)
Ex --> (Extended)
```
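
The `A` and `W` suffixes matter during analysis because they determine how string arguments are laid out in memory. This small sketch shows the two encodings of the same text. The string `calc.exe` is only an illustrative value.

```python
text = "calc.exe"
ansi = text.encode("ascii") + b"\x00"          # one byte per character, NUL terminated
wide = text.encode("utf-16-le") + b"\x00\x00"  # two bytes per character, two-byte terminator

print(len(ansi))   # 9
print(len(wide))   # 18
print(wide[:4])    # b'c\x00a\x00'
```

The interleaved zero bytes in the wide form are why UTF-16 strings look "spaced out" in a hex view.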

* `A` indicates the function takes ANSI strings (8 bit characters)
* `W` indicates the function takes wide strings, a two byte character representation (UTF-16)
* `Ex` (Extended) is used when MSFT updates a function and the new function is not compatible with the old one
* **Instructions reference registers, immediate values and memory**
* Instructions have two components: `operation and operand`
* Instructions can have 0-3 operands
* An Operand can be:

```
A register
A memory location 
An immediate value (e.g. 0x6453)
```

* Consider `MOV EAX, 0x6453`
* EAX is the destination (first)
* 0x6453 is the source (second)
* You are setting EAX to the value 0x6453
* Operands may be implied

**Intel processor uses registers to track the state of computation as instructions are executed**

* Registers are on chip memory locations
* Instructions act on registers and memory locations
* A CPU has a series of registers

```
Some registers are general purpose
Some have a particular use
Some are both
```

* We monitor registers to track arguments, variables, and function return values
* **The x86 architecture uses the following general purpose registers to hold code and data**

```
EAX --> Used for addition, multiplication, and return values
ECX --> Used as a counter 
EBP --> Used to reference arguments and local variables
ESP --> Points to the last item on the stack 
ESI/EDI --> Used by memory transfer instructions 
```

### **Special use registers hold flags and track program execution**

* `EIP` points to the next instruction to execute
* `EFLAGS` bits represent the outcome of computations and control CPU operations

### **Segment registers include:**

```
CS - Code segment
DS - Data segment 
SS - Stack segment 
```

* **32 bit registers can also be accessed as 16 and 8 bit registers**
* On 32 bit arch, registers can be accessed by their default `dword` size
* To access a register's lower `16 bits` the leading `E` is omitted from the name e.g. `EAX` becomes `AX`
* **The naming scheme for `EAX EBX ECX EDX` is as follows**
* `E<letter>X` --> `dword` 32 bit value of the register
* `<letter>X` --> lower word 16 bit value of the register
* `<letter>H` --> high byte 8 bit of the `<letter>X` value of the register
* `<letter>L` --> low byte 8 bit of the `<letter>X` value of the register

```
EAX means 32 bits 
AX means the low 16 bit value 
AH means the high 8 bits of AX 
AL means the low 8 bits of AX
```
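
The naming scheme above amounts to masking and shifting the same 32 bit value. A minimal sketch, using a hypothetical helper `sub_registers` and an arbitrary example value:

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into the views AX, AH, and AL."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF                      # low 16 bits
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,            # high byte of AX
        "AL": ax & 0xFF,                   # low byte of AX
    }

regs = sub_registers(0x12345678)
print({name: hex(value) for name, value in regs.items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
```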

* **The length of a word, dword, and qword are 16, 32, and 64 bits**
* A `word` in assembly is the natural size for a unit of data
* A `16 bit` processor has `16-bit` words
* Many tools consider a word to be 16 bits regardless of processor size
* Additional common data sizes:

```
8 bits --> 1 byte 
32 bits --> dword 
64 bits --> qword
```

* **The operand for one push instruction is a pointer to a string**
* A `pointer` is a variable that holds a memory address (it points to a memory location)
* When the address that the pointer points to is accessed it is called dereferencing because the pointer references another location in memory
* Pointers are more efficient: rather than copying a data structure around in memory, it is cheaper to copy the value of a pointer (4 bytes on 32 bit systems)
* A `PUSH` instruction before a `CALL` often represents arguments passed to the function specified by the `CALL`
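
To make dereferencing concrete, the sketch below models a small region of memory as a byte array, stores a string in it, and stores a 4 byte pointer to that string elsewhere. The addresses, the `BASE` value, and the helper names are all invented for illustration.

```python
import struct

BASE = 0x00410000                       # hypothetical address where the buffer is mapped
memory = bytearray(0x100)
memory[0x30:0x36] = b"hello\x00"        # NUL-terminated string at 0x00410030
struct.pack_into("<I", memory, 0x10, BASE + 0x30)   # pointer stored at 0x00410010

def read_dword(addr):
    """Read 4 little-endian bytes at a memory address."""
    return struct.unpack_from("<I", memory, addr - BASE)[0]

def read_cstring(addr):
    """Read bytes at an address up to the terminating NUL."""
    start = addr - BASE
    end = memory.index(b"\x00", start)
    return memory[start:end].decode("ascii")

pointer = read_dword(0x00410010)        # fetch the pointer value itself
print(hex(pointer))                     # 0x410030
print(read_cstring(pointer))            # dereference it: hello
```

Only the 4 byte pointer would be pushed as an argument; the callee follows it to reach the string.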

### **Memory can be accessed directly by many assembly instructions**

* Example:

```
MOV EAX, [0x410230]
```

* Brackets mean fetch data at the specified address (dereference)
* This is direct addressing because we are dereferencing an immediate value
* The result is that 4 bytes of data at 0x410230 will be moved to `EAX`
* Some tools like `IDA` omit brackets for direct addresses (`IDA: dword_410230`)
* **Memory may also be addressed by reference indirectly**
* The address may be calculated or in a register
* This is called an `Effective Address` and it enables us to work efficiently with data structures
* Format: `Base + (Index * Scale) + Displacement`

```
BASE        Index   Scale       Displacement
(EAX EBX) + (EAX EBX  1)   +     (None)
(ECX EDX) + (ECX EDX  2)   +     (8 bit value)
(ESP EBP) + (EBP ESI  4)   +     (16 bit value)
(ESI EDI) + (EDI      8)   +     (32 bit value)
```
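
The format above can be sketched as a one-line calculation. `effective_address` is a hypothetical helper, and the register values in the examples are arbitrary.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Compute Base + (Index * Scale) + Displacement as a 32-bit address."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF

# [EAX + EBX * 8] with EAX = 0x00500000 and EBX = 3
print(hex(effective_address(base=0x00500000, index=3, scale=8)))   # 0x500018
# [EBP + 0x10] with EBP = 0x0012FF70
print(hex(effective_address(base=0x0012FF70, displacement=0x10)))  # 0x12ff80
```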

* **Indirect Referencing:** address of the destination is calculated or it resides in a register. The calculated address is called the effective address (EA)
* If the address sits in a register, this is still different from register addressing, where the register itself is the destination
* In indirect memory addressing the register holds the address of the destination.
* Large advantage of indirect memory addressing is the capability to efficiently work with data structures
* You can increment the value of a single register to step through fields of a data structure or the same field of an array of data structures
* If the scale is used, an index register must also be used
* **Examples of indirectly addressing memory**
* `[EAX]` : Access dynamically allocated memory (base)
* `[EBP + 0x10]` : Access data on the stack (base + displacement)
* `[EAX + EBX * 8]` : Access an array of 8-byte structures (base + index \* scale)
* `[EAX + EBX + 0xC]` : Access fields of a two dimensional array of structures (base + index + displacement)
* Indirect memory addressing may pose challenges for static code analysis because registers are not populated until runtime
* **Strings are an example of a data structure**
* Data structures group simple variables into more complex types
* Examples of data structures include: strings, linked lists, sockets, and file handles
* When reversing determine the type of data structure by usage
* Data structures enable us to group bytes and advance our understanding of the code

### **Code vs Data**

* Context determines the answer
* `RegOpenKeyExA` Example
* The API call takes a symbolic constant as an argument, which appears in the disassembly as a number i.e. `PUSH 0x80000001`
* During compilation the symbolic constant is replaced with its hex representation
* Right click the hex value, choose `Set Equate` and then choose `HKEY_CURRENT_USER` to change it back to the symbolic constant
* Will bring clarity to the code
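
Setting an equate is effectively a reverse lookup from number to name. A minimal sketch of that lookup for the predefined registry root keys (the values are the documented Windows constants; `name_for` is a hypothetical helper):

```python
HKEY_NAMES = {
    0x80000000: "HKEY_CLASSES_ROOT",
    0x80000001: "HKEY_CURRENT_USER",
    0x80000002: "HKEY_LOCAL_MACHINE",
    0x80000003: "HKEY_USERS",
}

def name_for(value):
    """Return the symbolic name for a pushed constant, or the hex value if unknown."""
    return HKEY_NAMES.get(value, hex(value))

print(name_for(0x80000001))  # HKEY_CURRENT_USER
print(name_for(0x1234))      # 0x1234
```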

### **Branch instructions direct code execution to another location**

* The flow of execution i.e. control flow is sequential until a branching instruction is reached
* Then the `EIP` is updated and execution is transferred to another location in memory
* **The code under review contains two types of jumps**
* Jumps are an example of a branching instruction
* Unconditional jumps always perform a jump `JMP, CALL, RET`
* Conditional jumps only jump if a condition is met: `Jcc, LOOPcc`
* Conditional jump represents a decision point
* **Conditional jumps require that we review multiple instructions**
* To evaluate whether a condition is true, arithmetic and Boolean instructions are used
* `sub ecx, 8` will test if ECX is equal to 8
* `and eax, eax` will test if EAX is equal to zero
* If the result is zero then the `ZF` bit is set in the flags register
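
The two tests above can be modeled in a few lines. This is a simplified sketch that tracks only the result and `ZF`; the function names are illustrative, not part of any tool.

```python
MASK = 0xFFFFFFFF

def sub(dst, src):
    """Model SUB dst, src: return (result, ZF)."""
    result = (dst - src) & MASK
    return result, result == 0

def and_(dst, src):
    """Model AND dst, src: return (result, ZF)."""
    result = dst & src & MASK
    return result, result == 0

print(sub(8, 8))    # (0, True)   ECX was 8, so ZF is set
print(sub(9, 8))    # (1, False)
print(and_(0, 0))   # (0, True)   EAX was zero, so ZF is set
print(and_(5, 5))   # (5, False)
```

`CMP` and `TEST` set the flags the same way as `SUB` and `AND` but discard the result, so the operands stay intact.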

### Jumps

* A `Jcc` instruction will be performed if a jump condition is met
* Form: `Jcc`

```
A --> jump if Above 
B --> jump if Below
E --> jump if equal 
G --> jump if greater 
L --> jump if less than 
Z --> jump if zero 
N --> inverts the condition, e.g. JNZ jumps if not zero 
```
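
Above/Below compare operands as unsigned values, while Greater/Less compare them as signed values. The sketch below computes the flags a `CMP` would set and evaluates a few conditions against them, following the condition definitions in the Intel manuals. `cmp_flags` and `JUMP_TAKEN` are hypothetical names.

```python
MASK = 0xFFFFFFFF

def cmp_flags(a, b):
    """Return the flags CMP a, b would set for 32-bit operands."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    return {
        "ZF": result == 0,                             # operands were equal
        "CF": a < b,                                   # unsigned borrow
        "SF": sign_r == 1,                             # result has its top bit set
        "OF": sign_a != sign_b and sign_r != sign_a,   # signed overflow
    }

JUMP_TAKEN = {
    "JE":  lambda f: f["ZF"],
    "JNE": lambda f: not f["ZF"],
    "JA":  lambda f: not f["CF"] and not f["ZF"],          # unsigned greater
    "JB":  lambda f: f["CF"],                              # unsigned less
    "JG":  lambda f: not f["ZF"] and f["SF"] == f["OF"],   # signed greater
    "JL":  lambda f: f["SF"] != f["OF"],                   # signed less
}

flags = cmp_flags(5, 0xFFFFFFFF)   # 0xFFFFFFFF is -1 when read as signed
for mnemonic in ("JA", "JB", "JG", "JL"):
    print(mnemonic, JUMP_TAKEN[mnemonic](flags))
# JA False
# JB True
# JG True
# JL False
```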

### Comments

* Use the `;` key to add a comment
* Can add EOL comments, Pre, Post or other types of comments

### HTTP Command and Control

* These APIs enable HTTP C2

```
InternetOpen, InternetConnect --> Create an HTTP connection
HttpOpenRequest, HttpAddRequestHeaders (Optional) --> Build an HTTP request
HttpSendRequest --> Send an HTTP request
InternetReadFile --> Read a response 
```
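
Given an import list exported from the analysis tool, the pattern above can be checked mechanically. This is a rough triage sketch, not a detection rule: the stage labels and the function `http_c2_stages` are made up for illustration, and matching a trailing `A` or `W` is a simple heuristic.

```python
HTTP_C2_STAGES = {
    "connect": {"InternetOpen", "InternetConnect"},
    "build request": {"HttpOpenRequest", "HttpAddRequestHeaders"},
    "send request": {"HttpSendRequest"},
    "read response": {"InternetReadFile"},
}

def http_c2_stages(imports):
    """Map each HTTP stage to the imported APIs that implement it."""
    found = {}
    for stage, apis in HTTP_C2_STAGES.items():
        hits = sorted(
            name for name in imports
            # accept the bare name or its ANSI/Wide variant
            if name in apis or (name.endswith(("A", "W")) and name[:-1] in apis)
        )
        if hits:
            found[stage] = hits
    return found

imports = ["InternetOpenA", "InternetConnectA", "HttpOpenRequestA",
           "HttpSendRequestA", "InternetReadFile", "CreateFileW"]
for stage, hits in http_c2_stages(imports).items():
    print(stage, "->", hits)
```

A specimen that covers all four stages is a reasonable candidate for closer review of its request-building code.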

* To view the API calls

```
Window --> Symbol References --> Locate APIs of interest in the Symbol Table
```

* **The code references variables, which hold code or data not known at compile time**
* Local variables are relevant for the current function and are not saved
* Local variables are stored on the stack relative to `ESP` and `EBP`
* Global variables are accessible from all functions e.g. `DAT_00403374`
* Static variables can only be used from within the function that allocates them, but unlike local variables their storage is not marked for reuse when the function exits

### Viewing Function Call Trees

* `Window --> Function Call Tree`
* View the outgoing calls on the right side
* View is ideal for determining which functions are called from the current function
* Once you determine what the current function is being used for make sure to `Right Click --> Edit Label` and give it a meaningful name

### GetTempFileNameW

* Creates a file name for a temp file
* Can explore other function references to find new IOCs
* Look for a `PUSH` to `lpPrefixString_XXXXXX`
* MSFT documentation states the first three characters make up the temp file name prefix
* To assist Ghidra:

```
Right click on the lpPrefixString --> Click data --> terminate Unicode
```

### Functions

* A function is a group of instructions that performs a specific task (read, write files, send network data, log keystrokes)
* Three Basic Components

```
Input: values passed in
Body: code to perform tasks
Return: value passed back
```

* Calling a function involves a jump to another memory location
* After the function is done execution continues at the instruction after the original function call
* **Calling a function involves two control transfers**
* Function format: `return = function(arg0, arg1)`
* Specific events occur when calling a function

```
Pass in parameters (stack/register)
Save the return pointer 
Transfer control to the function 
```

* Specific events occur when returning from a function

```
Set up a return value (typically EAX)
Clean up the stack and restore registers 
Transfer control to the saved return pointer
```

* **Within a function, the prologue and epilogue perform setup and cleanup activities**
* Most functions contain a standard prologue and epilogue
* The prologue occurs at the start of the function

```
Allocates space for variables
Saves registers that will be reused in the function body
```

* Function epilogue occurs at the end of the function

```
It cleans up the stack e.g. POP allocated variables
It restores registers
```

* **The stack is a section in memory used to store saved registers, local variables and function parameters**
* The stack is LIFO Last in First out
* `PUSH` adds an element and `POP` removes one
* `ESP` points to the top of the stack (the most recently pushed item) and changes with instructions like `PUSH POP CALL LEAVE RET`
* `EBP a.k.a frame pointer` serves as an unchanging reference within a function
* `EBP - value = local variable` (registers may also be used)
* `EBP + value = parameter`
* When `EBP` is set up in the function prologue in this manner, it means that when you see code reference `EBP` minus some value i.e. `[EBP -8]` it is accessing a local variable
* When it is `EBP` plus some value i.e. `[EBP +8]` it is referencing a parameter that was passed in
* When cleaning up the stack compilers use some tricks
* Compilers may `POP` off a value i.e. `POP EDX` which has the result of adding four to `ESP`
* It is also very common to see a value added to `ESP`, the use of `RET` (which can also pop values off the stack), and the `LEAVE` instruction
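
The relationship between `ESP`, `EBP`, parameters, and local variables is easier to see when stepped through. The sketch below simulates a 32 bit stack for a call with two arguments. The `Stack` class and every address and value in it are invented for illustration.

```python
import struct

class Stack:
    """A tiny model of a 32-bit stack that grows toward lower addresses."""
    def __init__(self, top=0x0012FF80, size=0x100):
        self.base = top - size
        self.mem = bytearray(size)
        self.esp = top

    def write(self, addr, value):
        struct.pack_into("<I", self.mem, addr - self.base, value & 0xFFFFFFFF)

    def read(self, addr):
        return struct.unpack_from("<I", self.mem, addr - self.base)[0]

    def push(self, value):
        self.esp -= 4                  # PUSH moves ESP down, then stores
        self.write(self.esp, value)

    def pop(self):
        value = self.read(self.esp)    # POP loads, then moves ESP up
        self.esp += 4
        return value

s = Stack()
s.push(0x22)             # PUSH second argument
s.push(0x11)             # PUSH first argument
s.push(0x00401050)       # CALL pushes the return address
s.push(0x0012FFC0)       # prologue: PUSH EBP (caller's frame pointer)
ebp = s.esp              # prologue: MOV EBP, ESP
s.esp -= 4               # prologue: SUB ESP, 4 (one local variable)
s.write(ebp - 4, 0x99)   # MOV [EBP-4], 0x99

print(hex(s.read(ebp + 8)))    # 0x11      first parameter
print(hex(s.read(ebp + 0xC)))  # 0x22      second parameter
print(hex(s.read(ebp - 4)))    # 0x99      local variable
print(hex(s.read(ebp + 4)))    # 0x401050  saved return address

s.esp = ebp              # epilogue: MOV ESP, EBP (LEAVE does this and the POP)
ebp = s.pop()            # epilogue: POP EBP
return_address = s.pop() # RET pops the return address
```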

### Functions are called according to calling conventions

* The convention describes how data is passed into and out of functions
* The implementation of the convention may vary by compiler
* The `cdecl` convention (most common) has these characteristics

```
The arguments are placed onto the stack right to left
The return value is placed into EAX
The caller cleans up the stack (removes the arguments)
```

* The `stdcall` convention has the following characteristics

```
Similar to cdecl but the callee cleans up the stack 
This is the convention used in Win32 APIs
```

* **Additional calling conventions include fastcall and thiscall**
* `fastcall`
* Arguments are stored in registers
* Any extra arguments are placed on the stack
* The callee cleans up arguments on the stack
* `thiscall`
* Used in C++ code (member functions)
* This convention includes a reference to the `this` pointer
* For MSFT compilers, ECX holds the "this" pointer and the callee cleans up the arguments on the stack
* For GNU compilers the "this" pointer is pushed onto the stack last and the caller cleans up
* **Reviewing strings reveals filenames and directories of interest**
* To locate references to a string, right click on it and choose to show references

### Loops in malware

* Used to encrypt and decrypt network traffic --> loop over each character in the string to send
* Attempt to connect to C2 server --> loop over a list of servers
* Perform a port scan --> try to connect to a port 1-65535
* Log keystrokes --> Check state for each key code 0...92
* Similar to `Jcc`, the `cc` in `LOOPcc` represents the condition code that must be met for the loop instruction to branch to the address specified
* The conditions are:

```
Z --> Loop if zero 
E --> Loop if equal
N --> Inverts the logic of the looping condition
```
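
All `LOOP` variants first decrement `ECX` and then decide whether to branch. A minimal model of that decision, with `loop_step` as a hypothetical helper:

```python
def loop_step(mnemonic, ecx, zf=False):
    """Model one LOOP/LOOPcc instruction: return (new ECX, branch taken?)."""
    ecx = (ecx - 1) & 0xFFFFFFFF           # ECX is decremented first
    if mnemonic == "LOOP":
        taken = ecx != 0
    elif mnemonic in ("LOOPE", "LOOPZ"):
        taken = ecx != 0 and zf            # also requires ZF = 1
    elif mnemonic in ("LOOPNE", "LOOPNZ"):
        taken = ecx != 0 and not zf        # also requires ZF = 0
    else:
        raise ValueError("unknown mnemonic: " + mnemonic)
    return ecx, taken

# A loop body followed by LOOP runs ECX times
ecx, iterations = 3, 0
while True:
    iterations += 1                        # the loop body
    ecx, taken = loop_step("LOOP", ecx)
    if not taken:
        break
print(iterations)  # 3
```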

### Reviewing imports to direct our code analysis

* The import table lists functions used to access the resource section

```
FindResourceW --> determine the location of a resource
SizeofResource --> obtain the size of a resource
LockResource --> obtain a pointer to a resource
```

* The resource `.rsrc` section is often used to store information like icons, dialog boxes, and version information
* However malware may hide executables here
* Malware that drops files is called a `dropper`
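
One quick check on an extracted resource is whether it contains an embedded executable. The sketch below scans a blob for an `MZ` header whose `e_lfanew` field points at a `PE` signature. It is a heuristic only, since a dropper may encode or compress its payload, and `find_embedded_pe` is a hypothetical helper.

```python
import struct

def find_embedded_pe(blob: bytes) -> list[int]:
    """Return offsets in blob where a plausible PE file begins."""
    offsets = []
    pos = blob.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(blob):                       # room for a DOS header
            (e_lfanew,) = struct.unpack_from("<I", blob, pos + 0x3C)
            pe = pos + e_lfanew
            if blob[pe:pe + 4] == b"PE\x00\x00":          # signature where expected
                offsets.append(pos)
        pos = blob.find(b"MZ", pos + 1)
    return offsets

# Synthetic resource: filler bytes, a minimal MZ/PE stub, and a stray "MZ"
stub = bytearray(0x44)
stub[:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0x40)
stub[0x40:0x44] = b"PE\x00\x00"
resource = b"ICONDATA" * 4 + bytes(stub) + b"MZ junk"

print(find_embedded_pe(resource))  # [32]
```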

### CreateMutexA

* `CreateMutexA` --> creates or opens a mutex object
* Malware authors often use a mutex to avoid re-infecting a machine

### Keylogging

* `GetKeyState` and `GetAsyncKeyState` --> Determine if a particular key is pressed
* `GetWindowText` --> Retrieves the text of a window's title bar
* `OpenClipboard`, `GetClipboardData`, and `CloseClipboard` --> Opens the clipboard for access, gathers data, and then closes the clipboard
* Combining `GetWindowText` with `GetKeyState` and `GetAsyncKeyState`, an attacker could learn which keys are pressed and what the application context is
* `GetAsyncKeyState` determines if a key is currently up or down or if it was pressed since the last call to the API
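
Code that follows a `GetAsyncKeyState` call typically tests specific bits of the 16 bit return value: per the Microsoft documentation, the most significant bit means the key is down and the least significant bit means it was pressed since the previous call. A small decoder, with `decode_key_state` as a hypothetical helper:

```python
def decode_key_state(state: int) -> dict:
    """Interpret the SHORT returned by GetAsyncKeyState."""
    state &= 0xFFFF                      # normalize a signed SHORT to 16 bits
    return {
        "down_now": bool(state & 0x8000),
        "pressed_since_last_call": bool(state & 0x0001),
    }

print(decode_key_state(0x8001))  # {'down_now': True, 'pressed_since_last_call': True}
print(decode_key_state(0x0000))  # {'down_now': False, 'pressed_since_last_call': False}
```

In disassembly this often shows up as a `TEST` or `AND` against `0x8000` or `1` right after the call.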

### 64 Bit Malware

* Vast majority is 32 bit
* We will see more 64 bit malware in the future as 64 bit systems become the standard
* Two types of 64 bit malware have been common

```
Browser Helper Objects for 64 bit Internet Explorer
Device Drivers (rootkits) for Windows x64
```

### **Analyze 32-bit malware on 64-bit OS with caution**

* 32 bit code running on a 64 bit operating system runs in the `WOW64 Subsystem`
* 32 bit executables load 32 bit dlls
* 32 bit dlls are located in `%SystemRoot%\Syswow64`
* 32 bit processes reference Software hive registry values in `Wow6432Node` using registry redirection
* Some executables run subtly differently under WoW64 than on a native 32 bit OS
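
When a 32 bit specimen writes to the registry or loads a system DLL on a 64 bit host, the artifacts land in the redirected locations. The sketch below is a deliberately simplified model of that mapping. Real redirection has exceptions (some keys are shared rather than redirected), the `C:\Windows` system root is an assumption, and both function names are hypothetical.

```python
SOFTWARE = "HKLM\\SOFTWARE\\"
SYSTEM32 = "C:\\Windows\\System32\\"   # assumes %SystemRoot% is C:\Windows

def wow64_registry_view(path):
    """Approximate the key a 32-bit process actually reaches."""
    upper = path.upper()
    if upper.startswith(SOFTWARE) and not upper.startswith(SOFTWARE + "WOW6432NODE\\"):
        return SOFTWARE + "Wow6432Node\\" + path[len(SOFTWARE):]
    return path

def wow64_file_view(path):
    """Approximate the file a 32-bit process actually opens."""
    if path.upper().startswith(SYSTEM32.upper()):
        return "C:\\Windows\\SysWOW64\\" + path[len(SYSTEM32):]
    return path

print(wow64_registry_view("HKLM\\SOFTWARE\\Example\\Run"))
# HKLM\SOFTWARE\Wow6432Node\Example\Run
print(wow64_file_view("C:\\Windows\\System32\\kernel32.dll"))
# C:\Windows\SysWOW64\kernel32.dll
```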

### 64-Bit Assembly Differences

* All general purpose registers are expanded to 64 bits
* `EAX` --> `RAX`
* There are eight new general purpose registers `R8 --> R15`
* Special use registers are extended and renamed `EIP --> RIP`
* `RSP` not `RBP` is often used to access parameters and variables
* Calling convention resembles `fastcall` (parameters via registers)

```
First four parameters are passed in RCX RDX R8 R9
Additional parameters are stored on the stack 
```

* There is a new addressing mode (`RIP` + displacement)
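
When reading a 64 bit Windows call site, it helps to map argument positions to their locations. The sketch below does this for integer and pointer arguments only, following the register order listed above. `assign_arguments` and the `stack[n]` labels are illustrative, and floating point arguments (which use other registers) are not modeled.

```python
X64_ARG_REGISTERS = ("RCX", "RDX", "R8", "R9")

def assign_arguments(args):
    """Map positional integer/pointer arguments to registers, then the stack."""
    locations = {}
    for position, value in enumerate(args):
        if position < len(X64_ARG_REGISTERS):
            locations[X64_ARG_REGISTERS[position]] = value
        else:
            locations[f"stack[{position - 4}]"] = value
    return locations

print(assign_arguments(["hKey", "lpSubKey", 0, "samDesired", "phkResult"]))
# {'RCX': 'hKey', 'RDX': 'lpSubKey', 'R8': 0, 'R9': 'samDesired', 'stack[0]': 'phkResult'}
```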

