# Reversing Malicious Code

* **Goal is to understand common malware characteristics at a code level**
* May include potential branches of execution with code analysis
* **Overview of the code lifecycle**
* Source code is translated into object code by a compiler
* Object code is then combined with libraries and an executable file is created
* To run the file, the operating system reads various information from the executable file, allocates memory, and loads required libraries into memory
* Control is transferred to the code to execute
* This final stage is where we examine the code with a debugger
* Note: Libraries may be loaded during the program's execution

### Ghidra

* Developed by NSA
* Its decompiler produces a C representation of the code to speed up analysis
* Includes support for writing Java and Python scripts to automate analysis
* Help is accessed via F1 key
* Ghidra v10 includes a debugger
* <https://ghidra-sre.org/>

### **Create a new project**

* `File --> New Project`
* Choose the project type
* Click `Finish`
* Drag and drop the specimen into the project window
* Accept the defaults in the `Import` window and click OK

### **Launch the code browser and begin the auto-analysis**

* Make sure to enable `WindowsPE x86 Propagate External Parameters` option
* Finally click the Analyze button and wait for Ghidra to finish
* Once auto analysis is completed an Auto Analysis summary will show any warnings or issues encountered during the process
* A common warning is that the file does not `contain debug information`
* This is common and not an issue

**Before Proceeding save the project and take a snapshot**

### Ghidra Overview

* Main window is the `Listing View` which presents the target program's code and data
* Will initially bring you to the beginning of the file in the `Listing View` --> notice the `MZ` string
* If you scroll down from there you can examine the program's header
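
The `MZ` string and the header that follows it can also be checked outside of Ghidra. This is a minimal sketch that relies on the PE layout, where the DOS header stores the offset of the PE signature at `0x3C`. The `sample` buffer is synthetic and only exists for illustration; it is not a real executable.

```python
import struct

def pe_header_offset(data: bytes) -> int:
    """Return the file offset of the PE signature, or raise ValueError."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: 4-byte little-endian offset stored at 0x3C in the DOS header
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    return e_lfanew

# Synthetic buffer: MZ at offset 0, PE signature at offset 0x80
sample = bytearray(0x84)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x80)
sample[0x80:0x84] = b"PE\x00\x00"

print(hex(pe_header_offset(bytes(sample))))
```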

**Program Tree**

* Window is in the top left and shows the different sections and headers
* Section names are typically:

```
.text - Contains executable code
.rdata - Contains read-only data
.data - Contains data 
.reloc - Contains relocation data to fix up addresses in the file if it is not loaded at the preferred address
```

* To jump to the `.text` section double click the `.text` node
* <https://docs.microsoft.com/en-us/windows/win32/debug/pe-format>

### **FUN in Ghidra**

* In Ghidra the `FUN_` prefix generically refers to a function while the numeric value refers to the address where the function is loaded into memory
* Original name of the function is normally lost during compilation
* Execution occurs linearly one instruction after the next
* On the far left you will have a 32 bit address such as `00401007` (hex)
* This address represents the location of code in memory after the program is loaded, not the address of a location on disk i.e. within a file hex editor
* On the right there are x86 assembly instructions
* **Note:** - This is the beginning of the `.text` section, not the beginning of the program, that occurs at the entry point
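
The difference between a memory address and a location on disk can be sketched as a small conversion. The image base and the `.text` offsets below are assumed values chosen for illustration; the real ones come from the specimen's PE header.

```python
IMAGE_BASE = 0x00400000   # assumed preferred load address
TEXT_RVA = 0x1000         # assumed address of .text relative to the image base
TEXT_RAW = 0x400          # assumed file offset where .text starts on disk

def va_to_file_offset(va: int) -> int:
    """Translate a virtual address inside .text to an offset in the file."""
    rva = va - IMAGE_BASE
    return rva - TEXT_RVA + TEXT_RAW

def ghidra_label(va: int) -> str:
    """Build the default label Ghidra gives an unnamed function."""
    return f"FUN_{va:08x}"

print(ghidra_label(0x00401007), hex(va_to_file_offset(0x00401007)))
```

Under these assumptions the function shown at `00401007` in the listing sits at offset `0x407` in a hex editor.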

### **Function graph view provides a visual perspective on code**

* Click on the function you want i.e. `FUN_00401007`
* Browse to `Window --> Function Graph` menu item
* Helpful for visualizing loops and complex conditionals within a function but the `Listing view` is more compact and easier for some people to navigate
* **The color of the arrows indicates code flow**
* If the code block ends in a conditional jump, green arrows indicate the path where execution will continue if the condition is met
* If the condition is not met a red arrow will show where execution continues
* If the arrow is blue the code ends in an unconditional jump
* **View Imports to review a program's external dependencies**
* The import address table (IAT) helps direct code analysis
* You can view imports in the Symbol Tree window but we will access this information via `Window --> Symbol References`
* Filter symbols by "`Imported`" to focus on dependencies

### **Look for API call patterns associated with malware behavior**

* We can examine imports to identify potential functionality associated with common malware characteristics
* **Learn more about an API call at microsoft.com**
* Types of API Calls:

```
A --> (ANSI)
W --> (Wide)
Ex --> (Extended)
```
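
A short sketch of what the `A` and `W` suffixes mean for the bytes you will see in memory. The string `cmd.exe` is an arbitrary example.

```python
text = "cmd.exe"

# A variant: one byte per character, one-byte null terminator
ansi = text.encode("ascii") + b"\x00"
# W variant: two bytes per character (UTF-16LE), two-byte null terminator
wide = text.encode("utf-16-le") + b"\x00\x00"

print(ansi.hex(" "))
print(wide.hex(" "))
```

The wide form interleaves a zero byte after each ASCII character, which is why wide strings look "spaced out" in a hex view.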

* `A` (ANSI) means the function takes strings with 8-bit characters
* `W` (Wide) means the function takes strings with two-byte characters (UTF-16)
* `Ex` (Extended) marks a function that MSFT updated where the new function is not compatible with the old one
* **Instructions reference registers, immediate values and memory**
* Instructions have two components: an operation and its operands
* Instructions can have 0-3 operands
* An operand can be:

```
A register
A memory location 
An immediate value (e.g. 0x6453)
```

* Consider `MOV EAX, 0x6453`
* EAX is the destination (first)
* 0x6453 is the source (second)
* You are setting EAX to the value 0x6453
* Operands may be implied

**Intel processor uses registers to track the state of computation as instructions are executed**

* Registers are on chip memory locations
* Instructions act on registers and memory locations
* A CPU has a series of registers

```
Some registers are general purpose
Some have a particular use
Some are both
```

* We monitor registers to track arguments, variables, and function return values
* **The x86 architecture uses the following general purpose registers to hold code and data**

```
EAX --> Used for addition, multiplication, and return values
ECX --> Used as a counter 
EBP --> Used to reference arguments and local variables
ESP --> Points to the last item on the stack 
ESI/EDI --> Used by memory transfer instructions
```

### **Special use registers hold flags and track program execution**

* `EIP` points to the next instruction to execute
* `EFLAGS` bits represent the outcome of computations and control CPU operations

### **Segment registers include:**

```
CS - Code segment
DS - Data segment 
SS - Stack segment 
```

* **32 bit registers can also be accessed as 16 and 8 bit registers**
* On 32 bit arch, registers can be accessed by their default `dword` size
* To access a register's lower `16 bits` the leading `E` is omitted from the name e.g. `EAX` becomes `AX`
* **The naming scheme for `EAX EBX ECX EDX` is as follows**
* `E<letter>X` --> `dword` 32 bit value of the register
* `<letter>X` --> lower word 16 bit value of the register
* `<letter>H` --> high byte (8 bits) of the `<letter>X` value of the register
* `<letter>L` --> low byte (8 bits) of the `<letter>X` value of the register

```
EAX means 32 bits 
AX means the low 16 bit value 
AH means the high 8 bits of AX 
AL means the low 8 bits of AX
```
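
The naming scheme can be sketched with bit masks and shifts. The value `0x12345678` is arbitrary.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into the AX, AH and AL views of it."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF                  # low 16 bits
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,        # high byte of AX
        "AL": ax & 0xFF,               # low byte of AX
    }

regs = sub_registers(0x12345678)
print({name: hex(value) for name, value in regs.items()})
```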

* **The length of a word, dword, and qword are 16, 32, and 64 bits**
* A `word` in assembly is the natural size for a unit of data
* A `16 bit` processor has `16-bit` words
* Many tools consider a word to be 16 bits regardless of processor size
* Additional common data sizes:

```
8 bits --> 1 byte 
32 bits --> dword 
64 bits --> qword
```

* **The operand for one push instruction is a pointer to a string**
* A `pointer` is a variable that holds a memory address (it points to a memory location)
* When the address that the pointer points to is accessed it is called dereferencing because the pointer references another location in memory
* Pointers are more efficient: rather than copying a data structure around in memory, it is cheaper to copy the value of a pointer (4 bytes on 32 bit systems)
* A `PUSH` instruction before a `CALL` often represents arguments passed to the function specified by the `CALL`
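
Dereferencing can be modeled with a byte buffer standing in for process memory. This is a rough sketch: the base address, the two addresses used, and the string contents are all hypothetical.

```python
import struct

BASE = 0x00410000                      # hypothetical address where the buffer is mapped
memory = bytearray(0x300)

# Place an ANSI string at 0x410230 and a pointer to it at 0x410200
memory[0x230:0x236] = b"hello\x00"
struct.pack_into("<I", memory, 0x200, 0x00410230)

def read_dword(addr: int) -> int:
    """Dereference: fetch the 4-byte little-endian value stored at addr."""
    return struct.unpack_from("<I", memory, addr - BASE)[0]

def read_cstring(addr: int) -> str:
    """Follow an address to a null-terminated ANSI string."""
    start = addr - BASE
    end = memory.index(0, start)
    return memory[start:end].decode("ascii")

pointer = read_dword(0x00410200)       # the pointer's value is itself an address
print(hex(pointer), read_cstring(pointer))
```

Only the 4-byte pointer needs to be pushed as an argument; the callee follows it to reach the string.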

### **Memory can be accessed directly by many assembly instructions**

* Example:

```
MOV EAX, [0x410230]
```

* Brackets mean fetch data at the specified address (dereference)
* This is direct addressing because we are dereferencing an immediate value
* The result is that 4 bytes of data at 0x410230 will be moved to `EAX`
* Some tools like `IDA` omit brackets for direct addresses (`IDA: dword_410230`)
* **Memory may also be addressed by reference indirectly**
* The address may be calculated or in a register
* This is called an `Effective Address` and it enables us to work efficiently with data structures
* Format: `Base + (Index * Scale) + Displacement`

```
Base          Index           Scale    Displacement
EAX  EBX      EAX  EBX        1        None
ECX  EDX   +  ECX  EDX    *   2     +  8 bit value
ESP  EBP      EBP  ESI        4        16 bit value
ESI  EDI      EDI             8        32 bit value
```
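
The effective address format can be expressed as a small function. The register values below are hypothetical; at runtime they would come from the CPU state.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """Compute Base + (Index * Scale) + Displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF

# Hypothetical register contents
EAX, EBX, EBP = 0x00500000, 3, 0x0012FF70

print(hex(effective_address(base=EBP, disp=0x10)))           # [EBP + 0x10]
print(hex(effective_address(base=EAX, index=EBX, scale=8)))  # [EAX + EBX * 8]
print(hex(effective_address(base=EAX, index=EBX, disp=0xC))) # [EAX + EBX + 0xC]
```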

* **Indirect Referencing:** address of the destination is calculated or it resides in a register. The calculated address is called the effective address (EA)
* Even when the address sits in a register, this differs from addressing the register itself, where the register is the destination
* In indirect memory addressing the register holds the address of the destination.
* Large advantage of indirect memory addressing is the capability to efficiently work with data structures
* You can increment the value of a single register to step through fields of a data structure or the same field of an array of data structures
* If the scale is used, an index register must also be used
* **Examples of indirectly addressing memory**
* `[EAX]` : Access dynamically allocated memory (base)
* `[EBP + 0x10]` : Access data on the stack (base + displacement)
* `[EAX + EBX * 8]` : Access an array with 8-byte structure (base + index \* scale)
* `[EAX + EBX + 0xC]` : Access fields of a two dimensional array of structures (base + index + displacement)
* Indirect memory addressing may pose challenges for static code analysis because registers are not populated until runtime
* **Strings are an example of a data structure**
* Data structures group simple variables into more complex types
* Examples of data structures include: strings, linked lists, sockets, and file handles
* When reversing determine the type of data structure by usage
* Data structures enable us to group bytes and advance our understanding of the code

### **Code vs Data**

* Context determines the answer
* `RegOpenKeyExA` Example
* The API call takes a symbolic constant as an argument, which shows up in the disassembly as `PUSH 0x80000001`
* During compilation the symbolic constant is replaced with its hex representation
* Right click the hex value, choose `Set Equate` and then choose `HKEY_CURRENT_USER` to change it back to the symbolic constant
* This brings clarity to the code
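
What `Set Equate` does can be approximated with a lookup table. The four values are the predefined registry hive handles from the Windows headers; `annotate_push` is a hypothetical helper name, not a Ghidra API.

```python
HIVE_EQUATES = {
    0x80000000: "HKEY_CLASSES_ROOT",
    0x80000001: "HKEY_CURRENT_USER",
    0x80000002: "HKEY_LOCAL_MACHINE",
    0x80000003: "HKEY_USERS",
}

def annotate_push(operand: int) -> str:
    """Render a PUSH with the symbolic name when one is known."""
    name = HIVE_EQUATES.get(operand)
    return f"PUSH {name}" if name else f"PUSH 0x{operand:x}"

print(annotate_push(0x80000001))
```

The same number could mean something else in a different call, so the equate should only be applied when the API being called expects a hive handle.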

### **Branch instructions direct code execution to another location**

* The flow of execution i.e. control flow is sequential until a branching instruction is reached
* Then the `EIP` is updated and execution is transferred to another location in memory
* **The code under review contains two types of jumps**
* Jumps are an example of a branching instruction
* Unconditional jumps always perform a jump: `JMP, CALL, RET`
* Conditional jumps only jump if a condition is met: `Jcc, LOOP`
* Conditional jump represents a decision point
* **Conditional jumps require that we review multiple instructions**
* To evaluate whether a condition is true, arithmetic and Boolean instructions are used
* `sub ecx, 8` tests whether ECX is equal to 8
* `and eax, eax` tests whether EAX is equal to zero
* If the result is zero then the `ZF` bit is set in the flags register
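
A minimal model of how these two instructions set the zero flag. Only `ZF` is modeled; the real instructions update other flags as well.

```python
MASK = 0xFFFFFFFF

def sub_sets_zf(dest: int, src: int):
    """Model SUB dest, src: return (result, ZF)."""
    result = (dest - src) & MASK
    return result, result == 0

def and_sets_zf(dest: int, src: int):
    """Model AND dest, src: return (result, ZF)."""
    result = dest & src & MASK
    return result, result == 0

print(sub_sets_zf(8, 8))   # ECX was 8, so ZF is set
print(and_sets_zf(0, 0))   # EAX was 0, so ZF is set
```

Compilers often emit `CMP` and `TEST` instead, which set the flags the same way as `SUB` and `AND` but discard the result.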

### Jumps

* A `Jcc` instruction will be performed if a jump condition is met
* Form: `Jcc`

```
A --> jump if above
B --> jump if below
E --> jump if equal
G --> jump if greater
L --> jump if less than
Z --> jump if zero
N --> inverts the condition, e.g. JNZ jumps if not zero
```
```
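
The condition codes map onto flag checks. This sketch covers a handful of common mnemonics; above/below compare unsigned values (carry flag) while greater/less compare signed values (sign and overflow flags). The function name is hypothetical.

```python
def jump_taken(mnemonic: str, zf=False, cf=False, sf=False, of=False) -> bool:
    """Return True if the conditional jump would branch for these flags."""
    conditions = {
        "JZ": zf, "JE": zf,
        "JNZ": not zf, "JNE": not zf,
        "JA": not cf and not zf,       # unsigned above
        "JB": cf,                      # unsigned below
        "JG": not zf and sf == of,     # signed greater
        "JL": sf != of,                # signed less
    }
    return conditions[mnemonic.upper()]

print(jump_taken("JZ", zf=True), jump_taken("JNZ", zf=True))
```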

### Comments

* Use the `;` key to add a comment
* Can add EOL comments, Pre, Post or other types of comments

### HTTP Command and Control

* These APIs enable HTTP C2

```
InternetOpen, InternetConnect --> Create an HTTP connection
HttpOpenRequest, HttpAddRequestHeaders (Optional) --> Build an HTTP request
HttpSendRequest --> Send an HTTP request
InternetReadFile --> Read a response 
```

* To view the API calls

```
Window --> Symbol References --> Locate APIs of interest in the Symbol Table
```

* **The code references variables, which hold code or data not known at compile time**
* Local variables are relevant for the current function and are not preserved after it returns
* Local variables are stored on the stack relative to `ESP` and `EBP`
* Global variables are accessible from all functions e.g. `DAT_00403374`
* Static variables can only be used from within the function that allocates them, but unlike local variables their storage is not marked for reuse when the function exits

### Viewing Function Call Trees

* `Window --> Function Call Tree`
* View the outgoing calls on the right side
* View is ideal for determining which functions are called from the current function
* Once you determine what the current function is being used for make sure to `Right Click --> Edit Label` and give it a meaningful name

### GetTempFileNameW

* Creates a file name for a temp file
* Can explore other function references to find new IOCs
* Look for a `PUSH` to `lpPrefixString_XXXXXX`
* MSFT documentation states the first three characters make up the temp file name prefix
* To assist Ghidra:

```
Right click on the lpPrefixString --> Click Data --> TerminatedUnicode
```

### Functions

* A function is a group of instructions that performs a specific task (read, write files, send network data, log keystrokes)
* Three Basic Components

```
Input: values passed in
Body: code to perform tasks
Return: value passed back
```

* Calling a function involves a jump to another memory location
* After the function is done execution continues at the instruction after the original function call
* **Calling a function involves two control transfers**
* Function format: `return = function(arg0, arg1)`
* Specific events occur when calling a function

```
Pass in parameters (stack/register)
Save the return pointer 
Transfer control to the function 
```

* Specific events occur when returning from a function

```
Set up a return value (typically EAX)
Clean up the stack and restore registers 
Transfer control to the saved return pointer
```

* **Within a function, the prologue and epilogue perform setup and cleanup activities**
* Most functions contain a standard prologue and epilogue
* The prologue occurs at the start of the function

```
Allocates space for variables
Saves registers that will be reused in the function body
```

* Function epilogue occurs at the end of the function

```
It cleans up the stack e.g. POP allocated variables
It restores registers
```

* **The stack is a section in memory used to store saved registers, local variables and function parameters**
* The stack is LIFO Last in First out
* `PUSH` adds an element and `POP` removes one
* `ESP` points to the top of the stack (the most recently pushed item) and changes with instructions like `PUSH POP CALL LEAVE RET`
* `EBP a.k.a frame pointer` serves as an unchanging reference
* `EBP - value = local variable` (registers may also be used)
* `EBP + value = parameter`
* When `EBP` is set up in the function prologue in this manner, it means that when you see code reference `EBP` minus some value i.e. `[EBP -8]` it is accessing a local variable
* When it is `EBP` plus some value i.e. `[EBP +8]` it is referencing a parameter that was passed in
* When cleaning up the stack compilers use some tricks
* Compilers may `POP` off a value i.e. `POP EDX` which has the result of adding four to `ESP`
* It is also very common to see a value added to `ESP`, the use of `RET` (which can also pop values off the stack), and the `LEAVE` instruction
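
The relationship between `ESP`, `EBP`, parameters, and locals can be traced with a toy stack. This is a simplified model of a cdecl-style call with two arguments; the starting `ESP`, the argument values, and the return address are hypothetical.

```python
class Stack:
    """Tiny model of a 32-bit stack that grows toward lower addresses."""
    def __init__(self, esp=0x0012FF00):
        self.esp = esp
        self.ebp = 0
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

s = Stack()
s.push(0x22)               # PUSH arg1 (arguments go on right to left)
s.push(0x11)               # PUSH arg0
s.push(0x00401234)         # CALL pushes the return address
s.push(s.ebp)              # prologue: PUSH EBP
s.ebp = s.esp              #           MOV EBP, ESP
s.esp -= 8                 #           SUB ESP, 8 (room for two locals)
s.mem[s.ebp - 4] = 0xAA    # local variable at [EBP-4]

first_arg = s.mem[s.ebp + 8]      # [EBP+8]
second_arg = s.mem[s.ebp + 0xC]   # [EBP+0xC]

s.esp = s.ebp              # epilogue: MOV ESP, EBP (first half of LEAVE)
s.ebp = s.pop()            #           POP EBP      (second half of LEAVE)
return_address = s.pop()   # RET pops the saved return address
print(hex(first_arg), hex(second_arg), hex(return_address))
```

After `RET`, the two arguments are still on the stack; who removes them depends on the calling convention.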

### Functions are called according to calling conventions

* The convention describes how data is passed into and out of functions
* The implementation of the convention may vary by compiler
* The `cdecl` convention (most common) has these characteristics

```
The arguments are placed onto the stack right to left
The return value is placed into EAX
The caller cleans up the stack (removes the arguments)
```

* The `stdcall` convention has the following characteristics

```
Similar to cdecl but the callee cleans up the stack 
This is the convention used in Win32 APIs
```
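
The practical difference between `cdecl` and `stdcall` shows up in which side adjusts `ESP`. This sketch assumes every argument occupies 4 bytes; `stack_cleanup` is a hypothetical helper that returns the instructions you would expect to see.

```python
def stack_cleanup(convention: str, arg_count: int) -> dict:
    """Describe the cleanup instructions for a call with arg_count arguments."""
    arg_bytes = 4 * arg_count          # assumes 4-byte arguments
    if convention == "cdecl":
        # Callee returns normally; caller adds to ESP after the CALL
        return {"callee": "RET", "caller": f"ADD ESP, 0x{arg_bytes:x}"}
    if convention == "stdcall":
        # Callee removes the arguments as part of its RET
        return {"callee": f"RET 0x{arg_bytes:x}", "caller": None}
    raise ValueError(f"unsupported convention: {convention}")

print(stack_cleanup("cdecl", 2))
print(stack_cleanup("stdcall", 3))
```

Seeing `RET 0xc` at the end of a function is therefore a hint that it takes three stack arguments and cleans them up itself.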

* **Additional calling conventions include fastcall and thiscall**
* `fastcall`
* Arguments are stored in registers
* Any extra arguments are placed on the stack
* The callee cleans up arguments on the stack
* `thiscall`
* Used in C++ code (member functions)
* This convention includes a reference to the `this` pointer
* For MSFT compilers, ECX holds the "this" pointer and the callee cleans up the arguments on the stack
* For GNU compilers the "this" pointer is pushed onto the stack last and the caller cleans up
* **Reviewing strings reveals filenames and directories of interest**
* To locate a reference to a string, right click on it and choose to show references

### Loops in malware

* Used to encrypt and decrypt network traffic --> loop over each character in the string to send
* Attempt to connect to C2 server --> loop over a list of servers
* Perform a port scan --> try to connect to a port 1-65535
* Log keystrokes --> Check state for each key code 0...92
* Similar to `Jcc`, the `cc` in `LOOPcc` represents the condition code that must be met for the loop instruction to branch to the address specified
* The conditions are:

```
Z --> Loop if zero 
E --> Loop if equal
N --> Inverts the logic of the looping condition
```
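
The `LOOP` family decrements `ECX` and then decides whether to branch. A rough model, with `loop_step` as a hypothetical helper name:

```python
def loop_step(mnemonic: str, ecx: int, zf: bool = False):
    """Model one LOOP/LOOPcc instruction: return (new_ecx, branch_taken)."""
    ecx = (ecx - 1) & 0xFFFFFFFF       # every variant decrements ECX first
    m = mnemonic.upper()
    if m == "LOOP":
        taken = ecx != 0
    elif m in ("LOOPE", "LOOPZ"):
        taken = ecx != 0 and zf
    elif m in ("LOOPNE", "LOOPNZ"):
        taken = ecx != 0 and not zf
    else:
        raise ValueError(m)
    return ecx, taken

# Count how many times a plain LOOP body runs when ECX starts at 5
ecx, iterations = 5, 0
while True:
    iterations += 1                    # loop body executes
    ecx, taken = loop_step("LOOP", ecx)
    if not taken:
        break
print(iterations)
```

In practice many compilers emit a counter update followed by a `Jcc` rather than `LOOP`, so both patterns are worth recognizing.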

### Reviewing imports to direct our code analysis

* The import table lists functions used to access the resource section

```
FindResourceW --> determine the location of a resource
SizeofResource --> obtain the size of a resource
LockResource --> obtain a pointer to a resource
```

* The resource `.rsrc` section is often used to store information like icons, dialog boxes, and version information
* However malware may hide executables here
* Malware that drops files is called a `dropper`

### CreateMutexA

* `CreateMutexA` --> creates or opens a mutex object
* Malware authors often use a mutex to avoid re-infecting a machine

### Keylogging

* `GetKeyState` and `GetAsyncKeyState` --> Determine if a particular key is pressed
* `OpenClipboard`, `GetClipboardData`, and `CloseClipboard` --> Open the clipboard for access, gather data, and then close the clipboard
* `GetWindowText` --> Retrieves the text of a window's title bar; combined with `GetKeyState` and `GetAsyncKeyState`, an attacker could learn which keys are pressed and what the application context is
* `GetAsyncKeyState` determines if a key is currently up or down or if it was pressed since the last call to the API
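
The API groupings in these notes can be turned into a quick triage pass over an import list. This is a sketch only: the behavior labels are my own wording, the groups contain just the APIs named in these notes, and a match is a lead to investigate, not proof of malicious behavior.

```python
BEHAVIORS = {
    "HTTP C2": {"InternetOpen", "InternetConnect", "HttpOpenRequest",
                "HttpSendRequest", "InternetReadFile"},
    "Resource access (possible dropper)": {"FindResource", "SizeofResource",
                                           "LockResource"},
    "Mutex": {"CreateMutex"},
    "Keylogging": {"GetKeyState", "GetAsyncKeyState", "GetWindowText"},
}
KNOWN = set().union(*BEHAVIORS.values())

def base_name(api: str) -> str:
    """Strip a trailing A or W only when that yields a known API name."""
    if api not in KNOWN and api[-1:] in ("A", "W") and api[:-1] in KNOWN:
        return api[:-1]
    return api

def triage(imports):
    """Return the sorted behavior labels suggested by a list of imports."""
    names = {base_name(name) for name in imports}
    return sorted(label for label, apis in BEHAVIORS.items() if names & apis)

print(triage(["InternetOpenA", "HttpSendRequestA", "CreateMutexA", "ReadFile"]))
```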

### 64 Bit Malware

* The vast majority of malware is 32 bit
* We will see more 64 bit malware in the future as 64 bit systems become the standard
* Two types of 64 bit malware have been common

```
Browser Helper Objects for 64 bit Internet Explorer
Device Drivers (rootkits) for Windows x64
```

### **Analyze 32-bit malware on 64-bit OS with caution**

* 32 bit code running on a 64 bit operating system runs in the `WOW64` subsystem
* 32 bit executables load 32 bit DLLs
* 32 bit DLLs are located in `%SystemRoot%\SysWOW64`
* 32 bit processes reference Software hive registry values in `Wow6432Node` using registry redirection
* Some executables run subtly differently under WOW64 than on a native 32 bit OS

### 64-Bit Assembly Differences

* All general purpose registers are expanded to 64 bits
* `EAX` --> `RAX`
* There are eight new general purpose registers `R8 --> R15`
* Special use registers are extended and renamed `EIP --> RIP`
* `RSP` not `RBP` is often used to access parameters and variables
* Calling convention resembles `fastcall` (parameters via registers)

```
First four parameters are passed in RCX RDX R8 R9
Additional parameters are stored on the stack 
```
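
Mapping arguments to their locations under this convention can be sketched as follows. It is a simplification that only covers integer and pointer arguments (floating point arguments use XMM registers instead), and `argument_locations` is a hypothetical helper.

```python
REGISTER_ARGS = ["RCX", "RDX", "R8", "R9"]

def argument_locations(arg_count: int):
    """Return where each integer/pointer argument is passed, in order."""
    locations = []
    for i in range(arg_count):
        if i < len(REGISTER_ARGS):
            locations.append(REGISTER_ARGS[i])
        else:
            locations.append("stack")
    return locations

print(argument_locations(6))
```

When reversing a 64 bit call, look for `MOV`/`LEA` into these registers just before the `CALL` rather than a run of `PUSH` instructions.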

* There is a new addressing mode (`RIP` + displacement)
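
With `RIP` + displacement addressing, the displacement is applied to the address of the next instruction, not the current one. A minimal sketch with hypothetical addresses and instruction lengths:

```python
def rip_relative_target(instr_addr: int, instr_len: int, disp: int) -> int:
    """Resolve [RIP + disp] for an instruction at instr_addr."""
    next_instr = instr_addr + instr_len    # RIP already points past this instruction
    return (next_instr + disp) & 0xFFFFFFFFFFFFFFFF

print(hex(rip_relative_target(0x140001000, 7, 0x2000)))
```

Disassemblers such as Ghidra normally resolve this for you and display the final address or label.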
