Reversing Malicious Code
The goal is to understand common malware characteristics at the code level
Code analysis may include examining potential branches of execution
Overview of the code lifecycle
Source code is translated into object code by a compiler
Object code is then combined with libraries and an executable file is created
To run the file, the operating system reads various information from the executable file, allocates memory, and loads required libraries into memory
Control is transferred to the code to execute
This final stage is where we examine the code with a debugger
Note: Libraries may also be loaded during the program's execution
Developed by NSA
Its decompiler produces a C representation of the code to speed up analysis
Includes support for writing Java and Python scripts to automate analysis
Help is accessed via F1 key
Ghidra v10 includes a debugger
File --> New Project
Choose the project type
Click Finish
Drag and drop the specimen into the project window
Accept the defaults in the Import window and click OK
Make sure to enable the WindowsPE x86 Propagate External Parameters option
Finally click the Analyze button and wait for Ghidra to finish
Once auto analysis is completed an Auto Analysis summary will show any warnings or issues encountered during the process
A common warning is that the file does not contain debug information
This is common and not an issue
Before proceeding, save the project and take a snapshot
The main window is the Listing view, which presents the target program's code and data
Ghidra initially brings you to the beginning of the file in the Listing view --> notice the MZ string
If you scroll down from there you can examine the program's header
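The MZ check at the start of the file can be sketched in Python. This is a minimal illustration, assuming the standard PE layout in which the 4-byte little-endian field at offset 0x3C (e_lfanew) holds the offset of the PE signature. The function name looks_like_pe and the synthetic buffer are hypothetical and stand in for a real specimen.

```python
import struct

def looks_like_pe(data: bytes) -> bool:
    """Return True if data starts with 'MZ' and e_lfanew points to 'PE\\0\\0'."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"

# Synthetic buffer for illustration only, not a real executable
sample = bytearray(0x84)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x80)
sample[0x80:0x84] = b"PE\x00\x00"

print(looks_like_pe(bytes(sample)))
```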
The Program Tree window is in the top left and shows the different sections and headers
Section names are typically:
To jump to the .text section, double-click the .text node
In Ghidra the FUN_ prefix generically refers to a function, while the numeric value refers to the address where the function is loaded into memory
Original name of the function is normally lost during compilation
Execution occurs linearly one instruction after the next
On the far left you will have a 32-bit address such as 00401007 (hex)
This address represents the location of the code in memory after the program is loaded, not an offset within the file on disk as shown in a hex editor
On the right there are x86 assembly instructions
Note: This is the beginning of the .text section, not the beginning of the program; execution begins at the entry point
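The difference between a memory address and a file offset can be shown with a short calculation. This is a sketch only: the image base, section virtual address, and raw data pointer below are hypothetical values chosen for illustration, and the real ones must be read from the specimen's PE headers.

```python
def va_to_file_offset(va, image_base, section_va, section_raw_ptr):
    """Translate a virtual address to a file offset within one section."""
    rva = va - image_base              # address relative to the image base
    return rva - section_va + section_raw_ptr

# Hypothetical header values for illustration
IMAGE_BASE = 0x00400000
TEXT_VIRTUAL_ADDRESS = 0x1000
TEXT_POINTER_TO_RAW_DATA = 0x400

offset = va_to_file_offset(0x00401007, IMAGE_BASE,
                           TEXT_VIRTUAL_ADDRESS, TEXT_POINTER_TO_RAW_DATA)
print(hex(offset))
```

With these assumed values, the instruction Ghidra shows at 00401007 would sit 7 bytes into the section's raw data in a hex editor.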
Click on the function you want i.e. FUN_00401007
Browse to the Window --> Function Graph menu item
Helpful for visualizing loops and complex conditionals within a function, but the Listing view is more compact and easier for some people to navigate
The colors of the arrows indicate code flow
If the code block ends in a conditional jump, a green arrow indicates where execution will continue if the condition is met
If the condition is not met, a red arrow shows where execution continues
If the arrow is blue, the code block ends in an unconditional jump
View imports to review a program's external dependencies
The import address table (IAT) helps direct code analysis
You can view imports in the Symbol Tree window, but we will access this information via Window --> Symbol References
Filter symbols by "Imported" to focus on dependencies
We can examine imports to identify potential functionality associated with common malware characteristics
Learn more about an API call at microsoft.com
Types of API Calls:
An A suffix means the function takes ANSI (8-bit character) strings
A W suffix (wide) means the function takes strings with a two-byte character representation (UTF-16)
An Ex suffix (extended) is used when Microsoft updates a function and the new function is not compatible with the old one
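The difference between ANSI and wide strings is easy to see at the byte level, which helps when spotting string arguments in memory. A small sketch, using a hypothetical file name as the argument:

```python
name = "cmd.exe"  # hypothetical string argument

ansi = name.encode("ascii") + b"\x00"          # one byte per character plus a 1-byte terminator
wide = name.encode("utf-16-le") + b"\x00\x00"  # two bytes per character plus a 2-byte terminator

print(len(ansi), len(wide))
print(wide[:6])
```

For ASCII-range text, the wide form alternates each character with a zero byte.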
Instructions reference registers, immediate values and memory
Instructions have two components: operation and operand
Instructions can have 0-3 operands
An Operand can be:
Consider MOV EAX, 0x6453
EAX is the destination (first)
0x6453 is the source (second)
You are setting EAX to the value 0x6453
Operands may be implied
Intel processors use registers to track the state of computation as instructions are executed
Registers are on-chip memory locations
Instructions act on registers and memory locations
A CPU has a series of registers
We monitor registers to track arguments, variables, and function return values
The x86 architecture uses the following general purpose registers to hold code and data
EIP points to the next instruction to execute
EFLAGS bits represent the outcome of computations and control CPU operations
32-bit registers can also be accessed as 16-bit and 8-bit registers
On a 32-bit architecture, registers can be accessed by their default dword size
To access a register's lower 16 bits, the leading E is omitted from the name e.g. EAX becomes AX
The naming scheme for EAX, EBX, ECX, and EDX is as follows:
E<letter>X --> dword, the full 32-bit value of the register
<letter>X --> lower word, the low 16 bits of the register
<letter>H --> high byte, the upper 8 bits of <letter>X
<letter>L --> low byte, the lower 8 bits of <letter>X
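The sub-register views can be reproduced with masks and shifts. A minimal sketch, where the function name sub_registers and the sample value are chosen only for illustration:

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit register value into its named sub-register views."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF                  # lower word
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,        # high byte of AX
        "AL": ax & 0xFF,               # low byte of AX
    }

views = sub_registers(0x12345678)
print({name: hex(value) for name, value in views.items()})
```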
A word, dword, and qword are 16, 32, and 64 bits long
A word in assembly is the natural size for a unit of data; a 16-bit processor has 16-bit words
Many tools consider a word to be 16 bits regardless of processor size
Additional common data sizes:
The operand for one push instruction is a pointer to a string
A pointer is a variable that holds a memory address (it points to a memory location)
Accessing the address that the pointer points to is called dereferencing, because the pointer references another location in memory
Pointers are efficient: rather than copying a data structure around in memory, it is cheaper to copy the value of a pointer (4 bytes on 32-bit systems)
A PUSH instruction before a CALL often represents an argument passed to the function specified by the CALL
Example:
Brackets mean fetch data at the specified address (dereference)
This is direct addressing because we are dereferencing an immediate value
The result is that 4 bytes of data at 0x410230 will be moved to EAX
Some tools like IDA omit brackets for direct addresses (IDA: dword_410230)
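Dereferencing a direct address can be modeled as a 4-byte little-endian read. This is a toy sketch: the memory dictionary and the bytes stored at 0x410230 are hypothetical and are not taken from any specimen.

```python
import struct

# Hypothetical memory contents: 4 bytes stored at address 0x410230
memory = {0x410230: bytes([0x53, 0x64, 0x00, 0x00])}

def read_dword(address: int) -> int:
    """Dereference: fetch the 4-byte little-endian value stored at address."""
    (value,) = struct.unpack("<I", memory[address])
    return value

eax = read_dword(0x410230)   # models moving the dword at 0x410230 into EAX
print(hex(eax))
```

Because x86 is little-endian, the bytes 53 64 00 00 in memory are read as the dword 0x00006453.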
Memory may also be addressed by reference indirectly
The address may be calculated or in a register
This is called an Effective Address
and it enables us to work efficiently with data structures
Format: Base + (Index * Scale) + Displacement
Indirect Referencing: address of the destination is calculated or it resides in a register. The calculated address is called the effective address (EA)
Even when the address sits in a register, this differs from register addressing, where the register itself is the destination
In indirect memory addressing the register holds the address of the destination.
Large advantage of indirect memory addressing is the capability to efficiently work with data structures
You can increment the value of a single register to step through fields of a data structure or the same field of an array of data structures
If a scale is used, an index register must also be used
Examples of indirectly addressing memory
[EAX]: Access dynamically allocated memory (base)
[EBP + 0x10]: Access data on the stack (base + displacement)
[EAX + EBX * 8]: Access an array of 8-byte structures (base + index * scale)
[EAX + EBX + 0xC]: Access fields of a two-dimensional array of structures (base + index + displacement)
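The four examples above can be evaluated with a small helper that implements Base + (Index * Scale) + Displacement. This is a sketch: the register values are hypothetical, since real values are only known at runtime.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Compute Base + (Index * Scale) + Displacement as a 32-bit address."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF

# Hypothetical register values
EAX, EBX, EBP = 0x00500000, 3, 0x0012FF70

print(hex(effective_address(base=EAX)))                           # [EAX]
print(hex(effective_address(base=EBP, displacement=0x10)))        # [EBP + 0x10]
print(hex(effective_address(base=EAX, index=EBX, scale=8)))       # [EAX + EBX * 8]
print(hex(effective_address(base=EAX, index=EBX, displacement=0xC)))  # [EAX + EBX + 0xC]
```

Incrementing EBX by one in the third form steps to the next 8-byte structure, which is why this mode suits arrays.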
Indirect memory addressing may pose challenges for static code analysis because registers are not populated until runtime
Strings are an example of a data structure
Data structures group simple variables into more complex types
Examples of data structures include: strings, linked lists, sockets, and file handles
When reversing determine the type of data structure by usage
Data structures enable us to group bytes and advance our understanding of the code
Context determines the answer
RegOpenKeyExA Example
In source code the API call is passed a symbolic constant; in the disassembly it appears as a hex value i.e. PUSH 0x80000001
During compilation the symbolic constant is replaced with its hex representation
Right click the hex value, choose Set Equate, and then choose HKEY_CURRENT_USER to change it back to the symbolic constant
This brings clarity to the code
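The mapping that Set Equate restores can be sketched as a lookup table. The values below are the predefined registry root handles from the Windows headers; the helper name name_hkey is hypothetical.

```python
# Predefined registry root handles
HKEY_NAMES = {
    0x80000000: "HKEY_CLASSES_ROOT",
    0x80000001: "HKEY_CURRENT_USER",
    0x80000002: "HKEY_LOCAL_MACHINE",
    0x80000003: "HKEY_USERS",
}

def name_hkey(value: int) -> str:
    """Return the symbolic name for a pushed hKey value, or the hex value if unknown."""
    return HKEY_NAMES.get(value, hex(value))

print(name_hkey(0x80000001))
```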
The flow of execution i.e. control flow is sequential until a branching instruction is reached
Then EIP is updated and execution is transferred to another location in memory
The code under review contains two types of jumps
Jumps are an example of a branching instruction
Unconditional branches always transfer control: JMP, CALL, RET
Conditional branches only transfer control if a condition is met: Jcc, LOOP
Conditional jump represents a decision point
Conditional jumps require that we review multiple instructions
To evaluate whether a condition is true, arithmetic and Boolean instructions are used
sub ecx, 8 will test if ECX is equal to 8
and eax, eax will test if EAX is equal to zero
If the result is zero, the ZF bit is set in the flags register
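The two tests above can be modeled by computing the result and checking whether it is zero. A minimal sketch with hypothetical helper names; it models only ZF and ignores the other flags.

```python
MASK32 = 0xFFFFFFFF

def sub_sets_zf(dest: int, src: int) -> bool:
    """ZF after SUB dest, src: set when dest - src is zero."""
    return ((dest - src) & MASK32) == 0

def and_sets_zf(dest: int, src: int) -> bool:
    """ZF after AND dest, src: set when dest & src is zero."""
    return (dest & src & MASK32) == 0

print(sub_sets_zf(8, 8))   # sub ecx, 8 with ECX = 8
print(sub_sets_zf(9, 8))   # sub ecx, 8 with ECX = 9
print(and_sets_zf(0, 0))   # and eax, eax with EAX = 0
print(and_sets_zf(5, 5))   # and eax, eax with EAX = 5
```

Note that SUB and AND also store the result in the destination register, unlike CMP and TEST, which only update the flags.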
A Jcc instruction will jump if its jump condition is met
Form: Jcc
Use the ; key to add a comment
Can add EOL comments, Pre, Post or other types of comments
These APIs enable HTTP C2
To view the API calls
The code references variables, which hold code or data not known at compile time
Local variables are relevant only to the current function and are not saved
Local variables are stored on the stack relative to ESP and EBP
Global variables are accessible from all functions e.g. DAT_00403374
Static variables can be used only from within the function that allocates them, but unlike local variables they are not marked for reuse when the function exits
Window --> Function Call Tree
View the outgoing calls on the right side
View is ideal for determining which functions are called from the current function
Once you determine what the current function is being used for, make sure to Right Click --> Edit Label and give it a meaningful name
Creates a file name for a temp file
Can explore other function references to find new IOCs
Look for a PUSH to lpPrefixString_XXXXXX
MSFT documentation states the first three characters make up the temp file name prefix
To assist Ghidra:
A function is a group of instructions that performs a specific task (read, write files, send network data, log keystrokes)
Three Basic Components
Calling a function involves a jump to another memory location
After the function is done, execution continues at the instruction after the original function call
Calling a function involves two control transfers
Function format: return = function(arg0, arg1)
Specific events occur when calling a function
Specific events occur when returning from a function
Within a function, the prologue and epilogue perform setup and cleanup activities
Most functions contain a standard prologue and epilogue
The prologue occurs at the start of the function
Function epilogue occurs at the end of the function
The stack is a section in memory used to store saved registers, local variables and function parameters
The stack is LIFO (last in, first out)
PUSH adds an element and POP removes one
ESP points to the top of the stack (the most recently pushed item) and changes with instructions like PUSH, POP, CALL, LEAVE, and RET
EBP, a.k.a. the frame pointer, serves as an unchanging reference
EBP - value = local variable
registers may also be used
EBP + value = parameter
When EBP is set up in the function prologue in this manner, code that references EBP minus some value i.e. [EBP - 8] is accessing a local variable
When it is EBP plus some value i.e. [EBP + 8], it is referencing a parameter that was passed in
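The EBP offsets can be checked against a toy model of the stack. This sketch assumes the arguments are pushed right to left and the function uses the standard prologue (PUSH EBP; MOV EBP, ESP; SUB ESP, n); all addresses and values are hypothetical.

```python
class Stack:
    """Toy 32-bit stack: a dict of address -> dword, growing toward lower addresses."""

    def __init__(self, esp=0x0012FF80):
        self.mem = {}
        self.esp = esp

    def push(self, value):
        self.esp -= 4                 # PUSH first decrements ESP by 4
        self.mem[self.esp] = value    # then stores the value at the new top

    def pop(self):
        value = self.mem[self.esp]    # POP reads the value at the top
        self.esp += 4                 # then increments ESP by 4
        return value

s = Stack()
s.push(0x22)            # PUSH arg1 (assumed right-to-left argument order)
s.push(0x11)            # PUSH arg0
s.push(0x00401050)      # CALL pushes the return address (hypothetical)
s.push(0x0012FFB0)      # prologue: PUSH EBP saves the caller's frame pointer
ebp = s.esp             # prologue: MOV EBP, ESP
s.esp -= 8              # prologue: SUB ESP, 8 reserves two local dwords
s.mem[ebp - 4] = 0xAA   # a write to [EBP - 4] sets a local variable

print(hex(s.mem[ebp + 8]), hex(s.mem[ebp + 0xC]), hex(s.mem[ebp - 4]))
```

In this layout [EBP + 4] holds the return address, so the first parameter starts at [EBP + 8].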
When cleaning up the stack, compilers use some tricks
Compilers may POP off a value i.e. POP EDX, which has the result of adding four to ESP
It is also very common to see a value added to ESP, the use of the RET instruction (which can also pop values off the stack), and the LEAVE instruction
The convention describes how data is passed into and out of functions
The implementation of the convention may vary by compiler
The cdecl convention (most common) has these characteristics
The stdcall convention has the following characteristics
Additional calling conventions include fastcall and thiscall
fastcall
Arguments are stored in registers
Any extra arguments are placed on the stack
The callee cleans up arguments on the stack
thiscall
Used in C++ code (member functions)
This convention passes the "this" pointer to the function
For MSFT compilers, ECX holds the "this" pointer and the callee cleans up the arguments on the stack
For GNU compilers the "this" pointer is pushed onto the stack last and the caller cleans up
Reviewing strings reveals filenames and directories of interest
To locate a reference to a string, right click on it and choose to show references
Used to encrypt and decrypt network traffic --> loop over each character in the string to send
Attempt to connect to C2 server --> loop over a list of servers
Perform a port scan --> try to connect to a port 1-65535
Log keystrokes --> Check state for each key code 0...92
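The first use case, looping over each character to encode traffic, can be sketched as follows. Single-byte XOR is an assumption made for illustration; the notes do not say which algorithm a given specimen uses, and the key and plaintext here are hypothetical.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """Loop over each byte and XOR it with a single-byte key."""
    out = bytearray()
    for b in data:          # one loop iteration per character
        out.append(b ^ key)
    return bytes(out)

KEY = 0x5A  # hypothetical key
encoded = xor_bytes(b"GET /index.html", KEY)
decoded = xor_bytes(encoded, KEY)   # applying the same key again reverses the encoding
print(decoded)
```

In disassembly this pattern typically shows up as a small loop body containing an XOR, a counter increment, and a conditional jump back to the top.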
As with Jcc, the cc in LOOPcc represents the condition code that must be met for the loop instruction to branch to the address specified
The conditions are:
The import table lists functions used to access the resource section
The resource section (.rsrc) is often used to store information like icons, dialog boxes, and version information
However malware may hide executables here
Malware that drops files is called a dropper
CreateMutexA --> creates or opens a mutex object
Malware authors often use a mutex to avoid re-infecting a machine
GetKeyState and GetAsyncKeyState --> determine if a particular key is pressed
GetWindowText --> retrieves text from a window's title bar
OpenClipboard, GetClipboardData, and CloseClipboard --> open the clipboard for access, gather data, and then close the clipboard
GetWindowText --> obtains the text of a window's title bar; combined with the two previous APIs, an attacker could learn which keys are pressed and what the application context is
GetAsyncKeyState determines if a key is currently up or down or if it was pressed since the last call to the API
The vast majority of malware is 32-bit
We will see more 64-bit malware in the future as 64-bit systems become the standard
Two types of 64-bit malware have been common
32-bit code running on a 64-bit operating system runs in the WOW64 subsystem
32-bit executables load 32-bit DLLs
32-bit DLLs are located in %SystemRoot%\SysWOW64
32-bit processes reference Software hive registry values in Wow6432Node using registry redirection
Some executables run subtly differently under WOW64 than on a native 32-bit OS
All general purpose registers are expanded to 64 bits
EAX --> RAX
There are eight new general-purpose registers: R8 --> R15
Special-use registers are extended and renamed: EIP --> RIP
RSP, not RBP, is often used to access parameters and variables
The calling convention resembles fastcall (parameters via registers)
There is a new addressing mode (RIP + displacement)
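RIP-relative addressing resolves to the address of the next instruction plus a signed 32-bit displacement. A small sketch of that calculation; the instruction address, length, and displacement are hypothetical values for illustration.

```python
def rip_relative_target(instr_address: int, instr_length: int, disp32: int) -> int:
    """Target = address of the next instruction (RIP) + signed 32-bit displacement."""
    if disp32 >= 0x80000000:          # interpret the raw dword as a signed value
        disp32 -= 0x100000000
    rip = instr_address + instr_length
    return (rip + disp32) & 0xFFFFFFFFFFFFFFFF

# Hypothetical 7-byte instruction at 0x140001000 with displacement 0x2000
print(hex(rip_relative_target(0x140001000, 7, 0x2000)))
```

Disassemblers normally perform this calculation for you and display the resolved target address.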