Reversing Malicious Code
The goal is to understand common malware characteristics at the code level
This may include examining potential branches of execution through code analysis
Overview of the code lifecycle
Source code is translated into object code by a compiler
Object code is then combined with libraries and an executable file is created
To run the file, the operating system reads various information from the executable file, allocates memory, and loads required libraries into memory
Control is transferred to the code to execute
This final stage is where we examine the code with a debugger
Note: Libraries may also be loaded during the program's execution
Ghidra
Developed by the NSA
Its decompiler produces a C representation of the code to speed up analysis
Includes support for writing Java and Python scripts to automate analysis
Help is accessed via the F1 key
Ghidra v10 includes a debugger
Create a new project
File --> New Project
Choose the project type
Click Finish
Drag and drop the specimen into the project window
Accept the defaults in the Import window and click OK
Launch the Code Browser and begin the auto-analysis
Make sure to enable the WindowsPE x86 Propagate External Parameters option
Finally, click the Analyze button and wait for Ghidra to finish
Once auto-analysis is completed, an Auto Analysis Summary will show any warnings or issues encountered during the process
A common warning is that the file does not contain debug information; this is common and not an issue
Before proceeding, save the project and take a snapshot
Ghidra Overview
The main window is the Listing View, which presents the target program's code and data
It initially brings you to the beginning of the file; notice the MZ string
If you scroll down from there you can examine the program's header
Program Tree
The Program Tree window is in the top left and shows the different sections and headers
Section names are typically:
.text - Contains executable code
.rdata - Contains read-only data
.data - Contains data
.reloc - Contains relocation data to fix up addresses in the file if it is not loaded at the preferred address
To jump to the .text section, double-click the .text node
FUN in Ghidra
In Ghidra, the FUN_ prefix generically refers to a function, while the numeric value refers to the address where the function is loaded into memory
The original name of the function is normally lost during compilation
Execution proceeds linearly, one instruction after the next, until a branch changes the flow
On the far left you will see a 32-bit address such as 00401007 (hex)
This address represents the location of code in memory after the program is loaded, not its location on disk (i.e. within a hex editor view of the file)
On the right are the x86 assembly instructions
Note: This is the beginning of the .text section, not the beginning of the program; execution begins at the entry point
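To relate a loaded address like 00401007 to the bytes you would find in a hex editor, one approach is to subtract the image base and the section's virtual address, then add the section's raw-data offset, all taken from the PE headers (ImageBase, VirtualAddress, VirtualSize, PointerToRawData). The sketch below assumes hypothetical header values (ImageBase 0x400000, .text at VirtualAddress 0x1000 with PointerToRawData 0x400); read the real values from the specimen's headers before relying on the result.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    virtual_address: int      # RVA where the section is mapped in memory
    virtual_size: int         # size of the section once loaded
    pointer_to_raw_data: int  # file offset where the section's bytes start


def va_to_file_offset(va: int, image_base: int, sections: list) -> int:
    """Translate a virtual address (as shown in the Listing View) to a file offset."""
    rva = va - image_base
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return rva - s.virtual_address + s.pointer_to_raw_data
    raise ValueError(f"{va:#x} is not inside any section")


# Hypothetical header values, for illustration only
text = Section(".text", virtual_address=0x1000, virtual_size=0x2000, pointer_to_raw_data=0x400)
print(hex(va_to_file_offset(0x00401007, 0x400000, [text])))  # 0x407
```

The mapping only holds inside a section's raw data; addresses in uninitialized space (beyond the section's raw size) have no file offset at all.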
Function graph view provides a visual perspective on code
Click on the function you want, e.g. FUN_00401007
Browse to the Window --> Function Graph menu item
Helpful for visualizing loops and complex conditionals within a function, but the Listing View is more compact and easier for some people to navigate
The color of the arrows symbolizes code flow
If the code block ends in a conditional jump, green arrows indicate the path where execution will continue if the condition is met
If the condition is not met a red arrow will show where execution continues
If the arrow is blue the code ends in an unconditional jump
View Imports to review a programs external dependencies
The import address table (IAT) helps direct code analysis
You can view imports in the Symbol Tree window, but we will access this information via Window --> Symbol References
Filter symbols by "Imported" to focus on dependencies
Look for API call patterns associated with malware behavior
We can examine imports to identify potential functionality associated with common malware characteristics
Learn more about an API call at microsoft.com
Types of API Calls:
A --> ANSI
W --> Wide
Ex --> Extended
A indicates that the function takes ANSI (8-bit character) strings
W (Wide) refers to a two-byte character representation (UTF-16)
Ex (Extended) is used when Microsoft updates a function and the new function is not compatible with the old one
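Because the same capability can show up as an A, W, or Ex variant, it can help to normalize import names before matching them against API patterns. This is a hypothetical triage helper, not a Ghidra feature: the suffix stripping is a heuristic, and the capability groups only contain APIs discussed later in these notes.

```python
# Capability groups built only from APIs covered in these notes (hypothetical grouping)
CAPABILITIES = {
    "http_c2": {"InternetOpen", "InternetConnect", "HttpOpenRequest",
                "HttpAddRequestHeaders", "HttpSendRequest", "InternetReadFile"},
    "resource_access": {"FindResource", "SizeofResource", "LockResource"},
    "mutex": {"CreateMutex"},
    "keylogging": {"GetKeyState", "GetAsyncKeyState", "GetWindowText",
                   "OpenClipboard", "GetClipboardData", "CloseClipboard"},
    "temp_files": {"GetTempFileName"},
}


def base_name(api: str) -> str:
    """Heuristically strip a trailing A/W (ANSI/Wide), then a trailing Ex (Extended)."""
    if len(api) > 1 and api[-1] in "AW" and api[-2].islower():
        api = api[:-1]
    if api.endswith("Ex"):
        api = api[:-2]
    return api


def triage(imports) -> dict:
    """Group imported names by the capability their base name matches."""
    hits = {}
    for name in imports:
        for capability, apis in CAPABILITIES.items():
            if base_name(name) in apis:
                hits.setdefault(capability, []).append(name)
    return {cap: sorted(names) for cap, names in hits.items()}


print(triage(["InternetOpenA", "HttpSendRequestW", "CreateMutexA", "ExitProcess"]))
```

A match only suggests functionality; the code that calls the API still needs to be reviewed to confirm how it is used.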
Instructions reference registers, immediate values and memory
Instructions have two components: operation and operand
Instructions can have 0-3 operands
An operand can be:
A register
A memory location
An immediate value (e.g. 0x6453)
Consider MOV EAX, 0x6453
EAX is the destination (first operand)
0x6453 is the source (second operand)
You are setting EAX to the value 0x6453
Operands may be implied, e.g. PUSH implicitly uses and updates ESP
Intel processors use registers to track the state of computation as instructions are executed
Registers are on-chip memory locations
Instructions act on registers and memory locations
A CPU has a series of registers
Some registers are general purpose
Some have a particular use
Some are both
We monitor registers to track arguments, variables, and function return values
The x86 architecture uses the following general-purpose registers to hold data and addresses:
EAX --> Used for addition, multiplication, and return values
ECX --> Used as a counter
EBP --> Used to reference arguments and local variables
ESP --> Points to the last item pushed onto the stack (the top of the stack)
ESI/EDI --> Used as the source and destination by memory transfer instructions
Special-use registers hold flags and track program execution:
EIP points to the next instruction to execute
EFLAGS bits represent the outcome of computations and control CPU operations
Segment registers include:
CS - Code segment
DS - Data segment
SS - Stack segment
32-bit registers can also be accessed as 16-bit and 8-bit registers
On a 32-bit architecture, registers can be accessed by their default dword size
To access a register's lower 16 bits, the leading E is omitted from the name, e.g. EAX becomes AX
The naming scheme for EAX, EBX, ECX, and EDX is as follows:
E<letter>X --> dword, the 32-bit value of the register
<letter>X --> lower word, the 16-bit value of the register
<letter>H --> high byte, the upper 8 bits of the <letter>X value
<letter>L --> low byte, the lower 8 bits of the <letter>X value
EAX means all 32 bits
AX means the low 16 bits
AH means the high 8 bits of AX
AL means the low 8 bits of AX
The lengths of a word, dword, and qword are 16, 32, and 64 bits, respectively
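The sub-register names are views onto bit ranges of the same 32-bit register, which masks and shifts make concrete. A minimal sketch with hypothetical helper names; it also models how writing AL replaces only the low byte and leaves the upper 24 bits untouched.

```python
def register_views(eax: int) -> dict:
    """Split a 32-bit EAX value into its EAX/AX/AH/AL views."""
    eax &= 0xFFFFFFFF            # E<letter>X: full 32-bit dword
    ax = eax & 0xFFFF            # <letter>X: low 16-bit word
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # <letter>H: high byte of AX
        "AL": ax & 0xFF,         # <letter>L: low byte of AX
    }


def write_al(eax: int, value: int) -> int:
    """Model MOV AL, value: only bits 0-7 of EAX change."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


views = register_views(0x12345678)
print({name: hex(v) for name, v in views.items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
```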
A word in assembly is the natural size for a unit of data; a 16-bit processor has 16-bit words
Many tools consider a word to be 16 bits regardless of processor size
Additional common data sizes:
8 bits --> 1 byte
32 bits --> dword
64 bits --> qword
The operand for one PUSH instruction is a pointer to a string
A pointer is a variable that holds a memory address (it points to a memory location)
Accessing the data at the address a pointer holds is called dereferencing, because the pointer references another location in memory
Pointers are efficient: rather than copying a data structure around in memory, it is cheaper to copy the value of a pointer (4 bytes on 32-bit systems)
A PUSH instruction before a CALL often represents an argument passed to the function specified by the CALL
Memory can be accessed directly by many assembly instructions
Example: MOV EAX, [0x410230]
Brackets mean fetch the data at the specified address (dereference)
This is direct addressing because we are dereferencing an immediate value
The result is that the 4 bytes of data at 0x410230 are moved into EAX
Some tools like IDA omit the brackets for direct addresses (IDA: dword_410230)
Memory may also be addressed indirectly
The address may be calculated or held in a register
This is called an Effective Address, and it enables us to work efficiently with data structures
Format:
Base + (Index * Scale) + Displacement
Base: EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI
Index: EAX, EBX, ECX, EDX, EBP, ESI, EDI (ESP cannot be an index)
Scale: 1, 2, 4, or 8
Displacement: none, or an 8-bit, 16-bit, or 32-bit value
Indirect referencing: the address of the destination is calculated or resides in a register. The calculated address is called the effective address (EA)
If the address sits in a register, this still differs from register addressing, where the register itself is the destination
In indirect memory addressing, the register holds the address of the destination
A large advantage of indirect memory addressing is the capability to work efficiently with data structures
You can increment the value of a single register to step through the fields of a data structure, or through the same field across an array of data structures
If the scale is used, an index register must also be used
Examples of indirectly addressing memory:
[EAX]: Access dynamically allocated memory (base)
[EBP + 0x10]: Access data on the stack (base + displacement)
[EAX + EBX * 8]: Access an array of 8-byte structures (base + index * scale)
[EAX + EBX + 0xC]: Access fields of a two-dimensional array of structures (base + index + displacement)
Indirect memory addressing may pose challenges for static code analysis because registers are not populated until runtime
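Both addressing styles can be modeled in a few lines: direct addressing reads the 4 bytes at a fixed address (x86 is little-endian, so the lowest byte is stored first), while an effective address is computed from base, index, scale, and displacement. The Memory class, the array at 0x500000, and the register values are hypothetical; this is a sketch of the arithmetic, not of how a debugger reads memory.

```python
import struct


class Memory:
    """Toy 32-bit address space: a dict from address to byte value (hypothetical helper)."""

    def __init__(self):
        self.cells = {}

    def write(self, addr: int, data: bytes) -> None:
        for i, b in enumerate(data):
            self.cells[(addr + i) & 0xFFFFFFFF] = b

    def read_dword(self, addr: int) -> int:
        """Dereference: fetch 4 bytes at addr and interpret them as little-endian."""
        raw = bytes(self.cells.get((addr + i) & 0xFFFFFFFF, 0) for i in range(4))
        return struct.unpack("<I", raw)[0]


def effective_address(base: int = 0, index: int = 0, scale: int = 1, displacement: int = 0) -> int:
    """Base + (Index * Scale) + Displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


mem = Memory()
mem.write(0x410230, bytes([0x53, 0x64, 0x00, 0x00]))
eax = mem.read_dword(0x410230)  # MOV EAX, [0x410230] -> 0x6453

# [EAX + EBX * 8 + 4] with EAX = 0x500000, EBX = 3:
# field at offset 4 of element 3 in an array of 8-byte structures
ea = effective_address(base=0x500000, index=3, scale=8, displacement=4)  # 0x50001c
```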
Strings are an example of a data structure
Data structures group simple variables into more complex types
Examples of data structures include: strings, linked lists, sockets, and file handles
When reversing, determine the type of data structure by its usage
Data structures enable us to group bytes and advance our understanding of the code
**Code vs Data**
Context determines the answer
RegOpenKeyExA example: the API call takes a symbolic constant such as HKEY_CURRENT_USER, which appears in the code as PUSH 0x80000001
During compilation the symbolic constant is replaced by its hex value
Right-click the hex value, choose Set Equate, and then choose HKEY_CURRENT_USER to change it back to the symbolic constant
This brings clarity to the code
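Set Equate is essentially a lookup from a raw immediate back to its symbolic name. The sketch below mimics that idea for the predefined registry root keys (values as defined in the Windows headers); annotate_push is a hypothetical helper for text listings, not a Ghidra API.

```python
# Predefined registry root key values from the Windows headers (winreg.h)
HKEY_EQUATES = {
    0x80000000: "HKEY_CLASSES_ROOT",
    0x80000001: "HKEY_CURRENT_USER",
    0x80000002: "HKEY_LOCAL_MACHINE",
    0x80000003: "HKEY_USERS",
    0x80000004: "HKEY_PERFORMANCE_DATA",
    0x80000005: "HKEY_CURRENT_CONFIG",
}


def annotate_push(instruction: str) -> str:
    """Replace a PUSH immediate with its symbolic HKEY name, similar in spirit to Set Equate."""
    mnemonic, _, operand = instruction.partition(" ")
    if mnemonic.upper() != "PUSH":
        return instruction
    try:
        value = int(operand.strip(), 16)
    except ValueError:
        return instruction  # register or memory operand, nothing to equate
    name = HKEY_EQUATES.get(value)
    return f"{mnemonic} {name}" if name else instruction


print(annotate_push("PUSH 0x80000001"))  # PUSH HKEY_CURRENT_USER
```

Context still matters: the same value pushed before an unrelated call is not a registry key, which is why Ghidra leaves the choice of equate to the analyst.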
Branch instructions direct code execution to another location
The flow of execution (i.e. control flow) is sequential until a branching instruction is reached
Then EIP is updated and execution is transferred to another location in memory
The code under review contains two types of jumps
Jumps are an example of a branching instruction
Unconditional jumps always perform a jump: JMP, CALL, RET
Conditional jumps only jump if a condition is met: Jcc, LOOP
A conditional jump represents a decision point
Conditional jumps require that we review multiple instructions
To evaluate whether a condition is true, arithmetic and Boolean instructions are used
sub ecx, 8 tests whether ECX is equal to 8 (SUB stores the result in ECX; CMP performs the same subtraction without storing it)
and eax, eax tests whether EAX is equal to zero
If the result is zero, the ZF bit is set in the flags register
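To see how a flag-setting instruction feeds a conditional jump, the sketch below computes ZF, CF, SF, and OF for a 32-bit SUB/CMP and an AND, then evaluates a few Jcc conditions from those flags. The helper names are hypothetical and PF/AF are omitted; the conditions themselves follow Intel's definitions (A/B use the unsigned CF, G/L use the signed SF and OF).

```python
MASK = 0xFFFFFFFF
SIGN = 0x80000000


def sub_flags(a: int, b: int) -> dict:
    """Flags after 32-bit SUB a, b (CMP a, b sets the same flags but discards the result)."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "ZF": result == 0,                          # result is zero
        "CF": a < b,                                # unsigned borrow
        "SF": bool(result & SIGN),                  # sign bit of the result
        "OF": bool((a ^ b) & (a ^ result) & SIGN),  # signed overflow
    }


def and_flags(a: int, b: int) -> dict:
    """Flags after AND a, b: CF and OF are cleared, ZF and SF follow the result."""
    result = a & b & MASK
    return {"ZF": result == 0, "CF": False, "SF": bool(result & SIGN), "OF": False}


JCC = {
    "JE": lambda f: f["ZF"],
    "JZ": lambda f: f["ZF"],
    "JNE": lambda f: not f["ZF"],
    "JNZ": lambda f: not f["ZF"],
    "JA": lambda f: not f["CF"] and not f["ZF"],         # unsigned above
    "JB": lambda f: f["CF"],                             # unsigned below
    "JG": lambda f: not f["ZF"] and f["SF"] == f["OF"],  # signed greater
    "JL": lambda f: f["SF"] != f["OF"],                  # signed less
}


def jump_taken(mnemonic: str, flags: dict) -> bool:
    return JCC[mnemonic](flags)


# sub ecx, 8 with ECX == 8 sets ZF, so a following JZ is taken
print(jump_taken("JZ", sub_flags(8, 8)))         # True
# 1 vs 0xFFFFFFFF: "below" as unsigned, but "greater" as signed (0xFFFFFFFF is -1)
f = sub_flags(1, 0xFFFFFFFF)
print(jump_taken("JB", f), jump_taken("JG", f))  # True True
```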
Jumps
A Jcc instruction performs the jump only if its condition is met
Form: Jcc, where cc is one of:
A --> jump if above (unsigned comparison)
B --> jump if below (unsigned comparison)
E --> jump if equal
G --> jump if greater (signed comparison)
L --> jump if less (signed comparison)
Z --> jump if zero
N --> inverts the condition, e.g. JNZ (jump if not zero)
Comments
Use the ; key to add a comment
You can add EOL, Pre, Post, or other types of comments
HTTP Command and Control
These APIs enable HTTP C2:
InternetOpen, InternetConnect --> Create an HTTP connection
HttpOpenRequest, HttpAddRequestHeaders (Optional) --> Build an HTTP request
HttpSendRequest --> Send an HTTP request
InternetReadFile --> Read a response
To view the API calls: Window --> Symbol References --> locate APIs of interest in the Symbol Table
The code references variables, which hold data whose values are not known at compile time
Local variables are relevant only to the current function and are not preserved after it returns
Local variables are stored on the stack relative to ESP and EBP
Global variables are accessible from all functions, e.g. DAT_00403374
Static variables can only be used from within the function that declares them, but unlike local variables they are not released for reuse when the function exits
Viewing Function Call Trees
Window --> Function Call Tree
View the outgoing calls on the right side
This view is ideal for determining which functions are called from the current function
Once you determine what the current function is used for, make sure to right-click it, choose Edit Label, and give it a meaningful name
GetTempFileNameW
Creates a name for a temporary file
Can explore other function references to find new IOCs
Look for a PUSH of lpPrefixString_XXXXXX
Microsoft documentation states that the function uses up to the first three characters of this string as the temp file name prefix
To assist Ghidra:
Right-click the lpPrefixString --> Data --> TerminatedUnicode
Functions
A function is a group of instructions that performs a specific task (read, write files, send network data, log keystrokes)
Three Basic Components
Input: values passed in
Body: code to perform tasks
Return: value passed back
Calling a function involves a jump to another memory location
After the function is done execution continues at the instruction after the original function call
Calling a function involves two control transfers: one into the function and one back to the caller
Function format:
return = function(arg0, arg1)
Specific events occur when calling a function:
Pass in parameters (stack/register)
Save the return pointer
Transfer control to the function
Specific events occur when returning from a function:
Set up a return value (typically EAX)
Clean up the stack and restore registers
Transfer control to the saved return pointer
Within a function, the prologue and epilogue perform setup and cleanup activities
Most functions contain a standard prologue and epilogue
The prologue occurs at the start of the function
Allocates space for variables
Saves registers that will be reused in the function body
The function epilogue occurs at the end of the function
It cleans up the stack, e.g. by releasing the space allocated for local variables
It restores registers
The stack is a section of memory used to store saved registers, local variables, and function parameters
The stack is LIFO (last in, first out)
PUSH adds an element and POP removes one
ESP points to the top of the stack (the most recently pushed item) and changes with instructions like PUSH, POP, CALL, LEAVE, and RET
EBP, a.k.a. the frame pointer, serves as an unchanging reference within the function:
EBP - value = local variable (local variables may also be held in registers)
EBP + value = parameter
When EBP is set up in the function prologue in this manner, code referencing EBP minus some value, e.g. [EBP - 8], is accessing a local variable
When it is EBP plus some value, e.g. [EBP + 8], it is referencing a parameter that was passed in
When cleaning up the stack, compilers use some tricks
Compilers may POP off a value, e.g. POP EDX, which has the effect of adding four to ESP
It is also very common to see a value added to ESP, the use of RET (which can also pop values off the stack), and the LEAVE instruction
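A short simulation can make the EBP offsets concrete. The sketch below walks through a call with two arguments using a common frame-pointer prologue (PUSH EBP; MOV EBP, ESP; SUB ESP, 8), which is an assumption since compilers vary, and then checks that LEAVE, RET, and the caller's ADD ESP, 8 leave the stack balanced. The Stack class, the starting ESP, and the return address are hypothetical values.

```python
MASK = 0xFFFFFFFF


class Stack:
    """Toy x86 stack: grows toward lower addresses in 4-byte slots (hypothetical helper)."""

    def __init__(self, esp: int = 0x0012FF80):
        self.esp = esp
        self.ebp = 0
        self.mem = {}

    def push(self, value: int) -> None:
        self.esp = (self.esp - 4) & MASK  # PUSH: decrement ESP, then store
        self.mem[self.esp] = value & MASK

    def pop(self) -> int:
        value = self.mem[self.esp]        # POP: load, then increment ESP
        self.esp = (self.esp + 4) & MASK
        return value


def simulate_call(arg0: int, arg1: int, return_address: int = 0x401050, locals_size: int = 8):
    s = Stack()
    esp_before = s.esp
    s.push(arg1)                   # caller: PUSH arg1 (rightmost argument first)
    s.push(arg0)                   # caller: PUSH arg0
    s.push(return_address)         # CALL pushes the return address
    s.push(s.ebp)                  # prologue: PUSH EBP
    s.ebp = s.esp                  # prologue: MOV EBP, ESP
    s.esp -= locals_size           # prologue: SUB ESP, 8 reserves local variables
    s.mem[s.ebp - 4] = 0xDEADBEEF  # a local variable at [EBP - 4]
    frame = {
        "[EBP+4]": s.mem[s.ebp + 4],      # saved return address
        "[EBP+8]": s.mem[s.ebp + 8],      # first parameter
        "[EBP+0xC]": s.mem[s.ebp + 0xC],  # second parameter
        "[EBP-4]": s.mem[s.ebp - 4],      # local variable
    }
    s.esp = s.ebp                  # LEAVE is MOV ESP, EBP ...
    s.ebp = s.pop()                # ... followed by POP EBP
    eip = s.pop()                  # RET pops the return address into EIP
    s.esp += 8                     # cdecl: the caller runs ADD ESP, 8
    return frame, eip, s.esp == esp_before


frame, eip, balanced = simulate_call(0x11, 0x22)
print(frame, hex(eip), balanced)
```

Under stdcall the final cleanup would instead be done by the callee (for example RET 8), so the caller's ADD ESP, 8 would not appear.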
Functions are called according to calling conventions
The convention describes how data is passed into and out of functions
The implementation of the convention may vary by compiler
The cdecl convention (most common) has these characteristics:
The arguments are placed onto the stack right to left
The return value is placed into EAX
The caller cleans up the stack (removes the arguments)
The stdcall convention has the following characteristics:
Similar to cdecl, but the callee cleans up the stack
This is the convention used in Win32 APIs
Additional calling conventions include fastcall and thiscall
fastcall: Arguments are stored in registers
Any extra arguments are placed on the stack
The callee cleans up arguments on the stack
thiscall: Used in C++ code (member functions)
This convention includes a reference to the this pointer
For MSFT compilers, ECX holds the "this" pointer and the callee cleans up the arguments on the stack
For GNU compilers the "this" pointer is pushed onto the stack last and the caller cleans up
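The differences between these conventions come down to where each argument goes and who removes the stack arguments. The sketch below encodes that for 32-bit code under simplifying assumptions: every argument fits in 32 bits, fastcall follows the Microsoft variant (first two arguments in ECX and EDX), and thiscall follows the MSVC variant (this pointer in ECX); the GNU thiscall behavior described above is not modeled. place_arguments is a hypothetical helper.

```python
def place_arguments(convention: str, args: list):
    """Return (register arguments, stack push order, who removes the stack arguments)."""
    if convention in ("cdecl", "stdcall"):
        registers, stack_args = {}, list(args)
    elif convention == "fastcall":    # Microsoft variant: first two args in ECX, EDX
        registers = dict(zip(("ECX", "EDX"), args))
        stack_args = list(args[2:])
    elif convention == "thiscall":    # MSVC variant: args[0] is the this pointer, in ECX
        registers = {"ECX": args[0]}
        stack_args = list(args[1:])
    else:
        raise ValueError(f"unknown convention: {convention}")
    push_order = stack_args[::-1]     # stack arguments are pushed right to left
    cleanup = "caller" if convention == "cdecl" else "callee"
    return registers, push_order, cleanup


print(place_arguments("cdecl", [1, 2, 3]))     # ({}, [3, 2, 1], 'caller')
print(place_arguments("fastcall", [1, 2, 3]))  # ({'ECX': 1, 'EDX': 2}, [3], 'callee')
```

In a listing, the push order is the practical clue: the PUSH closest to the CALL is the first argument.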
Reviewing strings reveals filenames and directories of interest
To locate references to a string, right-click it and choose the option to show references
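Strings used with ANSI (A) APIs are usually stored as single-byte text, while strings used with Wide (W) APIs are stored as UTF-16LE, where each ASCII character is followed by a zero byte. A minimal sketch of a strings-style scanner that looks for both forms; the minimum length of 4 and the sample data are assumptions.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return printable ASCII runs, then UTF-16LE runs, found in data."""
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_run.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_run.finditer(data)]
    return found


sample = b"\x01\x02C:\\temp\\x.exe\x00\x00\x00" + "Software\\Run".encode("utf-16-le") + b"\x00\x00"
print(extract_strings(sample))  # ['C:\\temp\\x.exe', 'Software\\Run']
```

A naive scanner like this can misalign when an ASCII string runs directly into a UTF-16 one, so treat its output as leads to verify in Ghidra rather than a complete list.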
Loops in malware
Used to encrypt and decrypt network traffic --> loop over each character in the string to send
Attempt to connect to a C2 server --> loop over a list of servers
Perform a port scan --> try to connect to each port 1-65535
Log keystrokes --> Check state for each key code 0...92
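The first case, encoding or decoding traffic one character at a time, often compiles to a counter-driven loop. The sketch below uses single-byte XOR as a stand-in for the encoding (real specimens may use other schemes) and names its variables after the registers a compiler might use; the key 0x5A and the sample request are hypothetical.

```python
def xor_decode(data: bytes, key: int) -> bytes:
    """Single-byte XOR over a buffer; applying it twice restores the input."""
    buf = bytearray(data)
    ecx = len(buf)    # loop counter, as a compiler might keep it in ECX
    esi = 0           # index into the buffer
    while ecx != 0:   # keep looping while the counter is non-zero
        buf[esi] ^= key & 0xFF
        esi += 1
        ecx -= 1
    return bytes(buf)


encoded = xor_decode(b"GET /index.html", 0x5A)  # hypothetical key 0x5A
print(xor_decode(encoded, 0x5A))                # b'GET /index.html'
```

Because XOR is its own inverse, the same routine serves for both encryption and decryption, which is one reason it is common in malware.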
Similar to Jcc, the cc in LOOPcc represents the condition that must be met for the loop instruction to branch to the specified address
Every LOOP variant first decrements ECX and only branches while ECX is not zero; the conditions are:
Z --> loop if zero (ZF = 1)
E --> loop if equal (ZF = 1)
N --> inverts the looping condition, e.g. LOOPNZ loops while ZF = 0
Reviewing imports to direct our code analysis
The following imports are used to access the resource section:
FindResourceW --> determine the location of a resource
SizeofResource --> obtain the size of a resource
LockResource --> obtain a pointer to a resource
The resource (.rsrc) section is often used to store information like icons, dialog boxes, and version information; however, malware may hide executables here
Malware that drops files is called a dropper
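When a resource is suspected of hiding an executable, a quick check is whether it starts with an MZ header whose e_lfanew field (the 4-byte value at offset 0x3C) points at the PE\0\0 signature. A minimal sketch, assuming the resource bytes have already been extracted; the helper names are hypothetical.

```python
import struct


def looks_like_pe(blob: bytes) -> bool:
    """True if blob starts with an MZ header whose e_lfanew points at a PE signature."""
    if len(blob) < 0x40 or blob[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", blob, 0x3C)  # offset of the PE header
    return blob[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def find_embedded_pes(data: bytes) -> list:
    """Offsets of every MZ marker in data that passes the PE header check."""
    hits, start = [], 0
    while (i := data.find(b"MZ", start)) != -1:
        if looks_like_pe(data[i:]):
            hits.append(i)
        start = i + 1
    return hits
```

Checking e_lfanew rather than just the MZ bytes cuts down on false positives, since "MZ" appears by chance in plenty of non-executable data.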
CreateMutexA
CreateMutexA --> creates or opens a mutex object
Malware authors often use a mutex to avoid re-infecting a machine
Keylogging
GetKeyState and GetAsyncKeyState --> determine if a particular key is pressed
GetAsyncKeyState determines whether a key is currently up or down, and whether it was pressed since the last call to the API
GetWindowText --> retrieves the text of a window's title bar
OpenClipboard, GetClipboardData, and CloseClipboard --> open the clipboard for access, gather its data, and then close the clipboard
Combined, these APIs let an attacker learn which keys are pressed and what the application context is
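GetAsyncKeyState returns a SHORT whose most significant bit is set when the key is down and whose least significant bit is set if the key was pressed after the previous call (Microsoft notes that the low bit should not be relied on). The polling loop below passes in a stand-in function instead of calling the real API so it runs anywhere; get_state and the default key-code range are assumptions for illustration.

```python
def decode_async_key_state(value: int) -> tuple:
    """Interpret a GetAsyncKeyState-style SHORT return value."""
    value &= 0xFFFF                          # view the signed SHORT as 16 unsigned bits
    is_down = bool(value & 0x8000)           # most significant bit: key is down
    pressed_since_last = bool(value & 0x1)   # least significant bit: pressed since last call
    return is_down, pressed_since_last


def poll_keys(get_state, key_codes=range(0x08, 0xFF)) -> list:
    """Keylogger-style loop: ask for the state of every key code, keep the ones that are down."""
    return [vk for vk in key_codes if decode_async_key_state(get_state(vk))[0]]


# Stand-in for GetAsyncKeyState: only 'A' (virtual-key code 0x41) is held down
print(poll_keys(lambda vk: -32768 if vk == 0x41 else 0))  # [65]
```

In a listing, this pattern shows up as a loop that increments a counter and calls GetAsyncKeyState once per iteration, then tests the high bit of the returned value.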
64 Bit Malware
The vast majority is 32-bit
We will see more 64-bit malware in the future as 64-bit systems become the standard
Two types of 64 bit malware have been common
Browser Helper Objects for 64 bit Internet Explorer
Device drivers (rootkits) for Windows x64
Analyze 32-bit malware on a 64-bit OS with caution
32-bit code running on a 64-bit operating system runs in the WOW64 subsystem
32-bit executables load 32-bit DLLs
32-bit DLLs are located in %SystemRoot%\SysWOW64
32-bit processes reference Software hive registry values in Wow6432Node using registry redirection
Some executables run subtly differently under WOW64 than on a native 32-bit OS
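Redirection matters when turning observed paths into IOCs: a 32-bit specimen that writes to System32 or HKLM\Software on a 64-bit system actually lands in SysWOW64 or WOW6432Node. The sketch below is deliberately simplified: it assumes %SystemRoot% is C:\Windows and ignores the documented exemptions (such as Sysnative and shared registry keys); wow64_view is a hypothetical helper.

```python
SYSTEM32 = "c:\\windows\\system32"  # assumes %SystemRoot% is C:\Windows
HKLM_SOFTWARE = "hklm\\software"


def wow64_view(path: str) -> str:
    """Where a 32-bit process's access to path ends up (simplified, no exemptions)."""
    lowered = path.lower()
    if lowered == SYSTEM32 or lowered.startswith(SYSTEM32 + "\\"):
        return "C:\\Windows\\SysWOW64" + path[len(SYSTEM32):]             # file system redirection
    if lowered == HKLM_SOFTWARE or lowered.startswith(HKLM_SOFTWARE + "\\"):
        return "HKLM\\SOFTWARE\\WOW6432Node" + path[len(HKLM_SOFTWARE):]  # registry redirection
    return path


print(wow64_view("C:\\Windows\\System32\\kernel32.dll"))  # C:\Windows\SysWOW64\kernel32.dll
```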
64-Bit Assembly Differences
All general purpose registers are expanded to 64 bits
EAX --> RAX
There are eight new general-purpose registers: R8 through R15
Special-use registers are extended and renamed: EIP --> RIP
RSP, not RBP, is often used to access parameters and variables
The calling convention resembles fastcall (parameters are passed via registers):
The first four parameters are passed in RCX, RDX, R8, R9
Additional parameters are stored on the stack
There is a new addressing mode (RIP + displacement)
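Two of these differences are easy to get wrong when reading a listing: where the fifth and later arguments sit, and what a RIP-relative operand resolves to. The sketch below assumes the Microsoft x64 convention for integer and pointer arguments (the caller reserves 32 bytes of shadow space above the return address) and uses hypothetical addresses. For RIP-relative operands, RIP already holds the address of the next instruction when the displacement is applied.

```python
ARG_REGISTERS = ("RCX", "RDX", "R8", "R9")


def arg_location(n: int) -> str:
    """Location of integer/pointer argument n (0-based) on entry to the callee."""
    if n < len(ARG_REGISTERS):
        return ARG_REGISTERS[n]
    # [RSP] holds the return address and [RSP+0x8] through [RSP+0x27] is the
    # 32-byte shadow space for the four register arguments, so the fifth
    # argument starts at [RSP+0x28]
    return f"[RSP+{0x28 + 8 * (n - 4):#x}]"


def rip_relative(next_instruction: int, disp32: int) -> int:
    """Target of [RIP + disp32], where RIP is the address of the next instruction."""
    if not -(2**31) <= disp32 < 2**31:
        raise ValueError("displacement must fit in a signed 32-bit value")
    return (next_instruction + disp32) & 0xFFFFFFFFFFFFFFFF


print([arg_location(n) for n in range(6)])
# ['RCX', 'RDX', 'R8', 'R9', '[RSP+0x28]', '[RSP+0x30]']
print(hex(rip_relative(0x140001007, 0x1FF9)))  # 0x140003000
```

Inside the function body these stack offsets shift once the prologue adjusts RSP, so the same argument may appear at a larger offset later in the listing.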