Brief introduction to reverse engineering for CTF (using radare2 + angr)
Intro
I’m writing this blog to give a brief introduction to reverse engineering applied to CTFs. I’m not able to write complex sentences in English, so I will keep it short and give only the basic information; in fact, this post is more an exercise for me than a real article for you.
Requirements
- Basic binary analysis understanding (assembly, CFG)
- radare2 and angr, which are needed to follow the steps described below
Start
We have the following binary file: angrmanagement
We assume that the binary runs remotely, so we can’t rely on patching or debugging it.
- Some basic static analysis first (an r2 cheatsheet helps here):
  - “aa”: analyze the binary (disassembly, CFG construction, imported functions, etc.)
  - “iI”: print the binary information
  - “afl~entry”: list the functions and grep (“~”) for the lines containing the ‘entry’ string
- List the functions imported by the binary and its printable strings:
  - “ii”: list the binary’s imported functions
  - “fs strings”: select the strings flagspace
  - “f”: print the flags of the selected flagspace
- Nothing suspicious, so we inspect the function ‘main’:
  - “s main”: seek to main
  - “pdf”: disassemble the function
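The r2 commands above can also be scripted. The sketch below is a minimal example that assumes radare2 and the `r2pipe` package are installed and that the binary sits at `./angrmanagement`. The helper `strings_matching` is hypothetical. It also assumes that `izj` returns a list of objects with a plain-text `"string"` field, which holds for recent radare2 releases; some older releases base64-encoded that field.

```python
def strings_matching(izj_entries, needle):
    """Return the strings from r2's `izj` JSON output that contain `needle`.

    Assumption: each entry is a dict with a plain-text "string" field.
    """
    needle = needle.lower()
    return [e["string"] for e in izj_entries if needle in e.get("string", "").lower()]


def triage(path="./angrmanagement"):
    """Run the static-analysis commands from the walkthrough through r2pipe."""
    import r2pipe  # requires radare2 on PATH and `pip install r2pipe`

    r2 = r2pipe.open(path)
    try:
        r2.cmd("aa")                     # analyze: functions, CFG, imports, ...
        print(r2.cmd("iI"))              # binary information
        print(r2.cmd("afl~entry"))       # function list, filtered on 'entry'
        print(r2.cmd("ii"))              # imported functions
        print(r2.cmd("fs strings; f"))   # flags in the strings flagspace
        print(strings_matching(r2.cmdj("izj"), "password"))
        print(r2.cmd("s main; pdf"))     # seek to main and disassemble it
    finally:
        r2.quit()


if __name__ == "__main__":
    triage()
```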
- The control flow graph (CFG) lets us read the assembly in terms of basic blocks (nodes) and control flow (edges). For more context:
  - A basic block is a straight-line sequence of assembly instructions with a single entry point, ending in a change-of-flow instruction (COFI) such as jmp, call, or ret.
  - The CFG links those basic blocks together along the possible control flow paths.
  - The control flow executed (CFE) is a subset of the CFG: the control flow actually captured while the binary runs.
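To make the basic block and CFG definitions concrete, here is a small sketch of the classic "leaders" approach on a made-up listing. The instructions and addresses are illustrative only and do not come from angrmanagement. As in the definition above, `call` ends a block, and its successor is the instruction after it because the call returns.

```python
from typing import List, Optional, Tuple

# Toy instruction: (address, mnemonic, direct target or None).
Insn = Tuple[int, str, Optional[int]]

COFI = {"jmp", "je", "jne", "call", "ret"}   # change-of-flow instructions
JUMPS = {"jmp", "je", "jne"}                 # COFIs with a local target


def split_basic_blocks(insns: List[Insn]) -> List[List[Insn]]:
    """A block starts at the first instruction, at every local jump target,
    and right after every COFI."""
    local = {addr for addr, _, _ in insns}
    leaders = {insns[0][0]}
    for i, (_, mnem, target) in enumerate(insns):
        if mnem in COFI:
            if mnem != "call" and target in local:
                leaders.add(target)
            if i + 1 < len(insns):
                leaders.add(insns[i + 1][0])
    blocks: List[List[Insn]] = []
    for insn in insns:
        if insn[0] in leaders:
            blocks.append([])
        blocks[-1].append(insn)
    return blocks


def cfg_edges(blocks: List[List[Insn]]) -> List[Tuple[int, int]]:
    """Jumps go to their target; everything except jmp/ret also falls through."""
    starts = [block[0][0] for block in blocks]
    edges = []
    for idx, block in enumerate(blocks):
        _, mnem, target = block[-1]
        nxt = starts[idx + 1] if idx + 1 < len(starts) else None
        if mnem == "ret":
            continue
        if (mnem in JUMPS) and target is not None:
            edges.append((starts[idx], target))
        if mnem != "jmp" and nxt is not None:
            edges.append((starts[idx], nxt))
    return edges


if __name__ == "__main__":
    listing = [
        (0x10, "cmp", None),
        (0x14, "je", 0x20),
        (0x16, "call", 0x100),
        (0x1B, "jmp", 0x24),
        (0x20, "mov", None),
        (0x24, "ret", None),
    ]
    for src, dst in cfg_edges(split_basic_blocks(listing)):
        print(f"{src:#x} -> {dst:#x}")
```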
- To show the CFG in r2, type “VV”, which opens the visual graph mode. Move around the graph with the arrow keys, or with ‘h’, ‘j’, ‘k’, ‘l’ if you’re familiar with vim
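The graph shown by “VV” can also be pulled out programmatically with r2's `agfj` command, for example to render it with Graphviz outside r2. This is a sketch with several assumptions: r2pipe is installed, and each function object in the `agfj` JSON has a `"blocks"` list whose entries carry `"offset"` plus optional `"jump"` (taken branch) and `"fail"` (fall-through) fields. The helper names are hypothetical.

```python
def edges_from_agfj(agfj):
    """Turn r2's `agfj` JSON into (src, dst) edge pairs (field names assumed)."""
    edges = []
    for func in agfj:
        for block in func.get("blocks", []):
            for key in ("jump", "fail"):
                dst = block.get(key)
                if dst is not None and dst >= 0:
                    edges.append((block["offset"], dst))
    return edges


def to_dot(edges):
    """Render edges as Graphviz DOT text."""
    lines = ["digraph cfg {"]
    lines += [f'  "{s:#x}" -> "{d:#x}";' for s, d in edges]
    lines.append("}")
    return "\n".join(lines)


def main_cfg(path="./angrmanagement"):
    import r2pipe  # requires radare2 and the r2pipe package

    r2 = r2pipe.open(path)
    try:
        r2.cmd("aa")       # analyze
        r2.cmd("s main")   # seek to main
        return edges_from_agfj(r2.cmdj("agfj"))
    finally:
        r2.quit()


if __name__ == "__main__":
    print(to_dot(main_cfg()))
```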
- By inspecting the binary’s strings, we find out that the program asks for a password. We note this and move on.
- Following the control flow, we find calls to functions such as ‘sym.check_1’
- Going back to graph mode, we get more details on the ‘sym.check_len’ function:
  - Type “od” to follow the function called from the selected basic block
  - Type “?” to list the commands available in graph mode
  - Here we note the instruction `cmp rax, 0x20`, which computes rax - 0x20, discards the result and updates the EFLAGS register
  - The following instruction, `sete al`, sets register al to 1 if the zero flag (ZF) is set, i.e. if rax == 0x20, and to 0 otherwise
  - We conclude that the password has to be 0x20 (32) bytes long
  - Type “x” and then press Enter to go back in graph mode
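The `cmp`/`sete` pair can be modelled in a few lines of Python to double-check the reading of the flags. This is a sketch that assumes rax holds the input length when `check_len` reaches the compare, for example as the return value of a strlen-like call. `check_len` here is a hypothetical re-implementation, not decompiled code.

```python
MASK64 = (1 << 64) - 1


def cmp_sete(rax: int, imm: int) -> int:
    """Emulate `cmp rax, imm` followed by `sete al`.

    cmp computes rax - imm (64-bit), discards it and updates EFLAGS;
    ZF is 1 exactly when that result is zero, and sete copies ZF into al.
    """
    result = (rax - imm) & MASK64
    zf = 1 if result == 0 else 0
    return zf  # value left in al


def check_len(password: bytes) -> int:
    # Assumption: rax contains the length of the input string.
    return cmp_sete(len(password), 0x20)


if __name__ == "__main__":
    print(check_len(b"A" * 32), check_len(b"A" * 31))  # 1 0
```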
- As can be seen in the figure below, each basic block calls a function that checks some property of the input (e.g. ‘sym.check_1’, ‘sym.check_2’, and so on)
- Each check returns its result in the RAX register; when a check fails, the returned value is 0 and main takes the failure branch
- The last comparison leads to printing the contents of the ‘flag.txt’ file
- Therefore, all the checks shown above must return true
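The structure of main described above can be mirrored with a short model: the checks run in order, and the first one returning 0 sends execution to the failure block. This is a sketch. The `test eax, eax` / `je` pattern in the comment is the usual compiler idiom and is assumed here, and the demo checks are hypothetical stand-ins for `sym.check_len`, `sym.check_0`, and the others.

```python
from typing import Callable, Sequence

Check = Callable[[bytes], int]


def run_checks(password: bytes, checks: Sequence[Check]) -> bool:
    """Call each check in order; a 0 result means the failure branch."""
    for check in checks:
        if check(password) == 0:   # e.g. `test eax, eax` + `je <fail>`
            return False
    return True                    # only here would main print flag.txt


if __name__ == "__main__":
    demo_checks = [
        lambda p: int(len(p) == 0x20),
        lambda p: int(p[0] != ord("h")),
    ]
    print(run_checks(b"A" * 32, demo_checks))  # True
    print(run_checks(b"A" * 31, demo_checks))  # False
```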
- We start by reversing the check_0 function:
  - It checks that `input[15] != 'h' and input[25] != ' ' and input[27] != '>'`
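Transcribing each reversed check into Python makes it easy to validate a candidate password offline. Below is check_0 written from the condition above, together with a run on the password recovered at the end of this post.

```python
def check_0(password: bytes) -> bool:
    """Python transcription of the condition recovered from check_0."""
    return (
        password[15] != ord("h")
        and password[25] != ord(" ")
        and password[27] != ord(">")
    )


if __name__ == "__main__":
    # Password obtained from the symbolic execution at the end of the post.
    recovered = b'<#P(J\xb9ZmT[$D5\x06X` hbAd\x880(`.+?@ACj'
    print(len(recovered), check_0(recovered))  # 32 True
```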
- Starting from check_1, the check functions get a little obfuscated; there are 31 check functions in total
- After inspecting some of the check functions, let’s take a quick look at the following image, which illustrates a taxonomy of binary obfuscation techniques
- The check functions use the “encode literals” and “encode arithmetic” types of obfuscation
- Dataflow analysis techniques can be used to solve the problem
- Symbolic execution can also be used, as long as we keep control over the paths to be traversed and therefore limit path explosion
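To illustrate what these two obfuscations look like, here is a hypothetical single-byte check that is not taken from angrmanagement. The constant `0x41` never appears literally, and the input goes through an add and a xor before the compare. A dataflow-style solution undoes each operation backwards from the compare. Trying all 256 values also works for one byte, but that does not scale to 32 interdependent bytes, which is where symbolic execution helps.

```python
from typing import List


def obfuscated_check(c: int) -> bool:
    """Hypothetical byte check in encode-literals + encode-arithmetic style."""
    key = 0x13 ^ 0x52                    # encoded literal: really 0x41
    mixed = ((c + 0x11) & 0xFF) ^ key    # arithmetic hides the plain compare
    return mixed == 0x2C


def solve_backward() -> int:
    """Dataflow-style: invert xor, then add, starting from the compared value."""
    key = 0x13 ^ 0x52
    return ((0x2C ^ key) - 0x11) & 0xFF


def solve_exhaustive() -> List[int]:
    """Try every byte value; only feasible because the input is one byte."""
    return [c for c in range(256) if obfuscated_check(c)]


if __name__ == "__main__":
    print(hex(solve_backward()), [hex(c) for c in solve_exhaustive()])  # 0x5c ['0x5c']
```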
To limit path explosion during symbolic execution, we end up with the following considerations:
1. The input string must be 32 bytes long; it is stored at rbp-0x30 in main’s stack frame
2. Avoid the jump to offset 0x2347 and instead follow the path that reaches offset 0x2359
3. Each `je 0x2347` instruction is 6 bytes long, and right after each of them there is a basic block that we want to match in our path
4. The `endbr64` instruction must be patched/hooked because it breaks angr
5. The main() function is located at offset 0x206f
- We wrote some Python code (utils.py) to automate considerations 2, 3, 4 and 5; this could also be done directly in angr.
- Read the comments in the code for details
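The author's utils.py is not reproduced in the post. The following is a hypothetical reconstruction of the same idea, working on raw machine code; all function names are made up. A `je rel32` is encoded as `0f 84` plus a signed little-endian 32-bit displacement, so its target is `address + 6 + rel32`, and the block to match starts at `address + 6`. `endbr64` is the 4-byte sequence `f3 0f 1e fa`. The sketch assumes `code` holds main's bytes starting at offset 0x206f. A linear byte scan can give false positives when these bytes appear inside other instructions, so a disassembler-based walk is more robust.

```python
import struct
from typing import List

JE_REL32 = b"\x0f\x84"          # opcode of `je rel32`; the instruction is 6 bytes
ENDBR64 = b"\xf3\x0f\x1e\xfa"   # encoding of `endbr64` (4 bytes)


def find_all(code: bytes, pattern: bytes) -> List[int]:
    """Offsets of every (possibly overlapping) occurrence of pattern."""
    hits, i = [], code.find(pattern)
    while i != -1:
        hits.append(i)
        i = code.find(pattern, i + 1)
    return hits


def je_fallthroughs(code: bytes, base: int, fail_target: int) -> List[int]:
    """Addresses right after each `je rel32` that jumps to fail_target."""
    out = []
    for off in find_all(code, JE_REL32):
        if off + 6 > len(code):
            continue
        (rel,) = struct.unpack_from("<i", code, off + 2)  # signed displacement
        je_addr = base + off
        if je_addr + 6 + rel == fail_target:
            out.append(je_addr + 6)
    return out


def endbr64_addresses(code: bytes, base: int) -> List[int]:
    """Addresses of every endbr64, i.e. where to patch or hook (length 4)."""
    return [base + off for off in find_all(code, ENDBR64)]
```

With `code` and `base = 0x206f`, `je_fallthroughs(code, 0x206f, 0x2347)` lists the blocks to pass through (consideration 3), and `endbr64_addresses` lists the hook locations (consideration 4).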
- Symbolic execution in angr is driven by one ‘SimulationManager’ object, which manages the many ‘SimState’ objects produced while traversing basic blocks.
- A state gives access to its registers and memory
- To hook something in angr, we define a function that takes a state object as its argument
- We map the address where the input is stored to a symbolic variable
- We do that by hooking the fgets call in main and reading the buffer address from the RDI register (fgets’ first argument)
- Finally, we start the symbolic execution
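Below is a minimal angr sketch of these steps, not the author's solution.py. It differs from the post in one plainly stated way: instead of hooking the `call fgets` site inside main, whose address the post does not give, it replaces `fgets` with a custom `SimProcedure` via `hook_symbol`. The buffer pointer still arrives as the first argument, in RDI.

The sketch rests on several assumptions:
- r2 shows this PIE at base 0, so angr addresses are `mapped_base + offset`.
- The length check needs a NUL terminator after the 32 bytes.
- The input contains no newline or NUL bytes.

The endbr64 hooks from consideration 4 are left out. If your angr version chokes on that instruction, `proj.hook(addr, lambda state: None, length=4)` at each endbr64 address skips it.

```python
MAIN_OFF = 0x206F       # main() (consideration 5)
FIND_OFF = 0x2359       # block leading to flag.txt (consideration 2)
AVOID_OFF = 0x2347      # failure block targeted by the `je` instructions
PASSWORD_LEN = 0x20     # from check_len: cmp rax, 0x20


def rebase(offset: int, base: int) -> int:
    """Convert an offset as shown by r2 (PIE at 0, assumed) into an angr address."""
    return base + offset


def solve(path: str = "./angrmanagement") -> bytes:
    import angr       # imported lazily so rebase() works without angr
    import claripy

    proj = angr.Project(path, auto_load_libs=False)
    base = proj.loader.main_object.mapped_base
    password = claripy.BVS("password", PASSWORD_LEN * 8)

    class SymbolicFgets(angr.SimProcedure):
        """Stand-in for fgets(buf, size, stream); buf arrives in RDI."""

        def run(self, buf, size, stream):
            # 32 symbolic bytes followed by a NUL terminator (assumption).
            self.state.memory.store(buf, claripy.Concat(password, claripy.BVV(0, 8)))
            return buf

    proj.hook_symbol("fgets", SymbolicFgets(), replace=True)

    state = proj.factory.blank_state(
        addr=rebase(MAIN_OFF, base),
        add_options={
            angr.options.ZERO_FILL_UNCONSTRAINED_MEMORY,
            angr.options.ZERO_FILL_UNCONSTRAINED_REGISTERS,
        },
    )
    # fgets stops at a newline, and a NUL would shorten the string.
    for byte in password.chop(8):
        state.solver.add(byte != 0x0A, byte != 0x00)

    simgr = proj.factory.simulation_manager(state)
    simgr.explore(find=rebase(FIND_OFF, base), avoid=rebase(AVOID_OFF, base))
    if not simgr.found:
        raise RuntimeError("no path reached the flag block")
    return simgr.found[0].solver.eval(password, cast_to=bytes)


if __name__ == "__main__":
    print(solve())
```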
After running solution.py (together with utils.py), we get the password:
'<#P(J\xb9ZmT[$D5\x06X` hbAd\x880(`.+?@ACj'