Unicorn Engine tutorial
In this tutorial you will learn how to use Unicorn Engine by solving practical exercises.
There are 4 exercises, and I will solve the first one for you.
For the others I provide hints and solutions, which you can reveal by clicking the spoiler buttons.
Fast FAQ:
- What is Unicorn Engine?
It is an emulator, though not a typical one: you don’t emulate a whole program or system, and syscalls are not supported. You have to map memory and write data into it manually; then you can start the emulation from a chosen address.
- When is it useful?
- Calling an interesting function from a piece of malware without creating a harmful process
- CTFs
- Fuzzing
- A gdb plugin that predicts the future, for example upcoming jumps
- Emulating obfuscated code
- What do I need to have installed for this tutorial?
- Unicorn Engine with the Python bindings
- A disassembler
Task 1
This is a task from hxp CTF 2017 called Fibonacci.
The binary can be downloaded here.
When we run this program, we notice that it computes and prints our flag, but very slowly. Each successive byte of the flag takes longer to compute than the previous one.
The flag is: hxp{F
This means we have to optimize the program to get the flag in a reasonable amount of time.
We’ve decompiled the code to C-like pseudocode with the help of IDA Pro. Although the code is not necessarily decompiled correctly, it gives us some idea of what is happening.
Here is the assembly code of the main function:
The assembly code of the fibonacci function looks as follows:
There are many possible ways of solving this task. For example, we can reconstruct the code in some programming language and apply optimizations there. Reconstructing code is not easy, though, and we can easily introduce bugs along the way. Staring at code to spot a mistake is no fun at all. By solving this task with Unicorn Engine we can skip the reconstruction and avoid the problems mentioned above. We could also avoid rewriting the code in a few other ways, for example by scripting gdb or using Frida.
Before applying optimizations, we will first emulate the unmodified program in Unicorn Engine. Once that works, we will optimize it.
Part 1: Let’s emulate the program.
Let’s create a file named fibonacci.py and put the binary in the same folder.
Add the following code to the file:
from unicorn import *
from unicorn.x86_const import *
The first line loads the main Unicorn library and the basic Unicorn constants. The second loads constants specific to the x86 and x86-64 architectures.
Next, add the following lines:
Here we only add some common helper functions that will be useful later.
read simply returns the contents of the whole file.
u32 takes a 4-byte string and converts it to the integer that this data represents in little-endian order.
p32 does the opposite: it takes a number and converts it to the 4-byte little-endian string that represents it.
If you have pwntools installed, you don’t need to write these functions.
Just do from pwn import *.
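A minimal sketch of the three helpers described above, assuming Python 3 and the standard struct module. The bodies are one plausible implementation of the stated behaviour, not necessarily the author's exact code.

```python
import struct

def read(name):
    """Return the whole contents of a file as bytes."""
    with open(name, "rb") as f:
        return f.read()

def u32(data):
    """Interpret 4 bytes as an unsigned little-endian integer."""
    return struct.unpack("<I", data)[0]

def p32(num):
    """Encode an unsigned integer as 4 little-endian bytes."""
    return struct.pack("<I", num)

print(hex(u32(b"\x78\x56\x34\x12")))  # 0x12345678
print(p32(0x12345678))                # b'xV4\x12'
```

struct.unpack also accepts the bytearray objects that mem_read returns, so u32 can be applied to emulator memory directly.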
Let’s initialize our Unicorn Engine instance for the x86-64 architecture:
mu = Uc(UC_ARCH_X86, UC_MODE_64)
We need to call Uc with the following arguments:
- first: the main architecture branch. The constant starts with UC_ARCH_
- second: further architecture specification. The constant starts with UC_MODE_
You can find a full list of architecture constants in the Cheatsheet.
As I wrote previously, to use Unicorn Engine we need to initialize virtual memory manually. For this binary we need to write the code somewhere and also allocate a stack.
The base of the binary is 0x400000.
Let’s say our stack will start at address 0x0 and have size 1024*1024.
We probably don’t need that much space, but it won’t hurt.
We can map our memory by calling the mem_map method.
Add the following lines:
Now we need to load the binary at our base address, as a loader does. Then we need to set RSP to point at the end of our stack.
We can now start the emulation and run our code, but we need to know the start address and where the emulator should stop.
We can start emulating the code at address 0x00000000004004E0, which is the first address of main.
The end can be 0x0000000000400575. This is the putc("\n") call that is made after our whole flag is printed out. Look:
We can begin our emulation:
mu.emu_start(0x00000000004004E0, 0x0000000000400575)
Now, we can run this script:
a@x:~/Desktop/unicorn_engine_lessons$ python solve.py
Traceback (most recent call last):
  File "solve.py", line 32, in <module>
    mu.emu_start(0x00000000004004E0, 0x0000000000400575)
  File "/usr/local/lib/python2.7/dist-packages/unicorn/unicorn.py", line 288, in emu_start
    raise UcError(status)
unicorn.unicorn.UcError: Invalid memory read (UC_ERR_READ_UNMAPPED)
Oops, something is wrong and we don’t know what. Right before mu.emu_start we can add:
def hook_code(mu, address, size, user_data):
    print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size))

mu.hook_add(UC_HOOK_CODE, hook_code)
This code adds a hook. We define our own function hook_code that is called before the emulation of each instruction.
It takes the following arguments:
- our Uc instance
- address of the instruction
- size of the instruction
- user data (we can pass this value in the optional argument of hook_add())
At this point, our script should look like solve1.py
When we run it, we can see:
This means that our script fails while executing the following instruction:
This instruction reads memory from address 0x601038 (you can see it in IDA Pro).
This is the .bss section, and we have not mapped it.
My solution for this problem is just to skip all instructions that are problematic.
Below there is an instruction:
We can’t call any glibc function because we don’t have glibc loaded into virtual memory. We don’t need to call this function anyway, so we can skip it as well.
This is a full list of instructions to skip:
We can skip an instruction by writing the address of the next instruction to the RIP register:
hook_code should now look like this:
We also have to do something about the instructions that print out the flag byte by byte.
__IO_putc takes the byte to print as its first argument (that is, register RDI).
We can read the value from register RDI, print it out, and skip emulating this instruction.
At this point the hook_code function should look like this:
At this point the whole code should look like solve2.py
We can run it and see that it works, although still slowly.
Part 2: Improve the speed!
Let’s think about speed improvements. Why is this program so slow?
Looking at the decompiled code, we can see that main() calls fibonacci() several times and that fibonacci() is a recursive function.
Taking a look at this function, we can see that it takes 2 arguments and returns 2 values.
The first return value is passed via the RAX register, the second via a reference passed as the second argument.
Taking a deeper look at both main() and fibonacci(), we can notice that the second argument (the value behind the reference) can only be 0 or 1.
If we don’t see it, we can run gdb and set a breakpoint at the beginning of the fibonacci function.
To optimize this function we can use dynamic programming to remember the return values for given arguments.
Since the second argument takes only 2 values, it is enough to remember only 2*MAX_OF_FIRST_ARGUMENT pairs.
When RIP points to the beginning of the fibonacci function, we can obtain the function arguments.
We know the function’s return values when exiting the function.
Since we don’t know both at once, we need a stack that lets us recover the arguments when exiting: at the fibonacci entry we push the arguments onto the stack, and at the end we pop them.
To remember the pairs we can use a dictionary.
How to hold the values of pairs?
- At the beginning of the function we can check whether the return values for these arguments are memorized in the dictionary.
  - If they are, we can return this pair. We just write the return values to the reference and to RAX. We also set RIP to the address of some RET instruction to exit the function. We cannot jump to the RET in the fibonacci function because this instruction is hooked. That’s why we jump to a RET in main.
  - If they are not present in the dictionary, we add the arguments to the stack.
- While exiting the function, we can save the return values. We know the arguments and the reference pointer, because we can read them from our stack structure (here called “stack”).
The code is shown below:
Just in case, you can download the full script here: solve3.py
Hurrah! We’ve managed to successfully optimize the program using Unicorn Engine. Good job.
Some notes
Now, I encourage you to do a little homework. Below you can find 3 tasks; each one has a hint and a solution available (by clicking the spoiler button). You can look at the Cheatsheet while solving the tasks.
I think one of the difficulties is knowing the name of the constant you need.
The best way to deal with it is to use IPython tab completion.
When you have IPython installed, you can type from unicorn import UC_ARCH_ and press TAB: all constants starting with this prefix will be printed out.
Task 2
Analyze the following shellcode:
As you can see, the assembly is obfuscated (the disasm command is part of pwntools):
a@x:~/Desktop/unicorn_engine_lessons$ disasm e8ffffffffc05d6a055b29dd83c54e89e96a02030c245b31d266ba12008b39c1e710c1ef1081e9feffffff8b4500c1e010c1e81089c309fb21f8f7d021d86689450083c5024a85d20f85cfffffffec37755d7a0528ed24ed24ed0b887feb509838f95c962b9670fec6ffc6ff9f321f581e00d380
0: e8 ff ff ff ff call 0x4
5: c0 5d 6a 05 rcr BYTE PTR [ebp+0x6a], 0x5
9: 5b pop ebx
a: 29 dd sub ebp, ebx
c: 83 c5 4e add ebp, 0x4e
f: 89 e9 mov ecx, ebp
11: 6a 02 push 0x2
13: 03 0c 24 add ecx, DWORD PTR [esp]
16: 5b pop ebx
17: 31 d2 xor edx, edx
19: 66 ba 12 00 mov dx, 0x12
1d: 8b 39 mov edi, DWORD PTR [ecx]
1f: c1 e7 10 shl edi, 0x10
22: c1 ef 10 shr edi, 0x10
25: 81 e9 fe ff ff ff sub ecx, 0xfffffffe
2b: 8b 45 00 mov eax, DWORD PTR [ebp+0x0]
2e: c1 e0 10 shl eax, 0x10
31: c1 e8 10 shr eax, 0x10
34: 89 c3 mov ebx, eax
36: 09 fb or ebx, edi
38: 21 f8 and eax, edi
3a: f7 d0 not eax
3c: 21 d8 and eax, ebx
3e: 66 89 45 00 mov WORD PTR [ebp+0x0], ax
42: 83 c5 02 add ebp, 0x2
45: 4a dec edx
46: 85 d2 test edx, edx
48: 0f 85 cf ff ff ff jne 0x1d
4e: ec in al, dx
4f: 37 aaa
50: 75 5d jne 0xaf
52: 7a 05 jp 0x59
54: 28 ed sub ch, ch
56: 24 ed and al, 0xed
58: 24 ed and al, 0xed
5a: 0b 88 7f eb 50 98 or ecx, DWORD PTR [eax-0x67af1481]
60: 38 f9 cmp cl, bh
62: 5c pop esp
63: 96 xchg esi, eax
64: 2b 96 70 fe c6 ff sub edx, DWORD PTR [esi-0x390190]
6a: c6 (bad)
6b: ff 9f 32 1f 58 1e call FWORD PTR [edi+0x1e581f32]
71: 00 d3 add bl, dl
73: 80 .byte 0x80
Note that the architecture is x86-32 now.
A list of syscall numbers can be found here.
Hint:
Solution:
Task 3
Download this binary.
It was compiled with the following command: gcc function.c -m32 -o function
The code of this binary is presented below:
The task is to call super_function in such a way that it returns 1.
The assembly code is:
Hint:
Solution:
Task 4
This task is similar to the first one. The difference is that the architecture is not x86 anymore.
It is ARM32 little-endian.
You can download the binary here.
The ARM calling convention will help you.
Right answer:
Hint:
Solution:
CheatSheet
from unicorn import *
- Loads the main Unicorn library. It contains functions and basic constants.
from unicorn.x86_const import *
- Loads constants specific to the x86 and x86-64 architectures.
All constants in the unicorn module:
UC_API_MAJOR UC_ERR_VERSION UC_MEM_READ UC_PROT_ALL
UC_API_MINOR UC_ERR_WRITE_PROT UC_MEM_READ_AFTER UC_PROT_EXEC
UC_ARCH_ARM UC_ERR_WRITE_UNALIGNED UC_MEM_READ_PROT UC_PROT_NONE
UC_ARCH_ARM64 UC_ERR_WRITE_UNMAPPED UC_MEM_READ_UNMAPPED UC_PROT_READ
UC_ARCH_M68K UC_HOOK_BLOCK UC_MEM_WRITE UC_PROT_WRITE
UC_ARCH_MAX UC_HOOK_CODE UC_MEM_WRITE_PROT UC_QUERY_MODE
UC_ARCH_MIPS UC_HOOK_INSN UC_MEM_WRITE_UNMAPPED UC_QUERY_PAGE_SIZE
UC_ARCH_PPC UC_HOOK_INTR UC_MILISECOND_SCALE UC_SECOND_SCALE
UC_ARCH_SPARC UC_HOOK_MEM_FETCH UC_MODE_16 UC_VERSION_EXTRA
UC_ARCH_X86 UC_HOOK_MEM_FETCH_INVALID UC_MODE_32 UC_VERSION_MAJOR
UC_ERR_ARCH UC_HOOK_MEM_FETCH_PROT UC_MODE_64 UC_VERSION_MINOR
UC_ERR_ARG UC_HOOK_MEM_FETCH_UNMAPPED UC_MODE_ARM Uc
UC_ERR_EXCEPTION UC_HOOK_MEM_INVALID UC_MODE_BIG_ENDIAN UcError
UC_ERR_FETCH_PROT UC_HOOK_MEM_PROT UC_MODE_LITTLE_ENDIAN arm64_const
UC_ERR_FETCH_UNALIGNED UC_HOOK_MEM_READ UC_MODE_MCLASS arm_const
UC_ERR_FETCH_UNMAPPED UC_HOOK_MEM_READ_AFTER UC_MODE_MICRO debug
UC_ERR_HANDLE UC_HOOK_MEM_READ_INVALID UC_MODE_MIPS3 m68k_const
UC_ERR_HOOK UC_HOOK_MEM_READ_PROT UC_MODE_MIPS32 mips_const
UC_ERR_HOOK_EXIST UC_HOOK_MEM_READ_UNMAPPED UC_MODE_MIPS32R6 sparc_const
UC_ERR_INSN_INVALID UC_HOOK_MEM_UNMAPPED UC_MODE_MIPS64 uc_arch_supported
UC_ERR_MAP UC_HOOK_MEM_VALID UC_MODE_PPC32 uc_version
UC_ERR_MODE UC_HOOK_MEM_WRITE UC_MODE_PPC64 unicorn
UC_ERR_NOMEM UC_HOOK_MEM_WRITE_INVALID UC_MODE_QPX unicorn_const
UC_ERR_OK UC_HOOK_MEM_WRITE_PROT UC_MODE_SPARC32 version_bind
UC_ERR_READ_PROT UC_HOOK_MEM_WRITE_UNMAPPED UC_MODE_SPARC64 x86_const
UC_ERR_READ_UNALIGNED UC_MEM_FETCH UC_MODE_THUMB
UC_ERR_READ_UNMAPPED UC_MEM_FETCH_PROT UC_MODE_V8
UC_ERR_RESOURCE UC_MEM_FETCH_UNMAPPED UC_MODE_V9
A few examples of constants from unicorn.x86_const:
UC_X86_REG_EAX
UC_X86_REG_RIP
UC_X86_REG_RAX
mu = Uc(arch, mode)
- get an instance of the Uc class. Here you specify the architecture.
Examples:
- mu = Uc(UC_ARCH_X86, UC_MODE_64) - get a Uc instance for the x86-64 architecture
- mu = Uc(UC_ARCH_X86, UC_MODE_32) - get a Uc instance for the x86-32 architecture
mu.mem_map(ADDRESS, 4096)
- map a memory region.
mu.mem_write(ADDRESS, DATA)
- write data to memory.
tmp = mu.mem_read(ADDRESS, SIZE)
- read data from memory.
mu.reg_write(UC_X86_REG_ECX, 0x0)
- set a register to a new value.
r_esp = mu.reg_read(UC_X86_REG_ESP)
- read a value from a register.
mu.emu_start(ADDRESS_START, ADDRESS_END)
- start emulation.
instruction tracking:
def hook_code(mu, address, size, user_data):
    print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size))

mu.hook_add(UC_HOOK_CODE, hook_code)
This code adds a hook. We define our own function hook_code that is called before the emulation of each instruction.
It takes the following arguments:
- our Uc instance
- address of the instruction
- size of the instruction
- user data (we can pass this value in the optional argument of hook_add())