Unicorn Engine tutorial

In this tutorial you will learn how to use Unicorn Engine by solving practical exercises. There are 4 exercises and I will solve the first exercise for you. For the others I am providing hints and solutions, which you can obtain by clicking the spoiler buttons.

Fast FAQ:

What is Unicorn Engine?

It is a CPU emulator, but not a typical one: it does not emulate a whole program or system, and it does not support syscalls. You have to map memory and write data into it manually, and then you can start the emulation from a chosen address.

When is it useful?
- You can call an interesting function from malware without creating a harmful process
- CTFs
- Fuzzing
- A gdb plugin that predicts the future, for example upcoming jumps.
- Emulating obfuscated code.

What do I need to have installed for this tutorial?
- Unicorn Engine with Python bindings
- A disassembler

Task 1
Some notes
Task 2
Task 3
Task 4
Cheatsheet
References

Task 1

This is a task from hxp CTF 2017 called Fibonacci.
The binary can be downloaded here.

When we run this program, we notice that it computes and prints the flag, but very slowly: each successive byte of the flag takes longer to compute than the previous one.

The flag is: hxp{F

This means we have to optimize the program to get the flag in a reasonable amount of time.

We’ve decompiled the code to C-like pseudocode with the help of IDA Pro. Although the decompilation is not necessarily accurate, it gives us some idea of what is happening.

__int64 __fastcall main(__int64 a1, char **a2, char **a3)
{
  void *v3; // rbp@1
  int v4; // ebx@1
  signed __int64 v5; // r8@2
  char v6; // r9@3
  __int64 v7; // r8@3
  char v8; // cl@3
  __int64 v9; // r9@5
  int a2a; // [sp+Ch] [bp-1Ch]@3

  v3 = &encrypted_flag;
  v4 = 0;
  setbuf(stdout, 0LL);
  printf("The flag is: ", 0LL);
  while ( 1 )
  {
    LODWORD(v5) = 0;
    do
    {
      a2a = 0;
      fibonacci(v4 + v5, &a2a);
      v8 = v7;
      v5 = v7 + 1;
    }
    while ( v5 != 8 );
    v4 += 8;
    if ( (unsigned __int8)(a2a << v8) == v6 )
      break;
    v3 = (char *)v3 + 1;
    _IO_putc((char)(v6 ^ ((_BYTE)a2a << v8)), stdout);
    v9 = *((char *)v3 - 1);
  }
  _IO_putc(10, stdout);
  return 0LL;
}

unsigned int __fastcall fibonacci(int i, _DWORD *a2)
{
  _DWORD *v2; // rbp@1
  unsigned int v3; // er12@3
  unsigned int result; // eax@3
  unsigned int v5; // edx@3
  unsigned int v6; // esi@3
  unsigned int v7; // edx@4

  v2 = a2;
  if ( i )
  {
    if ( i == 1 )
    {
      result = fibonacci(0, a2);
      v5 = result - ((result >> 1) & 0x55555555);
      v6 = ((result - ((result >> 1) & 0x55555555)) >> 2) & 0x33333333;
    }
    else
    {
      v3 = fibonacci(i - 2, a2);
      result = v3 + fibonacci(i - 1, a2);
      v5 = result - ((result >> 1) & 0x55555555);
      v6 = ((result - ((result >> 1) & 0x55555555)) >> 2) & 0x33333333;
    }
    v7 = v6 + (v5 & 0x33333333) + ((v6 + (v5 & 0x33333333)) >> 4);
    *v2 ^= ((BYTE1(v7) & 0xF) + (v7 & 0xF) + (unsigned __int8)((((v7 >> 8) & 0xF0F0F) + (v7 & 0xF0F0F0F)) >> 16)) & 1;
  }
  else
  {
    *a2 ^= 1u;
    result = 1;
  }
  return result;
}
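The long expression at the end of fibonacci is easier to read once you notice that its masks (0x55555555, 0x33333333, 0x0F0F0F0F) are the ones used by the classic bit-counting trick. The sketch below is my own reading of the listing, not code from the binary: it repeats the same steps in Python and compares the final `& 1` with the parity of the number of set bits. The name parity_of_bits is made up for illustration.

```python
def parity_of_bits(result):
    """Mirror the masking steps of fibonacci for a 32-bit value."""
    result &= 0xFFFFFFFF
    v5 = result - ((result >> 1) & 0x55555555)           # 2-bit partial counts
    v6 = (v5 >> 2) & 0x33333333
    v7 = v6 + (v5 & 0x33333333)                          # 4-bit partial counts
    v7 = v7 + (v7 >> 4)                                  # low nibble of each byte = byte count
    total = ((v7 >> 8) & 0x0F0F0F) + (v7 & 0x0F0F0F0F)   # add neighbouring byte counts
    total = total + (total >> 16)                        # lowest byte = count over all 32 bits
    return total & 1                                     # keep only the parity


# Compare against a straightforward bit count for a few sample values
for x in (0, 1, 3, 0xFF, 0x80000000, 0xDEADBEEF, 0xFFFFFFFF):
    assert parity_of_bits(x) == bin(x).count("1") % 2
```

If this reading is right, each call XORs the parity of the bit count of result into the value behind the second argument.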

Here is the assembly code of the main function:

.text:0x4004E0 main            proc near               ; DATA XREF: start+1Do
.text:0x4004E0
.text:0x4004E0 var_1C          = dword ptr -1Ch
.text:0x4004E0
.text:0x4004E0                 push    rbp
.text:0x4004E1                 push    rbx
.text:0x4004E2                 xor     esi, esi        ; buf
.text:0x4004E4                 mov     ebp, offset unk_4007E1
.text:0x4004E9                 xor     ebx, ebx
.text:0x4004EB                 sub     rsp, 18h
.text:0x4004EF                 mov     rdi, cs:stdout  ; stream
.text:0x4004F6                 call    _setbuf
.text:0x4004FB                 mov     edi, offset format ; "The flag is: "
.text:0x400500                 xor     eax, eax
.text:0x400502                 call    _printf
.text:0x400507                 mov     r9d, 49h
.text:0x40050D                 nop     dword ptr [rax]
.text:0x400510
.text:0x400510 loc_400510:                             ; CODE XREF: main+8Aj
.text:0x400510                 xor     r8d, r8d
.text:0x400513                 jmp     short loc_40051B
.text:0x400513 ; ---------------------------------------------------------------------------
.text:0x400515                 align 8
.text:0x400518
.text:0x400518 loc_400518:                             ; CODE XREF: main+67j
.text:0x400518                 mov     r9d, edi
.text:0x40051B
.text:0x40051B loc_40051B:                             ; CODE XREF: main+33j
.text:0x40051B                 lea     edi, [rbx+r8]
.text:0x40051F                 lea     rsi, [rsp+28h+var_1C]
.text:0x400524                 mov     [rsp+28h+var_1C], 0
.text:0x40052C                 call    fibonacci
.text:0x400531                 mov     edi, [rsp+28h+var_1C]
.text:0x400535                 mov     ecx, r8d
.text:0x400538                 add     r8, 1
.text:0x40053C                 shl     edi, cl
.text:0x40053E                 mov     eax, edi
.text:0x400540                 xor     edi, r9d
.text:0x400543                 cmp     r8, 8
.text:0x400547                 jnz     short loc_400518
.text:0x400549                 add     ebx, 8
.text:0x40054C                 cmp     al, r9b
.text:0x40054F                 mov     rsi, cs:stdout  ; fp
.text:0x400556                 jz      short loc_400570
.text:0x400558                 movsx   edi, dil        ; c
.text:0x40055C                 add     rbp, 1
.text:0x400560                 call    __IO_putc
.text:0x400565                 movzx   r9d, byte ptr [rbp-1]
.text:0x40056A                 jmp     short loc_400510
.text:0x40056A ; ---------------------------------------------------------------------------
.text:0x40056C                 align 10h
.text:0x400570
.text:0x400570 loc_400570:                             ; CODE XREF: main+76j
.text:0x400570                 mov     edi, 0Ah        ; c
.text:0x400575                 call    __IO_putc
.text:0x40057A                 add     rsp, 18h
.text:0x40057E                 xor     eax, eax
.text:0x400580                 pop     rbx
.text:0x400581                 pop     rbp
.text:0x400582                 retn
.text:0x400582 main            endp

The assembly code of the fibonacci function looks as follows:

.text:0x400670 fibonacci       proc near               ; CODE XREF: main+4Cp
.text:0x400670                                         ; fibonacci+19p ...
.text:0x400670                 test    edi, edi
.text:0x400672                 push    r12
.text:0x400674                 push    rbp
.text:0x400675                 mov     rbp, rsi
.text:0x400678                 push    rbx
.text:0x400679                 jz      short loc_4006F8
.text:0x40067B                 cmp     edi, 1
.text:0x40067E                 mov     ebx, edi
.text:0x400680                 jz      loc_400710
.text:0x400686                 lea     edi, [rdi-2]
.text:0x400689                 call    fibonacci
.text:0x40068E                 lea     edi, [rbx-1]
.text:0x400691                 mov     r12d, eax
.text:0x400694                 mov     rsi, rbp
.text:0x400697                 call    fibonacci
.text:0x40069C                 add     eax, r12d
.text:0x40069F                 mov     edx, eax
.text:0x4006A1                 mov     ebx, eax
.text:0x4006A3                 shr     edx, 1
.text:0x4006A5                 and     edx, 55555555h
.text:0x4006AB                 sub     ebx, edx
.text:0x4006AD                 mov     ecx, ebx
.text:0x4006AF                 mov     edx, ebx
.text:0x4006B1                 shr     ecx, 2
.text:0x4006B4                 and     ecx, 33333333h
.text:0x4006BA                 mov     esi, ecx
.text:0x4006BC
.text:0x4006BC loc_4006BC:                             ; CODE XREF: fibonacci+C2j
.text:0x4006BC                 and     edx, 33333333h
.text:0x4006C2                 lea     ecx, [rsi+rdx]
.text:0x4006C5                 mov     edx, ecx
.text:0x4006C7                 shr     edx, 4
.text:0x4006CA                 add     edx, ecx
.text:0x4006CC                 mov     esi, edx
.text:0x4006CE                 and     edx, 0F0F0F0Fh
.text:0x4006D4                 shr     esi, 8
.text:0x4006D7                 and     esi, 0F0F0Fh
.text:0x4006DD                 lea     ecx, [rsi+rdx]
.text:0x4006E0                 mov     edx, ecx
.text:0x4006E2                 shr     edx, 10h
.text:0x4006E5                 add     edx, ecx
.text:0x4006E7                 and     edx, 1
.text:0x4006EA                 xor     [rbp+0], edx
.text:0x4006ED                 pop     rbx
.text:0x4006EE                 pop     rbp
.text:0x4006EF                 pop     r12
.text:0x4006F1                 retn
.text:0x4006F1 ; ---------------------------------------------------------------------------
.text:0x4006F2                 align 8
.text:0x4006F8
.text:0x4006F8 loc_4006F8:                             ; CODE XREF: fibonacci+9j
.text:0x4006F8                 mov     edx, 1
.text:0x4006FD                 xor     [rbp+0], edx
.text:0x400700                 mov     eax, 1
.text:0x400705                 pop     rbx
.text:0x400706                 pop     rbp
.text:0x400707                 pop     r12
.text:0x400709                 retn
.text:0x400709 ; ---------------------------------------------------------------------------
.text:0x40070A                 align 10h
.text:0x400710
.text:0x400710 loc_400710:                             ; CODE XREF: fibonacci+10j
.text:0x400710                 xor     edi, edi
.text:0x400712                 call    fibonacci
.text:0x400717                 mov     edx, eax
.text:0x400719                 mov     edi, eax
.text:0x40071B                 shr     edx, 1
.text:0x40071D                 and     edx, 55555555h
.text:0x400723                 sub     edi, edx
.text:0x400725                 mov     esi, edi
.text:0x400727                 mov     edx, edi
.text:0x400729                 shr     esi, 2
.text:0x40072C                 and     esi, 33333333h
.text:0x400732                 jmp     short loc_4006BC
.text:0x400732 fibonacci       endp

There are many possible ways of solving this task. For example, we could reconstruct the code in some programming language and apply optimizations there. Reconstructing code is not easy, though, and we could easily introduce bugs; staring at code to spot a mistake is no fun at all. By solving this task with Unicorn Engine we can skip the reconstruction and avoid such problems. There are a few other ways to avoid rewriting the code, for example scripting gdb or using Frida.

Before applying optimizations, we will first emulate the unmodified program in Unicorn Engine. Once that works, we will optimize it.

Part 1: Let’s emulate the program.

Let’s create a file named solve.py and put the binary in the same folder.

Add the following code to the file:

from unicorn import *
from unicorn.x86_const import *

The first line loads the main Unicorn library and its basic constants. The second loads constants specific to the x86 and x86-64 architectures.

Next, add the following lines:

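import struct

def read(name):
    # Binary mode returns the file contents unmodified
    with open(name, "rb") as f:
        return f.read()

def u32(data):
    # "<I" = little-endian unsigned 32-bit integer
    return struct.unpack("<I", data)[0]

def p32(num):
    return struct.pack("<I", num)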

Here, we only added some common helper functions that will be useful later.

read returns the contents of the whole file.
u32 takes a 4-byte string and converts it to an integer, interpreting the data as little endian.
p32 does the opposite: it takes a number and converts it to its 4-byte little-endian string representation.
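As a quick sanity check, here is a small round trip through the two helpers. This is only a sketch: I use the explicit "<I" format here so the byte order does not depend on the machine running the script, and the value 0x400670 is just the address of fibonacci used as sample data.

```python
import struct

def u32(data):
    # 4 bytes, little endian -> integer
    return struct.unpack("<I", data)[0]

def p32(num):
    # integer -> 4 bytes, little endian
    return struct.pack("<I", num)

packed = p32(0x400670)
print(repr(packed))        # least significant byte comes first
print(hex(u32(packed)))    # and back to the original number
```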

If you have pwntools installed, you don’t need to create these functions. Just do from pwn import *.

Let’s initialize our Unicorn Engine class for architecture x86-64:

mu = Uc (UC_ARCH_X86, UC_MODE_64)

The Uc constructor takes the following arguments:

first - main architecture branch. The constant starts with UC_ARCH_
second - further architecture specification. The constant starts with UC_MODE_

You can find a full list of architecture constants in the Cheatsheet.

As I wrote previously, to use Unicorn Engine we need to initialize virtual memory manually. For this binary we need to write code somewhere and also allocate a stack.

The base of the binary is 0x400000. Let’s say our stack will start at address 0x0 and have size 1024*1024. Probably we don’t need so much space, but it won’t hurt us.

We can map our memory by calling the mem_map method.

Add the following lines:

BASE = 0x400000
STACK_ADDR = 0x0
STACK_SIZE = 1024*1024

mu.mem_map(BASE, 1024*1024)
mu.mem_map(STACK_ADDR, STACK_SIZE)
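One thing to keep in mind when picking addresses: as far as I know, mem_map expects both the address and the size to be multiples of 4 KB and reports an error otherwise. The values above already satisfy this. The helpers below are a small sketch (align_down and align_up are names I made up) for rounding other values to page boundaries.

```python
PAGE_SIZE = 0x1000  # 4 KB, the granularity assumed for mem_map

def align_down(value, alignment=PAGE_SIZE):
    # Round down to the start of the page containing value
    return value & ~(alignment - 1)

def align_up(value, alignment=PAGE_SIZE):
    # Round up to the next page boundary (unchanged if already aligned)
    return (value + alignment - 1) & ~(alignment - 1)

BASE = 0x400000
STACK_ADDR = 0x0
STACK_SIZE = 1024 * 1024

# The regions used in this tutorial are already page aligned
for addr, size in ((BASE, 1024 * 1024), (STACK_ADDR, STACK_SIZE)):
    assert addr == align_down(addr)
    assert size == align_up(size)

print(hex(align_up(0x1234)))
```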

Now, we need to load the binary at our base address, as a loader would. Then we need to set RSP to point at the end of our stack.

mu.mem_write(BASE, read("./fibonacci"))
mu.reg_write(UC_X86_REG_RSP, STACK_ADDR + STACK_SIZE - 1)

We can now start the emulation, but first we need to decide on the start address and the address where the emulator should stop.

We can start emulating at address 0x00000000004004E0, which is the first address of main. The end can be 0x0000000000400575. This is the putc("\n") call that is executed after the whole flag has been printed. Look:

.text:0x400570                 mov     edi, 0Ah        ; c
.text:0x400575                 call    __IO_putc

We can begin our simulation:

mu.emu_start(0x00000000004004E0, 0x0000000000400575)

Now, we can run this script:

a@x:~/Desktop/unicorn_engine_lessons$ python solve.py 
Traceback (most recent call last):
  File "solve.py", line 32, in <module>
    mu.emu_start(0x00000000004004E0, 0x0000000000400575)
  File "/usr/local/lib/python2.7/dist-packages/unicorn/unicorn.py", line 288, in emu_start
    raise UcError(status)
unicorn.unicorn.UcError: Invalid memory read (UC_ERR_READ_UNMAPPED)

Oooops, something is wrong and we don’t know what. Right before mu.emu_start we can add:

def hook_code(mu, address, size, user_data):  
    print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size)) 

mu.hook_add(UC_HOOK_CODE, hook_code)

This code adds a hook. We define our own function hook_code that is called before the emulation of each instruction. It takes the following arguments:

our Uc instance
address of the instruction
size of the instruction
user data (we can pass this value in optional argument of hook_add())

At this point, our script should look like solve1.py

When we run it, we can see:

a@x:~/Desktop/unicorn_engine_lessons$ python solve.py 
>>> Tracing instruction at 0x4004e0, instruction size = 0x1
>>> Tracing instruction at 0x4004e1, instruction size = 0x1
>>> Tracing instruction at 0x4004e2, instruction size = 0x2
>>> Tracing instruction at 0x4004e4, instruction size = 0x5
>>> Tracing instruction at 0x4004e9, instruction size = 0x2
>>> Tracing instruction at 0x4004eb, instruction size = 0x4
>>> Tracing instruction at 0x4004ef, instruction size = 0x7
Traceback (most recent call last):
  File "solve.py", line 41, in <module>
    mu.emu_start(0x00000000004004E0, 0x0000000000400575)
  File "/usr/local/lib/python2.7/dist-packages/unicorn/unicorn.py", line 288, in emu_start
    raise UcError(status)
unicorn.unicorn.UcError: Invalid memory read (UC_ERR_READ_UNMAPPED)

This means that our script fails while executing the following instruction:

.text:0x4004EF                 mov     rdi, cs:stdout  ; stream

This instruction reads memory from address 0x601038 (you can see it in IDA Pro). This address is in the .bss section, which we have not mapped. My solution to this problem is simply to skip all the problematic instructions.
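To see why this read fails, we can compare the address with the two regions mapped earlier. The sketch below does not touch Unicorn at all. It only repeats the arithmetic, and is_mapped is a hypothetical helper written for this illustration.

```python
MAPPED_REGIONS = [
    (0x400000, 1024 * 1024),   # the binary, mapped at BASE
    (0x0, 1024 * 1024),        # the stack
]

def is_mapped(address, size=1, regions=MAPPED_REGIONS):
    """Return True if [address, address + size) lies inside one mapped region."""
    return any(start <= address and address + size <= start + length
               for start, length in regions)

print(is_mapped(0x4004EF, 7))   # the instruction itself is inside the mapped binary
print(is_mapped(0x601038, 8))   # the stdout pointer it reads is outside both regions
```

The binary region ends at 0x500000, so 0x601038 falls into unmapped memory, which matches the UC_ERR_READ_UNMAPPED error.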

The next instruction is:

.text:0x4004F6                 call    _setbuf

We can’t call any glibc function because we don’t have glibc loaded to virtual memory. We don’t need to call this function anyway so we can also skip it.

This is a full list of instructions to skip:

.text:0x4004EF                 mov     rdi, cs:stdout  ; stream
.text:0x4004F6                 call    _setbuf
.text:0x400502                 call    _printf
.text:0x40054F                 mov     rsi, cs:stdout  ; fp

We can skip an instruction by writing the address of the next instruction to the RIP register:

mu.reg_write(UC_X86_REG_RIP, address+size)

hook_code should now look like this:

instructions_skip_list = [0x00000000004004EF, 0x00000000004004F6, 0x0000000000400502, 0x000000000040054F]

def hook_code(mu, address, size, user_data):  
    print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size))
    
    if address in instructions_skip_list:
        mu.reg_write(UC_X86_REG_RIP, address+size)
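The address+size arithmetic can be checked by hand against the listing. In the sketch below the sizes are taken from the trace output and from the distance to the next address in the IDA listing; the dictionary SKIPPED and the function next_rip are only illustration names.

```python
# address -> instruction size in bytes
SKIPPED = {
    0x4004EF: 7,   # mov rdi, cs:stdout
    0x4004F6: 5,   # call _setbuf
    0x400502: 5,   # call _printf
    0x40054F: 7,   # mov rsi, cs:stdout
}

def next_rip(address, size):
    # Address of the instruction that follows the skipped one
    return address + size

for address, size in sorted(SKIPPED.items()):
    print("0x%x -> 0x%x" % (address, next_rip(address, size)))
```

Each result should be the address of the following line in the main listing, for example 0x4004F6 after the first mov.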

We also have to do something with the instructions that print out the flag byte-by-byte.

.text:0x400558                 movsx   edi, dil        ; c
.text:0x40055C                 add     rbp, 1
.text:0x400560                 call    __IO_putc

__IO_putc takes a byte to print out in the first argument (that is register RDI).

We can read a value from register RDI, print it out and skip emulating this instruction. The hook_code function at this point should look like below:

instructions_skip_list = [0x00000000004004EF, 0x00000000004004F6, 0x0000000000400502, 0x000000000040054F]

def hook_code(mu, address, size, user_data):  
    #print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size))
    
    if address in instructions_skip_list:
        mu.reg_write(UC_X86_REG_RIP, address+size)
    
    elif address == 0x400560: #that instruction writes a byte of the flag
        c = mu.reg_read(UC_X86_REG_RDI)
        print(chr(c))
        mu.reg_write(UC_X86_REG_RIP, address+size)

At this point the whole code should look like solve2.py

We can run it and see that it works, although still slowly.

a@x:~/Desktop/unicorn_engine_lessons$ python solve.py 
h
x
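Each character ends up on its own line because print appends a newline. If you prefer the flag on a single line, one option is to write the characters directly to stdout, as in the sketch below. The function emit_flag_byte and the list flag_chars are names I made up, and masking with 0xFF is my own precaution so that only the low byte of RDI is used.

```python
import sys

flag_chars = []   # collected characters, in case we want the whole flag later

def emit_flag_byte(rdi_value, out=sys.stdout):
    """Record and print the low byte of RDI without a trailing newline."""
    ch = chr(rdi_value & 0xFF)
    flag_chars.append(ch)
    out.write(ch)
    out.flush()
    return ch

# Stand-in for two hook invocations with RDI = 'h' and RDI = 'x'
for value in (0x68, 0x78):
    emit_flag_byte(value)
```

Inside hook_code this would replace the print(chr(c)) line, with c passed as rdi_value.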

Part 2: Improve the speed!

Let’s think about the speed improvements. Why is this program so slow?

Looking at the decompiled code, we can see that main() calls fibonacci() several times and fibonacci() is a recursive function.
Taking a look at this function, we can see that it takes 2 arguments and returns 2 values. The first return value is passed via the RAX register, the second via the reference given as the second argument. Taking a deeper look at both main() and fibonacci(), we can notice that the value behind the second argument can only be 0 or 1. If we don’t see it, we can run gdb and set a breakpoint at the beginning of the fibonacci function.

To optimize this function we can use dynamic programming to remember return values for given arguments. Since the second argument takes only 2 values, it is enough to remember only 2*MAX_OF_FIRST_ARGUMENT pairs.

When RIP points to the beginning of the fibonacci function, we can read the function arguments. The return values are only known when the function exits. Since we never know both at once, we need a stack: at the fibonacci entry we push the arguments, and at the exit we pop them. To remember the pairs we can use a dictionary.

How to hold the values of pairs?

At the beginning of the function we can check if return values are memorized in the dictionary for these arguments
- If they are, we can return this pair: we write the return values to the reference and to RAX, and set RIP to the address of a RET instruction to exit the function. We cannot jump to a RET in the fibonacci function because those instructions are hooked, so we jump to the RET in main instead.
- If they are not present in the dictionary, we push the arguments onto the stack.
While exiting the function, we can save the return values. We know the arguments and the reference pointer because we can read them from our stack structure (here called “stack”).
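Before looking at the hook itself, here is the same entry/exit bookkeeping on a toy example without any emulator. It is only a sketch: fib is a plain Fibonacci function, not the function from the binary, and on_entry and on_exit stand in for the two branches of the hook.

```python
cache = {}      # arguments -> return value
pending = []    # arguments of calls that have started but not yet returned

def on_entry(args):
    """Return a cached result, or remember the arguments and return None."""
    if args in cache:
        return cache[args]
    pending.append(args)
    return None

def on_exit(result):
    """Pair the result with the arguments of the innermost unfinished call."""
    cache[pending.pop()] = result

def fib(n):
    cached = on_entry((n,))
    if cached is not None:
        return cached                  # like jumping straight to a RET
    result = 1 if n < 2 else fib(n - 2) + fib(n - 1)
    on_exit(result)
    return result

print(fib(90))   # finishes immediately thanks to the cache
```

Calls finish in the reverse order in which they started, which is why a simple list used as a stack is enough to match each exit with its arguments.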

The code is shown below:

FIBONACCI_ENTRY = 0x0000000000400670
FIBONACCI_END = [0x00000000004006F1, 0x0000000000400709]

stack = []                                          # Stack for storing the arguments
d = {}                                              # Dictionary that holds return values for given function arguments 

def hook_code(mu, address, size, user_data):  
    #print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size))
    
    if address in instructions_skip_list:
        mu.reg_write(UC_X86_REG_RIP, address+size)
    
    elif address == 0x400560:                       # That instruction writes a byte of the flag
        c = mu.reg_read(UC_X86_REG_RDI)
        print(chr(c))
        mu.reg_write(UC_X86_REG_RIP, address+size)
    
    elif address == FIBONACCI_ENTRY:                # Are we at the beginning of fibonacci function?
        arg0 = mu.reg_read(UC_X86_REG_RDI)          # Read the first argument. It is passed via RDI
        r_rsi = mu.reg_read(UC_X86_REG_RSI)         # Read the second argument which is a reference
        arg1 = u32(mu.mem_read(r_rsi, 4))           # Read the second argument from reference
        
        if (arg0,arg1) in d:                        # Check whether return values for this function are already saved.
            (ret_rax, ret_ref) = d[(arg0,arg1)]
            mu.reg_write(UC_X86_REG_RAX, ret_rax)   # Set return value in RAX register
            mu.mem_write(r_rsi, p32(ret_ref))       # Set return value through reference
            mu.reg_write(UC_X86_REG_RIP, 0x400582)  # Set RIP to point at RET instruction. We want to return from fibonacci function
            
        else:
            stack.append((arg0,arg1,r_rsi))         # If return values are not saved for these arguments, add them to stack.
        
    elif address in FIBONACCI_END:
        (arg0, arg1, r_rsi) = stack.pop()           # We know arguments when exiting the function
        
        ret_rax = mu.reg_read(UC_X86_REG_RAX)       # Read the return value that is stored in RAX
        ret_ref = u32(mu.mem_read(r_rsi,4))         # Read the return value that is passed by reference
        d[(arg0, arg1)]=(ret_rax, ret_ref)          # Remember the return values for this argument pair

Just in case, you can download the full script here: solve3.py

Hurrah! We’ve managed to successfully optimize the program using Unicorn Engine. Good job.

Some notes

Now, I encourage you to do a little homework. Below you can find 3 tasks; each one has a hint and a solution available (by clicking the spoiler button). You can look at the Cheatsheet while solving the tasks.

I think that one of the problems is knowing the name of the constant you need. The best way to deal with this is IPython tab completion. With IPython installed, you can type from unicorn import UC_ARCH_ and press TAB - all constants starting with this prefix will be printed out.
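If you don’t have IPython at hand, a similar list can be produced with dir(). The sketch below demonstrates the idea on the standard socket module so that it runs without Unicorn installed; I would expect the same call with the unicorn module and the prefix "UC_ARCH_" to work too, but treat that as an assumption. constants_with_prefix is a made-up helper name.

```python
import socket

def constants_with_prefix(module, prefix):
    """List the names in a module that start with the given prefix."""
    return sorted(name for name in dir(module) if name.startswith(prefix))

# Stand-in example: address family constants of the socket module
print(constants_with_prefix(socket, "AF_")[:5])
```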

Task 2

Analyze the following shellcode:

shellcode = "\xe8\xff\xff\xff\xff\xc0\x5d\x6a\x05\x5b\x29\xdd\x83\xc5\x4e\x89\xe9\x6a\x02\x03\x0c\x24\x5b\x31\xd2\x66\xba\x12\x00\x8b\x39\xc1\xe7\x10\xc1\xef\x10\x81\xe9\xfe\xff\xff\xff\x8b\x45\x00\xc1\xe0\x10\xc1\xe8\x10\x89\xc3\x09\xfb\x21\xf8\xf7\xd0\x21\xd8\x66\x89\x45\x00\x83\xc5\x02\x4a\x85\xd2\x0f\x85\xcf\xff\xff\xff\xec\x37\x75\x5d\x7a\x05\x28\xed\x24\xed\x24\xed\x0b\x88\x7f\xeb\x50\x98\x38\xf9\x5c\x96\x2b\x96\x70\xfe\xc6\xff\xc6\xff\x9f\x32\x1f\x58\x1e\x00\xd3\x80"

As you can see, the assembly is obfuscated (the disasm command is part of pwntools):

a@x:~/Desktop/unicorn_engine_lessons$ disasm e8ffffffffc05d6a055b29dd83c54e89e96a02030c245b31d266ba12008b39c1e710c1ef1081e9feffffff8b4500c1e010c1e81089c309fb21f8f7d021d86689450083c5024a85d20f85cfffffffec37755d7a0528ed24ed24ed0b887feb509838f95c962b9670fec6ffc6ff9f321f581e00d380
   0:    e8 ff ff ff ff           call   0x4
   5:    c0 5d 6a 05              rcr    BYTE PTR [ebp+0x6a], 0x5
   9:    5b                       pop    ebx
   a:    29 dd                    sub    ebp, ebx
   c:    83 c5 4e                 add    ebp, 0x4e
   f:    89 e9                    mov    ecx, ebp
  11:    6a 02                    push   0x2
  13:    03 0c 24                 add    ecx, DWORD PTR [esp]
  16:    5b                       pop    ebx
  17:    31 d2                    xor    edx, edx
  19:    66 ba 12 00              mov    dx, 0x12
  1d:    8b 39                    mov    edi, DWORD PTR [ecx]
  1f:    c1 e7 10                 shl    edi, 0x10
  22:    c1 ef 10                 shr    edi, 0x10
  25:    81 e9 fe ff ff ff        sub    ecx, 0xfffffffe
  2b:    8b 45 00                 mov    eax, DWORD PTR [ebp+0x0]
  2e:    c1 e0 10                 shl    eax, 0x10
  31:    c1 e8 10                 shr    eax, 0x10
  34:    89 c3                    mov    ebx, eax
  36:    09 fb                    or     ebx, edi
  38:    21 f8                    and    eax, edi
  3a:    f7 d0                    not    eax
  3c:    21 d8                    and    eax, ebx
  3e:    66 89 45 00              mov    WORD PTR [ebp+0x0], ax
  42:    83 c5 02                 add    ebp, 0x2
  45:    4a                       dec    edx
  46:    85 d2                    test   edx, edx
  48:    0f 85 cf ff ff ff        jne    0x1d
  4e:    ec                       in     al, dx
  4f:    37                       aaa
  50:    75 5d                    jne    0xaf
  52:    7a 05                    jp     0x59
  54:    28 ed                    sub    ch, ch
  56:    24 ed                    and    al, 0xed
  58:    24 ed                    and    al, 0xed
  5a:    0b 88 7f eb 50 98        or     ecx, DWORD PTR [eax-0x67af1481]
  60:    38 f9                    cmp    cl, bh
  62:    5c                       pop    esp
  63:    96                       xchg   esi, eax
  64:    2b 96 70 fe c6 ff        sub    edx, DWORD PTR [esi-0x390190]
  6a:    c6                       (bad)
  6b:    ff 9f 32 1f 58 1e        call   FWORD PTR [edi+0x1e581f32]
  71:    00 d3                    add    bl, dl
  73:    80                       .byte 0x80
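Part of the confusion in this listing comes from the very first instruction. The call at offset 0 has a displacement of -1, so it lands at offset 4, inside the call instruction itself, while the disassembler simply continues at offset 5. The sketch below only recomputes the target from the first bytes of the shellcode. My reading that the bytes at the target (ff c0 5d) decode to inc eax; pop ebp is an assumption you can verify with disasm.

```python
import binascii
import struct

code = binascii.unhexlify("e8ffffffffc05d6a055b")   # first 10 bytes of the shellcode

rel32 = struct.unpack("<i", code[1:5])[0]   # signed 32-bit displacement of the call
target = 5 + rel32                          # offset of the next instruction + displacement

print("call target offset: %d" % target)
print("bytes at target: %s" % binascii.hexlify(code[target:target + 3]).decode())
```

This is one more reason to emulate the shellcode instead of trusting a linear disassembly.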

Note that the architecture is x86-32 now. A list of syscall numbers can be found here.

Hint:

You can hook the int 80h instruction, which is encoded as cd 80. Next, you can read registers and memory. Remember that shellcode is code that can be loaded at any address, and most shellcode uses the stack.

Solution:

The code below was created in several steps. Unicorn Engine’s error messages provided the clues that led to the final solution.

from unicorn import *
from unicorn.x86_const import *

shellcode = "\xe8\xff\xff\xff\xff\xc0\x5d\x6a\x05\x5b\x29\xdd\x83\xc5\x4e\x89\xe9\x6a\x02\x03\x0c\x24\x5b\x31\xd2\x66\xba\x12\x00\x8b\x39\xc1\xe7\x10\xc1\xef\x10\x81\xe9\xfe\xff\xff\xff\x8b\x45\x00\xc1\xe0\x10\xc1\xe8\x10\x89\xc3\x09\xfb\x21\xf8\xf7\xd0\x21\xd8\x66\x89\x45\x00\x83\xc5\x02\x4a\x85\xd2\x0f\x85\xcf\xff\xff\xff\xec\x37\x75\x5d\x7a\x05\x28\xed\x24\xed\x24\xed\x0b\x88\x7f\xeb\x50\x98\x38\xf9\x5c\x96\x2b\x96\x70\xfe\xc6\xff\xc6\xff\x9f\x32\x1f\x58\x1e\x00\xd3\x80" 


BASE = 0x400000
STACK_ADDR = 0x0
STACK_SIZE = 1024*1024

mu = Uc (UC_ARCH_X86, UC_MODE_32)

mu.mem_map(BASE, 1024*1024)
mu.mem_map(STACK_ADDR, STACK_SIZE)


mu.mem_write(BASE, shellcode)
mu.reg_write(UC_X86_REG_ESP, STACK_ADDR + STACK_SIZE/2)

def syscall_num_to_name(num):
    syscalls = {1: "sys_exit", 15: "sys_chmod"}
    return syscalls[num]

def hook_code(mu, address, size, user_data):
    #print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size))  
    
    machine_code = mu.mem_read(address, size)
    if machine_code == "\xcd\x80":
        
        r_eax = mu.reg_read(UC_X86_REG_EAX)
        r_ebx = mu.reg_read(UC_X86_REG_EBX)
        r_ecx = mu.reg_read(UC_X86_REG_ECX)
        r_edx = mu.reg_read(UC_X86_REG_EDX)
        syscall_name = syscall_num_to_name(r_eax)
        
        print "--------------"
        print "We intercepted system call: "+syscall_name
        
        if syscall_name == "sys_chmod":
            s = mu.mem_read(r_ebx, 20).split("\x00")[0]
            print "arg0 = 0x%x -> %s" % (r_ebx, s)
            print "arg1 = " + oct(r_ecx)
        elif syscall_name == "sys_exit":
            print "arg0 = " + hex(r_ebx)
            exit()
        
        mu.reg_write(UC_X86_REG_EIP, address + size)
        
mu.hook_add(UC_HOOK_CODE, hook_code)

mu.emu_start(BASE, BASE-1)

The result of this code is:

a@x:~/Desktop/unicorn_engine_lessons$ python solve_task2.py
--------------
We intercepted system call: sys_chmod
arg0 = 0x400058 -> /etc/shadow
arg1 = 0666L
--------------
We intercepted system call: sys_exit
arg0 = 0x400058L
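As a cross-check of the emulation result, the decoding loop can also be replayed statically. The sketch below is based on my reading of the loop at offsets 0x1d-0x48: it seems to XOR each 16-bit word, starting at offset 0x4e, with the word that follows it, 0x12 times. The constants PAYLOAD_OFFSET and WORD_COUNT and the function decode_payload are names I introduced; the hex strings are the shellcode bytes from above, split into three parts for readability.

```python
import binascii

SHELLCODE = binascii.unhexlify(
    # setup: find our own address, point ebp at the encoded payload
    "e8ffffffffc05d6a055b29dd83c54e89e96a02030c245b31d266ba1200"
    # decoding loop
    "8b39c1e710c1ef1081e9feffffff8b4500c1e010c1e81089c309fb21f8"
    "f7d021d86689450083c5024a85d20f85cfffffff"
    # encoded payload
    "ec37755d7a0528ed24ed24ed0b887feb509838f95c962b9670fec6ffc6ff"
    "9f321f581e00d380"
)

PAYLOAD_OFFSET = 0x4E   # value added to ebp by "add ebp, 0x4e"
WORD_COUNT = 0x12       # loop counter loaded by "mov dx, 0x12"

def decode_payload(code, start=PAYLOAD_OFFSET, count=WORD_COUNT):
    data = bytearray(code)
    for k in range(count):
        pos = start + 2 * k
        # XOR the current word with the next, still encoded, word
        data[pos] ^= data[pos + 2]
        data[pos + 1] ^= data[pos + 3]
    return bytes(data[start:start + 2 * count])

decoded = decode_payload(SHELLCODE)
print(repr(decoded))
print(hex(PAYLOAD_OFFSET + decoded.index(b"/etc/shadow")))
```

If the reading is correct, the decoded bytes contain the string /etc/shadow at offset 0x58, which agrees with the arg0 = 0x400058 reported by the hook when the shellcode is loaded at 0x400000.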

Task 3

Download this binary. It was compiled with the following command:
gcc function.c -m32 -o function
The source code of this binary is presented below:

int strcmp(char *a, char *b)
{
    //get length
    int len = 0;
    char *ptr = a;
    while(*ptr)
    {
        ptr++;
        len++;
    }
    
    //comparestrings
    for(int i=0; i<=len; i++)
    {
        if (a[i]!=b[i])
            return 1;
    }
    
    return 0;
}

__attribute__((stdcall))
int  super_function(int a, char *b)
{
    if (a==5 && !strcmp(b, "batman"))
    {
        return 1;
    }
    return 0;
}

int main()
{
    super_function(1, "spiderman");
}

The task is to call super_function in such a way that it returns 1.

The assembly code is:

.text:0x8048464 super_function  proc near               ; CODE XREF: main+16p
.text:0x8048464
.text:0x8048464 arg_0           = dword ptr  8
.text:0x8048464 arg_4           = dword ptr  0Ch
.text:0x8048464
.text:0x8048464                 push    ebp
.text:0x8048465                 mov     ebp, esp
.text:0x8048467                 call    __x86_get_pc_thunk_ax
.text:0x804846C                 add     eax, 1B94h
.text:0x8048471                 cmp     [ebp+arg_0], 5
.text:0x8048475                 jnz     short loc_8048494
.text:0x8048477                 lea     eax, (aBatman - 804A000h)[eax] ; "batman"
.text:0x804847D                 push    eax
.text:0x804847E                 push    [ebp+arg_4]
.text:0x8048481                 call    strcmp
.text:0x8048486                 add     esp, 8
.text:0x8048489                 test    eax, eax
.text:0x804848B                 jnz     short loc_8048494
.text:0x804848D                 mov     eax, 1
.text:0x8048492                 jmp     short locret_8048499
.text:0x8048494 ; ---------------------------------------------------------------------------
.text:0x8048494
.text:0x8048494 loc_8048494:                            ; CODE XREF: super_function+11j
.text:0x8048494                                         ; super_function+27j
.text:0x8048494                 mov     eax, 0
.text:0x8048499
.text:0x8048499 locret_8048499:                         ; CODE XREF: super_function+2Ej
.text:0x8048499                 leave
.text:0x804849A                 retn    8
.text:0x804849A super_function  endp

Hint:

According to the stdcall calling convention, when the emulation starts at the beginning of the function the stack should look like the picture below. In the picture, RET is just the return address (it can be any value).

[Figure: stack layout at function entry]
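In text form, the layout at function entry is: the return address at [ESP], the first argument at [ESP+4] and the second argument at [ESP+8]. This agrees with arg_0 = 8 and arg_4 = 0Ch in the listing, which are offsets from EBP after push ebp. The sketch below only builds these 12 bytes. The addresses are hypothetical values picked for illustration, and build_stdcall_frame is a made-up helper, not part of any solution file.

```python
import struct

def build_stdcall_frame(ret_addr, a, b_ptr):
    """Bytes to place at ESP: return address first, then the arguments in order."""
    return struct.pack("<III", ret_addr, a, b_ptr)

# Hypothetical values, chosen only for illustration
RET = 0x0               # any value works if emulation stops before the return
STRING_ADDR = 0x1000    # wherever the string "batman" plus a NUL byte was written

frame = build_stdcall_frame(RET, 5, STRING_ADDR)
ret, arg_a, arg_b = struct.unpack("<III", frame)
print("%d bytes: RET=0x%x a=%d b=0x%x" % (len(frame), ret, arg_a, arg_b))
```

Writing these bytes at the address held in ESP, and the string at the chosen string address, would give the function the arguments it checks for.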

Solution:

Task 4

This task is similar to the first one. The difference is that the architecture is not x86 anymore. It is ARM32 little-endian.

a@x:~/Desktop/unicorn_engine_lessons$ file task4
task4: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, for GNU/Linux 3.2.0, BuildID[sha1]=3dbf508680ba3d023d3422025954311e1d8fb4a1, not stripped

You can download the binary here.
The ARM calling convention will help you.

Right answer:

Hint:

Solution:

from unicorn import *
from unicorn.arm_const import *


import struct

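def read(name):
    # Binary mode returns the file contents unmodified
    with open(name, "rb") as f:
        return f.read()

def u32(data):
    # "<I" = little-endian unsigned 32-bit integer
    return struct.unpack("<I", data)[0]

def p32(num):
    return struct.pack("<I", num)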


mu = Uc (UC_ARCH_ARM, UC_MODE_LITTLE_ENDIAN)


BASE = 0x10000
STACK_ADDR = 0x300000
STACK_SIZE = 1024*1024

mu.mem_map(BASE, 1024*1024)
mu.mem_map(STACK_ADDR, STACK_SIZE)


mu.mem_write(BASE, read("./task4"))
mu.reg_write(UC_ARM_REG_SP, STACK_ADDR + STACK_SIZE/2)

instructions_skip_list = []

CCC_ENTRY = 0x000104D0
CCC_END = 0x00010580

stack = []                                          # Stack for storing the arguments
d = {}                                              # Dictionary that holds return values for given function arguments 

def hook_code(mu, address, size, user_data):  
    #print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size))
    
    if address == CCC_ENTRY:                        # Are we at the beginning of ccc function?
        arg0 = mu.reg_read(UC_ARM_REG_R0)           # Read the first argument. it is passed by R0
        
        if arg0 in d:                               # Check whether return value for this function is already saved.
            ret = d[arg0]
            mu.reg_write(UC_ARM_REG_R0, ret)        # Set return value in R0
            mu.reg_write(UC_ARM_REG_PC, 0x105BC)    # Set PC to point at "BX LR" instruction. We want to return from the ccc function
            
        else:
            stack.append(arg0)                      # If return value is not saved for this argument, add it to stack.
        
    elif address == CCC_END:
        arg0 = stack.pop()                          # We know arguments when exiting the function
        
        ret = mu.reg_read(UC_ARM_REG_R0)            # Read the return value (R0)
        d[arg0] = ret                               # Remember the return value for this argument
        
mu.hook_add(UC_HOOK_CODE, hook_code)

mu.emu_start(0x00010584, 0x000105A8)

return_value = mu.reg_read(UC_ARM_REG_R1)           # We end the emulation at printf("%d\n", ccc(x)).
print("The return value is %d" % return_value)
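
Stripped of the emulator, the hook above is plain memoization. On entry it checks whether the result for this argument is already known. On exit it stores the result. The sketch below applies the same bookkeeping to a toy recursive function in ordinary Python. slow_function is a stand-in invented for illustration, not the real ccc.

```python
calls = 0
cache = {}          # plays the role of d: argument -> return value

def slow_function(n):
    # Stand-in for ccc: naive double recursion, exponential without a cache
    global calls
    if n in cache:              # "entry hook": result already known, return at once
        return cache[n]
    calls += 1
    result = n if n < 2 else slow_function(n - 1) + slow_function(n - 2)
    cache[n] = result           # "exit hook": remember the result for this argument
    return result

print(slow_function(30), calls)
```

In the Unicorn version the entry and exit checks live in separate callback invocations, which is why the solution keeps an explicit stack of pending arguments instead of relying on a local variable.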

CheatSheet

from unicorn import * - Loads the main unicorn library. It contains functions and basic constants.
from unicorn.x86_const import * - Loads constants specific to the x86 and x86-64 architectures

All consts in module unicorn:

UC_API_MAJOR                UC_ERR_VERSION              UC_MEM_READ                 UC_PROT_ALL
UC_API_MINOR                UC_ERR_WRITE_PROT           UC_MEM_READ_AFTER           UC_PROT_EXEC
UC_ARCH_ARM                 UC_ERR_WRITE_UNALIGNED      UC_MEM_READ_PROT            UC_PROT_NONE
UC_ARCH_ARM64               UC_ERR_WRITE_UNMAPPED       UC_MEM_READ_UNMAPPED        UC_PROT_READ
UC_ARCH_M68K                UC_HOOK_BLOCK               UC_MEM_WRITE                UC_PROT_WRITE
UC_ARCH_MAX                 UC_HOOK_CODE                UC_MEM_WRITE_PROT           UC_QUERY_MODE
UC_ARCH_MIPS                UC_HOOK_INSN                UC_MEM_WRITE_UNMAPPED       UC_QUERY_PAGE_SIZE
UC_ARCH_PPC                 UC_HOOK_INTR                UC_MILISECOND_SCALE         UC_SECOND_SCALE
UC_ARCH_SPARC               UC_HOOK_MEM_FETCH           UC_MODE_16                  UC_VERSION_EXTRA
UC_ARCH_X86                 UC_HOOK_MEM_FETCH_INVALID   UC_MODE_32                  UC_VERSION_MAJOR
UC_ERR_ARCH                 UC_HOOK_MEM_FETCH_PROT      UC_MODE_64                  UC_VERSION_MINOR
UC_ERR_ARG                  UC_HOOK_MEM_FETCH_UNMAPPED  UC_MODE_ARM                 Uc
UC_ERR_EXCEPTION            UC_HOOK_MEM_INVALID         UC_MODE_BIG_ENDIAN          UcError
UC_ERR_FETCH_PROT           UC_HOOK_MEM_PROT            UC_MODE_LITTLE_ENDIAN       arm64_const
UC_ERR_FETCH_UNALIGNED      UC_HOOK_MEM_READ            UC_MODE_MCLASS              arm_const
UC_ERR_FETCH_UNMAPPED       UC_HOOK_MEM_READ_AFTER      UC_MODE_MICRO               debug
UC_ERR_HANDLE               UC_HOOK_MEM_READ_INVALID    UC_MODE_MIPS3               m68k_const
UC_ERR_HOOK                 UC_HOOK_MEM_READ_PROT       UC_MODE_MIPS32              mips_const
UC_ERR_HOOK_EXIST           UC_HOOK_MEM_READ_UNMAPPED   UC_MODE_MIPS32R6            sparc_const
UC_ERR_INSN_INVALID         UC_HOOK_MEM_UNMAPPED        UC_MODE_MIPS64              uc_arch_supported
UC_ERR_MAP                  UC_HOOK_MEM_VALID           UC_MODE_PPC32               uc_version
UC_ERR_MODE                 UC_HOOK_MEM_WRITE           UC_MODE_PPC64               unicorn
UC_ERR_NOMEM                UC_HOOK_MEM_WRITE_INVALID   UC_MODE_QPX                 unicorn_const
UC_ERR_OK                   UC_HOOK_MEM_WRITE_PROT      UC_MODE_SPARC32             version_bind
UC_ERR_READ_PROT            UC_HOOK_MEM_WRITE_UNMAPPED  UC_MODE_SPARC64             x86_const
UC_ERR_READ_UNALIGNED       UC_MEM_FETCH                UC_MODE_THUMB               
UC_ERR_READ_UNMAPPED        UC_MEM_FETCH_PROT           UC_MODE_V8                  
UC_ERR_RESOURCE             UC_MEM_FETCH_UNMAPPED       UC_MODE_V9                  

A few examples of constants from unicorn.x86_const:

UC_X86_REG_EAX
UC_X86_REG_RIP
UC_X86_REG_RAX

mu = Uc(arch, mode) - get an instance of the Uc class. Here you specify the architecture and mode.
Examples:

mu = Uc(UC_ARCH_X86, UC_MODE_64) - get a Uc instance for architecture x86-64
mu = Uc(UC_ARCH_X86, UC_MODE_32) - get a Uc instance for architecture x86-32

mu.mem_map(ADDRESS, 4096) - map a memory region.
mu.mem_write(ADDRESS, DATA) - write data to memory.
tmp = mu.mem_read(ADDRESS, SIZE) - read data from memory.
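
One detail worth keeping in mind with mem_map: Unicorn expects both the address and the size to be multiples of 4 KB (0x1000), so a file's size usually has to be rounded up before mapping. A minimal helper for that is sketched below. align_up and the sample size are illustrative names and values, not part of Unicorn.

```python
PAGE_SIZE = 0x1000  # 4 KB, the granularity mem_map works with

def align_up(value, alignment=PAGE_SIZE):
    # Round value up to the next multiple of alignment (must be a power of two)
    return (value + alignment - 1) & ~(alignment - 1)

binary_size = 0x2345                 # hypothetical size of a loaded file
map_size = align_up(binary_size)
print(hex(map_size))                 # 0x3000
```

The result could then be passed as the size argument, for example mu.mem_map(ADDRESS, map_size).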

mu.reg_write(UC_X86_REG_ECX, 0x0) - set a register to a new value.
r_esp = mu.reg_read(UC_X86_REG_ESP) - read a value from a register.

mu.emu_start(ADDRESS_START, ADDRESS_END) - start emulation.

Instruction tracing:

def hook_code(mu, address, size, user_data):  
    print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size))  

mu.hook_add(UC_HOOK_CODE, hook_code)

This code adds a hook. We define our own function hook_code that is called before the emulation of each instruction. It takes the following arguments:

our Uc instance
address of the instruction
size of the instruction
user data (we can pass this value in optional argument of hook_add())
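
To make the user data argument more concrete, here is a small sketch in which the hook collects a trace into a list instead of printing. The callback is invoked by hand with made-up addresses so the data flow is visible without a binary. With a real Uc instance, the list would be passed as the third argument of hook_add().

```python
def hook_code(mu, address, size, user_data):
    # user_data is whatever object was given to hook_add(); here, a list
    user_data.append((address, size))

trace = []   # will be filled with (address, size) pairs

# With a real Uc instance the hook would be registered like this:
#   mu.hook_add(UC_HOOK_CODE, hook_code, trace)
# Below, the callback is called manually with made-up addresses.
hook_code(None, 0x1000, 2, trace)
hook_code(None, 0x1002, 5, trace)

for address, size in trace:
    print("0x%x (%d bytes)" % (address, size))
```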
