Skip to content

Latest commit

 

History

History
567 lines (476 loc) · 14.8 KB

6-symbolic-execution.org

File metadata and controls

567 lines (476 loc) · 14.8 KB

Symbolic execution

Reading material:

Uniform program representation

Problem

  • we have programs in many different formats
  • often in offensive security no access to sources
  • if we want to analyze code we need an uniform way to represent it, so to reuse the analysis on different platform

Example

int main(int argc, char**argv) {
    if (argc > 1 && strcmp(argv[1], "Th3s3cr37") == 0) {
        printf("You got me\n");
    } else {
        printf("No way!\n");
    }
}

x86

main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $16, %rsp
    movl    %edi, -4(%rbp)
    movq    %rsi, -16(%rbp)
    cmpl    $1, -4(%rbp)
    jle .L2
    movq    -16(%rbp), %rax
    addq    $8, %rax
    movq    (%rax), %rax
    movl    $.LC0, %esi
    movq    %rax, %rdi
    call    strcmp
    testl   %eax, %eax
    jne .L2
    movl    $.LC1, %edi
    call    puts
    jmp .L3
.L2:
    movl    $.LC2, %edi
    call    puts
.L3:
    movl    $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

ARM

main:
    push    {fp, lr}
    add fp, sp, #4
    sub sp, sp, #8
    str r0, [fp, #-8]
    str r1, [fp, #-12]
    ldr r3, [fp, #-8]
    cmp r3, #1
    ble .L2
    ldr r3, [fp, #-12]
    add r3, r3, #4
    ldr r3, [r3]
    ldr r1, .L5
    mov r0, r3
    bl  strcmp
    mov r3, r0
    cmp r3, #0
    bne .L2
    ldr r0, .L5+4
    bl  puts
    b   .L3
.L2:
    ldr r0, .L5+8
    bl  puts
.L3:
    mov r3, #0
    mov r0, r3
    sub sp, fp, #4
    @ sp needed
    pop {fp, lr}
    bx  lr

Instrumentation

Idea

  • all machines (CPUs or VMs) operate in a very similar way
  • they all have:
    • registers (just one accumulator for stack machines like JVM/.NET CLR, a fixed number like in x86/ARM, an infinite number like in DEX/Android and LLVM-IR)
    • a stack
    • a heap
  • they all do:
    • load/store operations between registers and RAM (x86 does this implicitly)
    • math operations (e.g. addition, subtraction, logical and/or/xor)
    • control operations (branching, function call/return) ## Let’s generalize!

Enter Valgrind/VEX

img/6-symbolic-execution_files/image.png

VEX

  • there are many intermediate representations that try to generalize CPU codes, tailored at different use cases (e.g. LLVM, radare’s ESIL, BAP’s IR, VEX)
  • VEX is tailored towards accurate execution and instrumentation of code

Example

The following ARM instruction:

subs R2, R2, #8

Becomes this VEX IR:

t0 = GET:I32(16)
t1 = 0x8:I32
t3 = Sub32(t0,t1)
PUT(16) = t3
PUT(68) = 0x59FC8:I32

Let’s play with VEX

import angr, claripy
project = angr.Project("examples/ex01")
irsb = project.factory.block(0x0000000000401136).vex
irsb.pp()
IRSB {
   t0:Ity_I64 t1:Ity_I64 t2:Ity_I64 t3:Ity_I64 t4:Ity_I64 t5:Ity_I64 t6:Ity_I64 t7:Ity_I32 t8:Ity_I32 t9:Ity_I32 t10:Ity_I64 t11:Ity_I64 t12:Ity_I64 t13:Ity_I64 t14:Ity_I64 t15:Ity_I64 t16:Ity_I32 t17:Ity_I64 t18:Ity_I64 t19:Ity_I64 t20:Ity_I64 t21:Ity_I64 t22:Ity_I64 t23:Ity_I64 t24:Ity_I64 t25:Ity_I1 t26:Ity_I64 t27:Ity_I64 t28:Ity_I64 t29:Ity_I64 t30:Ity_I64 t31:Ity_I64 t32:Ity_I32 t33:Ity_I64 t34:Ity_I64 t35:Ity_I1 t36:Ity_I32 t37:Ity_I32 t38:Ity_I1

   00 | ------ IMark(0x401136, 1, 0) ------
   01 | t0 = GET:I64(rbp)
   02 | t12 = GET:I64(rsp)
   03 | t11 = Sub64(t12,0x0000000000000008)
   04 | PUT(rsp) = t11
   05 | STle(t11) = t0
   06 | ------ IMark(0x401137, 3, 0) ------
   07 | PUT(rbp) = t11
   08 | ------ IMark(0x40113a, 4, 0) ------
   09 | t2 = Sub64(t11,0x0000000000000010)
   10 | PUT(rsp) = t2
   11 | PUT(rip) = 0x000000000040113e
   12 | ------ IMark(0x40113e, 3, 0) ------
   13 | t14 = Add64(t11,0xfffffffffffffffc)
   14 | t17 = GET:I64(rdi)
   15 | t32 = 64to32(t17)
   16 | t16 = t32
   17 | STle(t14) = t16
   18 | PUT(rip) = 0x0000000000401141
   19 | ------ IMark(0x401141, 4, 0) ------
   20 | t18 = Add64(t11,0xfffffffffffffff0)
   21 | t20 = GET:I64(rsi)
   22 | STle(t18) = t20
   23 | PUT(rip) = 0x0000000000401145
   24 | ------ IMark(0x401145, 4, 0) ------
   25 | t21 = Add64(t11,0xfffffffffffffffc)
   26 | t9 = LDle:I32(t21)
   27 | PUT(cc_op) = 0x0000000000000007
   28 | t33 = 32Uto64(t9)
   29 | t23 = t33
   30 | PUT(cc_dep1) = t23
   31 | PUT(cc_dep2) = 0x0000000000000001
   32 | PUT(rip) = 0x0000000000401149
   33 | ------ IMark(0x401149, 2, 0) ------
   34 | t36 = 64to32(0x0000000000000001)
   35 | t37 = 64to32(t23)
   36 | t35 = CmpLE32S(t37,t36)
   37 | t34 = 1Uto64(t35)
   38 | t30 = t34
   39 | t38 = 64to1(t30)
   40 | t25 = t38
   41 | if (t25) { PUT(rip) = 0x401173; Ijk_Boring }
   NEXT: PUT(rip) = 0x000000000040114b; Ijk_Boring
}

Analyses

Once we have a way to represent arbitrary binary files we can:

  • run static analysis
  • instrument the code
  • fuzz the binary
  • run dynamic analysis

Static analysis

Most useful at compile time, but can be run also on binaries. The basic idea: describe mathematically a piece of code and learn interesting properties instead of running it.

img/6-symbolic-execution_files/image.png

Example: null-pointer analysis

#include <stdio.h>

void main(void) {
    char *s = NULL;
    *s = 'x';
}

what happens if we run this?

Bad news!

Rice’s theorem:

All non-trivial, semantic properties of programs are undecidable

Good news!

Bruni’s postulates:

  • We have jobs for the foreseable future :)
  • And we have tools that can help us

Dynamic analyses

If we cannot decide things statically, then we can:

  • instrument and run the code
  • fuzz the binary
  • run symbolic execution

Binary instrumentation

  • take the IR of a program and change it as to reveal software flaws
  • e.g. add memory bounds checking to every memory access, to find potential buffer overflows
  • look at https://www.valgrind.org/info/tools.html for different analyses

Limitation

  • triggering flaws requires that we execute all code paths
  • one can use unit tests or..

Fuzzying

img/6-symbolic-execution_files/image.png https://lcamtuf.coredump.cx/afl/

  • American Fuzzy Lop is a coverage based fuzzer: it runs the program with different inputs to find failure states
  • Basic: random fuzzying
  • More advanced: guided fuzzying with input grammars, genetic input generation to improve coverage and accelerate search
  • Ultimately, the approach is limited: it relies on randomization to find new path, so certain “rare” paths are hard to find

Example

int main(void) {
    char buf[32];
    char *data = read_string();
    unsigned int magic = read_number();

    if (magic == 0x31337987) {
        memcpy(buf, data, 100); // buffer overflow
    }
}

Symbolic execution

Idea: augment code coverage by building a set of constraints on the input that allow to choose a specific path

Example

int main(void) {
    char buf[32];
    char *data = read_string();
    unsigned int magic = read_number();

    if (magic == 0x31337987) {
        memcpy(buf, data, 100); // buffer overflow
    }
}
  • To reach the buffer overflow, we add the constraint that magic = 0x31337987=
  • We can treat every decision point as a branch on a tree, and try to explore all possible branches.
  • SMT stands for Satisfiability Modulo Theory
  • essentially solving the SAT problem (known NP-complete but efficient solutions are available) where each atom relies on a theory
  • SAT: (A \/ B \/ C) \ (!A \ B) /\ (!B) e.g. \((a ∨ b ∨ c) ∧ (\lnot a ∨ b) ∧ (\lnot b)\
  • theories: empty theory, linear arithmetic, nonlinear arithmetic, bitvectors, arrays, datatypes, quantifiers, strings (from Z3)
from z3 import *
x = Real('x'); y = Real('y')
s = Solver()
s.add(x < 10)
s.add(y > x)
print(s.check())
s.model()
sat

[y = 1, x = 0]

Symbolic execution (cont.)

  • once we have a tool to express path constraints (SMT) we can implement any search procedure for the branch tree: DFS, BFS, etc

Simple Symbolic Execution (from GIST)

  • dynamic languages support easy instrumentation
  • proxy objects: they simulate the interface of “real” objects but add new functionality
def find_path(path, f, *args, **kwargs):
    log = []
    def unwrap(other):
        try:
            return other.v
        except:
            return other
    class Sym:
        def __init__(self, v):
            self.v = v
            
        def __repr__(self):
            return self.v.__repr__()

        # Arithmetic operators
        def __add__(self, other):
            return Sym(self.v.__add__(unwrap(other)))

        def __sub__(self, other):
            return Sym(self.v.__sub__(unwrap(other)))

        def __mul__(self, other):
            return Sym(self.v.__mul__(unwrap(other)))

        def __div__(self, other):
            return Sym(self.v.__div__(unwrap(other)))

        def __neg__(self):
            return Sym(self.v.__neg__())
    
        # Comparison operators
        def __eq__(self, other):
            return Sym(self.v.__eq__(unwrap(other)))

        def __ne__(self, other):
            return Sym(self.v.__ne__(unwrap(other)))

        def __le__(self, other):
            return Sym(self.v.__le__(unwrap(other)))
        
        def __ge__(self, other):
            return Sym(self.v.__ge__(unwrap(other)))

        def __lt__(self, other):
            return Sym(self.v.__lt__(unwrap(other)))

        def __gt__(self, other):
            return Sym(self.v.__gt__(unwrap(other)))

        # Boolean operators
        def __and__(self, other):
            return Sym(And(self.v, unwrap(other)))

        def __or__(self, other):
            return Sym(Or(self.v, unwrap(other)))

        def __not__(self):
            return Sym(Not(self.v))

        # This is the crucial bit: when we evaluate a boolean
        # expression, we check what branch we need to take (true or
        # false), and add a constraint to the solver accordingly
        def __bool__(self):
            if path.pop():
                log.append(self.v)
                return True
            else:
                log.append(Not(self.v))
                return False

    f(*map(Sym,args), **kwargs)
    print(log)
    s = Solver()
    map(lambda x: s.add(x), log)
    res = s.check()
    print(res)
    if res:
        print(s.model())
    else:
        print('unsat')

def f(x, y):
    x = x + y*2
    print(x)
    if x == 5:
        z = 0
    else:
        z = 1

find_path([True], f, Real('x'), Real('y'))
x + y*2
[x + y*2 == 5]
sat
[]

Concolic execution in Angr

  • angr.io is a framework for running analyses that puts together all the concepts above, and more
  • concolic execution: mixes symbolic constraints and concrete inputs to improve coverage
  • you can implement your own search procedure

Example:

int main(void) {
    char buf[32];
    char *data = read_string();
    unsigned int magic = read_number();

    if (magic == 0x31337987) {  // difficult check for fuzzing
        memcpy(buf, data, 100); // buffer overflow
    }

    if (magic < 100 && magic % 15 == 2 && magic % 11 == 6) {
        // Only solution is 17; safe
        memcpy(buf, data, magic);
    }
    
    // Symbolic execution will suffer from path explosion
    int count = 0;
    for (int i = 0; i < 100; i++) {
        if (data[i] == 'Z') {
            count++;
        }
    }

    if (count >= 8 && count <= 16) {
        memcpy(buf, data, count*20); // buffer overflow
    }
}
project = angr.Project('ex01')
argv = claripy.BVS('argv', 10*0x8)
state = project.factory.entry_state(args = ['ex01', argv])
simgr = project.factory.simgr(state)
simgr.explore(find = lambda s: b'You got me' in s.posix.dumps(1))
print(simgr.found[0].solver.eval(argv, cast_to = bytes))
project = angr.Project("examples/ex02")
state = project.factory.entry_state()
simgr = project.factory.simgr(state)
simgr.explore(find = 0x00401245)#lambda s: b'You got it' in s.posix.dumps(1))
simgr.found[0].posix.dumps(0)
import angr
project = angr.Project("examples/ex03")
from angr.exploration_techniques import dfs
state = project.factory.entry_state()
simgr = project.factory.simgr(state)
dfs.DFS().setup(simgr)
simgr.explore(find = 0x4012ae)
simgr.found[0].posix.dumps(0)

Installing angr:

virtualenv env
source env/bin/activate
pip install angr

Program slices

def fibandfact(n):
    a,b = 1,1
    c = 1
    for i in range(n):
        a,b = a+b,a
        c = c*(i+1)
    return b,c

def fib(n):
    a,b = 1,1
    for i in range(n):
        a,b = a+b,a
    return b

def fact(n):
    c = 1
    for i in range(n):
        c = c*(i+1)
    return c

fibandfact(100) == (fib(100), fact(100))

Exercise:

Getting good at using angr requires knowledge of reverse engineering, binary exploitation, etc. Utilizing it is a disciplin. Therefore, we invite you to study and run some of the easier binaries, which also have a run-through of the thought process in the solve.py. https://docs.angr.io/en/latest/examples.html

So:

  • Install/Setup angr.
  • (likewise, consider getting reverse engineering tools like ghidra or IDA to dissassemble and read binaries).
  • Read up on core concepts for angr.docs and get to know the flow of it.

Reversing:

Fauxware

  • When you’re ready, got to the introduction program and try out Fauxware.
  • Fauxware is a very basic program solved with angr.
  • Study the solution file and the source. Does it make sense? What happens if you run solve.py and piping
  • it to ./fauxware ?

“CSAW CTF 2015 Quals - Reversing 500, “wyvern”.

  • It’s a challenge from a CTF competition, it requires just little user input.
  • As before, study the solution file and the source. Does it make sense? What happens if you run solve.py and piping

it to ./wyvern ?

Vulnerability Discovery:

strcpy

  • Try out strcpy. How fast did your machine solve it? Did you find anything different from the reversing challenges?

Getting a proper grasp of angr, takes time, so feel free to check some of the other examples out in the future and if you are ready to solve some of the challenges, go straight ahead :-)