Skip to content

Implementation details

Rohan Padhye edited this page Dec 13, 2017 · 6 revisions

Instrumentation

JQF uses the ASM toolkit to instrument Java bytecode on-the-fly as classes are loaded by the JVM. JQF uses a heavily modified version of the janala2 classes, originally developed by Koushik Sen for performing concolic execution of Java programs.

Janala instruments all applications classes and several JDK classes (classes with names java.lang.* and sun.* are not instrumented). The instrumentation injects a static method call after every bytecode instruction to a class that essentially snoops on everything that is being executed. There is one static method call per bytecode instruction type; the arguments to this call are a uniquely generated instruction identifier (the iid) as well as instruction operands.

The snooping class can therefore recreate the entire execution state of any JVM thread by maintaining a shadow operand stack and call stack.

Trace Events

In JQF, we ignore snoops on most arithmetic and logical data manipulating instructions. We are mainly interested in control-flow instructions such as method calls and branches. Unfortunately, Java has many different ways of invoking method calls (e.g. INVOKESTATIC, INVOKEINTERFACE, INVOKEVIRTUAL, INVOKESPECIAL) as well as branching (e.g. IF_CMPNE, GOTO, TABLESWITCH, LOOKUPSWITCH) plus many other control-flow instructions (e.g. IRETURN, ARETURN, ATHROW). Additionally, there is a lot of noise from instrumentation across multiple JVM threads plus the fact that the JVM hijacks application threads to invoke class loaders when classes are referenced for the first time. Therefore, JQF collects these bytecode instructions and forms a coherent sequence of trace events that are emitted to a Guidance instance.

Trace events provide a higher-level abstraction than bytecode instructions. For example, conditional jumps, gotos and switch statements are analyzed and a BRANCH event is emitted. Similarly, the various method call and return instructions are coalesced into CALL and RETURN events.

By matching call and return events, JQF builds a shadow call stack per-thread. This is useful for many reasons such as ignoring events generated before and after a test method is invoked in the fuzzing loop (so as not to slow-down the test input generation) as well as ignoring events generated by JVM class-loading.

High-performance queues

For maximizing performance, JQF maintains a shadow thread for each application thread, called a tracer. A tracer processes the bytecode instructions executed by its application thread and emits the appropriate trace events to the Guidance instance in a coherent sequence.

The static methods invoked by the injected code in application threads put these instructions on thread-local queues, which are shared only with the thread's own tracer. These queues are implemented as a lock-free single-producer single-consumer circular buffers with busy waiting -- this is only guaranteed to be deadlock-free if the producer and consumer thread are fixed and distinct, which is true for JQF because the producer is always the application thread and the consumer is always its shadow thread.

AFL front-end

AFL is designed to perform coverage-guided fuzzing of C programs that take input through a file or STDIN. AFL communicates with target programs using a 256KB shared memory region called trace_bits, which is a map of 216 keys to 8-bit counters. AFL instruments the target program such that on each basic block transition (i.e. control-flow graph edge), the program increments the 8-bit counter corresponding to the key associated with that edge (this association is done statically at instrumentation time). The AFL fuzzing loop is roughly as follows:

u8* trace_bits = create_shared_memory();
while(1) {
  input = generate_next_input();
  run_target(input);
  if (has_new_bits(trace_bits)) {
    save_input(input);
  }
}

Where has_new_bits is a side-effecting function that returns true only if trace_bits has new non-zero bits as compared to the union of all bits seen cumulatively in all the previous iterations.

AFL-JQF proxy

The AFL integration in JQF is via a proxy program that sits in between AFL and JQF and communicates with each of them.

The proxy masquerades as a target C program to AFL, in that running the proxy with some input causes the trace_bits to be updated. However, instead of being an AFL-instrumented C program, the proxy is in fact just a mediator that communicates with AFLGuidance using FIFO pipes.

The AFLGuidance instance registers a callback with JQF to handle trace events by updating its own copy of trace_bits, which are then sent to the proxy whenever Guidance#handleResult() is invoked. The proxy in turn updates the shared memory region and signals to AFL that a run is complete. The method Guidance#hasInput() blocks until the proxy receives input from AFL. If a test run results in a FAILURE (e.g. due to an assertion error), then JQF informs the proxy as such, which fakes a segmentation fault signal to AFL; thus, AFL marks the input as a crash and saves it for further analysis.