Developers

xd009642 edited this page Jan 25, 2023 · 29 revisions

Contents

  1. Overview
  2. Barriers to Cross-Platform Support
  3. Building other projects
  4. Source Analysis
  5. Instrumenting code
  6. Process Tracing
  7. Generating Reports
  8. Debugging

Overview

This section lays out some of the structure of Tarpaulin and provides a technical reference for the technologies underpinning it. Each section ends with a list of the files it covers. Note that this page is out of date and largely describes the state of ptrace coverage only. LLVM coverage instrumentation is now supported as a coverage backend!

When Tarpaulin runs, the basic sequence of events is as follows (assuming the user is collecting coverage):

  1. Parse options and look for any project-specific Tarpaulin configuration
  2. Build the tests for the project to cover with any user-supplied options
  3. Take the list of test binaries and for each one:
    • Run source analysis on the project identifying lines that can't be covered or will be absent from debug info but can be covered
    • Load the object file and parse to find lines that can be instrumented
    • Return a list of these lines and their addresses to populate with coverage stats
    • Fork the process, have the child run the test and the parent instrument it
    • The parent now steps through the code like a debugger logging coverage stats
  4. Merge all the coverage statistics to get unified stats for the project
  5. Generate any reports and save or send them

We can split Tarpaulin into a few functional areas as a result:

  1. Handling configuration
  2. Interactions with the Rust build system (compiler, linker, Cargo)
  3. Source code analysis: what lines are uncoverable (#[derive(..)]), what lines will be missed (unused generic code)
  4. Finding lines to instrument - understanding what's in the DWARF tables or x86_64 assembly
  5. Tracing the process - ptrace in Linux, other tools for other operating systems
  6. Coverage reports - codecov.io, coveralls.io, Cobertura.xml, HTML, other useful formats
  7. General usability/code quality

As well as these areas involving the implementation of Tarpaulin, work also has to be done in testing and documentation to ensure correctness and make Tarpaulin easy to use.

Barriers to Cross-Platform Support (with ptrace)

With only Linux support currently available, there's obviously demand for support for other operating systems. Using ptrace, the lowest-effort operating systems to add support for are probably the numerous BSDs (excluding Apple).

For support via another OS's process tracing API, the following challenges would need to be tackled:

  1. Loading the test binaries and getting the debug information
  2. Launching and tracing the test binaries (ptrace and the Windows equivalent)
  3. Disabling any OS level security features like ASLR that could prevent tracing

Tarpaulin also currently only supports x64 systems, that is AMD and Intel 64-bit processors. This is due to needing processor-specific opcodes to add breakpoints and access to processor-specific registers such as the program counter. In theory, supporting another architecture wouldn't be a large amount of work, but ptrace support can vary wildly between different operating systems and architectures. Given that 64-bit support seems ubiquitous in the world of Intel/AMD, there's been no demand for 32-bit support.

Apple note: ptrace has been gutted of its most useful parts on Apple platforms to restrict reverse engineering efforts. Instead Apple offers a debug port API which works with signed binaries. I don't own any Apple devices and didn't make much headway even signing my binary when I borrowed one for a month to try and progress Mac support.

probe-rs

Additionally, there's an opportunity to use probe-rs to collect coverage for tests deployed to embedded devices. Both of these are long term goals and have tracking issues #549 and #675 respectively.

Building Other Projects

Tarpaulin calls Cargo as an external process to build the user's tests, so a lot of the available arguments simply expose Cargo functionality. In addition, we add some extra linker flags to ensure more accurate results and use the JSON output of Cargo to find things like doctests.

Doctest Coverage

Rust allows examples in doc comments to be run as doctests to ensure your documentation works. Currently, there is an unstable feature to persist the generated doctest binaries and this is used for doctest coverage.

The doctest binary name is the file name with path separators and dots replaced with _. At the end of the name there are two further fields, the line number and an index, both delimited by underscores. The index is probably there because, once path separators and dots are replaced with _, two different files with doctests on the same line could otherwise end up with identical names.
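The naming scheme described above can be sketched in a few lines. This is only an illustration of the description on this page, not code from rustdoc or Tarpaulin: the function names `doctest_binary_name` and `line_and_index` are hypothetical, and the exact set of characters that gets replaced is an assumption.

```rust
/// Hypothetical helper: builds a name in the shape described above from a
/// source path, the line the doctest starts on, and an index.
fn doctest_binary_name(source_path: &str, line: usize, index: usize) -> String {
    // Assumption: only path separators and dots are replaced with '_'.
    let flattened: String = source_path
        .chars()
        .map(|c| if c == '/' || c == '\\' || c == '.' { '_' } else { c })
        .collect();
    format!("{}_{}_{}", flattened, line, index)
}

/// Hypothetical helper: recovers the trailing line number and index fields.
/// The flattened path itself cannot be reversed, because '_' is ambiguous.
fn line_and_index(name: &str) -> Option<(usize, usize)> {
    // Split from the right so underscores in the path part are left alone.
    let mut fields = name.rsplitn(3, '_');
    let index = fields.next()?.parse().ok()?;
    let line = fields.next()?.parse().ok()?;
    Some((line, index))
}
```

With this sketch, `src/a_b.rs` and `src/a/b.rs` flatten to the same prefix, so two doctests on the same line of those files are only told apart by the index field.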

However, running these persisted binaries directly poses a mild problem. If a doctest is marked as should_panic then the binary should panic and return a non-zero exit code. So Tarpaulin works out whether a binary should panic and, if so, marks it as passing when the return code is non-zero and failing when it's zero. Otherwise it maintains the same behaviour as for other tests and propagates the return code up. No users have (yet) reported incorrect failures or passes with this, but it's something to be aware of.
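The should_panic rule reads naturally as a small decision function. The sketch below is an assumption-laden illustration rather than Tarpaulin's code: `doctest_passed` and `reported_exit_code` are hypothetical names, and the value `1` used to signal a should_panic doctest that did not panic is an arbitrary choice.

```rust
/// Hypothetical helper: decides pass/fail from the exit code.
fn doctest_passed(should_panic: bool, exit_code: i32) -> bool {
    if should_panic {
        // A should_panic doctest passes only if it really failed to exit cleanly.
        exit_code != 0
    } else {
        exit_code == 0
    }
}

/// Hypothetical helper: the code propagated up to the caller.
fn reported_exit_code(should_panic: bool, exit_code: i32) -> i32 {
    match (should_panic, exit_code) {
        (true, 0) => 1, // assumption: any non-zero value would do here
        (true, _) => 0, // the expected panic happened, report success
        (false, code) => code, // ordinary tests: propagate unchanged
    }
}
```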

Source Analysis

Most coverage tools are designed to work with C, which lacks a lot of the abstractions that higher level languages have. This can cause language constructs which aren't actually executable code to be mistakenly included as misses, and can result in other code being omitted from results. Also, multiple addresses may map to the same expression, and where an expression is split over multiple lines you don't want that expression appearing more than once in the statistics. Below are some examples which Tarpaulin handles; for more comprehensive examples look to the tests in the source_analysis module.

Code That Shouldn't be Included

Derive macros: the generated code is partially mapped to the derive statement, although the executable lines exist outside the project source, causing the statement to be flagged as a missed line.

#[derive(Debug)]
struct SomeStruct;

Code That Should be Included

Any unused meta-programming code won't generate any assembly and therefore isn't included in the debug tables. This means unused traits and generic functions need to be included in the statistics via source analysis. Also, unused inline functions don't generate assembly or debug information.

fn foo<T>(t: T) {
    // Some code
}

Relevant files:

  • src/source_analysis.rs

Instrumenting Code

Tarpaulin currently only works on Linux where the tests are ELF files and debug information is kept in the DWARF (Debugging With Attributed Record Formats) format. Parsing the DWARF tables is done via Gimli with Object used to load the ELF file.

Object is cross-platform with Linux, Mac and Windows support. Gimli would have to be replaced with an alternative for Windows; however, it should work on Apple operating systems.

Relevant files:

  • src/test_loader.rs

Process Tracing

Process tracing is done via the ptrace API on Linux. Unfortunately, the API is often not well documented and is rather esoteric, so it can be a constant source of frustration. Because of this, more than just the man page is often needed. The ptrace readme for strace (link) is a good starting point, as well as anything you can glean from the GDB source.

Ptrace support also differs wildly between Linux and the BSDs with each having their own interpretation and levels of support complicating cross-platform support. An alternative to Ptrace will have to be found for Windows.

Once the test binary has been built and the instrumentation points identified, we need to launch and trace the test. Tarpaulin at this point forks, and the child sends a PTRACE_TRACEME request and launches the test with execve. execve is used because the test then keeps the same PID as the child process used to launch it, and the child is stopped with a SIGTRAP once execve succeeds.

At this point, we initialise the test by placing all the breakpoints. The breakpoint system relies on the INT3 instruction (the software breakpoint interrupt) in x64. This instruction is written to the address of each line and the byte it replaces is stored. These writes are aligned, which may account for some false negatives in coverage results. When a breakpoint is hit a SIGTRAP is issued, which waitpid will pick up. We then write the original byte back and send a ptrace step command. This triggers another SIGTRAP (although maybe not straight away, as step continues the other threads as well). When this wait comes in we can re-add the breakpoint if that's desired and then continue execution.

With the SIGTRAP captured and the breakpoints placed, the parent is now tracing the program. When the test hits one of these points it issues a SIGTRAP, Tarpaulin responds and then continues the test running. It is here we reach Tarpaulin's update loop in its most basic form:
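The byte patching behind a software breakpoint can be shown without touching ptrace at all. The sketch below assumes the tracer reads and writes aligned 8-byte words (as ptrace peek/poke requests do on x64) and that memory is little-endian; `set_breakpoint` and `clear_breakpoint` are hypothetical names, not Tarpaulin's API.

```rust
/// Opcode of the x64 INT3 instruction.
const INT3: u64 = 0xCC;

/// Bit offset of the byte at `addr` inside its aligned 8-byte word.
/// Little-endian: the lowest address is the least significant byte.
fn bit_shift(addr: u64) -> u64 {
    8 * (addr % 8)
}

/// Replaces the byte at `addr` with INT3.
/// `word` is the aligned word read from the tracee that contains `addr`.
/// Returns the patched word to write back and the original byte to keep.
fn set_breakpoint(word: u64, addr: u64) -> (u64, u8) {
    let shift = bit_shift(addr);
    let original = ((word >> shift) & 0xFF) as u8;
    let patched = (word & !(0xFFu64 << shift)) | (INT3 << shift);
    (patched, original)
}

/// Restores the saved byte so the original instruction can be stepped over.
fn clear_breakpoint(word: u64, addr: u64, original: u8) -> u64 {
    let shift = bit_shift(addr);
    (word & !(0xFFu64 << shift)) | ((original as u64) << shift)
}
```

For example, patching address `0x1003` inside the word `0x1122334455667788` replaces the byte `0x55`, and clearing the breakpoint with that saved byte gives back the original word.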

                +------------------+
                |                  |
          +-----+ Wait for signal  <--------+
          |     |                  |        |
          |     +------------------+        |
          |                                 |
+---------v----------+            +---------+--------+
|                    |            |                  |
|  Signalled by PID  |            |  Continue PID    |
|                    |            |                  |
+---------+----------+            +---------^--------+
          |                                 |
          |   +----------------------+      |
          |   |                      |      |
          +--->  Log coverage stats  +------+
              |                      |
              +----------------------+

Important note: when the parent is signalled by a PID, it can read data from the PID that signalled it as much as it likes and then continue or step the PID. It cannot interact with another PID until the one it got has been continued. Otherwise, a segfault may happen or the child process can crash or behave weirdly. Also, when one of the threads issues a stopped signal all the threads in that process are stopped. Similarly, when the thread is continued with ptrace all the threads are continued.

This complicates multi-threading (more on that later): as we add and remove instrumentation points and then continue or step, we modify the code being run. We do this by adding and removing the software interrupts so the original code can run. So we have our parent running and modifying the opcodes, and our test binary running any number of threads which are executing those opcodes. If two threads hit the same instruction at the same time, we'll handle whichever signal is raised first by removing the breakpoint and stepping forward one point in the code, at which point both threads will step.

Also, when you issue a step, the next signal may not be for the PID you stepped; it may be for another one that was pending when you received the signal you're processing.

  • When a PID signals you ptrace must continue that PID before interacting with another
  • When one thread stops they all stop
  • When a thread is continued they all continue
  • The test binary is mutable global data our parent is reading and writing and tests are reading
  • Just because you issue a single step doesn't mean that thread will be the next one handled

Struggles with keeping the above view consistent resulted in the --no-count option being made the default, and they complicate implementing accurate condition coverage.

Waiting with Multithreaded Code

Because multiple threads can hit a breakpoint at the same time, there may be more events in the queue at the point waitpid returns an event. Any thread that has hit a breakpoint needs it disabled and needs to be stepped back to the start of the instruction, otherwise you'll get a SIGILL as the ptrace continue/step commands will continue that thread with the program counter in an invalid position.

To handle this, Tarpaulin calls waitpid until there are no more wait events and then goes over this list of pending stops, picking an action for each process/thread ID. This design change fixed the majority of bugs in code with a lot of threading and removed the need to set --test-threads=1 when running test executables.
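The "drain first, then act" approach can be sketched with the OS calls abstracted away. In the sketch below the `poll` closure stands in for a non-blocking waitpid call, and `PendingStop`, `StopReason`, `Action`, `drain_pending` and `plan_actions` are all hypothetical names chosen for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StopReason {
    Breakpoint,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PendingStop {
    tid: i32,
    reason: StopReason,
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// Disable the breakpoint, move the program counter back, single step.
    RewindAndStep(i32),
    /// Nothing to patch, just let the thread carry on.
    Continue(i32),
}

/// Collects every stop that is already queued before acting on any of them.
/// `poll` stands in for a non-blocking wait that returns None when empty.
fn drain_pending(mut poll: impl FnMut() -> Option<PendingStop>) -> Vec<PendingStop> {
    let mut pending = Vec::new();
    while let Some(stop) = poll() {
        pending.push(stop);
    }
    pending
}

/// Picks one action per stopped thread.
fn plan_actions(pending: &[PendingStop]) -> Vec<Action> {
    pending
        .iter()
        .map(|stop| match stop.reason {
            StopReason::Breakpoint => Action::RewindAndStep(stop.tid),
            StopReason::Other => Action::Continue(stop.tid),
        })
        .collect()
}
```

The point of the two-phase design is that every thread sitting on a breakpoint is rewound before any thread is resumed, which avoids resuming one with its program counter in the middle of an instruction.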

Tarpaulin's Update Loop (Simple)

This section will likely vary for every operating system. However, as currently only Linux is supported it's documented here to help people trying to figure out how Tarpaulin works.

While a test is running, its state is represented by a state machine (in src/statemachine.rs). This has a core state machine, which has been designed to be platform agnostic, and a handle to OS-specific data and handlers which implements the StateData trait. As the StateData trait determines most of the state transitions, it doesn't make sense to view the state machine without including the OS-specific actions, so the diagram below shows what's done for Linux. Some explanation of this, as well as of the parts of the state machine which are platform agnostic, is given below. Labels have been left off the transition edges for brevity.

                       +
                       |  +---------+
                       |  |         |
                 +-----v--v--+      |
     +-----------+   START   +------+
     |           +----+------+
     |                |
     |                |
     |           +----v------+
     |           |   INIT    +----------------+
     |           +----+------+                |
     |                |                       |
     |                |    +-----------+      |
     |                |    |           |      | 
     |          +-+---v----+-+         |      |
     |   +------>    WAIT    <---+     |      |
     |   |      +-----+--+---+   |     |      |
     |   |            |  |       |     |      |
     |   |            |  +-------+     |      |
     |   |            |                |      |
     |   |      +-----v------+         |      |
     |   +------|    STOP    +         |      |
     |          +-----+------+         |      |
     |                |                |      |
     |                |                |      |
     |         +------v------+         |      |
     +--------->     END     <---------+------+
               +-------------+

Initially, while the test is starting up, it's in the START state waiting for the test to become available so the breakpoints can be initialised in INIT. For calls to waitpid the WNOHANG flag is used to prevent Tarpaulin freezing and having to be manually killed, and a timer is maintained to check for timeouts. If a timeout occurs the test exits.

Assuming nothing goes wrong in the run, once the test is initialised the basic update loop shown before is executed and is represented by the WAIT and STOP states. A timeout can also occur during WAIT if the test freezes for whatever reason, e.g. infinite loops with --no-count. Once the test has finished executing, the END state is entered, any resources are freed, and Tarpaulin will go on to the next test to run or report the results.

Below is the state machine happy path assuming no errors and also no time waiting for signals to pop up.

             +
             |
             |
        +----v------+
        |   START   |
        +----+------+
             |
             |
        +----v------+
        |   INIT    |
        +----+------+
             |
             |
       +-----v------+
+------>    WAIT    |
|      +-----+------+
|            |
|            |
|            |
|      +-----v------+
+------+    STOP    |
       +-----+------+
             |
             |
      +------v------+
      |     END     |
      +-------------+
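The transitions in the diagrams can be written down as a small transition function. This is a simplified sketch under stated assumptions: the `Event` type and its variants are hypothetical (the real transitions are driven by the StateData trait), and error handling is reduced to a single timeout event.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TestState {
    Start,
    Init,
    Wait,
    Stop,
    End,
}

/// Hypothetical events; the edge labels are not given in the diagrams.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    Ready,   // the test process is available to be initialised
    Nothing, // a non-blocking wait returned no event
    Signal,  // a traced thread stopped
    Handled, // coverage was logged and the thread continued
    Exited,  // the test process finished
    Timeout, // the timer expired while waiting
}

fn next(state: TestState, event: Event) -> TestState {
    use TestState::*;
    match (state, event) {
        (_, Event::Timeout) => End,  // any state can time out
        (Start, Event::Ready) => Init,
        (Start, _) => Start,         // keep polling until the test appears
        (Init, _) => Wait,           // breakpoints placed, start waiting
        (Wait, Event::Signal) => Stop,
        (Wait, _) => Wait,           // nothing yet, poll again
        (Stop, Event::Exited) => End,
        (Stop, _) => Wait,           // handled the stop, back to waiting
        (End, _) => End,
    }
}

/// Folds a sequence of events from START to the resulting state.
fn run(events: &[Event]) -> TestState {
    events.iter().fold(TestState::Start, |state, &event| next(state, event))
}
```

Feeding `run` the events `Ready, Nothing, Signal, Handled, Signal, Exited` walks the happy path START, INIT, WAIT, STOP, WAIT, STOP, END.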

There are 4 conditions which lead directly to the test being stopped, detailed below:

  • END - always called; this is the final cleanup
  • TIMEOUT - a timeout occurred meaning a test ended up hanging or the timeout arg to Tarpaulin is too short
  • UNRECOVERABLE - something went wrong during execution of this test specifically
  • ABORT - something fundamentally wrong occurred which means Tarpaulin won't work for any other tests it has to run.

Previously, the only case where ABORT was used was when the test binaries were Position Independent Executables. For such binaries code addresses have a random offset, which used to prevent Tarpaulin from placing breakpoints and would affect all the test binaries. Now, however, Tarpaulin finds the offset by using procfs to read it from /proc/$PID.
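How an offset might be recovered from procfs can be sketched as follows. The page only says the offset is read from /proc/$PID; the sketch assumes, without confirmation from the source, that it comes from the memory map listing (the `maps` file format described in proc(5)) and that the load offset is the lowest start address mapped from the test binary. `load_offset` and `runtime_address` are hypothetical names.

```rust
/// Assumption: `maps` holds text in the /proc/<pid>/maps format, e.g.
/// "55d0c1a00000-55d0c1a21000 r--p 00000000 08:01 1234 /path/to/binary".
/// Returns the lowest start address of any mapping backed by `binary_path`.
/// Paths containing spaces are not handled by this sketch.
fn load_offset(maps: &str, binary_path: &str) -> Option<u64> {
    maps.lines()
        // Keep only mappings whose last field is the binary's path.
        .filter(|line| line.split_whitespace().last() == Some(binary_path))
        .filter_map(|line| {
            // First field is "start-end" in hexadecimal.
            let range = line.split_whitespace().next()?;
            let start = range.split('-').next()?;
            u64::from_str_radix(start, 16).ok()
        })
        .min()
}

/// Address to patch in the running process for an address from debug info.
fn runtime_address(offset: u64, debug_info_address: u64) -> u64 {
    offset + debug_info_address
}
```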

Relevant files:

  • src/breakpoint.rs
  • src/ptrace_control.rs
  • src/statemachine/mod.rs
  • src/statemachine/linux.rs
  • src/traces.rs
  • src/processing_handling/linux.rs

Tarpaulin's Update Loop (Harder)

The previous section is a minor simplification of how Tarpaulin handles running a test binary. Tarpaulin (currently not released) can now follow exec events and trace launched binaries. This is useful for things like CLI tests, where a test may launch one of the binary outputs of the project with different args and check the output.

Handling the exec events adds an extra layer of complexity, but it can be summed up as follows:

  • A ptrace exec event occurs where the event data provides the new PID
  • Using procfs find the path to the binary
  • Call back into the test_loader module to get the TraceMap for the binary
  • Store the process PID mapping to the TraceMap

So when we get the address we've stopped at we need to know which binary the address is in. Because of this we now maintain a map from pid/tids to the parent process and use them to look up the TraceMap so the correct statistics are being updated.
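The two-level lookup described above can be sketched with a pair of hash maps. Everything here is a stand-in: the `TraceMap` struct below is a toy (address to hit count) rather than Tarpaulin's real type, and `Tracker`, `on_exec`, `on_new_thread` and `record_hit` are hypothetical names.

```rust
use std::collections::HashMap;

type Pid = i32;

/// Toy stand-in for a TraceMap: instrumented address -> hit count.
#[derive(Debug, Default)]
struct TraceMap {
    hits: HashMap<u64, u64>,
}

#[derive(Default)]
struct Tracker {
    /// Every known pid/tid mapped to the process that owns it.
    parents: HashMap<Pid, Pid>,
    /// One TraceMap per traced process (one per executed binary).
    trace_maps: HashMap<Pid, TraceMap>,
}

impl Tracker {
    /// Called after an exec event once the binary has been loaded.
    fn on_exec(&mut self, pid: Pid, map: TraceMap) {
        self.parents.insert(pid, pid);
        self.trace_maps.insert(pid, map);
    }

    /// Called when a traced process spawns a thread.
    fn on_new_thread(&mut self, tid: Pid, parent: Pid) {
        let process = self.parents.get(&parent).copied().unwrap_or(parent);
        self.parents.insert(tid, process);
    }

    /// Updates the statistics of whichever binary `tid` is running.
    /// Returns false if the tid or address is unknown.
    fn record_hit(&mut self, tid: Pid, addr: u64) -> bool {
        let process = match self.parents.get(&tid) {
            Some(process) => *process,
            None => return false,
        };
        match self.trace_maps.get_mut(&process).and_then(|m| m.hits.get_mut(&addr)) {
            Some(count) => {
                *count += 1;
                true
            }
            None => false,
        }
    }
}
```

Two binaries can have an instrumented line at the same address, so a hit reported by a thread only makes sense once the thread has been resolved to its owning process.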

Currently, following executables is an opt-in feature enabled with the --follow-exec flag, as I've not seen enough projects to be happy that it will work for everyone.

Relevant files:

  • src/statemachine/linux.rs

Generating Reports

There are numerous report formats supported by Tarpaulin. Apart from the stdout and HTML report outputs, all of these are existing formats either requested by users or implemented for interop with other services. After running all the Tarpaulin configurations, every TraceMap is merged, the resulting coverage for the application is passed on along with the Config, and the selected reports are generated.

The Tarpaulin output is also always written to a file in the target directory. This allows the stdout reports to report changes in coverage (and in future the HTML reports). This is mainly for local usage and not CI, where a user can use a free service like coveralls.io or codecov.io.
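Merging per-binary results into one set of project statistics can be sketched as below. The `Coverage` alias is a toy stand-in for a TraceMap, keyed by source file and line, and summing hit counts is an assumption about how merging works; `merge` and `percent_covered` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a TraceMap: (source file, line) -> hit count.
type Coverage = BTreeMap<(String, u64), u64>;

/// Combines the results of several test binaries.
/// A line counts as covered if any binary hit it.
fn merge(runs: &[Coverage]) -> Coverage {
    let mut merged = Coverage::new();
    for run in runs {
        for (location, hits) in run {
            *merged.entry(location.clone()).or_insert(0) += *hits;
        }
    }
    merged
}

/// Percentage of coverable lines hit at least once.
fn percent_covered(coverage: &Coverage) -> f64 {
    if coverage.is_empty() {
        return 0.0;
    }
    let covered = coverage.values().filter(|&&hits| hits > 0).count();
    100.0 * covered as f64 / coverage.len() as f64
}
```

A report generator would then only ever see the merged map, regardless of how many test binaries contributed to it.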

Cobertura

Cobertura is an XML-based format. Documentation for its format is scarce online and it features a lot of redundant information, so I host a DTD I found for it on a public gist found here.

Debugging

Due to its use of ptrace you can't attach a debugger like GDB to Tarpaulin, which complicates fault finding. To tackle this there is a general reliance on logging. There are two main forms of debug logging:

  • stdout logging to the terminal
  • the event logger

For large projects the stdout logging quickly becomes unusable, so the event log output becomes invaluable. The EventLog struct exists globally and events are pushed into it as they happen. At the end of the run they are serialized to JSON. Previously, this was then rendered to a static SVG giving each PID/TID its own vertical lane, with edges connecting them so you could visualise all interactions along a timeline.

This ended up being too much to render for large projects, crashing browser-based SVG renderers and making ones like Inkscape sluggish. As a result I've implemented my own renderer using Qt which can be found here. It's not really intended for general use, so the interface may undergo sudden changes and it's not documented, but it should be relatively intuitive to load traces and navigate via mouse or keyboard.

At other times, when changing some behaviour, it would be helpful to figure out whether the change truly fixes an issue or causes other issues. For this I've started work on tater, which is a crater-style tool for Tarpaulin. I'm hoping this will get more use and prove helpful when dropping big releases.