forked from lifting-bits/mcsema

x86 to LLVM bitcode translation framework


McSema

McSema lifts x86 and amd64 binaries to LLVM bitcode modules. McSema supports both Linux and Windows binaries, and most x86 and amd64 instructions, including integer, FPU, and SSE operations.

McSema is separated into two conceptual parts: control flow recovery and instruction translation. Control flow recovery is performed using the mcsema-disass tool, which uses IDA Pro to disassemble a binary file and produces a control flow graph. Instruction translation is performed using the mcsema-lift tool, which converts the control flow graph into LLVM bitcode.

Features

  • Translates 32- and 64-bit Linux ELF and Windows PE binaries to bitcode, including executables and shared libraries for each platform.
  • Supports a large subset of x86 and x86-64 instructions, including most integer, FPU, and SSE operations. Use mcsema-lift --list-supported --arch x86 to see a complete list.
  • Runs on both Windows and Linux, and can translate Linux binaries on Windows and Windows binaries on Linux.
  • Output bitcode is compatible with the LLVM 3.8 toolchain.
  • Translated bitcode can be analyzed or recompiled as a new, working executable with functionality identical to the original.
  • Tested on Windows 7, Windows 10, Ubuntu 14.04, and Ubuntu 16.04.

Using McSema

Why would anyone translate binaries back to bitcode?

  • Binary Patching And Modification. Lifting to LLVM IR lets you cleanly modify the target program. You can run obfuscation or hardening passes, add features, remove features, rewrite features, or even fix that pesky typo, grammatical error, or insane logic. When done, your new creation can be recompiled to a new binary sporting all those changes. In the Cyber Grand Challenge, we were able to use McSema to translate challenge binaries to bitcode, insert memory safety checks, and then re-emit working binaries.

  • Symbolic Execution with KLEE. KLEE operates on LLVM bitcode, usually generated by providing source to the LLVM toolchain. McSema can lift a binary to LLVM bitcode, permitting KLEE to operate on previously unavailable targets.

  • Re-use existing LLVM-based tools. KLEE is not the only tool that becomes available for use on bitcode. It is possible to run LLVM optimization passes and other LLVM-based tools like libFuzzer on lifted bitcode.

  • Analyze the binary rather than the source. Source level analysis is great but not always possible (e.g. you don't have the source) and, even when it is available, it lacks compiler transformations, re-ordering, and optimizations. Analyzing the actual binary guarantees that you're analyzing the true executed behavior.

  • Write one set of analysis tools. Lifting to LLVM IR means that one set of analysis tools can work on both the source and the binary. Maintaining a single set of tools saves development time and effort, and allows for a single set of better tools.
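As a rough illustration of re-using existing LLVM tools on lifted bitcode, the sketch below runs the standard `opt` and `llvm-dis` tools over a bitcode file. The function name `optimize_bitcode`, the `OPT` and `LLVM_DIS` override variables, and the `.opt.bc` / `.opt.ll` output naming are assumptions made for this example; the `-3.8` tool suffixes follow the Ubuntu packages listed under Dependencies and may differ on your system.

```shell
#!/usr/bin/env bash
# Sketch only: names and output layout are illustrative, not part of McSema.

# optimize_bitcode FILE.bc
# Runs the -O2 pipeline over FILE.bc, then disassembles the result to
# textual IR so that it can be read or diffed.
optimize_bitcode() {
    local input="$1"
    local base="${input%.bc}"   # strip the .bc suffix

    # OPT and LLVM_DIS can be overridden if the tools have different names.
    "${OPT:-opt-3.8}" -O2 "$input" -o "$base.opt.bc" || return 1
    "${LLVM_DIS:-llvm-dis-3.8}" "$base.opt.bc" -o "$base.opt.ll"
}
```

For example, `optimize_bitcode /tmp/ls.bc` would write `/tmp/ls.opt.bc` and `/tmp/ls.opt.ll` next to the input, assuming `/tmp/ls.bc` is a lifted bitcode file.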

Dependencies

| Name | Version |
| ---- | ------- |
| Git | Latest |
| CMake | 3.1+ |
| Google Protobuf | 2.6.1 |
| LLVM | 3.8 |
| Clang | 3.8 (3.9 if using Visual Studio 2015) |
| Python | 2.7 |
| Python Package Index | Latest |
| python-protobuf | 2.6.1 |
| IDA Pro | 6.7+ |
| Visual Studio | 2013+ (Windows only) |
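Before building, it can help to confirm that the command-line dependencies are present. The following is a minimal sketch, assuming GNU `sort -V` is available and that the tools are installed under the names used by the Ubuntu packages (`clang-3.8`, `python2.7`, `protoc`). The helper names are hypothetical. Note that it only checks minimum versions, whereas Protobuf is listed at exactly 2.6.1.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative; adjust tool names for your system.

# version_ge A B: succeed if version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# first_version TEXT: print the first dotted version number found in TEXT.
first_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# check_tool NAME MIN: report whether NAME is on PATH and at least version MIN.
check_tool() {
    local name="$1" min="$2" found
    if ! command -v "$name" >/dev/null 2>&1; then
        echo "MISSING  $name (need >= $min)"
        return 1
    fi
    found="$(first_version "$("$name" --version 2>&1)")"
    if version_ge "$found" "$min"; then
        echo "OK       $name $found"
    else
        echo "TOO OLD  $name $found (need >= $min)"
        return 1
    fi
}

# check_all: check each command-line dependency; fail if any check fails.
check_all() {
    local status=0
    check_tool cmake 3.1      || status=1
    check_tool protoc 2.6.1   || status=1
    check_tool clang-3.8 3.8  || status=1
    check_tool python2.7 2.7  || status=1
    return "$status"
}
```

Calling `check_all` prints one line per tool and returns non-zero if anything is missing or too old. IDA Pro and Visual Studio are not covered, since they are not checked from the command line in the same way.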

Getting and building the code

On Linux

Step 1: Install dependencies

sudo apt-get update
sudo apt-get upgrade

sudo apt-get install \
     git \
     cmake \
     libprotoc-dev libprotobuf-dev protobuf-compiler \
     python2.7 python-pip \
     llvm-3.8 clang-3.8 \
     realpath

sudo pip install --upgrade pip
sudo pip install 'protobuf==2.6.1'

Note: If you are using IDA on 64 bit Ubuntu and your IDA install does not use the system Python, you can add the protobuf library manually to IDA's zip of modules.

# Python module dir is generally in /usr/lib or /usr/local/lib
touch /path/to/lib/python2.7/dist-packages/google/__init__.py
cd /path/to/lib/python2.7/dist-packages/
sudo zip -rv /path/to/ida-6.X/python/lib/python27.zip google/
sudo chown your_user:your_user /path/to/ida-6.X/python/lib/python27.zip

Step 2: Clone and enter the repository

git clone git@github.com:trailofbits/mcsema.git --depth 1

The Linux bootstrap script supports two configuration options:

  • --prefix: The installation directory prefix for mcsema-lift. Defaults to the directory containing the bootstrap script.
  • --build: Set the build type. Defaults to Debug.

cd mcsema
./bootstrap.sh --build Release

Step 3: Build and install the code

cd build
make
sudo make install

On Windows

Step 1: Install dependencies

Download and install Chocolatey. Then, open Powershell in administrator mode, and run the following:

choco install -y --allowemptychecksum git cmake python2 pip 7zip
choco install -y microsoft-visual-cpp-build-tools --installargs "/InstallSelectableItems Win81SDK_CppBuildSKUV1;Win10SDK_VisibleV1"

McSema should be built with clang. Newer versions of clang for Windows automatically integrate with Visual Studio. The McSema build scripts rely on this integration. The minimum version of clang required is Clang 3.8 (when using VS 2013) or Clang 3.9 (when using VS 2015).

Sometimes cmake will not be available on the command line after being installed from Chocolatey. If you have this issue, install cmake from the official Windows installer.

Step 2: Clone the repository

Open the Developer Command Prompt for Visual Studio, and run:

cd C:\
if not exist git mkdir git
cd git

git clone https://github.com/trailofbits/mcsema.git --depth 1

Step 3: Build and install the code

cd mcsema
bootstrap

Try it Out

If you have a binary, you can get started with the following commands. First, you recover control flow graph information using mcsema-disass. For now, this needs to use IDA Pro as the disassembler.

mcsema-disass --disassembler /path/to/ida/idal64 --arch amd64 --os linux --output /tmp/ls.cfg --binary /bin/ls --entrypoint main

Once you have the control flow graph information, you can lift the target binary using mcsema-lift.

mcsema-lift --arch amd64 --os linux --cfg /tmp/ls.cfg --entrypoint main --output /tmp/ls.bc
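The two steps above can be chained in a small script. This is a sketch under stated assumptions: the function names `out_path` and `lift_binary`, the `IDA_PATH` variable, and the output file layout are made up for illustration, and the architecture and OS are fixed to the amd64 Linux case shown above. The `mcsema-disass` and `mcsema-lift` flags are the ones used in the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: function names, IDA_PATH, and output layout are illustrative.

# out_path BINARY OUTDIR EXT
# e.g. out_path /bin/ls /tmp cfg prints /tmp/ls.cfg
out_path() {
    printf '%s/%s.%s\n' "$2" "$(basename "$1")" "$3"
}

# lift_binary BINARY OUTDIR [ENTRYPOINT]
# Recovers the control flow graph, lifts it, and prints the bitcode path.
lift_binary() {
    local binary="$1" outdir="$2" entry="${3:-main}"
    local ida="${IDA_PATH:-/path/to/ida/idal64}"
    local cfg bc
    cfg="$(out_path "$binary" "$outdir" cfg)"
    bc="$(out_path "$binary" "$outdir" bc)"

    # Step 1: control flow recovery with IDA Pro.
    mcsema-disass --disassembler "$ida" --arch amd64 --os linux \
        --output "$cfg" --binary "$binary" --entrypoint "$entry" || return 1

    # Step 2: instruction translation to LLVM bitcode.
    mcsema-lift --arch amd64 --os linux --cfg "$cfg" \
        --entrypoint "$entry" --output "$bc" || return 1

    echo "$bc"
}
```

With `IDA_PATH` pointing at your IDA install, `lift_binary /bin/ls /tmp` would run the same two commands as above and print `/tmp/ls.bc` on success.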

There are a few things that we can do with the lifted bitcode. The usual thing to do is to recompile it back to an executable.

clang-3.8 -o /tmp/ls_lifted generated/ELF_64_linux.S /tmp/ls.bc -lpthread -ldl -lpcre /lib/x86_64-linux-gnu/libselinux.so.1
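A quick sanity check after recompiling is to run the original and the lifted binary with the same arguments and compare what they print and how they exit. The helper below is a minimal sketch; `same_behavior` is a hypothetical name, and it compares only the combined stdout/stderr text and the exit status, not side effects such as files written.

```shell
#!/usr/bin/env bash
# Sketch only: same_behavior is an illustrative helper, not part of McSema.

# same_behavior ORIGINAL LIFTED [ARGS...]
# Succeeds if both programs produce the same output and exit status
# when run with ARGS.
same_behavior() {
    local original="$1" lifted="$2"
    shift 2
    local out_a out_b rc_a rc_b

    # Capture output and exit status without aborting under `set -e`.
    out_a="$("$original" "$@" 2>&1)" && rc_a=0 || rc_a=$?
    out_b="$("$lifted" "$@" 2>&1)" && rc_b=0 || rc_b=$?

    [ "$rc_a" -eq "$rc_b" ] && [ "$out_a" = "$out_b" ]
}
```

For example, `same_behavior /bin/ls /tmp/ls_lifted -la /usr` compares the two programs on one input. Error messages that include the program name will differ between the two binaries, so a mismatch on failing inputs does not by itself indicate a lifting bug.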

Additional Documentation

Getting help

If you are experiencing problems with McSema or just want to learn more and contribute, join the #tool-mcsema channel of the Empire Hacking Slack. Alternatively, you can join our mailing list at [email protected] or email us privately at [email protected].

FAQ

How do you pronounce McSema and where did the name come from?

McSema is pronounced 'em see se ma' and is short for machine code semantics.

Why do I need IDA Pro to use McSema?

McSema's goal is binary to bitcode translation. Accurate disassembly and control flow recovery is a separate and difficult problem. IDA has already invested countless man-hours into getting disassembly right, and it only makes sense that we re-use existing work. We understand that not everyone can afford an IDA license. With the original release of McSema, we shipped our own recursive descent disassembler. It was never as good as IDA and it never would be. Maintaining the broken tool took away valuable development time from more important McSema work. We hope to eventually transition to more accessible control flow recovery frontends, such as Binary Ninja (we have a branch with initial Binary Ninja support). We very warmly welcome pull requests that implement new control flow recovery frontends.

I'm a student and I'd like to contribute to McSema. How can I help?

We would love to take you on as an intern to help improve McSema. We have several project ideas labelled intern_project in the issues tracker. You are not limited to those items: if you think of a great feature you want in McSema, let us know and we will sponsor it. Simply contact us on our Slack channel or via [email protected] and let us know what you'd want to work on and why.
