Overview

The Hexagon CPU design presents several unique challenges to a reverse engineer:

Scale: with scalar and vector extensions, Hexagon has more than 2000 distinct instructions.
Multi-threading: with four execution slots, this CPU is inherently multi-threaded. Instructions are grouped into packets, where each packet has up to four instructions that run in parallel.
Data dependencies: instructions in a given packet can reference data produced by other instructions in the same packet. These ".new" register semantics are unique to this variable-length instruction CPU.
Branch semantics: a packet can have up to two branch instructions. There are many branch types: direct vs. indirect, conditional vs. unconditional, and jump vs. call. Modeling these branch semantics is challenging: only a single branch may be taken at the end of packet processing, subject to some ordering rules.

The plugin tackles this complexity through automatic code generation. Build-time components parse instruction descriptors and automatically generate the LLIL lifting code. Runtime components track packet-level data and implement the ".new" and branch semantics.
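
To make the packet semantics concrete, here is a minimal C++17 sketch of a toy packet executor. It is not the plugin's code: ToyCpu, ToyPacketEffects and Commit are hypothetical names, and the rule that the first taken branch in packet order wins is an assumption made for illustration (the actual ordering rules are in the programmer's reference manual). The sketch buffers register writes, serves ".new" reads from that buffer, and resolves a single branch when the packet ends.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Toy register file; hypothetical, not the plugin's data model.
struct ToyCpu {
  std::array<uint32_t, 32> regs{};
  uint32_t pc = 0;
};

// Side effects collected while "executing" the instructions of one packet.
struct ToyPacketEffects {
  std::array<std::optional<uint32_t>, 32> pending{};  // buffered register writes
  std::optional<uint32_t> branch_target;              // winning branch, if any

  void Write(std::size_t reg, uint32_t value) { pending[reg] = value; }

  // ".new" read: the value another instruction in this packet produced.
  uint32_t ReadNew(std::size_t reg) const {
    assert(pending[reg].has_value());
    return *pending[reg];
  }

  // ASSUMPTION: the first taken branch in packet order wins.
  void Branch(bool taken, uint32_t target) {
    if (taken && !branch_target) branch_target = target;
  }
};

// Commits the buffered writes and resolves the single branch at packet end.
void Commit(ToyCpu &cpu, const ToyPacketEffects &fx, uint32_t next_pc) {
  for (std::size_t r = 0; r < fx.pending.size(); ++r) {
    if (fx.pending[r]) cpu.regs[r] = *fx.pending[r];
  }
  cpu.pc = fx.branch_target.value_or(next_pc);
}
```

Under this model a packet such as { R1 = R2; R2 = R1 } swaps the two registers, because both instructions read the values from packet entry and the writes only land in Commit.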

Components

(Read from top to bottom, left to right).

Instruction definitions: a dataset that describes each instruction's encoding, behavior and semantics. It is part of QEMU's Hexagon target code base, located at third_party/qemu-hexagon/. For example, alu.idef has the following description for the A2_add instruction:

Q6INSN(A2_add,"Rd32=add(Rs32,Rt32)",ATTRIBS(),
"Add 32-bit registers",
{ RdV=RsV+RtV;})

Instruction attributes: data structures that hold instruction attributes, available to C programs as header files. The header files are generated by a set of scripts in third_party/qemu-hexagon/, and consumed at runtime by the instruction decoder.
Decoder: decodes a vector of 32-bit words into a sequence of Hexagon instructions, grouped in a single packet. The decoder fails safe when it cannot decode a given input. For each instruction, it fills in the instruction id (or tag) and the operand information (immediate values or registers).
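
As an illustration of how 32-bit words can be grouped into a packet, the following C++17 sketch finds the packet boundary. It is not the plugin's decoder. It assumes, based on the Hexagon programmer's reference manual rather than on this document, that bits [15:14] of each word are parse bits, where 0b11 ends the packet and 0b00 marks a duplex word that also ends it. ParseBits and PacketLength are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr std::size_t kMaxWordsPerPacket = 4;

// ASSUMPTION (from the Hexagon PRM, not from this document): bits [15:14]
// of each word are "parse bits"; 0b11 ends the packet, 0b00 marks a duplex
// word (which also ends it), 0b01 and 0b10 mean more words follow.
inline uint32_t ParseBits(uint32_t word) { return (word >> 14) & 0x3u; }

// Returns how many words, starting at index `first`, form one packet, or
// std::nullopt when no end-of-packet marker shows up within four words.
std::optional<std::size_t> PacketLength(const std::vector<uint32_t> &words,
                                        std::size_t first) {
  assert(first <= words.size());
  for (std::size_t n = 0;
       n < kMaxWordsPerPacket && first + n < words.size(); ++n) {
    const uint32_t parse = ParseBits(words[first + n]);
    if (parse == 0b11u || parse == 0b00u) return n + 1;
  }
  return std::nullopt;  // fail safe: do not guess a packet boundary
}
```

Returning std::nullopt instead of a best guess mirrors the fail-safe behavior described for the decoder: a caller can then treat the input as undecodable rather than lift a malformed packet.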
Instruction Text Tokens Generator: gen_insn_text_funcs.py parses the instruction definitions and generates code that implements BN's GetInstructionText API for each instruction. It parses the behavior descriptor using a grammar, then transforms the resulting tree into a sequence of BinaryNinja::InstructionTextToken objects. For example, A2_add has the descriptor "Rd32=add(Rs32,Rt32)". This is parsed into the following tree:

assign_to_op
  reg
    Rd32
  ...
  call_exp
    ...
    call2
      insn      add
      reg
        Rs32
      reg
        Rt32

and transformed into the following sequence of tokens:

void tokenize_A2_add(uint64_t pc, const Packet &pkt, const Insn &insn,
                     std::vector<InstructionTextToken> &result) {
  result.emplace_back(RegisterToken, StrCat("R", insn.regno[0]));
  result.emplace_back(TextToken, " = ");
  result.emplace_back(InstructionToken, "add");
  result.emplace_back(TextToken, "(");
  result.emplace_back(RegisterToken, StrCat("R", insn.regno[1]));
  result.emplace_back(TextToken, ",");
  result.emplace_back(RegisterToken, StrCat("R", insn.regno[2]));
  result.emplace_back(TextToken, ")");
}
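
To show what such a token sequence renders to, here is a small self-contained C++ sketch that mirrors the generated tokenizer without depending on the Binary Ninja API. ToyToken, ToyTokenType, TokenizeAdd and Render are hypothetical stand-ins for illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for Binary Ninja's token types.
enum class ToyTokenType { Text, Instruction, Register };

struct ToyToken {
  ToyTokenType type;
  std::string text;
};

// Builds the same token sequence as the generated A2_add tokenizer above.
std::vector<ToyToken> TokenizeAdd(int rd, int rs, int rt) {
  assert(rd >= 0 && rs >= 0 && rt >= 0);
  auto reg = [](int n) {
    return ToyToken{ToyTokenType::Register, "R" + std::to_string(n)};
  };
  std::vector<ToyToken> result;
  result.push_back(reg(rd));
  result.push_back(ToyToken{ToyTokenType::Text, " = "});
  result.push_back(ToyToken{ToyTokenType::Instruction, "add"});
  result.push_back(ToyToken{ToyTokenType::Text, "("});
  result.push_back(reg(rs));
  result.push_back(ToyToken{ToyTokenType::Text, ","});
  result.push_back(reg(rt));
  result.push_back(ToyToken{ToyTokenType::Text, ")"});
  return result;
}

// Concatenates the token texts, roughly what a disassembly view displays.
std::string Render(const std::vector<ToyToken> &tokens) {
  std::string out;
  for (const ToyToken &t : tokens) out += t.text;
  return out;
}
```

For register numbers 0, 1 and 2, Render(TokenizeAdd(0, 1, 2)) yields the text R0 = add(R1,R2), while the token types remain available for syntax highlighting.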

Instruction Utils: this module implements BN's GetInstructionText API by calling the generated instruction tokenizers. It also implements BN's GetInstructionInfo API: it analyzes the decoder's output and reports the packet's branch targets.
Packet Database: maps binary addresses to instruction packets. Binary Ninja works at the level of a single instruction; however, properly modeling an instruction requires knowledge of the neighboring instructions in its packet.
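
One way to picture the address-to-packet mapping is an ordered map keyed by packet start address. The C++17 sketch below is an assumption about how such a lookup could work, not the plugin's packet_db implementation; ToyPacket and ToyPacketDb are hypothetical names, and each instruction word is taken to be 4 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical packet record: only what the lookup needs.
struct ToyPacket {
  uint64_t start;      // address of the first instruction word
  uint32_t num_words;  // 1..4 words of 4 bytes each
};

class ToyPacketDb {
 public:
  void Add(uint64_t start, uint32_t num_words) {
    assert(num_words >= 1 && num_words <= 4);
    packets_.insert_or_assign(start, ToyPacket{start, num_words});
  }

  // Finds the packet whose address range contains `addr`, if any.
  std::optional<ToyPacket> Lookup(uint64_t addr) const {
    auto it = packets_.upper_bound(addr);  // first packet starting after addr
    if (it == packets_.begin()) return std::nullopt;
    --it;                                  // last packet starting at or before addr
    const uint64_t end = it->second.start + 4ull * it->second.num_words;
    if (addr >= end) return std::nullopt;
    return it->second;
  }

 private:
  std::map<uint64_t, ToyPacket> packets_;
};
```

With this layout, a request for the instruction at any address inside a packet recovers the whole packet, and the instruction's index within it is (addr - start) / 4.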
Instruction IL Generator: gen_il_funcs.py parses the instruction definitions and generates code that implements BN's GetInstructionLowLevelIL API for each (supported) instruction. It parses the semantics descriptor using a C-like grammar, then transforms the resulting tree into a sequence of operations on a BinaryNinja::LowLevelILFunction object. This builds an equivalent symbolic model, and effectively lifts the instruction. For example, A2_add has the semantics descriptor "{ RdV=RsV+RtV;}". This C code is parsed into the following tree:

  multi_stmt
    assg_stmt
      assg
        RdV
        expr_binop
          RsV
          +
          RtV

and transformed into the following sequence of operations:

void lift_A2_add(Architecture *arch, uint64_t pc, const Packet &pkt,
                 const Insn &insn, int insn_num, PacketContext &ctx) {
  LowLevelILFunction &il = ctx.IL();
  const int RdV = ctx.AddDestWriteOnlyReg(MapRegNum('R', insn.regno[0]));
  const int RsV = MapRegNum('R', insn.regno[1]);
  const int RtV = MapRegNum('R', insn.regno[2]);
  il.AddInstruction(il.SetRegister(
      4, RdV, il.Add(4, il.Register(4, RsV), il.Register(4, RtV))));
}
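
The tree-to-code transformation can be sketched as a small recursive emitter. The real generator is the Python script gen_il_funcs.py; the C++17 sketch below only illustrates the idea, and ToyExpr, EmitExpr and EmitAssign are hypothetical names. It handles register leaves and two binary operators, and fixes the operand size at 4 bytes.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical expression node: a register leaf or a binary operator.
struct ToyExpr {
  std::string value;              // register name (leaf) or operator symbol
  std::vector<ToyExpr> operands;  // empty for a leaf, two entries for a binop
};

// Recursively emits the IL-building expression for one tree node.
std::string EmitExpr(const ToyExpr &e) {
  if (e.operands.empty()) return "il.Register(4, " + e.value + ")";
  assert(e.operands.size() == 2);
  std::string fn;
  if (e.value == "+") {
    fn = "il.Add";
  } else if (e.value == "^") {
    fn = "il.Xor";
  } else {
    throw std::invalid_argument("unsupported operator: " + e.value);
  }
  return fn + "(4, " + EmitExpr(e.operands[0]) + ", " +
         EmitExpr(e.operands[1]) + ")";
}

// Emits the statement for an assignment node such as RdV = <expr>.
std::string EmitAssign(const std::string &dest, const ToyExpr &rhs) {
  return "il.AddInstruction(il.SetRegister(4, " + dest + ", " +
         EmitExpr(rhs) + "));";
}
```

Feeding it the tree for RdV=RsV+RtV reproduces the statement in lift_A2_add. Throwing on unknown operators matches the note that only supported instructions get a lifter.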

A more involved example is the A2_pxort instruction. It has the following semantics:

SEMANTICS( \
    "A2_pxort", \
    "if (Pu4) ""Rd32=xor(Rs32,Rt32)", \
    """{if(fLSBOLD(PuV)){RdV=RsV^RtV;} else {CANCEL;}}""" \
)

This is modeled using IL "if" statements:

void lift_A2_pxort(Architecture *arch, uint64_t pc, const Packet &pkt,
                   const Insn &insn, int insn_num, PacketContext &ctx) {
  LowLevelILFunction &il = ctx.IL();
  const int PuV = MapRegNum('P', insn.regno[0]);
  const int RdV = ctx.AddDestWriteOnlyReg(MapRegNum('R', insn.regno[1]));
  const int RsV = MapRegNum('R', insn.regno[2]);
  const int RtV = MapRegNum('R', insn.regno[3]);
  {
    LowLevelILLabel true_case, done;
    il.AddInstruction(il.If(il.Register(4, PuV), true_case, done));
    il.MarkLabel(true_case);
    il.AddInstruction(il.SetRegister(
        4, RdV, il.Xor(4, il.Register(4, RsV), il.Register(4, RtV))));
    il.MarkLabel(done);
  }
}
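
For comparison, the A2_pxort semantics can also be evaluated directly on concrete values. The C++ sketch below is illustrative only. It assumes that fLSBOLD tests the least significant bit of the predicate, and that CANCEL leaves the destination register unchanged. LsbOld and ApplyPxort are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// ASSUMPTION: fLSBOLD tests the least significant bit of the predicate.
inline bool LsbOld(uint8_t predicate) { return (predicate & 1u) != 0; }

// Concrete evaluation of: if (Pu) Rd = xor(Rs, Rt)
void ApplyPxort(std::array<uint32_t, 32> &regs, uint8_t pu, std::size_t rd,
                std::size_t rs, std::size_t rt) {
  assert(rd < regs.size() && rs < regs.size() && rt < regs.size());
  if (LsbOld(pu)) {
    regs[rd] = regs[rs] ^ regs[rt];
  }
  // else: CANCEL, so Rd keeps its previous value (assumed meaning).
}
```

The true branch corresponds to the SetRegister between the true_case and done labels in the lifter; the cancelled case corresponds to jumping straight to done.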

IL Utils: this module implements BN's GetInstructionLowLevelIL API by calling the generated instruction lifters. It lifts all instructions in a packet, and models the packet's branch semantics.
Packet Context: an auxiliary object that tracks all registers clobbered in a packet. It is used by the IL Utils module.
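
A rough C++17 sketch of such bookkeeping follows. It assumes, without confirmation from the source, that each destination register is redirected to a temporary while the packet is lifted and copied back afterwards. ToyPacketContext, AddDestReg, WriteBacks and kToyTempBase are hypothetical and do not reflect the plugin's PacketContext interface.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

class ToyPacketContext {
 public:
  // Records `reg` as clobbered and returns the temporary standing in for it.
  uint32_t AddDestReg(uint32_t reg) {
    assert(reg < kToyTempBase);
    clobbered_.emplace(reg, kToyTempBase + reg);  // no-op if already tracked
    return clobbered_.at(reg);
  }

  // (temporary, register) pairs to copy back once the packet is done,
  // ordered by register number.
  std::vector<std::pair<uint32_t, uint32_t>> WriteBacks() const {
    std::vector<std::pair<uint32_t, uint32_t>> out;
    for (const auto &[reg, temp] : clobbered_) out.emplace_back(temp, reg);
    return out;
  }

 private:
  static constexpr uint32_t kToyTempBase = 1000;  // hypothetical temp id space
  std::map<uint32_t, uint32_t> clobbered_;
};
```

Routing writes through temporaries keeps every instruction in the packet reading the register values from packet entry, and the list returned by WriteBacks is what a packet-level lifter would emit as final copies.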
Plugin: the program's entry point. It implements and registers the new 'Hexagon' architecture module. The architecture module stores decoded instructions in packet_db, calls out to insn_util to disassemble instructions, and calls il_util to lift packets.

References

QEMU's Hexagon target by Taylor Simpson from Qualcomm Innovation Center.
Qualcomm Hexagon V67 Programmer’s Reference Manual, 80-N2040-45 Rev. B, February 25, 2020. Can be downloaded from Hexagon SDK website.
Binary Ninja API and documentation.
Official BN architecture plugins: arch-x86, arch-arm64, arch-mips.
