Skip to content

riscv_dis_opts_hashtable

Tsukasa OI edited this page Aug 5, 2022 · 20 revisions

Disassembler: Use faster hash table

Conflicts With

Aggregate performance benchmark should be available here.

  1. (You are here) Disassembler: Use faster hash table
  2. Disassembler: Minor optimizations (batch 1)
  3. Disassembler: Cache instruction class support

Feature Description

This patchset intends to improve performance on disassembling RISC-V code (which may possibly contain invalid data). It replaces riscv_hash (on opcodes/riscv-dis.c) with much faster data structure: sorted and partitioned hash table.

This is a technique actually used on SPARC architecture (opcodes/sparc-dis.c) and I simplified the algorithm even further. Unlike SPARC, RISC-V hashed opcode table is not a table to linked lists, it's just a table, pointing "start" elements in the sorted opcode list (per hash code) and a global tail.

I benchmarked some of the programs and I usually get 14-20% performance improvements while disassembling code section of compiled RISC-V ELF files (objdump -d $FILE). That is on the border of significance and pretty nice for such a small modification (with about 11KB heap memory allocation on 64-bit environment).

This is not the end. This structure significantly improves plain binary file handling (on objdump, objdump -b binary -m riscv:rv[32|64] -D $FILE). I tested on various binary files including random one and big vmlinux images and I confirmed significant performance improvements (over 50% on many cases). This is partially due to the fact that, disassembling about one quarter of invalid "instruction" words required iterating over one thousand opcode entries (348 or more being vector instructions with OP-V, that can be easily skipped with this new data structure). Another reason for this significance is it doesn't have various ELF overhead.

Other Fixes

Performance Improvements

On disassembling linked RISC-V ELF programs using objdump, performance improvements achieved by this patchset is usually about 14-20%. Pretty good for such a simple change.

This is relative to the latest master (commit e4e340a3ff2).

objdump -d (ELF)

Program Improvements Notes
Busybox 1.35.1 (RV64GC) 16.0-18.0%
OpenSBI 1.1 (generic fw_*.elf) 21.8-22.0%
Linux kernel 5.18 (vmlinux) 14.4-15.6%
Linux kernel 5.18 (vmlinux.o) 0.8-6.7% Not finally linked
glibc (libc.so.6) 13.6-16.8%

objdump -d (ELF-based archive)

Program Improvements
glibc (libc.a) 2.5-2.6%
newlib (libc.a) 2.3-4.9%

objdump -D (binary)

Program Improvements
Linux kernel 5.18 (vmlinux) 48.7-64.6%
Random files (/dev/urandom) 64.9-87.6%
1M (1048576) CSR instructions 97.3%

gdb: disas of near all code region

Program Improvements
Linux kernel 5.18 (vmlinux) with debug info 0.9%
Linux kernel 5.18 (vmlinux) without debug info 1.7%
OpenSBI 1.1 (generic fw_*.elf) 2.4-2.7%
1M (1048576) CSR instructions (ELF) 26.3%

Batch: objdump -d on Linux distribution

Serial Run: All /usr/bin Programs

System N Improvements
Ubuntu 22.04 LTS (image for HiFive Unmatched) 563 14.0%
Debian unstable (as of 2022-07-20) 269 14.3%

Parallel Run: Top 100 in Size (including data-only ELFs)

System N Improvements
Ubuntu 22.04 LTS (image for HiFive Unmatched) 100 7.4%
Debian unstable (as of 2022-07-20) 100 2.8%

Parallel Run: All (including data-only ELFs)

System N Improvements
Ubuntu 22.04 LTS (image for HiFive Unmatched) 7666 7.1%
Debian unstable (as of 2022-07-20) 946 2.2%

Batch: objdump -D (as binary) on Linux distribution

Serial Run: All /usr/bin Programs

System N Improvements
Ubuntu 22.04 LTS (image for HiFive Unmatched) 563 31.3%
Debian unstable (as of 2022-07-20) 269 25.0%

Parallel Run: Top 100 in Size (including data-only ELFs)

System N Improvements
Ubuntu 22.04 LTS (image for HiFive Unmatched) 100 50.4%
Debian unstable (as of 2022-07-20) 100 32.4%

Parallel Run: All (including data-only ELFs)

System N Improvements
Ubuntu 22.04 LTS (image for HiFive Unmatched) 7666 38.3%
Debian unstable (as of 2022-07-20) 946 27.8%
Clone this wiki locally