riscv_dis_opts_minor_1

Tsukasa OI edited this page Jul 21, 2022 · 23 revisions

Disassembler: Minor optimizations (batch 1)

Requires

Aggregate performance benchmark should be available here.

  1. Disassembler: Use faster hash table
  2. (You are here) Disassembler: Minor optimizations (batch 1)
  3. Disassembler: Cache instruction class support

Feature Description

This patchset reorganizes the core disassembler for three purposes: the minor optimizations contained in this patchset, future optimizations with bigger performance improvements, and clarity.

Some of the changes yield big performance improvements in certain circumstances, but those improvements are rarely significant in real-world benchmarks.

It will be a prerequisite of upcoming core disassembler changes:

1a. New state initialization functions with minimized calls (PATCH 1/7)

The current disassembler rechecks the following state for every instruction:

  • Whether riscv_gpr_names is initialized to a non-NULL value
  • Whether the Zfinx extension is available
  • Whether the hash table is initialized

... but this state rarely changes.

Calling riscv_subset_supports repeatedly harms performance in a measurable way (3-13% in total), and this patchset reduces the Zfinx query from once per instruction to once per {arch,option} change.

Note that this takes care of only a small part of the per-instruction calls to riscv_subset_supports, so this change alone does not measurably improve performance (I will submit another batch of real optimizations later).

riscv_gpr_names is initialized to a non-NULL value either:

  • When the first disassembler option is given, or
  • On the first print_insn_riscv function call, if no disassembler option has initialized it.

We can safely initialize the default disassembler options before print_insn_riscv is called, so the per-instruction check of riscv_gpr_names can be removed.

Whether the opcode hash table (both the optimized one and the old one) is initialized is also checked per instruction, and this check can be replaced, too. For now, a one-time initialization is sufficient.

Instead, this patchset adds two new functions:

  • init_dis_state_for_arch
    (called when the architecture is changed)
  • init_dis_state_for_arch_and_options
    (called when either the architecture or an option is changed)

Common state initialization can be grouped into these two functions.
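A minimal sketch of how this two-level initialization might look. Only init_dis_state_for_arch and init_dis_state_for_arch_and_options are names from this patchset; the signatures, the cached variables, the counting stand-in query_zfinx and the helper first_fp_arg_name are hypothetical and only illustrate moving a query out of the per-instruction path.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an expensive query such as
   riscv_subset_supports; it counts how often it is called.  */
static int query_calls = 0;

static bool
query_zfinx (const char *arch)
{
  query_calls++;
  return strstr (arch, "zfinx") != NULL;
}

/* Hypothetical cached state (not the real riscv-dis.c layout).  */
static const char *cur_arch = "";
static bool is_zfinx = false;
static bool use_numeric_names = false;

/* Called only when the architecture changes.  */
static void
init_dis_state_for_arch (const char *arch)
{
  cur_arch = arch;
}

/* Called when either the architecture or an option changes.
   The Zfinx query happens here, not once per instruction.  */
static void
init_dis_state_for_arch_and_options (bool numeric)
{
  use_numeric_names = numeric;
  is_zfinx = query_zfinx (cur_arch);
}

/* Per-instruction path: reads cached state and performs no queries.
   With Zfinx, floating-point operands live in the integer registers.  */
static const char *
first_fp_arg_name (void)
{
  if (is_zfinx)
    return use_numeric_names ? "x10" : "a0";
  return use_numeric_names ? "f10" : "fa0";
}
```

In this sketch, calling first_fp_arg_name any number of times after one initialization leaves query_calls at 1; it only increases when the architecture or an option changes.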

1b. Minimize default architecture initialization (PATCH 2/7)

objdump reuses the disassembler returned by the disassembler function.

That's good and benefits well from my optimizations.

However, by default, GDB (default_print_insn in gdb/arch-utils.c) assumes that the logic of the disassembler function is simple and calls it for every instruction to be disassembled. This is clearly a waste of time because it probes the BFD (ELF information) and re-initializes riscv_rps_dis for every instruction.

To deal with this, we cache the default architecture string and minimize calls to the riscv_parse_subset function (it is not called when the architecture is unchanged).

A long disas command in GDB benefits from this:
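The caching idea can be sketched as follows. This is an illustration under assumptions, not the code from the patch: parse_arch is a hypothetical stand-in for riscv_parse_subset that only counts calls, and cached_arch and update_default_arch are made-up names.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for riscv_parse_subset; counts invocations.  */
static int parse_calls = 0;

static void
parse_arch (const char *arch)
{
  (void) arch;
  parse_calls++;
}

/* Cached copy of the last default architecture string (hypothetical).  */
static char *cached_arch = NULL;

/* Re-parse ARCH only if it differs from the cached string.
   Returns true if a parse was performed.  */
static bool
update_default_arch (const char *arch)
{
  if (cached_arch != NULL && strcmp (cached_arch, arch) == 0)
    return false;		/* Unchanged: skip the expensive parse.  */

  char *copy = malloc (strlen (arch) + 1);
  if (copy == NULL)
    abort ();
  strcpy (copy, arch);

  free (cached_arch);
  cached_arch = copy;
  parse_arch (arch);
  return true;
}
```

A caller that asks for the same architecture on every instruction, as GDB does by default, then pays for one string comparison per instruction instead of a full re-parse.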

# Example: disassemble most of the code in the Linux kernel
file /path/to/vmlinux
disas 0xffffffff80002000,0xffffffff80658aa4

In a benchmark using a GDB batch file like the example above, I measured big performance improvements (27-83% on various big RISC-V programs). Unfortunately, in interactive use of GDB, this improvement is rarely observable since we don't usually disassemble such a big chunk at once and the current disassembler is not very slow.

2. Move register name array initialization (PATCH 3/7)

This doesn't actually improve the performance but...

  • Register name initialization (involving Zfinx) is now clearer and
  • It prepares for a future implementation of the RV32E disassembler.

3. Fix CSR hashtable initialization (PATCH 4/7)

The current disassembler intends to initialize the CSR hashtable the first time CSR name parsing is required. This is managed by a function-scope static variable init_csr but... there's a problem.

It's never set to true.

Because of this bug, the current disassembler actually initializes the CSR hashtable every time CSR name parsing is required.

After fixing the bug, I got about a 70% performance improvement when disassembling thousands of CSR-only instructions (CSR instructions are rare in general, so the real-world performance improvement is not that high, and is hardly even measurable).

While fixing the bug, I found another issue (masked by the bug above).

When default_priv_spec is changed, the CSR hashtable must be reinitialized so that it reflects the specified privileged specification. In objdump, there's no way to change it dynamically, but GDB has one. Before the fix above, the CSR hashtable was regenerated every time, so the specified privileged specification was reflected immediately. Now, however, the CSR hashtable has to be reinitialized explicitly.

So, I replaced init_csr with a file-scope static variable is_init_csr, which is reset when init_dis_state_for_arch_and_options is called.
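The bug and the fix can be illustrated with a small self-contained sketch. Only init_csr, is_init_csr and default_priv_spec are names from the text; the integer spec IDs, the build counter, and the functions build_csr_table, lookup_csr_buggy, lookup_csr_fixed and on_options_changed are hypothetical simplifications, not the actual riscv-dis.c code.

```c
#include <stdbool.h>

static int table_builds = 0;	  /* Counts rebuilds, for illustration.  */
static int default_priv_spec = 1; /* Hypothetical numeric spec ID.  */
static int active_priv_spec = 0;  /* Spec the table currently reflects.  */

/* File-scope flag, so that other code can reset it.  */
static bool is_init_csr = false;

/* Stand-in for building the CSR hashtable for default_priv_spec.  */
static void
build_csr_table (void)
{
  table_builds++;
  active_priv_spec = default_priv_spec;
}

/* Buggy pattern: the function-scope flag is never set to true,
   so the table is rebuilt on every call.  */
static void
lookup_csr_buggy (void)
{
  static bool init_csr = false;
  if (!init_csr)
    build_csr_table ();		/* Missing: init_csr = true;  */
}

/* Fixed pattern: build once, then reuse until the flag is reset.  */
static void
lookup_csr_fixed (void)
{
  if (!is_init_csr)
    {
      build_csr_table ();
      is_init_csr = true;
    }
}

/* Stand-in for the reset done from init_dis_state_for_arch_and_options:
   a changed privileged spec must invalidate the table.  */
static void
on_options_changed (int new_priv_spec)
{
  default_priv_spec = new_priv_spec;
  is_init_csr = false;
}
```

Making the flag file-scope is what allows the reset: a function-scope static would be unreachable from the option-handling code.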

Performance Improvements

When disassembling linked RISC-V ELF programs with objdump, the performance improvement achieved by this patchset is usually about 5-6%. Not bad for simple changes.

These figures are relative to the previous optimization (Disassembler: Use faster hash table).

objdump -d (ELF)

| Program | Improvements | Notes |
| --- | --- | --- |
| Busybox 1.35.1 (RV64GC) | 6.2-6.3% | |
| OpenSBI 1.1 (generic fw_*.elf) | 9.4-9.4% | |
| Linux kernel 5.18 (vmlinux) | 4.7-4.8% | |
| Linux kernel 5.18 (vmlinux.o) | (-1.4)-1.0% | Not finally linked |
| glibc (libc.so.6) | 5.0-5.3% | |

objdump -d (ELF-based archive)

| Program | Improvements |
| --- | --- |
| glibc (libc.a) | (-2.0)-(-1.7)% |
| newlib (libc.a) | (-2.0)-(-2.0)% |

objdump -D (binary)

| Program | Improvements |
| --- | --- |
| Linux kernel 5.18 (vmlinux) | 6.8-6.9% |
| Random files (/dev/urandom) | 6.0-6.6% |
| 1M (1048576) CSR instructions | 73.7% |

gdb: disas of nearly the entire code region

| Program | Improvements |
| --- | --- |
| Linux kernel 5.18 (vmlinux) with debug info | 27.3% |
| Linux kernel 5.18 (vmlinux) without debug info | 83.0% |
| OpenSBI 1.1 (generic fw_*.elf) | 68.8-69.3% |
| 1M (1048576) CSR instructions (ELF) | 131.2% |

Batch: objdump -d on Linux distribution

Serial Run: All /usr/bin Programs

| System | N | Improvements |
| --- | --- | --- |
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | 563 | 5.0% |
| Debian unstable (as of 2022-07-20) | 269 | 5.0% |

Parallel Run: Top 100 in Size (including data-only ELFs)

| System | N | Improvements |
| --- | --- | --- |
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | 100 | 1.5% |
| Debian unstable (as of 2022-07-20) | 100 | 0.6% |

Parallel Run: All (including data-only ELFs)

| System | N | Improvements |
| --- | --- | --- |
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | 7666 | 1.4% |
| Debian unstable (as of 2022-07-20) | 946 | 0.4% |