
riscv_dis_opts_minor_1

Tsukasa OI edited this page Aug 31, 2022 · 23 revisions

Disassembler: Minor optimizations (batch 1)

Requires

Aggregate performance benchmark should be available here.

  1. Disassembler: Use faster hash table
  2. (You are here) Disassembler: Minor optimizations (batch 1)
  3. Disassembler: Cache instruction class support

Feature Description

This patchset reorganizes the core disassembler to make room for a few minor optimizations (included in this patchset) and improves code clarity in preparation for further optimizations.

There are some large performance improvements in certain circumstances, but they are rarely significant in real-world benchmarks.

It will be a prerequisite of upcoming core disassembler changes:

1a. New state initialization functions with minimized calls (PATCH 1/7)

The current disassembler checks the following state for every instruction:

  • Whether riscv_gpr_names is initialized to a non-NULL value
  • Whether the Zfinx extension is available
  • Whether the hash table is initialized

... but this state rarely changes.

Calling riscv_subset_supports repeatedly hurts performance in a measurable way (3-13% in total), and this patchset reduces the Zfinx query from once per instruction to once per {arch,option} change.

Note that this takes care of only a small part of the per-instruction calls to riscv_subset_supports, so the effect of this change is small. My next optimization patchset deals with the rest.

riscv_gpr_names is initialized to a non-NULL value either:

  • when the first disassembler option is given, or
  • in the first print_insn_riscv function call, if it has not been initialized by disassembler options.

We can safely initialize the default disassembler options before print_insn_riscv is called, so this per-instruction check of riscv_gpr_names can be removed.

The per-instruction check of whether the opcode hash table is initialized (this applies to both the optimized table and the old one) can be replaced as well. For now, one-time initialization is sufficient.

In place of these per-instruction checks, this patchset adds two new functions:

  • init_dis_state_for_arch
    (called when the architecture is changed)
  • init_dis_state_for_arch_and_options
    (called when either the architecture or an option is changed)

We can group common state initialization together with those.
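
As a rough illustration of the idea (not the code from the patch), the sketch below moves an expensive extension query out of the per-instruction path and into the two initialization functions. Only the two function names come from the text above. The register name tables, arch_supports (a toy stand-in for a query like riscv_subset_supports that merely does a substring match), the subset_queries counter and fp_operand_name are hypothetical, and which state each function owns is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, truncated name tables; not the binutils definitions.  */
static const char *const abi_gpr_names[] = { "zero", "ra", "sp" };
static const char *const abi_fpr_names[] = { "ft0", "ft1", "ft2" };

static const char *current_arch;        /* set on architecture change */
static const char *const *gpr_names;    /* set on arch or option change */
static const char *const *fpr_names;
static unsigned subset_queries;         /* counts the expensive queries */

/* Toy stand-in for an expensive extension query.  */
static bool
arch_supports (const char *arch, const char *ext)
{
  assert (arch != NULL && ext != NULL);
  subset_queries++;
  return strstr (arch, ext) != NULL;
}

/* Called only when the architecture changes.  */
static void
init_dis_state_for_arch (const char *arch)
{
  current_arch = arch;
}

/* Called when either the architecture or an option changes.
   The Zfinx query happens here, not once per instruction.  */
static void
init_dis_state_for_arch_and_options (void)
{
  assert (current_arch != NULL);
  gpr_names = abi_gpr_names;
  /* With Zfinx, floating-point operands live in the integer registers.  */
  fpr_names = arch_supports (current_arch, "zfinx")
              ? abi_gpr_names : abi_fpr_names;
}

/* Per-instruction path: reads cached state only, no checks or queries.  */
static const char *
fp_operand_name (unsigned regno)
{
  assert (regno < 3);
  return fpr_names[regno];
}
```

With this shape, disassembling any number of instructions after one architecture change costs a single extension query, because the per-instruction function never calls arch_supports.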

1b. Minimize default architecture initialization (PATCH 2/7)

objdump reuses the disassembler returned by the disassembler function.

That's good, and it benefits well from my optimizations.

However, by default, GDB (default_print_insn in gdb/arch-utils.c) assumes that the disassembler function logic is simple and calls that function for every instruction to be disassembled. This is clearly a waste of time, because the function probes BFD (ELF information) and re-initializes riscv_rps_dis for every instruction.

To deal with this, we cache the default architecture string and minimize calls to the riscv_parse_subset function (it is not called when the architecture is unchanged).

A long disas command in GDB benefits from this:
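
The caching can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: cached_default_arch, parse_arch (a stand-in for an expensive parser such as riscv_parse_subset), parse_calls and update_default_arch are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static char *cached_default_arch;   /* hypothetical cache, owned here */
static unsigned parse_calls;        /* counts the expensive parses */

/* Stand-in for an expensive architecture string parser.  */
static void
parse_arch (const char *arch)
{
  (void) arch;
  parse_calls++;
}

/* Re-parse only when the architecture string differs from the cached
   one.  Returns true if a parse actually happened.  */
static bool
update_default_arch (const char *arch)
{
  assert (arch != NULL);
  if (cached_default_arch != NULL
      && strcmp (cached_default_arch, arch) == 0)
    return false;                   /* unchanged: skip the parser */

  size_t len = strlen (arch) + 1;
  char *copy = malloc (len);
  if (copy == NULL)
    abort ();
  memcpy (copy, arch, len);

  free (cached_default_arch);
  cached_default_arch = copy;
  parse_arch (arch);
  return true;
}
```

A caller that asks for the disassembler once per instruction, as described above for GDB, then pays for one string comparison per instruction instead of a full parse.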

# Example: disassemble most of the code in the Linux kernel
file  /path/to/vmlinux
disas 0xffffffff80002000,0xffffffff80658aa4

In a benchmark using a GDB batch file like the example above, I measured large performance improvements (27-83% on various large RISC-V programs). Unfortunately, in interactive use of GDB, this improvement is rarely observable, since we don't usually disassemble such a large chunk at once and the current disassembler is not very slow.

2. Move register name array initialization (PATCH 3/7)

This doesn't actually improve performance, but:

  • Register name initialization (involving Zfinx) is now clearer, and
  • It prepares for a future implementation of the RV32E disassembler.

3. Fix CSR hash table initialization (PATCH 4/7)

The current disassembler intends to initialize the CSR hash table the first time CSR name parsing is required. This is managed by a function-scope static variable init_csr, but there's a problem.

It's never set to true.

Because of this bug, the current disassembler actually initializes the CSR hash table every time CSR name parsing is required.

After fixing the bug, I measured about a 70% performance improvement when disassembling thousands of CSR-only instructions. CSR instructions are rare in general, so the real-world improvement is not that high (minor, but measurable).

While fixing the bug, I found another issue that it had been masking.

When default_priv_spec is changed, the CSR hash table must be reinitialized so that it reflects the specified privileged specification. objdump has no way to change it dynamically, but GDB does. Before the fix, the CSR hash table was regenerated every time, so a newly specified privileged specification took effect immediately. With the fix, the CSR hash table has to be reinitialized explicitly.

So I replaced init_csr with a file-scope static variable, is_init_csr, and reset it whenever either the architecture or the privileged specification is changed.
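
The shape of the bug and of the fix can be sketched as below. Only the names init_csr and is_init_csr come from the text above. build_csr_table, the lookup functions, set_priv_spec and the integer specification IDs are hypothetical stand-ins, and the table itself is reduced to a build counter.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned csr_table_builds;   /* counts the expensive rebuilds */
static int current_priv_spec = 1;   /* hypothetical spec version ID */
static int table_priv_spec;         /* spec the table was built for */

/* Stand-in for building the CSR hash table.  */
static void
build_csr_table (void)
{
  csr_table_builds++;
  table_priv_spec = current_priv_spec;
}

/* Buggy shape: the guard is never set to true, so the table is
   rebuilt on every lookup.  */
static void
lookup_csr_buggy (void)
{
  static bool init_csr = false;
  if (!init_csr)
    build_csr_table ();             /* missing: init_csr = true;  */
}

/* Fixed shape: a file-scope flag that other code can reset.  */
static bool is_init_csr = false;

static void
lookup_csr_fixed (void)
{
  if (!is_init_csr)
    {
      build_csr_table ();
      is_init_csr = true;
    }
}

/* Changing the privileged specification invalidates the table.  */
static void
set_priv_spec (int spec)
{
  assert (spec > 0);
  if (spec != current_priv_spec)
    {
      current_priv_spec = spec;
      is_init_csr = false;
    }
}
```

The flag has to live at file scope because a function-scope static cannot be reset from the code that handles an architecture or privileged specification change.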

4. Various Tidying (PATCH 5-7/7)

These patches are intended to make the core disassembler clearer.

  1. PATCH 5/7: Make boolean variable `bool'
  2. PATCH 6/7: Make "private" symbols really private (making them static)
  3. PATCH 7/7: Add some comments

Performance Improvements

When disassembling linked RISC-V ELF programs with objdump, the performance improvement achieved by this patchset is usually about 4-5%. Not bad for simple changes.

These figures are relative to the previous optimization.

objdump -d (ELF)

| Program | Improvements | Notes |
|---|---|---|
| Busybox 1.35.1 (RV64GC) | 4.3-4.3% | |
| OpenSBI 1.1 (generic fw_*.elf) | 8.1-8.4% | |
| Linux kernel 5.18 (vmlinux) | 4.5-4.7% | |
| Linux kernel 5.18 (vmlinux.o) | 1.7-4.3% | Not finally linked |
| glibc (libc.so.6) | 5.0-5.3% | |

objdump -d (ELF-based archive)

| Program | Improvements |
|---|---|
| glibc (libc.a) | 2.8-3.4% |
| newlib (libc.a) | 2.9-3.9% |

objdump -D (binary)

| Program | Improvements |
|---|---|
| Linux kernel 5.18 (vmlinux) | 4.6-5.7% |
| Random files (/dev/urandom) | 4.4-4.8% |
| 1M (1048576) CSR instructions | 82.1% |

gdb: disas of nearly the entire code region

| Program | Improvements |
|---|---|
| Linux kernel 5.18 (vmlinux) with debug info | 27.3% |
| Linux kernel 5.18 (vmlinux) without debug info | 83.7% |
| OpenSBI 1.1 (generic fw_*.elf) | 70.8-71.0% |
| 1M (1048576) CSR instructions (ELF) | 132.0% |

Batch: objdump -d on Linux distribution

Serial Run: All /usr/bin Programs

| System | N | Improvements |
|---|---|---|
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | 563 | 3.7% |
| Debian unstable (as of 2022-07-20) | 269 | 3.7% |

Parallel Run: Top 100 in Size (including data-only ELFs)

| System | N | Improvements |
|---|---|---|
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | 100 | 3.0% |
| Debian unstable (as of 2022-07-20) | 100 | 0.4% |

Parallel Run: All (including data-only ELFs)

| System | N | Improvements |
|---|---|---|
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | 7666 | 2.8% |
| Debian unstable (as of 2022-07-20) | 946 | 1.9% |

Batch: objdump -D (as binary) on Linux distribution

Serial Run: All /usr/bin Programs

| System | N | Improvements |
|---|---|---|
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | 563 | 4.9% |
| Debian unstable (as of 2022-07-20) | 269 | 4.8% |

Parallel Run: Top 100 in Size (including data-only ELFs)

| System | N | Improvements |
|---|---|---|
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | 100 | 4.9% |
| Debian unstable (as of 2022-07-20) | 100 | 4.9% |

Parallel Run: All (including data-only ELFs)

| System | N | Improvements |
|---|---|---|
| Ubuntu 22.04 LTS (image for HiFive Unmatched) | 7666 | 4.6% |
| Debian unstable (as of 2022-07-20) | 946 | 4.5% |