riscv_dis_opts_minor_1
- Status: Waiting for a prerequisite to be merged
- Branch: riscv-dis-opts-minor-1
- Tracking PR: #20 (view Pull Request and Diff)
- Mailing List: Not yet
- Disassembler: Use faster hash table
- (You are here) Disassembler: Minor optimizations (batch 1)
- Disassembler: Cache instruction class support
This patchset reorganizes the core disassembler to enable minor optimizations (contained in this patchset), future optimizations with larger performance gains, and clearer code.
There are some big performance improvements in certain circumstances, but they are rarely significant in real-world benchmarks.
It is a prerequisite for upcoming core disassembler changes:
- RV32E disassembler support
- Further optimization (another 5-8% performance boost)
- Upcoming RFC PATCH v3: arch disassembler option
The current disassembler repeatedly checks the following state for every instruction:
- Whether riscv_gpr_names is initialized to a non-NULL value
- Whether the Zfinx extension is available
- Whether the hash table is initialized

... but these rarely change.
Calling riscv_subset_supports repeatedly harms performance measurably (3-13% in total). This patchset reduces the Zfinx query from once per instruction to once per {arch,option} change.
Note that only a small part of the per-instruction calls to riscv_subset_supports is addressed here, so this change alone does not measurably improve performance (I will submit another batch of real optimizations later).
riscv_gpr_names is initialized to a non-NULL value when either:
- The first disassembler option is given, or
- It has not been initialized by a disassembler option (in which case it is initialized in the first print_insn_riscv function call).
We can safely initialize the default disassembler options before the print_insn_riscv function is called, so this per-instruction check of riscv_gpr_names can be removed.
Whether the opcode hash table (both the optimized one and the old one) is initialized is also checked per instruction, and this check can be replaced too. For now, one-time initialization is sufficient.
Instead of these per-instruction checks, this patchset adds two new functions:
- init_dis_state_for_arch (called when the architecture is changed)
- init_dis_state_for_arch_and_options (called when either the architecture or an option is changed)

Common state initialization can be grouped in these functions.
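
The split between the two initialization functions can be illustrated with a minimal sketch. This is not the patchset's code: every `demo_` name, the truncated register tables, and the `strstr`-based extension check are assumptions made for illustration. The point is only that the expensive query runs when the architecture or an option changes, while the per-instruction path reads cached state.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Truncated name tables (first four registers only), for illustration.  */
static const char *const demo_gpr_abi[4] = { "zero", "ra", "sp", "gp" };
static const char *const demo_gpr_num[4] = { "x0", "x1", "x2", "x3" };
static const char *const demo_fpr_abi[4] = { "ft0", "ft1", "ft2", "ft3" };
static const char *const demo_fpr_num[4] = { "f0", "f1", "f2", "f3" };

/* Hypothetical disassembler state; defaults are set up front so the
   per-instruction path never has to check for NULL.  */
static struct
{
  char arch[64];
  bool numeric;                    /* Stand-in for a disassembler option.  */
  const char *const *gpr_names;
  const char *const *fpr_names;
  unsigned queries;                /* Counts the expensive queries.  */
} demo_state = { "", false, demo_gpr_abi, demo_fpr_abi, 0 };

/* Stand-in for an expensive extension query such as the
   riscv_subset_supports call described in the text.  */
static bool
demo_query_zfinx (void)
{
  demo_state.queries++;
  return strstr (demo_state.arch, "zfinx") != NULL;
}

/* State that depends on the architecture alone.  */
static void
demo_init_for_arch (const char *arch)
{
  snprintf (demo_state.arch, sizeof demo_state.arch, "%s", arch);
}

/* State that depends on the architecture and/or the options.  */
static void
demo_init_for_arch_and_options (void)
{
  demo_state.gpr_names = demo_state.numeric ? demo_gpr_num : demo_gpr_abi;
  if (demo_query_zfinx ())
    demo_state.fpr_names = demo_state.gpr_names;   /* FP operands in GPRs.  */
  else
    demo_state.fpr_names = demo_state.numeric ? demo_fpr_num : demo_fpr_abi;
}

void
demo_set_arch (const char *arch)
{
  demo_init_for_arch (arch);
  demo_init_for_arch_and_options ();
}

void
demo_set_numeric (bool numeric)
{
  demo_state.numeric = numeric;
  demo_init_for_arch_and_options ();
}

/* Per-instruction path: no query, no NULL check, just a table lookup.  */
const char *
demo_fp_operand_name (unsigned regno)
{
  return demo_state.fpr_names[regno & 3];
}

unsigned
demo_query_count (void)
{
  return demo_state.queries;
}
```

In this sketch, calling `demo_set_arch` once and then `demo_fp_operand_name` many times leaves the query counter at one, which mirrors the move from per-instruction to per-{arch,option} queries.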
objdump reuses the disassembler returned by the disassembler function.
That works well with these optimizations.
However, by default, GDB (default_print_insn in gdb/arch-utils.c) assumes that the disassembler function is cheap and calls it for every instruction to be disassembled. This is clearly a waste of time because it probes BFD (ELF information) and re-initializes riscv_rps_dis for every instruction.
To deal with this, we cache the default architecture string and minimize calls to the riscv_parse_subset function (it is not called when the architecture is unchanged).
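
The caching idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the patchset's implementation: `demo_update_arch`, `demo_parse_arch`, and the fixed-size buffer are hypothetical, and `demo_parse_arch` merely stands in for a costly parse such as riscv_parse_subset.

```c
#include <stdbool.h>
#include <string.h>

static char demo_cached_arch[128];        /* Last architecture string parsed.  */
static bool demo_have_cached_arch = false;
static unsigned demo_parse_calls = 0;     /* Counts costly parses.  */

/* Stand-in for the costly parse; a real one would build subset lists.  */
static void
demo_parse_arch (const char *arch)
{
  (void) arch;
  demo_parse_calls++;
}

/* Called once per disassembler lookup (possibly once per instruction).
   Returns true only when the architecture string was actually re-parsed.  */
bool
demo_update_arch (const char *arch)
{
  if (demo_have_cached_arch && strcmp (demo_cached_arch, arch) == 0)
    return false;                         /* Unchanged: skip the parse.  */

  if (strlen (arch) < sizeof demo_cached_arch)
    {
      strcpy (demo_cached_arch, arch);
      demo_have_cached_arch = true;
    }
  else
    demo_have_cached_arch = false;        /* Too long to cache; always parse.  */

  demo_parse_arch (arch);
  return true;
}

unsigned
demo_arch_parse_calls (void)
{
  return demo_parse_calls;
}
```

With this shape, a caller that requests a disassembler for every instruction pays for one string comparison per call instead of a full parse, as long as the architecture string stays the same.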
A long disas command in GDB benefits from this:
# Example: disassemble most of the code in the Linux kernel
file /path/to/vmlinux
disas 0xffffffff80002000,0xffffffff80658aa4
In a benchmark using a GDB batch file like the example above, I measured large performance improvements (27-83% on various large RISC-V programs). Unfortunately, in interactive use of GDB this improvement is rarely observable, since we don't usually disassemble such a large chunk at once and the current disassembler is not very slow.
This doesn't actually improve performance, but:
- Register name initialization (involving Zfinx) is now clearer, and
- We can prepare for a future implementation of the RV32E disassembler.
The current disassembler intends to initialize the CSR hashtable the first time CSR name parsing is required. This is managed by a function-scope static variable init_csr, but there's a problem: it's never set to true.
Because of this bug, the current disassembler actually reinitializes the CSR hashtable every time CSR name parsing is required.
After I fixed the bug, I got about a 70% performance improvement when disassembling thousands of CSR-only instructions (CSR instructions are rare in general, so the real-world improvement is much smaller, even hardly measurable).
While fixing the bug, I found another issue (masked by the bug above).
When default_priv_spec is changed, the CSR hashtable must be reinitialized so that it reflects the specified privileged specification. objdump has no way to change it dynamically, but GDB does. Before the fix, the CSR hashtable was regenerated every time, so a newly specified privileged specification was reflected immediately. With the fix, however, the CSR hashtable has to be reinitialized explicitly.
So I replaced init_csr with a file-scope static variable is_init_csr, which is reset when init_dis_state_for_arch_and_options is called.
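
The lazy initialization with an explicit reset can be sketched like this. It is a minimal illustration, not the code in the patchset: the `demo_` functions, the integer `demo_priv_spec`, and the two table entries are assumptions chosen to show why the flag must be both set after the first build and cleared when the privileged specification changes.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_N_CSR 4096                    /* 12-bit CSR address space.  */

static const char *demo_csr_names[DEMO_N_CSR];
static bool is_init_csr = false;           /* File scope, so it can be reset.  */
static int demo_priv_spec = 0;             /* Hypothetical: 0 = older, 1 = newer.  */
static unsigned demo_table_builds = 0;     /* Counts table rebuilds.  */

static void
demo_build_csr_table (void)
{
  for (size_t i = 0; i < DEMO_N_CSR; i++)
    demo_csr_names[i] = NULL;
  demo_csr_names[0x300] = "mstatus";
  /* Illustrative entry whose name depends on the selected spec.  */
  demo_csr_names[0x320] = demo_priv_spec ? "mcountinhibit" : "mucounteren";
  demo_table_builds++;
}

/* Per-instruction path: build the table only on first use.  */
const char *
demo_csr_name (unsigned csr)
{
  if (!is_init_csr)
    {
      demo_build_csr_table ();
      is_init_csr = true;                  /* The step the old flag never took.  */
    }
  return demo_csr_names[csr & (DEMO_N_CSR - 1)];
}

/* Stand-in for the option-change hook: force a rebuild on the next lookup.  */
void
demo_set_priv_spec (int spec)
{
  demo_priv_spec = spec;
  is_init_csr = false;
}

unsigned
demo_csr_builds (void)
{
  return demo_table_builds;
}
```

Without the assignment in `demo_csr_name`, the table would be rebuilt on every lookup; without the reset in `demo_set_priv_spec`, a changed specification would never reach the table.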
When disassembling linked RISC-V ELF files using objdump, the performance improvement achieved by this patchset is about 4-8%. Not bad for simple changes.
This is relative to the previous optimization.
Program | Improvements | Notes |
---|---|---|
Busybox 1.35.1 (RV64GC) | 4.6-5.5% | |
OpenSBI 1.1 (generic fw_*.elf) | 6.9-7.7% | |
Linux kernel 5.18 (vmlinux) | 4.5-4.9% | |
Linux kernel 5.18 (vmlinux.o) | (-0.3)-1.8% | Not finally linked |
glibc (libc.so.6) | 4.2-5.1% | |

Program | Improvements |
---|---|
glibc (libc.a) | (-0.1)-0.2% |
newlib (libc.a) | 0.9-2.7% |

Program | Improvements |
---|---|
Linux kernel 5.18 (vmlinux) | 5.3-6.3% |
Random files (/dev/urandom) | 5.1-6.1% |
1M (1048576) CSR instructions | 73.1% |

Program | Improvements |
---|---|
Linux kernel 5.18 (vmlinux) with debug info | 27.3% |
Linux kernel 5.18 (vmlinux) without debug info | 83.0% |
OpenSBI 1.1 (generic fw_*.elf) | 65.6-67.4% |
1M (1048576) CSR instructions (ELF) | 131.8% |