riscv_dis_opts_minor_1
- Status: WITHDRAWN
  Combined with other fixes and superseded by riscv-dis-opts-batch-1
- Branch: riscv-dis-opts-minor-1
- Tracking PR: #20 (view Pull Request and Diff)
Aggregate performance benchmark should be available here.
- Disassembler: Use faster hash table
- (You are here) Disassembler: Minor optimizations (batch 1)
- Disassembler: Cache instruction class support
This patchset reorganizes the core disassembler to enable minor optimizations (contained in this patchset) and improves code clarity for further optimizations.
There are some big performance improvements in certain circumstances, but they are rarely significant in real-world benchmarks.
It is a prerequisite for upcoming core disassembler changes:
- RV32E disassembler support
- Further optimization (another 5-7% performance boost)
- Upcoming RFC PATCH v3: `arch` disassembler option
- Make ELF `priv-spec` overridable
The current disassembler repeatedly checks the following state for every instruction:

- Whether `riscv_gpr_names` is initialized to a non-NULL value
- Whether the `Zfinx` extension is available
- Whether the hash table is initialized

...but this state rarely changes.
Calling `riscv_subset_supports` repeatedly harms performance measurably (3-13% in total). This patchset reduces the `Zfinx` query from once per instruction to once per architecture or option change.
Note that this patchset handles only a small part of the per-instruction calls to `riscv_subset_supports`, so the effect of this change is small. My next optimization patchset deals with the rest.
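The difference between querying an extension once per instruction and caching the answer per architecture change can be sketched as below. This is a toy model, not the binutils code: `toy_subset_supports` is a hypothetical stand-in for `riscv_subset_supports` (whose real signature differs), and the substring match on the ISA string is a simplification of real subset handling.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for riscv_subset_supports (the real function
   has a different signature).  It counts calls so the cost of
   per-instruction queries is visible.  */
static unsigned long toy_subset_queries;

static bool
toy_subset_supports (const char *arch, const char *ext)
{
  toy_subset_queries++;
  /* Simplification: a substring match instead of real ISA parsing.  */
  return strstr (arch, ext) != NULL;
}

/* Before: the Zfinx query runs once per disassembled instruction.  */
static bool
toy_insn_uses_zfinx_uncached (const char *arch)
{
  return toy_subset_supports (arch, "zfinx");
}

/* After: the answer is cached and refreshed only when the
   architecture (or a disassembler option) changes.  */
static bool toy_cached_zfinx;

static void
toy_on_arch_change (const char *arch)
{
  toy_cached_zfinx = toy_subset_supports (arch, "zfinx");
}
```

Disassembling 1000 instructions costs 1000 queries in the first form and a single query in the second; the cached flag only needs refreshing when the architecture or an option changes.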
`riscv_gpr_names` is initialized to a non-NULL value when:

- The first disassembler option is given, or
- Otherwise, in the first `print_insn_riscv` call (when it was not initialized by disassembler options).

We can safely initialize the default disassembler options before the first `print_insn_riscv` call, so this per-instruction check of `riscv_gpr_names` can be removed.
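A minimal sketch of moving the `riscv_gpr_names` NULL check out of the per-instruction path, under a simplified model: `toy_gpr_names`, `toy_set_default_options`, `toy_parse_option` and the other `toy_*` names are hypothetical, and only four registers are modeled. The `numeric` option selects `xN` names, mirroring the RISC-V disassembler option of that name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical register name tables (only x0-x3 are modeled).  */
static const char *const toy_gpr_abi[] = { "zero", "ra", "sp", "gp" };
static const char *const toy_gpr_numeric[] = { "x0", "x1", "x2", "x3" };

/* Modeled on riscv_gpr_names: NULL until defaults or options apply.  */
static const char *const *toy_gpr_names = NULL;

static void
toy_set_default_options (void)
{
  toy_gpr_names = toy_gpr_abi;
}

/* Apply one option on top of the defaults ("numeric" selects xN).  */
static void
toy_parse_option (const char *opt)
{
  toy_set_default_options ();
  if (strcmp (opt, "numeric") == 0)
    toy_gpr_names = toy_gpr_numeric;
}

/* Before: every instruction pays for a NULL check.  */
static const char *
toy_gpr_name_lazy (unsigned regno)
{
  if (toy_gpr_names == NULL)
    toy_set_default_options ();
  return toy_gpr_names[regno];
}

/* After: defaults are installed once, before the first print call...  */
static void
toy_disassembler_setup (void)
{
  toy_set_default_options ();
}

/* ...so the per-instruction path can assume a non-NULL table.  */
static const char *
toy_gpr_name (unsigned regno)
{
  return toy_gpr_names[regno];
}
```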
Whether the opcode hash table (both the optimized one and the old one) is initialized is also checked per instruction, and this check can be replaced too. For now, initializing it the first time is sufficient.
Instead, this patchset adds two new functions:

- `init_dis_state_for_arch` (called when the architecture is changed)
- `init_dis_state_for_arch_and_options` (called when either the architecture or an option is changed)

Common state initialization can be grouped into these functions.
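One way to picture the split between the two functions is a two-layer state: facts derived from the architecture alone, and facts derived from the architecture plus the options. The sketch below assumes a hypothetical `struct toy_dis_state`; its fields and the `toy_*` functions only mirror the roles of `init_dis_state_for_arch` and `init_dis_state_for_arch_and_options` and do not reproduce their real signatures. `Zfinx` serves as the example because it makes floating-point operands name integer registers, so register naming depends on both layers.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical name tables (only indices 0-3 are modeled).  */
static const char *const toy3_fpr_abi[] = { "ft0", "ft1", "ft2", "ft3" };
static const char *const toy3_fpr_num[] = { "f0", "f1", "f2", "f3" };
static const char *const toy3_gpr_abi[] = { "zero", "ra", "sp", "gp" };
static const char *const toy3_gpr_num[] = { "x0", "x1", "x2", "x3" };

/* Hypothetical disassembler state; field names are illustrative.  */
struct toy_dis_state
{
  char arch[64];
  bool numeric;                 /* option: numeric register names */
  bool has_zfinx;               /* depends on arch only */
  const char *const *fpr_names; /* depends on arch and options */
  unsigned arch_inits, arch_opt_inits;
};

static void
toy_init_dis_state_for_arch (struct toy_dis_state *s)
{
  s->arch_inits++;
  s->has_zfinx = strstr (s->arch, "zfinx") != NULL; /* simplified */
}

static void
toy_init_dis_state_for_arch_and_options (struct toy_dis_state *s)
{
  s->arch_opt_inits++;
  /* With Zfinx, FP operands are integer registers, so the table
     depends on both the architecture and the "numeric" option.  */
  if (s->has_zfinx)
    s->fpr_names = s->numeric ? toy3_gpr_num : toy3_gpr_abi;
  else
    s->fpr_names = s->numeric ? toy3_fpr_num : toy3_fpr_abi;
}

/* Architecture change: refresh both layers, in order.  */
static void
toy_set_arch (struct toy_dis_state *s, const char *arch)
{
  strncpy (s->arch, arch, sizeof s->arch - 1);
  s->arch[sizeof s->arch - 1] = '\0';
  toy_init_dis_state_for_arch (s);
  toy_init_dis_state_for_arch_and_options (s);
}

/* Option change: only the arch-and-options layer is refreshed.  */
static void
toy_set_numeric (struct toy_dis_state *s, bool numeric)
{
  s->numeric = numeric;
  toy_init_dis_state_for_arch_and_options (s);
}
```

An option change re-runs only the second layer. An architecture change re-runs both, first layer first, because the second layer reads what the first one computed.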
`objdump` reuses the disassembler returned by the `disassembler` function, so it benefits fully from these optimizations.
However, by default, GDB (`default_print_insn` in `gdb/arch-utils.c`) assumes that the `disassembler` function is cheap and calls it for every instruction to be disassembled. On RISC-V this is clearly wasteful, because each call probes BFD (ELF information) and re-initializes `riscv_rps_dis`.
To deal with this, this patchset caches the default architecture string and minimizes calls to `riscv_parse_subset` (it is not called when the architecture string is unchanged).
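A minimal sketch of the caching idea, assuming the expensive step can be modeled as a single call: `toy_parse_subset` and `toy_select_arch` are hypothetical names standing in for `riscv_parse_subset` and the per-call setup that GDB triggers. The cache holds a private copy of the string rather than the caller's pointer, since the caller's buffer may change or be freed between calls.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Counts calls to a hypothetical stand-in for riscv_parse_subset.  */
static unsigned long toy_parse_calls;

static void
toy_parse_subset (const char *arch)
{
  (void) arch;          /* real parsing is elided in this sketch */
  toy_parse_calls++;
}

/* Copy of the last architecture string that was parsed (or NULL).  */
static char *toy_cached_arch;

/* Called on every disassembler setup, which GDB may do per
   instruction.  Returns true if the string had to be (re)parsed.  */
static bool
toy_select_arch (const char *arch)
{
  if (toy_cached_arch != NULL && strcmp (toy_cached_arch, arch) == 0)
    return false;                       /* unchanged: skip the parse */

  size_t len = strlen (arch) + 1;
  free (toy_cached_arch);
  toy_cached_arch = malloc (len);       /* on failure the cache stays empty */
  if (toy_cached_arch != NULL)
    memcpy (toy_cached_arch, arch, len);
  toy_parse_subset (arch);
  return true;
}
```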
A long `disas` command in GDB benefits from this:

```
# Example: disassemble most of the code in the Linux kernel
file /path/to/vmlinux
disas 0xffffffff80002000,0xffffffff80658aa4
```
In benchmarks using a GDB batch file like the example above, I measured large performance improvements (27-83% on various large RISC-V programs). Unfortunately, this improvement is rarely observable in interactive GDB use, since we don't usually disassemble such a large chunk at once and the current disassembler is not very slow.
This doesn't actually improve performance, but:

- Register name initialization (involving `Zfinx`) is now clearer, and
- It prepares for a future implementation of the RV32E disassembler.
The current disassembler is meant to initialize the CSR hash table the first time CSR name parsing is required. This is managed by a function-scope static variable `init_csr`, but... there's a problem. It's never set to `true`.
Because of this bug, the current disassembler actually initializes the CSR hash table every time CSR name parsing is required.
After fixing the bug, I measured about a 70% performance improvement when disassembling thousands of CSR-only instructions. CSR instructions are rare in general, so the real-world improvement is much smaller: minor, but measurable.
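The bug reduces to the pattern below: a function-scope `static` guard that is tested but never assigned. The counting `toy_build_csr_table` is a hypothetical stand-in for building the CSR hash table. Only the guard name `init_csr` comes from the original code.

```c
#include <stdbool.h>

/* Counts how often a hypothetical CSR hash table is (re)built.  */
static unsigned long toy_csr_table_builds;

static void
toy_build_csr_table (void)
{
  toy_csr_table_builds++;
}

/* Buggy pattern: the guard is tested but never set, so every CSR
   lookup rebuilds the table.  */
static void
toy_csr_lookup_buggy (void)
{
  static bool init_csr = false;
  if (!init_csr)
    toy_build_csr_table ();   /* missing: init_csr = true; */
}

/* Fixed pattern: set the guard after the first build.  */
static void
toy_csr_lookup_fixed (void)
{
  static bool init_csr = false;
  if (!init_csr)
    {
      toy_build_csr_table ();
      init_csr = true;
    }
}
```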
While fixing the bug, I found another issue (masked by the bug above).
When `default_priv_spec` is changed, the CSR hash table must be reinitialized so that it reflects the specified privileged specification. In `objdump` there's no way to change it dynamically, but GDB has one. Before the fix above, the CSR hash table was regenerated every time, so the specified privileged specification took effect immediately. After the fix, however, the CSR hash table has to be reinitialized explicitly.
So, I replaced `init_csr` with a file-scope static variable `is_init_csr` and reset it whenever either the architecture or the privileged specification is changed.
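A sketch of the fixed design, modeling the privileged specification as a small enum: `is_init_csr` is the file-scope flag named above, while `enum toy_priv_spec`, `toy_reset_csr_table` and the other `toy_*` names are hypothetical. Moving the flag to file scope is what lets code outside the lookup function (for example, whatever handles an architecture or `default_priv_spec` change) invalidate the table.

```c
#include <stdbool.h>

/* Hypothetical privileged spec versions (names are illustrative).  */
enum toy_priv_spec { TOY_PRIV_1_10, TOY_PRIV_1_11, TOY_PRIV_1_12 };

/* File-scope guard, so code outside the lookup can reset it.  */
static bool is_init_csr = false;
static enum toy_priv_spec toy_csr_table_spec;   /* spec the table reflects */
static unsigned long toy_csr_rebuilds;

static void
toy_rebuild_csr_table (enum toy_priv_spec spec)
{
  toy_csr_table_spec = spec;
  toy_csr_rebuilds++;
}

/* Call whenever the architecture or the privileged spec changes.  */
static void
toy_reset_csr_table (void)
{
  is_init_csr = false;
}

/* Returns the spec that the (possibly rebuilt) table reflects.  */
static enum toy_priv_spec
toy_csr_lookup (enum toy_priv_spec current_spec)
{
  if (!is_init_csr)
    {
      toy_rebuild_csr_table (current_spec);
      is_init_csr = true;
    }
  return toy_csr_table_spec;
}
```

Without the reset, a lookup after the spec changes still reflects the old specification; after the reset, the next lookup rebuilds the table exactly once.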
The remaining patches are intended to clarify the core disassembler:

- PATCH 5/7: Make boolean variable `bool`
- PATCH 6/7: Make "private" symbols really private (making them `static`)
- PATCH 7/7: Add some comments
When disassembling linked RISC-V ELF programs using `objdump`, the performance improvement achieved by this patchset is usually about 4-5%. Not bad for simple changes.
This is relative to the previous optimization.

Program | Improvements | Notes |
---|---|---|
Busybox 1.35.1 (RV64GC) | 4.3-4.3% | |
OpenSBI 1.1 (generic `fw_*.elf`) | 8.1-8.4% | |
Linux kernel 5.18 (`vmlinux`) | 4.5-4.7% | |
Linux kernel 5.18 (`vmlinux.o`) | 1.7-4.3% | Not finally linked |
glibc (`libc.so.6`) | 5.0-5.3% | |

Program | Improvements |
---|---|
glibc (`libc.a`) | 2.8-3.4% |
newlib (`libc.a`) | 2.9-3.9% |

Program | Improvements |
---|---|
Linux kernel 5.18 (`vmlinux`) | 4.6-5.7% |
Random files (`/dev/urandom`) | 4.4-4.8% |
1M (1048576) CSR instructions | 82.1% |

Program | Improvements |
---|---|
Linux kernel 5.18 (`vmlinux`) with debug info | 27.3% |
Linux kernel 5.18 (`vmlinux`) without debug info | 83.7% |
OpenSBI 1.1 (generic `fw_*.elf`) | 70.8-71.0% |
1M (1048576) CSR instructions (ELF) | 132.0% |

System | N | Improvements |
---|---|---|
Ubuntu 22.04 LTS (image for HiFive Unmatched) | 563 | 3.7% |
Debian unstable (as of 2022-07-20) | 269 | 3.7% |

System | N | Improvements |
---|---|---|
Ubuntu 22.04 LTS (image for HiFive Unmatched) | 100 | 3.0% |
Debian unstable (as of 2022-07-20) | 100 | 0.4% |

System | N | Improvements |
---|---|---|
Ubuntu 22.04 LTS (image for HiFive Unmatched) | 7666 | 2.8% |
Debian unstable (as of 2022-07-20) | 946 | 1.9% |

System | N | Improvements |
---|---|---|
Ubuntu 22.04 LTS (image for HiFive Unmatched) | 563 | 4.9% |
Debian unstable (as of 2022-07-20) | 269 | 4.8% |

System | N | Improvements |
---|---|---|
Ubuntu 22.04 LTS (image for HiFive Unmatched) | 100 | 4.9% |
Debian unstable (as of 2022-07-20) | 100 | 4.9% |

System | N | Improvements |
---|---|---|
Ubuntu 22.04 LTS (image for HiFive Unmatched) | 7666 | 4.6% |
Debian unstable (as of 2022-07-20) | 946 | 4.5% |