Loading perf.data is slow #225
Comments
Easy enough to reproduce! Even …
Yep, that seems to be it: The slowness pointed to by your profile of hotspot-perfparser arises from us having to symbolize addresses within rustc repeatedly, cf.:
I assume that we could improve the performance a lot by moving the symbol/inline-frame cache a level higher, such that it's per DSO and keyed by DSO offset, rather than per PID and keyed by address. Nevertheless, I'm quite surprised to see that even the trivial …
OK, my hunch wasn't correct, but I found a way to improve the loading time by roughly 7x on my laptop. Please wait for the latest appimage to finish building and then try again: https://github.com/KDAB/hotspot/releases/tag/continuous
Much better, thank you!
Might have found another place which doesn't hit the cache often enough. This time running …
That one is for inlined frames; not sure we can cache much there. Will have to investigate.
When I profile …, it's really odd that …
Looks like …
The symbol table isn't necessarily sorted, and thus repeated lookups in there can be expensive when a DSO has many entries in its symtab. For example, the librustc_driver from rustc 1.40.0 has about 202594 symbols, and a single call to dwfl_module_addrinfo can take milliseconds on my laptop. Every time we get a sample at a so-far-unknown address, we have to find the corresponding symbol. So we called this function a lot, which can add up to a significant amount of time.

Now, we cache the symbol name and its offset and size information in a sorted list and try to look the symbol up there quickly.

The impact of this patch on the overall time required to analyze a ~1GB perf.data file for a `cargo build` process (and its child processes) is huge.

Before:
```
        447.681,66 msec task-clock:u              #    0,989 CPUs utilized
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
            45.214      page-faults:u             #    0,101 K/sec
 1.272.289.956.854      cycles:u                  #    2,842 GHz
 3.497.255.264.964      instructions:u            #    2,75  insn per cycle
   863.671.557.196      branches:u                # 1929,209 M/sec
     2.666.320.642      branch-misses:u           #    0,31% of all branches

     452,806895428 seconds time elapsed

     441,996666000 seconds user
       2,557237000 seconds sys
```

After:
```
         63.770,08 msec task-clock:u              #    0,995 CPUs utilized
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
            35.102      page-faults:u             #    0,550 K/sec
   191.267.750.628      cycles:u                  #    2,999 GHz
   501.316.536.714      instructions:u            #    2,62  insn per cycle
   122.234.405.333      branches:u                # 1916,799 M/sec
       443.671.470      branch-misses:u           #    0,36% of all branches

      64,063443896 seconds time elapsed

      62,188041000 seconds user
       1,136533000 seconds sys
```

That means we are now roughly 7x faster than before.

Fixes: KDAB/hotspot#225
Change-Id: Ib7dbc800c9372044a847de68a8459dd7f7b0d3da
Reviewed-by: Ulf Hermann <[email protected]>
Describe the bug
Loading a 1.5G file captured over 40s takes tens of minutes.
To Reproduce
```
perf record --call-graph dwarf cargo build
```
Expected behavior
Hotspot to load the file much faster :)
Screenshots
I attached perf to hotspot as it was loading the file and took a screenshot of the flamegraph after capturing a few seconds.
Version Info (please complete the following information):