Loading perf.data is slow #225
Comments
Easy enough to reproduce! Even …
Yep, that seems to be it: The slowness pointed to by your profile of hotspot-perfparser arises from us having to symbolize addresses within rustc repeatedly, cf.:
I assume that we could improve the performance a lot by moving the symbol/inline-frame cache a level higher, such that it's per DSO and keyed by DSO offset, rather than per PID and keyed by address. Nevertheless, I'm quite surprised to see that even the trivial …
OK, my hunch wasn't correct, but I found a way to improve the loading time by roughly 7x on my laptop. Please wait for the latest appimage to finish building and then try again: https://github.com/KDAB/hotspot/releases/tag/continuous
Much better, thank you!
Might have found another place which doesn't hit the cache often enough. This time running …
That one is for inlined frames; not sure we can cache much there. Will have to investigate.
When I profile …, it's really odd that …
Looks like …
The symbol table isn't necessarily sorted, and thus repeated lookups in there can be expensive when a DSO has many entries in its symtab. For example, the librustc_driver from rustc 1.40.0 has about 202594 symbols, and a single call to dwfl_module_addrinfo can take milliseconds on my laptop. Every time we get a sample at a so-far-unknown address, we have to find the corresponding symbol. So we called this function a lot, which can add up to a significant amount of time.

Now, we cache the symbol name and its offset and size information in a sorted list and try to look the symbol up there quickly.

The impact of this patch on the overall time required to analyze a ~1GB perf.data file for a `cargo build` process (and its child processes) is huge.

Before:
```
        447.681,66 msec task-clock:u              #    0,989 CPUs utilized
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
            45.214      page-faults:u             #    0,101 K/sec
 1.272.289.956.854      cycles:u                  #    2,842 GHz
 3.497.255.264.964      instructions:u            #    2,75  insn per cycle
   863.671.557.196      branches:u                # 1929,209 M/sec
     2.666.320.642      branch-misses:u           #    0,31% of all branches

     452,806895428 seconds time elapsed

     441,996666000 seconds user
       2,557237000 seconds sys
```

After:
```
         63.770,08 msec task-clock:u              #    0,995 CPUs utilized
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
            35.102      page-faults:u             #    0,550 K/sec
   191.267.750.628      cycles:u                  #    2,999 GHz
   501.316.536.714      instructions:u            #    2,62  insn per cycle
   122.234.405.333      branches:u                # 1916,799 M/sec
       443.671.470      branch-misses:u           #    0,36% of all branches

      64,063443896 seconds time elapsed

      62,188041000 seconds user
       1,136533000 seconds sys
```

That means we are now roughly 7x faster than before.

Fixes: KDAB/hotspot#225
Change-Id: Ib7dbc800c9372044a847de68a8459dd7f7b0d3da
Reviewed-by: Ulf Hermann <[email protected]>
Describe the bug
Loading a 1.5G file captured over 40s takes tens of minutes.
To Reproduce
```
perf record --call-graph dwarf cargo build
```
Expected behavior
Hotspot to load the file much faster :)
Screenshots
I attached perf to hotspot as it was loading the file and took a screenshot of the flamegraph after capturing a few seconds.
Version Info (please complete the following information):