Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loading perf.data is slow #225

Closed
Marwes opened this issue Jan 4, 2020 · 8 comments
Closed

Loading perf.data is slow #225

Marwes opened this issue Jan 4, 2020 · 8 comments
Labels

Comments

@Marwes
Copy link

Marwes commented Jan 4, 2020

Describe the bug

Loading in a 1.5G file captured over 40s takes tens of minutes to load.

To Reproduce

  1. Download the rust compiler https://www.rust-lang.org/learn/get-started
  2. Checkout https://github.com/Marwes/combine
  3. Run perf record --call-graph dwarf cargo build
  4. Open the file with hotspot

Expected behavior
Hotspot to load the file much faster :)

Screenshots
I attached perf to hotspot as it was loading the file and took a screenshot of the flamegraph after capturing a few seconds.

Screenshot from 2020-01-04 01-18-04

Version Info (please complete the following information):

  • Linux Kernel version: 4.15.0-72-generic
  • perf version: 4.15.18
  • hotspot version (appimage? selfcompiled?): 1.2 appimage
  • if self-compiled hotspot, what version of elfutils:
@Marwes Marwes added the bug label Jan 4, 2020
@milianw
Copy link
Member

milianw commented Jan 5, 2020

Easy enough to reproduce! Even perf report takes ages, thanks for the report! I wonder what's going on there... My hunch is lots of short lived processes and us positioning the cache at the wrong place (per pid instead of a global per-dso cache).

@milianw
Copy link
Member

milianw commented Jan 5, 2020

Yep, that seems to be it: The slowness pointed to by your profile of hotspot-perfparser arises from us having to symbolize addresses within rustc repeatedly, cf.:

$ perf report -s pid,comm -g none --stdio
...
    86.21%    86.21%    12393:rustc            rustc          
     3.86%     3.86%    12370:rustc            rustc          
     1.94%     1.94%    12380:rustc            rustc          
     1.49%     1.49%    12318:rustc            rustc          
     0.93%     0.93%    12319:rustc            rustc          
     0.28%     0.28%    12369:ld               ld             
     0.28%     0.28%    12395:rustc            rustc          
     0.26%     0.26%    12394:rustc            rustc          
     0.26%     0.26%    12347:ld               ld             
     0.24%     0.24%    12396:rustc            rustc          
     0.21%     0.21%    12356:rustc            rustc          
     0.16%     0.16%    12330:rustc            rustc          
     0.14%     0.14%    12307:cargo            cargo          
     0.12%     0.12%    12352:rustc            rustc          
     0.12%     0.12%    12354:rustc            rustc          
     0.12%     0.12%    12344:rustc            rustc          
     0.11%     0.11%    12342:rustc            rustc          
     0.11%     0.11%    12324:rustc            rustc          
     0.11%     0.11%    12343:rustc            rustc          
     0.11%     0.11%    12323:rustc            rustc          
     0.11%     0.11%    12353:rustc            rustc          
     0.10%     0.10%    12335:rustc            rustc          
     0.10%     0.10%    12350:rustc            rustc          
     0.10%     0.10%    12358:rustc            rustc          
     0.09%     0.09%    12378:rustc            rustc          
     0.08%     0.08%    12325:rustc            rustc          
     0.08%     0.08%    12348:rustc            rustc          
     0.08%     0.08%    12349:rustc            rustc          
     0.07%     0.07%    12326:rustc            rustc          
     0.07%     0.07%    12341:rustc            rustc          
     0.07%     0.07%    12351:rustc            rustc          
     0.06%     0.06%    12357:rustc            rustc          
     0.06%     0.06%    12338:rustc            rustc          
     0.06%     0.06%    12331:rustc            rustc          
     0.06%     0.06%    12332:rustc            rustc          
     0.06%     0.06%    12361:rustc            rustc          
     0.06%     0.06%    12379:rustc            rustc          
     0.05%     0.05%    12329:rustc            rustc          
     0.05%     0.05%    12333:rustc            rustc          
     0.05%     0.05%    12402:rustc            rustc          
     0.05%     0.05%    12308:rustc            rustc          
     0.05%     0.05%    12400:rustc            rustc          
     0.05%     0.05%    12334:rustc            rustc          
     0.04%     0.04%    12407:rustc            rustc          
     0.04%     0.04%    12337:rustc            rustc          
     0.04%     0.04%    12429:rustc            rustc          
     0.04%     0.04%    12382:rustc            rustc          
     0.04%     0.04%    12397:rustc            rustc          
     0.03%     0.03%    12408:rustc            rustc          
     0.03%     0.03%    12383:rustc            rustc          
     0.03%     0.03%    12362:rustc            rustc          
     0.03%     0.03%    12381:rustc            rustc          
     0.03%     0.03%    12431:rustc            rustc          
     0.03%     0.03%    12366:rustc            rustc          
     0.03%     0.03%    12336:rustc            rustc          
     0.03%     0.03%    12392:rustc            rustc          
     0.03%     0.03%    12399:rustc            rustc          
     0.03%     0.03%    12430:rustc            rustc          
     0.03%     0.03%    12364:rustc            rustc          
     0.03%     0.03%    12317:rustc            rustc          
     0.03%     0.03%    12316:rustc            rustc          
     0.03%     0.03%    12401:rustc            rustc          
     0.03%     0.03%    12340:rustc            rustc          
     0.03%     0.03%    12310:rustc            rustc          
     0.03%     0.03%    12405:rustc            rustc          
     0.02%     0.02%    12432:rustc            rustc          
     0.02%     0.02%    12410:rustc            rustc          
     0.02%     0.02%    12377:rustc            rustc          
     0.02%     0.02%    12373:rustc            rustc          
     0.02%     0.02%    12403:rustc            rustc          
     0.02%     0.02%    12311:rustdoc          rustdoc        
     0.02%     0.02%    12309:rustc            rustc          
     0.02%     0.02%    12436:rustc            rustc          
     0.02%     0.02%    12409:rustc            rustc          
     0.02%     0.02%    12404:rustc            rustc          
     0.02%     0.02%    12433:rustc            rustc          
     0.02%     0.02%    12406:rustc            rustc          
     0.02%     0.02%    12437:rustc            rustc          
     0.02%     0.02%    12438:rustc            rustc          
     0.02%     0.02%    12339:rustc            rustc          
     0.01%     0.01%    12384:rustc            rustc          
     0.01%     0.01%    12442:rustc            rustc          
     0.01%     0.01%    12434:rustc            rustc          
     0.01%     0.01%    12435:rustc            rustc          
     0.01%     0.01%    12443:rustc            rustc          
     0.01%     0.01%    12448:rustc            rustc          
     0.01%     0.01%    12441:rustc            rustc          
     0.01%     0.01%    12439:rustc            rustc          
     0.01%     0.01%    12440:rustc            rustc          
     0.01%     0.01%    12447:rustc            rustc          
     0.01%     0.01%    12450:rustc            rustc          
     0.01%     0.01%    12446:rustc            rustc          
     0.01%     0.01%    12449:rustc            rustc          
     0.01%     0.01%    12444:rustc            rustc          
     0.01%     0.01%    12322:rustc            rustc          
     0.01%     0.01%    12460:rustc            rustc          
     0.01%     0.01%    12452:rustc            rustc          
     0.01%     0.01%    12385:rustc            rustc          
     0.01%     0.01%    12387:rustc            rustc          
     0.01%     0.01%    12453:rustc            rustc          
     0.01%     0.01%    12388:rustc            rustc          
     0.01%     0.01%    12445:rustc            rustc          
     0.01%     0.01%    12451:rustc            rustc          
     0.01%     0.01%    12462:rustc            rustc          
     0.01%     0.01%    12386:rustc            rustc          
     0.01%     0.01%    12459:rustc            rustc          
     0.01%     0.01%    12455:rustc            rustc          
     0.01%     0.01%    12461:rustc            rustc          
     0.01%     0.01%    12457:rustc            rustc          
     0.01%     0.01%    12454:rustc            rustc          
     0.01%     0.01%    12355:rustc            rustc          
     0.01%     0.01%    12456:rustc            rustc          
     0.01%     0.01%    12458:rustc            rustc          
     0.01%     0.01%    12463:rustc            rustc          
     0.00%     0.00%    12428:rustc            rustc          
     0.00%     0.00%    12345:cc               cc             
     0.00%     0.00%    12321:rustc            rustc          
     0.00%     0.00%    12367:cc               cc             
     0.00%     0.00%    12391:rustc            rustc          
     0.00%     0.00%    12376:rustc            rustc          
     0.00%     0.00%    12314:cargo            cargo          
     0.00%     0.00%    12374:cargo            cargo          
     0.00%     0.00%    12365:cargo            cargo          
     0.00%     0.00%    12372:build-script-bu  build-script-bu
     0.00%     0.00%    12360:build-script-bu  build-script-bu
     0.00%     0.00%    12328:rustc            rustc          
     0.00%     0.00%    12327:rustc            rustc          
     0.00%     0.00%    12390:cargo            cargo          
     0.00%     0.00%    12315:cargo            cargo          
     0.00%     0.00%    12368:collect2         collect2       
     0.00%     0.00%    12371:cargo            cargo          
     0.00%     0.00%    12346:collect2         collect2       
     0.00%     0.00%    12312:rustdoc          rustdoc        
     0.00%     0.00%    12359:cargo            cargo          
     0.00%     0.00%    12345:cc               rustc          
     0.00%     0.00%    12427:rustc            rustc          
     0.00%     0.00%    12372:build-script-bu  cargo          
     0.00%     0.00%    12392:rustc            cargo          
     0.00%     0.00%    12311:rustdoc          cargo          
     0.00%     0.00%    12313:cargo            cargo 

I assume that we could improve the performance a lot by moving the symbol/inline-frame cache a level higher such that it's per DSO and keyed by DSO offset, not per PID and keyed by address.

Nevertheless, I'm quite surprised to see that even the trivial perf report command above is taking ~2min on my laptop - it should basically just read through the mmapped file and not do anything fancy, which should be much faster than that. Potentially perf report also could be optimized for this scenario ;-)

@milianw
Copy link
Member

milianw commented Jan 6, 2020

OK, my hunch wasn't correct, but I found a way to improve the loading time by roughly 7x on my laptop. Please wait for the latest appimage to finish building and then try again:

https://github.com/KDAB/hotspot/releases/tag/continuous

@Marwes
Copy link
Author

Marwes commented Jan 6, 2020

Much better, thank you!

@Marwes
Copy link
Author

Marwes commented Jan 8, 2020

Might have found another place which does not hit cache enough. This time running cargo check on https://github.com/clap-rs/clap .

Screenshot from 2020-01-08 19-08-32

@milianw milianw reopened this Jan 8, 2020
@milianw
Copy link
Member

milianw commented Jan 8, 2020

that one is for inlined frames, not sure we can cache much there... will have to investigate

@milianw
Copy link
Member

milianw commented Jan 9, 2020

When I profile cargo check, the report still points me at dwfl_module_addrinfo, and not at dwarf_getscopes like in your profile. The problem for me is that we touch ~9000 different symbols and each time it takes roughly 5-10ms to call dwfl_module_addrinfo which accumulates to 45s - 1:30min overall...

It's really odd that dwfl_module_addrinfo is so slow. Maybe that's b/c it's all compiled with clang and .dwarf_aranges is missing? I'll have to think about alternative ways to speed this up.

@Marwes
Copy link
Author

Marwes commented Jan 9, 2020

Looks like .dwarf_aranges was just recently added, though I am pretty sure I ran it with a recent enough rust version for that to get emitted rust-lang/rust#66532 🤔

cnd pushed a commit to cynede/perfparser that referenced this issue Jan 13, 2020
The symbol table isn't necessarily sorted, and thus repeated lookups
in there can be expensive when a DSO has many entries in its symtab.

For example, the librustc_driver from rustc 1.40.0 has about 202594
symbols. A single call to dwfl_module_addrinfo can take milliseconds
on my laptop. Every time we get a sample at a so far unknown address,
we have to find the corresponding symbol. So we called this function
a lot, which can add up to a significant amount of time. Now, we
cache the symbol name and its offset and size information in a sorted
list and try to lookup the symbol there quickly. The impact of this
patch on the overall time required to analyze a ~1GB perf.data file
for a `cargo build` process (and it's child processes) is huge:

before:
```
        447.681,66 msec task-clock:u              #    0,989 CPUs utilized
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
            45.214      page-faults:u             #    0,101 K/sec
 1.272.289.956.854      cycles:u                  #    2,842 GHz
 3.497.255.264.964      instructions:u            #    2,75  insn per cycle
   863.671.557.196      branches:u                # 1929,209 M/sec
     2.666.320.642      branch-misses:u           #    0,31% of all branches

     452,806895428 seconds time elapsed

     441,996666000 seconds user
       2,557237000 seconds sys
```

after:

```
         63.770,08 msec task-clock:u              #    0,995 CPUs utilized
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
            35.102      page-faults:u             #    0,550 K/sec
   191.267.750.628      cycles:u                  #    2,999 GHz
   501.316.536.714      instructions:u            #    2,62  insn per cycle
   122.234.405.333      branches:u                # 1916,799 M/sec
       443.671.470      branch-misses:u           #    0,36% of all branches

      64,063443896 seconds time elapsed

      62,188041000 seconds user
       1,136533000 seconds sys
```

That means we are now roughly 7x faster than before.

Fixes: KDAB/hotspot#225
Change-Id: Ib7dbc800c9372044a847de68a8459dd7f7b0d3da
Reviewed-by: Ulf Hermann <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants