Easier flamegraph / profiling support for datafusion benchmarks #2174

Closed
alamb opened this issue Apr 6, 2022 · 14 comments
Labels
enhancement New feature or request


@alamb
Contributor

alamb commented Apr 6, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I want to make flamegraphs to understand the performance of PRs like #2146 from @yjshen, and of other benchmarks, in a cross-platform, easy-to-use way. I don't want to switch between Instruments on Mac and perf on Linux.

It is not easy to get them (see this mailing list topic), and I hit the same "flamegraph takes forever to make" issue when I tried to run this on my development machine.

Describe the solution you'd like
Use the pprof crate. @mkmik did this in influxdb_iox, and we have used it to good effect there.

This would mean adding an optional pprof feature to the benchmark program that generates flamegraphs and profile.proto output. You would run it like this:

# --profile also writes flamegraph.svg and profile.proto files
target/release/tpch benchmark datafusion --profile --iterations 3 --path /tpch-parquet --format parquet --query 1

You could then load profile.proto into the (excellent) Go tooling:

go tool pprof profile.proto
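A minimal sketch of how the proposed optional feature and --profile flag might fit together. This is not the benchmark's actual code: run_benchmark, time_iterations, and with_profiler are hypothetical names, the 100 Hz sampling rate and the flamegraph.svg path are assumptions, and writing profile.proto is left out. The pprof calls (ProfilerGuard::new, report().build(), Report::flamegraph) follow the pprof-rs README and assume the crate's "flamegraph" feature; they are only compiled when the program is built with a "pprof" Cargo feature, so a default build falls back to an unprofiled run.

```rust
use std::time::{Duration, Instant};

/// Time `iterations` runs of `body`, returning one wall-clock duration per run.
fn time_iterations<F: FnMut()>(iterations: usize, mut body: F) -> Vec<Duration> {
    (0..iterations)
        .map(|_| {
            let start = Instant::now();
            body();
            start.elapsed()
        })
        .collect()
}

/// Hypothetical entry point: `profile` stands in for the proposed `--profile` flag.
fn run_benchmark<F: FnMut()>(iterations: usize, profile: bool, body: F) -> Vec<Duration> {
    if profile {
        // All iterations run under one profiler session.
        with_profiler(|| time_iterations(iterations, body))
    } else {
        time_iterations(iterations, body)
    }
}

/// Only compiled with `--features pprof` (assumes the pprof crate with its
/// "flamegraph" feature is an optional dependency).
#[cfg(feature = "pprof")]
fn with_profiler<T>(f: impl FnOnce() -> T) -> T {
    // Sample call stacks at an assumed 100 Hz while `f` runs.
    let guard = pprof::ProfilerGuard::new(100).expect("failed to start profiler");
    let result = f();
    match guard.report().build() {
        Ok(report) => {
            let file = std::fs::File::create("flamegraph.svg").expect("create flamegraph.svg");
            report.flamegraph(file).expect("write flamegraph");
        }
        Err(e) => eprintln!("failed to build profile report: {e}"),
    }
    result
}

/// Fallback for builds without the feature: warn and run without profiling.
#[cfg(not(feature = "pprof"))]
fn with_profiler<T>(f: impl FnOnce() -> T) -> T {
    eprintln!("--profile ignored: build with the `pprof` feature to enable profiling");
    f()
}
```

Gating on a Cargo feature keeps the profiler dependency out of default builds, so only people who want flamegraphs pay its compile cost.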


@alamb alamb added the enhancement New feature or request label Apr 6, 2022
@alamb alamb self-assigned this Apr 6, 2022
@tustvold
Contributor

tustvold commented Apr 6, 2022

Have you checked out cargo-flamegraph? I think it might fit the bill.

@alamb
Contributor Author

alamb commented Apr 6, 2022

I did try running cargo flamegraph, and it spent an absurd amount of time calling addr2line (as reported in the mailing list topic). Perhaps this was due to an old version of perf or something similar on the GCP Debian machine I was using.

@Dandandan
Contributor

FWIW, I have never experienced the flamegraph slowness mentioned in the mailing list (or with the Hotspot UI I mentioned there) using a recent Pop!_OS version.

@tustvold
Contributor

tustvold commented Apr 6, 2022

Perhaps this might help? FWIW, I also use the Hotspot UI, which has some nice functionality for digging into profiles, viewing thread activity, etc. It chugs a bit on larger profiles (multiple GB) but otherwise works well enough for my purposes.

Edit: it turns out cargo-flamegraph already uses inferno. Unfortunately, pprof also uses it, so switching to pprof may not help matters.
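For context on what inferno consumes: like flamegraph.pl, it reads "folded" stacks, one line per unique call stack with frames joined by ";" from root to leaf, followed by a space and the number of samples seen for that stack. A minimal stdlib-only sketch of that folding step follows. It assumes the samples have already been symbolized into function names (the symbolization step is where perf was spending its time calling addr2line, as reported above), and the function name fold_stacks and the frame names are made up for illustration.

```rust
use std::collections::BTreeMap;

/// Collapse symbolized stack samples (root frame first) into folded lines of
/// the form `root;child;leaf count`. A BTreeMap keeps the output sorted so
/// identical inputs always produce identical text.
fn fold_stacks(samples: &[Vec<&str>]) -> Vec<String> {
    let mut counts: BTreeMap<String, u64> = BTreeMap::new();
    for stack in samples {
        // Every sample of the same stack increments one shared counter.
        *counts.entry(stack.join(";")).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .map(|(stack, n)| format!("{stack} {n}"))
        .collect()
}
```

Each folded line becomes one tower in the rendered graph, with width proportional to its count.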

@bobtins
Copy link

bobtins commented Apr 6, 2022

@alamb, someone drilled deep into why perf is slow.
TL;DR: perf built without libbfd (to stay GPLv2-compatible) is really inefficient.
@Dandandan, perf can be compiled with libbfd (fast) or without it (slow). I am running Ubuntu, but the author above only reports slow results on Debian. Maybe Pop!_OS is cheating and compiling with libbfd?
Summing up these results:

  • me, Ubuntu 20.04: slow
  • me, recompiled perf: fast
  • @Dandandan, Pop!_OS: fast
  • @alamb, macOS: slow

@alamb, applying the solution from IOx sounds like a great way to go. I haven't worked on this for a while (busy), but I probably would have taken a more laborious approach, perhaps reinventing the wheel.

@tustvold
Contributor

tustvold commented Apr 6, 2022

@bobtins I think you maybe meant to link to this https://eighty-twenty.org/2021/09/09/perf-addr2line-speed-improvement ?

The good news is that a mitigation was merged at some point last year and has apparently been released, so perhaps the GCP machine was running an old Debian version? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be8ecc57f180415e8a7c1cc5620c5236be2a7e56

@bobtins

bobtins commented Apr 6, 2022

@tustvold I did try out Hotspot; it's really nice! It can record data (which just runs perf) or open previously recorded perf files.

@bobtins

bobtins commented Apr 6, 2022

> @bobtins I think you maybe meant to link to this https://eighty-twenty.org/2021/09/09/perf-addr2line-speed-improvement ?

Hm, good news, thanks for the updated info.

@realno
Contributor

realno commented Apr 6, 2022

This is great, looking forward to seeing some results!

@alamb
Contributor Author

alamb commented Apr 6, 2022

Thanks for the suggestions -- I'll try the various tools on this thread and will report back.

@mkmik
Contributor

mkmik commented Apr 7, 2022

FWIW, I found the pprof approach very useful for online performance analysis (e.g. on demand on production services) and less useful for ad hoc performance analysis runs locally (e.g. benchmarks).

@houqp
Member

houqp commented Apr 7, 2022

@alamb, have you tried cargo-flamegraph's --no-inline argument?

@mingmwang
Contributor

I have used pprof-rs to benchmark DataFusion/Arrow Parquet reader performance. It generates flamegraph images easily.

@alamb
Contributor Author

alamb commented Jul 21, 2023

I don't think there is any work to do for this; standard profiling tools work great. Closing.

@alamb alamb closed this as completed Jul 21, 2023