High CPU usage when misusing spans #458
Looking at your flamegraph, this appears to be a bug in `tracing-subscriber/src/fmt/span.rs`, lines 333 to 394 at commit bec40bc.
It's worth noting that all of this code has been rewritten in the upcoming 0.2 release.
I can try this right away. What would the git dependency in Cargo.toml look like? Sorry about the stupid question, I've never done this before. I checked https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html but could not find how to specify a git link to a subproject.
Never mind. This is the right dependency spec, right?
OK, I made the experiment, and this seems to be the issue: performance with the 0.2 alpha is much better than 0.1. See attached videos. So when is 0.2 coming out? I'm afraid we have to disable all tracing until then, as you can see from the video. This has been driving me crazy, since apparently I was more affected by this than my colleagues due to having the most virtual cores.
Yeah, that should work. Just FYI, you don't need to specify everything you've listed here; the following should be equivalent:

```toml
tracing-subscriber = { git = "https://github.com/tokio-rs/tracing" }
```

:) Note that if you were actually going to use a git dependency in production, you would probably want to pin to a fixed git revision, like so:

```toml
tracing-subscriber = { git = "https://github.com/tokio-rs/tracing", rev = "3f14e9d7127c4f2ca70ab1c5c2ad7c9439f5c40b" }
```
FYI, it seems that this is only related to spans. I have another branch where I've disabled all the spans but left in all the debug! and trace! calls, and it performs well and does not have the problem!
Awesome, that's great to hear! Glad the 0.2 release fixed it!
There's an …
I'd like to get a release version of 0.2 out soon, but there are a couple of other potentially breaking changes I'd like to spend some more time looking into.
Yeah, that's not surprising: the issue is specifically in the code for storing in-progress spans in …
I think we would use …
It should be totally compatible with everything else. The only breaking changes in 0.2 are to some of the APIs for customizing the behaviour of the …
OK, so then it would be very cool if you could publish …
So I tried using an explicit reference to the most recent tracing on master for now. Like this:
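(The exact snippet was not preserved in this export. Based on the suggestion earlier in the thread, it was presumably something along these lines, pointing at master:)

```toml
# Hypothetical reconstruction of the dependency spec described above,
# not the commenter's actual Cargo.toml entry.
[dependencies]
tracing-subscriber = { git = "https://github.com/tokio-rs/tracing", branch = "master" }
```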
works fine on my machine and fixes the perf issues, but our CI fails for one of the ARM builds. Apparently this is due to the new dependency on sharded-slab. Any way to get around this? It is marked as an optional dep.
Yeah, this is a known issue (hawkw/sharded-slab#9) which has a fix up; I can merge that as soon as I can get it passing CI, and publish a new …
It's required by the …
BTW, the …
As soon as there is a new version I will try this out (on Monday) and see if it causes any issues on our various build targets. If not I would love to use it.
Yes, I saw that after investigating the feature dependencies more closely…
@rklaehn I published a new alpha: https://crates.io/crates/tracing-subscriber/0.2.0-alpha.2. Sorry it took a couple of days; I had to un-break some unrelated build failures first.
Thanks a lot ❤️. I'll try whether this works on all our targets. Can you give a rough estimate of when tracing 0.2.0 will be out? "When it's done" is also a perfectly valid answer.
"When it's done, but hopefully soon"? There are a couple of APIs I want to try to clean up before releasing 0.2, but I haven't had the time to work on that lately. If it ends up taking too long, I can probably push some of that to an eventual 0.3 — there's already a lot of new stuff in 0.2. |
OK, I have now switched to alpha 2; it passes all tests and no longer shows the perf impact. We are going to use this in production. Is there a regression test to make sure the perf does not degrade, or is it such an obvious change that that is not necessary? In any case, thanks for the quick help. Feel free to close this issue; for us it is solved with 0.2 alpha 2.
There are some benchmarks, but I'll have to double check that this specific case is covered. |
Bug Report
Version
Platform
Description
Not sure if this is a bug, since the API was not used correctly, but since Rust has the motto of fearless concurrency, I thought I'd report it anyway. Maybe this can be better documented or caught.
We have a fairly large project that uses tracing internally. I was seeing very confusing behaviour where the program would sometimes consume 100% CPU on all cores. Using flamegraph, I narrowed this down to tracing, and then to tracing spans.
The issue turned out to be that somebody had created a tracing span on a different thread from the one where the span was entered. I still don't know exactly what happened then, but tracing did not like it at all!
There was no crash and the program still executed correctly, but all 12 virtual cores were loaded for about a minute.
Here is the code that caused so much trouble:
Here is the fix:
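(Neither snippet survived in this export. A hypothetical sketch of the pattern being described, not the reporter's actual code: the span is created on one thread but entered on another, and the fix is to create the span on the thread that enters it.)

```rust
use std::thread;
use tracing::info_span;

fn main() {
    // Problematic pattern: the span is constructed on the main thread...
    let span = info_span!("work");
    let handle = thread::spawn(move || {
        // ...but entered on a different thread, which (per this issue) hit a
        // pathological path in tracing-subscriber 0.1's fmt span storage.
        let _guard = span.enter();
        // ... do work ...
    });
    handle.join().unwrap();

    // Fix: create and enter the span on the same thread.
    let handle = thread::spawn(|| {
        let span = info_span!("work");
        let _guard = span.enter();
        // ... do work ...
    });
    handle.join().unwrap();
}
```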
I attached a flamegraph of the behaviour I saw.
flamegraph_tracing_100.zip
Maybe something can be done in code or in documentation to prevent people from running into the same issue.
Anyway, feel free to close this if you think it is not worth addressing. And thanks for the awesome library!