tracing: wrongly emitting trace events for session's first txn #59203
Comments
We are still seeing memory issues on tpccbench/nodes=6/cpu=16/multi-az which need to be investigated. Turn off background tracing while we do. Touches #58298. We're also reverting an earlier commit as part of this one (d252400). This revert is needed given we've not yet addressed an underlying bug (#59203). Release note: None
I don't think we're double emitting trace events. I think it's more that we're not resetting trace events in a span correctly. We used to expect the following data driven output for the following query:
Which fails with:
Where's the DelRng coming from? Probably some previous trace. I double checked to see that running that query+trace capture in isolation (without any of the preceding trace captures) doesn't run into the same error. So we're definitely not resetting the span correctly, somewhere. |
I think it's got to do with how we expect traces to behave with
This is how I expect it to behave, but it doesn't: the "dist sender send r35: sending batch 2 CPut to (n1,s1):1" recording from the rolled-back statement still shows up in future traces, despite "resetting" the trace using
|
I see that we're grabbing onto the first txn's span in the session here: cockroach/pkg/sql/exec_util.go Lines 1483 to 1485 in 854246a
And since we never really "consume" the span data from the aborted txn, it shows up in future traces. We've had this behaviour for a while, and I'm not sure it's a sensible default. Handy repro: #59458. |
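To make the suspected mechanism concrete, here is a minimal Go sketch of it. It is a toy model only: the `span` and `session` types, the `runTxn` method, and the `consumeOnAbort` flag are all hypothetical names made up for illustration, not CockroachDB's tracing API or the code in exec_util.go. It assumes the session keeps reusing the span of its first txn and only drains recordings when a txn commits.

```go
package main

import "fmt"

// span is a hypothetical stand-in for a tracing span that accumulates
// recorded events. It is NOT CockroachDB's tracing.Span.
type span struct {
	events []string
}

func (s *span) record(msg string) { s.events = append(s.events, msg) }

// drain returns everything recorded so far and clears the span.
func (s *span) drain() []string {
	out := s.events
	s.events = nil
	return out
}

// session is a hypothetical model of a SQL session that holds onto the
// span of its first txn and reuses it for later txns.
type session struct {
	firstTxnSpan   *span
	consumeOnAbort bool // assumption: one possible fix is to drain on abort too
}

// runTxn records ops on the session's span and returns the trace that a
// capture would observe. An aborted txn yields no trace; unless
// consumeOnAbort is set, its events are left behind in the span.
func (s *session) runTxn(ops []string, abort bool) []string {
	if s.firstTxnSpan == nil {
		s.firstTxnSpan = &span{}
	}
	for _, op := range ops {
		s.firstTxnSpan.record(op)
	}
	if abort {
		if s.consumeOnAbort {
			s.firstTxnSpan.drain() // discard the aborted txn's recording
		}
		return nil
	}
	return s.firstTxnSpan.drain()
}

func main() {
	leaky := &session{}
	leaky.runTxn([]string{"CPut"}, true)              // aborted, never consumed
	fmt.Println(leaky.runTxn([]string{"Get"}, false)) // prints [CPut Get]

	fixed := &session{consumeOnAbort: true}
	fixed.runTxn([]string{"CPut"}, true)
	fmt.Println(fixed.runTxn([]string{"Get"}, false)) // prints [Get]
}
```

In the leaky case the CPut from the aborted txn surfaces in the next trace, which matches the symptom described above. Draining on abort is just one way to model a fix; whether that is the right default for the real session tracer is the open question here.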
...for sessions with aborted txns. Attempt to repro cockroachdb#59203 Release note: None
yo @asubiotto, I think this is in your ballpark. I don't know if we'll have time to do anything about it, or if it's worthwhile for 21.1, but I do think it's a "bug". |
Hmm, thanks for raising this. We'll hopefully get to trace-related fixes during the stability period. |
Are we still planning on addressing this? |
It's a nice-to-have for 21.1, so it's not certain we'll have it done by then. We'll look at addressing it once we've completed higher-priority items. |
I'm having to get into the weeds of this SessionTracer guy to work through the implications of #61777. Specifically, if we're not creating real spans unless absolutely needed, we can't rely on the assumptions of this code: cockroach/pkg/sql/exec_util.go Lines 1645 to 1656 in a86cf54
I think we'll want to hijack the txn's span + context in the same manner we do for the conn below. Does that sound right to you? +cc @cockroachdb/kv-east / @tbg. |
That sounds reasonable to me, but this code is new to me too. (cc @andreimatei in case you want to chime in) |
See #59193. We bisected a failure in TestExecBuild/local/autocommit_nonmetamorphic to #58897, which simply enabled always-on tracing. We're now doubly emitting trace data when we shouldn't be.