Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing,sql: remove option to bypass the tracing registry #59315

Closed
irfansharif opened this issue Jan 22, 2021 · 5 comments · Fixed by #61438
Closed

tracing,sql: remove option to bypass the tracing registry #59315

irfansharif opened this issue Jan 22, 2021 · 5 comments · Fixed by #61438
Assignees
Labels
A-tracing Relating to tracing in CockroachDB. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@irfansharif
Copy link
Contributor

In #58902 we introduced a leak-detection for un-Finish()ed tracing.Spans. We did this by making sure that on server shutdown the tracing registry was empty (all newly started spans create an entry there, and remove themselves when finished). In doing so we observed that spans created within txnState.resetForNewSQLTxn aren't Finish()-ed during server shut down. We introduced a tracing.WithBypassRegistry() for that codepath, with the intention of addressing it pre-21.1. It'd be nice to figure out how we could ensure that the Spans are finished explicitly so that we're able to grab a hold of them in the registry (which will then let us introspect them).

@irfansharif irfansharif added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-tracing Relating to tracing in CockroachDB. labels Jan 22, 2021
@asubiotto
Copy link
Contributor

Was there a specific test that was failing? I assume this is specific to an edge case in the draining code path, but knowing which test will make it clear how to reproduce and what a solution might look like.

@irfansharif
Copy link
Contributor Author

I don't remember off-hand, but I think TestIdleInTransactionSessionTimeoutCommitWaitState? Seeing as how the leak-detection is applicable for all tests making use of TestCluster, I think you'll find that testing the kvserver package will induce a lot of failures with the option removed.

@tbg
Copy link
Member

tbg commented Feb 15, 2021

Just to make sure this is clear, this is a must-fix for 21.1 as otherwise the value of the tracing work is severely diminished - if SQL spans are not in the registry, we get a lot less insight from the registry.
It's also annoying from a testing perspective, since the registry is the natural place to write logic tests against.

@tbg
Copy link
Member

tbg commented Feb 15, 2021

Heads up that #59992 introduces a testing knob that can be used from logic tests which works around the problem. That means we should be good for testing, but we (SQL exec 😄) still need to look into this in the stability period.

tbg added a commit to tbg/cockroach that referenced this issue Feb 15, 2021
1. Require admin role for `crdb_internal.node_inflight_trace_spans`
2. Add num_structured column to that table (this can be seen as
   temporary for testing; we can remove it once cockroachdb#55733 is fully
   addressed and payloads can be introspected separately).
3. Add workaround for failure to finish sql txn spans (cockroachdb#59315)
   for use in logic tests, which together with 2.) and 4.) enables
   a better `contention_event` test.
4. Add `crdb_internal.trace_id` builtin.

Release note: None
@asubiotto
Copy link
Contributor

Confirming that we're planning on taking a look at this during the stability period next milestone.

tbg added a commit to tbg/cockroach that referenced this issue Feb 17, 2021
1. Require admin role for `crdb_internal.node_inflight_trace_spans`
2. Add num_structured column to that table (this can be seen as
   temporary for testing; we can remove it once cockroachdb#55733 is fully
   addressed and payloads can be introspected separately).
3. Add workaround for failure to finish sql txn spans (cockroachdb#59315)
   for use in logic tests, which together with 2.) and 4.) enables
   a better `contention_event` test.
4. Add `crdb_internal.trace_id` builtin.

Release note: None
angelapwen pushed a commit to angelapwen/cockroach that referenced this issue Feb 17, 2021
1. Require admin role for `crdb_internal.node_inflight_trace_spans`
2. Add num_structured column to that table (this can be seen as
   temporary for testing; we can remove it once cockroachdb#55733 is fully
   addressed and payloads can be introspected separately).
3. Add workaround for failure to finish sql txn spans (cockroachdb#59315)
   for use in logic tests, which together with 2.) and 4.) enables
   a better `contention_event` test.
4. Add `crdb_internal.trace_id` builtin.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Feb 17, 2021
1. Require admin role for `crdb_internal.node_inflight_trace_spans`
2. Add num_structured column to that table (this can be seen as
   temporary for testing; we can remove it once cockroachdb#55733 is fully
   addressed and payloads can be introspected separately).
3. Add workaround for failure to finish sql txn spans (cockroachdb#59315)
   for use in logic tests, which together with 2.) and 4.) enables
   a better `contention_event` test.
4. Add `crdb_internal.trace_id` builtin.

Release note: None
@craig craig bot closed this as completed in 83a177c Mar 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tracing Relating to tracing in CockroachDB. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants