-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
execbuilder: unexpectedly changed KV usage due to tracing commit #60672
Labels
A-sql-execution
Relating to SQL execution.
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Comments
tbg
added
A-sql-execution
Relating to SQL execution.
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
labels
Feb 17, 2021
Actually upon rewriting the file I get more random diffs: diff --git a/pkg/sql/opt/exec/execbuilder/testdata/autocommit_nonmetamorphic b/pkg/sql/opt/exec/execbuilder/testdata/autocommit_nonmetamorphic
index 7191149f25..0a016b3606 100644
--- a/pkg/sql/opt/exec/execbuilder/testdata/autocommit_nonmetamorphic
+++ b/pkg/sql/opt/exec/execbuilder/testdata/autocommit_nonmetamorphic
@@ -110,6 +110,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%PushTxn%'
AND message NOT LIKE '%QueryTxn%'
----
+dist sender send r36: sending batch 2 CPut to (n1,s1):1
dist sender send r36: sending batch 2 CPut, 1 EndTxn to (n1,s1):1
# TODO(radu): allow non-side-effecting projections.
@@ -133,6 +134,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND operation NOT LIKE '%async%'
----
dist sender send r36: sending batch 2 CPut to (n1,s1):1
+dist sender send r36: sending batch 2 CPut to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1
# Insert with RETURNING statement with side-effects should not auto-commit.
@@ -157,6 +159,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND operation NOT LIKE '%async%'
----
dist sender send r36: sending batch 2 CPut to (n1,s1):1
+dist sender send r36: sending batch 2 CPut to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1
# Another way to test the scenario above: generate an error and ensure that the
@@ -192,6 +195,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%PushTxn%'
AND message NOT LIKE '%QueryTxn%'
----
+dist sender send r36: sending batch 2 CPut to (n1,s1):1
dist sender send r36: sending batch 1 Put, 1 EndTxn to (n1,s1):1
# Multi-row upsert should auto-commit.
@@ -213,6 +217,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%PushTxn%'
AND message NOT LIKE '%QueryTxn%'
----
+dist sender send r36: sending batch 2 CPut to (n1,s1):1
dist sender send r36: sending batch 2 Put, 1 EndTxn to (n1,s1):1
# No auto-commit inside a transaction.
@@ -261,6 +266,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%PushTxn%'
AND message NOT LIKE '%QueryTxn%'
----
+dist sender send r36: sending batch 2 Put to (n1,s1):1
dist sender send r36: sending batch 2 Put, 1 EndTxn to (n1,s1):1
# TODO(radu): allow non-side-effecting projections.
@@ -284,6 +290,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND operation NOT LIKE '%async%'
----
dist sender send r36: sending batch 2 Put to (n1,s1):1
+dist sender send r36: sending batch 2 Put to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1
# Upsert with RETURNING statement with side-effects should not auto-commit.
@@ -308,6 +315,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND operation NOT LIKE '%async%'
----
dist sender send r36: sending batch 2 Put to (n1,s1):1
+dist sender send r36: sending batch 2 Put to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1
# Another way to test the scenario above: generate an error and ensure that the
@@ -343,6 +351,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%PushTxn%'
AND message NOT LIKE '%QueryTxn%'
----
+dist sender send r36: sending batch 2 Put to (n1,s1):1
dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 2 Put, 1 EndTxn to (n1,s1):1
@@ -394,6 +403,8 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%QueryTxn%'
----
dist sender send r36: sending batch 1 Scan to (n1,s1):1
+dist sender send r36: sending batch 2 Put to (n1,s1):1
+dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 2 Put, 1 EndTxn to (n1,s1):1
# TODO(radu): allow non-side-effecting projections.
@@ -418,6 +429,8 @@ WHERE message LIKE '%r$rangeid: sending batch%'
----
dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 2 Put to (n1,s1):1
+dist sender send r36: sending batch 1 Scan to (n1,s1):1
+dist sender send r36: sending batch 2 Put to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1
# Update with RETURNING statement with side-effects should not auto-commit.
@@ -443,6 +456,8 @@ WHERE message LIKE '%r$rangeid: sending batch%'
----
dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 2 Put to (n1,s1):1
+dist sender send r36: sending batch 1 Scan to (n1,s1):1
+dist sender send r36: sending batch 2 Put to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1
# Another way to test the scenario above: generate an error and ensure that the
@@ -478,6 +493,8 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%PushTxn%'
AND message NOT LIKE '%QueryTxn%'
----
+dist sender send r36: sending batch 1 Scan to (n1,s1):1
+dist sender send r36: sending batch 2 Put to (n1,s1):1
dist sender send r36: sending batch 1 DelRng, 1 EndTxn to (n1,s1):1
# Multi-row delete should auto-commit.
@@ -499,6 +516,8 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%PushTxn%'
AND message NOT LIKE '%QueryTxn%'
----
+dist sender send r36: sending batch 1 Scan to (n1,s1):1
+dist sender send r36: sending batch 2 Put to (n1,s1):1
dist sender send r36: sending batch 1 DelRng, 1 EndTxn to (n1,s1):1
# No auto-commit inside a transaction.
@@ -547,6 +566,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%PushTxn%'
AND message NOT LIKE '%QueryTxn%'
----
+dist sender send r36: sending batch 1 DelRng to (n1,s1):1
dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 2 Del, 1 EndTxn to (n1,s1):1
@@ -570,6 +590,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%QueryTxn%'
AND operation NOT LIKE '%async%'
----
+dist sender send r36: sending batch 1 DelRng to (n1,s1):1
dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 2 Del to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1
@@ -595,6 +616,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%QueryTxn%'
AND operation NOT LIKE '%async%'
----
+dist sender send r36: sending batch 1 DelRng to (n1,s1):1
dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 2 Del to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1
@@ -644,6 +666,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%QueryTxn%'
AND operation NOT LIKE '%async%'
----
+dist sender send r36: sending batch 1 DelRng to (n1,s1):1
dist sender send r36: sending batch 2 CPut to (n1,s1):1
dist sender send r36: sending batch 2 Scan to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1
@@ -667,6 +690,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%QueryTxn%'
AND operation NOT LIKE '%async%'
----
+dist sender send r36: sending batch 1 DelRng to (n1,s1):1
dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 1 Put to (n1,s1):1
dist sender send r36: sending batch 1 Scan to (n1,s1):1
@@ -692,6 +716,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%QueryTxn%'
AND operation NOT LIKE '%async%'
----
+dist sender send r36: sending batch 1 DelRng to (n1,s1):1
dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 1 Del to (n1,s1):1
dist sender send r36: sending batch 1 Scan to (n1,s1):1
@@ -720,6 +745,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND operation NOT LIKE '%async%'
----
dist sender send r36: sending batch 1 DelRng to (n1,s1):1
+dist sender send r36: sending batch 1 DelRng to (n1,s1):1
dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 1 Del, 1 EndTxn to (n1,s1):1
@@ -749,6 +775,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%QueryTxn%'
AND operation NOT LIKE '%async%'
----
+dist sender send r36: sending batch 1 DelRng to (n1,s1):1
dist sender send r36: sending batch 2 CPut to (n1,s1):1
dist sender send r36: sending batch 2 CPut to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1
@@ -774,6 +801,7 @@ WHERE message LIKE '%r$rangeid: sending batch%'
AND message NOT LIKE '%QueryTxn%'
AND operation NOT LIKE '%async%'
----
+dist sender send r36: sending batch 1 DelRng to (n1,s1):1
dist sender send r36: sending batch 2 CPut to (n1,s1):1
dist sender send r36: sending batch 2 CPut to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1 |
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Feb 17, 2021
We were previously propagating RPCs across boundaries only when they were verbose. We now propagate any span (except the noop span) regardless of current verbosity. This ensures that SQL (which always creates real, non-verbose spans by default) can trace the entirety of its operations. This enables the background collection of contention metadata in 21.1 and more generally, the idea of always-on tracing. Note that we had previously considered always-on tracing to mean avoiding the noop span altogether; due to memory retention issues[1][2] encountered we are taking a step back from that idea for the 21.1 release. Somehow creating additional trace spans seems to have fouled up some tracing-based tests. I can't boil that ocean here, so I filed a follow-up issue: cockroachdb#60672 [1]: cockroachdb#59370 [2]: cockroachdb#59431 Release note: None
This seems like a dup of #59203. |
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Feb 17, 2021
We were previously propagating RPCs across boundaries only when they were verbose. We now propagate any span (except the noop span) regardless of current verbosity. This ensures that SQL (which always creates real, non-verbose spans by default) can trace the entirety of its operations. This enables the background collection of contention metadata in 21.1 and more generally, the idea of always-on tracing. Note that we had previously considered always-on tracing to mean avoiding the noop span altogether; due to memory retention issues[1][2] encountered we are taking a step back from that idea for the 21.1 release. Somehow creating additional trace spans seems to have fouled up some tracing-based tests. I can't boil that ocean here, so I filed a follow-up issue: cockroachdb#60672 [1]: cockroachdb#59370 [2]: cockroachdb#59431 Release note: None
I got another diff as a result of #60725: commit d0fbf6fd9d88cadab57da0a50ca0096c27ef33c4
Author: Tobias Grieger <[email protected]>
Date: Thu Feb 18 12:04:00 2021 +0100
tracing: harden against use-after-Finish
`net/trace`'s `Trace` is notoriously panicky when used after a call to
`.Finish()`. We don't like such calls and try to avoid them, but they
are hard to conclusively disprove. Besides, the opposite - never calling
`.Finish()` is even worse, so in general we'd like to err on the side of
use-after-free and want that to be safe.
This commit does that: we move `Span` to `spanInner` and `*Span` becomes
a simple wrapper that turns most operations into no-ops once the first
call to `.Finish` has been initiated.
While I was there, I also removed `IsBlackHole()` which wasn't really
useful any more.
This fixes the following crash:
https://github.com/cockroachdb/cockroach/issues/58489#issuecomment-781263005
which was surfaced by a recent switch in that test to enable tracing
throughout (precisely to find these crashes!).
This commit added additional diffs in `TestExecBuild`, presumably
because some logging occurs to a finished Span, and that logging is now
dropped. We already know there are issues there, and I mentioned this
change in tracking issue #60672 as well.
Release note (bug fix): fixed a bug that could result in crashes during
tracing when using the `trace.debug.enable` cluster setting.
diff --git a/pkg/sql/opt/exec/execbuilder/testdata/autocommit_nonmetamorphic b/pkg/sql/opt/exec/execbuilder/testdata/autocommit_nonmetamorphic
index 0a016b3606..33570a162d 100644
--- a/pkg/sql/opt/exec/execbuilder/testdata/autocommit_nonmetamorphic
+++ b/pkg/sql/opt/exec/execbuilder/testdata/autocommit_nonmetamorphic
@@ -668,7 +668,6 @@ WHERE message LIKE '%r$rangeid: sending batch%'
----
dist sender send r36: sending batch 1 DelRng to (n1,s1):1
dist sender send r36: sending batch 2 CPut to (n1,s1):1
-dist sender send r36: sending batch 2 Scan to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1
query B
@@ -693,7 +692,6 @@ WHERE message LIKE '%r$rangeid: sending batch%'
dist sender send r36: sending batch 1 DelRng to (n1,s1):1
dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 1 Put to (n1,s1):1
-dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1
query B
@@ -719,7 +717,6 @@ WHERE message LIKE '%r$rangeid: sending batch%'
dist sender send r36: sending batch 1 DelRng to (n1,s1):1
dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 1 Del to (n1,s1):1
-dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 1 EndTxn to (n1,s1):1
# Test with a single cascade, which should use autocommit.
@@ -746,7 +743,6 @@ WHERE message LIKE '%r$rangeid: sending batch%'
----
dist sender send r36: sending batch 1 DelRng to (n1,s1):1
dist sender send r36: sending batch 1 DelRng to (n1,s1):1
-dist sender send r36: sending batch 1 Scan to (n1,s1):1
dist sender send r36: sending batch 1 Del, 1 EndTxn to (n1,s1):1
# -----------------------
diff --git a/pkg/sql/opt/exec/execbuilder/testdata/partial_index_nonmetamorphic b/pkg/sql/opt/exec/execbuilder/testdata/partial_index_nonmetamorphic
index e45575fce1..f4e618e0e8 100644
--- a/pkg/sql/opt/exec/execbuilder/testdata/partial_index_nonmetamorphic
+++ b/pkg/sql/opt/exec/execbuilder/testdata/partial_index_nonmetamorphic
@@ -905,7 +905,6 @@ query T kvtrace
DELETE FROM t57085_p WHERE p = 1;
----
DelRange /Table/56/1/1 - /Table/56/1/1/#
-Scan /Table/57/{1-2}
Del /Table/57/2/1/10/0
Del /Table/57/1/10/0
Del /Table/57/1/20/0
|
tbg
added a commit
to tbg/cockroach
that referenced
this issue
Feb 18, 2021
`net/trace`'s `Trace` is notoriously panicky when used after a call to `.Finish()`. We don't like such calls and try to avoid them, but they are hard to conclusively disprove. Besides, the opposite - never calling `.Finish()` is even worse, so in general we'd like to err on the side of use-after-free and want that to be safe. This commit does that: we move `Span` to `spanInner` and `*Span` becomes a simple wrapper that turns most operations into no-ops once the first call to `.Finish` has been initiated. While I was there, I also removed `IsBlackHole()` which wasn't really useful any more. This fixes the following crash: cockroachdb#58489 (comment) which was surfaced by a recent switch in that test to enable tracing throughout (precisely to find these crashes!). This commit added additional diffs in `TestExecBuild`, presumably because some logging occurs to a finished Span, and that logging is now dropped. We already know there are issues there, and I mentioned this change in tracking issue cockroachdb#60672 as well. Release note (bug fix): fixed a bug that could result in crashes during tracing when using the `trace.debug.enable` cluster setting.
Changing |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-sql-execution
Relating to SQL execution.
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
The test case here:
cockroach/pkg/sql/opt/exec/execbuilder/testdata/autocommit_nonmetamorphic
Lines 107 to 113 in c17795f
changed as a result of a commit that did not touch anything but tracing, namely this one (SHA is still unstable at time of writing) from #59992:
I adapted the test case to match the new expectation:
based on the assumption that the extra request was always there, but did not get traced.
Since this is the 1PC optimization, it's worth double checking what's going on here.
The text was updated successfully, but these errors were encountered: