-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlbase: avoid a race in sqlbase.CancelChecker #35548
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great find. I wonder why callsSinceLastCheck
is an int32
, looks like we might as well make it a uint32
but not a big deal, I don't think we care about overflow in this case anyway (could maybe use a smaller uint
).
Prior to the distsql-ification of planNode execution, all the execution steps in a local query were interleaved (using coroutine-style concurrency), so that the calls to `(*CancelChecker).Check()` were all sequential. With distsql now it's possible for different planNode `Next()` or `startExec()` methods to be running in concurrent goroutines on multiple cores, i.e. truly parallel. This in turn requires atomic access to the counter in the cancel checker. Without atomic access, there is a race condition and the possibility that the cancel checker does not work well (some increments can be performed two times, which could cause the condition of the check to occasionally fail). Found with SQLSmith. Release note: None
fbc9b43
to
1ca3629
Compare
Good idea. Done. |
TFYR! bors r+ |
35521: sql: reduce non-determinism in the show_trace logic tests r=knz a=knz Fixes #33720. First commit from #35519. The test is really about asserting the KV operations sent on behalf of SQL statements. The distsender traffic is really irrelevant. Since that's where most of the test flakes / non-determinism is coming from, simply remove it. Release note: None 35548: sqlbase: avoid a race in sqlbase.CancelChecker r=knz a=knz Fixes #35539. Prior to the distsql-ification of planNode execution, all the execution steps in a local query were interleaved (using coroutine-style concurrency), so that the calls to `(*CancelChecker).Check()` were all sequential. With distsql now it's possible for different planNode `Next()` or `startExec()` methods to be running in concurrent goroutines on multiple cores, i.e. truly parallel. This in turn requires atomic access to the counter in the cancel checker. Without atomic access, there is a race condition and the possibility that the cancel checker does not work well (some increments can be performed two times, which could cause the condition of the check to occasionally fail). Found with SQLSmith. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>
Build succeeded |
Fixes #35539.
Prior to the distsql-ification of planNode execution, all the
execution steps in a local query were interleaved (using
coroutine-style concurrency), so that the calls to
(*CancelChecker).Check()
were all sequential.With distsql now it's possible for different planNode
Next()
orstartExec()
methods to be running in concurrent goroutines onmultiple cores, i.e. truly parallel.
This in turn requires atomic access to the counter in the cancel
checker. Without atomic access, there is a race condition and the
possibility that the cancel checker does not work well (some
increments can be performed two times, which could cause the condition
of the check to occasionally fail).
Found with SQLSmith.
Release note: None