-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: fix a race when reading server
object
#108371
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
522845a
to
1ab8ccf
Compare
Avoid returning `server` when it's not ready - we shouldn't read it when `started` is false (that's a race). Testing: this test (sometimes) fails without this pr and succeeds with it: `./dev test pkg/ccl/streamingccl/streamingest:streamingest_test -f TestDataDriven --race -- --runs_per_test=10` Epic: none Informs: cockroachdb#107930 Release note: None
1ab8ccf
to
c64f4b7
Compare
@@ -106,7 +106,10 @@ var _ serverState = (*serverStateUsingChannels)(nil) | |||
|
|||
// getServer is part of the serverState interface. | |||
func (s *serverStateUsingChannels) getServer() (orchestratedServer, bool) { | |||
return s.server, s.started.Get() | |||
if s.started.Get() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change seems incorrect. It was already the case that the callers of this method should not consider the 1st return value if the 2nd was false.
you observing a difference means there are some callers that were not using the API properly. That is the bug we need to find and fix.
I have searched all the current callers of this, and I cannot see any one of them using the API incorrectly.
Can you share the steps you used to assert this is making a difference?
An alternate approach would be to recognize there is a race condition between the Maybe there should be a mutex here to tie them together. |
let's continue here: #108401 |
I think the mutex is better than atomic here, so no need for this PR but.. quickly responding to your questions:
|
108401: server: avoid a race in the server controller r=lidorcarmel a=knz Co-authored with `@lidorcarmel` Prior to this patch, the Go race detector was complaining about racy concurrent accesses to `serverStateUsingChannels.server`, via `(*serverStateUsingChannels) getServer()` and `(*channelOrchestrator) startControlledServer()`. These racy accesses happened to be safe because the writes and reads to that field were correctly ordered around updates to an atomic bool (the `started` field). However, the race detector is not sufficiently sophisticated to detect this ordering and satisfy itself that the state transition can only happen once. In order to silence the race detector (with no change in correctness), this patch replaces the atomic bool by a mutex, whose access semantics are properly understood by the race detector. This change incidentally makes the code slightly easier to read and understand. Supersedes #108371. Fixes #107930. Epic: CRDB-28893 Co-authored-by: Raphael 'kena' Poss <[email protected]>
Avoid returning
server
when it's not ready - we shouldn't read it whenstarted
is false (that's a race).Testing: this test (sometimes) fails without this pr and succeeds with it:
./dev test pkg/ccl/streamingccl/streamingest:streamingest_test -f TestDataDriven --race -- --runs_per_test=10
Epic: none
Informs: #107930
Release note: None