-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backupccl: fix key rewriter race in generative split and scatter processor #96313
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
return nil, err | ||
} | ||
numSplitWorkers := int(spec.NumNodes) | ||
numSecondSplitWorkers := 2 * numSplitWorkers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a comment here about why we chose 2 * numSplitWorkers? I realize this is probably not your doing but a high level reason explaining the factor of 2 would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, I can't really find out why exactly the factor of 2 was chosen but I edited the TODO to perhaps tune it in the future.
@@ -342,8 +360,8 @@ func runGenerativeSplitAndScatter( | |||
|
|||
// TODO(pbardea): This tries to cover for a bad scatter by having 2 * the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move this TODO with the 2* and think about whether it is still applicable? If we want to punt the thinking for later can we add your or my name in the TODO 😋
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the TODO to where the factor of 2 is and put my name on it.
// scatterers contain the splitAndScatterers for the first group of split and | ||
// scatter workers. Each worker needs its own scatterer as one cannot be used | ||
// concurrently. | ||
scatterers []splitAndScatterer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about calling this one chunkSplitAndScatterer
and in the comment indicating that these are used to split and scatter importSpanChunks?
Then the one below can be called chunkEntrySplitAndScatterer
or entrySplitAndScatterer
? It doesn't perform scatters so maybe we mention that in the comment and describe what it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, renamed and added the comments.
flowCtx: flowCtx, | ||
spec: spec, | ||
output: output, | ||
scatterers: scatterers[:numSplitWorkers], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sanity checking that this creates a copy of the scatterers slice and doesn't just copy the pointer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these do just copy the pointer but I changed the code to use two slices just so that we are not confused in the future.
doneScatterCh chan<- entryNode, | ||
) error { | ||
log.Infof(ctx, "Running generative split and scatter with %d total spans, %d chunk size, %d nodes", | ||
spec.NumEntries, spec.ChunkSize, spec.NumNodes) | ||
g := ctxgroup.WithContext(ctx) | ||
|
||
splitWorkers := int(spec.NumNodes) | ||
splitWorkers := len(scatterers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe we call this splitAndScatterWorkers since they do both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -291,6 +308,7 @@ func runGenerativeSplitAndScatter( | |||
importSpanChunksCh := make(chan scatteredChunk, splitWorkers*2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should've done this in your previous PR sorry, but do you think you could add a line or two above each goroutine in this method describing what each of them does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, added comments.
cb7fb4a
to
21dde30
Compare
you need a rebase to get past the failing DD tests, I ran into the same thing this morning. Since you're going to push anyways can I also bug you to change the |
…essor The generative split and scatter processor is currently causing tests to fail under race because there are many goroutines that are operating with the same splitAndScatterer, which cannot be used concurrently as the underlying key rewriter cannot be used concurrently. Modify the processor so that every worker that uses the splitAndScatterer now uses its own instance. Fixes: cockroachdb#95808 Release note: None
21dde30
to
100b2b9
Compare
bors r+ |
Build succeeded: |
The generative split and scatter processor is currently causing tests to fail under race because there are many goroutines that are operating with the same splitAndScatterer, which cannot be used concurrently as the underlying key rewriter cannot be used concurrently. Modify the processor so that every worker that uses the splitAndScatterer now uses its own instance.
Fixes: #95808
Release note: None