-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
schema: allow transaction retry in schema changer backfill #107954
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r1, 13 of 13 files at r2, 8 of 8 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall, @rytaft, and @stevendanna)
pkg/kv/kvclient/kvcoord/testing_knobs.go
line 64 at r1 (raw file):
// loops to randomly inject retriable errors. EnableRandomTransactionRetryErrors bool }
Should this setting be a probability value instead?
pkg/kv/kvclient/kvcoord/txn_coord_sender.go
line 43 at r1 (raw file):
// // See: https://github.com/cockroachdb/cockroach/pull/73512. var DisableCommitSanityCheck = envutil.EnvOrDefaultBool("COCKROACH_DISABLE_COMMIT_SANITY_CHECK", false)
Should this be in this commit?
pkg/sql/rowexec/bulk_row_writer.go
line 146 at r3 (raw file):
Name: sp.tableDesc.GetName(), MinBufferSize: bufferSize, // We disallow shadowing here to ensure that we report errors when builds
Update the comment here, also I think this will prevent errors from surfacing I think in other cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @fqazi and @stevendanna)
pkg/kv/kvclient/kvcoord/txn_coord_sender.go
line 43 at r1 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
Should this be in this commit?
This is a change from @stevendanna that has not been merged yet. My change extends some of the functionality.
pkg/sql/rowexec/bulk_row_writer.go
line 146 at r3 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
Update the comment here, also I think this will prevent errors from surfacing I think in other cases?
After talking with the DR team (https://cockroachlabs.slack.com/archives/C03JCUUSCD6/p1691669801528989), I will have to rethink this part of the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @fqazi and @stevendanna)
pkg/kv/kvclient/kvcoord/testing_knobs.go
line 64 at r1 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
Should this setting be a probability value instead?
Only some of the filtering is probability based so I left it to the filter impl to do that calculation.
pkg/sql/rowexec/bulk_row_writer.go
Outdated
@@ -133,6 +133,21 @@ func (sp *bulkRowWriter) wrapDupError(ctx context.Context, orig error) error { | |||
|
|||
func (sp *bulkRowWriter) ingestLoop(ctx context.Context, kvCh chan row.KVBatch) error { | |||
writeTS := sp.spec.Table.CreateAsOfTime | |||
|
|||
// Delete existing span before ingestion to prevent key collisions. | |||
if sp.spec.DeleteSpan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this operation is distributed, don't we risk stepping on ourselves? Should this be done at a higher level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think we should be doing this in the processor; I think we should be doing this in the coordinator before we setup the flow, since a) now every proc is deleting the same span, which will all the levels of the LSM with duplicate range keys pushing all writes into L0, and more importantly b) one proc might already be writing by the time another starts up, gets scheduled and runs its deletes, leading to data corruption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I moved this up into backfillQueryIntoTable()
.
Occasionally, we see bugs caused by mismangement of state inside a function that will be retried. Namely, the functions `(*kv.DB).Txn` and `(*sql.InternalExecutor).Txn` take funcs that must be suitably idempotent. This is testing knob provides cheap way to help us hunt down such bugs. In the long run, it would be nice to turn this on by default, but we should probably make sure TestServer runs reliably with this turned on before flipping the default. I also considered retrying _every_ transaction once rather than relying on randomness. However, that would require adding more state in kv.Txn, which I wasn't sure I really wanted to do for testing purposes. Informs #106417 Epic: none Release note: None
…etryFilter Adds TransactionRetryFilter which allows the retry behavior to be customized. EnableRandomTransactionRetryErrors has been refactored to a TransactionRetryFilter implementation that randomly retries. Release note: None
Release note: None
…st.go Release note: None
Fixes #106696 This change adds a test to simulate transaction retries during the schema changer backfill. Some retries caused `ingested key collides with an existing one` errors during addSSTable because the schema change's source data changed. For example: `CREATE TABLE t AS SELECT nextval('seq');` - `nextval` is non-transactional so each time it is selected, the value changes. This change also allows the schema changer to specify that addSSTable is allowed to shadow (replace) keys' values if they are different to handle this case. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 18 of 18 files at r9, 13 of 13 files at r10, 1 of 1 files at r11, 1 of 1 files at r12, 8 of 8 files at r13, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt and @stevendanna)
bors r=fqazi |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from c9ed884 to blathers/backport-release-23.1-107954: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This commit includes a collection of kv-level cleanups for the randomized retryable error injection functionality recently introduced in cockroachdb#107954. Notably, it: - removes the dependency from `kvcoord.TxnCoordSender` on `kv.Txn` by adjusting the parameter of `ClientTestingKnobs.TransactionRetryFilter`. - unexports `kv.Txn.DebugNameLocked` - deletes `COCKROACH_DISABLE_COMMIT_SANITY_CHECK` again, which was unused - removes rng sources in favor of the global rand, for simplicity - cleans up and adds some comments Epic: None Release note: None
This commit includes a collection of kv-level cleanups for the randomized retryable error injection functionality recently introduced in cockroachdb#107954. Notably, it: - removes the dependency from `kvcoord.TxnCoordSender` on `kv.Txn` by adjusting the parameter of `ClientTestingKnobs.TransactionRetryFilter`. - unexports `kv.Txn.DebugNameLocked` - deletes `COCKROACH_DISABLE_COMMIT_SANITY_CHECK` again, which was unused - removes rng sources in favor of the global rand, for simplicity - cleans up and adds some comments Epic: None Release note: None
108828: kv: clean up randomized retryable error injection r=nvanbenschoten a=nvanbenschoten This commit includes a collection of kv-level cleanups for the randomized retryable error injection functionality recently introduced in #107954. Notably, it: - removes the dependency from `kvcoord.TxnCoordSender` on `kv.Txn` by adjusting the parameter of `ClientTestingKnobs.TransactionRetryFilter`. - unexports `kv.Txn.DebugNameLocked` - deletes `COCKROACH_DISABLE_COMMIT_SANITY_CHECK` again, which was unused - removes rng sources in favor of the global rand, for simplicity - cleans up and adds some comments Epic: None Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
Fixes #106696
This change adds a test to simulate transaction retries during the schema
changer backfill.
Some retries caused
ingested key collides with an existing one
errorsduring addSSTable because the schema change's source data changed.
For example:
CREATE TABLE t AS SELECT nextval('seq');
-nextval
isnon-transactional so each time it is selected, the value changes.
This change also allows the schema changer to specify that addSSTable is
allowed to shadow (replace) keys' values if they are different to handle
this case.
Release note: None