-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: clear target table before backfilling CREATE…AS #43780
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @lucy-zhang, and @miretskiy)
pkg/sql/schema_changer.go, line 598 at r1 (raw file):
// and our contract is that the created table contains the results of, and // only of, one execution, we need to ensure each attempt to backfill // starts with empty target keyspace.
Nice comment
pkg/sql/schema_changer.go, line 606 at r1 (raw file):
}, }) sc.db.Run(ctx, b)
Should we check for error result here?
Do you think you could add a test that tests this behavior?
Since the backfill can be retried -- either at the schema change level or at the txn closure level -- it is possible that the backfill may start with some data already present in the target key space. This is can then manifest as incorrect results -- CREATE TABLE … AS is supposed to generate a new table with the results of the execution of the supplied query, not a mix of results of partial results of multiple executions of said query. Since souce queries may not have order or may be impure, there is no general way to 'resume' a prior backfill -- so the only correct behavior is to start fresh eash time until a backfill succeeds in ingesting exectly one execution of the source query. Release note (bug fix): Fix bug where interrupted CREATE TABLE ... AS execution could yield duplicate rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @lucy-zhang, and @miretskiy)
pkg/sql/schema_changer.go, line 606 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
Should we check for error result here?
Do you think you could add a test that tests this behavior?
Done.
Yeah, I'm spinning up a roachprod cluster when I get in today to see if this fixes the repro in #43752. If that ends up working, i'll try to figure out how best to inject a restart (i still don't know if it was a txn vs schema-change restart).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dt, @lucy-zhang, and @miretskiy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dt)
pkg/sql/schema_changer.go, line 589 at r2 (raw file):
if table.Adding() && table.IsAs() { if err := sc.db.Txn(ctx, func(ctx context.Context, txn *client.Txn) error {
nit: Is this an extra newline?
pkg/sql/schema_changer.go, line 594 at r2 (raw file):
// This is a new table and should be empty, so this ClearRange should be a // noop. However, if there _is_ anything already there, we need to clear // it before backfilling. One such case is if this backfill was restarted:
Is there anything else that would cause the range to not be empty?
I finally managed to test this and it isn't enough -- it isn't actually retries that are hitting me but concurrent execution. more details over on #43752 (comment) |
this didn't end up being the issue -- not holding the lease was -- but it might still be a good idea? I donno, might be more of a question for schema folks. |
Since the backfill can be retried -- either at the schema change level or at the txn closure level -- it is possible that the backfill may start with some data already present in the target key space. This is can then manifest as incorrect results -- CREATE TABLE … AS is supposed to generate a new table with the results of the execution of the supplied query, not a mix of results of partial results of multiple executions of said query. Since source queries may not have order or may be impure, there is no general way to 'resume' a prior backfill --
so the only correct behavior is to start fresh eash time until a backfill succeeds in ingesting exectly one execution of the source query.
Release note (bug fix): Fix bug where interrupted CREATE TABLE ... AS execution could yield duplicate rows.