backupccl: update RestoreDataProcessor to use ProcessorBase #53905
Didn't look at the test, but the processor changes LGTM.
nit: for the release note: this bug has been present only on the 20.2 branch, right? If so, I'd call that out explicitly.
Reviewed 2 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy, @pbardea, and @yuzefovich)
pkg/ccl/backupccl/full_cluster_backup_restore_test.go, line 446 at r1 (raw file):
// Bugger the backup by removing the SST files. (Note this messes up all of
// the backups, but there is only one at this point).
super nit: period should be before parenthesis :)
pkg/ccl/backupccl/restore_data_processor.go, line 43 at r1 (raw file):
progCh  chan execinfrapb.RemoteProducerMetadata_BulkProcessorProgress
lastErr error
alloc   *rowenc.DatumAlloc
nit: I think we usually use rowenc.DatumAlloc (a value, not a pointer), so we don't need to initialize it explicitly.
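A value field needs no explicit initialization because the zero value of a Go struct is already usable, whereas a pointer field starts out nil and must be set by the constructor. A minimal sketch, assuming a hypothetical `datumAlloc` type that stands in for `rowenc.DatumAlloc` (its real methods are not modeled here):

```go
package main

// datumAlloc is a hypothetical stand-in for rowenc.DatumAlloc: a batch
// allocator whose zero value is ready to use.
type datumAlloc struct {
	buf []int
}

// newInt hands out a pointer into a reusable slab, allocating the slab
// lazily so that the zero value works without a constructor.
func (a *datumAlloc) newInt(v int) *int {
	if len(a.buf) == 0 {
		a.buf = make([]int, 16)
	}
	p := &a.buf[0]
	a.buf = a.buf[1:]
	*p = v
	return p
}

// withValue holds the allocator by value: no initialization needed.
type withValue struct {
	alloc datumAlloc
}

func (w *withValue) decode(v int) *int { return w.alloc.newInt(v) }

// withPointer holds a pointer: its zero value has a nil allocator, so the
// constructor must remember to set it.
type withPointer struct {
	alloc *datumAlloc
}

func (w *withPointer) decode(v int) *int {
	if w.alloc == nil {
		panic("alloc was never initialized")
	}
	return w.alloc.newInt(v)
}
```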
pkg/ccl/backupccl/restore_data_processor.go, line 83 at r1 (raw file):
func (rd *restoreDataProcessor) Start(ctx context.Context) context.Context {
	ctx = rd.StartInternal(ctx, "restore-data")
Not sure how important this is, but I think we usually call input.Start(ctx) (ignoring its returned, possibly updated ctx) before calling StartInternal.
pkg/ccl/backupccl/restore_data_processor.go, line 84 at r1 (raw file):
func (rd *restoreDataProcessor) Start(ctx context.Context) context.Context {
	ctx = rd.StartInternal(ctx, "restore-data")
	ctx, span := tracing.ChildSpan(ctx, "restoreDataProcessor")
I don't think we need this anymore, right? StartInternal will create a span.
pkg/ccl/backupccl/restore_data_processor.go, line 92 at r1 (raw file):
rd.input.Start(ctx)
var err error
rd.kr, err = storageccl.MakeKeyRewriterFromRekeys(rd.spec.Rekeys)
Maybe this should happen in the constructor above? Then we can propagate the error.
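Doing the fallible setup in the constructor lets the error reach the caller, since `Start` (per the signature quoted above) returns only a `context.Context`. A sketch under that assumption; `keyRewriter`, `makeKeyRewriter`, `restoreSpec`, and `newRestoreDataProcessor` are hypothetical stand-ins for `storageccl.KeyRewriter`, `storageccl.MakeKeyRewriterFromRekeys`, the processor spec, and the real constructor:

```go
package main

import (
	"errors"
	"fmt"
)

// keyRewriter and makeKeyRewriter are hypothetical stand-ins; the real
// function's validation rules are not modeled.
type keyRewriter struct{ rekeys []string }

func makeKeyRewriter(rekeys []string) (*keyRewriter, error) {
	if len(rekeys) == 0 {
		return nil, errors.New("no rekeys provided")
	}
	return &keyRewriter{rekeys: rekeys}, nil
}

type restoreSpec struct{ Rekeys []string }

type restoreDataProcessor struct {
	spec restoreSpec
	kr   *keyRewriter
}

// newRestoreDataProcessor performs the fallible setup up front, so a bad
// spec surfaces as an ordinary error when the flow is being built rather
// than inside Start, which has no error return.
func newRestoreDataProcessor(spec restoreSpec) (*restoreDataProcessor, error) {
	kr, err := makeKeyRewriter(spec.Rekeys)
	if err != nil {
		return nil, fmt.Errorf("creating key rewriter: %w", err)
	}
	return &restoreDataProcessor{spec: spec, kr: kr}, nil
}
```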
pkg/ccl/backupccl/restore_data_processor.go, line 102 at r1 (raw file):
}
// Run implements the execinfra.Processor interface.
nit: the comment needs an adjustment.
pkg/ccl/backupccl/restore_data_processor.go, line 147 at r1 (raw file):
}
log.VEventf(context.TODO(), 1 /* level */, "importing span %v", entry.Span)
You can use rd.Ctx here and below.
3c8b597 to 89dc041
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @yuzefovich)
pkg/ccl/backupccl/full_cluster_backup_restore_test.go, line 446 at r1 (raw file):
Previously, yuzefovich wrote…
super nit: period should be before parenthesis :)
Done.
pkg/ccl/backupccl/restore_data_processor.go, line 43 at r1 (raw file):
Previously, yuzefovich wrote…
nit: I think we usually use rowenc.DatumAlloc (a value, not a pointer), so we don't need to initialize it explicitly.
Done.
pkg/ccl/backupccl/restore_data_processor.go, line 83 at r1 (raw file):
Previously, yuzefovich wrote…
Not sure how important this is, but I think we usually call input.Start(ctx) (ignoring its returned, possibly updated ctx) before calling StartInternal.
Done.
pkg/ccl/backupccl/restore_data_processor.go, line 84 at r1 (raw file):
Previously, yuzefovich wrote…
I don't think we need this anymore, right? StartInternal will create a span.
Oh, nice. Done.
pkg/ccl/backupccl/restore_data_processor.go, line 92 at r1 (raw file):
Previously, yuzefovich wrote…
Maybe this should happen in the constructor above? Then we can propagate the error.
Done. Also moved the initialization of the progCh there since that's a better place for it.
pkg/ccl/backupccl/restore_data_processor.go, line 102 at r1 (raw file):
Previously, yuzefovich wrote…
nit: the comment needs an adjustment.
Done, and added comments to the RowSource methods.
pkg/ccl/backupccl/restore_data_processor.go, line 147 at r1 (raw file):
Previously, yuzefovich wrote…
You can use rd.Ctx here and below.
Ah, the wonders of processorBase. Thanks - done.
Reviewed 1 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @pbardea)
pkg/ccl/backupccl/restore_data_processor.go, line 76 at r2 (raw file):
}
// We don't have to worry about this go routine leaking because next we loop over progCh
nit: the comment seems to be incorrect now. I think we need to close this channel in the close() method.
Also, I don't think I see progCh being used now. Am I missing something?
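The leak concern is the usual one for a goroutine that ranges over a channel: it exits only once the channel is closed. A minimal sketch of closing the channel in `close()`, using hypothetical `processor` and `progress` types (in this PR `progCh` was ultimately removed rather than closed):

```go
package main

import "sync"

// progress is a hypothetical stand-in for the bulk-processor progress
// metadata that was sent on progCh.
type progress struct{ completed int }

type processor struct {
	progCh chan progress
	wg     sync.WaitGroup
	once   sync.Once
	seen   []int
}

func newProcessor() *processor {
	p := &processor{progCh: make(chan progress)}
	p.wg.Add(1)
	// This goroutine exits only when progCh is closed; if nothing ever
	// closed the channel, it would leak.
	go func() {
		defer p.wg.Done()
		for m := range p.progCh {
			p.seen = append(p.seen, m.completed)
		}
	}()
	return p
}

// close closes progCh exactly once and waits for the goroutine to exit,
// so it is safe to call more than once.
func (p *processor) close() {
	p.once.Do(func() { close(p.progCh) })
	p.wg.Wait()
}
```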
89dc041 to b289e9d
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @yuzefovich)
pkg/ccl/backupccl/restore_data_processor.go, line 76 at r2 (raw file):
Previously, yuzefovich wrote…
nit: the comment seems to be incorrect now. I think we need to close this channel in the close() method.
Also, I don't think I see progCh being used now. Am I missing something?
Nope, that's right. Needed to be cleaned up, removed entirely.
b289e9d to cb128ce
LGTM if Yevgeniy is happy with the tests.
Reviewed 1 of 1 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @miretskiy)
bbf3398 to ed43545
Previously, RestoreDataProcessor would not properly signal to consumers
that it had encountered an error and was closing. This meant that it
would not drain its inputs. This could result in the restore DistSQL
flow becoming stuck, since the SplitAndScatter processor would be
blocked on sending a row to the RestoreDataProcessor, which had already
closed.
Release justification: bug fix
Release note (bug fix): A failure while restoring data may have
sometimes resulted in the restore job becoming stuck. This bug was only
present in the 20.2 alphas and betas.
ed43545 to 97f8b95
@miretskiy could I get a quick review of the tests here? Also cc @dt for a riskiness sign-off, since this is a larger bug fix.
TFTRs!
bors r=yuzefovich,dt
Build succeeded.
Previously, RestoreDataProcessor would not properly signal to consumers
that it had encountered an error and was closing. This meant that it
would not drain its inputs. This could result in the restore DistSQL
flow becoming stuck, since the SplitAndScatter processor would be
blocked on sending a row to the RestoreDataProcessor, which had already
closed.
Fixes #53900.
Release justification: bug fix
Release note (bug fix): A failure while restoring data may have
sometimes resulted in the restore job becoming stuck.