importccl: speed up revert of IMPORT INTO empty table #52754

Merged
merged 1 commit into from
Aug 20, 2020

@dt dt commented Aug 13, 2020

When IMPORT INTO fails, it reverts the tables to their pre-IMPORT state.
Typically this requires running a RevertRange operation that finds the
keys written by the IMPORT among all the table data and deletes just
those keys. This is somewhat expensive: we need to iterate over the keys
in the target table and check each one to see if it needs to be
reverted.
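
To make the cost concrete, here is a toy model of that revert, not the actual RevertRange implementation. The `entry` type, the `revertSpan` function, and the in-memory map are all hypothetical stand-ins; the sketch only assumes that each key records the wall-time at which it was written and that the IMPORT only adds new keys.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a stored key: its value plus the
// wall-time at which it was written.
type entry struct {
	value    string
	walltime int64
}

// revertSpan is a toy model of a revert: it visits every key in the table
// and deletes only those written after importStart. It returns how many
// keys were visited and how many were deleted.
func revertSpan(table map[string]entry, importStart int64) (visited, deleted int) {
	for k, e := range table {
		visited++
		if e.walltime > importStart {
			delete(table, k) // deleting during range is safe in Go
			deleted++
		}
	}
	return visited, deleted
}

func main() {
	table := map[string]entry{
		"a": {"old", 5},
		"b": {"old", 7},
		"c": {"imported", 12},
		"d": {"imported", 13},
	}
	visited, deleted := revertSpan(table, 10)
	fmt.Println(visited, deleted, len(table)) // prints "4 2 2"
}
```

Every key is visited even though only the imported ones are removed, which is the work a wholesale delete avoids.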

Non-INTO style IMPORTs create the table into which they will IMPORT and
thus can just drop it wholesale on failure, instead of doing this expensive
revert. However, INTO-style IMPORTs could use a similarly fast/cheap
wholesale delete if they knew the table was empty when the IMPORT was
started.

This change tracks which tables were empty when the IMPORT started and
then deletes, rather than reverts, the table span on failure.

Release note (performance improvement): Cleaning up after a failure during IMPORT INTO a table which was empty is now faster.
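
As a rough sketch of the approach described above (not the code in this PR): probe each target table with a one-row scan when the IMPORT starts, remember the result, and pick the cleanup strategy from it on failure. The `store` type, the `prepare` and `cleanup` functions, and the `WasEmpty` field name are assumptions made for illustration, and the toy `scan` walks the whole map where a real KV scan with a row limit would stop at the first key.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a hypothetical KV store; each value is the write wall-time.
type store map[string]int64

// scan returns up to maxRows keys in [start, end), in sorted order.
func (s store) scan(start, end string, maxRows int) []string {
	var keys []string
	for k := range s {
		if k >= start && k < end {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	if len(keys) > maxRows {
		keys = keys[:maxRows]
	}
	return keys
}

// tableDetails is a stand-in for per-table job details.
// WasEmpty is an assumed field name.
type tableDetails struct {
	start, end string
	WasEmpty   bool
}

// prepare records whether the table span held any key before the import.
func prepare(s store, t *tableDetails) {
	t.WasEmpty = len(s.scan(t.start, t.end, 1 /* maxRows */)) == 0
}

// cleanup undoes a failed import and reports which strategy it used.
func cleanup(s store, t tableDetails, importStart int64) string {
	strategy := "revert"
	if t.WasEmpty {
		strategy = "clear"
	}
	for k, ts := range s {
		if k < t.start || k >= t.end {
			continue
		}
		// An initially empty span can be cleared without inspecting
		// timestamps; otherwise only keys written by the import go.
		if t.WasEmpty || ts > importStart {
			delete(s, k)
		}
	}
	return strategy
}

func main() {
	s := store{"b/1": 5} // table b has a pre-existing row, table a is empty
	a := tableDetails{start: "a/", end: "a0"}
	b := tableDetails{start: "b/", end: "b0"}
	prepare(s, &a)
	prepare(s, &b)

	s["a/1"], s["b/2"] = 12, 12 // rows written by the failed import

	fmt.Println(cleanup(s, a, 10), cleanup(s, b, 10), len(s)) // prints "clear revert 1"
}
```

The emptiness check has to happen before the IMPORT writes anything, since afterwards an initially empty table is indistinguishable from a populated one without inspecting timestamps.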

@dt dt requested review from pbardea and a team August 13, 2020 02:49
@dt dt force-pushed the import-into-empty branch from beb8620 to 2ae164d Compare August 13, 2020 02:51
details.Walltime = p.ExecCfg().Clock.Now().WallTime

// Check if the tables being imported into are starting empty, in which
// case we can a cheap clear-range instead of revert-range to cleanup.

nit: "... we can a cheap ..."

for i := range details.Tables {
if !details.Tables[i].IsNew {
tblSpan := sqlbase.NewImmutableTableDescriptor(*details.Tables[i].Desc).TableSpan(keys.TODOSQLCodec)
res, err := p.ExecCfg().DB.Scan(ctx, tblSpan.Key, tblSpan.EndKey, 1)

nit: 1 /* maxRows */

@dt dt force-pushed the import-into-empty branch from c4333ee to d8ac1f1 Compare August 20, 2020 16:07

dt commented Aug 20, 2020

bors r+

craig bot commented Aug 20, 2020

Build succeeded:

@craig craig bot merged commit 979127c into cockroachdb:master Aug 20, 2020
@dt dt deleted the import-into-empty branch October 26, 2020 03:57