backupccl: buffer/sort returned files before flushing to remote #66876

Merged
merged 2 commits into from
Jun 28, 2021

Member

@dt dt commented Jun 25, 2021

BACKUP has multiple workers sending export requests concurrently, so the
returned files may arrive out of order. Previously, any out-of-order file
forced the writer that flushes returned files to the remote file to close
the file it had been writing and open a new one, since any one file must be
in order. This change adds a small queue of returned files to the sink:
files the sink is given to write to remote storage are first added to the
queue. Once the queue has filled, it is sorted and partially drained to the
remote file. This should increase the odds that files are added to the
remote file in order and thus do not require closing the current remote
file and opening additional ones.
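
To illustrate the idea, here is a minimal sketch of a buffer-sort-drain sink. It is not the code in this PR: the names (`returnedFile`, `sink`, `queueCap`, `countRemoteFiles`), the string keys, and the choice to drain half of the queue are all assumptions made for the example.

```go
package main

import "sort"

// returnedFile is a hypothetical stand-in for a file returned by an export
// request; only its key span matters for this sketch.
type returnedFile struct {
	startKey, endKey string
}

// sink is a hypothetical model of the writer. It only counts how many remote
// files it would have had to open; it performs no I/O.
type sink struct {
	queue       []returnedFile
	queueCap    int    // assumed queue size that triggers a sort and partial drain
	open        bool   // whether a remote file is currently open
	lastEnd     string // end key of the last file appended to the open remote file
	remoteFiles int    // number of remote files opened so far
}

// write appends f to the open remote file, or "opens" a new one if f would
// land out of order relative to what was already written.
func (s *sink) write(f returnedFile) {
	if !s.open || f.startKey < s.lastEnd {
		s.remoteFiles++
		s.open = true
	}
	s.lastEnd = f.endKey
}

func (s *sink) sortQueue() {
	sort.Slice(s.queue, func(i, j int) bool {
		return s.queue[i].startKey < s.queue[j].startKey
	})
}

// push buffers f. Once the queue is full, it is sorted and its first half is
// drained, leaving the rest buffered so later arrivals can still be ordered.
func (s *sink) push(f returnedFile) {
	s.queue = append(s.queue, f)
	if len(s.queue) < s.queueCap {
		return
	}
	s.sortQueue()
	drain := len(s.queue) / 2
	if drain == 0 {
		drain = 1
	}
	for _, q := range s.queue[:drain] {
		s.write(q)
	}
	s.queue = append(s.queue[:0], s.queue[drain:]...)
}

// flush sorts and writes whatever is still buffered.
func (s *sink) flush() {
	s.sortQueue()
	for _, q := range s.queue {
		s.write(q)
	}
	s.queue = s.queue[:0]
}

// countRemoteFiles feeds files to a sink in arrival order and reports how
// many remote files the sink needed.
func countRemoteFiles(files []returnedFile, queueCap int) int {
	s := &sink{queueCap: queueCap}
	for _, f := range files {
		s.push(f)
	}
	s.flush()
	return s.remoteFiles
}
```

In this toy model, a queue size of 1 behaves like the old write-as-received path, while a larger queue lets mildly out-of-order arrivals be reordered before they reach the remote file.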

With the default number of workers (3) on a tpcc5k cluster running an
incremental backup, this was observed to reduce the number of files a node
wrote from ~50-70 to <10.

Release note: none.

@dt dt requested review from pbardea and a team June 25, 2021 04:17
@cockroach-teamcity
Member

This change is Reviewable

@dt dt force-pushed the backup-sort branch 2 times, most recently from 3abf924 to 6a6c5ae Compare June 25, 2021 14:19
dt added 2 commits June 28, 2021 14:04
@dt
Member Author

dt commented Jun 28, 2021

TFTR!

bors r+

@craig
Contributor

craig bot commented Jun 28, 2021

Build succeeded:

@craig craig bot merged commit 9785a7b into cockroachdb:master Jun 28, 2021
@dt dt deleted the backup-sort branch June 28, 2021 17:06

3 participants