
storage: paginate export requests with a size limit #43356

Closed
ajwerner opened this issue Dec 19, 2019 · 7 comments · Fixed by #44482

@ajwerner
Contributor

Is your feature request related to a problem? Please describe.

In order to bound memory usage during backup and then ultimately during restore, we need to bound the size of exported SSTs. Currently in the implementation of export we build SSTs in memory and then write them to external storage (or in rare cases return them to clients).

The logic which creates these SSTs lives here:

data, summary, err := e.ExportToSst(args.Key, args.EndKey, args.StartTime, h.Timestamp, exportAllRevisions, io)

In today's implementation the entire range of [args.Start, args.End) will be written to a single SST. Today's ranges are generally bounded to 64MB (in the happy case), which provides something of an upper bound on the size of SSTs created during a backup. As we look toward moving to larger range sizes (#39717), putting the entire range into a single SST becomes problematic.

Once we do this we can be confident that SST files created for backups are no larger than today's files.
Describe the solution you'd like

The proposal in this issue is to:

  1. Add a cluster setting to dictate the maximum size of an SST for export (there are other options for plumbing in such a number, but a cluster setting seems the easiest).
  2. Modify the engine.ExportToSst interface to accept a size limit and return a resume key.
  3. Plumb the maximum size from the cluster setting into the ExportToSst call in evalExport and paginate the export across multiple files.
    • The ExportResponse is already combinable, so there should be no need to change the API whatsoever.
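The three steps above can be sketched as a chunking loop. This is a minimal, self-contained illustration, not the actual CockroachDB code: `exportChunk` stands in for the proposed `engine.ExportToSst` contract (consume entries from a start key, stop near a size limit, return a resume key), and `paginateExport` stands in for the loop `evalExport` would run. All names and the in-memory `kv` representation are assumptions for the sketch.

```go
package main

import "fmt"

// kv is a simplified key/value pair standing in for an engine iterator entry.
type kv struct {
	key   string
	value []byte
}

// exportChunk mimics the proposed size-limited export contract: it consumes
// entries at or after startKey, stops once adding the next value would exceed
// maxSize, and returns the exported entries plus a resume key ("" means the
// span is exhausted). The check requires at least one entry per chunk so a
// single oversized value still makes progress.
func exportChunk(data []kv, startKey string, maxSize int) (exported []kv, resumeKey string) {
	size := 0
	for _, e := range data {
		if e.key < startKey {
			continue
		}
		if size+len(e.value) > maxSize && len(exported) > 0 {
			return exported, e.key
		}
		size += len(e.value)
		exported = append(exported, e)
	}
	return exported, ""
}

// paginateExport is the loop the request evaluation would run: keep calling
// exportChunk until the resume key is empty, producing one "file" per pass.
func paginateExport(data []kv, maxSize int) [][]kv {
	var files [][]kv
	resume := ""
	for {
		chunk, next := exportChunk(data, resume, maxSize)
		files = append(files, chunk)
		if next == "" {
			return files
		}
		resume = next
	}
}

func main() {
	data := []kv{{"a", make([]byte, 40)}, {"b", make([]byte, 40)}, {"c", make([]byte, 40)}}
	files := paginateExport(data, 64)
	fmt.Println(len(files)) // 3: only one 40-byte value fits under the 64-byte limit
}
```

With a 64MB-style limit substituted for the toy 64-byte one, the same loop bounds each file's size regardless of how large the underlying range grows.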

Describe alternatives you've considered

The proposal here will ensure that larger ranges do not make the memory situation worse than it is today for BACKUP and RESTORE. There are approaches which could make the situation better: ideally we'd stream the SST straight to the storage endpoint rather than buffering it entirely in RAM.

Additional context

Once this is in place we'll additionally want to split up spans in the backup at file boundaries; that is a tiny change.

Another user of export requests is CDC, which uses them for backfills. It too should make sure not to buffer too much data in RAM. To achieve that goal it may need to receive the resume key in the response and provide a way to indicate that it does not want the entire response. That can be follow-up work.

@ajwerner ajwerner added the A-storage Relating to our storage engine (Pebble) on-disk storage. label Dec 19, 2019
@ajwerner
Contributor Author

cc @petermattis for triage. I don't think this is a ton of work but it is the primary known blocker for larger default range sizes.

@petermattis
Collaborator

engine.ExportToSst already takes a start key. I think what would be needed is to add a size limit and return the resume key. That seems like a pretty trivial change on the storage/engine side. The bigger complexity is the changes in the bulk IO code.

Cc @dt in case there are any other complexities here.

@ajwerner
Contributor Author

I think what would be needed is to add a size limit and return the resume key.

That's exactly it.

Cc @dt in case there are any other complexities here.

I've been coordinating with @dt. The changes on the bulk io side are pretty minimal. I'll be working with @pbardea to get them done once the engine interface changes come in.

@dt
Member

dt commented Dec 19, 2019

I'm 👍 on this approach -- the changes on the bulk-io side are actually looking very minimal here, at least compared to trying to stream SSTs straight to cloud-storage during construction. Once the engine.ExportToSST function supports a size_limit and returning/populating a resumeKey, we can wrap that in a loop inside the ExportRequest so that it still exports the entirety of its [StartKey, EndKey) span -- just in multiple files instead of one. The rest of the pipeline already supports multiple files, since these requests are split-able/merge-able anyway (i.e. in case the range had split out from under the BACKUP).

There will be some changes in RESTORE as well so that it doesn't try to download all the files at once in one big ImportRequest, but that can happen separately as a follow-up.
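The "split-able/merge-able" point above is why the API doesn't need to change: responses merge by concatenating their file lists, whether the extra files came from a range split or from size-limit pagination. A minimal sketch, with `exportResponse` as a hypothetical stand-in for the real response type, not the actual definition:

```go
package main

import "fmt"

// exportedFile stands in for one SST produced by a (possibly paginated) export.
type exportedFile struct {
	span string // key span the file covers, e.g. "[a,b)"
	size int    // file size in bytes
}

// exportResponse is a simplified stand-in for an ExportResponse, carrying the
// files produced for one request's span.
type exportResponse struct {
	files []exportedFile
}

// combine merges another response into this one. Because the result is just
// the union of file lists, a caller cannot tell whether multiple files came
// from split requests or from one request paginating by size.
func (r *exportResponse) combine(other exportResponse) {
	r.files = append(r.files, other.files...)
}

func main() {
	a := exportResponse{files: []exportedFile{{"[a,b)", 40}}}
	b := exportResponse{files: []exportedFile{{"[b,c)", 40}, {"[c,d)", 20}}}
	a.combine(b)
	fmt.Println(len(a.files)) // 3
}
```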

@ajwerner
Contributor Author

@petermattis I can pick this up if the storage team is feeling squeezed this milestone.

@petermattis
Collaborator

See #44440

@ajwerner Did you want to keep this issue open? Given that you've already done the storage specific part (thank you!), I'm going to punt this whole issue back over to you.

@petermattis petermattis removed the A-storage Relating to our storage engine (Pebble) on-disk storage. label Jan 29, 2020
@ajwerner
Contributor Author

I'm going to take the second half of this issue and pick up the API change today. You are correct that I attached the wrong issue to the PR.

ajwerner added a commit to ajwerner/cockroach that referenced this issue Jan 30, 2020
It would be a major change to allow pagination of exported SSTs within versions
of an individual key. Given that the previous changes always include all
versions of a key, there's a concern that keys with very large numbers of
versions could create SSTs which could not be restored or which might OOM a
server.

We'd rather fail to create a backup than OOM or create an unusable backup.
To deal with this, this commit adds a new maxSize parameter above which the
ExportToSst call will fail. In a follow-up commit there will be a cluster
setting to configure this value; for now it is set to unlimited.

If customers hit this error when creating a backup, they'll need either to
set a lower GC TTL and run GC, or to use a point-in-time backup rather than
a backup which contains all of the versions.

The export tests were extended to ensure that this parameter behaves as
expected and was stressed on the teeing engine to ensure that the behavior
matches between pebble and rocksdb.

Relates to cockroachdb#43356

CC @dt

Release note: None
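The commit message's design choice -- fail the export rather than paginate within one key's versions -- can be illustrated with a per-key size check. This is a toy sketch under assumed names (`version`, `checkKeyFits`), not the committed code: since all revisions of a key must land in the same SST, a single key whose versions alone exceed maxSize makes the limit unsatisfiable, and the export errors out instead of emitting an oversized or split file.

```go
package main

import (
	"errors"
	"fmt"
)

// version is one MVCC revision of a key (simplified to a byte count).
type version struct {
	key  string
	size int
}

// errMaxSize mirrors the failure mode described in the commit: exporting
// fails outright when one key's revisions alone exceed the size limit.
var errMaxSize = errors.New("export size limit exceeded by a single key's versions")

// checkKeyFits sums revision sizes per key and errors if any single key
// could not fit within one SST of at most maxSize bytes.
func checkKeyFits(revs []version, maxSize int) error {
	perKey := map[string]int{}
	for _, v := range revs {
		perKey[v.key] += v.size
		if perKey[v.key] > maxSize {
			return errMaxSize
		}
	}
	return nil
}

func main() {
	revs := []version{{"a", 30}, {"a", 30}, {"b", 10}}
	fmt.Println(checkKeyFits(revs, 64)) // <nil>: the 60 bytes of "a" versions fit
	fmt.Println(checkKeyFits(revs, 50)) // error: "a" versions total 60 > 50
}
```

This matches the remediation the commit suggests: lowering the GC TTL (fewer retained versions per key) or taking a point-in-time backup (one version per key) shrinks the per-key total below the limit.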
@craig craig bot closed this as completed in b67d7d1 Feb 7, 2020