
storage: paginate export requests with a size limit #43356

Closed
ajwerner opened this issue Dec 19, 2019 · 7 comments · Fixed by #44482

@ajwerner
Contributor

Is your feature request related to a problem? Please describe.

In order to bound memory usage during backup and then ultimately during restore, we need to bound the size of exported SSTs. Currently in the implementation of export we build SSTs in memory and then write them to external storage (or in rare cases return them to clients).

The logic which creates these SSTs lives here:

data, summary, err := e.ExportToSst(args.Key, args.EndKey, args.StartTime, h.Timestamp, exportAllRevisions, io)

In today's implementation the entire range of [args.Start, args.End) will be written to a single SST. Today's ranges are generally bounded to 64MB (in the happy case), which provides something of an upper bound on the size of SSTs created during a backup. As we look toward moving to larger range sizes (#39717), putting the entire range into a single SST becomes problematic.

Once we do this we can be confident that SST files created for backups are no larger than today's files.
Describe the solution you'd like

The proposal in this issue is to:

  1. Add a cluster setting to dictate the maximum size of an SST for export (there are other options for plumbing in such a number, but a cluster setting seems the easiest).
  2. Modify the engine.ExportToSst interface to accept a size limit and return a resume key.
  3. Plumb the maximum size from the cluster setting into the ExportToSst call in evalExport and paginate the export across multiple files.
    • The ExportResponse is already combinable, so there should be no need to change the API whatsoever.
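The three steps above can be sketched as a chunking loop. This is a minimal, self-contained illustration, not the actual CockroachDB code: `exportChunk` stands in for the proposed `engine.ExportToSst` contract (consume entries from a start key, stop near a size limit, return a resume key), and `paginateExport` stands in for the loop `evalExport` would run. All names and the in-memory `kv` representation are assumptions for the sketch.

```go
package main

import "fmt"

// kv is a simplified key/value pair standing in for an engine iterator entry.
type kv struct {
	key   string
	value []byte
}

// exportChunk mimics the proposed size-limited export contract: it consumes
// entries at or after startKey, stops once adding the next value would exceed
// maxSize, and returns the exported entries plus a resume key ("" means the
// span is exhausted). The check requires at least one entry per chunk so a
// single oversized value still makes progress.
func exportChunk(data []kv, startKey string, maxSize int) (exported []kv, resumeKey string) {
	size := 0
	for _, e := range data {
		if e.key < startKey {
			continue
		}
		if size+len(e.value) > maxSize && len(exported) > 0 {
			return exported, e.key
		}
		size += len(e.value)
		exported = append(exported, e)
	}
	return exported, ""
}

// paginateExport is the loop the request evaluation would run: keep calling
// exportChunk until the resume key is empty, producing one "file" per pass.
func paginateExport(data []kv, maxSize int) [][]kv {
	var files [][]kv
	resume := ""
	for {
		chunk, next := exportChunk(data, resume, maxSize)
		files = append(files, chunk)
		if next == "" {
			return files
		}
		resume = next
	}
}

func main() {
	data := []kv{{"a", make([]byte, 40)}, {"b", make([]byte, 40)}, {"c", make([]byte, 40)}}
	files := paginateExport(data, 64)
	fmt.Println(len(files)) // 3: only one 40-byte value fits under the 64-byte limit
}
```

With a 64MB-style limit substituted for the toy 64-byte one, the same loop bounds each file's size regardless of how large the underlying range grows.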

Describe alternatives you've considered

The proposal here will ensure that larger ranges do not make the memory situation worse than it is today for BACKUP and RESTORE. There are approaches which could make the situation better: ideally we'd stream the SST straight to the storage endpoint rather than buffering it entirely in RAM.

Additional context

Once this is in place we'll additionally want to split up spans in the backup at file boundaries; that is a tiny change.

Another user of export requests is CDC, which uses them for backfills. It too should make sure not to buffer too much data in RAM. To achieve that goal it may need to receive the resume key in the response and provide a way to indicate that it does not want the entire response. That can be follow-up work.

@ajwerner ajwerner added the A-storage Relating to our storage engine (Pebble) on-disk storage. label Dec 19, 2019
@ajwerner
Contributor Author

cc @petermattis for triage. I don't think this is a ton of work but it is the primary known blocker for larger default range sizes.

@petermattis
Collaborator

engine.ExportToSst already takes a start key. I think what would be needed is to add a size limit and return the resume key. That seems like a pretty trivial change on the storage/engine side. The bigger complexity is the changes in the bulk IO code.

Cc @dt in case there are any other complexities here.

@ajwerner
Contributor Author

I think what would be needed is to add a size limit and return the resume key.

That's exactly it.

Cc @dt in case there are any other complexities here.

I've been coordinating with @dt. The changes on the bulk io side are pretty minimal. I'll be working with @pbardea to get them done once the engine interface changes come in.

@dt
Member

dt commented Dec 19, 2019

I'm 👍 on this approach -- the changes on the bulk-io side are actually looking very minimal here, at least compared to trying to stream SSTs straight to cloud-storage during construction. Once the engine.ExportToSST function supports a size_limit and returning/populating a resumeKey, we can wrap that in a loop inside the ExportRequest so that it still exports the entirety of its [StartKey, EndKey) span -- just in multiple files instead of one. The rest of the pipeline already supports multiple files, since these requests are split-able/merge-able anyway (i.e. in case the range had split out from under the BACKUP).

There will be some changes in RESTORE as well so that it doesn't try to download all the files at once in one big ImportRequest, but that can happen separately as a follow-up.
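The "split-able/merge-able" point above is why the API doesn't need to change: responses merge by concatenating their file lists, whether the extra files came from a range split or from size-limit pagination. A minimal sketch, with `exportResponse` as a hypothetical stand-in for the real response type, not the actual definition:

```go
package main

import "fmt"

// exportedFile stands in for one SST produced by a (possibly paginated) export.
type exportedFile struct {
	span string // key span the file covers, e.g. "[a,b)"
	size int    // file size in bytes
}

// exportResponse is a simplified stand-in for an ExportResponse, carrying the
// files produced for one request's span.
type exportResponse struct {
	files []exportedFile
}

// combine merges another response into this one. Because the result is just
// the union of file lists, a caller cannot tell whether multiple files came
// from split requests or from one request paginating by size.
func (r *exportResponse) combine(other exportResponse) {
	r.files = append(r.files, other.files...)
}

func main() {
	a := exportResponse{files: []exportedFile{{"[a,b)", 40}}}
	b := exportResponse{files: []exportedFile{{"[b,c)", 40}, {"[c,d)", 20}}}
	a.combine(b)
	fmt.Println(len(a.files)) // 3
}
```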

@ajwerner
Contributor Author

@petermattis I can pick this up if the storage team is feeling squeezed this milestone.

@petermattis
Collaborator

See #44440

@ajwerner Did you want to keep this issue open? Given that you've already done the storage specific part (thank you!), I'm going to punt this whole issue back over to you.

@petermattis petermattis removed the A-storage Relating to our storage engine (Pebble) on-disk storage. label Jan 29, 2020
@ajwerner
Contributor Author

I'm going to take the second half of this issue and pick up the API change today. You are correct that I attached the wrong issue to the PR.

ajwerner added a commit to ajwerner/cockroach that referenced this issue Jan 30, 2020
It would be a major change to allow pagination of exported SSTs within versions
of an individual key. Given that the previous changes always include all
versions of a key, there's a concern that keys with very large numbers of
versions could create SSTs which could not be restored or which might OOM a
server.

We'd rather fail to create a backup than OOM or create an unusable backup.
To deal with this, this commit adds a new maxSize parameter above which the
ExportToSst call will fail. In a follow-up commit there will be a cluster
setting to configure this value; for now it is set to unlimited.

If customers hit this error when creating a backup, they'll need either to
set a lower GC TTL and run GC, or to use a point-in-time backup rather than
a backup which contains all of the versions.

The export tests were extended to ensure that this parameter behaves as
expected and was stressed on the teeing engine to ensure that the behavior
matches between pebble and rocksdb.

Relates to cockroachdb#43356

CC @dt

Release note: None
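The commit message's design choice -- fail the export rather than paginate within one key's versions -- can be illustrated with a per-key size check. This is a toy sketch under assumed names (`version`, `checkKeyFits`), not the committed code: since all revisions of a key must land in the same SST, a single key whose versions alone exceed maxSize makes the limit unsatisfiable, and the export errors out instead of emitting an oversized or split file.

```go
package main

import (
	"errors"
	"fmt"
)

// version is one MVCC revision of a key (simplified to a byte count).
type version struct {
	key  string
	size int
}

// errMaxSize mirrors the failure mode described in the commit: exporting
// fails outright when one key's revisions alone exceed the size limit.
var errMaxSize = errors.New("export size limit exceeded by a single key's versions")

// checkKeyFits sums revision sizes per key and errors if any single key
// could not fit within one SST of at most maxSize bytes.
func checkKeyFits(revs []version, maxSize int) error {
	perKey := map[string]int{}
	for _, v := range revs {
		perKey[v.key] += v.size
		if perKey[v.key] > maxSize {
			return errMaxSize
		}
	}
	return nil
}

func main() {
	revs := []version{{"a", 30}, {"a", 30}, {"b", 10}}
	fmt.Println(checkKeyFits(revs, 64)) // <nil>: the 60 bytes of "a" versions fit
	fmt.Println(checkKeyFits(revs, 50)) // error: "a" versions total 60 > 50
}
```

This matches the remediation the commit suggests: lowering the GC TTL (fewer retained versions per key) or taking a point-in-time backup (one version per key) shrinks the per-key total below the limit.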
@craig craig bot closed this as completed in b67d7d1 Feb 7, 2020