storage: paginate export requests with a size limit #43356
Comments
cc @petermattis for triage. I don't think this is a ton of work, but it is the primary known blocker for larger default range sizes.
Cc @dt in case there are any other complexities here.
That's exactly it.
I've been coordinating with @dt. The changes on the bulk io side are pretty minimal. I'll be working with @pbardea to get them done once the engine interface changes come in.
I'm 👍 on this approach -- the changes on the bulk-io side are actually looking very minimal here, at least compared to trying to stream SSTs straight to cloud-storage during construction. Once the […] there will be some changes in […]
@petermattis I can pick this up if the storage team is feeling squeezed this milestone.
I’m going to type the second half of this issue to pick up the API change today. You are correct that I attached the wrong issue to the PR.
It would be a major change to allow pagination of exported SSTs within the versions of an individual key. Given that the previous changes always include all versions of a key, there's a concern that keys with very large numbers of versions could create SSTs which could not be restored or which might OOM a server. We'd rather fail to create a backup than OOM or create an unusable backup.

To deal with this, this commit adds a new maxSize parameter above which the ExportToSst call will fail. A follow-up commit will add a cluster setting to configure this value; for now it is set to unlimited. If customers hit this error when creating a backup, they'll need to either set a lower GC TTL and run GC, or use a point-in-time backup rather than a backup which contains all of the versions.

The export tests were extended to ensure that this parameter behaves as expected, and the change was stressed on the teeing engine to ensure that the behavior matches between pebble and rocksdb.

Relates to cockroachdb#43356

CC @dt

Release note: None
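For illustration, here is a minimal, self-contained Go sketch of the size-limit behavior described in that commit. The kv type, exportWithLimits, and its parameters (targetSize, maxSize) are stand-ins chosen for this sketch, not the actual engine API: the point is that a file only pauses at a user-key boundary, and the call fails rather than splitting a single key's versions once maxSize is exceeded.

```go
package main

import (
	"bytes"
	"fmt"
)

// kv is a simplified stand-in for an MVCC key/value pair.
type kv struct {
	key   []byte // user key
	ts    int64  // version timestamp
	value []byte
}

// exportWithLimits accumulates versions into one "file", stops at a
// user-key boundary once targetSize is reached (returning a resume key),
// and fails outright if keeping all versions would exceed maxSize.
// A maxSize of 0 means unlimited.
func exportWithLimits(kvs []kv, targetSize, maxSize int64) (file []kv, resumeKey []byte, err error) {
	var size int64
	for i, cur := range kvs {
		newKey := i == 0 || !bytes.Equal(cur.key, kvs[i-1].key)
		// Only pause at a boundary between user keys: all versions of a
		// key must land in the same file so the backup stays restorable.
		if newKey && targetSize > 0 && size >= targetSize {
			return file, cur.key, nil
		}
		size += int64(len(cur.key) + len(cur.value))
		if maxSize > 0 && size > maxSize {
			return nil, nil, fmt.Errorf("export size %d exceeds max size %d", size, maxSize)
		}
		file = append(file, cur)
	}
	return file, nil, nil
}

func main() {
	kvs := []kv{
		{key: []byte("a"), ts: 2, value: make([]byte, 10)},
		{key: []byte("a"), ts: 1, value: make([]byte, 10)},
		{key: []byte("b"), ts: 1, value: make([]byte, 10)},
	}
	file, resume, err := exportWithLimits(kvs, 15, 0)
	fmt.Println(len(file), string(resume), err) // 2 b <nil>: pauses before "b", never mid-key
}
```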
Is your feature request related to a problem? Please describe.
In order to bound memory usage during backup and then ultimately during restore, we need to bound the size of exported SSTs. Currently in the implementation of export we build SSTs in memory and then write them to external storage (or in rare cases return them to clients).
The logic which creates these SSTs lives here: cockroach/pkg/ccl/storageccl/export.go (line 134 in 605d8ef).

In today's implementation the entire range of [args.Start, args.End) will be written to a single SST. Today's ranges are generally bounded to 64MB (in the happy case), which provides something of an upper bound for the size of SSTs created during a backup. As we look towards moving to larger range sizes (#39717), putting the entire range into a single SST becomes problematic. Once exported SST sizes are bounded, we can be confident that SST files created for backups are no larger than today's files.
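For context, a rough sketch of the shape of today's flow, with heavily simplified stand-in types (kvPair and exportSpan are illustrative only, not the actual export.go code): the whole span is buffered into one in-memory SST before anything is written out, so memory use scales with range size.

```go
package main

import "fmt"

// kvPair is a toy stand-in for a key/value pair in the exported span.
type kvPair struct {
	key, value []byte
}

// exportSpan buffers the entire [start, end) span into one in-memory SST.
func exportSpan(span []kvPair) []byte {
	var sst []byte
	for _, kv := range span {
		// The real code appends to an in-memory SST writer here.
		sst = append(sst, kv.key...)
		sst = append(sst, kv.value...)
	}
	// Only after the entire span is buffered is the SST written to
	// external storage (or, rarely, returned to the client).
	return sst
}

func main() {
	span := []kvPair{
		{key: []byte("a"), value: make([]byte, 4)},
		{key: []byte("b"), value: make([]byte, 4)},
	}
	fmt.Println(len(exportSpan(span))) // 10: both keys buffered before any write
}
```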
Describe the solution you'd like
The proposal in this issue is to:
1. Extend the engine.ExportToSst interface to accept a size limit and return a resume key.
2. Use the resume key returned by the ExportToSst call in evalExport and paginate the export across multiple files (see the sketch after this list).
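A minimal sketch of what that could look like, assuming a simplified signature; exportFn, paginateExport, and the fake exporter below are illustrative stand-ins, not the actual engine.ExportToSst or evalExport code:

```go
package main

import (
	"bytes"
	"fmt"
)

// exportFn stands in for the extended engine.ExportToSst: it exports roughly
// targetSize bytes starting at start and returns a resume key when the span
// was not exhausted (an empty resume key means the span is done).
type exportFn func(start, end []byte, targetSize int64) (data []byte, resumeKey []byte, err error)

// paginateExport sketches the proposed evalExport loop: keep calling the
// export function with the returned resume key until the span is exhausted,
// producing one file per call instead of one file for the whole range.
func paginateExport(export exportFn, start, end []byte, targetSize int64) ([][]byte, error) {
	var files [][]byte
	for cur := start; ; {
		data, resume, err := export(cur, end, targetSize)
		if err != nil {
			return nil, err
		}
		if len(data) > 0 {
			files = append(files, data) // in the real code: write each chunk to external storage
		}
		if len(resume) == 0 || bytes.Compare(resume, end) >= 0 {
			return files, nil
		}
		cur = resume
	}
}

func main() {
	// A fake exporter over keys "a".."e" that emits one key per call, just
	// to exercise the pagination loop.
	keys := [][]byte{[]byte("a"), []byte("b"), []byte("c"), []byte("d"), []byte("e")}
	fake := func(start, end []byte, targetSize int64) ([]byte, []byte, error) {
		for i, k := range keys {
			if bytes.Compare(k, start) >= 0 && bytes.Compare(k, end) < 0 {
				var resume []byte
				if i+1 < len(keys) {
					resume = keys[i+1]
				}
				return k, resume, nil
			}
		}
		return nil, nil, nil
	}
	files, err := paginateExport(fake, []byte("a"), []byte("e"), 1)
	fmt.Println(len(files), err) // 4 <nil>: keys a..d, "e" is excluded by the end bound
}
```

The key point of the design is that the resume key lets the caller bound the size of each file while still covering the full span, which is what keeps memory bounded on both the backup and, ultimately, the restore side.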
Describe alternatives you've considered
The proposal here will ensure that larger ranges do not make the memory situation worse than it is today for BACKUP and RESTORE. There are approaches which could make the situation better: ideally we'd stream the SST straight to the storage endpoint rather than buffering it entirely in RAM.
Additional context
Once this is in place we'll additionally want to split up spans in the backup at file boundaries; that is a tiny change.
Another user of export requests is CDC, which uses them for backfills. It too should make sure not to buffer too much data in RAM. To achieve that goal it may need to receive the resume key in the response and provide a way to indicate that it does not want the entire response. That can be follow-up work.