-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backupccl: add hidden cluster setting to split exports by TS #68981
backupccl: add hidden cluster setting to split exports by TS #68981
Conversation
@dt Would this be sufficient for backup processor to use new split by timestamp functionality? I don't see any immediate tests within backupccl that test that. |
I think we want to make sure that that the processor does not flush a file that ends mid-key, on this line:
To do that, I think we'll need to add a parameter to |
Technically, once we read the full range, the ending timestamp will be empty as well as the resume span. Otherwise there's no guarantee that we will ever hit the last timestamp of any key until the end of the range. Will that be a problem? |
We can't report progress until we flush a file. Ideally we'd do so before range-end, but I'm okay with waiting until a range-end if we have to (e.g. the whole range could be revisions of a single key in the worst case no matter what). If we decide we care, we could flush mid-iteration of the returned SST into the cloud file writer. |
3820802
to
e3ff1f9
Compare
I see what you mean! I added the flag first, but it looks like completedSpans is doing exactly the same thing. If response had no ResumeSpan we reached end of range and we use it as criteria to set completedSpans to 1. |
So yeah, it is true that completedSpans will always be zero if returnedSST ends mid-key, so technically we could just add But today, we can still flush an in-progress cloud file and open a new one to obey our size limit on that line even when completedSpans is zero, i.e. we got a resumeSpan -- we just will not increase the number of "completed" spans when we update the progress, but we will record the span covered by that flushed file. We only really want to not flush the file, and record its coverage, if its current coverage ends mid-key. I think I'd just add a new bool to returnedSST to indicate the mid-key ending, and then check that in the flush conditional. (Indeed, as an optimization, I might even go back to |
236d8b7
to
7475388
Compare
What I realised adding test is that if we request breaking mid key we start generating timestamps even for the case where we don't want history. I think that should be tackled inside kv/pebble where we send resume span and avoid timestamp if we stop at new key. I'll make a separate change for that as it is 1 liner but needs 100 lines of test. |
This diff enables usage of split keys by timestamp during export in backup processor. This feature is enables by setting bulkio.backup.split_keys_on_timestamps to true. When enabled, generated backup SST files would be split on timestamps of a key when backing up history. Release note: None
7475388
to
5526fc8
Compare
bors r=dt |
Build succeeded: |
This diff enables usage of split keys by timestamp during export in
backup processor. This feature is enables by setting
bulkio.backup.split_keys_on_timestamps to true.
When enabled, generated backup SST files would be split on timestamps
of a key when backing up history.
Release note: None