backupccl: SHOW BACKUP FILES IN (on collection) returns the full SST path #78251
Huh, that seems weird? "mybackups" is part of the path to the collection, not a path to a file within it, if I'm reading your example right? |
that's intended. This prevents confusion when
|
2¢: I'm lukewarm on the verbose full paths all the time just because the option might be set, especially since we changed the default location so fewer people will need to set the option. Maybe a compromise would be to do full paths only if there's an out-of-collection inc store location? |
what is the downside of verbose file output? security concerns? or the desire to keep exact file locations a black box to prevent naive users from moving stuff around and screwing things up? I only ask because it's slightly easier to implement. |
Mostly just simplicity / compartmentalization: if you There is a little security component, or more specifically a redaction one: if we start to show a real s3 URI, now we're on the hook for it not being too real of an s3 URI, since as we know, even if we're just echoing back a key that was in the URI we were sent, someone will decide they think it should have been redacted and throw a fit. But then it is sorta confusing if they aren't actually s3 URIs, i.e. if we strip the query params. Should we keep some params? Like, what if your enc storage was different only by the AWS_ENDPOINT param? This just seems to get sorta messy quickly. Path within collection, on the other hand, is well defined regardless of storage: it's just the path from the collection root to the file. But maybe partitioned backups, where there are multiple roots, suggest even that's a fiction. Or maybe there should be another column for partition and just put the loc kv in it. |
that all makes sense! In that case, I agree with you -- the path should start at the root of the collection, for all calls of
|
While you're here, want to try one with 3 partitions? I think we might want to add a column when there is >1 partition, which is the locality kv of the partition it came from? |
sure! I'll also add a column that can either be UPDATE: we realized that SHOW is not locality aware, so we will address the locality kv info in another pr. |
nit: should update the commit message and PR text with the updated output example. |
…path

Previously, SHOW BACKUP FILES on a backup collection would return the SST file path relative to the manifest directory. Given that the incremental backup and full backup manifests are stored in different directories, the file paths that SHOW BACKUP FILES returns should reflect that.

This patch changes the path `SHOW BACKUP FILES IN` returns to the backup path relative to the collection root. As an example:

Previously, the command `SHOW BACKUP FILES LATEST IN s3://mybackups` would return:

    data/001.SST // from a full backup
    data/002.SST // from an incremental backup

Now, the command will return (assuming the full and inc live in same subdir):

    /2020/12/25-060000.00/data/001.SST
    /2020/12/25-060000.00/20201225/070000.00/data/002.SST

Note: when a user passes the incremental_location parameter, the output will be slightly misleading because the incrementals will have a different collection root. To reduce this confusion, I added a backup_type column equal to 'incremental' or 'full'.

I plan to test this change in the PR for cockroachdb#77694

Release note: None
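The path change described in the commit message can be illustrated with a small sketch. This is not the backupccl implementation (which is written in Go); it is a Python illustration under assumptions: the function names `collection_relative_path` and `show_backup_files` and the tuple layout of the input are hypothetical, and each backup's subdirectory is assumed to be known relative to the collection root. The sample paths are the ones from the commit message.

```python
import posixpath


def collection_relative_path(backup_subdir: str, manifest_relative_path: str) -> str:
    """Prefix a manifest-relative file path with the backup's subdirectory,
    yielding a path that starts at the collection root.

    Both argument names are hypothetical, chosen for this illustration.
    """
    return posixpath.join(
        "/", backup_subdir.strip("/"), manifest_relative_path.lstrip("/")
    )


def show_backup_files(backups):
    """Build (path, backup_type) rows for a list of backups.

    Each backup is a hypothetical (subdir, backup_type, files) tuple, where
    files are paths relative to that backup's manifest directory.
    """
    rows = []
    for subdir, backup_type, files in backups:
        for file_path in files:
            rows.append((collection_relative_path(subdir, file_path), backup_type))
    return rows


# Sample data taken from the commit message: one full backup and one
# incremental stored in a subdirectory of the full backup.
backups = [
    ("/2020/12/25-060000.00", "full", ["data/001.SST"]),
    ("/2020/12/25-060000.00/20201225/070000.00", "incremental", ["data/002.SST"]),
]

for path, backup_type in show_backup_files(backups):
    print(path, backup_type)
```

Pairing each path with a `backup_type` value mirrors the column added in this patch: when incrementals live under a different collection root, the type tells the reader which root a given path is relative to.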
a9a1003 to 72c970d
@dt refactored the pr-- it's slightly more invasive. Let me know if you want to give it one more pass. |
bors r=dt |
Build succeeded: |