Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: SHOW BACKUP FILES IN (on collection) returns the full SST path #78251

Merged
merged 1 commit into from
Mar 24, 2022

Conversation

msbutler
Copy link
Collaborator

@msbutler msbutler commented Mar 22, 2022

backupccl: SHOW BACKUP FILES IN (on collection) returns the full SST path

Previously, SHOW BACKUP FILES on a backup collection would return the SST
file path relative to the manifest directory. Given that the incremental backup
and full backup manifests are stored in different directories, the file paths
that SHOW BACKUP FILES should reflect that.

This patch changes the path SHOW BACKUP FILES IN returns to the backup
path relative to the collection root. As an example:

Previously, the command SHOW BACKUP FILES LATEST IN s3://mybackups, would
return:

data/001.SST // from a full backup
data/002.SST  // from an incremental backup

Now, the command will return (assuming the full and inc live in same subdir):

/2020/12/25-060000.00/data/001.SST
/2020/12/25-060000.00/20201225/070000.00/data/002.SST

Note: when a user passes the incremental_location parameter, the output result
will be slightly misleading because the incrementals will have a different
collection root. To aid in this confusion, I added a backup_type column
equal to 'incremental' or 'full'.

I plan to test this change in the PR for #77694

Release note: None

@msbutler msbutler requested a review from dt March 22, 2022 14:56
@msbutler msbutler self-assigned this Mar 22, 2022
@msbutler msbutler requested a review from a team March 22, 2022 14:56
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@dt
Copy link
Member

dt commented Mar 22, 2022

mybackups/...

Huh, that seems weird? "mybackups" is part of the path to the collection, not a path to a file within it, if I'm reading your example right?

@msbutler
Copy link
Collaborator Author

that's intended. This prevents confusion when incremental_location is passed. E.g.:

[email protected]:26257/movr> BACKUP DATABASE movr INTO  'userfile:///foo';
        job_id       |  status   | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+------+---------------+---------
  746775420519055361 | succeeded |                  1 | 2570 |          1015 | 466947
(1 row)


Time: 277ms total (execution 277ms / network 1ms)

[email protected]:26257/movr> CREATE TABLE movr.bo (i int);
CREATE TABLE


Time: 7ms total (execution 7ms / network 0ms)

[email protected]:26257/movr> INSERT INTO movr.bo VALUES (1);
INSERT 1


Time: 6ms total (execution 6ms / network 0ms)

[email protected]:26257/movr> BACKUP DATABASE movr INTO LATEST IN  'userfile:///foo' WITH incremental_location='userfile:///baz';
        job_id       |  status   | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+------+---------------+--------
  746775606316728321 | succeeded |                  1 |    1 |             0 |    22
(1 row)


Time: 303ms total (execution 303ms / network 0ms)

[email protected]:26257/movr> SHOW BACKUP FILES LATEST IN 'userfile:///foo' WITH incremental_location='userfile:///baz';
                                        path                                        |     start_pretty      |      end_pretty       |     start_key     |      end_key      | size_bytes | rows
------------------------------------------------------------------------------------+-----------------------+-----------------------+-------------------+-------------------+------------+-------
  userfile:/foo/2022/03/22-165612.37/data/746775421023649793.sst                    | /Tenant/2/Table/106/1 | /Tenant/2/Table/106/2 | \xfe\x8a\xf2\x89  | \xfe\x8a\xf2\x8a  |       5011 |   50
  userfile:/foo/2022/03/22-165612.37/data/746775421023649793.sst                    | /Tenant/2/Table/107/1 | /Tenant/2/Table/107/3 | \xfe\x8a\xf3\x89  | \xfe\x8a\xf3\x8b  |       3242 |   15
  userfile:/foo/2022/03/22-165612.37/data/746775421078700033.sst                    | /Tenant/2/Table/108/1 | /Tenant/2/Table/108/4 | \xfe\x8a\xf4\x89  | \xfe\x8a\xf4\x8c  |     159387 |  500
  userfile:/foo/2022/03/22-165612.37/data/746775421023649793.sst                    | /Tenant/2/Table/109/1 | /Tenant/2/Table/109/2 | \xfe\x8a\xf5\x89  | \xfe\x8a\xf5\x8a  |      75918 | 1000
  userfile:/foo/2022/03/22-165612.37/data/746775421023649793.sst                    | /Tenant/2/Table/110/1 | /Tenant/2/Table/110/2 | \xfe\x8a\xf6n\x89 | \xfe\x8a\xf6n\x8a |     222973 | 1000
  userfile:/foo/2022/03/22-165612.37/data/746775421002088449.sst                    | /Tenant/2/Table/111/1 | /Tenant/2/Table/111/2 | \xfe\x8a\xf6o\x89 | \xfe\x8a\xf6o\x8a |        416 |    5
  userfile:/baz/2022/03/22-165612.37/20220322/165709.07/data/746775606983983105.sst | /Tenant/2/Table/114/1 | /Tenant/2/Table/114/2 | \xfe\x8a\xf6r\x89 | \xfe\x8a\xf6r\x8a |         22 |    1
(7 rows)


Time: 86ms total (execution 86ms / network 0ms)

[email protected]:26257/movr> 

@dt
Copy link
Member

dt commented Mar 22, 2022

2¢: I'm lukewarm on the verbose full paths all the time just because the option might be set, especially since we changed the default location so fewer people will need to set the option. Maybe a compromise would be to do full paths only if there's an out-of-collection inc store location?

@msbutler
Copy link
Collaborator Author

what is the downside of verbose file output? security concerns? or the desire to keep exact file locations a black box to prevent naive users from moving stuff around and screwing things up? I only ask because it's slightly easier to implement.

@dt
Copy link
Member

dt commented Mar 22, 2022

what is the downside of verbose file output? security concerns?

Mostly just simplicity / compartmentalization: if you SHOW a given backup in a collection, you get the same results, independent of whether that backup is hosted on gcs, s3, nodelocal -- regardless of where the the collection happens to be.

There is a little security component, or more specifically a redaction one, if we start to show a real s3 URI, now we're on the hook for it not being too real of an s3 URI since as we know, even if we're just echo'ing back a key that was in the URI we were sent, someone will decide they think it should have been redacted and throw a fit. But then it is sorta confusing, if they aren' actually s3 URIs, i.e. if we strip the query params? should we keep some params? like what if your enc storage was different only by the AWS_ENDPOINT param? This just seems to get sorta messy quickly.

path within collection on the other hand is well defined regardless of storage: it's just the path, from the collection root to the file.

But maybe partitioned backups, where there are multiple roots, suggest even that's a fiction. or maybe there should be another column for partition and just put the loc kv in it.

@msbutler
Copy link
Collaborator Author

msbutler commented Mar 22, 2022

that all makes sense! In that case, I agree with you -- the path should start at the root of the collection, for all calls of SHOW BACKUP FILES. In the below corner case, the result looks slightly misleading but the user can figure it out, given their explicitly passed incremental_location. To get the full path, the user can easily join the correct URI to the returned path.

[email protected]:26257/movr> SHOW BACKUP FILES LATEST IN 'userfile:///foo' WITH incremental_location='userfile:///baz';

                                        path                                        |     start_pretty      |      end_pretty       |     start_key     |      end_key      | size_bytes | rows
------------------------------------------------------------------------------------+-----------------------+-----------------------+-------------------+-------------------+------------+-------
  2022/03/22-165612.37/data/746775421023649793.sst                    | /Tenant/2/Table/106/1 | /Tenant/2/Table/106/2 | \xfe\x8a\xf2\x89  | \xfe\x8a\xf2\x8a  |       5011 |   50
  2022/03/22-165612.37/data/746775421023649793.sst                    | /Tenant/2/Table/107/1 | /Tenant/2/Table/107/3 | \xfe\x8a\xf3\x89  | \xfe\x8a\xf3\x8b  |       3242 |   15
  2022/03/22-165612.37/data/746775421078700033.sst                    | /Tenant/2/Table/108/1 | /Tenant/2/Table/108/4 | \xfe\x8a\xf4\x89  | \xfe\x8a\xf4\x8c  |     159387 |  500
  2022/03/22-165612.37/data/746775421023649793.sst                    | /Tenant/2/Table/109/1 | /Tenant/2/Table/109/2 | \xfe\x8a\xf5\x89  | \xfe\x8a\xf5\x8a  |      75918 | 1000
  2022/03/22-165612.37/data/746775421023649793.sst                    | /Tenant/2/Table/110/1 | /Tenant/2/Table/110/2 | \xfe\x8a\xf6n\x89 | \xfe\x8a\xf6n\x8a |     222973 | 1000
  2022/03/22-165612.37/data/746775421002088449.sst                    | /Tenant/2/Table/111/1 | /Tenant/2/Table/111/2 | \xfe\x8a\xf6o\x89 | \xfe\x8a\xf6o\x8a |        416 |    5
  2022/03/22-165612.37/20220322/165709.07/data/746775606983983105.sst | /Tenant/2/Table/114/1 | /Tenant/2/Table/114/2 | \xfe\x8a\xf6r\x89 | \xfe\x8a\xf6r\x8a |         22 |    1
(7 rows)

@dt
Copy link
Member

dt commented Mar 22, 2022

While you're here, want to try one with 3 partitions? I think we might want to add a column when there is >1 partition, which is the locality kv of the partition it came from?

@msbutler
Copy link
Collaborator Author

msbutler commented Mar 22, 2022

sure! I'll also add a column that can either be full or incremental, to aid with the above corner case.

UPDATE: we realized that SHOW is not locality aware, so we will address the locality kv info in another pr.

@dt
Copy link
Member

dt commented Mar 23, 2022

nit: should update the commit message and PR text with the updated output example.

…path

Previously, SHOW BACKUP FILES on a backup collection  would return the  SST
file path relative to the manifest directory. Given that the incremental backup
and full backup manifests are stored in different directories, the file paths
that SHOW BACKUP FILES should reflect that.

This patch changes the path `SHOW BACKUP FILES IN` returns to the backup
path relative to the collection root. As an example:

Previously, the command `SHOW BACKUP FILES LATEST IN s3://mybackups`, would
return:

data/001.SST // from a full backup
data/002.SST  // from an incremental backup

Now, the command will return (assuming the full and inc live in same subdir):

/2020/12/25-060000.00/data/001.SST
/2020/12/25-060000.00/20201225/070000.00/data/002.SST

Note: when a user passes the incremental_location parameter, the output result
will be slightly misleading because the incrementals will have a different
collection root. To aid in this confusion, I added a backup_type column
equal to 'incremental' or 'full'.

I plan to test this change in the PR for cockroachdb#77694

Release note: None
@msbutler msbutler force-pushed the butler-show-collection-files branch from a9a1003 to 72c970d Compare March 23, 2022 17:38
@msbutler
Copy link
Collaborator Author

@dt refactored the pr-- it's slightly more invasive. Let me know if you want to give it one more pass.

@msbutler
Copy link
Collaborator Author

bors r=dt

@craig
Copy link
Contributor

craig bot commented Mar 24, 2022

Build succeeded:

@craig craig bot merged commit d51cd4f into cockroachdb:master Mar 24, 2022
@msbutler msbutler deleted the butler-show-collection-files branch March 24, 2022 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants