Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cliccl/debug_backup.go: add --with-revisions flag to allow debugging revisions of data #64285

Merged
merged 1 commit into from
May 6, 2021

Conversation

Elliebababa
Copy link
Contributor

@Elliebababa Elliebababa commented Apr 27, 2021

This patch adds --with-revisions on debug export to allow users
to export revisions of table data. If --with-revisions is specified,
revisions of data are returned to users, with an extra column
displaying the revision time of that record.

Note:
Schema changes make it hard to maintain correct display of columns at different timestamps in one shot,
So this feature:

  • do not display schema changes across revisions.
  • display data changes since last schema change only.

That is, --with-revisons only display part of revisions history instead of all revision history.
We migitate this by giving message saying DETECTED SCHEMA CHANGE AT t1, ONLY SHOWING UPDATES IN RANGE [t1,t2].
Then users are able to take exploratory steps to trace back to revision history of data by repeatedly running
cockroach debug backup export <backup_url> --table=<table> --with-revisions --up-to=t1-1


Example usage:

$cockroach debug backup export <backup_url> --table=<table> --with-revisions 
Data changes happened between t1 and t2 :
DETECTED SCHEMA CHANGE AT t1, ONLY SHOWING UPDATES IN RANGE [t1,t2]
1,null,2021-04-22 18:12:47.685284 +0000 UTC
2,'3rd update',2021-04-22 18:13:41.27284 +0000 UTC
2,'2nd update',2021-04-22 18:13:34.718665 +0000 UTC
2,'1st update',2021-04-22 18:13:28.966741 +0000 UTC
2,null,2021-04-22 18:12:50.053996 +0000 UTC
$cockroach debug backup export <backup_url> --table=<table> --with-revisions --up-to=’2021-04-22 18:13:40’
DETECTED SCHEMA CHANGE AT t1, ONLY SHOWING UPDATES IN RANGE [t1,2021-04-22 18:13:40]
1,null,2021-04-22 18:12:47.685284 +0000 UTC
2,'2nd update',2021-04-22 18:13:34.718665 +0000 UTC
2,'1st update',2021-04-22 18:13:28.966741 +0000 UTC
2,null,2021-04-22 18:12:50.053996 +0000 UTC

t1 is the last schema changes before the time specified by --up-to
t2 is the endtime of backup


Release note (cli change):
This is an experimenal/beta feature of backup debug tool to
allow users to export revisions of data from backup. We add
--with-revisions on debug export to allow users to export
revisions of table data. If --with-revisions is specified, revisions
of data are returned to users, with an extra column displaying the
revision time of that record.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@Elliebababa Elliebababa force-pushed the debug_revisions branch 2 times, most recently from 825edcf to b70078d Compare April 27, 2021 19:41
@Elliebababa Elliebababa requested a review from pbardea April 27, 2021 21:12
@Elliebababa Elliebababa force-pushed the debug_revisions branch 4 times, most recently from 5b81e71 to 78520e5 Compare April 28, 2021 00:36
@Elliebababa Elliebababa marked this pull request as ready for review April 28, 2021 21:51
@Elliebababa Elliebababa requested a review from a team April 28, 2021 21:51
@Elliebababa Elliebababa requested a review from a team as a code owner April 28, 2021 21:51
Copy link
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Elliebababa)


pkg/ccl/backupccl/backup_planning.go, line 61 at r2 (raw file):

	// BackupOptRevisionHistory is the option name for backup with revision history
	// exported to cliccl for backup revision inspection
	BackupOptRevisionHistory    = "revision_history"

Let's avoid exporting this and just define another "revision_history" flag in the CLI package. Although there's a similarity in what they do, they're not logically the same flag so I don't think we want to tie the 2 flags together (even though they have the same name).


pkg/ccl/backupccl/targets.go, line 547 at r2 (raw file):

				revDesc, _ := catalog.AsTableDescriptor(d)
				if !reflect.DeepEqual(revDesc.PublicColumns(), tbDesc.PublicColumns()) {
					goto EXIT

I generally find that a goto to exit a nested loop like this a bit of a code-smell, that begs the question if this piece of code is better suited as a standalone function.

In this case, these 2 loops are finding the appropriate lastSchemaChangeTime, and I think in this case this would be more readable in its own helper function with an early return rather than a goto.


pkg/sql/row/kv_fetcher.go, line 206 at r2 (raw file):

	res := make([]roachpb.KeyValue, 0)

	fn := func(key roachpb.Key, value []byte) roachpb.KeyValue {

copyKV might be a more descriptive name here.


pkg/sql/row/kv_fetcher.go, line 213 at r2 (raw file):

		return roachpb.KeyValue{
			Key:   keyCopy,
			Value: roachpb.Value{RawBytes: valueCopy, Timestamp: f.iter.UnsafeKey().Timestamp},

Why does this use the timestamp from the iterator? A comment would be helpful.

Copy link
Contributor Author

@Elliebababa Elliebababa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pbardea)


pkg/ccl/backupccl/backup_planning.go, line 61 at r2 (raw file):

Previously, pbardea (Paul Bardea) wrote…

Let's avoid exporting this and just define another "revision_history" flag in the CLI package. Although there's a similarity in what they do, they're not logically the same flag so I don't think we want to tie the 2 flags together (even though they have the same name).

Done.


pkg/ccl/backupccl/targets.go, line 547 at r2 (raw file):

Previously, pbardea (Paul Bardea) wrote…

I generally find that a goto to exit a nested loop like this a bit of a code-smell, that begs the question if this piece of code is better suited as a standalone function.

In this case, these 2 loops are finding the appropriate lastSchemaChangeTime, and I think in this case this would be more readable in its own helper function with an early return rather than a goto.

Done.


pkg/sql/row/kv_fetcher.go, line 206 at r2 (raw file):

Previously, pbardea (Paul Bardea) wrote…

copyKV might be a more descriptive name here.

Done.


pkg/sql/row/kv_fetcher.go, line 213 at r2 (raw file):

copyKV
ooops.. should pass the timestamp of the mvccKey to be correct.
Sorry for the mistake here. Thanks for catching!

@Elliebababa Elliebababa force-pushed the debug_revisions branch 2 times, most recently from b548cea to 8c57bd5 Compare May 4, 2021 16:34
@Elliebababa Elliebababa requested a review from pbardea May 4, 2021 17:59
Copy link
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost good to go, just a few minor comments

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Elliebababa)


pkg/ccl/cliccl/debug_backup.go, line 502 at r4 (raw file):

		startT := entry.LastSchemaChangeTime.GoTime().UTC()
		endT := endTime.GoTime().UTC()
		fmt.Printf("DETECTED SCHEMA CHANGE AT %s, ONLY SHOWING UPDATES IN RANGE [%s, %s]\n", startT, startT, endT)

Could we output this to stderr?


pkg/ccl/cliccl/debug_backup_test.go, line 730 at r4 (raw file):

	sqlDB.Exec(t, fmt.Sprintf(`BACKUP TO $1 AS OF SYSTEM TIME '%s' WITH revision_history`, ts1.AsOfSystemTime()), backupPathWithRev)

	sqlDB.Exec(t, `ALTER TABLE fooTable ADD COLUMN active BOOL`)

Let's also do a schema change that doesn't change the columns (e.g. an index addition) and check that we can read over it.

Copy link
Contributor Author

@Elliebababa Elliebababa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pbardea)


pkg/ccl/cliccl/debug_backup.go, line 502 at r4 (raw file):

Previously, pbardea (Paul Bardea) wrote…

Could we output this to stderr?

Done.


pkg/ccl/cliccl/debug_backup_test.go, line 730 at r4 (raw file):

Previously, pbardea (Paul Bardea) wrote…

Let's also do a schema change that doesn't change the columns (e.g. an index addition) and check that we can read over it.

Done.

@Elliebababa Elliebababa requested a review from pbardea May 5, 2021 16:14
…g revisions of data

This patch adds `--with-revisions` on `debug export` to allow users
to export revisions of table data. If `--with-revisions` is specified,
revisions of data are returned to users, with an extra column
displaying the revision time of that record.

Release note (cli change): This is an experimenal/beta feature of backup debug
tool to allow users to export revisions of data from backup. We add `--with-revisions`
on `debug export` to allow users to export revisions of table data.
If `--with-revisions` is specified, revisions of data are returned to users,
with an extra column displaying the revision time of that record.
@Elliebababa
Copy link
Contributor Author

TFTR!

bors r=pbardea

@craig
Copy link
Contributor

craig bot commented May 6, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants