Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backup: MVCC Bulk Operations Are Missing from Backups #71155

Closed
erikgrinaker opened this issue Oct 5, 2021 · 1 comment · Fixed by #84875
Closed

backup: MVCC Bulk Operations Are Missing from Backups #71155

erikgrinaker opened this issue Oct 5, 2021 · 1 comment · Fixed by #84875

Comments

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Oct 5, 2021

In #70412, we will implement MVCC range deletion tombstones. Backup and restore must be extended to handle these correctly, including testing Revision ALL exports.

Epic CRDB-11252

Jira issue: CRDB-10419

@erikgrinaker erikgrinaker added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-disaster-recovery T-disaster-recovery labels Oct 5, 2021
@erikgrinaker erikgrinaker changed the title backupccl: handle MVCC range tombstones backupccl: handle MVCC range tombstones in backup/restore Oct 5, 2021
@erikgrinaker
Copy link
Contributor Author

Depends on #71398. May be a noop if Export ends up synthesizing point deletes instead of emitting range deletes.

@exalate-issue-sync exalate-issue-sync bot changed the title backupccl: handle MVCC range tombstones in backup/restore MVCC Bulk Operations Are Missing from Backups Oct 11, 2021
@exalate-issue-sync exalate-issue-sync bot removed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-disaster-recovery labels Oct 11, 2021
@dt dt changed the title MVCC Bulk Operations Are Missing from Backups backup: MVCC Bulk Operations Are Missing from Backups Oct 11, 2021
@exalate-issue-sync exalate-issue-sync bot changed the title backup: MVCC Bulk Operations Are Missing from Backups Non-MVCC: MVCC Bulk Operations Are Missing from Backups Nov 19, 2021
@exalate-issue-sync exalate-issue-sync bot changed the title Non-MVCC: MVCC Bulk Operations Are Missing from Backups backup: MVCC Bulk Operations Are Missing from Backups Nov 19, 2021
msbutler added a commit to msbutler/cockroach that referenced this issue Mar 2, 2022
Previously while ingesting SSTs in the restore processor,
`processRestoreSpanEntry` would manually skip delete tombstones and keys with
timestamps later than the AS OF SYSTEM TIME timestamp provided in the RESTORE
command.

This PR introduces the readAsOfMVCCIterator which wraps a SimpleMVCCIterator
such that all keys returned are not tombstones and will have timestamps less
than the AOST timestamp. This PR also hooks this iterator into the restore
processor. Some methods in this iterator may get updated as cockroachdb#71155 gets
addressed.

Fixes cockroachdb#77276

Release justification: low risk, high benefit changes to existing functionality

Release note: None
msbutler added a commit to msbutler/cockroach that referenced this issue Mar 2, 2022
Previously while ingesting SSTs in the restore processor,
`processRestoreSpanEntry` would manually skip delete tombstones and keys with
timestamps later than the AS OF SYSTEM TIME timestamp provided in the RESTORE
command.

This PR introduces the readAsOfMVCCIterator which wraps a SimpleMVCCIterator
such that all keys returned are not tombstones and will have timestamps less
than the AOST timestamp. This PR also hooks this iterator into the restore
processor. Some methods in this iterator may get updated as cockroachdb#71155 gets
addressed.

Fixes cockroachdb#77276

Release justification: low risk, high benefit changes to existing functionality

Release note: None
msbutler added a commit to msbutler/cockroach that referenced this issue Apr 29, 2022
Previously while ingesting SSTs in the restore processor,
`processRestoreSpanEntry` would manually skip delete tombstones and keys with
timestamps later than the AS OF SYSTEM TIME timestamp provided in the RESTORE
command.

This PR introduces the readAsOfMVCCIterator which wraps a SimpleMVCCIterator
such that all keys returned are not tombstones and will have timestamps less
than the AOST timestamp. This PR also hooks this iterator into the restore
processor. Some methods in this iterator may get updated as cockroachdb#71155 gets
addressed.

Fixes cockroachdb#77276

Release justification: low risk, high benefit changes to existing functionality

Release note: None
msbutler added a commit to msbutler/cockroach that referenced this issue May 24, 2022
Previously while ingesting SSTs in the restore processor,
`processRestoreSpanEntry` would manually skip delete tombstones and keys with
timestamps later than the AS OF SYSTEM TIME timestamp provided in the RESTORE
command.

This PR introduces the readAsOfMVCCIterator which wraps a SimpleMVCCIterator
such that all keys returned are not tombstones and will have timestamps less
than the AOST timestamp. This PR also hooks this iterator into the restore
processor. Some methods in this iterator may get updated as cockroachdb#71155 gets
addressed.

Fixes cockroachdb#77276

Release justification: low risk, high benefit changes to existing functionality

Release note: None
@exalate-issue-sync exalate-issue-sync bot assigned dt and unassigned erikgrinaker and msbutler Jul 6, 2022
@msbutler msbutler self-assigned this Jul 7, 2022
msbutler added a commit to msbutler/cockroach that referenced this issue Jul 15, 2022
Previously, the readAsOfIterator used in RESTORE could not handle range keys.
This PR implements the new SimpleMVCCIterator methods that handle range keys.
Further, this patch ensures the  readAsOfIterator skips over point keys
shadowed by range keys  at or below the caller's specified asOf timestamp.

Next, Backup needs to be tought about RangeKeys.

Informs cockroachdb#71155

Release note: none
msbutler added a commit to msbutler/cockroach that referenced this issue Jul 15, 2022
Previously, the readAsOfIterator used in RESTORE could not handle range keys.
This PR implements the new SimpleMVCCIterator methods that handle range keys.
Further, this patch ensures the  readAsOfIterator skips over point keys
shadowed by range keys  at or below the caller's specified asOf timestamp.

Next, Backup needs to be tought about RangeKeys.

Informs cockroachdb#71155

Release note: none
msbutler added a commit to msbutler/cockroach that referenced this issue Jul 19, 2022
This PR refactors all call sites of ExternalSSTReader(), to support using the
new PebbleIterator, which has baked in range key support. Most notably, this
PR replaces the multiIterator used in the restore data processor with the
PebbleSSTIterator.

This patch is apart of a larger effort to teach backup and restore about MVCC
bulk operations. Next, the readAsOfIterator will need to learn how to
deal with range keys.

Informs cockroachdb#71155

This PR addresses a bug created in cockroachdb#83984: loop variables in
ExternalSSTReader were captured by reference, leading to roachtest failures
(cockroachdb#84240, cockroachdb#84162).

Informs #71155i

Fixes: cockroachdb#84240, cockroachdb#84162, cockroachdb#84181

Release note: none
msbutler added a commit to msbutler/cockroach that referenced this issue Jul 19, 2022
This PR refactors all call sites of ExternalSSTReader(), to support using the
new PebbleIterator, which has baked in range key support. Most notably, this
PR replaces the multiIterator used in the restore data processor with the
PebbleSSTIterator.

This patch is apart of a larger effort to teach backup and restore about MVCC
bulk operations. Next, the readAsOfIterator will need to learn how to
deal with range keys.

Informs cockroachdb#71155

This PR addresses a bug created in cockroachdb#83984: loop variables in
ExternalSSTReader were captured by reference, leading to roachtest failures
(cockroachdb#84240, cockroachdb#84162).

Informs #71155i

Fixes: cockroachdb#84240, cockroachdb#84162, cockroachdb#84181

Release note: none
msbutler added a commit to msbutler/cockroach that referenced this issue Jul 19, 2022
Previously, the readAsOfIterator used in RESTORE could not handle range keys.
This PR implements the new SimpleMVCCIterator methods that handle range keys.
Further, this patch ensures the  readAsOfIterator skips over point keys
shadowed by range keys  at or below the caller's specified asOf timestamp.

Next, Backup needs to be tought about RangeKeys.

Informs cockroachdb#71155

Release note: none
msbutler added a commit to msbutler/cockroach that referenced this issue Jul 19, 2022
This PR refactors all call sites of ExternalSSTReader(), to support using the
new PebbleIterator, which has baked in range key support. Most notably, this
PR replaces the multiIterator used in the restore data processor with the
PebbleSSTIterator.

This patch is apart of a larger effort to teach backup and restore about MVCC
bulk operations. Next, the readAsOfIterator will need to learn how to
deal with range keys.

Informs cockroachdb#71155

This PR addresses a bug created in cockroachdb#83984: loop variables in
ExternalSSTReader were captured by reference, leading to roachtest failures
(cockroachdb#84240, cockroachdb#84162).

Informs #71155i

Fixes: cockroachdb#84240, cockroachdb#84162, cockroachdb#84181

Release note: none
msbutler added a commit to msbutler/cockroach that referenced this issue Jul 20, 2022
Previously, the readAsOfIterator used in RESTORE could not handle range keys.
This PR implements the new SimpleMVCCIterator methods that handle range keys.
Further, this patch ensures the  readAsOfIterator skips over point keys
shadowed by range keys  at or below the caller's specified asOf timestamp.

Next, Backup needs to be tought about RangeKeys.

Informs cockroachdb#71155

Release note: none
msbutler added a commit to msbutler/cockroach that referenced this issue Jul 20, 2022
This PR refactors all call sites of ExternalSSTReader(), to support using the
new PebbleIterator, which has baked in range key support. Most notably, this
PR replaces the multiIterator used in the restore data processor with the
PebbleSSTIterator.

This patch is apart of a larger effort to teach backup and restore about MVCC
bulk operations. Next, the readAsOfIterator will need to learn how to
deal with range keys.

Informs cockroachdb#71155

This PR addresses a bug created in cockroachdb#83984: loop variables in
ExternalSSTReader were captured by reference, leading to roachtest failures
(cockroachdb#84240, cockroachdb#84162).

Informs #71155i

Fixes: cockroachdb#84240, cockroachdb#84162, cockroachdb#84181

Release note: none
craig bot pushed a commit that referenced this issue Jul 21, 2022
84452: sql: add troubleshooting mode session variable r=THardy98 a=THardy98

Resolves: #84429

This change introduces a `troubleshooting_mode_enabled` session
variable. When enabled, this session variable is intended to be used as
a way to avoid performing additional work on queries, particularly when
the cluster is experiencing issues/unavailability/failure. By default,
this session variable is disabled.  Currently, this session variable is
only used to avoid collecting/emitting telemetry data.

Release note (sql change): Introduce new `troubleshooting_mode_enabled`
session variable, to avoid doing additional work on queries when
possible (i.e. collection telemetry data). By default, this session
variable is disabled.

84666: storageccl: use the new PebbleIterator in ExternalSSTReader r=erikgrinaker a=msbutler

This PR refactors all call sites of ExternalSSTReader(), to support using the
new PebbleIterator, which has baked in range key support. Most notably, this
PR replaces the multiIterator used in the restore data processor with the
PebbleSSTIterator.

This patch is apart of a larger effort to teach backup and restore about MVCC
bulk operations. Next, the readAsOfIterator will need to learn how to
deal with range keys.

Informs #71155

This PR addresses a bug created in #83984: loop variables in
ExternalSSTReader were captured by reference, leading to roachtest failures
(#84240, #84162).

Informs #71155

Fixes: #84240, #84162, #84181

Release note: none

Co-authored-by: Thomas Hardy <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
craig bot pushed a commit that referenced this issue Jul 21, 2022
84208: Update activerecord.go v7 support r=rafiss a=gemma-shay

cockroachdb/docs#14485
[DOC-4878](https://cockroachlabs.atlassian.net/browse/DOC-4878)
cc `@kernfeld-cockroach` 

84214: storage: handle range keys in readAsOfIterator r=erikgrinaker a=msbutler

Previously, the readAsOfIterator used in RESTORE could not handle range keys.
This PR implements the new SimpleMVCCIterator methods that handle range keys.
Further, this patch ensures the  readAsOfIterator skips over point keys
shadowed by range keys  at or below the caller's specified asOf timestamp.

Next, Backup needs to be tought about RangeKeys.

Informs #71155

Release note: none

84323: tracing: delete old field r=andreimatei a=andreimatei

The RecordedSpan.RedactableLogs. This field was unused since 22.1.

Release note: None

84742: sql/distsql: delete unused lazyInternalExecutor r=andreimatei a=andreimatei

Release note: None

84784: sql/sqlinstance/instancestorage: use CommitInBatch to optimize round-… r=ajwerner a=ajwerner

…trips

By using CommitInBatch we can hit the 1PC optimization and avoid a round-trip
to the leaseholder of the range in question.

Release note: None

84849: opt: clarify plan gist comment r=mgartner a=mgartner

Release note: None

84851: sql: fix recent leak of a context r=yuzefovich a=yuzefovich

Fixes: #84801.

Release note: None

84853: ccl/streamingccl/streamingest: skip TestTenantStreamingPauseResumeIngestion r=yuzefovich a=yuzefovich

Refs: #84414

Reason: flaky test

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None

84862: logcrash: fix test on arm r=dt a=dt

Release note: none.

Co-authored-by: Gemma Shay <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: David Taylor <[email protected]>
msbutler added a commit to msbutler/cockroach that referenced this issue Jul 21, 2022
Previously BACKUP would not back up range tombstones. With this patch, BACKUPs
with revision_history will backup range tombstones. Non-revision history backups
are not affected by this diff because MVCCExportToSST filters all tombstones
out of the backup already.

Specifically, this patch replaces the iterators used in the backup_processor
with the pebbleIterator, which has baked in range key support.

Fixes cockroachdb#71155

Release note: none
msbutler added a commit to msbutler/cockroach that referenced this issue Jul 25, 2022
Previously BACKUP would not back up range tombstones. With this patch, BACKUPs
with revision_history will backup range tombstones. Non-revision history backups
are not affected by this diff because MVCCExportToSST filters all tombstones
out of the backup already.

Specifically, this patch replaces the iterators used in the backup_processor
with the pebbleIterator, which has baked in range key support.

At this point a backup with range keys is restorable, thanks to cockroachdb#84214. Note
that the restore codebase still touches iterators that are not range key aware.
This is not a problem because restored data does not have range keys, nor do
the empty ranges restore dumps data into. These iterators (e.g. in SSTBatcher
and in CheckSSTConflicts) will be updated when cockroachdb#70428 gets fixed.

Fixes cockroachdb#71155

Release note: none
craig bot pushed a commit that referenced this issue Jul 27, 2022
84875: backupccl: handle range keys in BACKUP r=erikgrinaker a=msbutler

Previously BACKUP would not back up range tombstones. With this patch, BACKUPs
with revision_history will backup range tombstones. Non-revision history backups
are not affected by this diff because MVCCExportToSST filters all tombstones
out of the backup already.

Specifically, this patch replaces the iterators used in the backup_processor
with the pebbleIterator, which has baked in range key support. This refactor
introduces a 5% regression in backup runtime, even when the backup has no range
keys, though #83051 hopes to address this gap. See details below on the
benchmark experiment.

At this point a backup with range keys is restorable, thanks to #84214. Note
that the restore codebase still touches iterators that are not range key aware.
This is not a problem because restored data does not have range keys, nor do
the empty ranges restore dumps data into. These iterators (e.g. in SSTBatcher
and in CheckSSTConflicts) will be updated when #70428 gets fixed.

Fixes #71155

Release note: none

To benchmark this diff, the following commands were used on the following sha
a5ccdc3, with and without this commit, over
three trials:
```
roachprod create -n 5 --gce-machine-type=n2-standard-16 $CLUSTER
roachprod put $CLUSTER [build] cockroach

roachprod wipe $CLUSTER; roachprod start $CLUSTER;
roachprod run $CLUSTER:1 -- "./cockroach workload init bank --rows 1000000000"
roachprod sql $CLUSTER:1 -- -e "BACKUP INTO 'gs://somebucket/michael-rangkey?AUTH=implicit'"
```

The backup on the control binary took on average 478 seconds with a stdev of 13
seconds, while the backup with the treatment binary took on average 499 seconds
with stddev of 8 seconds.

84883: kvserver: add server-side transaction retry metrics r=arulajmani a=arulajmani

This patch adds a few new metrics to track successful/failed
server-side transaction retries. Specifically, whenever we attempt
to retry a read or write batch or run into a read within uncertainty
interval error, we increment specific counters indicating if the
retry was successful or not.

Release note: None

85074: upgrades: add checkpointing for `raftAppliedIndexTermMigration` r=irfansharif a=erikgrinaker

Forward-port of #84909, for posterity.

----

The `raftAppliedIndexTermMigration` upgrade migration could be
unreliable. It iterates over all ranges and runs a `Migrate` request
which must be applied on all replicas. However, if any ranges merge or
replicas are unavailable, the migration fails and starts over from the
beginning. In large clusters with many ranges, this meant that it might
never complete.

This patch makes the upgrade more robust, by retrying each `Migrate`
request 5 times, and checkpointing the progress after every fifth batch
(1000 ranges), allowing resumption on failure. At some point this should
be made part of the migration infrastructure.

NB: This fix was initially submitted for 22.1, and even though the
migration will be removed for 22.2, it is forward-ported for posterity.

Release note: None

85086: eval: stop ignoring all ResolveOIDFromOID errors r=ajwerner a=rafiss

fixes #84448

The decision about whether an error is safe to ignore is made at the
place where the error is created/returned. This way, the callers don't
need to be aware of any new error codes that the implementation may
start returning in the future.

Release note (bug fix): Fixed incorrect error handling that could cause
casts to OID types to fail in some cases.

Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
@craig craig bot closed this as completed in d5de88d Jul 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants