Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql-foundations: tweak regionless restore syntax and semantics #111348

Closed
msbutler opened this issue Sep 27, 2023 · 4 comments · Fixed by #111443
Closed

sql-foundations: tweak regionless restore syntax and semantics #111348

msbutler opened this issue Sep 27, 2023 · 4 comments · Fixed by #111443
Assignees
Labels
branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@msbutler
Copy link
Collaborator

msbutler commented Sep 27, 2023

#110606 Added the new strip_localities option to RESTORE. The DR team thinks two small things should change before this feature is released into the wild. I apologize for overlooking these during the review!!

  1. we should change strip_localities to remove_regions (or something similar), because 1) strip_localities sounds too similar to the existing skip_localities option and 2) and "localities" (e.g. locality aware backups) are slightly different from "regions". This feature strips regions from sql descriptors.
  2. this is more philosophical: the restore cmd has a contract that all tables must be available to serve writes once the restore command finishes. Currently, an rbr table restored via this command is unavailable for writes. The DR team thinks an unavailable table, even if there is a post restore workaround, is worse than a failed Restore. To that end, the DR team proposes:
  • a quick fix that @dt will comment about in issue
  • if this workaround is a no go, regionless restores that contain an rbr table should fail.
  1. since the plan is for this to be backported to 23.1, we should add mixed-version testing

Jira issue: CRDB-31866

@msbutler msbutler added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Sep 27, 2023
@blathers-crl
Copy link

blathers-crl bot commented Sep 27, 2023

Hi @msbutler, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@msbutler msbutler added the branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 label Sep 27, 2023
annrpom added a commit to annrpom/cockroach that referenced this issue Sep 27, 2023
This changes the syntax from newly added RESTORE option,
strip_localities, to remove_regions. The former was
inadequate because it was too similar to an existing
option, along with not being an accurately descriptive enough
name.

Epic: none
Informs: cockroachdb#111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): the syntax for RESTORE option,
strip_localities, is now remove_regions.
@dt
Copy link
Member

dt commented Sep 27, 2023

For an RBR table restored into a non-RBR database, can we a) restore the enum but rename it to something like tblname_gateway_region and just make it a normal UDT and then b) change the DEFAULT expression on the column to just be a constant (maybe we add a new value to the enum like unknown or empty when we convert it to a plain UDF and then use that)?

@msbutler
Copy link
Collaborator Author

@annrpom one other thing we ought to do: version gate this feature, i.e. in restore_planning.go, add the following pseudocode:

if !p.ExecCfg().Settings.Version.IsActive(ctx, cluserv23_2) && args.removeLocalities {
return errors.New("can only use remove localities on a cluster that has fully updgraded to 23.2")
}

This pseudo code would avoid the following mixed version bug: a 23.2 node plans the regionless restore, then a 23.1 node picks up the job but is completely unaware of this new feature and will execute a restore as if this feature did not exist.

craig bot pushed a commit that referenced this issue Sep 28, 2023
108863: streamingest: periodically output replication-frontier.txt file r=msbutler a=adityamaru

Please see individual commits for details.

Informs: #108374

111275: gc: separate out lock table scan in the gc queue r=nvanbenschoten a=arulajmani

Previously, we'd handle resolving intents while scanning through the
MVCC keyspace if we came across one. We would resolve it if it was
older than some configurable threshold. In all this, we never needed
to scan the lock table keyspace directly.

We're introducing replicated shared and exclusive locks. The GC queue
is expected to resolve extant replicated locks, and as such, needs to
concern itself with an explicit lock table scan. This patch removes
intent handling from the scan of the MVCC keyspace. For now, no behavior
changes -- we only look for and resolve intents. In a future patch we'll
extend this behavior to include shareed and exclusive replicated locks.

While here, we also modify an exsiting test
(`TestIntentAgeThresholdSetting`) to include an intent on a range local
key. I've ensured that the test fails if we were to only scan the lock
table corresponding to the global keyspace of a range.

Informs #111215

Release note: None

111356: ccl: syntax change restore option strip_localities to remove_regions r=msbutler a=annrpom

This changes the syntax from newly added RESTORE option, strip_localities, to remove_regions. The former was inadequate because it was too similar to an existing option, along with not being an accurately descriptive enough name.

Epic: none
Informs: #111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): The syntax for RESTORE option, strip_localities, is now remove_regions.

111376: dev: remove `merge-test-xmls` command r=rail a=rickystewart

This was used in old-style `stress`. No need for it now that `dev test --stress` uses `--runs_per_test`.

Epic: none
Release note: None

111413: roachtest: remove backup creation during node upgrades r=aliher1911 a=aliher1911

In upgrade tests existing nodes try to start backup schedule on every binary restart. This is not adding any value as this feature tests that cluster will run backups created at initial cluster start.
At the same time those checks pollute logs constantly. This commit changes options for node restarts in upgrade tests to exclude backups.

Epic: none

Release note: None

111417: logic: Fix a few incorrect syntax in `scrub` logic test r=Xiang-Gu a=Xiang-Gu

This commit fixes a few incorrect syntax in `scrub` logic test.

Epic: None
Release note: None

Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Annie Pompa <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Oleg Afanasyev <[email protected]>
Co-authored-by: Xiang Gu <[email protected]>
@annrpom
Copy link
Contributor

annrpom commented Sep 28, 2023

i was going to leave a comment about where i landed with the result of (2.), but @msbutler is quick with it :)

essentially, i think that if we cannot completely undo the effects of regional by row, we shouldn't try to at all (otherwise, everything can "seem" to be working, but internally not be what the user expects - which might cause some trickle-down bugs). i believe it is up to the user to figure out how they want regional by row to be consolidated

this was said after I tinkered with David's solution - I could be wrong, but it seemed like there was a little more to it (in particular, i am not sure how to resolve the fact that if such enum is renamed, its "array type descriptor" still causes a collision)

thus, i have my PR up for "failing fast" (or as fast as I think we can) if someone tries to perform a regionless restore with rbr table(s)

feel free to correct me if i am wrong or pair with me if you would like to dig into this further

annrpom added a commit to annrpom/cockroach that referenced this issue Oct 3, 2023
Previously, backing up an MR cluster/db/table that had a regional by
row table did not work out-of-the-box (users could not write to said
table without performing some operations to ensure hidden column added
by regional by row enablement, `crdb_region`, and zone config behaved
according to the target RESTORE cluster. This was inadequate because
we allowed users to perform a RESTORE where some tables were not
"available" right away. This patch addresses this by making sure we
"fail fast" if users do decide to attempt a RESTORE where the BACKUP
object has a regional by row table.

Also: tests. A small amount of tests were fixed to separate the behavior
of restoring a database vs a table (just dropping a table after we
performed a db restore to test table restore is not a properly
isolated test of behavior).

Epic: none
Informs: cockroachdb#111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): Users are not able to perform a
`remove_regions` RESTORE if the object being restored contains
a regional by row table.
annrpom added a commit to annrpom/cockroach that referenced this issue Oct 4, 2023
This patch adds a >= 23.2 version gate to the `remove_regions`
RESTORE option, in addition to a mixed-version test to ensure
that RESTORE fails fast if it is on a cluster version < 23.32.

Epic: none
Fixes: cockroachdb#111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): Clusters with versions < 23.2 are not able to use
the RESTORE option `remove_regions` (this feature has been version
gated).
annrpom added a commit to annrpom/cockroach that referenced this issue Oct 5, 2023
This changes the syntax from newly added RESTORE option,
strip_localities, to remove_regions. The former was
inadequate because it was too similar to an existing
option, along with not being an accurately descriptive enough
name.

Epic: none
Informs: cockroachdb#111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): the syntax for RESTORE option,
strip_localities, is now remove_regions.
annrpom added a commit to annrpom/cockroach that referenced this issue Oct 5, 2023
Previously, backing up an MR cluster/db/table that had a regional by
row table did not work out-of-the-box (users could not write to said
table without performing some operations to ensure hidden column added
by regional by row enablement, `crdb_region`, and zone config behaved
according to the target RESTORE cluster. This was inadequate because
we allowed users to perform a RESTORE where some tables were not
"available" right away. This patch addresses this by making sure we
"fail fast" if users do decide to attempt a RESTORE where the BACKUP
object has a regional by row table.

Also: tests. A small amount of tests were fixed to separate the behavior
of restoring a database vs a table (just dropping a table after we
performed a db restore to test table restore is not a properly
isolated test of behavior).

Epic: none
Informs: cockroachdb#111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): Users are not able to perform a
`remove_regions` RESTORE if the object being restored contains
a regional by row table.
annrpom added a commit to annrpom/cockroach that referenced this issue Oct 5, 2023
This patch adds a >= 23.2 version gate to the `remove_regions`
RESTORE option, in addition to a mixed-version test to ensure
that RESTORE fails fast if it is on a cluster version < 23.2.

Epic: none
Fixes: cockroachdb#111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note: None
annrpom added a commit to annrpom/cockroach that referenced this issue Oct 5, 2023
This changes the syntax from newly added RESTORE option,
strip_localities, to remove_regions. The former was
inadequate because it was too similar to an existing
option, along with not being an accurately descriptive enough
name.

Epic: none
Informs: cockroachdb#111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): the syntax for RESTORE option,
strip_localities, is now remove_regions.
annrpom added a commit to annrpom/cockroach that referenced this issue Oct 5, 2023
Previously, backing up an MR cluster/db/table that had a regional by
row table did not work out-of-the-box (users could not write to said
table without performing some operations to ensure hidden column added
by regional by row enablement, `crdb_region`, and zone config behaved
according to the target RESTORE cluster. This was inadequate because
we allowed users to perform a RESTORE where some tables were not
"available" right away. This patch addresses this by making sure we
"fail fast" if users do decide to attempt a RESTORE where the BACKUP
object has a regional by row table.

Also: tests. A small amount of tests were fixed to separate the behavior
of restoring a database vs a table (just dropping a table after we
performed a db restore to test table restore is not a properly
isolated test of behavior).

Epic: none
Informs: cockroachdb#111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): Users are not able to perform a
`remove_regions` RESTORE if the object being restored contains
a regional by row table.
annrpom added a commit to annrpom/cockroach that referenced this issue Oct 5, 2023
This patch adds a >= 23.1 version gate to the `remove_regions`
RESTORE option, in addition to a mixed-version test to ensure
that RESTORE fails fast if it is on a cluster version < 23.1.

Epic: none
Fixes: cockroachdb#111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note: None
THardy98 pushed a commit to THardy98/cockroach that referenced this issue Oct 6, 2023
This changes the syntax from newly added RESTORE option,
strip_localities, to remove_regions. The former was
inadequate because it was too similar to an existing
option, along with not being an accurately descriptive enough
name.

Epic: none
Informs: cockroachdb#111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): the syntax for RESTORE option,
strip_localities, is now remove_regions.
aliher1911 pushed a commit to aliher1911/cockroach that referenced this issue Oct 9, 2023
This changes the syntax from newly added RESTORE option,
strip_localities, to remove_regions. The former was
inadequate because it was too similar to an existing
option, along with not being an accurately descriptive enough
name.

Epic: none
Informs: cockroachdb#111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): the syntax for RESTORE option,
strip_localities, is now remove_regions.
annrpom added a commit to annrpom/cockroach that referenced this issue Oct 9, 2023
This changes the syntax from newly added RESTORE option,
strip_localities, to remove_regions. The former was
inadequate because it was too similar to an existing
option, along with not being an accurately descriptive enough
name.

Epic: none
Informs: cockroachdb#111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): the syntax for RESTORE option,
strip_localities, is now remove_regions.
annrpom added a commit to annrpom/cockroach that referenced this issue Oct 9, 2023
Previously, backing up an MR cluster/db/table that had a regional by
row table did not work out-of-the-box (users could not write to said
table without performing some operations to ensure hidden column added
by regional by row enablement, `crdb_region`, and zone config behaved
according to the target RESTORE cluster. This was inadequate because
we allowed users to perform a RESTORE where some tables were not
"available" right away. This patch addresses this by making sure we
"fail fast" if users do decide to attempt a RESTORE where the BACKUP
object has a regional by row table.

Also: tests. A small amount of tests were fixed to separate the behavior
of restoring a database vs a table (just dropping a table after we
performed a db restore to test table restore is not a properly
isolated test of behavior).

Epic: none
Informs: cockroachdb#111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): Users are not able to perform a
`remove_regions` RESTORE if the object being restored contains
a regional by row table.
blathers-crl bot pushed a commit that referenced this issue Oct 18, 2023
This changes the syntax from newly added RESTORE option,
strip_localities, to remove_regions. The former was
inadequate because it was too similar to an existing
option, along with not being an accurately descriptive enough
name.

Epic: none
Informs: #111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): the syntax for RESTORE option,
strip_localities, is now remove_regions.
blathers-crl bot pushed a commit that referenced this issue Oct 18, 2023
Previously, backing up an MR cluster/db/table that had a regional by
row table did not work out-of-the-box (users could not write to said
table without performing some operations to ensure hidden column added
by regional by row enablement, `crdb_region`, and zone config behaved
according to the target RESTORE cluster. This was inadequate because
we allowed users to perform a RESTORE where some tables were not
"available" right away. This patch addresses this by making sure we
"fail fast" if users do decide to attempt a RESTORE where the BACKUP
object has a regional by row table.

Also: tests. A small amount of tests were fixed to separate the behavior
of restoring a database vs a table (just dropping a table after we
performed a db restore to test table restore is not a properly
isolated test of behavior).

Epic: none
Informs: #111348
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-29129

Release note (sql change): Users are not able to perform a
`remove_regions` RESTORE if the object being restored contains
a regional by row table.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants