Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: rerunning an intermediate system table user ID migration will fail #91449

Closed
andyyang890 opened this issue Nov 7, 2022 · 0 comments · Fixed by #91756
Closed

sql: rerunning an intermediate system table user ID migration will fail #91449

andyyang890 opened this issue Nov 7, 2022 · 0 comments · Fixed by #91756
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@andyyang890
Copy link
Collaborator

andyyang890 commented Nov 7, 2022

I came across this issue when trying to write a series of migrations adding user ID columns to the system.role_members table.

For every schema change migration, a schemaExistsFn (defined here and used here) needs to be provided, which checks whether the schema change already exists. One such function is hasColumn (defined here), which verifies that either the column does not already exist or it matches the table descriptor in pkg/sql/catalog/systemschema/system.go exactly.

The previously merged internal user ID migrations, which can be found in pkg/upgrade/upgrades/system_users_role_id_migration.go and pkg/upgrade/upgrades/role_options_table_migration.go, roughly performs a system table schema change through the following series of migrations:

  1. Add the new column(s) as nullable (NULL) columns to the table and create indexes for them
  2. Backfill the column(s)
  3. Change the column(s) to be not nullable (NOT NULL)

The migrations in step 1 for both of the existing migrations make use of hasColumn and so if the migration were to be run again after the adding of the columns has succeeded (but before the step 3 migration successfully completes), they would fail since hasColumn expects the column to either not exist, or be NOT NULL (to match the desired final state specified in the descriptor). This would prevent the completion of that migration as well as any further migrations. I believe this issue can be fixed by writing a custom schemaExistsFn for the migration in step 1 that basically mirrors hasColumn but allows the expected column descriptor to have Nullable == false while the actual column descriptor has Nullable == true.

However, I'm not sure if this would actually ever be a problem because it would mean that the migration succeeded in adding the column, but was somehow forced to run again. This only occurred during my local development because I had an incorrect query later on in my step 1 migration.

Jira issue: CRDB-21269

Epic CRDB-19134

@andyyang890 andyyang890 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Nov 7, 2022
@andyyang890 andyyang890 self-assigned this Nov 14, 2022
craig bot pushed a commit that referenced this issue Nov 15, 2022
90828: roachtest: add --debug-always flag r=stevendanna a=stevendanna

Occasionally, it is very useful to keep a cluster around even if the workload happened to complete without error. The --debug-always is like --debug but saves the cluster even if the test was successful.

Epic: None

Release note: None

91359: keys: accurately size pre-allocated buffers in MakeRangeKey r=arulajmani a=nvanbenschoten

This commit includes a trio of changes that accurately size the byte buffers in `MakeRangeKey` and `MakeRangeKeyPrefix` to avoid unnecessary slice resizing (allocation + memcpy) when constructing range-local keys (for example, `RangeDescriptorKey` and `TransactionKey`).

The first change is to include the size of the suffix and detail slices when pre-allocating the byte buffer. We know that `MakeRangeKey` will be appending these to the key, so they will force a resize if not accounted for upfront.

The second change is to correctly account for the overhead of `EncodeBytesAscending`. The code was getting this wrong in two ways. First, it was failing to account for the 3 bytes of unconditional overhead added by the encoding scheme for the encoding type marker and terminator. Second, it was failing to account for the conditional overhead when bytes need to be escaped. We now accurately and efficiently compute the overhead ahead of time to avoid resizing.

The third change is to redefine the transaction tombstone and push marker keys used with the timestamp cache. Previously, the tombstone and push specifiers were additional suffixes that we added after the `LocalTransactionSuffix`. Now, these specifiers are the only suffix added to the key, which avoids an additional key resize.

```
name                         old time/op    new time/op    delta
KV/Update/Native/rows=1-10     70.6µs ± 2%    69.6µs ± 2%  -1.38%  (p=0.000 n=100+93)
KV/Insert/Native/rows=1-10     46.0µs ± 2%    45.7µs ± 2%  -0.62%  (p=0.000 n=95+97)
KV/Delete/Native/rows=1-10     47.3µs ± 2%    47.1µs ± 2%  -0.55%  (p=0.000 n=97+94)
KV/Insert/Native/rows=10-10    74.4µs ± 2%    74.1µs ± 3%  -0.45%  (p=0.000 n=96+98)
KV/Update/Native/rows=10-10     171µs ± 3%     170µs ± 3%  -0.36%  (p=0.045 n=96+98)
KV/Delete/Native/rows=10-10    85.1µs ± 2%    84.8µs ± 2%  -0.29%  (p=0.020 n=98+93)
KV/Update/SQL/rows=1-10         174µs ± 3%     174µs ± 3%  -0.28%  (p=0.041 n=97+89)
KV/Insert/SQL/rows=1-10         131µs ± 2%     131µs ± 4%    ~     (p=0.961 n=89+91)
KV/Insert/SQL/rows=10-10        186µs ± 3%     186µs ± 3%    ~     (p=0.970 n=94+92)
KV/Update/SQL/rows=10-10        336µs ± 2%     336µs ± 3%    ~     (p=0.947 n=92+92)
KV/Delete/SQL/rows=1-10         149µs ± 2%     149µs ± 3%    ~     (p=0.917 n=96+96)
KV/Delete/SQL/rows=10-10        226µs ± 8%     225µs ±10%    ~     (p=0.057 n=97+98)

name                         old alloc/op   new alloc/op   delta
KV/Insert/Native/rows=1-10     17.9kB ± 1%    17.6kB ± 1%  -1.95%  (p=0.000 n=100+100)
KV/Delete/Native/rows=1-10     18.2kB ± 1%    17.9kB ± 1%  -1.92%  (p=0.000 n=94+94)
KV/Update/Native/rows=1-10     25.1kB ± 0%    24.7kB ± 1%  -1.44%  (p=0.000 n=97+97)
KV/Insert/Native/rows=10-10    44.9kB ± 1%    44.5kB ± 1%  -0.88%  (p=0.000 n=100+99)
KV/Delete/Native/rows=10-10    42.4kB ± 1%    42.0kB ± 0%  -0.87%  (p=0.000 n=96+94)
KV/Delete/SQL/rows=1-10        52.9kB ± 1%    52.6kB ± 1%  -0.48%  (p=0.000 n=96+100)
KV/Update/Native/rows=10-10    75.2kB ± 1%    74.8kB ± 1%  -0.43%  (p=0.000 n=98+99)
KV/Update/SQL/rows=1-10        52.6kB ± 1%    52.4kB ± 0%  -0.35%  (p=0.000 n=95+95)
KV/Insert/SQL/rows=1-10        45.4kB ± 1%    45.3kB ± 0%  -0.24%  (p=0.000 n=97+93)
KV/Update/SQL/rows=10-10        119kB ± 1%     119kB ± 1%  -0.16%  (p=0.002 n=97+99)
KV/Delete/SQL/rows=10-10       88.0kB ± 1%    87.8kB ± 1%  -0.14%  (p=0.017 n=91+95)
KV/Insert/SQL/rows=10-10       93.9kB ± 1%    93.7kB ± 1%  -0.14%  (p=0.000 n=98+100)

name                         old allocs/op  new allocs/op  delta
KV/Insert/Native/rows=1-10        142 ± 0%       130 ± 0%  -8.45%  (p=0.000 n=99+100)
KV/Delete/Native/rows=1-10        143 ± 0%       131 ± 0%  -8.39%  (p=0.000 n=96+94)
KV/Update/Native/rows=1-10        198 ± 0%       186 ± 0%  -6.06%  (p=0.000 n=99+98)
KV/Delete/Native/rows=10-10       275 ± 0%       263 ± 0%  -4.36%  (p=0.000 n=90+90)
KV/Insert/Native/rows=10-10       295 ± 0%       283 ± 0%  -4.07%  (p=0.000 n=93+91)
KV/Update/Native/rows=10-10       472 ± 1%       460 ± 1%  -2.55%  (p=0.000 n=98+98)
KV/Insert/SQL/rows=1-10           365 ± 0%       356 ± 0%  -2.53%  (p=0.000 n=97+75)
KV/Delete/SQL/rows=1-10           402 ± 0%       393 ± 0%  -2.24%  (p=0.000 n=97+98)
KV/Update/SQL/rows=1-10           509 ± 0%       500 ± 0%  -1.81%  (p=0.000 n=96+95)
KV/Insert/SQL/rows=10-10          589 ± 0%       580 ± 0%  -1.53%  (p=0.000 n=94+95)
KV/Delete/SQL/rows=10-10          623 ± 1%       614 ± 1%  -1.47%  (p=0.000 n=98+97)
KV/Update/SQL/rows=10-10          858 ± 1%       849 ± 0%  -1.03%  (p=0.000 n=95+93)
```

I confirmed in heap profiles that this change eliminates all resizes of these buffers.

Release note: None

Epic: None

91719: storage: add MVCCExportFingerprintOptions r=stevendanna a=stevendanna

This adds MVCCExportFingerprintOptions with two new options:

 - StripTenantPrefix
 - StripValueChecksum

The goal of these options is to produce a fingerprint that can be used for comparing data across two tenants.

Note that if arbitrary keys and values are encountered, both options have the possibility of erroneously removing data from the fingerprint that isn't actually a tenant prefix or checksum.

Fixes #91150

Release note: None

91750: kvserver: factor out ReplicatedCmd r=pavelkalinnikov a=tbg

**Background**

We would like to apply log entries at startup time, and so we are
working towards making all code related to entry application
stand-alone (and, as a nice side product, more unit testable).

We already have a decent set of abstractions in place:

- `apply.Decoder` can consume log entries and turn them into
  (iterators over) `apply.Command`.
- `apply.StateMachine` can make an `apply.Batch` which is
  a pebble batch along with the management of any below-raft
  side effects (replicated or in-memory)
- `apply.Batch` has a `Stage` method that handles an `apply.Command`,
  in the simplest case doing little more than adding it to its pebble
  batch (but in the most complex cases, locking adjacent replicas for
  complicated split-merge dances).
- `Stage` returns a `CheckedCmd`, which can be acked to the client.

**This PR**

This PR (i.e. this and the lead-up of preceding commits) provides an
implementation of `apply.Command` (and friends) outside of `kvserver` in
a way that does not lead to code duplication.

Before this PR, `apply.Command` was implemented only by
`*kvserver.replicatedCmd`[^rc], which is a struct wrapping the decoded
raft entry and adding some state (such as whether the entry applies with
a forced error, i.e. as an empty command, or not). Some of this state is
necessary in standalone application as well (like the forced error),
other state isn't (like a tracing span, or client's context). In
particular, `replicatedCmd` holds on to a `ProposalData` which itself
references types related to latching and the quota pool. These are hard
to extract, and besides this isn't necessary as standalone application
doesn't need them.

Instead of trying to transitively extract `replicatedCmd` from
`kvserver`, we split `replicatedCmd` into the standalone part, and let
`replicatedCmd` extend the standalone part with the fields only
applicable during application in the context of a `*Replica`. In effect,
`raftlog.ReplicatedCmd` determines deterministically the effect of a raft
command onto the state machine, whereas `replicatedCmd` additionally
deals with additional concerns outside of that scope, such as managing
concurrency between different state machines, acknowledging clients,
updating in-memory counters, etc.

This decoupling results in the type introduced in this PR,
`ReplicatedCmd`, which is embedded into `replicatedCmd`.

Both `*replicatedCmd` and `*ReplicatedCmd` implement `apply.Command`
(as well as `apply.CheckedCommand` and `apply.AppliedCommand`).

To avoid duplication, the implementations on `*replicatedCmd` fall
through to those of `*ReplicatedCmd` whereever possible (via the
embedding of the latter into the former).

For example, the `IsTrivial` method is purely a property of the
`ReplicatedEvalResult`, i.e. of the raft entry, and can thus be answered
by `ReplicatedCmd.IsTrivial()`; `*replicatedCmd` does not override
this method.[^concern]

For another, a bit less trivial example, the `Rejected()` method checks
whether there is a forced error for this command. This is determined
below raft (i.e. not a property of the entry itself), but both
stand-alone and regular application need to come to the same results
here. This is why the forced error is a field on `ReplicatedCmd` and
the implementation sits on the base as well, simply returning whether
the field is set.

An example where the implementations differ is `IsLocal()`: in
standalone applications commands are never considered "local" (we never
have a client waiting to be acked), so the base implementation returns
`false`, but the implementation on `*replicatedCmd` overrides this to
`return c.proposal != nil`.

With this in place, we have an implementation of
`apply.{,Checked,Applied}Command` present in `raftlog`.

[^rc]: https://github.com/cockroachdb/cockroach/blob/fb4014a31b9b8d8235dc48de52196e64b185f490/pkg/kv/kvserver/replica_application_cmd.go#L27-L79
[^concern]: it might be too opaque to implement an interface wholly or in part via an embedded
implementation. Reviewers should consider whether they'd prefer explicit
"passthrough" methods to be added.

**Next Steps**

It's straightforward to implement the `apply.Decoder` seen in the
walkthrough at the top. The tricky part is `apply.StateMachine`.

Today's implementation of that is essentially the `*Replica`[^sm];
the meat of it is in `*replicaAppBatch`[^rb], which is the
`apply.Batch`, and which holds a `*Replica` that it calls out to on
many occasions, in particular during `Stage()`.

The next challenge will be a similar separation of concerns as carried
out above for `replicatedCmd`. Large parts of `Stage` are related to the
deterministic handling of log entries during application, but others
deal with updating in-memory state, keeping counters, notifying
rangefeeds of events, etc, all concerns not relevant to standalone
application and rather side-effects of state machine application vs
a part of it.

The goal is to introduce a handler interface which can separate these,
and for which the standalone application provides an implementation of
the essentials with no-ops for everything which is not a property of
state machine application in itself. With this interface contained in it,
`replicaAppBatch` can be moved to a leaf package, and can be used in
both standalone and regular log entry application.

[^sm]: https://github.com/cockroachdb/cockroach/blob/ccac3ddd85ca2fd4a8d02a89c82cd04761a1ce26/pkg/kv/kvserver/replica_application_state_machine.go#L106-L121
[^rb]: https://github.com/cockroachdb/cockroach/blob/ccac3ddd85ca2fd4a8d02a89c82cd04761a1ce26/pkg/kv/kvserver/replica_application_state_machine.go#L202-L234

Touches #75729.

Epic: CRDB-220
Release note: None


91755: upgrades: add weaker column schema exists funcs for use with migrations r=ajwerner,rafiss a=andyyang890

This patch adds two schema exists functions for use with migrations
that involve multiple schema changes on the same column(s) in order
to preserve the idempotence of the migration(s). They are weaker
in the sense that they do not check that the stored and final
expected descriptor match.

Informs #91449

Release note: None

91806: streampb: move out of CCL r=adityamaru a=stevendanna

This moves streampb into pkg/streaming so that non-CCL code doesn't need to import CCL code.

It also renames pkg/streaming to pkg/repstream.

Fixes #91005

Release note: None

91920: ptsstorage: allow synthetic timestamps in pts storage r=HonoreDB a=aliher1911

Previously synthetic timestamps were causing failures in changefeeds if checkpoint contained a synthetic timestamps. Timestamp representation was parsed as decimal for storage which is not the case for synthetic timestamps.
This commit changes pts storage to strip synthetic flag to mitigate the issue.
Stripping synthetic flag should be safe as protected timestamp is not used to update key or transaction timestamps but to compare against GC thresholds.

Release note: None

Fixes #91922

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: Oleg Afanasyev <[email protected]>
craig bot pushed a commit that referenced this issue Nov 16, 2022
91756: upgrades: fix idempotence issue in system table role id migrations r=ajwerner,rafiss a=andyyang890

This patch fixes an idempotence issue in the role id migrations for
the system.users and system.role_options tables.

Fixes #91449

Release note: None

Co-authored-by: Andy Yang <[email protected]>
@craig craig bot closed this as completed in 6594b56 Nov 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant