Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrades: index validation should ignore CreatedAtNanos field #85228

Closed
postamar opened this issue Jul 28, 2022 · 3 comments · Fixed by #87462
Closed

upgrades: index validation should ignore CreatedAtNanos field #85228

postamar opened this issue Jul 28, 2022 · 3 comments · Fixed by #87462
Assignees
Labels
A-schema-catalog Related to the schema descriptors collection and the catalog API in general. branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker

Comments

@postamar
Copy link
Contributor

postamar commented Jul 28, 2022

Presently, a cluster upgrade step which adds a new index may fail during validation because the validation expects the CreatedAtNanos to be zero and the field may already have a non-zero value.

Right now the only place this can happen in existing releases is the upgrade step in 22.1 which adds an index to system.statement_diagnostics_requests.

Jira issue: CRDB-18121

@postamar postamar added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jul 28, 2022
@blathers-crl blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Jul 28, 2022
@postamar postamar added A-schema-catalog Related to the schema descriptors collection and the catalog API in general. backport-22.1.x GA-blocker and removed T-sql-schema-deprecated Use T-sql-foundations instead labels Jul 28, 2022
@blathers-crl
Copy link

blathers-crl bot commented Jul 28, 2022

Hi @postamar, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@postamar postamar added the branch-master Failures and bugs on the master branch. label Jul 28, 2022
@postamar
Copy link
Contributor Author

The fix itself should be extremely straightforward: zero the CreatedAtNanos field values in hasIndex. The fix should however be accompanied by a regression test which should catch this as well as future instances of this class of bugs, where we add a new field with non-deterministic contents but forget to zero it in this upgrade validation logic.

@ajwerner suggested we take inspiration from https://github.com/cockroachdb/cockroach/blob/614c2b131220db8de85f193c2d059bb7da7e7685/pkg/migration/migrations/retry_jobs_with_exponential_backoff_external_test.go
I agree. I'll add that we shouldn't tie the test to a specific upgrade step, instead we should reframe it as a unit test to cover upgrades.ValidateSchemaExists. We'll have to mock a system table upgrade somehow. This could be done by creating a new, unrelated table for the purposes of the test and piping operations defined in test cases through something that does the same thing as migrateTable does.

@postamar postamar added release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. and removed GA-blocker labels Aug 16, 2022
@ajwerner ajwerner self-assigned this Aug 16, 2022
@ajwerner ajwerner removed branch-master Failures and bugs on the master branch. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 6, 2022
@blathers-crl
Copy link

blathers-crl bot commented Sep 6, 2022

Hi @postamar, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@postamar postamar added the branch-master Failures and bugs on the master branch. label Sep 6, 2022
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 6, 2022
…ation

Schema change upgrade migrations to system tables are made idempotent by
checking that the descriptor reaches some expected state. In order to ensure
that it is in that expected state, some volatile fields need to be masked.
We forgot to mask CreatedAtNanos. We also lost the testing which came with
these helper functions we use.

The vast majority of this PR is reviving testing from cockroachdb#66889.

Fixes cockroachdb#85228.

Release justification: Import bug fix for backport

Release note (bug fix): Some upgrade migrations perform schema changes on
system tables. Those upgrades which added indexes could, in some cases, get
caught retrying because they failed to detect that the migration had already
occurred due to the existence of a populated field. When that happens, the
finalization of the new version can hang indefinitely and require manual
intervention. This bug has been fixed.
craig bot pushed a commit that referenced this issue Sep 8, 2022
87027: streamingccl: reduce server count in multinode tests r=samiskin a=samiskin

While these tests would pass under stress locally they would fail CI
stress, which may be because we were starting more server processes than
ever before with 4 source nodes, 4 source tenant pods, and 4 destination
nodes.

This PR reduces the node count to 3 (any lower and scatter doesn't
correctly distribute ranges) and only starts a single tenant pod for the
source cluster.

Release justification: test-only change
Release note: None

87412: cli,server: fix --sql-advertise-addr when --sql-addr is not specified r=a-robinson,ajwerner a=knz

Fixes #87040.
Informs #52266.

cc `@a-robinson` 

Release justification: bug fix

Release note (bug fix): The flag `--sql-advertise-addr` now properly
works even when the SQL and RPC ports are shared (because `--sql-addr`
was not specified). Note that this port sharing is a deprecated
feature in v22.2.

87440: ui: update txn contention insights to use waiting txns as event r=ericharmeling a=ericharmeling

This commit updates the transaction workload insights pages to use the waiting
contended transaction as the primary contention event, rather than the blocking
transaction.

Fixes #87284.

https://www.loom.com/share/383fec4297a74ec79d90e46f11def792

Release justification: bug fixes and low-risk updates to new functionality
Release note: None

87462: upgrade/upgrades: allow CreatedAtNanos to be set when validating migration r=ajwerner a=ajwerner

Schema change upgrade migrations to system tables are made idempotent by checking that the descriptor reaches some expected state. In order to ensure that it is in that expected state, some volatile fields need to be masked. We forgot to mask CreatedAtNanos. We also lost the testing which came with these helper functions we use.

The vast majority of this PR is reviving testing from #66889.

Fixes #85228.

Release justification: Import bug fix for backport

Release note (bug fix): Some upgrade migrations perform schema changes on system tables. Those upgrades which added indexes could, in some cases, get caught retrying because they failed to detect that the migration had already occurred due to the existence of a populated field. When that happens, the finalization of the new version can hang indefinitely and require manual intervention. This bug has been fixed.

Co-authored-by: Shiranka Miskin <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Eric Harmeling <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
@craig craig bot closed this as completed in 52eda07 Sep 8, 2022
blathers-crl bot pushed a commit that referenced this issue Sep 8, 2022
…ation

Schema change upgrade migrations to system tables are made idempotent by
checking that the descriptor reaches some expected state. In order to ensure
that it is in that expected state, some volatile fields need to be masked.
We forgot to mask CreatedAtNanos. We also lost the testing which came with
these helper functions we use.

The vast majority of this PR is reviving testing from #66889.

Fixes #85228.

Release justification: Import bug fix for backport

Release note (bug fix): Some upgrade migrations perform schema changes on
system tables. Those upgrades which added indexes could, in some cases, get
caught retrying because they failed to detect that the migration had already
occurred due to the existence of a populated field. When that happens, the
finalization of the new version can hang indefinitely and require manual
intervention. This bug has been fixed.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 8, 2022
…ation

Schema change upgrade migrations to system tables are made idempotent by
checking that the descriptor reaches some expected state. In order to ensure
that it is in that expected state, some volatile fields need to be masked.
We forgot to mask CreatedAtNanos. We also lost the testing which came with
these helper functions we use.

The vast majority of this PR is reviving testing from cockroachdb#66889.

Fixes cockroachdb#85228.

Release justification: Import bug fix for backport

Release note (bug fix): Some upgrade migrations perform schema changes on
system tables. Those upgrades which added indexes could, in some cases, get
caught retrying because they failed to detect that the migration had already
occurred due to the existence of a populated field. When that happens, the
finalization of the new version can hang indefinitely and require manual
intervention. This bug has been fixed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-catalog Related to the schema descriptors collection and the catalog API in general. branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants