Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/schemachanger: Secondary indexes and PK are inconsistent during ALTER PK #133129

Open
spilchen opened this issue Oct 21, 2024 · 2 comments
Open
Labels
A-schema-changes branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. P-3 Issues/test failures with no fix SLA skipped-test T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@spilchen
Copy link
Contributor

spilchen commented Oct 21, 2024

Describe the problem

There is an issue when altering the primary key (PK) to replace the rowid column with a user-defined column. During this schema change, a transient version of the PK is created before reaching the final version: initially, both the new user column and the rowid coexist in the PK, and eventually, the final PK has only the user column, with rowid removed. However, if the table has secondary indexes, these aren't rebuilt until the final PK version is established.

This creates a problem when queries using the secondary index are run during the schema change. If a query references columns that aren't included in the secondary index but should be based on the new PK columns, the query will fail, reporting that the requested column is missing from the index. The root of the issue is that the old secondary indexes are being used alongside the new PK during the schema change.

Potential solutions include:

  • Rebuild the secondary indexes multiple times in sync with each PK change, ensuring consistency between the PK and secondaries at each stage.
  • Enhance the optimizer to detect this mismatch and force a query retry.
  • Avoid creating multiple versions of the PK during this operation and transition directly to the final version. This would require enforcing a rule that prevents altering the PK in combination with other operations in the same ALTER statement.

This problem has existed since at least 23.1.

To Reproduce

The test was initially found when TestExplainGist would fail occasionally. See #130282.

Jira issue: CRDB-43464

@spilchen spilchen added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-schema-changes T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Oct 21, 2024
Copy link

blathers-crl bot commented Oct 21, 2024

Hi @spilchen, please add branch-* labels to identify which branch(es) this C-bug affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@spilchen spilchen added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 labels Oct 21, 2024
spilchen added a commit to spilchen/cockroach that referenced this issue Oct 21, 2024
TestExplainGist occasionally fails when a query using a secondary index
tries to fetch a column not included in that index (see issue cockroachdb#133129).
This change doesn’t address the root cause, but instead ignores the
error when it occurs. I've also created a more reliable reproducer in
the TestDMLInjectionTest, which we can use to validate the eventual fix.

Epic: none
Closes cockroachdb#130282
Release note: none
spilchen added a commit to spilchen/cockroach that referenced this issue Oct 21, 2024
TestExplainGist occasionally fails when a query using a secondary index
tries to fetch a column not included in that index (see issue cockroachdb#130282).
This change doesn’t address the root cause, but instead ignores the
error when it occurs. I've also created a more reliable reproducer in
the TestDMLInjectionTest, which we can use to validate the eventual fix
(cockroachdb#133129).

Epic: none
Closes cockroachdb#130282
Release note: none
@spilchen
Copy link
Contributor Author

We may not be able to transition from the old to the new index. See this slack conversion to understand why the transient PK is needed: https://cockroachlabs.slack.com/archives/C04N0AS14CT/p1677706410017719

spilchen added a commit to spilchen/cockroach that referenced this issue Oct 23, 2024
TestExplainGist occasionally fails when a query using a secondary index
tries to fetch a column not included in that index (see issue cockroachdb#130282).
This change doesn’t address the root cause, but instead ignores the
error when it occurs. I've also created a more reliable reproducer in
the TestDMLInjectionTest, which we can use to validate the eventual fix
(cockroachdb#133129).

Epic: none
Closes cockroachdb#130282
Release note: none
@exalate-issue-sync exalate-issue-sync bot added the P-3 Issues/test failures with no fix SLA label Oct 23, 2024
craig bot pushed a commit that referenced this issue Oct 23, 2024
133130: sqlccl: deflake TestExplainGist when run with concurrent ALTER PK r=rafiss,michae2 a=spilchen

TestExplainGist occasionally fails when a query using a secondary index tries to fetch a column not included in that index (see issue #130282). This change doesn’t address the root cause, but instead ignores the error when it occurs. I've also created a more reliable reproducer in the TestDMLInjectionTest, which we can use to validate the eventual fix (#133129).

Epic: none
Closes #130282
Release note: none

133256: roachprod: default to buffering file sinks in roachprod r=jbowens a=jbowens

In roachprod clusters, default to using buffering in file sinks. This is required by a subsequent change that will default to using WAL failover in roachprod clusters.

Informs #133248
Informs #129922
Epic: CRDB-37534
Release note: none

Co-authored-by: Matt Spilchen <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Oct 24, 2024
TestExplainGist occasionally fails when a query using a secondary index
tries to fetch a column not included in that index (see issue #130282).
This change doesn’t address the root cause, but instead ignores the
error when it occurs. I've also created a more reliable reproducer in
the TestDMLInjectionTest, which we can use to validate the eventual fix
(#133129).

Epic: none
Closes #130282
Release note: none
blathers-crl bot pushed a commit that referenced this issue Oct 24, 2024
TestExplainGist occasionally fails when a query using a secondary index
tries to fetch a column not included in that index (see issue #130282).
This change doesn’t address the root cause, but instead ignores the
error when it occurs. I've also created a more reliable reproducer in
the TestDMLInjectionTest, which we can use to validate the eventual fix
(#133129).

Epic: none
Closes #130282
Release note: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-changes branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. P-3 Issues/test failures with no fix SLA skipped-test T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

No branches or pull requests

2 participants