-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/ccl/testccl/sqlccl/sqlccl_test: TestExplainGist failed #130282
Comments
Looks like an ALTER PRIMARY KEY was running when the failure happened:
|
It was |
Here's the preceding schema changer state:
|
I see there are indexes like this:
Where column |
Looking at the mutations on the table descriptor, when the crash happens we see:
During this state the old secondary indexes are public along with the new primary index, which is bad, since the optimizer assumes all secondary indexes will have PK columns. This delay happens probably because of this rule:
|
Let's see if we can reproduce this by using the query in the initial issue description. Perhaps we can make a deterministic repro by adding a test in https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/schemachanger/dml_injection_test.go that selects from a secondary index during |
…sible in the same stage We encountered a situation where, during a primary key modification, the old secondary index was used to retrieve a column from the new primary key. This triggered an assertion failure because the query assumed that all secondary indexes include the primary key columns as a prefix, which was not the case for the old secondary index. This change ensures that when the new primary key becomes visible, the old secondary index is made invisible within the same stage. There are two dependency rules involved here: one ensures the primary key is in place before the secondary index, and another handles the swap between old and new secondary indexes. Both rules have been adjusted to execute within the same stage. Note: I was unable to reproduce the issue, so no new test case has been added. Epic: None Closes cockroachdb#130282 Release note: None
We saw this fail a few times in #130755. |
To fix this in the declarative schema changer, we need to rebuild transient secondary indexes during an ALTER PRIMARY KEY operation, specifically when changing the PK from rowid to a user-defined column. The declarative schema changer currently decomposes this into the following steps:
The issue arises if a query using a secondary index is executed after step 1 but before the secondary indexes are rebuilt. If the query requires the newly added PK column, it assumes that column is in the secondary index’s PK prefix. However, since the secondary indexes are based on the old PK (which includes only the rowid), the column won’t be available in the PK prefix. To address this, we can update the schema changer process as follows:
In this approach, the first two steps make the new PK and secondary indexes visible at the same time, and the same happens for the final two steps. The trade-off with this method is that we are rebuilding secondary indexes twice, which only benefits inflight queries during the schema change. Depending on the number of indexes and the table size, this could significantly impact the overall performance of the operation. |
pkg/ccl/testccl/sqlccl/sqlccl_test.TestExplainGist failed on master @ 3ec1c388c59b62ebc98cf515a87db9ef6d0f21a7:
Parameters:
|
TestExplainGist occasionally fails when a query using a secondary index tries to fetch a column not included in that index (see issue cockroachdb#133129). This change doesn’t address the root cause, but instead ignores the error when it occurs. I've also created a more reliable reproducer in the TestDMLInjectionTest, which we can use to validate the eventual fix. Epic: none Closes cockroachdb#130282 Release note: none
TestExplainGist occasionally fails when a query using a secondary index tries to fetch a column not included in that index (see issue cockroachdb#130282). This change doesn’t address the root cause, but instead ignores the error when it occurs. I've also created a more reliable reproducer in the TestDMLInjectionTest, which we can use to validate the eventual fix (cockroachdb#133129). Epic: none Closes cockroachdb#130282 Release note: none
133130: sqlccl: deflake TestExplainGist when run with concurrent ALTER PK r=rafiss,michae2 a=spilchen TestExplainGist occasionally fails when a query using a secondary index tries to fetch a column not included in that index (see issue #130282). This change doesn’t address the root cause, but instead ignores the error when it occurs. I've also created a more reliable reproducer in the TestDMLInjectionTest, which we can use to validate the eventual fix (#133129). Epic: none Closes #130282 Release note: none 133256: roachprod: default to buffering file sinks in roachprod r=jbowens a=jbowens In roachprod clusters, default to using buffering in file sinks. This is required by a subsequent change that will default to using WAL failover in roachprod clusters. Informs #133248 Informs #129922 Epic: CRDB-37534 Release note: none Co-authored-by: Matt Spilchen <[email protected]> Co-authored-by: Jackson Owens <[email protected]>
Based on the specified backports for linked PR #133130, I applied the following new label(s) to this issue: branch-release-23.1, branch-release-23.2, branch-release-24.1, branch-release-24.2, branch-release-24.3. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
TestExplainGist occasionally fails when a query using a secondary index tries to fetch a column not included in that index (see issue #130282). This change doesn’t address the root cause, but instead ignores the error when it occurs. I've also created a more reliable reproducer in the TestDMLInjectionTest, which we can use to validate the eventual fix (#133129). Epic: none Closes #130282 Release note: none
TestExplainGist occasionally fails when a query using a secondary index tries to fetch a column not included in that index (see issue #130282). This change doesn’t address the root cause, but instead ignores the error when it occurs. I've also created a more reliable reproducer in the TestDMLInjectionTest, which we can use to validate the eventual fix (#133129). Epic: none Closes #130282 Release note: none
pkg/ccl/testccl/sqlccl/sqlccl_test.TestExplainGist failed on master @ fa9c0528fc0d06be1b4cfc534ec0501448111fbe:
Parameters:
attempt=1
run=19
shard=2
Help
See also: How To Investigate a Go Test Failure (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-41972
The text was updated successfully, but these errors were encountered: