Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/ccl/testccl/sqlccl/sqlccl_test: TestExplainGist failed #130282

Closed
cockroach-teamcity opened this issue Sep 7, 2024 · 10 comments · Fixed by #133130
Closed

pkg/ccl/testccl/sqlccl/sqlccl_test: TestExplainGist failed #130282

cockroach-teamcity opened this issue Sep 7, 2024 · 10 comments · Fixed by #133130
Assignees
Labels
branch-master Failures and bugs on the master branch. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Sep 7, 2024

pkg/ccl/testccl/sqlccl/sqlccl_test.TestExplainGist failed on master @ fa9c0528fc0d06be1b4cfc534ec0501448111fbe:

        LIMIT
        	58:::INT8;
        DELETE FROM
        	defaultdb.public.seed AS tab_725
        USING
        	defaultdb.public.seed AS tab_726, defaultdb.public.seed AS tab_727, defaultdb.public.seed AS tab_728
        LIMIT
        	91:::INT8;
        
    explain_test.go:204: pq: internal error: requested column col_647 not in index: WITH
        	with_108 (col_1144)
        		AS (
        			SELECT * FROM (VALUES ('hello':::greeting), ('added_val_1':::greeting)) AS tab_733 (col_1144)
        			UNION
        				SELECT
        					*
        				FROM
        					(
        						VALUES
        							('howdy':::greeting),
        							(
        								(
        									SELECT
        										'added_val_1':::greeting AS col_1145
        									FROM
        										defaultdb.public.seed AS tab_734
        									WHERE
        										false
        									GROUP BY
        										tab_734._timestamptz
        									HAVING
        										bool_or(tab_734._bool::BOOL)::BOOL
        									ORDER BY
        										tab_734._timestamptz DESC
        									LIMIT
        										1:::INT8
        								)
        							)
        					)
        						AS tab_735 (col_1146)
        		),
        	with_109 (col_1147, col_1148)
        		AS (SELECT * FROM (VALUES (NULL, e'7!$\x126$\x0b':::STRING:::NAME)) AS tab_736 (col_1147, col_1148))
        SELECT
        	(-15528):::INT8 AS col_1149, 'morning':::greeting AS col_1150, tab_737._inet AS col_1151
        FROM
        	with_109 AS cte_ref_33, defaultdb.public.seed@seed__int8__float8__date_idx AS tab_737
        WHERE
        	tab_737._bool
    --- FAIL: TestExplainGist/main (8.95s)

Parameters:

  • attempt=1
  • run=19
  • shard=2
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

Jira issue: CRDB-41972

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-queries SQL Queries Team labels Sep 7, 2024
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Sep 7, 2024
@DrewKimball DrewKimball self-assigned this Sep 10, 2024
@DrewKimball
Copy link
Collaborator

Looks like an ALTER PRIMARY KEY was running when the failure happened:

I240907 06:29:19.845188 7037 sql/schemachanger/scrun/scrun.go:184 ⋮ [T10,Vtest-tenant,n1,job=‹NEW SCHEMA CHANGE id=1001456136745910273›] 656  executing declarative schema change PostCommitPhase stage 10 of 15 with 4 BackfillType ops (rollback=false) for ALTER TABLE
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657  encountered internal error:
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +requested column ‹col_647› not in index
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +(1) assertion failure
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +Wraps: (2) attached stack trace
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  -- stack trace:
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql/rowenc.InitIndexFetchSpec
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/rowenc/index_fetch.go:118
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.initTableReaderSpecTemplate
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/distsql_physical_planner.go:2020
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).createTableReaders
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/distsql_physical_planner.go:2038
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).createPhysPlanForPlanNode
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/distsql_physical_planner.go:3913
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).createPlanForIndexJoin
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/distsql_physical_planner.go:3096
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).createPhysPlanForPlanNode
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/distsql_physical_planner.go:3842
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).createPhysPlanForPlanNode
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/distsql_physical_planner.go:3822
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).createPhysPlanForPlanNode
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/distsql_physical_planner.go:3883
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).createPhysPlan
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/distsql_physical_planner.go:3781
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).PlanAndRun
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:1983
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).PlanAndRunAll.func3
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:1704
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).PlanAndRunAll
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:1707
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execWithDistSQLEngine
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:2427
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).dispatchToExecutionEngine
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1974
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1181
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt.func1
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:146
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execWithProfiling
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:3436
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:145
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd.func1
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2320
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2325
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2242
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:961
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).processCommands
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:256
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*Server).serveImpl.func4
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	github.com/cockroachdb/cockroach/pkg/sql/pgwire/server.go:1136
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | runtime.goexit
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +  | 	src/runtime/asm_amd64.s:1695
E240907 06:29:19.869452 2354 sql/sqltelemetry/report.go:57 ⋮ [T10,Vtest-tenant,n1,client=127.0.0.1:56442,hostssl,user=root] 657 +Wraps: (3) requested column ‹col_647› not in index

@DrewKimball
Copy link
Collaborator

It was ALTER TABLE seed ALTER PRIMARY KEY USING COLUMNS (col_647);

@DrewKimball
Copy link
Collaborator

Here's the preceding schema changer state:

I240907 06:29:19.815202 7037 sql/schemachanger/scrun/scrun.go:184 ⋮ [T10,Vtest-tenant,n1,job=‹NEW SCHEMA CHANGE id=1001456136745910273›] 653  executing declarative schema change PostCommitPhase stage 9 of 15 with 6 MutationType ops (rollback=false) for ALTER TABLE
I240907 06:29:19.836162 7037 sql/catalog/lease/lease.go:278 ⋮ [T10,Vtest-tenant,n1,job=‹NEW SCHEMA CHANGE id=1001456136745910273›] 654  waiting for 1 leases to expire: desc=[{‹seed› 106 27}]
I240907 06:29:19.840742 7582 sql/catalog/lease/descriptor_state.go:147 ⋮ [n1] 655  new lease: tabledesc.immutable: {ID: 106, Version: 28, ModificationTime: "1725690559.815055485,0", ParentID: 100, ParentSchemaID: 101, State: PUBLIC, NextColumnID: 20, Columns: [{ID: 1, TypeID: 21, Null: true}, {ID: 2, TypeID: 23, Null: true}, {ID: 3, TypeID: 20, Null: true}, {ID: 4, TypeID: 700, Null: true}, {ID: 5, TypeID: 701, Null: true}, {ID: 6, TypeID: 1082, Null: true}, {ID: 7, TypeID: 1114, Null: true}, {ID: 8, TypeID: 1184, Null: true}, {ID: 9, TypeID: 1186, Null: true}, {ID: 10, TypeID: 16, Null: true}, {ID: 11, TypeID: 1700, Null: true}, {ID: 12, TypeID: 25, Null: true}, {ID: 13, TypeID: 17, Null: true}, {ID: 14, TypeID: 2950, Null: true}, {ID: 15, TypeID: 869, Null: true}, {ID: 16, TypeID: 3802, Null: true}, {ID: 17, TypeID: 100104, Null: true}, {ID: 18, TypeID: 20, Null: true, Hidden: true, HasDefault: true}, {ID: 19, TypeID: 2950, Null: false, HasDefault: true}], NextFamilyID: 1, Families: [{ID: 0, Columns: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]}], Mutations: [{MutationID: 2, Direction: DROP, State: WRITE_ONLY, ConstraintType: NOT_NULL, NotNullColumn: 18, Check: {Columns: [18], Validity: Dropping, State: DROP, MutationID: 2}}, {MutationID: 2, Direction: DROP, State: DELETE_ONLY, Index: {ID: 17, Unique: true, KeyColumns: [{ID: 19, Dir: ASC}], StoreColumns: [18, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17], State: DROP, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: BACKFILLING, Index: {ID: 18, Unique: true, KeyColumns: [{ID: 19, Dir: ASC}], StoreColumns: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17], State: ADD, MutationID: 2}}, {MutationID: 2, Direction: DROP, State: WRITE_ONLY, Index: {ID: 6, Unique: true, KeyColumns: [{ID: 18, Dir: ASC}], StoreColumns: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19], State: DROP, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: WRITE_ONLY, Index: {ID: 19, Unique: true, KeyColumns: [{ID: 19, Dir: ASC}], StoreColumns: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17], State: ADD, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: BACKFILLING, Index: {ID: 10, Unique: false, KeyColumns: [{ID: 3, Dir: ASC}, {ID: 5, Dir: ASC}, {ID: 6, Dir: ASC}], KeySuffixColumns: [19], State: ADD, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: WRITE_ONLY, Index: {ID: 11, Unique: false, KeyColumns: [{ID: 3, Dir: ASC}, {ID: 5, Dir: ASC}, {ID: 6, Dir: ASC}], KeySuffixColumns: [19], State: ADD, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: BACKFILLING, Index: {ID: 12, Unique: false, KeyColumns: [{ID: 16, Dir: ASC}], KeySuffixColumns: [19], State: ADD, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: WRITE_ONLY, Index: {ID: 13, Unique: false, KeyColumns: [{ID: 16, Dir: ASC}], KeySuffixColumns: [19], State: ADD, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: BACKFILLING, Index: {ID: 14, Unique: false, KeyColumns: [{ID: 5, Dir: DESC}], KeySuffixColumns: [19], StoreColumns: [13], State: ADD, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: WRITE_ONLY, Index: {ID: 15, Unique: false, KeyColumns: [{ID: 5, Dir: DESC}], KeySuffixColumns: [19], StoreColumns: [13], State: ADD, MutationID: 2}}], PrimaryIndex: 16, NextIndexID: 20, Indexes: [{ID: 16, Unique: true, KeyColumns: [{ID: 19, Dir: ASC}], StoreColumns: [18, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]}, {ID: 2, Unique: false, KeyColumns: [{ID: 3, Dir: ASC}, {ID: 5, Dir: ASC}, {ID: 6, Dir: ASC}], KeySuffixColumns: [18]}, {ID: 4, Unique: false, KeyColumns: [{ID: 16, Dir: ASC}], KeySuffixColumns: [18]}, {ID: 8, Unique: false, KeyColumns: [{ID: 5, Dir: DESC}], KeySuffixColumns: [18], StoreColumns: [13]}], Checks: [{Columns: [], Validity: Validated}, {Columns: [], Validity: Validated}, {Columns: [18], Validity: Dropping}]}

@DrewKimball
Copy link
Collaborator

I see there are indexes like this:

{ID: 4, Unique: false, KeyColumns: [{ID: 16, Dir: ASC}], KeySuffixColumns: [18]}

Where column 18 is the key suffix column instead of 19 (which is the new primary key).

@fqazi fqazi added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-queries SQL Queries Team labels Sep 11, 2024
@fqazi
Copy link
Collaborator

fqazi commented Sep 11, 2024

Looking at the mutations on the table descriptor, when the crash happens we see:

tabledesc.immutable: {ID: 106, Version: 27, ModificationTime: "1725690559.590404436,0", ParentID: 100, ParentSchemaID: 101, State: PUBLIC, NextColumnID
: 20, Columns: [{ID: 1, TypeID: 21, Null: true}, {ID: 2, TypeID: 23, Null: true}, {ID: 3, TypeID: 20, Null: true}, {ID: 4, TypeID: 700, Null: true}, {ID: 5, TypeID: 701, Null: true}, {ID: 6, TypeID: 1082, Null: true}, {ID: 7, TypeID: 1114, Null:
 true}, {ID: 8, TypeID: 1184, Null: true}, {ID: 9, TypeID: 1186, Null: true}, {ID: 10, TypeID: 16, Null: true}, {ID: 11, TypeID: 1700, Null: true}, {ID: 12, TypeID: 25, Null: true}, {ID: 13, TypeID: 17, Null: true}, {ID: 14, TypeID: 2950, Null:
true}, {ID: 15, TypeID: 869, Null: true}, {ID: 16, TypeID: 3802, Null: true}, {ID: 17, TypeID: 100104, Null: true}, {ID: 18, TypeID: 20, Null: true, Hidden: true, HasDefault: true}, {ID: 19, TypeID: 2950, Null: false, HasDefault: true}], NextFam
ilyID: 1, Families: [{ID: 0, Columns: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]}], Mutations: [{MutationID: 2, Direction: DROP, State: WRITE_ONLY, ConstraintType: NOT_NULL, NotNullColumn: 18, Check: {Columns: [18], Vali
dity: Dropping, State: DROP, MutationID: 2}}, {MutationID: 2, Direction: DROP, State: DELETE_ONLY, Index: {ID: 17, Unique: true, KeyColumns: [{ID: 19, Dir: ASC}], StoreColumns: [18, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17], Sta
te: DROP, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: BACKFILLING, Index: {ID: 18, Unique: true, KeyColumns: [{ID: 19, Dir: ASC}], StoreColumns: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17], State: ADD, MutationID: 2}}
, {MutationID: 2, Direction: DROP, State: WRITE_ONLY, Index: {ID: 6, Unique: true, KeyColumns: [{ID: 18, Dir: ASC}], StoreColumns: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19], State: DROP, MutationID: 2}}, {MutationID: 2, Dir
ection: ADD, State: DELETE_ONLY, Index: {ID: 19, Unique: true, KeyColumns: [{ID: 19, Dir: ASC}], StoreColumns: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17], State: ADD, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: BACKF
ILLING, Index: {ID: 10, Unique: false, KeyColumns: [{ID: 3, Dir: ASC}, {ID: 5, Dir: ASC}, {ID: 6, Dir: ASC}], KeySuffixColumns: [19], State: ADD, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: DELETE_ONLY, Index: {ID: 11, Unique: false,
 KeyColumns: [{ID: 3, Dir: ASC}, {ID: 5, Dir: ASC}, {ID: 6, Dir: ASC}], KeySuffixColumns: [19], State: ADD, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: BACKFILLING, Index: {ID: 12, Unique: false, KeyColumns: [{ID: 16, Dir: ASC}], Key
SuffixColumns: [19], State: ADD, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: DELETE_ONLY, Index: {ID: 13, Unique: false, KeyColumns: [{ID: 16, Dir: ASC}], KeySuffixColumns: [19], State: ADD, MutationID: 2}}, {MutationID: 2, Direction
: ADD, State: BACKFILLING, Index: {ID: 14, Unique: false, KeyColumns: [{ID: 5, Dir: DESC}], KeySuffixColumns: [19], StoreColumns: [13], State: ADD, MutationID: 2}}, {MutationID: 2, Direction: ADD, State: DELETE_ONLY, Index: {ID: 15, Unique: fals
e, KeyColumns: [{ID: 5, Dir: DESC}], KeySuffixColumns: [19], StoreColumns: [13], State: ADD, MutationID: 2}}], PrimaryIndex: 16, NextIndexID: 20, Indexes: [{ID: 16, Unique: true, KeyColumns: [{ID: 19, Dir: ASC}], StoreColumns: [18, 1, 2, 3, 4, 5
, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]}, {ID: 2, Unique: false, KeyColumns: [{ID: 3, Dir: ASC}, {ID: 5, Dir: ASC}, {ID: 6, Dir: ASC}], KeySuffixColumns: [18]}, {ID: 4, Unique: false, KeyColumns: [{ID: 16, Dir: ASC}], KeySuffixColumns: [18
]}, {ID: 8, Unique: false, KeyColumns: [{ID: 5, Dir: DESC}], KeySuffixColumns: [18], StoreColumns: [13]}], Checks: [{Columns: [], Validity: Validated}, {Columns: [], Validity: Validated}, {Columns: [18], Validity: Dropping}]}

During this state the old secondary indexes are public along with the new primary index, which is bad, since the optimizer assumes all secondary indexes will have PK columns. This delay happens probably because of this rule:

.

@DrewKimball DrewKimball removed their assignment Sep 12, 2024
@rafiss rafiss removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Sep 17, 2024
@rafiss
Copy link
Collaborator

rafiss commented Sep 17, 2024

Let's see if we can reproduce this by using the query in the initial issue description. Perhaps we can make a deterministic repro by adding a test in https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/schemachanger/dml_injection_test.go that selects from a secondary index during ALTER PRIMARY KEY.

@exalate-issue-sync exalate-issue-sync bot added the P-2 Issues/test failures with a fix SLA of 3 months label Sep 17, 2024
@mgartner mgartner removed this from SQL Queries Sep 17, 2024
spilchen added a commit to spilchen/cockroach that referenced this issue Sep 27, 2024
…sible in the same stage

We encountered a situation where, during a primary key modification, the old
secondary index was used to retrieve a column from the new primary key. This
triggered an assertion failure because the query assumed that all secondary
indexes include the primary key columns as a prefix, which was not the case for
the old secondary index.

This change ensures that when the new primary key becomes visible, the old
secondary index is made invisible within the same stage. There are two
dependency rules involved here: one ensures the primary key is in place before
the secondary index, and another handles the swap between old and new secondary
indexes. Both rules have been adjusted to execute within the same stage.

Note: I was unable to reproduce the issue, so no new test case has been added.

Epic: None
Closes cockroachdb#130282
Release note: None
@mgartner
Copy link
Collaborator

mgartner commented Oct 8, 2024

We saw this fail a few times in #130755.

@spilchen
Copy link
Contributor

To fix this in the declarative schema changer, we need to rebuild transient secondary indexes during an ALTER PRIMARY KEY operation, specifically when changing the PK from rowid to a user-defined column. The declarative schema changer currently decomposes this into the following steps:

  1. Create a new PK that includes the user-defined column.
  2. Create a new PK with the rowid removed.
  3. Rebuild secondary indexes for the new PK prefix.

The issue arises if a query using a secondary index is executed after step 1 but before the secondary indexes are rebuilt. If the query requires the newly added PK column, it assumes that column is in the secondary index’s PK prefix. However, since the secondary indexes are based on the old PK (which includes only the rowid), the column won’t be available in the PK prefix.

To address this, we can update the schema changer process as follows:

  1. Create a new PK with the user-defined column.
  2. Rebuild all secondary indexes so they include the new PK prefix.
  3. Create a new PK with the rowid removed.
  4. Rebuild all secondary indexes again with the final PK prefix.

In this approach, the first two steps make the new PK and secondary indexes visible at the same time, and the same happens for the final two steps.

The trade-off with this method is that we are rebuilding secondary indexes twice, which only benefits inflight queries during the schema change. Depending on the number of indexes and the table size, this could significantly impact the overall performance of the operation.

Copy link

pkg/ccl/testccl/sqlccl/sqlccl_test.TestExplainGist failed on master @ 3ec1c388c59b62ebc98cf515a87db9ef6d0f21a7:

        	false
        ORDER BY
        	tab_748._float8 DESC NULLS LAST
        LIMIT
        	80:::INT8;
        INSERT
        INTO
        	defaultdb.public.tab_543 AS tab_759 (_inet, _uuid, _decimal, _timestamp, _float8, _int4)
        VALUES
        	(
        		NULL,
        		'2759052f-2c14-46f9-9aab-73601252ee93':::UUID,
        		3.059257562550128425E+30:::DECIMAL,
        		'2009-01-31 10:54:26.000304':::TIMESTAMP,
        		(-1.558196046520631):::FLOAT8,
        		(-1664980493):::INT8
        	);
        UPDATE
        	defaultdb.public.tab_543 AS tab_762
        SET
        	_bool = tab_762._bool, _int2 = NULL
        FROM
        	defaultdb.public.tab_543 AS tab_763
        LIMIT
        	83:::INT8;
        
    explain_test.go:201: pq: internal error: requested column col0_13 not in index: INSERT
        INTO
        	defaultdb.public.tab_147 AS tab_771
        		(col_315, col0_10, col0_9, col0_8, col0_7, col0_6, col0_5, col0_4, col0_3, col0_2)
        SELECT
        	tab_772.col_315 AS col_1202,
        	tab_772.col0_10 AS col_1203,
        	tab_772.col0_7 AS col_1204,
        	(tab_772.col_315::TIMESTAMPTZ || e'T\x1d[H\x1dg^\x03':::STRING::STRING)::STRING AS col_1205,
        	(tab_772.col0_9 AND tab_772.col0_6) AS col_1206,
        	tab_772.col0_6 AS col_1207,
        	uuid_generate_v4()::UUID AS col_1208,
        	tab_772.col0_4 AS col_1209,
        	8.693877347825926943E+23:::DECIMAL AS col_1210,
        	B'0100100101100001100010111100001110010101011101000' AS col_1211
        FROM
        	defaultdb.public.tab_147@tab_147_col0_14_key AS tab_772
        WHERE
        	tab_772.col0_6
        ORDER BY
        	tab_772.col0_14 ASC NULLS FIRST, tab_772.col_315 DESC NULLS LAST
        LIMIT
        	58:::INT8
    --- FAIL: TestExplainGist/main (8.60s)

Parameters:

  • attempt=1
  • run=1
  • shard=2
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

spilchen added a commit to spilchen/cockroach that referenced this issue Oct 21, 2024
TestExplainGist occasionally fails when a query using a secondary index
tries to fetch a column not included in that index (see issue cockroachdb#133129).
This change doesn’t address the root cause, but instead ignores the
error when it occurs. I've also created a more reliable reproducer in
the TestDMLInjectionTest, which we can use to validate the eventual fix.

Epic: none
Closes cockroachdb#130282
Release note: none
spilchen added a commit to spilchen/cockroach that referenced this issue Oct 21, 2024
TestExplainGist occasionally fails when a query using a secondary index
tries to fetch a column not included in that index (see issue cockroachdb#130282).
This change doesn’t address the root cause, but instead ignores the
error when it occurs. I've also created a more reliable reproducer in
the TestDMLInjectionTest, which we can use to validate the eventual fix
(cockroachdb#133129).

Epic: none
Closes cockroachdb#130282
Release note: none
craig bot pushed a commit that referenced this issue Oct 23, 2024
133130: sqlccl: deflake TestExplainGist when run with concurrent ALTER PK r=rafiss,michae2 a=spilchen

TestExplainGist occasionally fails when a query using a secondary index tries to fetch a column not included in that index (see issue #130282). This change doesn’t address the root cause, but instead ignores the error when it occurs. I've also created a more reliable reproducer in the TestDMLInjectionTest, which we can use to validate the eventual fix (#133129).

Epic: none
Closes #130282
Release note: none

133256: roachprod: default to buffering file sinks in roachprod r=jbowens a=jbowens

In roachprod clusters, default to using buffering in file sinks. This is required by a subsequent change that will default to using WAL failover in roachprod clusters.

Informs #133248
Informs #129922
Epic: CRDB-37534
Release note: none

Co-authored-by: Matt Spilchen <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
@craig craig bot closed this as completed in 302522b Oct 23, 2024
Copy link

blathers-crl bot commented Oct 24, 2024

Based on the specified backports for linked PR #133130, I applied the following new label(s) to this issue: branch-release-23.1, branch-release-23.2, branch-release-24.1, branch-release-24.2, branch-release-24.3. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 labels Oct 24, 2024
blathers-crl bot pushed a commit that referenced this issue Oct 24, 2024
TestExplainGist occasionally fails when a query using a secondary index
tries to fetch a column not included in that index (see issue #130282).
This change doesn’t address the root cause, but instead ignores the
error when it occurs. I've also created a more reliable reproducer in
the TestDMLInjectionTest, which we can use to validate the eventual fix
(#133129).

Epic: none
Closes #130282
Release note: none
blathers-crl bot pushed a commit that referenced this issue Oct 24, 2024
TestExplainGist occasionally fails when a query using a secondary index
tries to fetch a column not included in that index (see issue #130282).
This change doesn’t address the root cause, but instead ignores the
error when it occurs. I've also created a more reliable reproducer in
the TestDMLInjectionTest, which we can use to validate the eventual fix
(#133129).

Epic: none
Closes #130282
Release note: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
6 participants