Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/tests: TestRandomSyntaxGeneration failed [ALTER TENANT (SELECT 'foo') START REPLICATION ... immediately crashes the database process] #135784

Closed
cockroach-teamcity opened this issue Nov 20, 2024 · 6 comments
Assignees
Labels
A-disaster-recovery branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. P-0 Issues/test failures with a fix SLA of 2 weeks release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-disaster-recovery

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Nov 20, 2024

sql/tests.TestRandomSyntaxGeneration failed with artifacts on release-24.2.5-rc @ beddec16a5a438d66bd3f5f4202085cf21c0973c:

*   | 	github.com/cockroachdb/cockroach/pkg/sql/sem/tree/type_check.go:355
*   | github.com/cockroachdb/cockroach/pkg/sql.(*planner).analyzeExpr
*   | 	github.com/cockroachdb/cockroach/pkg/sql/analyze_expr.go:34
*   | github.com/cockroachdb/cockroach/pkg/sql.(*planner).planTenantSpec
*   | 	github.com/cockroachdb/cockroach/pkg/sql/tenant_spec.go:63
*   | github.com/cockroachdb/cockroach/pkg/sql.(*planner).LookupTenantInfo
*   | 	github.com/cockroachdb/cockroach/pkg/sql/tenant_spec.go:148
*   | github.com/cockroachdb/cockroach/pkg/ccl/crosscluster/physical.alterReplicationJobHook.func1
*   | 	github.com/cockroachdb/cockroach/pkg/ccl/crosscluster/physical/alter_replication_job.go:205
*   | github.com/cockroachdb/cockroach/pkg/sql.(*hookFnNode).startExec.func1
*   | 	github.com/cockroachdb/cockroach/pkg/sql/planhook.go:194
*   | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
*   | 	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:480
*   | runtime.goexit
*   | 	src/runtime/asm_amd64.s:1695
* Wraps: (4) ReturnType called on TypedExpr with empty typeAnnotation. Was the underlying Expr type-checked before asserting a type of TypedExpr?
* Error types: (1) *withstack.withStack (2) *assert.withAssertionFailure (3) *withstack.withStack (4) *errutil.leafError
*
panic: ReturnType called on TypedExpr with empty typeAnnotation. Was the underlying Expr type-checked before asserting a type of TypedExpr? [recovered]
	panic: ReturnType called on TypedExpr with empty typeAnnotation. Was the underlying Expr type-checked before asserting a type of TypedExpr?

goroutine 420896 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).recover(0x454825?, {0x85580b0, 0xc00dbaa570})
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:226 +0x65
panic({0x64ffe00?, 0xc018b907b0?})
	GOROOT/src/runtime/panic.go:770 +0x132
github.com/cockroachdb/cockroach/pkg/sql/sem/tree.typeAnnotation.assertTyped(...)
	github.com/cockroachdb/cockroach/pkg/sql/sem/tree/expr.go:148
github.com/cockroachdb/cockroach/pkg/sql/sem/tree.(*Subquery).TypeCheck(0x8558078?, {0xc008b7fea0?, 0x7f610785bae0?}, 0xc006822908?, 0x0?)
	github.com/cockroachdb/cockroach/pkg/sql/sem/tree/type_check.go:1742 +0xd3
github.com/cockroachdb/cockroach/pkg/sql/sem/tree.(*ArrayFlatten).TypeCheck(0xc01f909e60, {0x85580b0?, 0xc00dbaa570?}, 0x6c8b272?, 0x850000c006822908?)
	github.com/cockroachdb/cockroach/pkg/sql/sem/tree/type_check.go:1922 +0x46
github.com/cockroachdb/cockroach/pkg/sql/sem/tree.TypeCheck({0x85580b0?, 0xc00dbaa570?}, {0x8558f58?, 0xc01f909e60?}, 0x6bb7284?, 0xc0015353e0?)
	github.com/cockroachdb/cockroach/pkg/sql/sem/tree/type_check.go:345 +0x8a
github.com/cockroachdb/cockroach/pkg/sql/sem/tree.TypeCheckAndRequire({0x85580b0?, 0xc00dbaa570?}, {0x8558f58?, 0xc01f909e60?}, 0x0?, 0xc365320, {0x6c79266, 0x21})
	github.com/cockroachdb/cockroach/pkg/sql/sem/tree/type_check.go:355 +0x54
github.com/cockroachdb/cockroach/pkg/sql.(*planner).analyzeExpr(0xc0148d86b8, {0x85580b0, 0xc00dbaa570}, {0x8558f58?, 0xc01f909e60?}, {{0x0, 0x0, 0x0}, {0x0, 0x0}}, ...)
	github.com/cockroachdb/cockroach/pkg/sql/analyze_expr.go:34 +0xaf
github.com/cockroachdb/cockroach/pkg/sql.(*planner).planTenantSpec(0xc00dbaa570?, {0x85580b0?, 0xc00dbaa570?}, 0xc00000001e?, {0x6c79266?, 0xc01539d680?})
	github.com/cockroachdb/cockroach/pkg/sql/tenant_spec.go:63 +0x190
github.com/cockroachdb/cockroach/pkg/sql.(*planner).LookupTenantInfo(0xc0148d86b8, {0x85580b0, 0xc00dbaa570}, 0x8536580?, {0x6c79266?, 0xc006822908?})
	github.com/cockroachdb/cockroach/pkg/sql/tenant_spec.go:148 +0x27
github.com/cockroachdb/cockroach/pkg/ccl/crosscluster/physical.alterReplicationJobHook.func1({0x85580b0, 0xc00dbaa570}, {0x0?, 0xc006199f00?, 0xd54e14?}, 0xc00e4f3f80)
	github.com/cockroachdb/cockroach/pkg/ccl/crosscluster/physical/alter_replication_job.go:205 +0x150
github.com/cockroachdb/cockroach/pkg/sql.(*hookFnNode).startExec.func1({0x85580b0, 0xc00dbaa570})
	github.com/cockroachdb/cockroach/pkg/sql/planhook.go:194 +0x4a
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:480 +0x13a
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx in goroutine 420884
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:471 +0x3fe
Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-44704

@cockroach-teamcity cockroach-teamcity added branch-release-24.2.5-rc C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Nov 20, 2024
@spilchen
Copy link
Contributor

We are hitting this assert:

func (ta typeAnnotation) assertTyped() {
if ta.typ == nil {
panic(errors.AssertionFailedf(
"ReturnType called on TypedExpr with empty typeAnnotation. " +
"Was the underlying Expr type-checked before asserting a type of TypedExpr?"))
}
}

I can't seem to locate the SQL statement that was being parsed, which would be helpful for debugging.

@spilchen spilchen changed the title sql/tests: TestRandomSyntaxGeneration failed sql/tests: TestRandomSyntaxGeneration failed [ReturnType called on TypedExpr with empty typeAnnotation] Nov 26, 2024
@spilchen spilchen changed the title sql/tests: TestRandomSyntaxGeneration failed [ReturnType called on TypedExpr with empty typeAnnotation] sql/tests: TestRandomSyntaxGeneration failed [Alter PCR Stream] Nov 26, 2024
@spilchen spilchen added T-disaster-recovery and removed T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Nov 26, 2024
Copy link

blathers-crl bot commented Nov 26, 2024

cc @cockroachdb/disaster-recovery

@spilchen spilchen added A-disaster-recovery branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 labels Nov 26, 2024
@exalate-issue-sync exalate-issue-sync bot added the P-0 Issues/test failures with a fix SLA of 2 weeks label Dec 2, 2024
@rafiss rafiss changed the title sql/tests: TestRandomSyntaxGeneration failed [Alter PCR Stream] sql/tests: TestRandomSyntaxGeneration failed [ALTER TENANT (SELECT 'foo') START REPLICATION ... immediately crashes the database process] Dec 2, 2024
@rafiss
Copy link
Collaborator

rafiss commented Dec 2, 2024

Repro on a fresh cluster with no other setup:

root@localhost:26257/defaultdb> ALTER TENANT (SELECT 'foo') START REPLICATION OF 'bar' ON 'baz' WITH RETENTION = '1s';
ERROR: unexpected EOF
warning: error retrieving the transaction status: connection closed unexpectedly: conn closed
warning: connection lost!
opening new connection: all session settings will be lost
warning: error retrieving the database name: failed to connect to `host=localhost user=root database=`: dial error (dial tcp [::1]:26257: connect: connection refused)

Failures on other branches: #136274 and #136339

@msbutler
Copy link
Collaborator

msbutler commented Dec 2, 2024

@rafiss thanks for digging up a precise repro! I'm noticing something strange with how we parse invalid statements. In this experimental branch, I play with the test TestAlterTenantStartReplicationAfterRestore and notice that if i only add a subquery to the ALTER TENANT stmt, i.e.:

db.Exec(t, "ALTER TENANT (select 't1') START REPLICATION OF t1 ON $1", u.String())

the client receives the expected pq: subqueries are not allowed in ALTER VIRTUAL CLUSTER REPLICATION.

However, if the user passes a bogus pgurl ('foo'), we do not conduct the subquery check, and panic:

db.Exec(t, "ALTER TENANT (select 't1') START REPLICATION OF t1 ON 'foo'")

Has this TestRandomSyntaxGeneration test hit similar errors before? I don't understand why this subqueries check is only hit on the happy pgurl path.

@msbutler
Copy link
Collaborator

msbutler commented Dec 2, 2024

Oh wait, actually, if foo is set via placeholder, we do see the correct subquery check

db.Exec(t, "ALTER TENANT (select 't1') START REPLICATION OF t1 ON $1", "foo")

returns pq: subqueries are not allowed in ALTER VIRTUAL CLUSTER REPLICATION

while this panics:

db.Exec(t, "ALTER TENANT (select 't1') START REPLICATION OF t1 ON 'foo'")

So, now i'm wondering why place holder evaluation triggers the right subquery check.

@msbutler
Copy link
Collaborator

msbutler commented Dec 2, 2024

So we catch the subquery error for placeholders in alterReplicationJobTypeCheck, which for some reason, only runs when the stmt has placeholders. I think we should be running the type checker regardless of there being place holders.
https://github.com/msbutler/cockroach/blob/butler-debug-alter-parse/pkg/crosscluster/physical/alter_replication_job.go#L128

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. P-0 Issues/test failures with a fix SLA of 2 weeks release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-disaster-recovery
Projects
None yet
Development

No branches or pull requests

5 participants