Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: panic in query sampling for COPY #88229

Closed
cockroach-teamcity opened this issue Sep 20, 2022 · 4 comments · Fixed by #88322
Closed

sql: panic in query sampling for COPY #88229

cockroach-teamcity opened this issue Sep 20, 2022 · 4 comments · Fixed by #88322
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report.

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Sep 20, 2022

This issue was autofiled by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.

Sentry link: https://sentry.io/organizations/cockroach-labs/issues/3605902199/?referrer=webhooks_plugin

Panic message:

conn_executor.go:761: runtime error: invalid memory address or nil pointer dereference
(1) attached stack trace
-- stack trace:
| github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1
| github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:761
| [...repeated from below...]
Wraps: (2) attached stack trace
-- stack trace:
| github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1
| github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:761
| runtime.gopanic
| GOROOT/src/runtime/panic.go:1038
| runtime.panicmem
| GOROOT/src/runtime/panic.go:221
| runtime.sigpanic
| GOROOT/src/runtime/signal_unix.go:735
| github.com/cockroachdb/cockroach/pkg/sql.(*planner).maybeLogStatementInternal
| github.com/cockroachdb/cockroach/pkg/sql/exec_log.go:401
| github.com/cockroachdb/cockroach/pkg/sql.(*planner).maybeLogStatement
| github.com/cockroachdb/cockroach/pkg/sql/exec_log.go:165
| github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCopyIn.func5
| github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2433
| github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCopyIn
| github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2479
| github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd
| github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2038
| github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run
| github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1817
| github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn
| github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:763
| github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).processCommandsAsync.func1
| github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:728
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (3) runtime error: invalid memory address or nil pointer dereference
Error types: (1) *withstack.withStack (2) *withstack.withStack (3) runtime.errorString
-- report composition:
runtime.errorString
conn_executor.go:761: *withstack.withStack (top exception)
conn_executor.go:761: *withstack.withStack (1)
(check the extra data payloads)

Stacktrace (expand for inline code snippets):

r := recover()
h.ex.closeWrapper(ctx, r)
}()
in pkg/sql.(*Server).ServeConn.func1
GOROOT/src/runtime/panic.go#L1037-L1039 in runtime.gopanic
GOROOT/src/runtime/panic.go#L220-L222 in runtime.panicmem
GOROOT/src/runtime/signal_unix.go#L734-L736 in runtime.sigpanic
CostEstimate: p.curPlan.instrumentation.costEstimate,
Distribution: p.curPlan.instrumentation.distribution.String(),
PlanGist: p.curPlan.instrumentation.planGist.String(),
in pkg/sql.(*planner).maybeLogStatementInternal
) {
p.maybeLogStatementInternal(ctx, execType, isCopy, numRetries, txnCounter, rows, err, queryReceived, hasAdminRoleCache, telemetryLoggingMetrics, stmtFingerprintID, queryStats)
}
in pkg/sql.(*planner).maybeLogStatement
var stats topLevelQueryStats
ex.planner.maybeLogStatement(
ctx,
in pkg/sql.(*connExecutor).execCopyIn.func5
}
return nil, nil, nil
}
in pkg/sql.(*connExecutor).execCopyIn
var err error
ev, payload, err = ex.execCopyIn(ctx, tcmd)
if err != nil {
in pkg/sql.(*connExecutor).execCmd
var err error
if err = ex.execCmd(); err != nil {
if errors.IsAny(err, io.EOF, errDrainingComplete) {
in pkg/sql.(*connExecutor).run
}()
return h.ex.run(ctx, s.pool, reserved, cancel)
}
in pkg/sql.(*Server).ServeConn
reservedOwned = false // We're about to pass ownership away.
retErr = sqlServer.ServeConn(ctx, connHandler, reserved, cancelConn)
}()
in pkg/sql/pgwire.(*conn).processCommandsAsync.func1
GOROOT/src/runtime/asm_amd64.s#L1580-L1582 in runtime.goexit

pkg/sql/conn_executor.go in pkg/sql.(*Server).ServeConn.func1 at line 761
GOROOT/src/runtime/panic.go in runtime.gopanic at line 1038
GOROOT/src/runtime/panic.go in runtime.panicmem at line 221
GOROOT/src/runtime/signal_unix.go in runtime.sigpanic at line 735
pkg/sql/exec_log.go in pkg/sql.(*planner).maybeLogStatementInternal at line 401
pkg/sql/exec_log.go in pkg/sql.(*planner).maybeLogStatement at line 165
pkg/sql/conn_executor.go in pkg/sql.(*connExecutor).execCopyIn.func5 at line 2433
pkg/sql/conn_executor.go in pkg/sql.(*connExecutor).execCopyIn at line 2479
pkg/sql/conn_executor.go in pkg/sql.(*connExecutor).execCmd at line 2038
pkg/sql/conn_executor.go in pkg/sql.(*connExecutor).run at line 1817
pkg/sql/conn_executor.go in pkg/sql.(*Server).ServeConn at line 763
pkg/sql/pgwire/conn.go in pkg/sql/pgwire.(*conn).processCommandsAsync.func1 at line 728
GOROOT/src/runtime/asm_amd64.s in runtime.goexit at line 1581
Tag Value
Cockroach Release v22.1.7
Cockroach SHA: a346e7a
Platform linux amd64
Distribution CCL
Environment v22.1.7
Command server
Go Version ``
# of CPUs
# of Goroutines

Jira issue: CRDB-19738

gz#14032

@cockroach-teamcity cockroach-teamcity added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. labels Sep 20, 2022
@otan otan self-assigned this Sep 21, 2022
@otan
Copy link
Contributor

otan commented Sep 21, 2022

This is interesting - my guess is because instrumentation is nil, but i'm surprised the panic is at line 401 rather than 400:

CostEstimate: p.curPlan.instrumentation.costEstimate,
Distribution: p.curPlan.instrumentation.distribution.String(),
PlanGist: p.curPlan.instrumentation.planGist.String(),

This has a nasty implication - if someone uses COPY in production, and telemetry is turned on (this is true specifically for the CC case), a panic on COPY can happen if it is the first SQL interaction with the server. It was introduced in #87091.

This is weird because I thought I had tested this here --

{`SET CLUSTER SETTING sql.telemetry.query_sampling.enabled = true`},
-- but apparently that wasn't enough - the code here only gets hit some of the time.

In the interim, this fix is to run SET CLUSTER SETTING sql.telemetry.query_sampling.enabled = false on affected clusters.

Update 9/21: On further investigation, this can also affect clusters with slow query log or audit log enabled.

We could also consider reverting the telemetry and logging PRs if we consider them too risky. I thought it wasn't but after this I admit defeat against the logging machinery. @rafiss do you have any opinions?

Further thoughts:

  • we should also put a workload using COPY as part of the qualification run on releases.
  • maybeLogStatements could probably do with a recover() as it should never bring a cluster down.

@otan
Copy link
Contributor

otan commented Sep 21, 2022

#88322 will fix this

#88325 and #88326 prepared as backports for v22.1 and v21.2 (will have merge conflicts)

@otan otan changed the title sentry: conn_executor.go:761: runtime error: invalid memory address or nil pointer dereference (1) attached stack trace -- stack trace: | github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1... sql: panic in query sampling for COPY Sep 21, 2022
@rafiss
Copy link
Collaborator

rafiss commented Sep 21, 2022

Given that we've already backported it I think we are somewhat committed to keeping it and fixing it as long as we are able. I'd only want to revert if we encounter an "unsolvable" bug.

@otan
Copy link
Contributor

otan commented Sep 22, 2022

reopening until backports merged

@otan otan reopened this Sep 22, 2022
@rafiss rafiss closed this as completed Sep 23, 2022
otan added a commit to cockroachdb/docs that referenced this issue Oct 3, 2022
After cockroachdb/cockroach#88229, we want to
advertise a higher minimum version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants