Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sentry: panic during window function planning due to OOB when adding render expressions #35179

Closed
cockroach-teamcity opened this issue Feb 25, 2019 · 2 comments · Fixed by #36681
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.

Comments

@cockroach-teamcity
Copy link
Member

This issue was autofiled by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.

Sentry link: https://sentry.io/organizations/cockroach-labs/issues/907105840/?project=164528&referrer=webhooks_plugin

Panic message:

conn_executor.go:645: panic while executing 1 statements: SELECT _, _, , _ - lag(, _) OVER (PARTITION BY _ ORDER BY _) AS , _ - lag(, _) OVER (PARTITION BY _ ORDER BY _) AS , _ - lag(, _) OVER (PARTITION BY _ ORDER BY _) AS , _ - lag(, _) OVER (PARTITION BY _ ORDER BY _) AS , _ - lag(, _) OVER (PARTITION BY _ ORDER BY _) AS _ FROM (SELECT rank() OVER (PARTITION BY ., .::DATE ORDER BY . DESC) AS _, . AS _, . AS _, . AS _, . AS _, . AS _, . AS _, . AS , . AS , * FROM .. AS _ JOIN .. AS _ ON . = . WHERE . IN ( [...]: caused by

Stacktrace (expand for inline code snippets):

r := recover()
h.ex.closeWrapper(ctx, r)
}()
in pkg/sql.(*Server).ServeConn.func1
/usr/local/go/src/runtime/asm_amd64.s#L572-L574 in runtime.call32
/usr/local/go/src/runtime/panic.go#L501-L503 in runtime.gopanic
/usr/local/go/src/runtime/panic.go#L27-L29 in runtime.panicindex
post.RenderExprs = append(post.RenderExprs, newExpr)
outTypes = append(outTypes, p.ResultTypes[c.ColIdx])
}
in pkg/sql/distsqlplan.(*PhysicalPlan).AddRendering
}
if err := s.plan.AddRendering(renderExprs, s.evalCtx, s.plan.PlanToStreamColMap, renderTypes); err != nil {
return false, err
in pkg/sql.(*windowPlanState).addRenderingIfNecessary
// After all window functions are computed, we might need to add rendering.
if renderingAdded, err := windowPlanState.addRenderingIfNecessary(); err != nil {
return PhysicalPlan{}, err
in pkg/sql.(*DistSQLPlanner).createPlanForWindow
case *windowNode:
plan, err = dsp.createPlanForWindow(planCtx, n)
in pkg/sql.(*DistSQLPlanner).createPlanForNode
case *sortNode:
plan, err = dsp.createPlanForNode(planCtx, n.plan)
if err != nil {
in pkg/sql.(*DistSQLPlanner).createPlanForNode
case *limitNode:
plan, err = dsp.createPlanForNode(planCtx, n.plan)
if err != nil {
in pkg/sql.(*DistSQLPlanner).createPlanForNode
physPlan, err := dsp.createPlanForNode(planCtx, plan)
if err != nil {
in pkg/sql.(*DistSQLPlanner).PlanAndRun
// the planner whether or not to plan remote table readers.
ex.server.cfg.DistSQLPlanner.PlanAndRun(
ctx, evalCtx, &planCtx, planner.txn, planner.curPlan.plan, recv)
in pkg/sql.(*connExecutor).execWithDistSQLEngine
ex.sessionTracing.TraceExecStart(ctx, "distributed")
err = ex.execWithDistSQLEngine(ctx, planner, stmt.AST.StatementType(), res, distributePlan)
} else {
in pkg/sql.(*connExecutor).dispatchToExecutionEngine
p.autoCommit = os.ImplicitTxn.Get() && !ex.server.cfg.TestingKnobs.DisableAutoCommit
if err := ex.dispatchToExecutionEngine(ctx, stmt, p, res); err != nil {
return nil, nil, err
in pkg/sql.(*connExecutor).execStmtInOpenState
case stateOpen:
ev, payload, err = ex.execStmtInOpenState(ctx, stmt, pinfo, res)
switch ev.(type) {
in pkg/sql.(*connExecutor).execStmt
ctx := withStatement(ex.Ctx(), ex.curStmt)
ev, payload, err = ex.execStmt(ctx, curStmt, stmtRes, pinfo, pos)
if err != nil {
in pkg/sql.(*connExecutor).run
}()
return h.ex.run(ctx, s.pool, reserved, cancel)
}
in pkg/sql.(*Server).ServeConn
go func() {
writerErr = sqlServer.ServeConn(ctx, connHandler, reserved, cancelConn)
// TODO(andrei): Should we sometimes transmit the writerErr's to the
in pkg/sql/pgwire.(*conn).serveImpl.func4

pkg/sql/conn_executor.go in pkg/sql.(*Server).ServeConn.func1 at line 387
/usr/local/go/src/runtime/asm_amd64.s in runtime.call32 at line 573
/usr/local/go/src/runtime/panic.go in runtime.gopanic at line 502
/usr/local/go/src/runtime/panic.go in runtime.panicindex at line 28
pkg/sql/distsqlplan/physical_plan.go in pkg/sql/distsqlplan.(*PhysicalPlan).AddRendering at line 517
pkg/sql/distsql_plan_window.go in pkg/sql.(*windowPlanState).addRenderingIfNecessary at line 330
pkg/sql/distsql_physical_planner.go in pkg/sql.(*DistSQLPlanner).createPlanForWindow at line 3225
pkg/sql/distsql_physical_planner.go in pkg/sql.(*DistSQLPlanner).createPlanForNode at line 2358
pkg/sql/distsql_physical_planner.go in pkg/sql.(*DistSQLPlanner).createPlanForNode at line 2299
pkg/sql/distsql_physical_planner.go in pkg/sql.(*DistSQLPlanner).createPlanForNode at line 2317
pkg/sql/distsql_running.go in pkg/sql.(*DistSQLPlanner).PlanAndRun at line 752
pkg/sql/conn_executor_exec.go in pkg/sql.(*connExecutor).execWithDistSQLEngine at line 982
pkg/sql/conn_executor_exec.go in pkg/sql.(*connExecutor).dispatchToExecutionEngine at line 824
pkg/sql/conn_executor_exec.go in pkg/sql.(*connExecutor).execStmtInOpenState at line 402
pkg/sql/conn_executor_exec.go in pkg/sql.(*connExecutor).execStmt at line 96
pkg/sql/conn_executor.go in pkg/sql.(*connExecutor).run at line 1175
pkg/sql/conn_executor.go in pkg/sql.(*Server).ServeConn at line 389
pkg/sql/pgwire/conn.go in pkg/sql/pgwire.(*conn).serveImpl.func4 at line 313
Tag Value
Cockroach Release v2.1.5
Cockroach SHA: 1634c6b
Platform linux amd64
Distribution CCL
Environment v2.1.5
Command server
Go Version go1.10.7
# of CPUs 2
# of Goroutines 325
@cockroach-teamcity cockroach-teamcity added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. labels Feb 25, 2019
@jordanlewis
Copy link
Member

@yuzefovich could you take a look at this and see if you can make heads or tails of it?

@yuzefovich
Copy link
Member

It is not related to the bug I fixed earlier (#33340), and unfortunately I cannot replicate it locally by writing a similar query to the one that caused the panic. My guess is that the problem is somewhere around here https://github.com/cockroachdb/cockroach/blob/release-2.1/pkg/sql/distsql_physical_planner.go#L3104 where we're adjusting indices of the columns to put the results of window functions computations in due to lag function having two arguments and outputting a single one, but I don't immediately see a problem in the code. It'd be great if a randomized testing (like sqlsmith) supported window functions, probably it would be able to reproduce this bug.

@jordanlewis jordanlewis added the S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. label Feb 28, 2019
@asubiotto asubiotto changed the title sentry: conn_executor.go:645: panic while executing 1 statements: SELECT _, _, _, _ - lag(_, _) OVER (PARTITION BY _ ORDER BY _) AS _, _ - lag(_, _) OVER (PARTITION BY _ ORDER BY _) AS _, _ - lag(_, _) OVER (PARTITION BY _ ORDER BY _) AS _, _ - lag(_, _) OVER (PARTITION BY _ ORDER BY _) AS _, _ - lag(_, _) OVER (PARTITION BY _ ORDER BY _) AS _ FROM (SELECT rank() OVER (PARTITION BY _._, _._::DATE ORDER BY _._ DESC) AS _, _._ AS _, _._ AS _, _._ AS _, _._ AS _, _._ AS _, _._ AS _, _._ AS _, _._ AS _, * FROM _._._ AS _ JOIN _._._ AS _ ON _._ = _._ WHERE _._ IN ( [...]: caused by <redacted> sentry: panic during window function planning due to OOB when adding render expressions Mar 4, 2019
craig bot pushed a commit that referenced this issue Apr 19, 2019
36598: sql: Unify the SQL type system r=andy-kimball a=andy-kimball

This PR unifies the three different SQL type systems in CRDB:

  `sqlbase.ColumnType` - protobuf representation of table column types
  `coltypes.T` - in-memory representation of table column types
  `types.T` - computation types

The type systems are quite similar, but also subtly different. This causes bugs on a regular basis throughout the codebase. Each type system supports a different subset of the PG type functionality. For example, `sqlbase.ColumnType` stores column widths. `types.T` does not store widths, but it can store a Postgres OID value, which `sqlbase.ColumnType` is unable to do.

In addition, even for supported functionality, the API's are often tricky and error-prone. For example, `types.T` values can sometimes be compared using `==`, but other times `Equivalent` must be called, and it is up to the user to know when to use which. As another example, support for OID values require wrapping `types.T` instances in a special `TOIDWrapper` type. Our developers are expected to know when it needs to be done, but typically they either don't understand when it's needed, or else forget to do it. It also breaks `==` comparison semantics, leading to subtle bugs that only manifest in edge cases with unusual OID values.

Another big problem that comes up repeatedly are lossy conversions. Throughout the codebase, there are frequent conversions to/from the various type systems. Many conversions discard important type information, which often leads to incompatibilities with Postgres, as well as providing a breeding ground for bugs.

This PR unifies the three type systems into just one. The unified type is called `types.T` (in new `sql/types` package). It uses a protobuf-generated struct that is backwards-compatible with `sqlbase.ColumnType` so that it can be embedded in place of `ColumnType` in larger protobuf messages (like `sqlbase.ColumnDescriptor`). The API is locked down so that it's difficult to ignorantly or accidentally create invalid types. The unified type supports everything that the three previous type systems did, and is more compatible with PG as a result (with more work, even more type compatibility is possible).

Resolves #32639

36681: distsqlplan: reset MergeOrdering if windower is planned r=yuzefovich a=yuzefovich

Windower doesn't guarantee maintaining the order, so merge ordering
of the plan should be set to empty.

Fixes: #35179.

Release note: None

36962: storage/engine: fix handling of 0-length varstrings in RocksDBBatchReader r=tbg a=petermattis

Previously, 0-length varstrings inadvertently cuased the reader to
truncate the batch repr which would usually result in a call to `Next`
complaining about the batch containing an unexpected number of
entries. So far, it looks like the only effect of this would be "invalid
batch" errors while using `cockroach debug` commands. It is possible
there is a more serious effect, though.

See #36937

Release note (bug fix): Fixed a bug in write batch decoding that could
cause "invalid batch" errors while using `cockroach debug` commands to
analyze data.

Co-authored-by: Andrew Kimball <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Peter Mattis <[email protected]>
@craig craig bot closed this as completed in #36681 Apr 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants