Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: CTE causes panic that crashes nodes #87733

Closed
mgartner opened this issue Sep 9, 2022 · 1 comment · Fixed by #88396
Closed

sql: CTE causes panic that crashes nodes #87733

mgartner opened this issue Sep 9, 2022 · 1 comment · Fixed by #88396
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

mgartner commented Sep 9, 2022

A user has reported that a statement with multiple CTEs can cause panics that crash a node. The crash is reproducible on v21.2, v22.1, and a recent build off of master. We need to determine:

  1. Why does the error crash the node?
  2. What is causing the error?

To reproduce:

CREATE TABLE a (a INT);
CREATE TABLE b (b INT);
INSERT INTO a VALUES (1);

WITH
t1 AS (SELECT a FROM a),
t2 AS MATERIALIZED (SELECT a, b FROM t1 JOIN b ON true)
SELECT NULL
FROM t1
LEFT JOIN LATERAL (
  WITH t3 AS (SELECT * FROM t2 WHERE t2.a = t1.a)
  SELECT array_agg(CASE WHEN v = '' THEN b END)
  FROM (
    SELECT '' AS v, b FROM t3 ORDER BY b DESC
  )
) ON true;

Error message:

*
* ERROR: a SQL panic has occurred while executing the following statement:
* WITH t1 AS (SELECT a FROM a), t2 AS MATERIALIZED (SELECT a, b FROM t1 JOIN b ON true) SELECT NULL FROM t1 LEFT JOIN LATERAL (WITH t3 AS (SELECT * FROM t2 WHERE t2.a = t1.a) SELECT array_agg(CASE WHEN v = '' T
HEN b END) FROM (SELECT '' AS v, b FROM t3 ORDER BY b DESC)) ON true
*
*
* ERROR: a panic has occurred!
* no binding for WithID 1
* (1) attached stack trace
*   -- stack trace:
*   | github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1
*   |   github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:761
*   | runtime.gopanic
*   |   GOROOT/src/runtime/panic.go:1038
*   | github.com/cockroachdb/cockroach/pkg/sql/colexecerror.CatchVectorizedRuntimeError.func1
*   |   github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:58
*   | runtime.gopanic
*   |   GOROOT/src/runtime/panic.go:1038
*   | github.com/cockroachdb/cockroach/pkg/sql/colexecerror.CatchVectorizedRuntimeError.func1
*   |   github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:58
*   | runtime.gopanic
*   |   GOROOT/src/runtime/panic.go:1038
*   | github.com/cockroachdb/cockroach/pkg/sql/colexecerror.CatchVectorizedRuntimeError.func1
*   |   github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:58
*   | runtime.gopanic
*   |   GOROOT/src/runtime/panic.go:1038
*   | github.com/cockroachdb/cockroach/pkg/sql/colexecerror.CatchVectorizedRuntimeError.func1
*   |   github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:58
*   | runtime.gopanic
*   |   GOROOT/src/runtime/panic.go:1038
*   | github.com/cockroachdb/cockroach/pkg/sql/colexecerror.CatchVectorizedRuntimeError.func1
*   |   github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:58
*   | runtime.gopanic
*   |   GOROOT/src/runtime/panic.go:1038
*   | github.com/cockroachdb/cockroach/pkg/sql/colexecerror.CatchVectorizedRuntimeError.func1
*   |   github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:58
*   | runtime.gopanic
*   |   GOROOT/src/runtime/panic.go:1038
*   | github.com/cockroachdb/cockroach/pkg/sql/opt.(*Metadata).WithBinding
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/metadata.go:749
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStatWithScan
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:2501
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStat
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:470
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStatFromChild
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:267
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStatFromJoinLeft
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:1608
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStatJoin
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:1447
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStat
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:438
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStatWithScan
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:2506
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStat
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:470
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStatFromChild
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:267
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).colStatFromInput
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:322
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).ensureColStat
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:2625
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).updateDistinctCountsFromConstraint
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:3454
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).applyConstraintSet
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:3291
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).applyFiltersItem
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:3080
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).applyFilters
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:3004
*   | github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*statisticsBuilder).filterRelExpr
*   |   github.com/cockroachdb/cockroach/pkg/sql/opt/memo/statistics_builder.go:2957

Jira issue: CRDB-19505

@mgartner
Copy link
Collaborator Author

mgartner commented Sep 9, 2022

The query plan looks like this:

distribute
 ├── columns: "?column?":21
 ├── stats: [rows=1000]
 ├── cost: 1.6710128e+09
 ├── fd: ()-->(21)
 ├── distribution: test
 ├── input distribution:
 ├── prune: (21)
 └── with &1 (t1)
      ├── columns: "?column?":21
      ├── stats: [rows=1000]
      ├── cost: 1.6710128e+09
      ├── fd: ()-->(21)
      ├── prune: (21)
      ├── scan a
      │    ├── columns: a.a:1
      │    ├── stats: [rows=1000, distinct(1)=100, null(1)=10]
      │    └── cost: 1084.62
      └── with &2 (t2)
           ├── columns: "?column?":21
           ├── materialized
           ├── stats: [rows=1000]
           ├── cost: 1.67101172e+09
           ├── fd: ()-->(21)
           ├── prune: (21)
           ├── cte-uses
           │    └── &1: count=2 used-columns=(1)
           ├── inner-join (cross)
           │    ├── columns: a:5 b.b:6
           │    ├── stats: [rows=1000000, distinct(5)=100, null(5)=10000]
           │    ├── cost: 11114.8063
           │    ├── cte-uses
           │    │    └── &1: count=1 used-columns=(1)
           │    ├── with-scan &1 (t1)
           │    │    ├── columns: a:5
           │    │    ├── mapping:
           │    │    │    └──  a.a:1 => a:5
           │    │    ├── stats: [rows=1000, distinct(5)=100, null(5)=10]
           │    │    ├── cost: 0.02
           │    │    ├── prune: (5)
           │    │    └── cte-uses
           │    │         └── &1: count=1 used-columns=(1)
           │    ├── scan b
           │    │    ├── columns: b.b:6
           │    │    ├── stats: [rows=1000]
           │    │    ├── cost: 1084.62
           │    │    └── unfiltered-cols: (6-9)
           │    └── filters (true)
           └── project
                ├── columns: "?column?":21
                ├── stats: [rows=1000]
                ├── cost: 1.67100061e+09
                ├── fd: ()-->(21)
                ├── prune: (21)
                ├── cte-uses
                │    ├── &1: count=1 used-columns=(1)
                │    └── &2: count=1 used-columns=(5,6)
                ├── distinct-on
                │    ├── columns: rownum:18
                │    ├── grouping columns: rownum:18
                │    ├── internal-ordering: -(14|16)
                │    ├── stats: [rows=1000, distinct(18)=1000, null(18)=0]
                │    ├── cost: 1.67100059e+09
                │    ├── key: (18)
                │    ├── cte-uses
                │    │    ├── &1: count=1 used-columns=(1)
                │    │    └── &2: count=1 used-columns=(5,6)
                │    └── sort
                │         ├── columns: a:10 b:14 column16:16 rownum:18
                │         ├── stats: [rows=9.9e+08, distinct(18)=1000, null(18)=0]
                │         ├── cost: 1.65120057e+09
                │         ├── fd: (18)-->(10), (14)==(16), (16)==(14)
                │         ├── ordering: -(14|16) [actual: -14]
                │         ├── prune: (14,16)
                │         ├── cte-uses
                │         │    ├── &1: count=1 used-columns=(1)
                │         │    └── &2: count=1 used-columns=(5,6)
                │         └── left-join-apply
                │              ├── columns: a:10 b:14 column16:16 rownum:18
                │              ├── stats: [rows=9.9e+08, distinct(18)=1000, null(18)=0]
                │              ├── cost: 10120079.9
                │              ├── fd: (18)-->(10), (14)==(16), (16)==(14)
                │              ├── prune: (14,16)
                │              ├── cte-uses
                │              │    ├── &1: count=1 used-columns=(1)
                │              │    └── &2: count=1 used-columns=(5,6)
                │              ├── ordinality
                │              │    ├── columns: a:10 rownum:18
                │              │    ├── stats: [rows=1000, distinct(18)=1000, null(18)=0]
                │              │    ├── cost: 10.04
                │              │    ├── key: (18)
                │              │    ├── fd: (18)-->(10)
                │              │    ├── prune: (10)
                │              │    ├── cte-uses
                │              │    │    └── &1: count=1 used-columns=(1)
                │              │    └── with-scan &1 (t1)
                │              │         ├── columns: a:10
                │              │         ├── mapping:
                │              │         │    └──  a.a:1 => a:10
                │              │         ├── stats: [rows=1000]
                │              │         ├── cost: 0.02
                │              │         ├── prune: (10)
                │              │         └── cte-uses
                │              │              └── &1: count=1 used-columns=(1)
                │              ├── project
                │              │    ├── columns: column16:16 b:14
                │              │    ├── outer: (10)
                │              │    ├── stats: [rows=990000]
                │              │    ├── cost: 49600.09
                │              │    ├── fd: (14)==(16), (16)==(14)
                │              │    ├── prune: (14,16)
                │              │    ├── cte-uses
                │              │    │    └── &2: count=1 used-columns=(5,6)
                │              │    ├── project
                │              │    │    ├── columns: b:14
                │              │    │    ├── outer: (10)
                │              │    │    ├── stats: [rows=990000]
                │              │    │    ├── cost: 29800.07
                │              │    │    ├── prune: (14)
                │              │    │    ├── cte-uses
                │              │    │    │    └── &2: count=1 used-columns=(5,6)
                │              │    │    ├── select
                │              │    │    │    ├── columns: a:11 b:12
                │              │    │    │    ├── outer: (10)
                │              │    │    │    ├── stats: [rows=990000, distinct(10)=1, null(10)=0, distinct(11)=1, null(11)=0]
                │              │    │    │    ├── cost: 10000.05
                │              │    │    │    ├── fd: ()-->(11)
                │              │    │    │    ├── cte-uses
                │              │    │    │    │    └── &2: count=1 used-columns=(5,6)
                │              │    │    │    ├── with-scan &2 (t2)
                │              │    │    │    │    ├── columns: a:11 b:12
                │              │    │    │    │    ├── mapping:
                │              │    │    │    │    │    ├──  a:5 => a:11
                │              │    │    │    │    │    └──  b.b:6 => b:12
                │              │    │    │    │    ├── stats: [rows=1000000, distinct(11)=100, null(11)=10000]
                │              │    │    │    │    ├── cost: 0.02
                │              │    │    │    │    ├── prune: (11,12)
                │              │    │    │    │    └── cte-uses
                │              │    │    │    │         └── &2: count=1 used-columns=(5,6)
                │              │    │    │    └── filters
                │              │    │    │         └── a:11 = a:10 [outer=(10,11), constraints=(/10: (/NULL - ]; /11: (/NULL - ]), fd=(10)==(11), (11)==(10)]
                │              │    │    └── projections
                │              │    │         └── b:12 [as=b:14, outer=(12)]
                │              │    └── projections
                │              │         └── b:14 [as=column16:16, outer=(14)]
                │              └── filters (true)
                └── projections
                     └── NULL [as="?column?":21]

The problem originates when planning the right side of the apply-join:

├── select
│    ├── columns: a:11 b:12
│    ├── outer: (10)
│    ├── stats: [rows=990000, distinct(10)=1, null(10)=0, distinct(11)=1, null(11)=0]
│    ├── cost: 10000.05
│    ├── fd: ()-->(11)
│    ├── cte-uses
│    │    └── &2: count=1 used-columns=(5,6)
│    ├── with-scan &2 (t2)
│    │    ├── columns: a:11 b:12
│    │    ├── mapping:
│    │    │    ├──  a:5 => a:11
│    │    │    └──  b.b:6 => b:12
│    │    ├── stats: [rows=1000000, distinct(11)=100, null(11)=10000]
│    │    ├── cost: 0.02
│    │    ├── prune: (11,12)
│    │    └── cte-uses
│    │         └── &2: count=1 used-columns=(5,6)
│    └── filters
│         └── a:11 = a:10 [outer=(10,11), constraints=(/10: (/NULL - ]; /11: (/NULL - ]), fd=(10)==(11), (11)==(10)]

When this expression is re-planned, the statistics builder tries to determine the column statistics for a:11, because it is part of the filter expression. a:11 is mapped from a:5 so the statistics builder follows the trail and tries to lookup the stats for a:5. a:5 originates from another CTE, so the stats builder tries to lookup that CTE, but panics because it has not been added to the metadata. It's not added to the metadata because it's not referenced in the right side of the apply-join—we only add WITH expressions to the metadata during an apply join if the there is a WITH scan referencing the WITH expression:

case *memo.WithScanExpr:
// Allow referring to "outer" With expressions. The bound expressions
// are not part of this Memo but they are used only for their relational
// properties, which should be valid.
for i := range withExprs {
if withExprs[i].id == t.With {
memoExpr := b.mem.Metadata().WithBinding(t.With)
f.Metadata().AddWithBinding(t.With, memoExpr)

I think the solution is to add all WITH expressions to the metadata if there is any WITH scan in the right side of the join.

Another issue is that the apply-join code can crash a node because some of the function calls outside of o.Optimize() can panic, but there is no panic catcher. I think copying the panic-catching logic into the planRightSideFn should fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant