Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: incorrect aggregation planning in the vectorized engine #87619

Closed
cockroach-teamcity opened this issue Sep 8, 2022 · 13 comments · Fixed by #87662
Closed

roachtest: incorrect aggregation planning in the vectorized engine #87619

cockroach-teamcity opened this issue Sep 8, 2022 · 13 comments · Fixed by #87662
Assignees
Labels
branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-sql-queries SQL Queries Team
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Sep 8, 2022

roachtest.unoptimized-query-oracle/disable-rules=half failed with artifacts on release-22.2 @ f8e04f32b84a9727c9e813d0096bde2de64d5675:

		  | 	GOROOT/src/runtime/asm_amd64.s:1594
		Wraps: (4) expected unoptimized and optimized results to be equal
		  |   []string(
		  | - 	{
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0,0,0,0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.32097318"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0,0,0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.3209731823"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0,0,0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.3209731823"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0,0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.320973182306"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0,0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.320973182306"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0,0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.320973182306"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.32097318230616"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.32097318230616"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.32097318230616"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.3209731823061643"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.3209731823061643"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.3209731823061643"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.3209731823061643"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.3209731823061643"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.3209731823061643"...,
		  | - 		"1662649196136744373.0000000000,false,-4601-01-09 00:00:00 +0000 +0000,{0,0},\x15ch,c{\bF,BOX(-1.0956347987282582 -1.3209731823061643"...,
		  | - 		...,
		  | - 	},
		  | + 	nil,
		  |   )
		  | sql: SELECT
		  | 	tab_3577.crdb_internal_mvcc_timestamp AS col_11602,
		  | 	false AS col_11603,
		  | 	tab_3577.col4_5 AS col_11604,
		  | 	tab_3577.col4_8 AS col_11605,
		  | 	e'\x15ch,c{\bF':::STRING AS col_11606,
		  | 	'BOX(-1.0956347987282582 -1.3209731823061643,0.0810917297263638 0.41710503833473356)':::BOX2D AS col_11607,
		  | 	e'\x00':::STRING AS col_11608,
		  | 	e'l\r/':::STRING AS col_11609,
		  | 	'aaaaaa':::STRING AS col_11610,
		  | 	'2025-10-23 18:56:55.000584':::TIMESTAMP AS col_11611,
		  | 	0:::OID AS col_11612
		  | FROM
		  | 	defaultdb.public.table4@[0] AS tab_3577
		  | WHERE
		  | 	EXISTS(
		  | 		SELECT
		  | 			true AS col_11601
		  | 		FROM
		  | 			defaultdb.public.table4@[0] AS tab_3578
		  | 		GROUP BY
		  | 			tab_3578.col4_7
		  | 		HAVING
		  | 			tab_3578.col4_7
		  | 	)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

/cc @cockroachdb/sql-queries

This test on roachdash | Improve this report!

Jira issue: CRDB-19448

@cockroach-teamcity cockroach-teamcity added branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 8, 2022
@cockroach-teamcity cockroach-teamcity added this to the 22.2 milestone Sep 8, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Sep 8, 2022
@msirek
Copy link
Contributor

msirek commented Sep 8, 2022

looking into reducing this...

@msirek
Copy link
Contributor

msirek commented Sep 8, 2022

Reduced test case:

drop table if exists table4;
CREATE TABLE table4 (col4_0 REGNAMESPACE NULL, col4_1 BOX2D NOT NULL, col4_2 CHAR, col4_3 BOX2D NOT NULL, col4_4 TIMESTAMPTZ NULL, col4_5 DATE NOT NULL, col4_6 TIMESTAMP, col4_7 BOOL NULL, col4_8 OID[] NOT NULL, col4_9 TIMESTAMPTZ, col4_10 STRING AS (CASE WHEN col4_9 IS NULL THEN e'B\x02:xnbY':::STRING ELSE 'gNJ':::STRING END) VIRTUAL, col4_11 STRING AS (CASE WHEN col4_6 IS NULL THEN e'Ce<\x135\x1f':::STRING ELSE e'\'':::STRING END) STORED, col4_12 STRING NULL AS (CASE WHEN col4_4 IS NULL THEN 'N':::STRING ELSE e'#(\x17':::STRING END) STORED, col4_13 STRING NOT NULL AS (lower(CAST(col4_3 AS STRING))) STORED, col4_14 STRING NULL AS (CASE WHEN col4_0 IS NULL THEN e'\bZA':::STRING ELSE NULL END) STORED, INDEX (col4_2 DESC, col4_7, col4_6 DESC, (CASE WHEN col4_0 IS NULL THEN e'\x7fU\x11w)J\x12k\x1d':::STRING ELSE e'w9\x0b\`%u':::STRING END), col4_13 DESC, col4_14) STORING (col4_1, col4_3, col4_4, col4_8, col4_11, col4_12) WHERE ((((((((((table4.col4_14 != 'X':::STRING) AND (table4.col4_9 > '294276-12-31 23:59:59.999999+00:00':::TIMESTAMPTZ)) OR (table4.col4_11 > e'\U00002603':::STRING)) AND (table4.col4_10 < 'X':::STRING)) OR (NOT table4.col4_7)) AND (table4.col4_13 >= '"':::STRING)) AND (table4.col4_6 >= '-2000-01-01 00:00:00':::TIMESTAMP)) OR (table4.col4_2 <= 'X':::STRING)) AND (table4.col4_5 < 'infinity':::DATE)) AND (table4.col4_12 > '"':::STRING)) AND (table4.col4_4 > '0001-01-01 00:00:00+00:00':::TIMESTAMPTZ), FAMILY (col4_0), FAMILY (col4_6, col4_5, col4_4, col4_2), FAMILY (col4_11), FAMILY (col4_14), FAMILY (col4_8), FAMILY (col4_12, col4_9, col4_7), FAMILY (col4_3, col4_1, col4_13));

INSERT
INTO
        table4 AS tab_6 (col4_0, col4_1, col4_2, col4_3, col4_4, col4_5, col4_7, col4_8, col4_9)
VALUES
        (
                0:::OID,
                'BOX(-1 15,299 415)':::BOX2D,
                e'\u00E1':::STRING,
                'BOX(-0.28451483613251705 -1.6883549383765524,0.5285429524178467 1.0272876055649685)':::BOX2D,
                NULL,
                '1992-01-13':::DATE,
                TRUE,
                ARRAY[0:::OID],
                NULL
        );
SET testing_optimizer_random_seed = 3164997759865821235;

SET testing_optimizer_disable_rule_probability = 0.500000;

SET vectorize = off;

-- Returns one row:
SELECT
        tab_3577.crdb_internal_mvcc_timestamp AS col_11602,
        false AS col_11603,
        tab_3577.col4_5 AS col_11604,
        tab_3577.col4_8 AS col_11605,
        e'\x15ch,c{\bF':::STRING AS col_11606,
        'BOX(-1.0956347987282582 -1.3209731823061643,0.0810917297263638 0.41710503833473356)':::BOX2D AS col_11607,
        e'\x00':::STRING AS col_11608,
        e'l\r/':::STRING AS col_11609,
        'aaaaaa':::STRING AS col_11610,
        '2025-10-23 18:56:55.000584':::TIMESTAMP AS col_11611,
        0:::OID AS col_11612
FROM
        table4@[0] AS tab_3577
WHERE
        EXISTS(
                SELECT
                        true AS col_11601
                FROM
                        table4@[0] AS tab_3578
                GROUP BY
                        tab_3578.col4_7
                HAVING
                        tab_3578.col4_7
        );

RESET vectorize;

-- Returns no rows:
SELECT
        tab_3577.crdb_internal_mvcc_timestamp AS col_11602,
        false AS col_11603,
        tab_3577.col4_5 AS col_11604,
        tab_3577.col4_8 AS col_11605,
        e'\x15ch,c{\bF':::STRING AS col_11606,
        'BOX(-1.0956347987282582 -1.3209731823061643,0.0810917297263638 0.41710503833473356)':::BOX2D AS col_11607,
        e'\x00':::STRING AS col_11608,
        e'l\r/':::STRING AS col_11609,
        'aaaaaa':::STRING AS col_11610,
        '2025-10-23 18:56:55.000584':::TIMESTAMP AS col_11611,
        0:::OID AS col_11612
FROM
        table4@[0] AS tab_3577
WHERE
        EXISTS(
                SELECT
                        true AS col_11601
                FROM
                        table4@[0] AS tab_3578
                GROUP BY
                        tab_3578.col4_7
                HAVING
                        tab_3578.col4_7
        );

SELECT
  true AS col_11601
  FROM
  table4@[0] AS tab_3578
  GROUP BY
    tab_3578.col4_7
  HAVING
    tab_3578.col4_7;

@msirek
Copy link
Contributor

msirek commented Sep 8, 2022

Incorrect results occur in vectorized mode, but not in row mode.

@msirek
Copy link
Contributor

msirek commented Sep 8, 2022

In the good plan, without random rules disabled, there is a limit in the subquery:

└── • subquery
    │ id: @S1
    │ original sql: EXISTS (SELECT true AS col_11601 FROM table4@[0] AS tab_3578 GROUP BY tab_3578.col4_7 HAVING tab_3578.col4_7)
    │ exec mode: exists
    │
    └── • limit
        │ columns: (col4_7)
        │ count: 1
        │
        └── • filter
            │ columns: (col4_7)
            │ estimated row count: 495 (missing stats)
            │ filter: col4_7
            │
            └── • scan
                  columns: (col4_7)
                  estimated row count: 1,000 (missing stats)
                  table: table4@table4_pkey
                  spans: FULL SCAN (SOFT LIMIT)

This is missing in the bad plan. Not sure why this might matter.
An even simpler test case is:

-- Should return 1 row
SELECT col_11601 FROM
(SELECT
  true AS col_11601
FROM
  defaultdb.public.table4@[0] AS tab_3578 GROUP BY tab_3578.col4_7 HAVING tab_3578.col4_7) WHERE col_11601 LIMIT 1;
  col_11601
-------------
(0 rows)

@yuzefovich
Copy link
Member

The fact that we have the same column in GROUP BY and HAVING seems interesting.

@yuzefovich
Copy link
Member

yuzefovich commented Sep 8, 2022

The problem seems to be that the optimizer is marking the GROUP BY as "non-scalar" meaning that the execution won't produce any rows for an empty input. I guess in this case the input is non-empty, nvm.

@DrewKimball
Copy link
Collaborator

Adding the grouping column to the projections causes the row to be returned:

SELECT
  true AS col_11601, tab_3578.col4_7
  FROM
  table4@[0] AS tab_3578
  GROUP BY
    tab_3578.col4_7
  HAVING
    tab_3578.col4_7;

@yuzefovich
Copy link
Member

It's a problem in the vectorized engine similar to what we fixed in #87451, I'll take it.

@yuzefovich yuzefovich removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Sep 8, 2022
@yuzefovich yuzefovich self-assigned this Sep 8, 2022
@msirek
Copy link
Contributor

msirek commented Sep 8, 2022

The table doesn't seem to show up in the operation tree:

EXPLAIN(VEC,VERBOSE)
SELECT col_11601 FROM
(SELECT
  true AS col_11601
FROM
  defaultdb.public.table4@[0] AS tab_3578 GROUP BY tab_3578.col4_7 HAVING tab_3578.col4_7) WHERE col_11601;
                       info
---------------------------------------------------
  │
  └ Node 1
    └ *colflow.BatchFlowCoordinator
      └ *colexecbase.constBoolOp
        └ *colexecutils.vectorTypeEnforcer
          └ *colexecutils.fixedNumTuplesNoInputOp

Maybe the column is getting reduced to a constant somehow... and that makes the execution think a ColBatchScan is not needed?

@yuzefovich
Copy link
Member

The issue is that #87451 made it so that for non-scalar group by with no aggregation functions we now always return 0 rows even if the input is non-empty (which is wrong). So #87451 fixed one hypothetical bug but introduced another. I'll send a PR shortly.

@DrewKimball
Copy link
Collaborator

DrewKimball commented Sep 9, 2022

I think it would have been wrong even before #87451, right? We should be returning as many rows as there are groups in the non-scalar case, so the fixedNumTuplesNoInputOp operator could never work.

@yuzefovich
Copy link
Member

Can there be more than a single output row when there are no aggregate functions to compute?

@DrewKimball
Copy link
Collaborator

Yes, I think in that case it would be equivalent to a distinct + a project:

postgres=# create table t (x int);
CREATE TABLE
postgres=# insert into t values (1), (1), (2), (3);
INSERT 0 4
postgres=# select 1 from t group by x having x < 3;
 ?column?
----------
        1
        1
(2 rows)

@yuzefovich yuzefovich changed the title roachtest: unoptimized-query-oracle/disable-rules=half failed roachtest: incorrect aggregation planning in the vectorized engine Sep 9, 2022
@craig craig bot closed this as completed in c20e8a1 Sep 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants