Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: tlp failed #69754

Closed
cockroach-teamcity opened this issue Sep 2, 2021 · 18 comments
Closed

roachtest: tlp failed #69754

cockroach-teamcity opened this issue Sep 2, 2021 · 18 comments
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-sql-queries SQL Queries Team

Comments

@cockroach-teamcity
Copy link
Member

roachtest.tlp failed with artifacts on master @ 63679e7bf4a0244c6e46381751543cfe3f32388c:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tlp/run_1
	tlp.go:128,test_runner.go:777: expected unpartitioned count 3 to equal partitioned count 8
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTLPQuery.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tlp.go:191
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runWithTimeout.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tlp.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) expected unpartitioned count 3 to equal partitioned count 8
		  | sql: SELECT count(*) FROM defaultdb.public.table2 AS tab_1761
		  | SELECT count(*) FROM (SELECT * FROM defaultdb.public.table2 AS tab_1761 WHERE tab_1761.crdb_internal_idx_expr_3 ~* tab_1761.crdb_internal_idx_expr_4 UNION ALL SELECT * FROM defaultdb.public.table2 AS tab_1761 WHERE NOT (tab_1761.crdb_internal_idx_expr_3 ~* tab_1761.crdb_internal_idx_expr_4) UNION ALL SELECT * FROM defaultdb.public.table2 AS tab_1761 WHERE (tab_1761.crdb_internal_idx_expr_3 ~* tab_1761.crdb_internal_idx_expr_4) IS NULL)
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See: roachtest README

/cc @cockroachdb/sql-queries

This test on roachdash | Improve this report!

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 2, 2021
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Sep 2, 2021
@cucaroach
Copy link
Contributor

This is another one of those where select * returns the right results but select count(*) is returning too high a number:

root@:26257/defaultdb> SELECT count(*) FROM (SELECT * FROM defaultdb.public.table2 AS tab_1761 WHERE tab_1761.crdb_internal_idx_expr_3 ~* tab_1761.crdb_internal_idx_expr_4 UNION ALL SELECT * FROM defaultdb.public.table2 AS tab_1761 WHERE NOT (tab_1761.crdb_internal_idx_expr_3 ~* tab_1761.crdb_internal_idx_expr_4) UNION ALL SELECT * FROM defaultdb.public.table2 AS tab_1761 WHERE (tab_1761.crdb_internal_idx_expr_3 ~* tab_1761.crdb_internal_idx_expr_4) IS NULL);
  count
---------
      8
(1 row)


Time: 3ms total (execution 3ms / network 0ms)

root@:26257/defaultdb> SELECT * FROM (SELECT * FROM defaultdb.public.table2 AS tab_1761 WHERE tab_1761.crdb_internal_idx_expr_3 ~* tab_1761.crdb_internal_idx_expr_4 UNION ALL SELECT * FROM defaultdb.public.table2 AS tab_1761 WHERE NOT (tab_1761.crdb_internal_idx_expr_3 ~* tab_1761.crdb_internal_idx_expr_4) UNION ALL SELECT * FROM defaultdb.public.table2 AS tab_1761 WHERE (tab_1761.crdb_internal_idx_expr_3 ~* tab_1761.crdb_internal_idx_expr_4) IS NULL);
  col2_0 | col2_1 | col2_2 |    col2_3     |  col2_4   |    col2_5     |           col2_6           |                col2_7                |   col2_8   |   col2_9   |               col2_10                |  col2_11
---------+--------+--------+---------------+-----------+---------------+----------------------------+--------------------------------------+------------+------------+--------------------------------------+-------------
  G      | ☃      | false  | -0.0060540563 | yU-2\x02  | a75           | 2024-07-20 16:13:57.000099 | 1b2fd3ef-c4a4-4108-b20e-8e3b85f9d730 | NULL       | NULL       | 1b2fd3ef-c4a4-4108-b20e-8e3b85f9d730 | NULL
  C      | \x00   | false  |    -1.3474761 | \aqVj\n92 | -=\x1a\rLR$U< | 1990-11-30 04:35:44.000597 | e0652f34-a0f9-49e7-aac0-cf5c388928bd | 1514230552 | 1514230552 | e0652f34-a0f9-49e7-aac0-cf5c388928bd | 1514230552
  :      | \x12   |  NULL  |     1.4238816 |           | \x13Kb5       | 1995-11-11 09:56:55.000583 | 897dfa63-8bbf-4954-8b4e-d8fce7a1bda8 | 2027769772 | 2027769772 | 897dfa63-8bbf-4954-8b4e-d8fce7a1bda8 | 2027769772
(3 rows)

Time: 6ms total (execution 6ms / network 0ms)

The extra rows are coming from the IS NULL branch of UNION ALL:

root@:26257/defaultdb> SELECT * FROM defaultdb.public.table2 AS tab_1761 WHERE (tab_1761.crdb_internal_idx_expr_3 ~* tab_1761.crdb_internal_idx_expr_4) IS NULL;
  col2_0 | col2_1 | col2_2 |  col2_3   | col2_4 | col2_5  |           col2_6           |                col2_7                |   col2_8   |   col2_9   |               col2_10                |  col2_11
---------+--------+--------+-----------+--------+---------+----------------------------+--------------------------------------+------------+------------+--------------------------------------+-------------
  :      | \x12   |  NULL  | 1.4238816 |        | \x13Kb5 | 1995-11-11 09:56:55.000583 | 897dfa63-8bbf-4954-8b4e-d8fce7a1bda8 | 2027769772 | 2027769772 | 897dfa63-8bbf-4954-8b4e-d8fce7a1bda8 | 2027769772
(1 row)


Time: 3ms total (execution 3ms / network 0ms)

root@:26257/defaultdb> SELECT count(*) FROM defaultdb.public.table2 AS tab_1761 WHERE (tab_1761.crdb_internal_idx_expr_3 ~* tab_1761.crdb_internal_idx_expr_4) IS NULL;
  count
---------
      6
(1 row)

@cucaroach
Copy link
Contributor

I don't understand what this crdb_internal_idx_expr_# business is all about ... @mgartner any thoughts on where to go next with this?

@cucaroach
Copy link
Contributor

Looking at the column families most have 3 rows but two have 8 rows:

                          |     FAMILY fam_0_col2_2_rowid (col2_2, rowid),
                          |     FAMILY fam_1_col2_8 (col2_8),
                          |     FAMILY fam_2_col2_3 (col2_3),
                          |     FAMILY fam_3_col2_10_col2_1_col2_5_col2_6_col2_4 (col2_10, col2_1, col2_5, col2_6, col2_4),
                          |     FAMILY fam_4_col2_7 (col2_7),
                          |     FAMILY fam_5_col2_0 (col2_0)
                          | )
(1 row)


Time: 10ms total (execution 10ms / network 0ms)

root@:26257/defaultdb> SELECT col2_2 FROM defaultdb.public.table2;
  col2_2
----------
  false
   NULL
   NULL
   NULL
   NULL
   NULL
   NULL
  false
(8 rows)


Time: 2ms total (execution 2ms / network 0ms)

root@:26257/defaultdb> SELECT col2_8 FROM defaultdb.public.table2;
    col2_8
--------------
  2027769772
  NULL
  1514230552
(3 rows)


Time: 2ms total (execution 2ms / network 0ms)

root@:26257/defaultdb> SELECT col2_3 FROM defaultdb.public.table2;
     col2_3
-----------------
  -0.0060540563
      0.9064203
              0
      1.1829036
      -0.588561
      1.4238816
      0.5091441
     -1.3474761
(8 rows)


Time: 2ms total (execution 2ms / network 0ms)

root@:26257/defaultdb> SELECT col2_10, col2_1, col2_5, col2_6, col2_4 FROM defaultdb.public.table2;
                col2_10                | col2_1 |    col2_5     |           col2_6           |  col2_4
---------------------------------------+--------+---------------+----------------------------+------------
  897dfa63-8bbf-4954-8b4e-d8fce7a1bda8 | \x12   | \x13Kb5       | 1995-11-11 09:56:55.000583 |
  1b2fd3ef-c4a4-4108-b20e-8e3b85f9d730 | ☃      | a75           | 2024-07-20 16:13:57.000099 | yU-2\x02
  e0652f34-a0f9-49e7-aac0-cf5c388928bd | \x00   | -=\x1a\rLR$U< | 1990-11-30 04:35:44.000597 | \aqVj\n92
(3 rows)


Time: 3ms total (execution 3ms / network 0ms)

root@:26257/defaultdb> SELECT col2_7 FROM defaultdb.public.table2;
                 col2_7
----------------------------------------
  897dfa63-8bbf-4954-8b4e-d8fce7a1bda8
  1b2fd3ef-c4a4-4108-b20e-8e3b85f9d730
  e0652f34-a0f9-49e7-aac0-cf5c388928bd
(3 rows)


Time: 2ms total (execution 2ms / network 0ms)

root@:26257/defaultdb> SELECT col2_0 FROM defaultdb.public.table2;
  col2_0
----------
  :
  G
  C
(3 rows)

@cucaroach
Copy link
Contributor

cucaroach commented Sep 3, 2021

Despite being seemingly related to index use and expression based indexes this isn't the same as #66706 because it happens in row engine too and isn't related to column families.

@cucaroach cucaroach self-assigned this Sep 3, 2021
@cucaroach
Copy link
Contributor

cucaroach commented Sep 3, 2021

Reducto!

set vectorize = off; -- doesn't matter 
CREATE TABLE table2 (
    col2_0 CHAR,  -- this has to be CHAR, VARCHAR and problem goes away
    col2_4 VARCHAR,
    col2_7 VARCHAR, 
    col2_10 STRING AS (CAST(col2_7 AS VARCHAR)) STORED, -- only repros on STORED, not VIRTUAL
    INDEX (col2_10 ASC, lower(col2_0), col2_4 ASC) -- only repros with lower() around col2_0
    );

INSERT
INTO
	defaultdb.public.table2 (col2_7, col2_4, col2_0)
VALUES
	(
		'7',
		'4',
		' ':::STRING -- this has to be ' ':::STRING, anything else doesn't work
	);

UPDATE
	defaultdb.public.table2
SET
	col2_7 = '2_7'; -- only repros on col2_7 and col2_4 and when col2_0 is not ' ' 

SELECT col2_4,col2_10 FROM table2;

This outputs:

  col2_4 | col2_10
---------+----------
  4      | 2_7
  4      | 7
(2 rows)

And should output:

  col2_4 | col2_10
---------+----------
  4      | 2_7
(1 row)

@mgartner
Copy link
Collaborator

mgartner commented Sep 7, 2021

This looks to be the same issue as #69665 and #69327. It looks like this is an existing issue in v21.1.7, so maybe this should not be labelled as a release blocker?

Here's a simplified example.

In Postgres:

CREATE TABLE t (c CHAR, v TEXT GENERATED ALWAYS AS (c) STORED);

INSERT INTO t VALUES (' '::TEXT);

SELECT * FROM t WHERE v = ' ';
 c | v
---+---
(0 rows)

In cockroach 21.1.7:

CREATE TABLE t (c CHAR, v TEXT GENERATED ALWAYS AS (c) STORED);

INSERT INTO t VALUES (' '::TEXT);

SELECT * FROM t WHERE v = ' ';
  c | v
----+----
    |
(1 row)

I've started work on a fix that should be backportable to 21.1.

@cucaroach cucaroach removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Sep 7, 2021
@cucaroach cucaroach removed their assignment Sep 7, 2021
@cockroach-teamcity

This comment has been minimized.

@mgartner
Copy link
Collaborator

I reduced this last failure to:

CREATE TABLE table2 (
        col2_0
                REGPROC,
        col2_1
                FLOAT4,
        col2_2
                REGNAMESPACE,
        col2_3
                "char",
        col2_4
                VARCHAR,
        col2_6
                STRING,
        col2_7
                FLOAT4,
        col2_8
                GEOMETRY,
        col2_12
                STRING,
        col2_15
                STRING AS (lower(col2_3)) VIRTUAL,
        INDEX (col2_15 ASC)
);

INSERT
INTO
        table2 (col2_1, col2_0, col2_4, col2_6, col2_7, col2_8)
VALUES
        (0, 0, '', e'|<\x0bu', 0, '010300000000000000');

UPDATE
        table2 AS tab_788
SET
        col2_3 = tab_788.col2_6,
        col2_2 = COALESCE(tab_788.col2_0, tab_788.col2_0),
        col2_8 = tab_788.col2_8
WHERE
        inet_same_family(
                '6243:489:803a:8708:fee7:b652:14d8:b27e/36',
                'c337:7d82:d604:65fe:1cba:77de:e2a5:f37d/61' + 0
        );

DELETE FROM table2 WHERE true;

SELECT
        IF(
                (
                        SELECT
                                count(*)
                        FROM
                                (
                                        SELECT count(*) FROM table2
                                        INTERSECT
                                                SELECT
                                                        count(*)
                                                FROM
                                                        (
                                                                SELECT
                                                                        *
                                                                FROM
                                                                        table2 AS tab_2106
                                                                WHERE
                                                                        tab_2106.col2_4
                                                                        NOT ILIKE tab_2106.col2_12
                                                                UNION ALL
                                                                        SELECT
                                                                                *
                                                                        FROM
                                                                                table2 AS tab_2106
                                                                        WHERE
                                                                                NOT
                                                                                   (
                                                                                   tab_2106.col2_4
                                                                                   NOT ILIKE tab_2106.col2_12
                                                                                   )
                                                                UNION ALL
                                                                        SELECT
                                                                                *
                                                                        FROM
                                                                                table2 AS tab_2106
                                                                        WHERE
                                                                                (
                                                                                   tab_2106.col2_4
                                                                                   NOT ILIKE tab_2106.col2_12
                                                                                ) IS NULL
                                                        )
                                )
                )
                = 1,
                1,
                crdb_internal.force_panic('TLP FAILURE')
        );

This has the same ingredients as the other implicit-casting-related TLP failures ("char" column with computed column that depends on it), so I'm assuming it's the same issue.

@cockroach-teamcity

This comment has been minimized.

@cockroach-teamcity

This comment has been minimized.

@cockroach-teamcity

This comment has been minimized.

@cockroach-teamcity

This comment has been minimized.

@cockroach-teamcity

This comment has been minimized.

@cockroach-teamcity

This comment has been minimized.

@cockroach-teamcity
Copy link
Member Author

roachtest.tlp failed with artifacts on master @ 260b7930e717c76bc421ae3f34e9a939fc896b50:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tlp/run_1
	tlp.go:131,test_runner.go:779: expected unpartitioned and partitioned results to be equal
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTLPQuery.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tlp.go:204
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runWithTimeout.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tlp.go:229
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) expected unpartitioned and partitioned results to be equal
		  |   []string{
		  | - 	"0",
		  | + 	"8",
		  |   }
		  | sql: SELECT COUNT(first) FROM (SELECT * FROM defaultdb.public.table1 AS tab_4308) tab_4308(first)
		  | SELECT SUM(agg) FROM (SELECT COUNT(first) AS agg FROM (SELECT * FROM defaultdb.public.table1 AS tab_4308 WHERE (false OR (e'1"\x05\f\x02e\x1b:\x1b' !~* tab_4308.crdb_internal_idx_expr_3))) tab_4308(first) UNION ALL SELECT COUNT(first) AS agg FROM (SELECT * FROM defaultdb.public.table1 AS tab_4308 WHERE NOT ((false OR (e'1"\x05\f\x02e\x1b:\x1b' !~* tab_4308.crdb_internal_idx_expr_3)))) tab_4308(first) UNION ALL SELECT COUNT(first) AS agg FROM (SELECT * FROM defaultdb.public.table1 AS tab_4308 WHERE ((false OR (e'1"\x05\f\x02e\x1b:\x1b' !~* tab_4308.crdb_internal_idx_expr_3))) IS NULL) tab_4308(first))
		  | with args: []
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.tlp failed with artifacts on master @ 884088d64d0859d72e42ee4fde1cf65192f1cb99:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tlp/run_1
	tlp.go:131,test_runner.go:779: expected unpartitioned and partitioned results to be equal
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTLPQuery.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tlp.go:204
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runWithTimeout.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tlp.go:229
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) expected unpartitioned and partitioned results to be equal
		  |   []string{
		  | - 	"4014694650",
		  | + 	"NULL",
		  |   }
		  | sql: SELECT MIN(first) FROM (SELECT * FROM defaultdb.public.table1 AS tab_9524) tab_9524(first)
		  | SELECT MIN(agg) FROM (SELECT MIN(first) AS agg FROM (SELECT * FROM defaultdb.public.table1 AS tab_9524 WHERE tab_9524.crdb_internal_mvcc_timestamp::DECIMAL IN (tab_9524.crdb_internal_mvcc_timestamp, -728.3886853323624670)) tab_9524(first) UNION ALL SELECT MIN(first) AS agg FROM (SELECT * FROM defaultdb.public.table1 AS tab_9524 WHERE NOT (tab_9524.crdb_internal_mvcc_timestamp::DECIMAL IN (tab_9524.crdb_internal_mvcc_timestamp, -728.3886853323624670))) tab_9524(first) UNION ALL SELECT MIN(first) AS agg FROM (SELECT * FROM defaultdb.public.table1 AS tab_9524 WHERE (tab_9524.crdb_internal_mvcc_timestamp::DECIMAL IN (tab_9524.crdb_internal_mvcc_timestamp, -728.3886853323624670)) IS NULL) tab_9524(first))
		  | with args: []
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@mgartner
Copy link
Collaborator

The reproductions above here and here have been fixed by recent work that added assignment casts.

It looks like there's been some other failures since then that might be unrelated. I'm going to close this issue for now, but I'll go through each failure and open a new issue if it hasn't been fixed by assignment casts.

@mgartner
Copy link
Collaborator

The failure two comments above will be fixed by #74180.

@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

No branches or pull requests

3 participants