Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: [- replaced with 0] unoptimized-query-oracle/disable-rules=all/rand-tables failed #118273

Closed
cockroach-teamcity opened this issue Jan 24, 2024 · 13 comments · Fixed by #118587
Assignees
Labels
branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-sql-queries SQL Queries Team
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Jan 24, 2024

roachtest.unoptimized-query-oracle/disable-rules=all/rand-tables failed with artifacts on release-23.1 @ 337e61aff7f87b3eee7caf51dba2482c9e763799:

  	}, ""),
  	strings.Join({
  		"\x00,",
- 		"-",
+ 		"0",
  		",01:35:24.340338,false,01:35:24.340338,/v\x1d=,à,☃,0000-01-01 00",
  		":00:00 +0000 UTC,NULL,NULL",
  	}, ""),
  	strings.Join({
  		"\x00,",
- 		"-",
+ 		"0",
  		",01:35:24.340338,false,01:35:24.340338,0\f6,à,☃,NULL,0,2029-08",
  		"-30 10:58:14 +0000 UTC",
  	}, ""),
  	strings.Join({
  		"\x00,",
- 		"-",
+ 		"0",
  		",01:35:24.340338,false,01:35:24.340338,0\f6,à,☃,NULL,0,2031-09",
  		"-23 07:25:07.000311 +0000 UTC",
  	}, ""),
  	strings.Join({
  		"\x00,",
- 		"-",
+ 		"0",
  		",01:35:24.340338,false,01:35:24.340338,0\f6,à,☃,NULL,0,NULL",
  	}, ""),
  	... // 79784 identical and 69733 modified elements
  }
sql: SELECT
	tab593."coL1_2" AS "%pcol1894",
	regrole(tab592.col1_4::OID)::REGROLE AS col1895,
	tab593."col1_'9" AS col1896,
	false AS col1897,
	tab593.col͙1_11 AS "'col1898",
	tab592."%abcol1_1" AS "cO \\U000E56A8l\\x3d1899",
	e'\u00E0':::STRING AS "\\U000EF7A0col""%q1900",
	e'\U00002603':::STRING AS col1901,
	tab592.col̢1_5 AS "co l1902",
	tab592."col1_%777" AS "Col1903",
	tab593.col1_3 AS " 😍col1904"
FROM
	defaultdb.public."tabLe1"@[0] AS tab592
	RIGHT JOIN defaultdb.public."tabLe1"@[0] AS tab593 ON
			(tab592.crdb_internal_mvcc_timestamp) = (tab593.crdb_internal_mvcc_timestamp)
			AND (tab592."%abcol1_1") < (tab593.col͙1_11)
ORDER BY
	tab593.col1_4 ASC NULLS LAST, tab592."co}l1_8" ASC NULLS LAST
test artifacts and logs in: /artifacts/unoptimized-query-oracle/disable-rules=all/rand-tables/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

/cc @cockroachdb/sql-queries

This test on roachdash | Improve this report!

Jira issue: CRDB-35656

@cockroach-teamcity cockroach-teamcity added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-queries SQL Queries Team labels Jan 24, 2024
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Jan 24, 2024
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Jan 24, 2024
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 25, 2024
…ts logs

In cockroachdb#102038 we started writing statements to the query log in SQL
comments. The change removes newline characters in the statement to
ensure that the statement will be on one line all within the comment.
However, we still see test logs where the comments are broken onto
multiple lines, as in cockroachdb#118273, which makes the file invalid
syntactically. This makes reproducing the test failure more difficult
because the comments have to be manually fixed so that the log file can
be successfully parsed.

I was able to determine that the line breaks are coming from other
"interesting" whitespace characters. The `xxd` output from the log file
in cockroachdb#118273 shows the carriage return character, `0d` being used in the
middle of a column name:

    00000090: 2041 5320 2263 6f0d 6c33 3830 2246 524f   AS "co.l380"FRO

This commit strips all interesting whitespace characters from the
statement to prevent the comment from being broken on multiple lines.

Release note: None
@mgartner
Copy link
Collaborator

I'm working on reducing this now.

@michae2
Copy link
Collaborator

michae2 commented Jan 25, 2024

If it helps, I got this far yesterday with reduce:
repro.sql.txt

@mgartner
Copy link
Collaborator

I didn't get far because reduce thinks that differently ordered results are a failure.

@cockroach-teamcity

This comment was marked as duplicate.

michae2 added a commit to michae2/cockroach that referenced this issue Jan 26, 2024
When reducing costfuzz and unoptimized-query-oracle failures, we compare
two query results to find a failure. This comparison was producing false
positives for rows with a different ordering. Add a sort before the
comparison to eliminate some false positives.

This doesn't exactly match the comparison used by costfuzz and
unoptimized-query-oracle, but that comparison relies on a proper SQL
driver which we don't have in reduce. This comparison with sort seems
like a good enough improvement for now, which we can revisit if costfuzz
or unoptimized-query-oracle find another case that cannot be reduced.

Informs: cockroachdb#118273

Release note: None
@michae2
Copy link
Collaborator

michae2 commented Jan 26, 2024

Latest failure was a dupe of #117806, hiding that.

I am currently reducing the first failure using #118361, we'll see if that works.

craig bot pushed a commit that referenced this issue Jan 26, 2024
118344: rowenc: return assertion errors in test builds around encoding r=yuzefovich a=yuzefovich

This might have caught the recently-fixed bug around Fingerprint method using key-encoding for TSQuery type even though it's not available.

I think that for value encoding it should be non-flaky to add these assertions, but for key encoding it might result in some noise, so we might revert this change later.

Epic: None

Release note: None

118361: cmd/reduce: add sort to -costfuzz and -unoptimized-query-oracle cmp r=mgartner a=michae2

When reducing costfuzz and unoptimized-query-oracle failures, we compare two query results to find a failure. This comparison was producing false positives for rows with a different ordering. Add a sort before the comparison to eliminate some false positives.

This doesn't exactly match the comparison used by costfuzz and unoptimized-query-oracle, but that comparison relies on a proper SQL driver which we don't have in reduce. This comparison with sort seems like a good enough improvement for now, which we can revisit if costfuzz or unoptimized-query-oracle find another case that cannot be reduced.

Informs: #118273

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Michael Erickson <[email protected]>
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 27, 2024
…ts logs

In cockroachdb#102038 we started writing statements to the query log in SQL
comments. The change removes newline characters in the statement to
ensure that the statement will be on one line all within the comment.
However, we still see test logs where the comments are broken onto
multiple lines, as in cockroachdb#118273, which makes the file invalid
syntactically. This makes reproducing the test failure more difficult
because the comments have to be manually fixed so that the log file can
be successfully parsed.

I was able to determine that the line breaks are coming from other
"interesting" whitespace characters. The `xxd` output from the log file
in cockroachdb#118273 shows the carriage return character, `0d` being used in the
middle of a column name:

    00000090: 2041 5320 2263 6f0d 6c33 3830 2246 524f   AS "co.l380"FRO

This commit strips all interesting whitespace characters from the
statement to prevent the comment from being broken on multiple lines.

Release note: None
@michae2
Copy link
Collaborator

michae2 commented Jan 29, 2024

Continuing to reduce...

@michae2
Copy link
Collaborator

michae2 commented Jan 30, 2024

Well, after 7 hours reduce spit out this, which unfortunately is still producing 132k rows from each select:

maybe_repro.sql.txt

I think it has something to do with vectorization but not much closer to a repro.

@mgartner
Copy link
Collaborator

Did you use the --chunk option of reduce? Something like --chunk 20 can speed up reduction quite a bit.

@michae2
Copy link
Collaborator

michae2 commented Jan 30, 2024

Did you use the --chunk option of reduce? Something like --chunk 20 can speed up reduction quite a bit.

The command was (with #118361):

reduce -chunk 25 -file ~/118273/unoptimized-query-oracle001.log -unoptimized-query-oracle -v | tee out.txt

craig bot pushed a commit that referenced this issue Jan 30, 2024
118312: roachtest: remove interesting whitespace characters in random SQL test logs r=mgartner a=mgartner

In #102038 we started writing statements to the query log in SQL
comments. The change removes newline characters in the statement to
ensure that the statement will be on one line all within the comment.
However, we still see test logs where the comments are broken onto
multiple lines, as in #118273, which makes the file invalid
syntactically. This makes reproducing the test failure more difficult
because the comments have to be manually fixed so that the log file can
be successfully parsed.

I was able to determine that the line breaks are coming from other
"interesting" whitespace characters. The `xxd` output from the log file
in #118273 shows the carriage return character, `0d` being used in the
middle of a column name:

    00000090: 2041 5320 2263 6f0d 6c33 3830 2246 524f   AS "co.l380"FRO

This commit strips all interesting whitespace characters from the
statement to prevent the comment from being broken on multiple lines.

Epic: None

Release note: None


118368: roachprod: fix default backup schedule creation on start r=herkolategan a=renatolabs

For a while now, `roachprod` has created a default backup schedule on cluster creation when `--schedule-backups` is passed. This is also the default in clusters created to run roachtests.

When we fixed an issue with starting external-process tenants in 3715eb5, however, we unadvertently changed the order of two operations performed by roachprod on cluster start: setting the default cluster settings, and creating the default backup schedule.

As a consequence, the command used to create the backup schedule fails because, at that point, we haven't configured a license key yet.

To make matters worse, there was a bug in the error handling of `createFixedBackupSchedule` that prevented errors from being reported to the user; these errors were being swallowed and went unnoticed for a few months.

In this commit, we fix the error checking in that function, and also officially remove code that partially supported creating backups or admin users in tenants. Currently, roachprod will run that part of the setup for the system tenant. In the future, we might revisit this and also create a backup schedule and admin users for application tenants.

Epic: none

Release note: None

118476: changefeedccl: use correct channel size in parallelio r=jayshrivastava a=jayshrivastava

Previously, the channels used for sending requests and receiving results were too small. This meant that a caller could block on sending a request even after acquiring quota. This change ensures that the size of the channels is large enough so that this blocking does not occur.

Closes: #118463
Closes: #118462
Closes: #118461
Closes: #118460
Closes: #118459
Closes: #118458
Epic: none

118482: kvserver: increase shutdown propagation time in range merge test r=andrewbaptist a=kvoli

`TestStoreRangeMergeDuringShutDown` could occasionally flake when the shutdown hadn't propagated before applying the lease.

Increase the post-shutdown sleep from 10ms to 20ms.

Fixes: #118348
Release note: None

118483: roachpb: increase test make priority trials r=andrewbaptist a=kvoli

`TestMakePriority` could (very) rarely flake due to slight differences in the sampled vs underlying distribution. Increase the trial runs by 33% from 750k to 1000k to reduce the likelihood of this occurring.

Fixes: #118399
Release note: None

118498: gcjob_test: deflake TestSchemaChangeGCJob r=rafiss a=rafiss

This updates a test assertion so that if the GC TTL already began, the test does not fail.

fixes #117485
fixes #118467
Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Renato Costa <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Jan 30, 2024
…ts logs

In #102038 we started writing statements to the query log in SQL
comments. The change removes newline characters in the statement to
ensure that the statement will be on one line all within the comment.
However, we still see test logs where the comments are broken onto
multiple lines, as in #118273, which makes the file invalid
syntactically. This makes reproducing the test failure more difficult
because the comments have to be manually fixed so that the log file can
be successfully parsed.

I was able to determine that the line breaks are coming from other
"interesting" whitespace characters. The `xxd` output from the log file
in #118273 shows the carriage return character, `0d` being used in the
middle of a column name:

    00000090: 2041 5320 2263 6f0d 6c33 3830 2246 524f   AS "co.l380"FRO

This commit strips all interesting whitespace characters from the
statement to prevent the comment from being broken on multiple lines.

Release note: None
blathers-crl bot pushed a commit that referenced this issue Jan 30, 2024
…ts logs

In #102038 we started writing statements to the query log in SQL
comments. The change removes newline characters in the statement to
ensure that the statement will be on one line all within the comment.
However, we still see test logs where the comments are broken onto
multiple lines, as in #118273, which makes the file invalid
syntactically. This makes reproducing the test failure more difficult
because the comments have to be manually fixed so that the log file can
be successfully parsed.

I was able to determine that the line breaks are coming from other
"interesting" whitespace characters. The `xxd` output from the log file
in #118273 shows the carriage return character, `0d` being used in the
middle of a column name:

    00000090: 2041 5320 2263 6f0d 6c33 3830 2246 524f   AS "co.l380"FRO

This commit strips all interesting whitespace characters from the
statement to prevent the comment from being broken on multiple lines.

Release note: None
craig bot pushed a commit that referenced this issue Apr 30, 2024
118587: tree: correctly format the "unknown" oid for pg compatibility  r=DrewKimball a=DrewKimball

The `REGTYPE`, `REGPROC`, `REGCLASS`, and `REGNAMESPACE` types are all
in the oid type family, but unlike `OID`, display `-` instead of `0` when
the value is zero. Previously, this was handled by setting the `name` field
of `DOid` to `-`, but the handling was incomplete. This commit removes the
old special cases, and instead adds special handling to the functions that
parse an oid from a string, as well as the `DOid.Format()` method. This way,
correctly formatting the "unknown" oid doesn't rely on remembering to set
the `name` field.

In addition, this commit modifies some of the random datum generation
logic to create types other than the canonical `T_oid`, like `T_regrole`.

Fixes #118273

Release note: None

Co-authored-by: Drew Kimball <[email protected]>
@craig craig bot closed this as completed in a4b6234 Apr 30, 2024
@github-project-automation github-project-automation bot moved this from Active to Done in SQL Queries Apr 30, 2024
yuzefovich pushed a commit to yuzefovich/cockroach that referenced this issue Apr 30, 2024
The `REGTYPE`, `REGPROC`, `REGCLASS`, and `REGNAMESPACE` types are all
in the oid type family, but unlike `OID`, display `-` instead of `0` when
the value is zero. Previously, this was handled by setting the `name` field
of `DOid` to `-`, but the handling was incomplete. This commit removes the
old special cases, and instead adds special handling to the functions that
parse an oid from a string, as well as the `DOid.Format()` method. This way,
correctly formatting the "unknown" oid doesn't rely on remembering to set
the `name` field.

In addition, this commit modifies some of the random datum generation
logic to create types other than the canonical `T_oid`, like `T_regrole`.

Fixes cockroachdb#118273

Release note: None
yuzefovich pushed a commit to yuzefovich/cockroach that referenced this issue Apr 30, 2024
The `REGTYPE`, `REGPROC`, `REGCLASS`, and `REGNAMESPACE` types are all
in the oid type family, but unlike `OID`, display `-` instead of `0` when
the value is zero. Previously, this was handled by setting the `name` field
of `DOid` to `-`, but the handling was incomplete. This commit removes the
old special cases, and instead adds special handling to the functions that
parse an oid from a string, as well as the `DOid.Format()` method. This way,
correctly formatting the "unknown" oid doesn't rely on remembering to set
the `name` field.

In addition, this commit modifies some of the random datum generation
logic to create types other than the canonical `T_oid`, like `T_regrole`.

Fixes cockroachdb#118273

Release note: None
yuzefovich pushed a commit to yuzefovich/cockroach that referenced this issue Apr 30, 2024
The `REGTYPE`, `REGPROC`, `REGCLASS`, and `REGNAMESPACE` types are all
in the oid type family, but unlike `OID`, display `-` instead of `0` when
the value is zero. Previously, this was handled by setting the `name` field
of `DOid` to `-`, but the handling was incomplete. This commit removes the
old special cases, and instead adds special handling to the functions that
parse an oid from a string, as well as the `DOid.Format()` method. This way,
correctly formatting the "unknown" oid doesn't rely on remembering to set
the `name` field.

In addition, this commit modifies some of the random datum generation
logic to create types other than the canonical `T_oid`, like `T_regrole`.

Fixes cockroachdb#118273

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants