roachtest: schemachange/mixed-versions failed #99590

Closed
cockroach-teamcity opened this issue Mar 26, 2023 · 1 comment · Fixed by #99665
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Milestone

Comments

@cockroach-teamcity

cockroach-teamcity commented Mar 26, 2023

roachtest.schemachange/mixed-versions failed with artifacts on master @ d107217dac5d817cc115bc0e97b7e53c0f2878bf:

test artifacts and logs in: /artifacts/schemachange/mixed-versions/run_1
(versionupgrade.go:341).func1: pq: operation "show cluster setting version" timed out after 2m0s (given timeout 2m0s): value differs between local setting ([18 8 8 22 16 2 24 0 32 83]) and KV ([18 8 8 22 16 2 24 0 32 82]); try again later (<nil> after 1m59.655305888s)

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/sql-schema

This test on roachdash | Improve this report!

Jira issue: CRDB-25958

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Mar 26, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Mar 26, 2023
@blathers-crl blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Mar 26, 2023
@cockroach-teamcity

roachtest.schemachange/mixed-versions failed with artifacts on master @ 2bd2c806ab3044569b09e0a205b5bc0452ad4e2b:

test artifacts and logs in: /artifacts/schemachange/mixed-versions/run_1
(versionupgrade.go:341).func1: pq: operation "show cluster setting version" timed out after 2m0s (given timeout 2m0s): value differs between local setting ([18 8 8 22 16 2 24 0 32 83]) and KV ([18 8 8 22 16 2 24 0 32 82]); try again later (<nil> after 1m59.121819727s)

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

craig bot pushed a commit that referenced this issue Mar 27, 2023
99458: jobs,*: stop writing payload and progress to system.jobs r=adityamaru a=adityamaru

This change introduces a cluster version after which the
payload and progress of a job will not be written to the
system.jobs table. This will ensure that the system.job_info
table is the single source of truth for these two pieces of
information.

This cluster version has an associated upgrade that alters
the `payload` column of the `system.jobs` table to be nullable,
thereby allowing us to stop writing to it. This upgrade step
is necessary for a future patch where we will drop the payload
and progress columns. Without this intermediate upgrade step the
`ALTER TABLE ... DROP COLUMN` upgrade job will attempt to write
to dropped columns as part of its execution thereby failing to
run the upgrade.

Informs: #97762

Release note: None

99543: server: fix flaky drain test under race r=AlexTalks a=AlexTalks

Previously, `TestDrain` issued a drain request twice and expected no leases to remain after the second request. In some race builds, however, a lease extension can occur before the second drain, leaving one lease remaining after it. The following log excerpt shows this:
```
I230325 00:39:18.151604 14728 1@server/drain.go:145 ⋮ [T1,n1] 383  drain request received with doDrain = true, shutdown = false
...
I230325 00:39:18.155547 986 kv/kvserver/replica_proposal.go:272 ⋮ [T1,n1,s1,r51/1:‹/Table/5{0-1}›,raft] 385  new range lease repl=(n1,s1):1 seq=1 start=0,0 exp=1679704764.152223164,0 pro=1679704758.152223164,0 following repl=(n1,s1):1 seq=1 start=0,0 exp=1679704746.135729956,0 pro=1679704740.135729956,0
I230325 00:39:18.172450 14728 1@server/drain.go:399 ⋮ [T1,n1] 386  (DEBUG) initiating kvserver node drain
I230325 00:39:18.172613 14728 1@kv/kvserver/store.go:1559 ⋮ [T1,drain,n1,s1] 387  (DEBUG) store marked as draining
I230325 00:39:18.182123 14728 1@server/drain.go:293 ⋮ [T1,n1] 388  drain remaining: 1
I230325 00:39:18.182249 14728 1@server/drain.go:295 ⋮ [T1,n1] 389  drain details: range lease iterations: 1
I230325 00:39:18.182404 14728 1@server/drain.go:175 ⋮ [T1,n1] 390  drain request completed without server shutdown
```
This change modifies the test to repeatedly issue drain requests until there is no remaining work, allowing the drain to complete upon subsequent requests.

Fixes: #86974

Release note: None

99665: sql/gc_job,sqlerrors: make GC job robust to missing descriptors r=fqazi a=ajwerner

### sql: do not drop table descriptor independently if we're in drop schema

If we have dropped schema IDs, we know that this is not an individual drop table
schema change. We only have more than one dropped table when we drop a database
or a schema. Before this change, we'd drop the table on its own, and then create
another GC job to drop all the tables. This is not strictly a bug, because we
should be robust to it, but the duplicate work is undesirable.

### sql/gc_job,sqlerrors: make GC job robust to missing descriptors

The check used for missing descriptors became incorrect in the course of
#94695. That change updated
the underlying error code used in getters by the GC job. The GC job would
subsequently retry forever when the descriptor was missing. This bug
has not been shipped yet, so not writing a release note.

Fixes: #99590

Release note (bug fix): DROP SCHEMA ... CASCADE could create multiple
GC jobs: one for every table and one for the cascaded drop itself. This has
been fixed.


Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Alex Sarkesian <[email protected]>
Co-authored-by: ajwerner <[email protected]>
@craig craig bot closed this as completed in fb2a3cc Mar 27, 2023
blathers-crl bot pushed a commit that referenced this issue Mar 27, 2023
The check used for missing descriptors became incorrect in the course of
#94695. That change updated
the underlying error code used in getters by the GC job. The GC job would
subsequently retry forever when the descriptor was missing. This bug
has not been shipped yet, so not writing a release note.

Fixes: #99590

Release note: None
ajwerner added a commit to ajwerner/cockroach that referenced this issue Mar 27, 2023
The check used for missing descriptors became incorrect in the course of
cockroachdb#94695. That change updated
the underlying error code used in getters by the GC job. The GC job would
subsequently retry forever when the descriptor was missing. This bug
has not been shipped yet, so not writing a release note.

Fixes: cockroachdb#99590

Release note: None
rharding6373 pushed a commit to rharding6373/cockroach that referenced this issue Mar 27, 2023
The check used for missing descriptors became incorrect in the course of
cockroachdb#94695. That change updated
the underlying error code used in getters by the GC job. The GC job would
subsequently retry forever when the descriptor was missing. This bug
has not been shipped yet, so not writing a release note.

Fixes: cockroachdb#99590

Release note: None
aadityasondhi pushed a commit to aadityasondhi/cockroach that referenced this issue Mar 28, 2023
The check used for missing descriptors became incorrect in the course of
cockroachdb#94695. That change updated
the underlying error code used in getters by the GC job. The GC job would
subsequently retry forever when the descriptor was missing. This bug
has not been shipped yet, so not writing a release note.

Fixes: cockroachdb#99590

Release note: None
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023