Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jobs,*: stop writing payload and progress to system.jobs #99458

Merged
merged 1 commit into from
Mar 27, 2023

Conversation

adityamaru
Copy link
Contributor

@adityamaru adityamaru commented Mar 24, 2023

This change introduces a cluster version after which the
payload and progress of a job will not be written to the
system.jobs table. This will ensure that the system.job_info
table is the single, source of truth for these two pieces of
information.

This cluster version has an associated upgrade that schema changes
the payload column of the system.jobs table to be nullable,
thereby allowing us to stop writing to it. This upgrade step
is necessary for a future patch where we will drop the payload
and progress columns. Without this intermediate upgrade step the
ALTER TABLE ... DROP COLUMN upgrade job will attempt to write
to dropped columns as part of its execution thereby failing to
run the upgrade.

Informs: #97762

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@adityamaru adityamaru force-pushed the cleanup-missed-jobs-queries branch from 22d8a48 to 2fa8f74 Compare March 24, 2023 14:26
@@ -1,4 +1,4 @@
# LogicTest: default-configs local-mixed-22.2-23.1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @fqazi this test used to directly mutates the payload column in system.jobs. In a local-mixed-22.2-23.1 the job_info table does not exist until the relevant upgrades have been run. Is it true that under this mixed configuration this test will always run at V22_2 i.e. without running any of the V23_1 upgrades? In that case I can keep both the queries and conditionally run them based on the config the test is being run under.

@adityamaru adityamaru force-pushed the cleanup-missed-jobs-queries branch from 2fa8f74 to 22d23d0 Compare March 24, 2023 14:33
@adityamaru adityamaru requested review from miretskiy and dt March 24, 2023 14:33
@adityamaru adityamaru marked this pull request as ready for review March 24, 2023 14:33
@adityamaru adityamaru requested review from a team as code owners March 24, 2023 14:33
@adityamaru adityamaru requested a review from a team March 24, 2023 14:33
@adityamaru adityamaru requested review from a team as code owners March 24, 2023 14:33
@adityamaru adityamaru requested a review from a team March 24, 2023 14:33
@adityamaru adityamaru requested a review from a team as a code owner March 24, 2023 14:33
@adityamaru adityamaru requested a review from a team March 24, 2023 14:33
@adityamaru adityamaru requested a review from a team as a code owner March 24, 2023 14:33
@adityamaru adityamaru requested review from herkolategan, smg260 and rharding6373 and removed request for a team, herkolategan, smg260 and rharding6373 March 24, 2023 14:33
@adityamaru adityamaru added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Mar 24, 2023
@@ -756,7 +756,7 @@ SELECT id, status
payload,
false -- emit_defaults
) AS pl
FROM system.jobs
FROM crdb_internal.system_jobs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a concise explanation on who (what packages) should be allowed to access system.jobs? Is it only underlying jobs package (and maybe tests)? Should there be a lint to warn about
system.jobs usage instead of crdb_internal.system_jobs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we'd have everyone going through accessors but that cat is out of the bag for system.jobs. Hopefully, we'll be a little stricter about directly reaching into job_info in non-test code. As far as a linter I think it'll become much clearer once these columns are dropped at which point there is no option but to go to the job_info table for payload and progress.

@adityamaru adityamaru force-pushed the cleanup-missed-jobs-queries branch from 22d23d0 to 5d89703 Compare March 24, 2023 16:11
@adityamaru adityamaru requested a review from a team as a code owner March 24, 2023 16:11
Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it was exceedingly tedious. 🙌

@adityamaru
Copy link
Contributor Author

TFTR!

bors r=miretskiy,dt

@craig
Copy link
Contributor

craig bot commented Mar 24, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 24, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 25, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 25, 2023

Merge conflict.

@adityamaru adityamaru force-pushed the cleanup-missed-jobs-queries branch from 5d89703 to 3b1024b Compare March 25, 2023 17:28
@adityamaru
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 25, 2023

Build failed:

@adityamaru
Copy link
Contributor Author

bors retry

@craig
Copy link
Contributor

craig bot commented Mar 26, 2023

Build failed:

@adityamaru
Copy link
Contributor Author

The failures are all INFO: slow quiesce. stack traces: but surprisingly none of the stacks are the job registry so this looks like a new manifestation of the slow quiesce problem.

This change introduces a cluster version after which the
payload and progress of a job will not be written to the
system.jobs table. This will ensure that the system.job_info
table is the single, source of truth for these two pieces of
information.

This cluster version has an associated upgrade that schema changes
the `payload` column of the `system.jobs` table to be nullable,
thereby allowing us to stop writing to it. This upgrade step
is necessary for a future patch where we will drop the payload
and progress columns. Without this intermediate upgrade step the
`ALTER TABLE ... DROP COLUMN` upgrade job will attempt to write
to dropped columns as part of its execution thereby failing to
run the upgrade.

Informs: cockroachdb#97762

Release note: None
@adityamaru adityamaru force-pushed the cleanup-missed-jobs-queries branch from 3b1024b to 5d9b97a Compare March 26, 2023 13:06
@adityamaru
Copy link
Contributor Author

let's try again

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 27, 2023

Build succeeded:

@craig craig bot merged commit 97fee54 into cockroachdb:master Mar 27, 2023
@blathers-crl
Copy link

blathers-crl bot commented Mar 27, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 5d9b97a to blathers/backport-release-23.1-99458: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants