Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jobs,*: read job payload and progress from the job_info table #98608

Merged
merged 1 commit into from
Mar 18, 2023

Conversation

adityamaru
Copy link
Contributor

@adityamaru adityamaru commented Mar 14, 2023

This change touches all the parts of the code that were
relying on the system.jobs table to read the payload and/or
progress corresponding to a job. If the cluster version has advanced
past V23_1JobInfoTableIsBackfilled, every job in the system.jobs table
has a payload and progress entry written to the job_info table.
This change migrates callers to use this new table when reading the
payload and progress of a job.

The most important changes are in the logic that drives crdb_internal.system_jobs,
which in turn drives crdb_internal.jobs and SHOW JOBS. Additionally,
there are changes in how the registry resolves the progress and payload
when loading or resuming a job.

Several tests that read the payload and progress from system.jobs now
rely on crdb_internal.system_jobs to fetch this information from either
the jobs table or job_info table depending on the cluster version.

Fixes: #97762

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@adityamaru adityamaru force-pushed the read-from-jobs-table branch 5 times, most recently from f904791 to 017c4ca Compare March 16, 2023 03:41
@adityamaru adityamaru changed the title [WIP, DNM] jobs,*: read progress and payload from system.job_info jobs,*: read job payload and progress from the job_info table Mar 16, 2023
@adityamaru adityamaru requested review from miretskiy and dt March 16, 2023 03:42
@adityamaru adityamaru marked this pull request as ready for review March 16, 2023 03:43
@adityamaru adityamaru requested review from a team as code owners March 16, 2023 03:43
@adityamaru adityamaru requested a review from a team March 16, 2023 03:43
@adityamaru adityamaru requested a review from a team as a code owner March 16, 2023 03:43
@adityamaru adityamaru requested a review from a team March 16, 2023 03:43
@adityamaru adityamaru requested a review from a team as a code owner March 16, 2023 03:43
@adityamaru adityamaru requested review from a team, herkolategan and renatolabs and removed request for a team March 16, 2023 03:43
@adityamaru adityamaru force-pushed the read-from-jobs-table branch from 017c4ca to 741873f Compare March 16, 2023 13:30
Copy link
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really only have nits.

@@ -733,7 +733,8 @@ func TestChangefeedCursor(t *testing.T) {
// statement timestamp, so only verify this for enterprise.
if e, ok := fooLogical.(cdctest.EnterpriseTestFeed); ok {
var bytes []byte
sqlDB.QueryRow(t, `SELECT payload FROM system.jobs WHERE id=$1`, e.JobID()).Scan(&bytes)
sqlDB.QueryRow(t, `SELECT payload FROM "".crdb_internal.system_jobs WHERE id = $1`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious: why do we need "" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to jobs_verification.go and changed most test places to use that query instead.

if err != nil {
return nil, nil, errors.Wrapf(err, "failed to get payload for job %d", j.ID())
}
if !exists {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we intend on making progress optional at some point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe. On job creation we expect the job record to have a valid, non-null progress and payload so unless the user manually deletes the progress for a job from system.job_info we will always have an entry. That said SHOW JOBS goes out of its way to handle the absence of a progress row for a job. So I dunno we should decide if having progress is mandatory or not and then apply that through the system.

pkg/jobs/adopt.go Show resolved Hide resolved
@adityamaru adityamaru force-pushed the read-from-jobs-table branch from 741873f to 73e06c7 Compare March 16, 2023 15:43
Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@adityamaru adityamaru force-pushed the read-from-jobs-table branch 2 times, most recently from 3cde93b to e8c1a09 Compare March 17, 2023 13:08
This change touches all the parts of the code that were
relying on the `system.jobs` table to read the payload and/or
progress corresponding to a job. If the cluster version has advanced
past V23_1JobInfoTableIsBackfilled, every job in the system.jobs table
has a payload and progress entry written to the `job_info` table.
This change migrates callers to use this new table when reading the
payload and progress of a job.

The most important changes are in the logic that drives `crdb_internal.system_jobs`,
which in turn drives `crdb_internal.jobs` and `SHOW JOBS`. Additionally,
there are changes in how the registry resolves the progress and payload
when loading or resuming a job.

Several tests that read the payload and progress from `system.jobs` now
rely on `crdb_internal.system_jobs` to fetch this information from either
the jobs table or job_info table depending on the cluster version.

Fixes: cockroachdb#97762

Release note: None
@adityamaru
Copy link
Contributor Author

TestDecommission is a flake - #96630
The other stress failure is during test server startup: optInToDiagnosticsStatReporting: setting updated but timed out waiting to read new value so nothing seems wrong with the test per-se. Will investigate separately.

@adityamaru
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 17, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 17, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 18, 2023

Build failed:

@adityamaru
Copy link
Contributor Author

bors retry

@craig
Copy link
Contributor

craig bot commented Mar 18, 2023

Build succeeded:

@craig craig bot merged commit ad4463c into cockroachdb:master Mar 18, 2023
@adityamaru adityamaru deleted the read-from-jobs-table branch March 18, 2023 21:58
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Mar 19, 2023
In cockroachdb#98608 some changes were made to the c2c roachtests
that caused them to break. This change fixes the tests
by querying the correct virutal table.

Fixes: cockroachdb#98973
Fixes: cockroachdb#98969
Fixes: cockroachdb#98962
Informs: cockroachdb#98669

Release note: None
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Mar 22, 2023
In cockroachdb#98608 some changes were made to the c2c roachtests
that caused them to break. This change fixes the tests
by querying the correct virutal table.

Fixes: cockroachdb#98973
Fixes: cockroachdb#98969
Fixes: cockroachdb#98962
Informs: cockroachdb#98669

Release note: None
craig bot pushed a commit that referenced this pull request Mar 22, 2023
98981: roachtest: fix c2c roachtests that read job payload r=adityamaru a=adityamaru

In #98608 some changes were made to the c2c roachtests that caused them to break. This change fixes the tests by querying the correct virutal table that interacts with both `system.jobs` and `system.job_info`.

Fixes: #98973
Fixes: #98969
Fixes: #98962
Informs: #98669

Release note: None

99124: kvserver: don't write MinVersion in unit test r=erikgrinaker a=pavelkalinnikov

The checkpoint creation code already writes this file. Since this is a unit test, we are not in a situation that this file can not exist (e.g. mixed versions situation), thus we don't need to write this file again.

Epic: none
Release note: none

Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Mar 22, 2023
In #98608 some changes were made to the c2c roachtests
that caused them to break. This change fixes the tests
by querying the correct virutal table.

Fixes: #98973
Fixes: #98969
Fixes: #98962
Informs: #98669

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

jobs: migrate jobs to write their Payload and Progress to the jobs_info table
4 participants