Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jobs: set clusterID and clusterVersion during schedule creation #111374

Merged

Conversation

msbutler
Copy link
Collaborator

@msbutler msbutler commented Sep 27, 2023

upgrade: pass ClusterID to tenantDeps
Release note: None


jobs: set clusterID and clusterVersion during schedule creation
This patch adds two new fields to the ScheduleDetails proto, apart of the
schedule record:

  • ClusterID: the cluster that created the schedule. Note that in a future PR,
    we will allow the backup schedule to alter this field on resume.
  • CreationClusterVersion: the cluster version the schedule was created on.

This patch ensures that these fields must be set during schedule creation.

In the future, schedules after RESTORE and C2C cutover will pause themselves to
prevent conflicting with schedules potentially running on the backup/source
cluster.

Informs #108028

Release note: None

@blathers-crl
Copy link

blathers-crl bot commented Sep 27, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msbutler msbutler self-assigned this Sep 27, 2023
@msbutler msbutler force-pushed the butler-schedule-persist-cluster-id branch 2 times, most recently from 1b41a39 to 69654f4 Compare September 27, 2023 20:45
@msbutler msbutler changed the title first hack jobs: set clusterID and clusterVersion during schedule creation Sep 27, 2023
@msbutler msbutler force-pushed the butler-schedule-persist-cluster-id branch 9 times, most recently from 08ea23a to a3cf71d Compare September 29, 2023 15:25
@msbutler msbutler marked this pull request as ready for review September 29, 2023 17:36
@msbutler msbutler requested review from a team as code owners September 29, 2023 17:36
@msbutler msbutler requested review from a team, lidorcarmel, jayshrivastava, dt and miretskiy and removed request for a team, lidorcarmel and jayshrivastava September 29, 2023 17:36
Copy link
Contributor

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, just a few comments

pkg/jobs/jobspb/schedule.proto Show resolved Hide resolved
"github.com/cockroachdb/errors"
)

func ensureSQLSchemaTelemetrySchedule(
ctx context.Context, cs clusterversion.ClusterVersion, d upgrade.TenantDeps,
) error {
return d.DB.Txn(ctx, func(ctx context.Context, txn isql.Txn) error {
// Since this Schedule Creation job is expected to fail before the schedule
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure I follow this, isn't this expected to fail only if a telemetry schedule exists? In other cases this is run as a migration to auto heal the cluster right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. I rephrased the comment.

if j.rec.ScheduleDetails.CreationClusterVersion.Equal(clusterversion.ClusterVersion{}) {
return errors.New("scheduled job created without a cluster version")
}
if j.rec.ScheduleDetails.CreationClusterVersion.Less(clusterversion.ByKey(clusterversion.V23_2)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this strictly necessary? Won't older binaries just not have those fields on the proto they unmarshal the column into. What would make me feel confident here is a mixed version logic test that:

  • creates a schedule on master and loads it on 23.1
  • creates a schedule on 23.1 and loads it on master

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good callout. I added a mixed version test for your perusal.

I version gated the writing of the clusterID and version because in a mixed version state, older nodes could overwrite these new fields. e.g. ALTER BACKUP SCHEDULE ... SET SCHEDULE OPTION on_previous_running = 'wait'; from an old node in a mixed version state would overwrite these new fields. It just seems cleaner to me for these fields to only populate once we're fully upgraded.

// Init a mixed version schedule
sjMixed := h.newScheduledJob(t, "mixed_test_job", "test sql")

// In a mixed version state, clusterID and clusterVersion fields do not populate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm so this is not actually exercising the protoutil.Unmarshal call that a pre-23.2 binary would make on a scheduled job created on 23.2+, which is why I don't think this is enough. This is simply testing that we zero out the fields which as my other comment indicates I'm not sure is needed. When an older version binary attempts to unmarshal protobuf bytes written by a newer version, the added fields will simply be ignored.

Can we maybe replace this with a mixed-version logictest instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mixed version test added. Given my above comment on version gating the writing of these fields, I still think this test case is necessary.

@msbutler msbutler force-pushed the butler-schedule-persist-cluster-id branch 4 times, most recently from 1cf827b to 5a38df8 Compare October 2, 2023 16:28
@msbutler msbutler force-pushed the butler-schedule-persist-cluster-id branch from 5a38df8 to 6b1fe64 Compare October 3, 2023 15:48
Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: , just one minor comment

Reviewed 27 of 27 files at r2, 26 of 26 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @dt, @miretskiy, and @msbutler)


pkg/jobs/jobstest/utils.go line 151 at r3 (raw file):

// DummyClusterVersion is used while instantiating dummy schedules
var DummyClusterVersion = clusterversion.ClusterVersion{Version: clusterversion.ByKey(clusterversion.V23_2)}

Maybe just use TestingBinaryVersion here?

@msbutler msbutler force-pushed the butler-schedule-persist-cluster-id branch from 6b1fe64 to bfefbe7 Compare October 4, 2023 18:33
@msbutler msbutler requested review from a team as code owners October 4, 2023 18:33
@msbutler msbutler requested review from herkolategan and smg260 and removed request for a team October 4, 2023 18:33
This patch adds two new fields to the ScheduleDetails proto, apart of the
schedule record:

ClusterID: the cluster that created the schedule. Note that in a future PR,
we will allow the backup schedule to alter this field on resume.
CreationClusterVersion: the cluster version the schedule was created on.
This patch ensures that these fields must be set during schedule creation.

In the future, schedules after RESTORE and C2C cutover will pause themselves to
prevent conflicting with schedules potentially running on the backup/source
cluster.

Informs cockroachdb#108028

Release note: None
@msbutler msbutler force-pushed the butler-schedule-persist-cluster-id branch from bfefbe7 to 4a9b05e Compare October 4, 2023 18:35
@msbutler
Copy link
Collaborator Author

msbutler commented Oct 4, 2023

TFTRs!!

bors r=adityamaru

@craig
Copy link
Contributor

craig bot commented Oct 4, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Oct 4, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants