jobs: set clusterID and clusterVersion during schedule creation #111374
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.
Force-pushed 1b41a39 to 69654f4
Force-pushed 08ea23a to a3cf71d
Overall LGTM, just a few comments
	"github.com/cockroachdb/errors"
)

func ensureSQLSchemaTelemetrySchedule(
	ctx context.Context, cs clusterversion.ClusterVersion, d upgrade.TenantDeps,
) error {
	return d.DB.Txn(ctx, func(ctx context.Context, txn isql.Txn) error {
		// Since this Schedule Creation job is expected to fail before the schedule
I'm not sure I follow this — isn't this expected to fail only if a telemetry schedule already exists? In other cases this runs as a migration to auto-heal the cluster, right?
good point. I rephrased the comment.
pkg/jobs/scheduled_job.go
if j.rec.ScheduleDetails.CreationClusterVersion.Equal(clusterversion.ClusterVersion{}) {
	return errors.New("scheduled job created without a cluster version")
}
if j.rec.ScheduleDetails.CreationClusterVersion.Less(clusterversion.ByKey(clusterversion.V23_2)) {
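The check in the diff above rejects a schedule record whose creation version was never set. A minimal, self-contained sketch of the same idea — the `Version` and `ScheduleDetails` types here are simplified stand-ins for CockroachDB's `clusterversion.ClusterVersion` and the `ScheduleDetails` proto, not the real API:

```go
package main

import (
	"errors"
	"fmt"
)

// Version is a simplified stand-in for clusterversion.ClusterVersion.
type Version struct {
	Major, Minor int32
}

// Less reports whether v precedes other.
func (v Version) Less(other Version) bool {
	if v.Major != other.Major {
		return v.Major < other.Major
	}
	return v.Minor < other.Minor
}

// ScheduleDetails mirrors the two fields added by this patch.
type ScheduleDetails struct {
	ClusterID              string
	CreationClusterVersion Version
}

// validateCreationVersion rejects a record persisted without a
// cluster version, and one whose version predates the gate.
func validateCreationVersion(d ScheduleDetails, minGate Version) error {
	if d.CreationClusterVersion == (Version{}) {
		return errors.New("scheduled job created without a cluster version")
	}
	if d.CreationClusterVersion.Less(minGate) {
		return fmt.Errorf("cluster version %v predates gate %v",
			d.CreationClusterVersion, minGate)
	}
	return nil
}

func main() {
	gate := Version{Major: 23, Minor: 2}
	fmt.Println(validateCreationVersion(ScheduleDetails{}, gate))
	fmt.Println(validateCreationVersion(
		ScheduleDetails{CreationClusterVersion: gate}, gate))
}
```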
Is this strictly necessary? Won't older binaries simply not have those fields on the proto they unmarshal the column into? What would make me feel confident here is a mixed-version logic test that:
- creates a schedule on master and loads it on 23.1
- creates a schedule on 23.1 and loads it on master
Good callout. I added a mixed-version test for your perusal.
I version-gated the writing of the clusterID and version because in a mixed-version state, older nodes could overwrite these new fields — e.g. ALTER BACKUP SCHEDULE ... SET SCHEDULE OPTION on_previous_running = 'wait'; run from an old node would clobber them. It just seems cleaner to me for these fields to only populate once we're fully upgraded.
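The version-gated write described above can be sketched as follows. This is a hedged illustration only — `Version`, `ScheduleDetails`, and `maybeSetCreationMetadata` are hypothetical simplifications, not the names used in the PR:

```go
package main

import "fmt"

// Version is a hypothetical stand-in for clusterversion.ClusterVersion.
type Version struct {
	Major, Minor int32
}

// AtLeast reports whether v is at or past other.
func (v Version) AtLeast(other Version) bool {
	return v.Major > other.Major ||
		(v.Major == other.Major && v.Minor >= other.Minor)
}

// ScheduleDetails mirrors the two new fields on the schedule record.
type ScheduleDetails struct {
	ClusterID              string
	CreationClusterVersion Version
}

// maybeSetCreationMetadata only populates the new fields once the
// cluster has finalized the upgrade. In a mixed-version state an old
// node rewriting the record (e.g. via ALTER BACKUP SCHEDULE) would
// silently drop them, so we leave them empty until then.
func maybeSetCreationMetadata(d *ScheduleDetails, active, gate Version, clusterID string) {
	if !active.AtLeast(gate) {
		return // still mixed-version: skip writing the new fields
	}
	d.ClusterID = clusterID
	d.CreationClusterVersion = active
}

func main() {
	var d ScheduleDetails
	gate := Version{Major: 23, Minor: 2}
	maybeSetCreationMetadata(&d, Version{Major: 23, Minor: 1}, gate, "abc")
	fmt.Printf("mixed-version: %+v\n", d) // fields left empty
	maybeSetCreationMetadata(&d, Version{Major: 23, Minor: 2}, gate, "abc")
	fmt.Printf("finalized:     %+v\n", d) // fields populated
}
```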
pkg/jobs/scheduled_job_test.go
// Init a mixed version schedule
sjMixed := h.newScheduledJob(t, "mixed_test_job", "test sql")

// In a mixed version state, clusterID and clusterVersion fields do not populate.
Hm, so this is not actually exercising the protoutil.Unmarshal call that a pre-23.2 binary would make on a scheduled job created on 23.2+, which is why I don't think this is enough. This is simply testing that we zero out the fields, which, as my other comment indicates, I'm not sure is needed: when an older-version binary unmarshals protobuf bytes written by a newer version, the added fields are simply ignored.
Can we maybe replace this with a mixed-version logictest instead?
mixed version test added. Given my above comment on version gating the writing of these fields, I still think this test case is necessary.
Force-pushed 1cf827b to 5a38df8
Force-pushed 5a38df8 to 6b1fe64
Reviewed 27 of 27 files at r2, 26 of 26 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @dt, @miretskiy, and @msbutler)
pkg/jobs/jobstest/utils.go
line 151 at r3 (raw file):
// DummyClusterVersion is used while instantiating dummy schedules.
var DummyClusterVersion = clusterversion.ClusterVersion{Version: clusterversion.ByKey(clusterversion.V23_2)}
Maybe just use TestingBinaryVersion here?
Force-pushed 6b1fe64 to bfefbe7
This patch adds two new fields to the ScheduleDetails proto, a part of the schedule record:
- ClusterID: the cluster that created the schedule. Note that in a future PR, we will allow the backup schedule to alter this field on resume.
- CreationClusterVersion: the cluster version the schedule was created on.

This patch ensures that these fields must be set during schedule creation. In the future, schedules after RESTORE and C2C cutover will pause themselves to prevent conflicting with schedules potentially running on the backup/source cluster.

Informs cockroachdb#108028

Release note: None
Force-pushed bfefbe7 to 4a9b05e
TFTRs!! bors r=adityamaru
Build failed (retrying...)
Build succeeded
upgrade: pass ClusterID to tenantDeps
Release note: None
jobs: set clusterID and clusterVersion during schedule creation
This patch adds two new fields to the ScheduleDetails proto, a part of the
schedule record:
- ClusterID: the cluster that created the schedule. Note that in a future PR,
we will allow the backup schedule to alter this field on resume.
- CreationClusterVersion: the cluster version the schedule was created on.
This patch ensures that these fields must be set during schedule creation.
In the future, schedules after RESTORE and C2C cutover will pause themselves to
prevent conflicting with schedules potentially running on the backup/source
cluster.
Informs #108028
Release note: None