Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reapply "bootstrap: create an explicit zoneconfig for timeseries data" #128032

Merged
merged 2 commits into from
Aug 15, 2024

Conversation

rafiss
Copy link
Collaborator

@rafiss rafiss commented Jul 31, 2024

This partially reverts commit f9d47ce.

This time, rather than copying over the full zone config for the default range to the timeseries range, we only copy over gc.ttlseconds. This means that other attributes, most notably the replication factor, will be inherited from the default range.

This will make it more clear that the timeseries zone config can be changed independently from all the other zone configs.

fixes #123762

Release note (ops change): New clusters that are initialized for the first time will now have a zone config defined for the timeseries range. This zone config only specifies the gc.ttlseconds, so all the other attributes are inherited from the zone config of the default range, as they were before.

Clusters that are upgraded to v24.3 from a previous version will also
have this zone configuration applied to the timeseries range, as long as
that range does not already have a zone config.

@rafiss rafiss requested review from a team as code owners July 31, 2024 19:59
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rafiss rafiss force-pushed the reapply-timeseries-zone branch 2 times, most recently from f80cd42 to ce99667 Compare August 1, 2024 17:43
@rafiss rafiss requested a review from a team as a code owner August 1, 2024 17:43
@rafiss rafiss requested review from DarrylWong and renatolabs and removed request for a team August 1, 2024 17:43
@rafiss
Copy link
Collaborator Author

rafiss commented Aug 1, 2024

I started a roachtest run here to make sure the slow-drain/duration=1m0s test passes without needing any changes (last time #127618 was needed, which is undesirable).

https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestNightlyGceBazel/16293779?hideProblemsFromDependencies=false&hideTestsFromDependencies=false


Edit: the test passed

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo comments.

@rafiss did you have thoughts on whether this change needs to come with a migration or not? I don't have strong feelings either way, but as is, this introduces a difference between clusters that are bootstrapped on 24.3 and clusters that were upgraded to it. cc @andrewbaptist maybe you have thoughts here as well.

Reviewed 12 of 12 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist, @DarrylWong, @rafiss, and @renatolabs)


pkg/cmd/roachtest/tests/gossip.go line 403 at r1 (raw file):

	run(`ALTER RANGE meta %[1]s CONFIGURE ZONE %[2]s 'constraints: {"-rack=0"}'`)
	run(`ALTER RANGE liveness %[1]s CONFIGURE ZONE %[2]s 'constraints: {"-rack=0"}'`)
	run(`ALTER RANGE timeseries %[1]s CONFIGURE ZONE %[2]s 'constraints: {"-rack=0"}'`)

Do we need this change? Would we expect RANGE DEFAULTs constraints to apply to timeseries as well?

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you have thoughts on whether this change needs to come with a migration or not?

I felt like it was safer not to include a migration. If a customer has already configured the timeseries range (which we may very well have advised them to do while supporting them), then running an upgrade migration would not be safe, or at best, would be surprising since it would clear out any customizations they have made.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist, @arulajmani, @DarrylWong, and @renatolabs)


pkg/cmd/roachtest/tests/gossip.go line 403 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Do we need this change? Would we expect RANGE DEFAULTs constraints to apply to timeseries as well?

this change was needed to make this test pass.

for that matter, the line for TABLE system.public.replication_stats also is needed, even though the zone config for that table also does not specify any constraints and should inherit.

there must be something in this test setup that ends up requiring each zone config to be updated.

@rafiss rafiss requested a review from arulajmani August 7, 2024 19:40
Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a customer has already configured the timeseries range (which we may very well have advised them to do while supporting them), then running an upgrade migration would not be safe, or at best, would be surprising since it would clear out any customizations they have made.

I was imagining that the migration wouldn't modify anything if the timeseries range was unset; we'd only add this explicit entry in system.zones if there wasn't one already.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist, @DarrylWong, and @renatolabs)

@rafiss rafiss force-pushed the reapply-timeseries-zone branch from ce99667 to 79fb797 Compare August 13, 2024 21:25
@rafiss rafiss requested review from a team as code owners August 13, 2024 21:25
@rafiss rafiss force-pushed the reapply-timeseries-zone branch 2 times, most recently from c5644af to 717c58a Compare August 14, 2024 14:52
@rafiss rafiss requested a review from arulajmani August 14, 2024 14:52
@rafiss
Copy link
Collaborator Author

rafiss commented Aug 14, 2024

I was imagining that the migration wouldn't modify anything if the timeseries range was unset; we'd only add this explicit entry in system.zones if there wasn't one already.

I've added a migration that does this. PTAL.

@rafiss rafiss force-pushed the reapply-timeseries-zone branch from 717c58a to a08397a Compare August 14, 2024 15:54
Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, modulo question about backup/restore.

Reviewed 2 of 3 files at r2, 17 of 17 files at r4, 17 of 17 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andrewbaptist, @DarrylWong, @rafiss, and @renatolabs)


pkg/upgrade/upgrades/upgrades.go line 107 at r5 (raw file):

		clusterversion.V24_3_AddTimeseriesZoneConfig.Version(),
		addTimeseriesZoneConfig,
		upgrade.RestoreActionNotRequired("this zone config isn't necessary for restore"),

I may be missing something, but I would have expected there to be a need to add this zone configuration on the restore path as well. My understanding was that we backup and restore zone configurations, so we'd need to do handle adding this zone configuration on the restore path as well.

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andrewbaptist, @arulajmani, @DarrylWong, and @renatolabs)


pkg/upgrade/upgrades/upgrades.go line 107 at r5 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I may be missing something, but I would have expected there to be a need to add this zone configuration on the restore path as well. My understanding was that we backup and restore zone configurations, so we'd need to do handle adding this zone configuration on the restore path as well.

i assumed we didn't do that during restores, since the current restore logic also does not add in zone configs for other ranges/objects that may not have zone configs in old versions. for example, it doesn't add in zone configs for the system database or other tables that have zone configs added at bootstrap: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/catalog/bootstrap/metadata.go#L537-L541

right now it just copies the zones configs from whatever is in the backup.

i can easily change it to add the timeseries zone config, if we want to treat it differently for some reason. let me know what you think makes sense.

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Thanks for continuing to push on this!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist, @DarrylWong, @rafiss, and @renatolabs)


pkg/upgrade/upgrades/upgrades.go line 107 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i assumed we didn't do that during restores, since the current restore logic also does not add in zone configs for other ranges/objects that may not have zone configs in old versions. for example, it doesn't add in zone configs for the system database or other tables that have zone configs added at bootstrap: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/catalog/bootstrap/metadata.go#L537-L541

right now it just copies the zones configs from whatever is in the backup.

i can easily change it to add the timeseries zone config, if we want to treat it differently for some reason. let me know what you think makes sense.

Ack. I don't think there's a good reason to treat this differently. I'm just surprised that's how the current restore logic behaves. Would you mind filing an issue with DR if you agree?

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist, @arulajmani, @DarrylWong, and @renatolabs)


pkg/upgrade/upgrades/upgrades.go line 107 at r5 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Ack. I don't think there's a good reason to treat this differently. I'm just surprised that's how the current restore logic behaves. Would you mind filing an issue with DR if you agree?

sure thing! filed #129084

@rafiss rafiss force-pushed the reapply-timeseries-zone branch from a08397a to 3690cbb Compare August 15, 2024 20:28
rafiss added 2 commits August 15, 2024 16:58
This partially reverts commit f9d47ce.

This time, rather than copying over the full zone config for the default
range to the timeseries range, we only copy over gc.ttlseconds. This
means that other attributes, most notably the replication factor, will
be inherited from the default range.

This will make it more clear that the timeseries zone config can be
changed independently from all the other zone configs.

Release note (ops change): New clusters that are initialized for the
first time will now have a zone config defined for the `timeseries` range.
This zone config only specifies the gc.ttlseconds, so all the other
attributes are inherited from the zone config of the `default` range, as
they were before.

Clusters that are upgraded to v24.3 from a previous version will also
have this zone configuration applied to the timeseries range, as long as
that range does not already have a zone config.
When upgrading to 24.3, the upgrade will create an explicit zone
configuration for the timeseries range if one does not yet exist.
It will match the zone config that is created during cluster bootstrap.

The release note for this is in the previous commit.

Release note: None
@rafiss rafiss force-pushed the reapply-timeseries-zone branch from 3690cbb to 0ae91a1 Compare August 15, 2024 20:58
Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andrewbaptist, @arulajmani, @DarrylWong, and @renatolabs)


pkg/cmd/roachtest/tests/gossip.go line 403 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

this change was needed to make this test pass.

for that matter, the line for TABLE system.public.replication_stats also is needed, even though the zone config for that table also does not specify any constraints and should inherit.

there must be something in this test setup that ends up requiring each zone config to be updated.

actually, i tracked this down too -- by initializing the zone config with zonepb.NewZoneConfig() rather than with &zonepb.ZoneConfig{} in the bootstrap code, we get the proper inheritance, so this test change isn't needed after all.

@rafiss
Copy link
Collaborator Author

rafiss commented Aug 15, 2024

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 15, 2024

Build failed (retrying...):

@craig craig bot merged commit ec66ae7 into cockroachdb:master Aug 15, 2024
22 of 23 checks passed
@rafiss rafiss deleted the reapply-timeseries-zone branch August 16, 2024 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: explicit zone configuration for the timeseries range on a new cluster
3 participants