Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrade: automatically bump the system DB version #123084

Merged

Conversation

rafiss
Copy link
Collaborator

@rafiss rafiss commented Apr 25, 2024

Rather than making authors of an upgrade remember that they should bump
the version of the system DB schema, now it will be bumped automatically
after any upgrade step that has a migration or if upgrading to the final
cluster version for a release. Two tests are added:

  • TestSystemDatabaseSchemaBootstrapVersionBumped makes sure the
    SystemDatabaseSchemaBootstrapVersion is incremented whenever
    clusterversion.Latest changes AND if that version has an upgrade
    migration or is the final version in a release.
  • TestUpgradeHappensAfterMigrations is augmented to check that after all
    upgrades run, the actual system database schema version is equal to
    the SystemDatabaseSchemaBootstrapVersion.

Taken together, these tests ensure that the system database bootstrap
version is manually incremented in the code when it should be, and that
the automatic upgrade process correctly increments the version.

This change results in more work, since it means that the version will
be bumped when it is not strictly necessary to do so. But since it
makes it harder to forget a manual step, it seems worth doing.

fixes #121914
Release note: None

@rafiss rafiss requested a review from a team as a code owner April 25, 2024 19:53
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: @dt should take a look too in case we're missing something.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andyyang890, @fqazi, and @rafiss)


pkg/upgrade/upgrademanager/manager.go line 530 at r1 (raw file):

		}

		// Bump the version of the system database schema.

Should we do this above the interlock above? (that's what happened before, for the upgrades that did bump it)

Copy link
Collaborator

@andyyang890 andyyang890 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @fqazi and @rafiss)


pkg/sql/catalog/systemschema/system.go line 1212 at r1 (raw file):

func MakeSystemDatabaseDesc() catalog.DatabaseDescriptor {
	priv := privilege.List{privilege.CONNECT}
	latestVersion := clusterversion.Latest.Version()

If I recall correctly, I was thinking of doing something similar when I first added this field but someone (maybe it was @dt?) raised an objection over the fact that this would mean we'd need to regenerate the bootstrap testdata for every new version gate regardless of whether it changed the system schema. Is anyone still concerned by this?

@andyyang890 andyyang890 requested a review from dt April 26, 2024 06:04
Copy link
Collaborator

@andyyang890 andyyang890 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt, @fqazi, and @rafiss)


pkg/sql/catalog/systemschema/system.go line 1212 at r1 (raw file):

Previously, andyyang890 (Andy Yang) wrote…

If I recall correctly, I was thinking of doing something similar when I first added this field but someone (maybe it was @dt?) raised an objection over the fact that this would mean we'd need to regenerate the bootstrap testdata for every new version gate regardless of whether it changed the system schema. Is anyone still concerned by this?

I think it was this comment #111404 (comment).

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt, @fqazi, and @RaduBerinde)


pkg/sql/catalog/systemschema/system.go line 1212 at r1 (raw file):

Previously, andyyang890 (Andy Yang) wrote…

I think it was this comment #111404 (comment).

i did a test locally and added a new clusterversion, and i did not need to regenerate the bootstrap data. i may not understand why though -- is it because the bootstrap data just has the contents of the system database, but not the system database descriptor itself?


pkg/upgrade/upgrademanager/manager.go line 530 at r1 (raw file):

Previously, RaduBerinde wrote…

Should we do this above the interlock above? (that's what happened before, for the upgrades that did bump it)

sgtm

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt and @fqazi)


pkg/sql/catalog/systemschema/system.go line 1212 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i did a test locally and added a new clusterversion, and i did not need to regenerate the bootstrap data. i may not understand why though -- is it because the bootstrap data just has the contents of the system database, but not the system database descriptor itself?

The bootstrap data has all the KVs on a freshly bootstrapped cluster, it should include this version somewhere no? Otherwise how do we know what version we're reading..

Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt, @RaduBerinde, and @rafiss)


pkg/upgrade/upgrademanager/manager.go line 579 at r1 (raw file):

		}
		systemDBDesc.SystemDatabaseSchemaVersion = &cs.Version
		return txn.Descriptors().WriteDesc(ctx, false /* kvTrace */, systemDBDesc, txn.KV())

Just a general question should we worry about long running transactions potentially holding up the upgrade? We need to bump the system database descriptor version, so leases need to expire on the system database. Not sure if we want to make that operation high priority.

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt, @fqazi, and @RaduBerinde)


pkg/upgrade/upgrademanager/manager.go line 579 at r1 (raw file):

Previously, fqazi (Faizan Qazi) wrote…

Just a general question should we worry about long running transactions potentially holding up the upgrade? We need to bump the system database descriptor version, so leases need to expire on the system database. Not sure if we want to make that operation high priority.

that's a fair point. that question would also apply to all the upgrades themselves (example) that modify the system DB right?

Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt, @RaduBerinde, and @rafiss)


pkg/upgrade/upgrademanager/manager.go line 579 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

that's a fair point. that question would also apply to all the upgrades themselves (example) that modify the system DB right?

Yeah ,those have the same problem too

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt and @RaduBerinde)


pkg/sql/catalog/systemschema/system.go line 1212 at r1 (raw file):

Previously, RaduBerinde wrote…

The bootstrap data has all the KVs on a freshly bootstrapped cluster, it should include this version somewhere no? Otherwise how do we know what version we're reading..

yeah, i would expect that reasoning to hold too. still trying to figure out why TestInitialValuesToString is not affected

i also realized that some of the (yet-to-be-seen) churn can be reduced with:

		// Run the actual upgrade, if any.
		mig, exists := m.GetUpgrade(clusterVersion)
		if exists {
			if err := m.runMigration(ctx, mig, user, clusterVersion, !m.knobs.DontUseJobs); err != nil {
				return err
			}
		}

		// Bump the version of the system database schema.
		if exists || (clusterVersion.Equal(clusterversion.Latest.Version()) && clusterVersion.IsFinal()) {
			if err := bumpSystemDatabaseSchemaVersion(ctx, cv, m.deps.DB); err != nil {
				return err
			}
		}

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt and @RaduBerinde)


pkg/sql/catalog/systemschema/system.go line 1212 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

yeah, i would expect that reasoning to hold too. still trying to figure out why TestInitialValuesToString is not affected

i also realized that some of the (yet-to-be-seen) churn can be reduced with:

		// Run the actual upgrade, if any.
		mig, exists := m.GetUpgrade(clusterVersion)
		if exists {
			if err := m.runMigration(ctx, mig, user, clusterVersion, !m.knobs.DontUseJobs); err != nil {
				return err
			}
		}

		// Bump the version of the system database schema.
		if exists || (clusterVersion.Equal(clusterversion.Latest.Version()) && clusterVersion.IsFinal()) {
			if err := bumpSystemDatabaseSchemaVersion(ctx, cv, m.deps.DB); err != nil {
				return err
			}
		}

i think there was a PEBKAC earlier; TestInitialValuesToString does churn when i try this now.

and i also realized that my if exists || (clusterVersion.Equal(clusterversion.Latest.Version()) && clusterVersion.IsFinal()) comment is irrelevant to this part of the code, since here we are talking about bootstrapping, not upgrading.

so what do folks think now: should i avoid the churn, and make this PR only include the test proposed in #121914 (and avoid modifying MakeSystemDatabaseDesc())?

@rafiss rafiss force-pushed the test-system-database-version-upgrade branch from a1c32e4 to aa47896 Compare April 29, 2024 21:15
Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @fqazi, and @RaduBerinde)


pkg/sql/catalog/systemschema/system.go line 1212 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i think there was a PEBKAC earlier; TestInitialValuesToString does churn when i try this now.

and i also realized that my if exists || (clusterVersion.Equal(clusterversion.Latest.Version()) && clusterVersion.IsFinal()) comment is irrelevant to this part of the code, since here we are talking about bootstrapping, not upgrading.

so what do folks think now: should i avoid the churn, and make this PR only include the test proposed in #121914 (and avoid modifying MakeSystemDatabaseDesc())?

i gave that option a shot. PTAL at the latest revision.

@rafiss rafiss requested review from fqazi and andyyang890 April 29, 2024 21:16
Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andyyang890, @dt, @RaduBerinde, and @rafiss)

@rafiss rafiss force-pushed the test-system-database-version-upgrade branch 6 times, most recently from 57cdc4c to 5cdbde5 Compare May 2, 2024 18:53
Rather than making authors of an upgrade remember that they should bump
the version of the system DB schema, now it will be bumped automatically
after any upgrade step that has a migration or if upgrading to the final
cluster version for a release. Two tests are added:

- TestSystemDatabaseSchemaBootstrapVersionBumped makes sure the
  SystemDatabaseSchemaBootstrapVersion is incremented whenever
  clusterversion.Latest changes AND if that version has an upgrade
  migration or is the final version in a release.
- TestUpgradeHappensAfterMigrations is augmented to check that after all
  upgrades run, the actual system database schema version is equal to
  the SystemDatabaseSchemaBootstrapVersion.

Taken together, these tests ensure that the system database bootstrap
version is manually incremented in the code when it should be, and that
the automatic upgrade process correctly increments the version.

This change results in more work, since it means that the version will
be bumped when it is not strictly necessary to do so. But since it
makes it harder to forget a manual step, it seems worth doing.

Release note: None
@rafiss rafiss force-pushed the test-system-database-version-upgrade branch from 5cdbde5 to 614ce3e Compare May 2, 2024 21:18
Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

@craig craig bot merged commit 6ff7eca into cockroachdb:master May 3, 2024
22 checks passed
@rafiss rafiss deleted the test-system-database-version-upgrade branch May 7, 2024 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: add a test that verifies that the system database schema version is correct after upgrade
5 participants