upgrade: adding is_draining to system.sql_instance can fail #135737

Merged
merged 1 commit into from
Nov 20, 2024


@fqazi fqazi commented Nov 19, 2024

Previously, the logic to add the is_draining column to the system.sql_instances descriptor copied the bootstrap descriptor directly. This works for non-multi-region system databases but breaks for multi-region system databases, because bootstrap descriptors do not have the multi-region modifications applied on top. To address this, this change modifies the upgrade to use ALTER TABLE ADD COLUMN.

Fixes: #135736

Release note: None
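
A rough way to picture the difference between the two approaches is the toy model below. It is only a sketch: `tableDesc`, `bootstrapDesc`, `upgradeByCopy`, and `upgradeByAddColumn` are hypothetical names invented for illustration and are not CockroachDB types or functions. The `MultiRegion` flag stands in for whatever multi-region modifications exist on the stored descriptor.

```go
package main

import "fmt"

// tableDesc is a hypothetical, heavily simplified stand-in for a table
// descriptor. It is not CockroachDB's real descriptor type.
type tableDesc struct {
	Columns []string
	// MultiRegion stands in for the multi-region modifications that were
	// applied to the stored descriptor after bootstrap.
	MultiRegion bool
}

// bootstrapDesc returns the toy descriptor as it looks at bootstrap:
// it has the new column but no multi-region modifications.
func bootstrapDesc() tableDesc {
	return tableDesc{Columns: []string{"id", "addr", "is_draining"}}
}

// upgradeByCopy models the old approach: the stored descriptor is
// replaced wholesale, so anything applied after bootstrap is lost.
func upgradeByCopy(stored tableDesc) tableDesc {
	return bootstrapDesc()
}

// upgradeByAddColumn models the new approach: only the missing column is
// added, and the rest of the stored descriptor is left alone. Calling it
// again with the same column is a no-op.
func upgradeByAddColumn(stored tableDesc, col string) tableDesc {
	for _, c := range stored.Columns {
		if c == col {
			return stored
		}
	}
	out := stored
	out.Columns = append(append([]string{}, stored.Columns...), col)
	return out
}

func main() {
	stored := tableDesc{Columns: []string{"id", "addr"}, MultiRegion: true}
	fmt.Println("copy keeps multi-region state:", upgradeByCopy(stored).MultiRegion)
	fmt.Println("add column keeps multi-region state:", upgradeByAddColumn(stored, "is_draining").MultiRegion)
}
```

In this model the copy-based upgrade reports the multi-region state as lost, while the column-based upgrade preserves it.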

@fqazi fqazi added the backport-24.3.x and backport-24.3.0-rc labels Nov 19, 2024
@fqazi fqazi requested review from stevendanna and a team November 19, 2024 19:56
@fqazi fqazi requested review from a team as code owners November 19, 2024 19:56

@rafiss rafiss left a comment

can we write a test that tests this upgrade with a multiregion system DB?

) error {
finalDescriptor := systemschema.SQLInstancesTable()
// Replace the stored descriptor with the bootstrap descriptor.
Collaborator

is this comment outdated now?

Collaborator Author

Done.

@fqazi fqazi force-pushed the fixMigrationForSQLInstance branch from 50584f1 to c36ef72 Compare November 19, 2024 20:06

@fqazi fqazi left a comment

@rafiss This PR reintroduces TestMrSystemDatabaseUpgrade which does that for MR system databases. 24.3 has the same test but for some reason it didn't use Latest as the final version.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @stevendanna)

@fqazi fqazi requested a review from rafiss November 19, 2024 21:05
@rafiss rafiss requested a review from shubhamdhama November 19, 2024 21:44
rafiss commented Nov 19, 2024

[w0] 2024/11/19 20:34:18 test_runner.go:964: destroying cluster local [tag:] (1 nodes) because: acceptance/validate-system-schema-after-version-upgrade (1) - (validate_system_schema_after_version_upgrade.go:167).validateSystemSchemaAfterUpgradeTest: After upgrading, `USE system; SHOW CREATE ALL TABLES;` does not match expected output after version upgrade for system tenant: diff:
@@ -305,13 +305,11 @@
 	locality JSONB NULL,
 	sql_addr STRING NULL,
 	crdb_region BYTES NOT NULL,
 	binary_version STRING NULL,
 	is_draining BOOL NULL,
-	CONSTRAINT "primary" PRIMARY KEY (crdb_region ASC, id ASC),
-	FAMILY "primary" (id, addr, session_id, locality, sql_addr, crdb_region, binary_version),
-	FAMILY fam_8_is_draining (is_draining)
+	CONSTRAINT "primary" PRIMARY KEY (crdb_region ASC, id ASC)
 );

looks like we need to teach the migration to add the column to the correct column family.

@fqazi fqazi force-pushed the fixMigrationForSQLInstance branch from c36ef72 to ae3184d Compare November 19, 2024 23:06

@fqazi fqazi left a comment

@rafiss This should be fixed now. I made a bad assumption that it would just pick the only family. I re-ran the test locally to confirm.
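
The family mismatch in the diff above can be pictured with a small toy model. This is a sketch under assumptions, not CockroachDB's allocation logic: `family` and `addColumn` are hypothetical, and the `fam_<id>_<column>` naming simply mirrors the `fam_8_is_draining` name that appears in the diff. The point it illustrates is that a column added without a target family can end up in a family of its own, while naming the family explicitly keeps it in `"primary"`.

```go
package main

import "fmt"

// family is a hypothetical, simplified column family: a name plus the
// columns stored in it.
type family struct {
	Name    string
	Columns []string
}

// addColumn is a toy model, not CockroachDB's real family allocation.
// With an empty target it places the column in a new family named
// fam_<id>_<column>; otherwise it appends the column to the named family.
// The input slice is copied so the caller's families are not modified.
func addColumn(fams []family, col string, id int, target string) ([]family, error) {
	out := make([]family, len(fams))
	for i, f := range fams {
		out[i] = family{Name: f.Name, Columns: append([]string{}, f.Columns...)}
	}
	if target == "" {
		name := fmt.Sprintf("fam_%d_%s", id, col)
		return append(out, family{Name: name, Columns: []string{col}}), nil
	}
	for i := range out {
		if out[i].Name == target {
			out[i].Columns = append(out[i].Columns, col)
			return out, nil
		}
	}
	return nil, fmt.Errorf("unknown family %q", target)
}

func main() {
	fams := []family{{Name: "primary", Columns: []string{"id", "addr"}}}

	implicit, _ := addColumn(fams, "is_draining", 8, "")
	explicit, _ := addColumn(fams, "is_draining", 8, "primary")

	fmt.Println("no family given:", implicit)
	fmt.Println("family given:   ", explicit)
}
```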

@stevendanna stevendanna left a comment

Thanks for picking this up!

Left some observations but no blocking changes.

mutableDesc.Version = version
return txn.Descriptors().WriteDesc(ctx, false, mutableDesc, txn.KV())
_, err := txn.Exec(ctx, "add-draining-column", txn.KV(),
`ALTER TABLE system.sql_instances ADD COLUMN IF NOT EXISTS is_draining BOOL NULL FAMILY "primary"`)
Collaborator

Question: We have a helper function called migrateTable that does a lot of work that I sort of assume the schema changer already does for us. I suppose we only need that when we want to be idempotent based on some property other than the existence of a column.

TIL: I'm rather surprised we had to specify family here since you wouldn't need to for an ordinary table. But it looks like we have a special case in allocateColumnFamilyIDs for the system db.

Collaborator Author

Done.

It's a bit safer to use that, so I switched the code over to it. The advantage is that it also waits for pending schema changes (in case something crashes).

Comment on lines +585 to +588
// Disable license enforcement for this test.
for _, s := range cluster.Servers {
	s.ExecutorConfig().(sql.ExecutorConfig).LicenseEnforcer.Disable(ctx)
}
Contributor

Question: I'm guessing this might be needed outside ccl/multiregionccl package but not here, right?

Collaborator Author

One of the upgrades in 24.3 validates whether a valid license is set, so this bypasses that check for the upgrade testing. The other option was generating a license in the unit test, but that seems like overkill.

@shubhamdhama
Contributor

Also, thanks for taking over this.

@fqazi fqazi force-pushed the fixMigrationForSQLInstance branch from ae3184d to 0509190 Compare November 20, 2024 14:15
fqazi commented Nov 20, 2024

@rafiss @stevendanna @shubhamdhama TFTR!

bors r+

@craig craig bot merged commit 873e633 into cockroachdb:master Nov 20, 2024
22 of 23 checks passed
blathers-crl bot commented Nov 20, 2024

Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches.


Issue #135736: branch-release-24.3.0-rc.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl bot commented Nov 20, 2024

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 0509190 to blathers/backport-release-24.3-135737: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 24.3.x failed. See errors above.


error creating merge commit from 0509190 to blathers/backport-release-24.3.0-rc-135737: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 24.3.0-rc failed. See errors above.


Labels
backport-24.3.x Flags PRs that need to be backported to 24.3

Successfully merging this pull request may close these issues.

upgrade: adding is_draining to sql_instances fails on MR system database
5 participants