Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add omdb db validate subcommands #4376

Merged
merged 11 commits into from
Nov 20, 2023
Merged

Conversation

jmpesp
Copy link
Contributor

@jmpesp jmpesp commented Oct 27, 2023

In order to diagnose if customer-support#57 is caused by not having the fixes for omicron#3866, add a few commands to omdb to validate the contents of the database:

$ omdb db validate --help
Validate the contents of the database

Usage: omdb db validate <COMMAND>

Commands:
  validate-volume-references     Validate each `volume_references` column in the region snapshots table
  find-deleted-region-snapshots  Find region snapshots that the corresponding Crucible agent says were deleted
  help                           Print this message or the help of the given subcommand(s)

Options:
  -h, --help  Print help

This commit also adds an environment variable OMDB_FETCH_LIMIT, which overrides the default fetch limit.

In order to diagnose if customer-support#57 is caused by _not_ having
the fixes for omicron#3866, add a few commands to omdb to validate the
contents of the database:

    $ omdb db validate --help
    Validate the contents of the database

    Usage: omdb db validate <COMMAND>

    Commands:
      validate-volume-references     Validate each `volume_references` column in the region snapshots table
      find-deleted-region-snapshots  Find region snapshots that the corresponding Crucible agent says were deleted
      help                           Print this message or the help of the given subcommand(s)

    Options:
      -h, --help  Print help

This commit also adds an environment variable OMDB_FETCH_LIMIT, which
overrides the default fetch limit.
@jmpesp jmpesp requested a review from leftwo October 27, 2023 18:18
Copy link
Contributor

@leftwo leftwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it (and we need it) Just some small suggestions

@@ -342,6 +370,21 @@ impl DbArgs {
cmd_db_eips(&opctx, &datastore, self.fetch_limit, *verbose)
.await
}
DbCommands::Validate(ValidateArgs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for keeping these in ABC order.. Looks like either myself or someone else got S, I, and N mixed up.

.await?
.transaction_async(|conn| async move {
// Selecting all region snapshots requires a full table scan
conn.batch_execute_async(ALLOW_FULL_TABLE_SCAN_SQL).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL
But, is this allowing database crimes that otherwise might not be allowed?
We should ping @davepacheco , the DB supreme court justice, for deciding what constitutes a
database crime.


if matching_volumes != region_snapshot.volume_references as usize {
eprintln!(
"region snapshot {} {} {} has {} volume references when it should be {}!",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could improve this output to take fewer columns, and give us more info.
Right now it looks like:

support@oxz_switch:/tmp$ ./omdb-james-2 db validate validate-volume-references
region snapshot 02453b64-2f3e-400d-a1f3-bbc514030ac3 6364b41e-9a48-4f5b-a2ab-214ef30580a6 bf2b8cfd-bdf0-4f45-855a-a8df335b5722 has 1 volume references when it should be 2!
region snapshot 0725ca78-4538-4017-ab73-54f20884148d 3c6a347d-47e7-4738-8f27-bbaef9018a78 991412d9-1f0f-4765-9020-c8b1941ba63b has 0 volume references when it should be 21!
region snapshot 10318161-88e3-4082-96d2-f8f741746332 4b979c30-4b23-4f3b-b950-5118276b8e19 88d97ec1-1abc-41ed-908b-8616740f1b4c has 1 volume references when it should be 6!

What if instead it looked like:

DATASET_ID                           REGION_ID                            SNAPSHOT_ID                    EXPECTED_REFS FOUND_REFS
02453b64-2f3e-400d-a1f3-bbc514030ac3 6364b41e-9a48-4f5b-a2ab-214ef30580a6 bf2b8cfd-bdf0-4f45-855a-a8df335b5722 2             1
0725ca78-4538-4017-ab73-54f20884148d 3c6a347d-47e7-4738-8f27-bbaef9018a78 991412d9-1f0f-4765-9020-c8b1941ba63b 21            0
10318161-88e3-4082-96d2-f8f741746332 4b979c30-4b23-4f3b-b950-5118276b8e19 88d97ec1-1abc-41ed-908b-8616740f1b4c 6             1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, done in b3f44e4

// before the snapshot's volume is inserted into the DB. There's a
// time between these operations that this function would flag that
// this region snapshot should have `deleting` set to true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, in the for loop you are printing two different things... maybe the same header, but add a DELETING column so we know the difference between the two possible outputs? That's probably better than looping through the table twice, though that might be better to look at... LMKWYT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed that output in b3f44e4

.await?
.transaction_async(|conn| async move {
// Selecting all datasets and region snapshots requires a full table scan
conn.batch_execute_async(ALLOW_FULL_TABLE_SCAN_SQL).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR record for FULL_TABLE_SCAN 🏆

// resource records.

eprintln!(
"region snapshot {} {} {} at {} was deleted, please remove its record",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one could use a header for DATASET REGION SNAPSHOT or some such to describe these three.

Seeing as the pattern in repeated below, but just the error changes, you could probably always print the triplet, then just customize the message depending on what the error was.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, changed in 8b12005

@jmpesp jmpesp merged commit 7f42bc9 into oxidecomputer:main Nov 20, 2023
16 checks passed
@jmpesp jmpesp deleted the omdb_db_validate branch November 20, 2023 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants