upgrades: assert descriptor repair has correct set of targets #112700
OK I have pushed this to illustrate a confusion: in my I wanted to leave this comment just in case my test is just somehow incorrect.
05b8453 to 1873b7c
@@ -56,6 +56,9 @@ func FirstUpgradeFromRelease(
	var batch catalog.DescriptorIDSet
	const batchSize = 1000
	if err := all.ForEachDescriptor(func(desc catalog.Descriptor) error {
		if desc.GetName() == "foo" {
so it does seem correct that this line would be reached, but for any descriptor that is not corrupt, the check right after this one should cause the repair to short-circuit. i think that should be good enough -- even the invalid_objects and repairable_catalog_corruptions tables are populated by first scanning all descriptors, then filtering only for the ones that have issues.
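The scan-then-filter behavior described in this comment can be sketched roughly as below. This is only an illustration under assumptions: `descriptor`, its `corrupt` flag, and `repairTargets` are hypothetical stand-ins, not the actual catalog types or upgrade code.

```go
package main

import "fmt"

// descriptor is a hypothetical stand-in for catalog.Descriptor.
type descriptor struct {
	id      int
	name    string
	corrupt bool // hypothetical flag; real corruption is detected by validation
}

// repairTargets visits every descriptor but returns only the IDs of the
// corrupt ones, so healthy descriptors are skipped right after being reached.
func repairTargets(all []descriptor) []int {
	var targets []int
	for _, d := range all {
		if !d.corrupt {
			continue // short-circuit: nothing to repair
		}
		targets = append(targets, d.id)
	}
	return targets
}

func main() {
	all := []descriptor{
		{id: 1, name: "system_table"},
		{id: 104, name: "foo"},
		{id: 105, name: "bar", corrupt: true},
	}
	fmt.Println(repairTargets(all)) // prints [105]
}
```

Every descriptor is reached by the loop, but only the corrupt one ends up in the set of targets.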
Ah - ok so I left this in here (probably should've commented about this earlier) because I only wanted to debug the non-corrupt table "foo", and this was the way I found to do that while skipping all of the system(?) descriptors. However, the check right after this one does not cause the repair to short-circuit :(
So my confusion is: is it my testing (in TestUpgradeNoCorruption) that is causing this issue? I shall make a loom tomorrow and send it to you to be more illustrative :^)
pairing up would be great!
2a64760 to 38d4419
OK! updated tests reflect new discoveries. wish there was a facepalm emote on github.
the precondition check (FirstUpgradeFromReleasePrecondition) is actually what will decide whether or not a descriptor can be repaired (for now, we only repair kv_repairable_catalog_corruptions). tests now reflect that not-corrupt descriptors do not show up in kv_repairable_catalog_corruptions and will thus not get automatically repaired.
here's something i found while writing said test: function descriptors get their version incremented regardless of "repaired corruption". thus, if we are looking at a repairable function descriptor, we actually end up increasing its version twice: once during the repair in UnsafeUpsertDescriptor and once more during upgrades.grantExecuteToPublicOnAllFunctions -> WriteDescToBatch.
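To make the double increment concrete, here is a toy model of the two writes. All names (`funcDesc`, `writeDesc`, `repair`, `upgrade`) are hypothetical and only mimic the version bookkeeping described above; they are not the real upgrade code.

```go
package main

import "fmt"

// funcDesc is a hypothetical stand-in for a function descriptor.
type funcDesc struct {
	version int
}

// writeDesc models any descriptor write; each write bumps the version once.
func writeDesc(d *funcDesc) { d.version++ }

// repair models the write performed while repairing a corrupt descriptor.
func repair(d *funcDesc) { writeDesc(d) }

// grantExecuteToPublic models the unconditional grant step of the upgrade.
func grantExecuteToPublic(d *funcDesc) { writeDesc(d) }

// upgrade repairs only corrupt descriptors, then always runs the grant step.
func upgrade(d *funcDesc, corrupt bool) {
	if corrupt {
		repair(d)
	}
	grantExecuteToPublic(d)
}

func main() {
	healthy, broken := &funcDesc{version: 1}, &funcDesc{version: 1}
	upgrade(healthy, false)
	upgrade(broken, true)
	fmt.Println(healthy.version, broken.version) // prints 2 3
}
```

In this model a healthy function goes from version 1 to 2, while a repaired one goes from 1 to 3.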
db8002d to 62827f3
did we also need that change in first_upgrade.go, where we'd skip descriptors where the only change is SetModTimeToMVCCTimestamp?
Couple of notes:
This is also something that I looked at - it seems as if this check was no different than the one we had before (I think the fact that I was incorrectly testing in the same manner as FirstUpgradeFromRelease led us to the belief that it would be different). In the new tests, regardless of a corrupt or not-corrupt descriptor, we see that SetModTimeToMVCCTimestamp is the only post-deserialization change to exist; so, if we still had this check, corrupt descriptors that were repaired would not get upgraded.
However, this still brings us back to the first problem we looked at, which is: this upgrade process seems to upgrade descriptors that haven't changed :( Is there any other way we can check other than PostDeserializationChanges? Maybe we can make note of a descriptor popping up in kv_repairable_catalog_preconditions (since it will probably be gone from this table once the precondition repairs it) and then have a membership check as the guard to upgrading?
Which check from before are you referring to?
That seems off to me. The check for
Making the change to check |
"No different from the one we had before" is a bad way to put it - I meant "it is still not looking at the right group of descriptors"
Sorry, I think I am using my words too loosely - yes, they would still repair corruptions. I meant updated as in upgradeDescriptors - assuming that the same check that exists in FirstUpgradeFromRelease will be the same check that exists in upgradeDescriptors (I believe that these two checks should remain consistent, but I could be wrong)
I think that this belief is what is causing confusion? If we make the change to check
UPDATE: for anyone that is reading this/future me - this was an oversight in my reading of the code! I did not see that batch was the only set of descriptors that would be passed to upgradeDescriptors
62827f3 to 8ef1930
This patch adds a test to assert that during automated repair of corrupt descriptors, we do not try to repair a descriptor that is not corrupted.
Epic: none
Fixes: cockroachdb#110906
Release note: None
8ef1930 to 64c1f6c
lgtm! nice work
Reviewed 2 of 4 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @annrpom)
TFTR! ('-')7
bors r=rafiss
Build succeeded:
blathers backport 23.2
This patch adds a test to assert that during automated repair of
corrupt descriptors, we do not try to repair a descriptor that
is not corrupted.
This includes an addition to the check for post-deserialization changes in the upgrade logic. We found that during automatic repair, the descriptor is already updated in place by unsafe_upsert_descriptor (in WriteDescToBatch), which made the previous check somewhat redundant. The check is therefore stricter now: it short-circuits when the only post-deserialization change is SetModTimeToMVCCTimestamp, so descriptors that have already been updated are not updated again.
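A minimal sketch of such a stricter check, under assumptions: the `change` type, its constants, and `needsRewrite` are hypothetical illustrations of the idea and do not reproduce the catalog package's actual post-deserialization change API.

```go
package main

import "fmt"

// change is a hypothetical enum of post-deserialization changes.
type change int

const (
	setModTimeToMVCCTimestamp change = iota
	otherChange // placeholder for any other kind of change
)

// needsRewrite reports whether a descriptor should be written again.
// It returns false when there are no changes, or when the only change
// present is setModTimeToMVCCTimestamp.
func needsRewrite(changes map[change]bool) bool {
	for c, present := range changes {
		if present && c != setModTimeToMVCCTimestamp {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(needsRewrite(nil)) // prints false
	fmt.Println(needsRewrite(map[change]bool{
		setModTimeToMVCCTimestamp: true,
	})) // prints false
	fmt.Println(needsRewrite(map[change]bool{
		setModTimeToMVCCTimestamp: true,
		otherChange:               true,
	})) // prints true
}
```

Only a descriptor carrying some change besides the timestamp update is considered worth writing again.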
While testing, we discovered that function descriptors (regardless of corruption/repair) had their descriptor version increased because grantExecuteToPublicOnAllFunctions is called on each function during a cluster upgrade; however, the functions we were testing already had execute privileges for public. Thus, a check was added to that logic so that functions in this situation (where public already has the execute privilege on the function) do not try to grant execute again.
Epic: none
Fixes: #110906
Release note: None
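The guard on the grant step described in this commit message could look roughly like the sketch below. It is an illustration only: `funcDesc`, its `privileges` map, and this `grantExecuteToPublic` are hypothetical and do not mirror the real privilege descriptor types.

```go
package main

import "fmt"

// funcDesc is a hypothetical function descriptor with a privilege map
// from grantee name to the set of privileges held.
type funcDesc struct {
	version    int
	privileges map[string]map[string]bool
}

// grantExecuteToPublic grants EXECUTE to public unless it is already held.
// It reports whether the descriptor was written (and its version bumped).
func grantExecuteToPublic(d *funcDesc) bool {
	if d.privileges["public"]["EXECUTE"] {
		return false // already granted: skip the write entirely
	}
	if d.privileges == nil {
		d.privileges = map[string]map[string]bool{}
	}
	if d.privileges["public"] == nil {
		d.privileges["public"] = map[string]bool{}
	}
	d.privileges["public"]["EXECUTE"] = true
	d.version++
	return true
}

func main() {
	d := &funcDesc{version: 1}
	first := grantExecuteToPublic(d)
	fmt.Println(first, d.version) // prints true 2
	second := grantExecuteToPublic(d)
	fmt.Println(second, d.version) // prints false 2
}
```

The second call is a no-op, so the version stays put when public already holds the privilege.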