Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

catalog: restrict scope of descriptor validation on write #68915

Closed

Conversation

postamar
Copy link
Contributor

Previously, we performed descriptor validation at txn commit time for all
so-called "uncommitted descriptors" which includes descriptors which are
only read and remain unchanged. This commit excludes these from validation.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@postamar postamar force-pushed the restricted-validation-on-write branch from d5b6a70 to 8b743bf Compare August 13, 2021 21:52
@postamar postamar requested a review from ajwerner August 13, 2021 22:30
@postamar
Copy link
Contributor Author

@ajwerner this change shaves off a couple of roundtrips to KV by being more particular about which descriptors we validate at txn commit time. Is this worth it though? I honestly am not sure. Or perhaps we'd be better off altogether by replacing uncommittedDescriptors with something better? I'm curious to hear your thoughts.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general this is a worthwhile change but I'm not certain it needs to be as invasive as you're making it to achieve this goal. If you're willing to go in and make changes around here, there is a lot to be done to improve the situation.

Reviewed 4 of 13 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @postamar)


pkg/sql/catalog/descs/kv_descriptors.go, line 350 at r1 (raw file):

func (kd *kvDescriptors) validateUncommittedDescriptors(ctx context.Context, txn *kv.Txn) error {
	descs := make([]catalog.Descriptor, 0, kd.uncommittedDescriptors.Len())
	_ = kd.uncommittedDescriptors.IterateByID(func(descriptor catalog.NameEntry) error {

why didn't IsUncommittedVersion work here?

Something I worry about is the ability to create bugs where we modify in-memory a descriptor and then don't write it back to the store. I think what I want is to change what we store in uncommittedDescriptors to represent more than what we have here. It feels like there's 3 things here, the descriptor originally read from the store, the Immutable version we think currently lives in the store (which is updated when we actually write), and the Mutable version which has accumulated in-memory changes.

Previously, we performed descriptor validation at txn commit time for all
so-called "uncommitted descriptors" which includes descriptors which are
only read and remain unchanged. This commit excludes these from validation.

Release note: None
@postamar postamar force-pushed the restricted-validation-on-write branch from 3a2bf43 to 83140eb Compare August 16, 2021 21:07
@postamar postamar marked this pull request as ready for review August 16, 2021 23:12
@postamar postamar requested review from a team August 16, 2021 23:12
@postamar
Copy link
Contributor Author

postamar commented Aug 16, 2021

IsUncommittedVersion works perfectly fine, I just had a brain fart. This PR is ready for another look.


Sharing your broader concern, it seems like the only way to solve this is to

  1. force all descriptor writes to go through the Collection API; to ensure that store writes are reflected into in-memory descs;
  2. we somehow don't expose MutableDescriptors any more at all, to ensure the opposite.

Doing (1) is quite straightforward, in fact we've roadmapped it, but (2) is harder of course. It seems like the new schema changer will make (2) a lot easier. We could somehow force all Mutables which need to get "checked out" for an op to get "checked in" again. Or we could define a functional API on Collection for descriptor changes, something of the kind

func (tc *Collection) mutateTableDescriptor(id descpb.ID, fn func(desc catalog.TableDescriptor) catalog.TableDescriptor) error

I'm going to keep thinking about this.

@ajwerner
Copy link
Contributor

What if we consider the checkout point the first time the mutable descriptor is requested since start or last checkin and the checkin point is AddUncommittedDescriptor. I think without too much work we could add some bit to indicate whether an uncommitted descriptor is currently checked out. At transaction commit time we want to make sure all descriptors which have been checked out and not checked back in are identical to their current Immutable descriptor.

@postamar
Copy link
Contributor Author

I'd thought of that but what should we do for descriptors which weren't checked back in properly? This may happen in edge cases which we don't test for, so we probably shouldn't panic or anything, but just complaining in the logs feels weak.

@ajwerner
Copy link
Contributor

Not panic but return an error which will fail the transaction. I think it's worth a shot. Maybe we'll uncover some fun bugs and learn that that's more to do. It seems like a thing to at least put up in a PR. It also is on the path for unifying the uncommitted descriptors and all descriptors.

@postamar
Copy link
Contributor Author

I took a stab at what you suggested and again, it turns out to be too much work to be completed in time for this release cycle. I added this check-in/check-out mechanism and it fails all over the place. Possibly the most egregious source of failures is the ginormous for-loop for ALTER TABLEs.

Yet again, good things came out of this effort as a side-effect, which I will push as a separate commit in #69008 because it will fit in well there: I've moved uncommitted descriptors to their own layer alongside kvDescriptors instead of inside of them. I think it's a worthy refactor in and of itself.

@postamar postamar closed this Aug 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants