Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/catalog: provide batched access to descriptors #64388

Closed
ajwerner opened this issue Apr 29, 2021 · 2 comments
Closed

sql/catalog: provide batched access to descriptors #64388

ajwerner opened this issue Apr 29, 2021 · 2 comments
Labels
A-schema-descriptors Relating to SQL table/db descriptor handling. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@ajwerner
Copy link
Contributor

ajwerner commented Apr 29, 2021

Is your feature request related to a problem? Please describe.

Today we do not have any APIs to retrieve a set of descriptors in parallel. There are some good reasons for that. We don't currently permit any concurrency in the *kv.Txn library (#60968). For cases where we may lease descriptors or where we may be looking things up by name, there's likely too much complexity to tell any sort of reasonable story. Fortunately, that's a motivating use case here. There are cases where we know a bunch of IDs which we'd like to look up ahead of time (GRANT, REVOKE, DROP) and we know in these cases we don't care about caching.

Describe the solution you'd like

Add a method to the descs.Collection to fetch a bunch of descriptors in a single batch. Almost certainly this means implementing catalog.BatchDescGetter on the *descs.Collection. The challenge will be to optimize the hydration and validation. It should all be workable.

Additional context

This is one of two remaining work items to fix the linear runtime of GRANT *, REVOKE * (#41930).

Epic: CRDB-8577

@ajwerner ajwerner added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-schema-descriptors Relating to SQL table/db descriptor handling. labels Apr 29, 2021
@jlinder jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021
@postamar
Copy link
Contributor

Complement to the definition of done:

  1. The Implementation BatchDescGetter on the descs.Collection should be decorated with comments explaining that these methods are not the preferred way of accessing descriptors outside of some specific and well-defined defined use cases. Personally I'm almost tempted to guard them with a linter check of some sort (especially the DescGetter methods!) but that's just me.
  2. Hydration in this context means type hydration. Fetching a table descriptor on its own from kv and deserializing it will produce a working catalog.TableDescriptor interface but it still might need extra hydration (a.k.a. decoration) to be useful. Table descriptors may however have columns which refer to user-defined types, and these types have descriptors, which should be used to hydrate the column descriptors. Notably, to set their *types.T pointers to the correct reference.
  3. Validation is in a similar vein: reading and deserializing a table descriptor allows us to make sure it's valid (a.k.a its values make sense) at an internal level but one can't make any such claims about any references to other descriptors. This is why we have several validation levels in catalog.Validate(). Descriptors which are returned by the descs.Collection are validated at least at the catalog.ValidationLevelCrossReferences level, and at that level the Validate function, in turn, requires a DescGetter to fetch the referred descriptors by ID. This is a bit of a chicken-and-egg problem which we get around with using either the oneLevelUncachedDescGetter or MapDescGetter implementations.
  4. The implementation should come with unit tests that verify the correct implementation of BatchDescGetter in descs.Collection. Concretely, I suppose this means firing up a test cluster and populating it with a reasonably complex set of descriptors by running a series of CREATE statements, then exercising the methods and asserting on their return values.

@postamar
Copy link
Contributor

@ajwerner and I went over this some more yesterday and came to the conclusion that it was better NOT to implement the BatchDescGetter interface on descs.Collection. We realized that this was taking the catalog layer in a direction we didn't like. Also we realized some of our assumptions regarding GRANT/REVOKE were wrong:

  • We already write the mutated descriptors in a batch, so that's done in constant time already.
  • Following the addition of batched job creation, we still have two round-trips per descriptor read; after looking into this we realized this was caused by the cross-reference validation in the catalogkv package. Basically, every time we read a table descriptor, we validate it, the validation sees a reference to the parent database, this triggers a read of the database descriptor. This is dumb when doing a GRANT/REVOKE on ON * because the tables will all have the same parent database so we really only need to read it once.

Consequently we need to redefine the work tracked in this ticket. I'll do this shortly. It'll probably involve adding another public function to catalogkv for batch mutable descriptor reads. It's not what we want in the long term, ideally the catalogkv descriptor access functions would only be called by descs.Collection and nothing else, but we couldn't come up with anything better in the short term and we'd quite like to get rid of this linear-time behaviour of GRANT/REVOKE.

postamar pushed a commit to postamar/cockroach that referenced this issue Feb 3, 2022
Previously the descs.Collection's descriptor retrievals were
done one at a time, outside of some special cases. This commit replaces
the getDescriptorByID method at the heart of these lookups with
a batched equivalent, getDescriptorsByID.

Existing functionality remains unchanged.

Fixes cockroachdb#64388.

Release note: None
postamar pushed a commit to postamar/cockroach that referenced this issue Feb 3, 2022
Previously the descs.Collection's descriptor retrievals were
done one at a time, outside of some special cases. This commit replaces
the getDescriptorByID method at the heart of these lookups with
a batched equivalent, getDescriptorsByID.

Existing functionality remains unchanged.

Fixes cockroachdb#64388.

Release note: None
@craig craig bot closed this as completed in 9259a2e Feb 4, 2022
@healthy-pod healthy-pod added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-descriptors Relating to SQL table/db descriptor handling. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

No branches or pull requests

4 participants