Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tabledesc: fix the referencing_descriptor_ids field in type descriptors #63161

Open
postamar opened this issue Apr 6, 2021 · 4 comments
Open
Labels
A-schema-descriptors Relating to SQL table/db descriptor handling. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@postamar
Copy link
Contributor

postamar commented Apr 6, 2021

There are a few issues related to ReferencingDescriptorIDs in type descriptors, all related to how these descriptor references behave differently than others, like foreign key references, etc.

  1. Although the name would suggest otherwise, ReferencingDescriptorIDs only contains table descriptor IDs. Either rename the field to reflect this, or also include database or type descriptor IDs as well.

  2. Unlike all other descriptor references, ReferencingDescriptorIDs contains not just direct references but also transitive references. Indeed, when a table descriptor goes through the schema changer machinery, we add its ID to all type descriptors whose IDs are in the return value of GetAllReferencedTypeIDs(), which includes the type descriptors' transitive closure. This is useful to quickly check whether a type can be dropped safely or not, but it is unlikely to scale well and has all kinds of ugly implications (what happens when we have record types as well as enum types? etc.). I strongly feel we should stick to direct references, along with providing some useful machinery in the descriptor collection or wherever in catalog to walk the relational graph to collect all transitive dependencies.

  3. Unlike all other descriptor references, there are no back-reference checks done on ReferencingDescriptorIDs when validating a table descriptor, for instance. Adding these is not so straightforward, since in some cases these back-references are not added in the same transaction but asynchronously in a schema changer job. Specifically, I tried adding back-reference checks and the CCL logic test suite alter_table_locality failed with the multiregion-9node-3region-3azs config due to a validation failure following a ROLLBACK.

To address these points I suggest we apply a similar treatment to this field as we did to the old foreign key representation:

  • deprecate the referencing_descriptor_ids field in the type descriptor proto;
  • introduce new fields in the protos (referencing_database_descriptor_ids, referencing_table_descriptor_ids, referencing_type_descriptor_ids) containing only direct references;
  • provide a schema migration to migrate from the old field to the new;
  • add an extra step in the RunPostDeserializationChanges type descriptor builder method implementation to do the same thing;
  • add a method to the descriptor collection which, given a type descriptor ID, builds the set of transitive dependencies for that type, by walking through the dependency graph, effectively rebuilding referencing_descriptor_ids;
  • add a generous quantity of test cases, especially involving backups, so that we can make this change with confidence.

Jira issue: CRDB-6452

Epic CRDB-31469

@postamar postamar added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-schema-descriptors Relating to SQL table/db descriptor handling. labels Apr 6, 2021
@jlinder jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021
@arulajmani
Copy link
Collaborator

Indeed, when a table descriptor goes through the schema changer machinery, we add its ID to all type descriptors whose IDs are in the return value of GetAllReferencedTypeIDs(), which includes the type descriptors' transitive closure.

I was in the backup/restore area debugging something entirely different when I chanced upon an explanation for why this may have been designed the way it was. Consider the case when we're backing up a table that contains a user defined type. If this table is restored into a database which doesn't contain this type we also restore the type descriptor. Note that a type descriptor can't exist without a corresponding array type alias. As such, it looks like we elected to capture the array type descriptor as a "dependency". This if block

. is where all this happens.

FWIW, I don't think we necessarily need to rely on backing up the array type descriptor for this though, considering that the array type descriptor can be entirely derived from the actual enum type descriptor.

@postamar
Copy link
Contributor Author

Consider the case when we're backing up a table that contains a user defined type. If this table is restored into a database which doesn't contain this type we also restore the type descriptor. Note that a type descriptor can't exist without a corresponding array type alias. As such, it looks like we elected to capture the array type descriptor as a "dependency".

Thanks for the explanation! This design choice makes a lot more sense to me now.

@postamar
Copy link
Contributor Author

This just got pushed onto the critical path, if we're to implement DROP TYPE ... CASCADE correctly. Also bad things like #84305 keep happening because of this lack of validation, so we might as well do it sooner rather than later.

@postamar postamar self-assigned this Jul 12, 2022
@postamar postamar added branch-master Failures and bugs on the master branch. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jul 13, 2022
@postamar postamar removed their assignment Aug 16, 2022
@postamar postamar removed branch-master Failures and bugs on the master branch. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Aug 16, 2022
@postamar postamar changed the title Fix the referencing_descriptor_ids field in type descriptors tabledesc: fix the referencing_descriptor_ids field in type descriptors Nov 22, 2022
@postamar postamar self-assigned this Jan 17, 2023
@postamar postamar removed their assignment Jan 30, 2023
@postamar
Copy link
Contributor Author

I've unassigned myself considering that I'm very unlikely to close this before going on leave.

@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-descriptors Relating to SQL table/db descriptor handling. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

No branches or pull requests

3 participants