-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: dropping objects should work even if the objects are invalid #50651
Comments
After further discussion, it's probably not the case that we want to blindly drop tables which are arbitrarily invalid. Rather, we want to support dropping tables which are invalid because they reference objects which do not exist. We recently had a bug #50587 in which renames would be stuck in an intermediate state. Both the old and new names would be effectively unusable. In this case, there were namespace entries for both the new and old name. The new name works in that case for |
To make the above example more concrete, with some added back-and-forth renaming you can get in the below situation:
Where you think you might want to drop table
|
I believe this has been done. |
This has not been done. Maybe we should do something here. In particular, when there's a back-reference to something which does not exist, we may as well ignore it. |
Relates to #72230. |
Perhaps what we really want is for the user to be able to disable validation altogether? This wasn't an option until recently because there was no way to disable validation-on-read while it was too deeply embedded in |
So, after discussing with @ajwerner here's the plan:
|
This commit removes the `sql.catalog.descs.validate_on_write.enabled` cluster setting and replaces it with the `descriptor_validation` session variable. The default value is 'on' which indicates that catalog descriptors are to be validated when both read from- and written to the system.descriptor table. Other possible values are 'off' which disables validation entirely and 'read_only' which disables it for writes. Informs cockroachdb#50651. Release note (sql change): added a new 'descriptor_validation' session variable which can be set to 'read_only' or 'off' to disable descriptor validation, which may be useful when mitigating or recovering from catalog corruption.
This commit removes the `sql.catalog.descs.validate_on_write.enabled` cluster setting and replaces it with the `descriptor_validation` session variable. The default value is 'on' which indicates that catalog descriptors are to be validated when both read from- and written to the system.descriptor table. Other possible values are 'off' which disables validation entirely and 'read_only' which disables it for writes. This commit therefore plumbs the session data into the descs.Collection object and for this purpose defines a new DescriptorSessionDataProvider interface, which is implemented in a new catsessiondata package. Informs cockroachdb#50651. Release note (sql change): added a new 'descriptor_validation' session variable which can be set to 'read_only' or 'off' to disable descriptor validation, which may be useful when mitigating or recovering from catalog corruption.
This commit removes the `sql.catalog.descs.validate_on_write.enabled` cluster setting and replaces it with the `descriptor_validation` session variable. The default value is 'on' which indicates that catalog descriptors are to be validated when both read from- and written to the system.descriptor table. Other possible values are 'off' which disables validation entirely and 'read_only' which disables it for writes. This commit therefore plumbs the session data into the descs.Collection object and for this purpose defines a new DescriptorSessionDataProvider interface, which is implemented in a new catsessiondata package. Informs cockroachdb#50651. Release note (sql change): added a new 'descriptor_validation' session variable which can be set to 'read_only' or 'off' to disable descriptor validation, which may be useful when mitigating or recovering from catalog corruption.
90488: sql,descs: add & adopt descriptor_validation session var r=postamar a=postamar This commit removes the `sql.catalog.descs.validate_on_write.enabled` cluster setting and replaces it with the `descriptor_validation` session variable. The default value is 'on' which indicates that catalog descriptors are to be validated when both read from- and written to the system.descriptor table. Other possible values are 'off' which disables validation entirely and 'read_only' which disables it for writes. Informs #50651. Release note (sql change): added a new 'descriptor_validation' session variable which can be set to 'read_only' or 'off' to disable descriptor validation, which may be useful when mitigating or recovering from catalog corruption. 90862: ui: handle errors on db endpoints r=maryliag a=maryliag Previously, when hitting an error on endpoints used on the database page, we would just keep retrying constantly, without showing a proper error state. On SQL Activity page, for example, we show the error message and let the user retry if they want. This commit uses the same logic on the Database page. Since the pages make several requests and just part of them can fail, some of the pages we will still load, but give a warning about unavailable data and show the error message about reload option. This commit also increases timeout of database endpoints. Fixes #90596 <img width="636" alt="Screen Shot 2022-10-28 at 6 28 55 PM" src="https://user-images.githubusercontent.com/1017486/198745467-7833de82-4eac-41fe-85ef-5035f99b0f35.png"> <img width="866" alt="Screen Shot 2022-10-28 at 6 29 30 PM" src="https://user-images.githubusercontent.com/1017486/198745476-306ce1af-9476-4d47-9f9d-46c954e69ab8.png"> https://www.loom.com/share/edc46386a6fe408b90fa5b6330870819 Release note: None 90962: sql: fix newly introduced race r=ajwerner a=ajwerner All of the goroutines were writing to the same `err` variable. This race was introduced in #89540. Epic: None Release note: None Co-authored-by: Marius Posta <[email protected]> Co-authored-by: maryliag <[email protected]> Co-authored-by: Andrew Werner <[email protected]>
I don't know what to do here. We are able to disable validation now but this only helps with being able to run DML queries without complaining about broken back-references which don't matter for queries anyway. However, when DROPping something (and it doesn't matter which schema changer flavour is involved) back-references do matter, and the only way to deal with broken back-references to achieve the result described in the issue description is:
This feels like a lot of work. It might actually be easier to do in the declarative schema changer, but still. I feel that having a builtin that repairs or removes broken cross-references would be more useful. Once we have this, we could look into calling that builtin transparently when a DROP throws a validation error for whatever reason and then retry. |
cc @dikshant |
I'm not convinced that this has been coming as often recently. We're going to defer doing anything. Let's collect evidence that this is important. It's hard. The more we put into making the catalog solid, the less important this will be. |
We've begun to see several issues around invalid table descriptor state creeping into the database. This is very troublesome, and we of course would prefer to never have bugs like this in the first place. But, bugs happen. We need to make it easier for people to recover from such situations.
The first reaction of many users upon encountering a "broken seeming" table or object is to try to DROP that object. Our well-intentioned validate methods tend to prevent this from happening, since we try to validate the object before doing anything else.
This issue tracks modifying DROP to care less about validation errors. I think they should still be logged and reported to sentry, and possibly returned to the user as a WARNING, but they shouldn't block the DROP.
A good example of this kind of issue is #50649. The issue prevents the object from being dropped altogether. This is unreasonable - we should obey the user's request and drop the object. Otherwise, the object will enter an unrecoverable state.
The text was updated successfully, but these errors were encountered: