Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: dropping objects should work even if the objects are invalid #50651

Open
jordanlewis opened this issue Jun 25, 2020 · 10 comments
Open

sql: dropping objects should work even if the objects are invalid #50651

jordanlewis opened this issue Jun 25, 2020 · 10 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@jordanlewis
Copy link
Member

We've begun to see several issues around invalid table descriptor state creeping into the database. This is very troublesome, and we of course would prefer to never have bugs like this in the first place. But, bugs happen. We need to make it easier for people to recover from such situations.

The first reaction of many users upon encountering a "broken seeming" table or object is to try to DROP that object. Our well-intentioned validate methods tend to prevent this from happening, since we try to validate the object before doing anything else.

This issue tracks modifying DROP to care less about validation errors. I think they should still be logged and reported to sentry, and possibly returned to the user as a WARNING, but they shouldn't block the DROP.

A good example of this kind of issue is #50649. The issue prevents the object from being dropped altogether. This is unreasonable - we should obey the user's request and drop the object. Otherwise, the object will enter an unrecoverable state.

@jordanlewis jordanlewis added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jun 25, 2020
@ajwerner
Copy link
Contributor

ajwerner commented Jul 6, 2020

After further discussion, it's probably not the case that we want to blindly drop tables which are arbitrarily invalid. Rather, we want to support dropping tables which are invalid because they reference objects which do not exist.

We recently had a bug #50587 in which renames would be stuck in an intermediate state. Both the old and new names would be effectively unusable. In this case, there were namespace entries for both the new and old name. The new name works in that case for SELECT but the old name still exists if anybody else tries to rename over it. In that case, one might want to be able to drop the old name. If we didn't do any sanity checking, we might drop data we don't intend to.

@ajwerner
Copy link
Contributor

ajwerner commented Jul 7, 2020

To make the above example more concrete, with some added back-and-forth renaming you can get in the below situation:

root@localhost:26257/defaultdb> SELECT * FROM foo;                           
pq: relation "foo" does not exist
root@localhost:26257/defaultdb> ALTER TABLE bar RENAME TO foo;
pq: relation "foo" already exists

Where you think you might want to drop table foo which is actually bar. A customer got into this situation and tried to type DROP TABLE foo and I'm pretty happy it did not work.

root@localhost:26257/defaultdb> SELECT * FROM system.namespace where id = 56;               
  parentID | name | id  
+----------+------+----+
        50 | bar  | 56  
        50 | foo  | 56  

@postamar
Copy link
Contributor

I believe this has been done.

@ajwerner
Copy link
Contributor

This has not been done. Maybe we should do something here. In particular, when there's a back-reference to something which does not exist, we may as well ignore it.

@ajwerner ajwerner reopened this May 27, 2022
@ajwerner
Copy link
Contributor

Relates to #72230.

@postamar
Copy link
Contributor

Perhaps what we really want is for the user to be able to disable validation altogether? This wasn't an option until recently because there was no way to disable validation-on-read while it was too deeply embedded in catalogkv. However, we're now in a place where descriptor reads are mediated by the collection, even if it's only to a minimal degree.

@postamar postamar self-assigned this May 31, 2022
@postamar postamar removed their assignment Sep 16, 2022
@postamar postamar self-assigned this Sep 20, 2022
@postamar
Copy link
Contributor

So, after discussing with @ajwerner here's the plan:

  • add a session variable that disables validations on read
  • in these cases we should be able to use the legacy schema changer with read and write validations disabled, verify this with a handful of logic test cases

postamar pushed a commit to postamar/cockroach that referenced this issue Oct 27, 2022
This commit removes the `sql.catalog.descs.validate_on_write.enabled`
cluster setting and replaces it with the `descriptor_validation` session
variable. The default value is 'on' which indicates that catalog
descriptors are to be validated when both read from- and written to
the system.descriptor table. Other possible values are 'off' which
disables validation entirely and 'read_only' which disables it for
writes.

Informs cockroachdb#50651.

Release note (sql change): added a new 'descriptor_validation' session
variable which can be set to 'read_only' or 'off' to disable descriptor
validation, which may be useful when mitigating or recovering from
catalog corruption.
postamar pushed a commit to postamar/cockroach that referenced this issue Oct 28, 2022
This commit removes the `sql.catalog.descs.validate_on_write.enabled`
cluster setting and replaces it with the `descriptor_validation` session
variable. The default value is 'on' which indicates that catalog
descriptors are to be validated when both read from- and written to
the system.descriptor table. Other possible values are 'off' which
disables validation entirely and 'read_only' which disables it for
writes.

This commit therefore plumbs the session data into the descs.Collection
object and for this purpose defines a new DescriptorSessionDataProvider
interface, which is implemented in a new catsessiondata package.

Informs cockroachdb#50651.

Release note (sql change): added a new 'descriptor_validation' session
variable which can be set to 'read_only' or 'off' to disable descriptor
validation, which may be useful when mitigating or recovering from
catalog corruption.
postamar pushed a commit to postamar/cockroach that referenced this issue Oct 28, 2022
This commit removes the `sql.catalog.descs.validate_on_write.enabled`
cluster setting and replaces it with the `descriptor_validation` session
variable. The default value is 'on' which indicates that catalog
descriptors are to be validated when both read from- and written to
the system.descriptor table. Other possible values are 'off' which
disables validation entirely and 'read_only' which disables it for
writes.

This commit therefore plumbs the session data into the descs.Collection
object and for this purpose defines a new DescriptorSessionDataProvider
interface, which is implemented in a new catsessiondata package.

Informs cockroachdb#50651.

Release note (sql change): added a new 'descriptor_validation' session
variable which can be set to 'read_only' or 'off' to disable descriptor
validation, which may be useful when mitigating or recovering from
catalog corruption.
craig bot pushed a commit that referenced this issue Oct 31, 2022
90488: sql,descs: add & adopt descriptor_validation session var r=postamar a=postamar

This commit removes the `sql.catalog.descs.validate_on_write.enabled` cluster setting and replaces it with the `descriptor_validation` session variable. The default value is 'on' which indicates that catalog descriptors are to be validated when both read from- and written to the system.descriptor table. Other possible values are 'off' which disables validation entirely and 'read_only' which disables it for writes.

Informs #50651.

Release note (sql change): added a new 'descriptor_validation' session variable which can be set to 'read_only' or 'off' to disable descriptor validation, which may be useful when mitigating or recovering from catalog corruption.

90862: ui: handle errors on db endpoints r=maryliag a=maryliag

Previously, when hitting an error on endpoints
used on the database page, we would just keep retrying constantly, without showing a proper error state.
On SQL Activity page, for example, we show the error message and let the user retry if they want.
This commit uses the same logic on the Database page. Since the pages make several requests and just part of them can fail, some of the pages we will still load, but give a warning about unavailable data and show the error message about reload option.

This commit also increases timeout of database endpoints.

Fixes #90596

<img width="636" alt="Screen Shot 2022-10-28 at 6 28 55 PM" src="https://user-images.githubusercontent.com/1017486/198745467-7833de82-4eac-41fe-85ef-5035f99b0f35.png">


<img width="866" alt="Screen Shot 2022-10-28 at 6 29 30 PM" src="https://user-images.githubusercontent.com/1017486/198745476-306ce1af-9476-4d47-9f9d-46c954e69ab8.png">


https://www.loom.com/share/edc46386a6fe408b90fa5b6330870819

Release note: None

90962: sql: fix newly introduced race r=ajwerner a=ajwerner

All of the goroutines were writing to the same `err` variable. This race was introduced in #89540.

Epic: None

Release note: None

Co-authored-by: Marius Posta <[email protected]>
Co-authored-by: maryliag <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
@postamar
Copy link
Contributor

I don't know what to do here. We are able to disable validation now but this only helps with being able to run DML queries without complaining about broken back-references which don't matter for queries anyway. However, when DROPping something (and it doesn't matter which schema changer flavour is involved) back-references do matter, and the only way to deal with broken back-references to achieve the result described in the issue description is:

  • to add a "permissive" mode to the descs.Collection so that it doesn't choke on the first error it encounters when querying descriptors in bulk, like via GetAllSchemasInDatabase etc.
  • to make both the legacy and declarative schema changers invoke this "permissive" mode somehow (not clear how? a session var?) and then to deal with the fact that the catalog API call may not have been successful.

This feels like a lot of work. It might actually be easier to do in the declarative schema changer, but still. I feel that having a builtin that repairs or removes broken cross-references would be more useful. Once we have this, we could look into calling that builtin transparently when a DROP throws a validation error for whatever reason and then retry.

@postamar
Copy link
Contributor

cc @dikshant

@postamar postamar removed their assignment Jan 30, 2023
@ajwerner
Copy link
Contributor

I'm not convinced that this has been coming as often recently. We're going to defer doing anything. Let's collect evidence that this is important. It's hard. The more we put into making the catalog solid, the less important this will be.

@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

No branches or pull requests

4 participants