-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: add descriptor validation on write #60552
sql: add descriptor validation on write #60552
Conversation
949c365
to
1e92a8c
Compare
86f7e05
to
fca1cfc
Compare
In the last 3 runs, CI failed on at least one instance of
for the following statement
I tried to repro the failure locally to no avail, trying different test configs, native build vs docker build, race detector enabled... The failure definitely seems related to this change though. Any ideas? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fun failure. I wonder what that's about.
One though I'm having is that perhaps this is something we want to be doing after statement execution too. It'd be weird if we allowed things which used to be invalid to happen during the statement but then caught them at commit time. Maybe this isn't such a big deal because we are doing ValidateSelf when writing the descriptor
Reviewed 28 of 28 files at r1, 5 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @postamar)
pkg/sql/catalog/descriptor.go, line 77 at r1 (raw file):
// Validate validates the descriptor. // Cross-reference checks are skipped when dg is nil.
nit: it's called descGetter
here.
pkg/sql/catalog/descs/collection.go, line 1415 at r1 (raw file):
var _ catalog.DescGetter = collectionDescGetter{} func (cdg collectionDescGetter) GetDesc(
I think that this is right but it's interesting. Namely, if we ever mucked with a descriptor and didn't cache it in the collection (just wrote it to the store), we'd have problems because this will use a leased descriptor.
pkg/sql/catalog/descs/collection.go, line 1439 at r1 (raw file):
Quoted 12 lines of code…
func (cdg collectionDescGetter) GetDescs( ctx context.Context, reqs []descpb.ID, ) ([]catalog.Descriptor, error) { ret := make([]catalog.Descriptor, len(reqs)) for i, id := range reqs { desc, err := cdg.GetDesc(ctx, id) if err != nil { return nil, err } ret[i] = desc } return ret, nil
I'm feeling somewhat sad about the need to do this. I'm thinking we should just have a function in catalog called GetDescs
that takes a DescGetter
and then tries to type assert it into a batched getter. This is all to enable an optimization where we can batched fetch descriptors in some cases. It's not obviously worth it.
pkg/sql/catalog/schemadesc/schema_desc.go, line 183 at r2 (raw file):
} // Validate punts to ValidateSelf.
This is an odd comment given it does other stuff too
fca1cfc
to
4792c59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review and thanks for helping me look into that test failure earlier today, much appreciated.
With regards to your comment regarding per-statement validation, I believe that that's effectively already the case today. Apparently the primary index of a table can be disabled during a transaction as long as it's re-enabled come commit time. Bit of an edge case though and this doesn't mean we shouldn't be validating after executing individual statements like you suggest.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/sql/catalog/descriptor.go, line 77 at r1 (raw file):
Previously, ajwerner wrote…
nit: it's called
descGetter
here.
Done.
pkg/sql/catalog/descs/collection.go, line 1415 at r1 (raw file):
Previously, ajwerner wrote…
I think that this is right but it's interesting. Namely, if we ever mucked with a descriptor and didn't cache it in the collection (just wrote it to the store), we'd have problems because this will use a leased descriptor.
Agreed.
pkg/sql/catalog/descs/collection.go, line 1439 at r1 (raw file):
Previously, ajwerner wrote…
func (cdg collectionDescGetter) GetDescs( ctx context.Context, reqs []descpb.ID, ) ([]catalog.Descriptor, error) { ret := make([]catalog.Descriptor, len(reqs)) for i, id := range reqs { desc, err := cdg.GetDesc(ctx, id) if err != nil { return nil, err } ret[i] = desc } return ret, nil
I'm feeling somewhat sad about the need to do this. I'm thinking we should just have a function in catalog called
GetDescs
that takes aDescGetter
and then tries to type assert it into a batched getter. This is all to enable an optimization where we can batched fetch descriptors in some cases. It's not obviously worth it.
Agreed. I did something along those lines.
pkg/sql/catalog/schemadesc/schema_desc.go, line 183 at r2 (raw file):
Previously, ajwerner wrote…
This is an odd comment given it does other stuff too
Fixed.
da5d121
to
977f456
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! mostly just the suggestion and making sure your test completes (that failure looks odd)
Reviewed 4 of 11 files at r3, 2 of 5 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, and @postamar)
pkg/sql/conn_executor_exec.go, line 751 at r5 (raw file):
} if err := ex.extraTxnState.descCollection.ValidateUncommittedDescriptors(ctx, ex.state.mu.txn); err != nil {
❤️
pkg/sql/catalog/desc_getter.go, line 28 at r5 (raw file):
Quoted 5 lines of code…
GetDescs(ctx context.Context, reqs []descpb.ID) ([]Descriptor, error) } // GetDescsSequential is a helper function for implementing the GetDescs // method of DescGetter as a sequence of GetDesc calls. func GetDescsSequential(
Can we take this one further and do something like this:
// DescGetter is an interface to retrieve descriptors. In general the interface
// is used to look up other descriptors during validation.
type DescGetter interface {
GetDesc(ctx context.Context, id descpb.ID) (Descriptor, error)
}
// BatchDescGetter is an interface to retrieve descriptors in a batch. It can be implemented by
// DescGetter to improve the performance of the GetDescs() function.
type BatchDescGetter interface {
GetDescs(ctx context.Context, reqs []descpb.ID) ([]Descriptor, error)
}
// GetDescs is a helper to retrieve multiple descriptors using a DescGetter. If dg implements
// BatchDescGetter, it will be utilized.
func GetDescs(ctx context.Context, dg DescGetter, reqs []descpb.ID) ([]Descriptor, error) {
if bdg, ok := dg.(BatchDescGetter); ok {
return bdg.GetDescs(ctx, reqs)
}
ret := make([]Descriptor, len(reqs))
for i, id := range reqs {
desc, err := descGetter.GetDesc(ctx, id)
if err != nil {
return nil, err
}
ret[i] = desc
}
return ret, nil
}
2bbe7d3
to
8b67cc9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'm hoping it should be good now.
The failure was caused by me checking the error result of the SET CLUSTER SETTING statement. Looking at the rest of the codebase it's common to just ignore that error so I "fixed" this failure by doing the same. Looking more closely at where the error comes from it seems we just wait 10 seconds before calling it quits: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/set_cluster_setting.go#L336
Perhaps that part of the code could do with a bit of love? Should I file an issue?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)
pkg/sql/catalog/desc_getter.go, line 28 at r5 (raw file):
Previously, ajwerner wrote…
GetDescs(ctx context.Context, reqs []descpb.ID) ([]Descriptor, error) } // GetDescsSequential is a helper function for implementing the GetDescs // method of DescGetter as a sequence of GetDesc calls. func GetDescsSequential(
Can we take this one further and do something like this:
// DescGetter is an interface to retrieve descriptors. In general the interface // is used to look up other descriptors during validation. type DescGetter interface { GetDesc(ctx context.Context, id descpb.ID) (Descriptor, error) } // BatchDescGetter is an interface to retrieve descriptors in a batch. It can be implemented by // DescGetter to improve the performance of the GetDescs() function. type BatchDescGetter interface { GetDescs(ctx context.Context, reqs []descpb.ID) ([]Descriptor, error) } // GetDescs is a helper to retrieve multiple descriptors using a DescGetter. If dg implements // BatchDescGetter, it will be utilized. func GetDescs(ctx context.Context, dg DescGetter, reqs []descpb.ID) ([]Descriptor, error) { if bdg, ok := dg.(BatchDescGetter); ok { return bdg.GetDescs(ctx, reqs) } ret := make([]Descriptor, len(reqs)) for i, id := range reqs { desc, err := descGetter.GetDesc(ctx, id) if err != nil { return nil, err } ret[i] = desc } return ret, nil }
Done. Sorry I misunderstood you earlier, I thought you wanted to type switch on the interface implementation. Still not completely used to go's duck typing I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand the settings thing you're saying but otherwise
Perhaps that part of the code could do with a bit of love? Should I file an issue?
Yes
Reviewed 1 of 11 files at r3, 12 of 12 files at r6, 5 of 5 files at r7, 1 of 1 files at r8.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @miretskiy and @postamar)
pkg/sql/schema_changer.go, line 420 at r6 (raw file):
if err := WaitToUpdateLeases(ctx, sc.leaseMgr, ancestor.TableID); err != nil { return err }
nit: accumulate the set of tables and call WaitToUpdateLeasesMultiple
?
pkg/sql/catalog/desc_getter.go, line 27 at r6 (raw file):
// batchDescGetter is like DescGetter but retrieves batches of descriptors, // which for some implementation may make more sense performance-wise. type batchDescGetter interface {
style nit: I think it's valuable to export this as a well to tell an implementer of DescGetter that there's an extension which can be implemented.
pkg/sql/tests/repair_test.go, line 252 at r6 (raw file):
setup := func(t *testing.T) (serverutils.TestServerInterface, *gosql.DB, func()) { s, db, _ := serverutils.StartServer(t, base.TestServerArgs{}) _, _ = db.Exec(`SET CLUSTER SETTING sql.catalog.descs.validate_on_write.enabled = false`)
wait, why does this ever fail?
8b67cc9
to
b40f4a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still confused about the failing setting thing but
Reviewed 5 of 11 files at r9, 5 of 5 files at r10, 1 of 1 files at r11.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @miretskiy and @postamar)
In the tests we usually don't check that a SET CLUSTER SETTING statement has effectively updated the setting value in the whole cluster via gossip. What happens is that these statements wait for 10 seconds then return an error In the test case in this PR we set validate_on_write to true and this sometimes triggers this error under stress. I'm guessing that the reason it's no big deal is that the setting value still gets changed locally and unit tests such as this one run on single node clusters anyway so it doesn't matter that the gossip still hasn't done its thing. |
I forgot to mention |
b40f4a4
to
772f0fb
Compare
@ajwerner following your suggestion re. SET CLUSTER SETTINGS I rebased on master and |
Previously, we didn't systematically validate descriptors when they were written. Furthermore, there existed no common method to validate descriptors across all descriptor subtypes This commit adds three methods to the catalog.Descriptor interface: 1. ValidateSelf ( context.Context ) error 2. Validate ( context.Context, catalog.DescGetter ) error 3. ValidateTxnCommit ( context.Context, catalog.DescGetter) error Each performs a subset the checks performed by the next. ValidateSelf contains all checks which can be performed in isolation, Validate also performs all those involving DescGetters (i.e. cross-reference checks) and ValidateTxnCommit also performs checks which should only be done at commit-time. An example of the latter is checking that a table has a primary key: dropping the PK is allowed within a transaction as long as a new PK is subsequently provided before committing. This commit adds new validation calls when writing descriptors: 1. ValidateSelf is called prior to Collection adding a descriptor Put to a kv.Batch. At this point, we want descriptors to be at least internally-consistent, both to catch validation errors early and because it's not possible to do cross-reference checking at this point (changes on FKs for instance involve multiple descriptors). 2. ValidateTxnCommit is called on the descs.Collection's uncommitted descriptors when the corresponding txn is about to commit, just prior to the two-version-invariant check. The Collection itself is used as a catalog.DescGetter. These validations may be disabled using a new cluster setting: sql.catalog.descs.validate_on_write.enabled Setting this to false makes it possible to corrupt the descriptor state using the crdb_internal.unsafe_* functions. Release note: None
Previously, the Validate() method on catalog.SchemaDescriptor did nothing. This commit adds validation checks that verify that the parentID exists and that that database descriptor contains the correct entry in its `schemas` mapping. Fixes cockroachdb#53439. Release note: None
Previously this method populated a couple of maps but didn't actually do anything with them besides that. Release note: None
772f0fb
to
bd0e202
Compare
bors r+ |
This PR was included in a batch that was canceled, it will be automatically retried |
Build failed (retrying...): |
Build succeeded: |
Previously, we didn't systematically validate descriptors when they were
written. Furthermore, there existed no common method to validate
descriptors across all descriptor subtypes
This commit adds three methods to the catalog.Descriptor interface:
Each performs a subset the checks performed by the next. ValidateSelf
contains all checks which can be performed in isolation, Validate also
performs all those involving DescGetters (i.e. cross-reference checks)
and ValidateTxnCommit also performs checks which should only be done at
commit-time. An example of the latter is checking that a table has
a primary key: dropping the PK is allowed within a transaction as long
as a new PK is subsequently provided before committing.
This commit adds new validation calls when writing descriptors:
to a kv.Batch. At this point, we want descriptors to be at least
internally-consistent, both to catch validation errors early and
because it's not possible to do cross-reference checking at this
point (changes on FKs for instance involve multiple descriptors).
descriptors when the corresponding txn is about to commit, just
prior to the two-version-invariant check.
These validations may be disabled using a new cluster setting:
sql.catalog.descs.validate_on_write.enabled
Setting this to false makes it possible to corrupt the descriptor state
using the crdb_internal.unsafe_* functions.
Release note: None