-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: descriptor validation overhaul #60775
sql: descriptor validation overhaul #60775
Conversation
48e5b47
to
8066079
Compare
1647661
to
10e83e0
Compare
This is interesting. I wonder if we should have it return a single error but have the implementation join the errors using
👍
No, our contracts, when they exist, are pgcode oriented.
Ideally we'd have fully qualified names, but, of course, at this layer we can't. I think i'd prefer both a name and ID. How about |
2a0c424
to
f7d7844
Compare
This is ready for another look. |
@@ -128,9 +128,6 @@ type TestServerArgs struct { | |||
|
|||
// IF set, the demo login endpoint will be enabled. | |||
EnableDemoLoginEndpoint bool | |||
|
|||
// If set, testing specific descriptor validation will be disabled. even if the server | |||
DisableTestingDescriptorValidation bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I'm effectively reverting dd90439, that framework is not actually used in any way.
@@ -222,8 +221,7 @@ func (n *alterDatabaseAddRegionNode) startExec(params runParams) error { | |||
|
|||
// Validate the type descriptor after the changes. We have to do this explicitly here, because | |||
// we're using an internal call to addEnumValue above which doesn't perform validation. | |||
dg := catalogkv.NewOneLevelUncachedDescGetter(params.p.txn, params.ExecCfg().Codec) | |||
if err := typeDesc.Validate(params.ctx, dg); err != nil { | |||
if err := validateDescriptor(params.ctx, params.p, typeDesc); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: This convenience function is defined in planner.go: https://github.com/cockroachdb/cockroach/pull/60775/files#diff-d4a3b2d400fd963e130106b8942783b2abbf757310a2f3904ace7cc5dc4052bfR794f
if err := typeDesc.Validate(params.ctx, dg); err != nil { | ||
return err | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I believe this was made redundant by validation-on-write.
@@ -281,7 +280,7 @@ func (n *alterTableNode) startExec(params runParams) error { | |||
case *tree.CheckConstraintTableDef: | |||
var err error | |||
params.p.runWithOptions(resolveFlags{contextDatabaseID: n.tableDesc.ParentID}, func() { | |||
info, infoErr := n.tableDesc.GetConstraintInfo(params.ctx, nil) | |||
info, infoErr := n.tableDesc.GetConstraintInfo() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I changed the signatures of GetConstraintInfo and associated methods to cut down on passing nil DescGetters. https://github.com/cockroachdb/cockroach/pull/60775/files#diff-e971105bc05467ef391df9792202d3a8b712d269b5f193a0d94b700414ef281eL180
@@ -135,7 +134,7 @@ func (n *alterTableSetLocalityNode) alterTableLocalityGlobalToRegionalByTable( | |||
|
|||
// Finalize the alter by writing a new table descriptor and updating the zone | |||
// configuration. | |||
if err := n.validateAndWriteNewTableLocalityAndZoneConfig( | |||
if err := n.writeNewTableLocalityAndZoneConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: validation made implicit by validate-on-write.
if err := n.desc.Validate(params.ctx, dg); err != nil { | ||
return err | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: validation made implicit by validate-on-write.
if *ptr == nil { | ||
*ptr = &PrivilegeDescriptor{} | ||
} | ||
p := *ptr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this was done mainly to accomodate tests. I think it's also nicer though.
@@ -223,7 +224,7 @@ func (m *randomStreamClient) getDescriptorAndNamespaceKVForTableID( | |||
IngestionDatabaseID, | |||
tableID, | |||
fmt.Sprintf(RandomStreamSchemaPlaceholder, tableName), | |||
&descpb.PrivilegeDescriptor{}, | |||
descpb.NewDefaultPrivilegeDescriptor(security.RootUserName()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I'll revert this, I committed this by mistake.
pkg/sql/catalog/descriptor.go
Outdated
|
||
// Validate is like ValidateSelf but with additional cross-reference checks. | ||
Validate(ctx context.Context, descGetter DescGetter) error | ||
// ValidateSelfAndCrossReferences performs cross-reference checks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I'll fix this typo.
log.VEventf(ctx, 2, "planner getting immutable descriptor for id %d", id) | ||
return tc.getDescriptorByID(ctx, txn, id, flags, false /* mutable */) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this is used by the repair functions.
@@ -75,16 +71,18 @@ var ErrIndexGCMutationsList = errors.New("index in GC mutations list") | |||
// given TableDescriptor with the cluster version being the zero table. This | |||
// is for a table that is created in the transaction. | |||
func NewCreatedMutable(tbl descpb.TableDescriptor) *Mutable { | |||
return &Mutable{wrapper: wrapper{TableDescriptor: tbl}} | |||
m, _ := NewFilledInExistingMutable(context.TODO(), nil /* DescGetter */, false /* skipFKsWithMissingTable */, &tbl) | |||
return &Mutable{wrapper: m.wrapper} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this is mainly to make sure that the privileges are set from the get-go.
@@ -1659,26 +1280,19 @@ func FormatTableLocalityConfig(c *descpb.TableDescriptor_LocalityConfig, f *tree | |||
return nil | |||
} | |||
|
|||
// ValidateTableLocalityConfig validates whether the descriptor's locality | |||
// validateTableLocalityConfig validates whether the descriptor's locality |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I forgot to move this to validate.go
, will do.
// Validate performs ValidateSelf and then validates that | ||
// each reference to another table is resolvable and that the necessary back | ||
// references exist. | ||
func (desc *wrapper) Validate(ctx context.Context, descGetter catalog.DescGetter) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I moved this and the other validation functions to a new source file validate.go
. The diffs were pretty gnarly anyway and as it turns out this also helps slim down structured.go
which is far too big a source file.
@@ -137,1459 +133,6 @@ func TestAllocateIDs(t *testing.T) { | |||
t.Fatalf("expected %s, but found %s", a, b) | |||
} | |||
} | |||
func TestValidateDatabaseDesc(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: moved this to dbdesc
.
@@ -0,0 +1,1207 @@ | |||
// Copyright 2021 The Cockroach Authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: here lives the table validation logic now. Unlike for type descriptors the table validation logic itself didn't need a lot of changes so I sincerely don't think this should impede review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nice. I like the error reporter idea; it's much more flexible.
descpb.MaybeFixPrivileges(typ.GetID(), &typ.Privileges) | ||
} | ||
} | ||
res, err := protoutil.Marshal(desc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm cool with this.
} | ||
backRefMutables[id] = backRefMutable | ||
} | ||
if hasTempBackref { | ||
n.persistence = tree.PersistenceTemporary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the impact of this being missing?
// The prefix has the same format as the validation error wrapper. | ||
msgPrefix := fmt.Sprintf("%s %q (%d): ", desc.TypeName(), desc.GetName(), desc.GetID()) | ||
if msg[:len(msgPrefix)] == msgPrefix { | ||
msgPrefix = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack -- this code makes me somewhat sad. At some point hopefully it gets reworked with templates
d9d9a59
to
d443b82
Compare
Thanks a lot for the speedy review @ajwerner. I believe I've addressed your concerns in the latest revision I pushed. I also added some unit tests in Random thought: test coverage is always going to be spotty unless we enforce it somehow. Ideally I'd like the test suite to fail should there exist a specific validation error which doesn't have a corresponding unit test. Perhaps we could use the type system to this effect:
This validation error subtyping might have other interesting benefits like making the error message formatting more consistent, allowing for |
d443b82
to
2eec21b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This validation error subtyping might have other interesting benefits like making the error message formatting more consistent, allowing for errors.Is assertions, etc.
This is a fun idea. I think the way I'd do this is to have a hook in a testing knob to wrap the error reporter and then accumulate all of the errors generated through the test and then at the end of the test cross check that against a list of errors we expect to exist. We could do this leveraging types, or not. We could enforce this all further by saying that all of the error types need to be registered in some form or another at init time.
Didn't finish but here's a quick comment
Reviewed 26 of 69 files at r1, 1 of 5 files at r2, 14 of 23 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, and @postamar)
pkg/sql/scrub.go, line 433 at r3 (raw file):
table, ok := desc.(catalog.TableDescriptor) if !ok { return nil, catalog.ErrDescriptorNotFound
any interest in telling us which descriptor wasn't found with an errors.Wrapf
?
Scrub is sort of deprecated so not sure it matters too much.
pkg/sql/catalog/validate.go, line 139 at r3 (raw file):
Quoted 4 lines of code…
type validationErrorAccumulatorImpl struct { wrapPrefix string errors []error }
minor nit, make a separate type for the ValidationErrors
implementation.
type validationErrors struct {
errors []error
}
func (*validationErrors) sealed() {}
type validationErrorAccumulator struct {
validationErrors
wrapPrefix string
}
Also, I feel like the pattern where an unexported struct which implements an exported interface of the same name is pretty clear that it's the impl.
2eec21b
to
66db0fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, and @postamar)
pkg/sql/scrub.go, line 433 at r3 (raw file):
Previously, ajwerner wrote…
any interest in telling us which descriptor wasn't found with an
errors.Wrapf
?Scrub is sort of deprecated so not sure it matters too much.
Sure! In fact I went a bit further and made use of ErrDescNotFound in ValidationDescGetter and friends.
pkg/sql/catalog/validate.go, line 139 at r3 (raw file):
Previously, ajwerner wrote…
type validationErrorAccumulatorImpl struct { wrapPrefix string errors []error }
minor nit, make a separate type for the
ValidationErrors
implementation.type validationErrors struct { errors []error } func (*validationErrors) sealed() {} type validationErrorAccumulator struct { validationErrors wrapPrefix string }
Also, I feel like the pattern where an unexported struct which implements an exported interface of the same name is pretty clear that it's the impl.
Done.
66db0fd
to
43e6695
Compare
I rebased w/ linter error fix + a couple more validation test cases. I also added a new small commit which protects doctor validations from runtime errors like nil dereferences. I'm uncomfortable about the possibility of a customer running doctor to debug a cluster and doctor itself crashing 😬 |
43e6695
to
cc94a13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 14 of 69 files at r1, 2 of 5 files at r2, 4 of 23 files at r3, 23 of 23 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, and @postamar)
pkg/sql/catalog/dbdesc/database_desc.go, line 343 at r4 (raw file):
report := func(err error) { vea.Report(errors.Wrapf(err, "schema mapping entry %q (%d)", errors.Safe(schemaName), errors.Safe(schemaInfo.ID)))
descpb.ID is always safe to format; no need to wrap values of that type in Safe.
func (ID) SafeValue() {} |
pkg/sql/catalog/descs/collection.go, line 1453 at r4 (raw file):
} // ValidateUncommittedDescriptors validates all uncommitted descriptors
maybe comment that this goes to the store for all of the descriptors?
When I was initially thinking about this stuff, I was thinking that we wouldn't go re-read them all but in practice this seems all fine and good given the new parallelism.
pkg/sql/catalog/tabledesc/table.go, line 179 at r4 (raw file):
} // GetConstraintInfo returns a summary of all constraints on the table.
does it matter that now the constraint details won't have descriptors? Was anybody using that? If not, can we deprecate that field altogether and then remove it in the subsequent release?
pkg/sql/pgwire/testdata/pgtest/notice, line 58 at r4 (raw file):
CommandComplete ---- {"Severity":"NOTICE","SeverityUnlocalized":"","Code":"00000","Message":"the data for dropped indexes is reclaimed asynchronously","Detail":"","Hint":"The reclamation delay can be customized in the zone configuration for the table.","Position":0,"InternalPosition":0,"InternalQuery":"","Where":"","SchemaName":"","TableName":"","ColumnName":"","DataTypeName":"","ConstraintName":"","File":"drop_index.go","Line":519,"Routine":"dropIndexByName","UnknownFields":null}
:( every time this happens I cry a little inside
cc94a13
to
dc62504
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking at this.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @miretskiy, and @postamar)
pkg/sql/catalog/dbdesc/database_desc.go, line 343 at r4 (raw file):
Previously, ajwerner wrote…
descpb.ID is always safe to format; no need to wrap values of that type in Safe.
func (ID) SafeValue() {}
Done.
pkg/sql/catalog/descs/collection.go, line 1453 at r4 (raw file):
Previously, ajwerner wrote…
maybe comment that this goes to the store for all of the descriptors?
When I was initially thinking about this stuff, I was thinking that we wouldn't go re-read them all but in practice this seems all fine and good given the new parallelism.
Done.
pkg/sql/catalog/tabledesc/table.go, line 179 at r4 (raw file):
Previously, ajwerner wrote…
does it matter that now the constraint details won't have descriptors? Was anybody using that? If not, can we deprecate that field altogether and then remove it in the subsequent release?
Those do appear to be used in information_schema.go
, pg_catalog.go
and such places.
pkg/sql/pgwire/testdata/pgtest/notice, line 58 at r4 (raw file):
Previously, ajwerner wrote…
:( every time this happens I cry a little inside
If I had a dollar every time this happened I'd be able to buy myself a latte every month or two!
dc62504
to
523ed05
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 24 of 24 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, and @postamar)
Previously, descriptor validation suffered from a few shortcomings: 1. The methods to call were inconsistent across descriptor types, and the error messages they return are inconsistent as well. 2. Those methods made ad-hoc use of catalog.DescGetter to walk through the descriptor reference graph, and could sometimes walk surprisingly far. This is particularly true for type descriptors, which hold references to other types. 3. To complicate things further, DescGetter implementations which read descriptors from storage themselves perform validation as well. Although it is perfectly sensible to systematically validate descriptors when read, there is a circularity problem at hand which would benefit from being solved without melting the maintainer's brain in the process. 4. The validation methods return an error type, making it awkward to return multiple errors. Tools like doctor would be made more useful if they could report more than only the first encountered error. Recently we introduced a change that added descriptor validation on write. This change involves validating all uncommitted descriptors in bulk and this in turn favours a bulk approach to reading all their referenced descriptors from storage. With this in mind, this commit adds a GetReferencedDescIDs method to catalog.Descriptor which returns the IDs of all descriptors referenced by the receiver. By calling this method on all descriptors we intend to validate and by reading all referenced descriptors in one kv.Batch and stuffing them in a catalog.MapDescGetter, we can now validate using this in-memory DescGetter instead. Turning validation into this multi-step process means it can no longer be invoked as a method call on the target descriptor. This commit moves the entry point to descriptor validation to the catalog.Validate function. The descriptor objects themselves still define the validation checks themselves but these are effectively made inaccessible from outside the catalog package. The rationale behind this is to enforce order in the sequence of checks performed and to enforce some uniformity in the formatting of the error messages. The checks are grouped into three tiers: internal consistency checks, cross-reference consistency checks, and transactional consistency checks. All this also helps reduce the number of call sites of descriptor validations as well as increase the scope of the validations. This has uncovered a few bugs related to schemas and temporary tables. This effort has also helped identify validation checks which have since been made redundant, either by other existing rules or by the new validation-on-write behaviour. Finally, this has allowed us to strip doctor of its own duplicated (and dodgy) validation logic and simply report all validation errors instead. Release justification: This commit is safe for this release because it consists of bug fixes and changes involving descriptor validation. If anything these changes will help uncover descriptor corruptions which were previously unnoticed, both because we're extending the validation suite and because the doctor tool produces richer output. Release note (cli change): The doctor tool can now report multiple descriptor validation failures per descriptor.
Previously, doctor could crash when validating corrupted descriptors. The validation code is usually cautious enough to not dereference nils or the like but this isn't enforced in any way. This commit protects the validation calls in doctor with panic recovery, allowing doctor to continue and validate the remaining descriptors. Release justification: non-production code change. Release note: None
523ed05
to
4d9034c
Compare
bors r+ |
Build succeeded: |
Follow-up work
WriteDescriptors
inpkg/ccl/backupccl/restore_job.go
: should it validate schemas and types also? It seems like it should. When I add those, tests fail. I didn't look into it further. Probably out of scope for this change.Add linter check to enforce the use of(edit: I since changed the way validation errors were collected, making a linter check unnecessary)catalog.Validate
andcatalog.ValidateSelf
? It seems all too easy to call theValidateSelf
method on acatalog.Descriptor
without being aware that it returns an[]error
instead of anerror
.ValidateSelf
methods in favour ofNewFilledInImmutable
-like constructors. I don't like that validation functionality changes the underlying objects. (edit: this is actually really hard, and dangerous! giving up on this for now)I feel like my validation error wrapping is an improvement to the incoherent messaging that prevailed before but I'm not sure what's best to put in the prefix message. I settled on(edit: done like Andrew suggested below)errors.Wrapf(err, "%s %q", desc.TypeName(), desc.GetName())
but it could be anything else really. Do we prefer reporting names or IDs? I also don't know whether error message changes are considered to be contract-breaking, worthy of release note mentions, interfering with stability period, etc.Review guide
The meat and potatoes of the change is in
pkg/sql/catalog/*.go
, best start reviewing from there, moving on to thepkg/sql/catalog/*desc
packages. Notice that I moved the validation logic intabledesc
out ofstructured.go
into a new file,validate.go
. The diff instructured.go
was going to be messy anyways despite there not being too many significant underlying changes: it's just callingReport
with the errors, moving stuff around, removing redundancies, breaking down massive functions into smaller chunks, etc. The code still basically does the same thing. I moved tests cases around also.