catalog: refactor privileges and system schema #69008
Here's a review guide. The diff is quite large, but many of its lines are low-entropy, so to speak, so it should be quite tractable. I'm hoping this will pass review on the basis of this code change being a strict improvement to the existing code.
21,DropDatabase/drop_database_0_tables
28,DropDatabase/drop_database_1_table
35,DropDatabase/drop_database_2_tables
42,DropDatabase/drop_database_3_tables
It appears that allowing most system tables to be leased can lead to some significant round-trip savings. I'm guessing this is the cause, at least (I'm not sure what else it could be), and for drops I'm guessing it specifically involves leasing the jobs table.
@@ -6735,7 +6736,7 @@ func TestRestoreErrorPropagates(t *testing.T) {
defer dirCleanupFn()
params := base.TestClusterArgs{}
params.ServerArgs.ExternalIODir = dir
jobsTableKey := keys.SystemSQLCodec.TablePrefix(keys.JobsTableID)
jobsTableKey := keys.SystemSQLCodec.TablePrefix(uint32(systemschema.JobsTable.GetID()))
Naturally, this change (and the others like it) is motivated by wanting to make some of these system tables' IDs dynamically defined instead of hard-coded. Still, in the meantime, we might as well refer to them via `systemschema` when possible. There's a very minor benefit of type safety, of course, and I'd rather we refer to the `keys` constant only in cases where we actually want the table ID to be hard-coded for performance reasons, typically when we want to build a `roachpb.Key` without having to do lookups.
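A minimal sketch of the call-site pattern described above. It uses hypothetical stand-ins (`ID`, `TableDescriptor`, `JobsTable`, `tablePrefix`) rather than the real `descpb`, `systemschema` or `keys` APIs, and the ID value 15 is illustrative only: the point is that the ID is read from the descriptor and converted to the `uint32` a key-encoding helper expects, instead of being taken from a hard-coded constant.

```go
package main

import "fmt"

// ID is a hypothetical stand-in for a typed descriptor ID such as descpb.ID.
type ID uint32

// TableDescriptor is a hypothetical, minimal stand-in for a system table
// descriptor of the kind exported by systemschema.
type TableDescriptor struct {
	id ID
}

// GetID returns the descriptor's typed ID.
func (t TableDescriptor) GetID() ID { return t.id }

// JobsTable is a hypothetical descriptor literal. The ID is illustrative;
// callers read it from the descriptor rather than from a constant.
var JobsTable = TableDescriptor{id: 15}

// tablePrefix mimics a key-encoding helper that takes a raw uint32 table ID.
func tablePrefix(tableID uint32) string {
	return fmt.Sprintf("/Table/%d", tableID)
}

func main() {
	// If the jobs table's ID ever becomes dynamic, this call site does not change.
	fmt.Println(tablePrefix(uint32(JobsTable.GetID())))
}
```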
pkg/ccl/benchccl/rttanalysisccl/testdata/benchmark_expectations
pkg/config/system_test.go
@@ -367,13 +367,14 @@ func TestComputeSplitKeyTableIDs(t *testing.T) {
baseSql, _ /* splits */ := schema.GetInitialValues()
// Real system tables plus some user stuff.
kvs, _ /* splits */ := schema.GetInitialValues()
start := uint32(schema.InitialUserDescriptorID())
Laying the groundwork here for `start` to be outside of the reserved ID range. Again, I don't want to use `keys` ID constants if I can avoid it.
name == d.GetName() {
return nil, true, nil
}
}
I could replace this by looking up a `map[descpb.NameInfo]...` instead, but since `systemschema.UnleasableSystemDescriptors` is always going to be tiny, there really is no point, and looping through the entries is perfectly fine.
Anyway, the point is: if the descriptor can't be leased, we just return here with the `shouldReadFromStore` return value set to `true`. This will cause the retrieval to fall back to KV.
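The deny-list check can be sketched as follows. This is not the actual leased-descriptors code: `NameInfo`, `unleasable` and `getLeased` are hypothetical, and the parent and schema IDs in the list are illustrative. It shows the linear scan over a tiny list and the `shouldReadFromStore` signal that sends the caller to KV.

```go
package main

import "fmt"

// NameInfo is a hypothetical stand-in for descpb.NameInfo: the
// (parentID, parentSchemaID, name) triple that keys a namespace entry.
type NameInfo struct {
	ParentID       uint32
	ParentSchemaID uint32
	Name           string
}

// unleasable is a hypothetical, tiny deny-list (IDs are illustrative).
// Because it stays this small, a linear scan is as good as a map lookup.
var unleasable = []NameInfo{
	{ParentID: 0, ParentSchemaID: 0, Name: "system"},
	{ParentID: 1, ParentSchemaID: 29, Name: "descriptor"},
	{ParentID: 1, ParentSchemaID: 29, Name: "lease"},
	{ParentID: 1, ParentSchemaID: 29, Name: "rangelog"},
	{ParentID: 1, ParentSchemaID: 29, Name: "namespace"},
}

// getLeased is a hypothetical leased-descriptor lookup. When the requested
// name is on the deny-list it returns shouldReadFromStore=true so that the
// caller falls back to reading the descriptor from KV.
func getLeased(parentID, parentSchemaID uint32, name string) (desc string, shouldReadFromStore bool, err error) {
	for _, d := range unleasable {
		if parentID == d.ParentID && parentSchemaID == d.ParentSchemaID && name == d.Name {
			return "", true, nil
		}
	}
	// A real implementation would acquire or reuse a lease here.
	return "leased:" + name, false, nil
}

func main() {
	for _, n := range []string{"lease", "jobs"} {
		desc, fromStore, err := getLeased(1, 29, n)
		fmt.Println(n, desc, fromStore, err)
	}
}
```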
ctx context.Context, codec keys.SQLCodec,
) error {
nc.Lock()
defer nc.Unlock()
At some point in the future we'll scan the namespace table here for all entries with `parentID = 1`. Right now all system descriptors have hard-coded IDs, so there's no point in doing this: we get the IDs from the bootstrapped schema.
Note that we'll have to keep doing this even if we add this namespace table scan: if the cluster is bootstrapped then the namespace table might not have been populated yet, and in any case it will be populated with the IDs defined in the bootstrapped schema.
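A hypothetical sketch of a mutex-guarded system namespace cache seeded from the bootstrapped schema, so that lookups succeed before any namespace scan has happened. The type and function names and the ID values are assumptions for illustration; the `add` method merely marks where a future namespace table scan could insert entries.

```go
package main

import (
	"fmt"
	"sync"
)

// ID is a hypothetical typed descriptor ID.
type ID uint32

// bootstrapIDs is a hypothetical subset of the IDs a bootstrapped schema
// assigns to system tables; the values are illustrative.
var bootstrapIDs = map[string]ID{
	"descriptor": 3,
	"lease":      11,
	"rangelog":   13,
	"namespace":  30,
}

// systemNamespaceCache is a hypothetical name-to-ID cache for objects in the
// system database, guarded by a read-write mutex.
type systemNamespaceCache struct {
	sync.RWMutex
	ids map[string]ID
}

// newSystemNamespaceCache seeds the cache from the bootstrapped schema.
func newSystemNamespaceCache(seed map[string]ID) *systemNamespaceCache {
	nc := &systemNamespaceCache{ids: make(map[string]ID, len(seed))}
	for name, id := range seed {
		nc.ids[name] = id
	}
	return nc
}

// lookup returns the cached ID for name, if present.
func (nc *systemNamespaceCache) lookup(name string) (ID, bool) {
	nc.RLock()
	defer nc.RUnlock()
	id, ok := nc.ids[name]
	return id, ok
}

// add records a mapping, e.g. one discovered by a namespace table scan.
func (nc *systemNamespaceCache) add(name string, id ID) {
	nc.Lock()
	defer nc.Unlock()
	nc.ids[name] = id
}

func main() {
	nc := newSystemNamespaceCache(bootstrapIDs)
	id, ok := nc.lookup("lease")
	fmt.Println(id, ok)
}
```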
}
}
return tbl
}
This function builds a minimum-viable `descpb.TableDescriptor`, which can have stuff added to it via the `toCatalogDescriptor` function defined below.
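The two-step construction (a minimum-viable descriptor, then additions on top) can be illustrated with a sketch like the following. All types and helpers here are hypothetical simplifications: the real `descpb.TableDescriptor` has many more fields, and `withPrivileges`/`finish` only play the role the comment attributes to `toCatalogDescriptor`. Privileges are attached by name, mirroring the PR's switch away from ID-based privilege definitions.

```go
package main

import "fmt"

// Column is a hypothetical, heavily simplified column descriptor.
type Column struct {
	ID   uint32
	Name string
}

// TableDescriptor is a hypothetical, heavily simplified table descriptor.
type TableDescriptor struct {
	ID         uint32
	Name       string
	Columns    []Column
	NextColID  uint32
	Privileges []string
}

// makeTable builds a minimum-viable descriptor: only the fields every table
// needs, with column IDs assigned sequentially starting at 1.
func makeTable(id uint32, name string, cols ...Column) TableDescriptor {
	cs := append([]Column(nil), cols...) // copy so the caller's slice is untouched
	for i := range cs {
		cs[i].ID = uint32(i + 1)
	}
	return TableDescriptor{ID: id, Name: name, Columns: cs, NextColID: uint32(len(cs) + 1)}
}

// Option is a hypothetical decoration applied on top of the base descriptor.
type Option func(*TableDescriptor)

// withPrivileges grants privileges by name.
func withPrivileges(privs ...string) Option {
	return func(t *TableDescriptor) { t.Privileges = append(t.Privileges, privs...) }
}

// finish applies each option to the (value-copied) base descriptor.
func finish(base TableDescriptor, opts ...Option) TableDescriptor {
	for _, opt := range opts {
		opt(&base)
	}
	return base
}

func main() {
	desc := finish(
		makeTable(100, "example", Column{Name: "k"}, Column{Name: "v"}),
		withPrivileges("SELECT"),
	)
	fmt.Println(desc.Name, len(desc.Columns), desc.NextColID, desc.Privileges)
}
```

Keeping the base builder small means each table literal only spells out what is specific to it, which is where the conciseness comes from.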
KeyColumnDirections: []descpb.IndexDescriptor_Direction{descpb.IndexDescriptor_ASC, descpb.IndexDescriptor_ASC, descpb.IndexDescriptor_ASC},
KeyColumnIDs: []descpb.ColumnID{1, 2, 3},
},
))
This should yield the same descriptor, but more concisely and with less potential for bugs. Recall that these descriptors are all tested in `TestSystemTableLiterals`.
Whoops, this latest change, in which I try to serve in-memory descriptors for unleasable tables, doesn't actually work. I'll backtrack.
All around, this is a very welcome change.
Reviewed 39 of 62 files at r1, 3 of 17 files at r2, 3 of 12 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, and @postamar)
pkg/bench/rttanalysis/testdata/benchmark_expectations, line 37 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
It appears that allowing most system tables to be leased can lead to some significant roundtrip savings. I'm guessing this is the cause, at least (not sure what else it could be) and for drops I'm guessing it specifically involves leasing the jobs table.
That makes sense. `rangelog` can also hurt in big clusters.
pkg/sql/catalog/bootstrap/metadata.go, line 90 at r1 (raw file):
`err := fn(desc)`
`if err != nil {`
nit: `if err := fn(desc); err != nil {`?
pkg/sql/catalog/bootstrap/metadata.go, line 199 at r1 (raw file):
`// InitialUserDescriptorID returns the smallest descriptor ID for a non-system`
`// descriptor. This value is used to initialize the descriptor ID sequence.`
`func (ms MetadataSchema) InitialUserDescriptorID() descpb.ID {`
What's the vision here? We won't get to have a reserved range once we have dynamic system table addresses. The reason is that we need to upgrade clusters which will already have a descriptor 51.
pkg/sql/catalog/catprivilege/validate.go, line 23 at r1 (raw file):
`p descpb.PrivilegeDescriptor, objectNameKey catalog.NameKey, objectType privilege.ObjectType,`
`) error {`
`return p.Validate(objectNameKey.GetParentID(), objectType, objectNameKey.GetName(), allowedSuperuserPrivileges(objectNameKey))`
nit: wrap the args?
pkg/sql/catalog/descs/collection.go, line 82 at r3 (raw file):
`// bypassing the descriptor lease mechanism. The lease mechanism will have its`
`// own transaction to read the descriptor and will hang waiting for the`
`// uncommitted changes to the descriptor. These descriptors are local to this`
It's not exactly true that it will hang: the lease mechanism now uses high priority. This definitely was a problem back in the day. It'd be correct if it said "if this transaction is PRIORITY HIGH". Bad stuff does happen if you get pushed by the leasing.
pkg/sql/catalog/descs/descriptor.go, line 216 at r3 (raw file):
`}`
`func (tc *Collection) withReadFromStore(`
this deserves commentary
pkg/sql/catalog/descs/kv_descriptors.go, line 218 at r1 (raw file):
`withNameLookup := maybeLookedUpName != ""`
`if id == keys.SystemDatabaseID {`
`b := dbdesc.NewBuilder(systemschema.SystemDB.DatabaseDesc())`
Should we just statically allocate this in a `var`?
pkg/sql/catalog/descs/kv_descriptors.go, line 114 at r3 (raw file):
`}`
`func (kd *kvDescriptors) lookupName(`
Would you be willing to comment this and `getDescriptor`?
pkg/sql/catalog/descs/leased_descriptors.go, line 88 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
Anyway the point is, if the descriptor can't be leased, we just return here with the `shouldReadFromStore` return value set to `true`. This will cause the retrieval to fall back to KV.
Heh, that fallback is one I want to kill one day because it's stupid. This is fine.
pkg/sql/catalog/descs/system_descriptors.go, line 42 at r1 (raw file):
`ParentSchemaID: parentSchemaID,`
`Name: name,`
`}]`
Probably no reason to, but the `nstree.Map` would work here if you ever wanted to go the other way too.
pkg/sql/catalog/descs/system_descriptors.go, line 49 at r1 (raw file):
if the cluster is bootstrapped then the namespace table might not have been populated yet, and in any case will be populated with the IDs defined in the bootstrapped schema.
I don't get this comment. Can you say more about when that is possible?
This commit refactors how the system schema and the privileges for these tables are defined:
- system table names are all hard-coded in `catconstants`;
- system table descriptor definitions are much more concise;
- system table privilege definitions as well as privilege validation and repair logic are moved from `descpb` to `catprivilege`;
- system table privileges are defined by name instead of by ID.

This refactor made it possible to clean up the descriptors collection logic somewhat:
1. uncommitted descriptors are moved out of kvDescriptors and into their own layer,
2. all tables can now be leased except for those in a small deny-list,
3. kvDescriptors read code paths (by name and by ID) are more unified,
4. system database namespace lookups in kvDescriptors.getByName go through a cache,
5. descs.Collection read code paths are more unified as well,
6. descriptor validation at transaction commit time leverages the descs.Collection as a catalog.BatchDescGetter.

Notably, as alluded to in (2), instead of the existing allow-list of leasable system descriptors we now have `UnleasableSystemDescriptors` defined in `systemschema`, a deny-list consisting of:
- the system database (1),
- the descriptor table (3),
- the lease table (11),
- the rangelog table (13),
- the namespace table (30).

All these changes contribute to reducing the number of round-trips to the storage layer.

Release note: None
Reviewed 16 of 62 files at r1, 7 of 17 files at r2, 7 of 12 files at r3, 6 of 6 files at r4, 8 of 8 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru and @postamar)
Thanks for the review! The CI failure seems spurious; going ahead and merging this now.
bors r+
Build failed (retrying...):
Build succeeded:
This PR is responsible for some regressions on the hot path in the SQL layer:
Feel free to ignore this message if the benefits of the change justify some regressions.
Thanks @yuzefovich, I remember being curious as to what the perf implications would be. This change made many system tables leasable when they previously weren't. This does add some overhead because we now acquire a lease when we previously wouldn't have bothered. I believe that for instance
Why don't we exclude all of the current (static) system tables from being leased?
IIRC we want system tables to behave just like any other table, whenever possible. This in turn should make it less awkward to give them just any other ID.
I looked at this as part of a different perf regression. I don't think this relates to leasing. Leasing doesn't really allocate. I'm pretty sure this ends up having something to do with cloning the system database. That sort of maybe has to do with the fact that we might be leasing some table somewhere (I didn't dig), but it's not really fundamental to the perf regression. The perf loss is in the fiddly bits of the special cases. We'll just want to find the right places for some singletons as opposed to the clones which happen here and subsequently here. I took a stab in #71936 for something easy, but it isn't quite right. I do believe that #71927 + #71936 will get us below our 21.1 benchmark value. Here were the outputs of that combination:
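The singleton-versus-clone distinction can be sketched as follows. `DatabaseDescriptor`, `systemDB`, `clone` and both getters are hypothetical, and the privilege contents are illustrative. The shared singleton avoids an allocation per call, but it is only safe if every caller treats it as read-only; the cloning getter pays for a deep copy each time.

```go
package main

import "fmt"

// DatabaseDescriptor is a hypothetical, simplified database descriptor.
type DatabaseDescriptor struct {
	ID   uint32
	Name string
	// Privileges stands in for the nested state that makes cloning costly.
	Privileges map[string][]string
}

// clone performs a deep copy, allocating a new descriptor, map and slices.
func (d *DatabaseDescriptor) clone() *DatabaseDescriptor {
	c := &DatabaseDescriptor{ID: d.ID, Name: d.Name, Privileges: make(map[string][]string, len(d.Privileges))}
	for user, privs := range d.Privileges {
		c.Privileges[user] = append([]string(nil), privs...)
	}
	return c
}

// systemDB is a hypothetical package-level singleton built once at init time.
// It is shared, so callers must never mutate it.
var systemDB = &DatabaseDescriptor{
	ID:         1,
	Name:       "system",
	Privileges: map[string][]string{"admin": {"CONNECT"}, "root": {"CONNECT"}},
}

// getSystemDBCloned allocates a fresh deep copy on every call.
func getSystemDBCloned() *DatabaseDescriptor { return systemDB.clone() }

// getSystemDBShared returns the singleton without allocating.
func getSystemDBShared() *DatabaseDescriptor { return systemDB }

func main() {
	fmt.Println(getSystemDBCloned() == getSystemDBShared(), getSystemDBShared() == getSystemDBShared())
}
```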
OK, I stand corrected. Regarding the system database descriptor, there's also the fact that the collection now adds it to its set of uncommitted descriptors, which triggers yet more allocations. However, if the system database descriptor really is read-only, we could make the case that it never belongs in that set in the first place. Perhaps the synthetic descriptor layer should handle this special case instead? I'll look into this.
This commit refactors how the system schema and the privileges for these
tables are defined:
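- system table names are all hard-coded in `catconstants`;
- system table descriptor definitions are much more concise;
- system table privilege definitions as well as privilege validation
  and repair logic are moved from `descpb` to `catprivilege`;
- system table privileges are defined by name instead of by ID.

These changes in turn made it possible to clean up the descriptor
leasing and collection logic somewhat.

One change which makes this commit not, strictly speaking, a refactor
involves leasing. Instead of the existing allow-list of leasable system
descriptors, we now have `systemschema.UnleasableSystemDescriptors`,
a deny-list consisting of:
- the system database (1),
- the descriptor table (3),
- the lease table (11),
- the rangelog table (13),
- the namespace table (30).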
These changes were all motivated by the upcoming need to allow system
tables to exist outside of the restricted range of IDs smaller than 50.
Release note: None