Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

catalog: refactor privileges and system schema #69008

Merged
merged 1 commit into from
Aug 19, 2021

Conversation

postamar
Copy link
Contributor

@postamar postamar commented Aug 16, 2021

This commit refactors how the system schema and the privileges for these
tables are defined:

  • system table names are all hard-coded in catconstants;
  • system table descriptor definitions are much more concise;
  • system table privilege definitions as well as privilege validation
    and repair logic are moved from descpb to catprivilege;
  • system table privileges are defined by name instead of by ID.

These changes in turn made it possible to clean up the descriptor
leasing and collection logic somewhat.

One change which is makes this commit not strictly-speaking a refactor
involves leasing. Instead of the existing allow-list of leasable system
descriptors we now have systemschema.UnleasableSystemDescriptors,
a deny-list comprised of:

  • the system database (1),
  • the descriptor table (3),
  • the lease table (11),
  • the rangelog table (13),
  • the namespace table (30).

These changes were all motivated by the upcoming need to allow system
tables to exist outside of the restricted range of IDs smaller than 50.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@postamar postamar force-pushed the refactor-system-schema-privs branch from bcc1608 to 57744f7 Compare August 16, 2021 19:42
@postamar postamar force-pushed the refactor-system-schema-privs branch 2 times, most recently from e18c7e6 to 88b1133 Compare August 17, 2021 00:22
@postamar postamar marked this pull request as ready for review August 17, 2021 00:33
@postamar postamar requested a review from a team August 17, 2021 00:33
@postamar postamar requested review from a team as code owners August 17, 2021 00:33
@postamar postamar requested review from a team and adityamaru and removed request for a team August 17, 2021 00:33
@postamar postamar force-pushed the refactor-system-schema-privs branch from 88b1133 to 973fa88 Compare August 17, 2021 09:52
Copy link
Contributor Author

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a review guide. The diff is quite large but includes a lot of low-entropy lines, so to speak, so it should be quite tractable. I'm hoping this will pass review on the basis of this code change being a strict improvement to the existing code.

21,DropDatabase/drop_database_0_tables
28,DropDatabase/drop_database_1_table
35,DropDatabase/drop_database_2_tables
42,DropDatabase/drop_database_3_tables
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that allowing most system tables to be leased can lead to some significant roundtrip savings. I'm guessing this is the cause, at least (not sure what else it could be) and for drops I'm guessing it specifically involves leasing the jobs table.

@@ -6735,7 +6736,7 @@ func TestRestoreErrorPropagates(t *testing.T) {
defer dirCleanupFn()
params := base.TestClusterArgs{}
params.ServerArgs.ExternalIODir = dir
jobsTableKey := keys.SystemSQLCodec.TablePrefix(keys.JobsTableID)
jobsTableKey := keys.SystemSQLCodec.TablePrefix(uint32(systemschema.JobsTable.GetID()))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naturally this change (and the others like it) are motivated by wanting to make some of these system tables' IDs dynamically defined instead of hard-coded. Still, in the meantime, we might as well refer to them via systemschema when possible. There's a very minor benefit of type safety, of course, and I'd rather we refer to the keys constant only in cases where we actually want the table ID to be hard-coded for performance reasons, typically in cases where we want to build a roachpb.Key without having to do lookups.

pkg/config/system_test.go Outdated Show resolved Hide resolved
@@ -367,13 +367,14 @@ func TestComputeSplitKeyTableIDs(t *testing.T) {
baseSql, _ /* splits */ := schema.GetInitialValues()
// Real system tables plus some user stuff.
kvs, _ /* splits */ := schema.GetInitialValues()
start := uint32(schema.InitialUserDescriptorID())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Laying the ground here for start to be outside of the reserved ID range. Again I don't want to use keys ID constants if I can avoid it.

name == d.GetName() {
return nil, true, nil
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could replace this by looking up a map[descpb.NameInfo]... instead but since systemschema.UnleasableSystemDescriptors is always going to be tiny there really is no point and looping through the entries is perfectly fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway the point is, if the descriptor can't be leased, we just return here with the shouldReadFromStore return value set to true. This will cause the retrieval to fall back to KV.

ctx context.Context, codec keys.SQLCodec,
) error {
nc.Lock()
defer nc.Unlock()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point in the future we'll scan the namespace table here for all entries with parentID = 1. Right now all system descriptors have hard-coded IDs so there's no point in doing this, we get the IDs from the bootstrapped schema.
Note that we'll have to keep doing this even if we add this namespace table scan: if the cluster is bootstrapped then the namespace table might not have been populated yet, and in any case will be populated with the IDs defined in the bootstrapped schema.

}
}
return tbl
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function builds a minimum-viable descpb.TableDescriptor which can have stuff added to it via the toCatalogDescriptor function defined below.

KeyColumnDirections: []descpb.IndexDescriptor_Direction{descpb.IndexDescriptor_ASC, descpb.IndexDescriptor_ASC, descpb.IndexDescriptor_ASC},
KeyColumnIDs: []descpb.ColumnID{1, 2, 3},
},
))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should yield the same descriptor, but more concisely and with less potential for bugs. Recall that these descriptors are all tested in TestSystemTableLiterals.

pkg/sql/tests/system_table_test.go Outdated Show resolved Hide resolved
@postamar postamar force-pushed the refactor-system-schema-privs branch from 973fa88 to ecf6e03 Compare August 17, 2021 16:39
@postamar postamar force-pushed the refactor-system-schema-privs branch 2 times, most recently from 9c15eb0 to 1354b4e Compare August 18, 2021 12:14
@postamar postamar requested a review from a team as a code owner August 18, 2021 12:14
@postamar
Copy link
Contributor Author

Whoops this latest change in which I try to serve in-memory descriptors for unleasable tables doesn't actually work. I'll backtrack.

@postamar postamar force-pushed the refactor-system-schema-privs branch from 1354b4e to b94d1f2 Compare August 18, 2021 16:59
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All around, this is a very welcome change.

Reviewed 39 of 62 files at r1, 3 of 17 files at r2, 3 of 12 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, and @postamar)


pkg/bench/rttanalysis/testdata/benchmark_expectations, line 37 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

It appears that allowing most system tables to be leased can lead to some significant roundtrip savings. I'm guessing this is the cause, at least (not sure what else it could be) and for drops I'm guessing it specifically involves leasing the jobs table.

That makes sense. rangelog can also hurt in big clusters.


pkg/sql/catalog/bootstrap/metadata.go, line 90 at r1 (raw file):

		err := fn(desc)
		if err != nil {

nit: if err := fn(desc); err != nil {?


pkg/sql/catalog/bootstrap/metadata.go, line 199 at r1 (raw file):

// InitialUserDescriptorID returns the smallest descriptor ID for a non-system
// descriptor. This value is used to initialize the descriptor ID sequence.
func (ms MetadataSchema) InitialUserDescriptorID() descpb.ID {

what's the vision here? We won't get to have a reserved range once we have dynamic system table addresses. The reason why not is that we need to upgrade clusters which will have a descriptor 51 already.


pkg/sql/catalog/catprivilege/validate.go, line 23 at r1 (raw file):

	p descpb.PrivilegeDescriptor, objectNameKey catalog.NameKey, objectType privilege.ObjectType,
) error {
	return p.Validate(objectNameKey.GetParentID(), objectType, objectNameKey.GetName(), allowedSuperuserPrivileges(objectNameKey))

nit: wrap the args?


pkg/sql/catalog/descs/collection.go, line 82 at r3 (raw file):

	// bypassing the descriptor lease mechanism. The lease mechanism will have its
	// own transaction to read the descriptor and will hang waiting for the
	// uncommitted changes to the descriptor. These descriptors are local to this

it's not exactly true that it will hang, the lease mechanism now uses priority high. This definitely was a problem back in the day. It'd be correct if it said, if this transaction is PRIORITY HIGH. Bad stuff does happen if you get pushed by the leasing.


pkg/sql/catalog/descs/descriptor.go, line 216 at r3 (raw file):

}

func (tc *Collection) withReadFromStore(

this deserves commentary


pkg/sql/catalog/descs/kv_descriptors.go, line 218 at r1 (raw file):

	withNameLookup := maybeLookedUpName != ""
	if id == keys.SystemDatabaseID {
		b := dbdesc.NewBuilder(systemschema.SystemDB.DatabaseDesc())

should we just statically allocate this this in a var?


pkg/sql/catalog/descs/kv_descriptors.go, line 114 at r3 (raw file):

}

func (kd *kvDescriptors) lookupName(

would you be willing to comment this and getDescriptor?


pkg/sql/catalog/descs/leased_descriptors.go, line 88 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

Anyway the point is, if the descriptor can't be leased, we just return here with the shouldReadFromStore return value set to true. This will cause the retrieval to fall back to KV.

Heh, the fallback I want to one day kill because it's stupid. This is fine.


pkg/sql/catalog/descs/system_descriptors.go, line 42 at r1 (raw file):

		ParentSchemaID: parentSchemaID,
		Name:           name,
	}]

probably no reason to but the nstree.Map would work here if you ever wanted to go the other way too.


pkg/sql/catalog/descs/system_descriptors.go, line 49 at r1 (raw file):

if the cluster is bootstrapped then the namespace table might not have been populated yet, and in any case will be populated with the IDs defined in the bootstrapped schema.

I don't get this comment. Can you say more about when that is possible?

@postamar postamar force-pushed the refactor-system-schema-privs branch from b94d1f2 to a321822 Compare August 19, 2021 14:30
This commit refactors how the system schema and the privileges for these
tables are defined:
- system table names are all hard-coded in `catconstants`;
- system table descriptor definitions are much more concise;
- system table privilege definitions as well as privilege validation
  and repair logic are moved from `descpb` to `catprivilege`;
- system table privileges are defined by name instead of by ID.

This refactor made it possible to clean up the descriptors collection
collection logic somewhat:
1. uncommitted descriptors are moved out of kvDescriptors and into their
   own layer,
2. all tables can now be leased except for those in a small deny-list,
3. kvDescriptors read code paths (by name and by ID) are more unified,
4. system database namespace lookups in kvDescriptors.getByName go
   through a cache,
5. descs.Collection read code paths are more unified as well.
6. descriptor validation at transaction commit time leverages the
   descs.Collection as a catalog.BatchDescGetter.

Notably, as alluded in (2), instead of the existing allow-list of
leasable system descriptors we now have `UnleasableSystemDescriptors`
defined in `systemschema`, a deny-list comprised of:
- the system database (1),
- the descriptor table (3),
- the lease table (11),
- the rangelog table (13),
- the namespace table (30).

All these changes contribute to reducing the number of round-trips to
the storage layer.

Release note: None
@postamar postamar force-pushed the refactor-system-schema-privs branch from a321822 to 617d300 Compare August 19, 2021 14:33
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 16 of 62 files at r1, 7 of 17 files at r2, 7 of 12 files at r3, 6 of 6 files at r4, 8 of 8 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru and @postamar)

@postamar
Copy link
Contributor Author

Thanks for the review! CI failure seems spurious, going ahead and merging this now.

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 19, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 19, 2021

Build succeeded:

@craig craig bot merged commit 65457f9 into cockroachdb:master Aug 19, 2021
@postamar postamar deleted the refactor-system-schema-privs branch August 19, 2021 19:08
@yuzefovich
Copy link
Member

This PR is responsible for some regressions on the hot path in the SQL layer:

Switching to 617d300d909fa808c6b528ace5f3a033895d51d1^
Switching to 617d300d909fa808c6b528ace5f3a033895d51d1
name                                           old time/op    new time/op    delta
FlowSetup/vectorize=true/distribute=true-24       230µs ± 2%     240µs ± 2%  +4.36%  (p=0.000 n=10+10)
FlowSetup/vectorize=true/distribute=false-24      221µs ± 5%     230µs ± 2%  +4.05%  (p=0.001 n=10+10)
FlowSetup/vectorize=false/distribute=true-24      222µs ± 3%     228µs ± 2%  +2.49%  (p=0.001 n=10+10)
FlowSetup/vectorize=false/distribute=false-24     210µs ± 3%     216µs ± 1%  +3.07%  (p=0.001 n=10+8)

name                                           old alloc/op   new alloc/op   delta
FlowSetup/vectorize=true/distribute=true-24      36.1kB ± 1%    36.9kB ± 0%  +2.24%  (p=0.000 n=9+8)
FlowSetup/vectorize=true/distribute=false-24     34.1kB ± 0%    35.1kB ± 0%  +2.72%  (p=0.000 n=8+8)
FlowSetup/vectorize=false/distribute=true-24     41.1kB ± 0%    42.0kB ± 0%  +2.26%  (p=0.000 n=8+8)
FlowSetup/vectorize=false/distribute=false-24    39.3kB ± 0%    40.2kB ± 1%  +2.40%  (p=0.000 n=8+9)

name                                           old allocs/op  new allocs/op  delta
FlowSetup/vectorize=true/distribute=true-24         327 ± 0%       344 ± 0%  +5.31%  (p=0.000 n=8+8)
FlowSetup/vectorize=true/distribute=false-24        309 ± 0%       326 ± 0%  +5.50%  (p=0.000 n=9+8)
FlowSetup/vectorize=false/distribute=true-24        306 ± 0%       323 ± 0%  +5.56%  (p=0.000 n=9+8)
FlowSetup/vectorize=false/distribute=false-24       288 ± 0%       305 ± 0%  +5.90%  (p=0.000 n=8+9)

Feel free to ignore this message, if the benefits of the change justify some regressions.

@postamar
Copy link
Contributor Author

postamar commented Oct 26, 2021

Thanks @yuzefovich, I remember being curious as to what the perf implications would be. This change made many system tables leaseable when they previously weren't. This does add some overhead because we now acquire a lease when we previously wouldn't have bothered, I believe that for instance system.jobs is one of those tables. However this overhead might conceivably be amortized in some cases.
In any case for the time being I think we should keep this change, because we've almost run out of IDs for system tables, and this change is going to help us deal with that. Furthermore if the performance hit is a problem we can always look into which system tables are most responsible and add them to the leasing deny-list, which should be quite easy to do.
edit: That being said, I'll still do some preliminary checks to see if the perf hit is not something more obvious!

@jordanlewis
Copy link
Member

Why don't we exclude all of the current (static) system tables from being leased?

@postamar
Copy link
Contributor Author

IIRC we want system tables to behave just like any other table, whenever possible. This in turn should make it less awkward to give them just any other ID.

@ajwerner
Copy link
Contributor

I looked at this as part of a different perf regression. I don't think this relates to leasing. Leasing doesn't really allocate. I'm pretty sure this ends up having something to do with cloning the system database. That sort of maybe has to do with the fact that we might be leasing some table somewhere (I didn't dig), but it's not really fundamental to the perf regression. The perf loss is in the fiddly bits of the special cases. We'll just want to find the right places for some singletons as opposed to the clones which happen here and subsequently here.

I took a stab in #71936 for something easy, but it isn't quite right. I do believe that #71927 + #71936 will get us below our 21.1 benchmark value. Here were the outputs of that combination:

name                                           old time/op    new time/op    delta
FlowSetup/vectorize=true/distribute=true-16       141µs ± 3%     128µs ± 2%   -9.51%  (p=0.000 n=19+20)
FlowSetup/vectorize=true/distribute=false-16      138µs ± 4%     124µs ± 2%   -9.89%  (p=0.000 n=19+20)
FlowSetup/vectorize=false/distribute=true-16      134µs ± 2%     120µs ± 2%  -10.13%  (p=0.000 n=20+19)
FlowSetup/vectorize=false/distribute=false-16     129µs ± 3%     116µs ± 2%   -9.79%  (p=0.000 n=20+19)

name                                           old alloc/op   new alloc/op   delta
FlowSetup/vectorize=true/distribute=true-16      38.1kB ± 2%    35.3kB ± 2%   -7.23%  (p=0.000 n=18+18)
FlowSetup/vectorize=true/distribute=false-16     36.2kB ± 0%    33.5kB ± 0%   -7.53%  (p=0.000 n=17+16)
FlowSetup/vectorize=false/distribute=true-16     42.6kB ± 0%    39.9kB ± 0%   -6.31%  (p=0.000 n=18+16)
FlowSetup/vectorize=false/distribute=false-16    41.0kB ± 0%    38.3kB ± 0%   -6.57%  (p=0.000 n=16+18)

name                                           old allocs/op  new allocs/op  delta
FlowSetup/vectorize=true/distribute=true-16         368 ± 0%       292 ± 0%  -20.65%  (p=0.000 n=16+17)
FlowSetup/vectorize=true/distribute=false-16        354 ± 0%       278 ± 0%  -21.47%  (p=0.000 n=18+17)
FlowSetup/vectorize=false/distribute=true-16        337 ± 0%       262 ± 1%  -22.40%  (p=0.000 n=19+19)
FlowSetup/vectorize=false/distribute=false-16       325 ± 0%       249 ± 0%  -23.38%  (p=0.000 n=17+18)

@postamar
Copy link
Contributor Author

Ok, I stand corrected. Regarding the system database descriptor, there's also the fact that the collection now adds it to its set of uncommitted descriptors, which triggers yet more allocations. However, if the system database descriptor really is read-only, we could make the case that it never belongs in that set in the first place. Perhaps the synthetic descriptor layer should handle this special case instead? I'll look into this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants