-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
keys: introduce tenant keyspaces, generate SQL keys through tenant-bound key generator #47999
keys: introduce tenant keyspaces, generate SQL keys through tenant-bound key generator #47999
Conversation
ac44981
to
82bc28e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far! I left a comment about having the tenantID and its encoding/decoding encapsulated away in a package. I wonder if you had already considered and decided against that.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 6 of 6 files at r4, 86 of 86 files at r5, 4 of 4 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @nvanbenschoten)
pkg/keys/constants.go, line 35 at r6 (raw file):
// The maximum value for a Uint64 encoded table ID is: // \xfd\xff\xff\xff\xff\xff\xff\xff\xff // So we could also set this to be \xfe without issue. That would prevent
Yeah, using xfe can't hurt. Let's do it
pkg/keys/constants.go, line 282 at r6 (raw file):
// Should this be TablePrefix(math.MaxUint32 + 1)? Should we just set it to // tenantPrefix? TableDataMax = SystemTenantKeyGen.TablePrefix(math.MaxUint32)
Yeah that seems wrong. I wouldn't set it to tenantPrefix, based on a desire to keep the spans tight and "noninterlinked".
pkg/keys/printer.go, line 250 at r6 (raw file):
tenantIDStr := input[:slashPos] tenantID, err := strconv.ParseUint(tenantIDStr, 10, 64) fmt.Println(tenantID, err)
This method too would be better kept in the hypothetical tenantpb
package.
pkg/keys/printer_test.go, line 203 at r6 (raw file):
{makeKey(ten5KeyGen.TablePrefix(42), roachpb.RKey(encoding.EncodeVarintAscending(nil, 1222)), roachpb.RKey(encoding.EncodeStringAscending(nil, "handsome man"))),
lol, never noticed these mysteriously ridiculous test cases. I'm scared of git blaming them, mostly because I might be the winner
pkg/keys/tenant.go, line 175 at r5 (raw file):
return 0, errors.Errorf("key is not a descriptor table entry: %v", key) } // descID
partial comment?
pkg/roachpb/tenant.go, line 18 at r4 (raw file):
// Each tenant is granted exclusive access to a portion of the keyspace and a // collection of SQL tables in that keyspace which comprise a "logical" cluster. type TenantID uint64
Could this be more opaque? It shouldn't just look like an int, i.e. something like
type TenantID struct{id uint64}
instead? No SQL uses should never care about the value. They will care whether it's SystemTenantID or not, but that's about it. A tenantID is only initialized in startup code (and tests), so there's going to be a constructor that has an underlying type, but that's it? The underlying type doesn't matter (outside of key encoding/decoding). Makes me wonder whether the tenantID and the structure that holds it for encoding purposes should live together in a tenantpb
package (I'm fine not making it a protobuf until we have to, it's unclear that we will), so that nobody outside of that package even knows what the underlying type is (you need a constructor of course, but only in one place at startup and in tests).
pkg/roachpb/tenant.go, line 24 at r4 (raw file):
// system tenant differs from all other tenants in three important ways: // 1. the system tenant's keyspace is not prefixed with a tenant specifier. // 2. the system tenant is created by default during cluster initialization.
It's also always present (i.e. can't be destroyed)
pkg/roachpb/tenant.go, line 25 at r4 (raw file):
// 1. the system tenant's keyspace is not prefixed with a tenant specifier. // 2. the system tenant is created by default during cluster initialization. // 2. the system tenant had the ability to create and destroy other tenants.
has
82bc28e
to
ada04e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
I left a comment about having the tenantID and its encoding/decoding encapsulated away in a package. I wonder if you had already considered and decided against that.
Discussed inline.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @tbg)
pkg/keys/constants.go, line 35 at r6 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Yeah, using xfe can't hurt. Let's do it
Done. And updated the doc.
pkg/keys/constants.go, line 282 at r6 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Yeah that seems wrong. I wouldn't set it to tenantPrefix, based on a desire to keep the spans tight and "noninterlinked".
Done.
pkg/keys/printer.go, line 250 at r6 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This method too would be better kept in the hypothetical
tenantpb
package.
Let's discuss on the other thread.
pkg/keys/printer_test.go, line 203 at r6 (raw file):
Previously, tbg (Tobias Grieger) wrote…
lol, never noticed these mysteriously ridiculous test cases. I'm scared of git blaming them, mostly because I might be the winner
I took a look – it wasn't who I thought it would be...
pkg/keys/tenant.go, line 175 at r5 (raw file):
Previously, tbg (Tobias Grieger) wrote…
partial comment?
Copied from before, but fixed.
pkg/roachpb/tenant.go, line 18 at r4 (raw file):
something like
Good idea! Done.
I did consider adding a tenant
package but decided against it. The reason for that is that I don't want this multi-tenancy stuff to be viewed as a half-baked extension to Cockroach. Instead, I want it to be clear that this is the new Cockroach. The changes we're making to the keyspace and to the KV layer are fundamental and any new reader of this code should consider multi-tenancy from the beginning.
I also don't think there's enough code in this file to warrant a new package. I could be convinced otherwise though.
pkg/roachpb/tenant.go, line 24 at r4 (raw file):
Previously, tbg (Tobias Grieger) wrote…
It's also always present (i.e. can't be destroyed)
Done.
pkg/roachpb/tenant.go, line 25 at r4 (raw file):
Previously, tbg (Tobias Grieger) wrote…
has
Done.
❌ The GitHub CI (Cockroach) build has failed on ada04e2e. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's get this in ASAP, I assume it rots out from under you like twice an hour.
@asubiotto did you see this PR?
Reviewed 90 of 90 files at r7, 1 of 1 files at r8, 2 of 2 files at r9, 6 of 6 files at r10, 86 of 86 files at r11, 5 of 5 files at r12, 1 of 1 files at r13.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @nvanbenschoten)
c-deps/libroach/encoding.h, line 125 at r13 (raw file):
// tbl stores the decoded descriptor ID of the table. // // TODO(nvanbenschoten): support tenant ID prefix.
Can you file this one?
pkg/keys/keys.go, line 693 at r13 (raw file):
// For secondary index keys, the row prefix is defined as the entire key. // // TODO(nvanbenschoten): support tenant ID prefix.
Maybe make a checklist-style issue for completing tenantID support. It can be close to a placeholder only since I assume those will be your next PRs.
pkg/keys/printer_test.go, line 203 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I took a look – it wasn't who I thought it would be...
Couldn't not do it now, #3055 for anyone else watching.
pkg/keys/tenant.go, line 58 at r13 (raw file):
// SystemTenantKeyGen is a SQL key generator for the system tenant. var SystemTenantKeyGen = MakeTenantIDKeyGen(roachpb.SystemTenantID)
In the final code, do you see this used outside of tests at all?
pkg/roachpb/tenant.go, line 18 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
something like
Good idea! Done.
I did consider adding a
tenant
package but decided against it. The reason for that is that I don't want this multi-tenancy stuff to be viewed as a half-baked extension to Cockroach. Instead, I want it to be clear that this is the new Cockroach. The changes we're making to the keyspace and to the KV layer are fundamental and any new reader of this code should consider multi-tenancy from the beginning.I also don't think there's enough code in this file to warrant a new package. I could be convinced otherwise though.
I think our thinking is similar here and I think what I'm after is better phrased as a comment on the state of the encoding routines: It's a shame that SQL and KV are so intermingled here. pkg/keys
contains a whole bunch of KV internals but ends up also being the place where SQL keys are constructed. I know that there are some legit dependencies, but it's totally unclear what they are, you have to basically look at the callers for each of them which is prohibitive.
Now for constructing SQL keys, I agree that the tenant key gen you made is the way to get keys for use by SQL, though the name is perhaps even too tenant-specific then? If there's a TenantIDKeyGen
, could there be a NoTenantIDKeyGen
? Nope, could not. What we're looking at is perhaps better named a SQLKeys
instance (consisting of a SQLEncoder
(generates keys) and SQLDecoder
(decodes keys) component if you wanted to pull it apart more). And I think that is the thing I would want to see in a separate package, in an ideal world: definitely enough code to justify a package, and also the right boundary. This package would depend only on encoding
and a few constants: DescriptorTableID
, etc (and these SQL-specific constants should move with the keygen, out of keys
). It doesn't seem like it would depend on keys
which is great because then that can be kv-specific. The packages will catch some dependency because things like the system config etc, but those warts seem like they're going away over time.
I don't think it's necessary to make the changes I'm suggesting here, since we want to move quickly and this PR is already making things better, but I would love to see us headed that way if you're interested. (And even if not, curious whether there's something blocking it).
Something that I will point out again is that it just seems totally random that the tenant ID lives in roachpb. It isn't even a proto and the dependency of keys
on roachpb
generally also doesn't seem justified (right?). Curious what your thinking is here.
The builtin function was previously using `tableDesc.IndexSpan` and then ignoring the EndKey. This was confusing and a waste of a memory allocation.
Avoids an allocation + encoding per span.Builder.
…romEncDatums Be consistent with all other functions in this file.
This commit introduces a few new important concepts to the CockroachDB codebase: - Tenants - Tenant IDs - The "System" tenant A TenantID is a unique ID associated with a tenant in a multi-tenant cluster. Each tenant is granted exclusive access to a portion of the keyspace and a collection of SQL tables in that keyspace which comprise a logical cluster. The System tenant is the system's internal tenant in a multi-tenant cluster and the only tenant in a single-tenant cluster. This commit introduces just the bare bones in terms of the key encoding of tenants. Future commits will expand on this.
The commit then combines the concepts introduced in the previous commit with a new tenant-bound key generator structure called the `TenantIDKeyGen`. This key generator is bound to a TenantID on creation and is only capable of producing SQL keys for that tenant. The commit then removes all other forms of SQL key generation. Notably, it removes all free functions in `pkg/keys` that generate SQL keys and redirects all key generation in `pkg/sql/sqlbase/index_encoding.go` through `pkg/keys`. I audited all uses of `encoding.EncodeUvarintAscending` (and `DecodeUvarint64` for C++) in the codebase to get to a point where I feel relatively confident about this. Still, it will be nice to verify this further in the future when we test SQL tenant servers and reject all cross-tenant KV traffic. With all SQL key generation flowing through the tenant-bound key generators, the commit then introduces two global key generators: - `SystemTenantKeyGen`: a SQL key generator for the system tenant - `TODOTenantKeyGen`: equivalent to SystemTenantKeyGen, but should be used when it is unclear which tenant should be referenced by the surrounding context. These two key generators are used appropriately. Most production code uses the `TODOTenantKeyGen` for now because it will later be replaced with a true tenant-bound key gen. Most testing code use the `SystemTenantKeyGen`. Once most of the obvious uses of `TODOTenantKeyGen` are addressed, I might put it behind a function that takes a "issueNo" to make sure that all remaining uses of it are tracked in Github. This commit is very large already, so the process of testing the key generation and parsing for non-system tenants will come in the next commit.
This commit adds support to pretty print tenant prefixed keys and to (best-effort) parse tenant prefixed keys.
ada04e2
to
846a3d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR! I responded to your comment about package structure below. I'm happy to continue the discussion because I'd like to get this right, but I'm going to merge this when possible to keep things moving/not rotting.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @tbg)
c-deps/libroach/encoding.h, line 125 at r13 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Can you file this one?
Done. #48122.
pkg/keys/keys.go, line 693 at r13 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Maybe make a checklist-style issue for completing tenantID support. It can be close to a placeholder only since I assume those will be your next PRs.
This is actually the same issue as #48122. I also filed #48123 to track the use of keys.TODOTenantKeyGen
(now keys.TODOSQLCodec
) more generally.
pkg/roachpb/tenant.go, line 18 at r4 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I think our thinking is similar here and I think what I'm after is better phrased as a comment on the state of the encoding routines: It's a shame that SQL and KV are so intermingled here.
pkg/keys
contains a whole bunch of KV internals but ends up also being the place where SQL keys are constructed. I know that there are some legit dependencies, but it's totally unclear what they are, you have to basically look at the callers for each of them which is prohibitive.Now for constructing SQL keys, I agree that the tenant key gen you made is the way to get keys for use by SQL, though the name is perhaps even too tenant-specific then? If there's a
TenantIDKeyGen
, could there be aNoTenantIDKeyGen
? Nope, could not. What we're looking at is perhaps better named aSQLKeys
instance (consisting of aSQLEncoder
(generates keys) andSQLDecoder
(decodes keys) component if you wanted to pull it apart more). And I think that is the thing I would want to see in a separate package, in an ideal world: definitely enough code to justify a package, and also the right boundary. This package would depend only onencoding
and a few constants:DescriptorTableID
, etc (and these SQL-specific constants should move with the keygen, out ofkeys
). It doesn't seem like it would depend onkeys
which is great because then that can be kv-specific. The packages will catch some dependency because things like the system config etc, but those warts seem like they're going away over time.I don't think it's necessary to make the changes I'm suggesting here, since we want to move quickly and this PR is already making things better, but I would love to see us headed that way if you're interested. (And even if not, curious whether there's something blocking it).
Something that I will point out again is that it just seems totally random that the tenant ID lives in roachpb. It isn't even a proto and the dependency of
keys
onroachpb
generally also doesn't seem justified (right?). Curious what your thinking is here.
I agree with a lot of what you're saying. For one, you're right that the name is too tenant-specific. I added another commit that renames it to keys.SQLCodec
with some inner delegation to a sqlEncoder
and a sqlDecoder
. That all turned out very well.
The part that I'm less sure about is the package structure. If we pull all of this out of pkg/keys
, then what is the point of that package? Its godoc says "Package keys manages the construction of keys for CockroachDB's key-value layer." So the construction of SQL keys to be handed to the key-value layer seems very appropriate for pkg/keys
. Maybe I'm thinking about this incorrectly, but I think it's nice that there's a single package responsible for the structure and encoding of the keyspace instead of fragmenting such responsibility across a series of packages.
But then this begs the question - how much of SQL encoding should live in ./pkg/keys
: just the table ID prefix? the table ID, index ID, and column-family prefix? The entire key? That's where I'm less sure.
Similarly, I don't see it as a problem for tenant ID to live in roachpb
any more than I see it as a problem that roachpb.Key
(which ./pkg/keys
has a huge dependency on) or roachpb.StoreID
live in roachpb
. roachpb
is meant to store common type definitions that may be part of protos and that other packages need to depend on. It seems to be serving exactly that goal by hosting roachpb.TenantID
.
pkg/keys/tenant.go, line 58 at r13 (raw file):
Previously, tbg (Tobias Grieger) wrote…
In the final code, do you see this used outside of tests at all?
I'm not sure. I think it still might be used by code in KV that is specific to the system-tenant. I think we'll have to wait and see.
math.MaxUint32 was meant to be the largest table ID (inclusive), not an exclusive upper bound. To adjust for this, we PrefixEnd the value to get a true upper bound key for all (system tenant) table keys.
A SQLCodec is a combination of a sqlEncoder and a sqlDecoder.
3cd16c0
to
cffd1aa
Compare
bors r+ |
Build succeeded |
First step in addressing #47903.
This PR introduces a few new important concepts to the CockroachDB codebase:
A TenantID is a unique ID associated with a tenant in a multi-tenant cluster. Each tenant is granted exclusive access to a portion of the keyspace and a collection of SQL tables in that keyspace which comprise a logical cluster. The "system" tenant is the system's internal tenant in a multi-tenant cluster and the only tenant in a single-tenant cluster.
The system tenant differs from all other tenants in three important ways:
The PR then combines the concepts introduced in the previous commit with a new tenant-bound key generator structure called the
TenantIDKeyGen
. This key generator is bound to a TenantID on creation and is only capable of producing SQL keys for that tenant.The change then removes all other forms of SQL key generation. Notably, it removes all free functions in
pkg/keys
that generate SQL keys and redirects all key generation inpkg/sql/sqlbase/index_encoding.go
throughpkg/keys
. I audited all uses ofencoding.EncodeUvarintAscending
(andDecodeUvarint64
for C++) in the codebase to get to a point where I feel relatively confident about this. Still, it will be nice to verify this further in the future when we test SQL tenant servers and reject all cross-tenant KV traffic.With all SQL key generation flowing through the tenant-bound key generators, the change then introduces two global key generators:
SystemTenantKeyGen
: a SQL key generator for the system tenantTODOTenantKeyGen
: equivalent toSystemTenantKeyGen
, but should be used when it is unclear which tenant should be referenced by the surrounding context.These two key generators are used appropriately. Most production code uses the
TODOTenantKeyGen
for now because it will later be replaced with a real tenant-bound key generator that hangs off of its SQL server. Most testing code uses theSystemTenantKeyGen
. Once most of the obvious uses ofTODOTenantKeyGen
are addressed in the next PR, I might put it behind a function that takes an "issueNo" to make sure that all remaining uses of it are tracked in Github.