Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keys: introduce tenant keyspaces, generate SQL keys through tenant-bound key generator #47999

Merged
merged 8 commits into from
Apr 29, 2020

Conversation

nvanbenschoten
Copy link
Member

First step in addressing #47903.

This PR introduces a few new important concepts to the CockroachDB codebase:

  • Tenants
  • Tenant IDs
  • Tenant keyspaces
  • The "System" tenant

A TenantID is a unique ID associated with a tenant in a multi-tenant cluster. Each tenant is granted exclusive access to a portion of the keyspace and a collection of SQL tables in that keyspace which comprise a logical cluster. The "system" tenant is the system's internal tenant in a multi-tenant cluster and the only tenant in a single-tenant cluster.

The system tenant differs from all other tenants in three important ways:

  1. the system tenant's keyspace is not prefixed with a tenant specifier.
  2. the system tenant is created by default during cluster initialization.
  3. the system tenant had the ability to create and destroy other tenants.

The PR then combines the concepts introduced in the previous commit with a new tenant-bound key generator structure called the TenantIDKeyGen. This key generator is bound to a TenantID on creation and is only capable of producing SQL keys for that tenant.

The change then removes all other forms of SQL key generation. Notably, it removes all free functions in pkg/keys that generate SQL keys and redirects all key generation in pkg/sql/sqlbase/index_encoding.go through pkg/keys. I audited all uses of encoding.EncodeUvarintAscending (and DecodeUvarint64 for C++) in the codebase to get to a point where I feel relatively confident about this. Still, it will be nice to verify this further in the future when we test SQL tenant servers and reject all cross-tenant KV traffic.

With all SQL key generation flowing through the tenant-bound key generators, the change then introduces two global key generators:

  • SystemTenantKeyGen: a SQL key generator for the system tenant
  • TODOTenantKeyGen: equivalent to SystemTenantKeyGen, but should be used when it is unclear which tenant should be referenced by the surrounding context.

These two key generators are used appropriately. Most production code uses the TODOTenantKeyGen for now because it will later be replaced with a real tenant-bound key generator that hangs off of its SQL server. Most testing code uses the SystemTenantKeyGen. Once most of the obvious uses of TODOTenantKeyGen are addressed in the next PR, I might put it behind a function that takes an "issueNo" to make sure that all remaining uses of it are tracked in Github.

@nvanbenschoten nvanbenschoten requested review from tbg and asubiotto April 24, 2020 01:37
@nvanbenschoten nvanbenschoten added the A-multitenancy Related to multi-tenancy label Apr 24, 2020
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantKey branch 2 times, most recently from ac44981 to 82bc28e Compare April 24, 2020 02:26
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far! I left a comment about having the tenantID and its encoding/decoding encapsulated away in a package. I wonder if you had already considered and decided against that.

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 6 of 6 files at r4, 86 of 86 files at r5, 4 of 4 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @nvanbenschoten)


pkg/keys/constants.go, line 35 at r6 (raw file):

	// The maximum value for a Uint64 encoded table ID is:
	//  \xfd\xff\xff\xff\xff\xff\xff\xff\xff
	// So we could also set this to be \xfe without issue. That would prevent

Yeah, using xfe can't hurt. Let's do it


pkg/keys/constants.go, line 282 at r6 (raw file):

	// Should this be TablePrefix(math.MaxUint32 + 1)? Should we just set it to
	// tenantPrefix?
	TableDataMax = SystemTenantKeyGen.TablePrefix(math.MaxUint32)

Yeah that seems wrong. I wouldn't set it to tenantPrefix, based on a desire to keep the spans tight and "noninterlinked".


pkg/keys/printer.go, line 250 at r6 (raw file):

	tenantIDStr := input[:slashPos]
	tenantID, err := strconv.ParseUint(tenantIDStr, 10, 64)
	fmt.Println(tenantID, err)

This method too would be better kept in the hypothetical tenantpb package.


pkg/keys/printer_test.go, line 203 at r6 (raw file):

		{makeKey(ten5KeyGen.TablePrefix(42),
			roachpb.RKey(encoding.EncodeVarintAscending(nil, 1222)),
			roachpb.RKey(encoding.EncodeStringAscending(nil, "handsome man"))),

lol, never noticed these mysteriously ridiculous test cases. I'm scared of git blaming them, mostly because I might be the winner


pkg/keys/tenant.go, line 175 at r5 (raw file):

		return 0, errors.Errorf("key is not a descriptor table entry: %v", key)
	}
	// descID

partial comment?


pkg/roachpb/tenant.go, line 18 at r4 (raw file):

// Each tenant is granted exclusive access to a portion of the keyspace and a
// collection of SQL tables in that keyspace which comprise a "logical" cluster.
type TenantID uint64

Could this be more opaque? It shouldn't just look like an int, i.e. something like

type TenantID struct{id uint64}

instead? No SQL uses should never care about the value. They will care whether it's SystemTenantID or not, but that's about it. A tenantID is only initialized in startup code (and tests), so there's going to be a constructor that has an underlying type, but that's it? The underlying type doesn't matter (outside of key encoding/decoding). Makes me wonder whether the tenantID and the structure that holds it for encoding purposes should live together in a tenantpb package (I'm fine not making it a protobuf until we have to, it's unclear that we will), so that nobody outside of that package even knows what the underlying type is (you need a constructor of course, but only in one place at startup and in tests).


pkg/roachpb/tenant.go, line 24 at r4 (raw file):

// system tenant differs from all other tenants in three important ways:
// 1. the system tenant's keyspace is not prefixed with a tenant specifier.
// 2. the system tenant is created by default during cluster initialization.

It's also always present (i.e. can't be destroyed)


pkg/roachpb/tenant.go, line 25 at r4 (raw file):

// 1. the system tenant's keyspace is not prefixed with a tenant specifier.
// 2. the system tenant is created by default during cluster initialization.
// 2. the system tenant had the ability to create and destroy other tenants.

has

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantKey branch from 82bc28e to ada04e2 Compare April 27, 2020 23:25
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

I left a comment about having the tenantID and its encoding/decoding encapsulated away in a package. I wonder if you had already considered and decided against that.

Discussed inline.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @tbg)


pkg/keys/constants.go, line 35 at r6 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Yeah, using xfe can't hurt. Let's do it

Done. And updated the doc.


pkg/keys/constants.go, line 282 at r6 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Yeah that seems wrong. I wouldn't set it to tenantPrefix, based on a desire to keep the spans tight and "noninterlinked".

Done.


pkg/keys/printer.go, line 250 at r6 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This method too would be better kept in the hypothetical tenantpb package.

Let's discuss on the other thread.


pkg/keys/printer_test.go, line 203 at r6 (raw file):

Previously, tbg (Tobias Grieger) wrote…

lol, never noticed these mysteriously ridiculous test cases. I'm scared of git blaming them, mostly because I might be the winner

I took a look – it wasn't who I thought it would be...


pkg/keys/tenant.go, line 175 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

partial comment?

Copied from before, but fixed.


pkg/roachpb/tenant.go, line 18 at r4 (raw file):

something like

Good idea! Done.

I did consider adding a tenant package but decided against it. The reason for that is that I don't want this multi-tenancy stuff to be viewed as a half-baked extension to Cockroach. Instead, I want it to be clear that this is the new Cockroach. The changes we're making to the keyspace and to the KV layer are fundamental and any new reader of this code should consider multi-tenancy from the beginning.

I also don't think there's enough code in this file to warrant a new package. I could be convinced otherwise though.


pkg/roachpb/tenant.go, line 24 at r4 (raw file):

Previously, tbg (Tobias Grieger) wrote…

It's also always present (i.e. can't be destroyed)

Done.


pkg/roachpb/tenant.go, line 25 at r4 (raw file):

Previously, tbg (Tobias Grieger) wrote…

has

Done.

@blathers-crl
Copy link

blathers-crl bot commented Apr 27, 2020

❌ The GitHub CI (Cockroach) build has failed on ada04e2e.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: let's get this in ASAP, I assume it rots out from under you like twice an hour.

@asubiotto did you see this PR?

Reviewed 90 of 90 files at r7, 1 of 1 files at r8, 2 of 2 files at r9, 6 of 6 files at r10, 86 of 86 files at r11, 5 of 5 files at r12, 1 of 1 files at r13.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @nvanbenschoten)


c-deps/libroach/encoding.h, line 125 at r13 (raw file):

// tbl stores the decoded descriptor ID of the table.
//
// TODO(nvanbenschoten): support tenant ID prefix.

Can you file this one?


pkg/keys/keys.go, line 693 at r13 (raw file):

// For secondary index keys, the row prefix is defined as the entire key.
//
// TODO(nvanbenschoten): support tenant ID prefix.

Maybe make a checklist-style issue for completing tenantID support. It can be close to a placeholder only since I assume those will be your next PRs.


pkg/keys/printer_test.go, line 203 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I took a look – it wasn't who I thought it would be...

Couldn't not do it now, #3055 for anyone else watching.


pkg/keys/tenant.go, line 58 at r13 (raw file):

// SystemTenantKeyGen is a SQL key generator for the system tenant.
var SystemTenantKeyGen = MakeTenantIDKeyGen(roachpb.SystemTenantID)

In the final code, do you see this used outside of tests at all?


pkg/roachpb/tenant.go, line 18 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

something like

Good idea! Done.

I did consider adding a tenant package but decided against it. The reason for that is that I don't want this multi-tenancy stuff to be viewed as a half-baked extension to Cockroach. Instead, I want it to be clear that this is the new Cockroach. The changes we're making to the keyspace and to the KV layer are fundamental and any new reader of this code should consider multi-tenancy from the beginning.

I also don't think there's enough code in this file to warrant a new package. I could be convinced otherwise though.

I think our thinking is similar here and I think what I'm after is better phrased as a comment on the state of the encoding routines: It's a shame that SQL and KV are so intermingled here. pkg/keys contains a whole bunch of KV internals but ends up also being the place where SQL keys are constructed. I know that there are some legit dependencies, but it's totally unclear what they are, you have to basically look at the callers for each of them which is prohibitive.

Now for constructing SQL keys, I agree that the tenant key gen you made is the way to get keys for use by SQL, though the name is perhaps even too tenant-specific then? If there's a TenantIDKeyGen, could there be a NoTenantIDKeyGen? Nope, could not. What we're looking at is perhaps better named a SQLKeys instance (consisting of a SQLEncoder (generates keys) and SQLDecoder (decodes keys) component if you wanted to pull it apart more). And I think that is the thing I would want to see in a separate package, in an ideal world: definitely enough code to justify a package, and also the right boundary. This package would depend only on encoding and a few constants: DescriptorTableID, etc (and these SQL-specific constants should move with the keygen, out of keys). It doesn't seem like it would depend on keys which is great because then that can be kv-specific. The packages will catch some dependency because things like the system config etc, but those warts seem like they're going away over time.

I don't think it's necessary to make the changes I'm suggesting here, since we want to move quickly and this PR is already making things better, but I would love to see us headed that way if you're interested. (And even if not, curious whether there's something blocking it).

Something that I will point out again is that it just seems totally random that the tenant ID lives in roachpb. It isn't even a proto and the dependency of keys on roachpb generally also doesn't seem justified (right?). Curious what your thinking is here.

The builtin function was previously using `tableDesc.IndexSpan` and then
ignoring the EndKey. This was confusing and a waste of a memory allocation.
Avoids an allocation + encoding per span.Builder.
…romEncDatums

Be consistent with all other functions in this file.
This commit introduces a few new important concepts to the CockroachDB
codebase:
- Tenants
- Tenant IDs
- The "System" tenant

A TenantID is a unique ID associated with a tenant in a multi-tenant
cluster. Each tenant is granted exclusive access to a portion of the
keyspace and a collection of SQL tables in that keyspace which comprise
a logical cluster. The System tenant is the system's internal tenant in
a multi-tenant cluster and the only tenant in a single-tenant cluster.

This commit introduces just the bare bones in terms of the key encoding
of tenants. Future commits will expand on this.
The commit then combines the concepts introduced in the previous commit
with a new tenant-bound key generator structure called the
`TenantIDKeyGen`. This key generator is bound to a TenantID on creation
and is only capable of producing SQL keys for that tenant.

The commit then removes all other forms of SQL key generation. Notably,
it removes all free functions in `pkg/keys` that generate SQL keys and
redirects all key generation in `pkg/sql/sqlbase/index_encoding.go`
through `pkg/keys`. I audited all uses of `encoding.EncodeUvarintAscending`
(and `DecodeUvarint64` for C++) in the codebase to get to a point
where I feel relatively confident about this. Still, it will be nice
to verify this further in the future when we test SQL tenant servers
and reject all cross-tenant KV traffic.

With all SQL key generation flowing through the tenant-bound key
generators, the commit then introduces two global key generators:
- `SystemTenantKeyGen`: a SQL key generator for the system tenant
- `TODOTenantKeyGen`: equivalent to SystemTenantKeyGen, but should be
    used when it is unclear which tenant should be referenced by the
    surrounding context.

These two key generators are used appropriately. Most production code
uses the `TODOTenantKeyGen` for now because it will later be replaced
with a true tenant-bound key gen. Most testing code use the
`SystemTenantKeyGen`. Once most of the obvious uses of
`TODOTenantKeyGen` are addressed, I might put it behind a function that
takes a "issueNo" to make sure that all remaining uses of it are tracked
in Github.

This commit is very large already, so the process of testing the key
generation and parsing for non-system tenants will come in the next
commit.
This commit adds support to pretty print tenant prefixed keys and to
(best-effort) parse tenant prefixed keys.
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! I responded to your comment about package structure below. I'm happy to continue the discussion because I'd like to get this right, but I'm going to merge this when possible to keep things moving/not rotting.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @tbg)


c-deps/libroach/encoding.h, line 125 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Can you file this one?

Done. #48122.


pkg/keys/keys.go, line 693 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Maybe make a checklist-style issue for completing tenantID support. It can be close to a placeholder only since I assume those will be your next PRs.

This is actually the same issue as #48122. I also filed #48123 to track the use of keys.TODOTenantKeyGen (now keys.TODOSQLCodec) more generally.


pkg/roachpb/tenant.go, line 18 at r4 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I think our thinking is similar here and I think what I'm after is better phrased as a comment on the state of the encoding routines: It's a shame that SQL and KV are so intermingled here. pkg/keys contains a whole bunch of KV internals but ends up also being the place where SQL keys are constructed. I know that there are some legit dependencies, but it's totally unclear what they are, you have to basically look at the callers for each of them which is prohibitive.

Now for constructing SQL keys, I agree that the tenant key gen you made is the way to get keys for use by SQL, though the name is perhaps even too tenant-specific then? If there's a TenantIDKeyGen, could there be a NoTenantIDKeyGen? Nope, could not. What we're looking at is perhaps better named a SQLKeys instance (consisting of a SQLEncoder (generates keys) and SQLDecoder (decodes keys) component if you wanted to pull it apart more). And I think that is the thing I would want to see in a separate package, in an ideal world: definitely enough code to justify a package, and also the right boundary. This package would depend only on encoding and a few constants: DescriptorTableID, etc (and these SQL-specific constants should move with the keygen, out of keys). It doesn't seem like it would depend on keys which is great because then that can be kv-specific. The packages will catch some dependency because things like the system config etc, but those warts seem like they're going away over time.

I don't think it's necessary to make the changes I'm suggesting here, since we want to move quickly and this PR is already making things better, but I would love to see us headed that way if you're interested. (And even if not, curious whether there's something blocking it).

Something that I will point out again is that it just seems totally random that the tenant ID lives in roachpb. It isn't even a proto and the dependency of keys on roachpb generally also doesn't seem justified (right?). Curious what your thinking is here.

I agree with a lot of what you're saying. For one, you're right that the name is too tenant-specific. I added another commit that renames it to keys.SQLCodec with some inner delegation to a sqlEncoder and a sqlDecoder. That all turned out very well.

The part that I'm less sure about is the package structure. If we pull all of this out of pkg/keys, then what is the point of that package? Its godoc says "Package keys manages the construction of keys for CockroachDB's key-value layer." So the construction of SQL keys to be handed to the key-value layer seems very appropriate for pkg/keys. Maybe I'm thinking about this incorrectly, but I think it's nice that there's a single package responsible for the structure and encoding of the keyspace instead of fragmenting such responsibility across a series of packages.

But then this begs the question - how much of SQL encoding should live in ./pkg/keys: just the table ID prefix? the table ID, index ID, and column-family prefix? The entire key? That's where I'm less sure.

Similarly, I don't see it as a problem for tenant ID to live in roachpb any more than I see it as a problem that roachpb.Key (which ./pkg/keys has a huge dependency on) or roachpb.StoreID live in roachpb. roachpb is meant to store common type definitions that may be part of protos and that other packages need to depend on. It seems to be serving exactly that goal by hosting roachpb.TenantID.


pkg/keys/tenant.go, line 58 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

In the final code, do you see this used outside of tests at all?

I'm not sure. I think it still might be used by code in KV that is specific to the system-tenant. I think we'll have to wait and see.

math.MaxUint32 was meant to be the largest table ID (inclusive), not
an exclusive upper bound. To adjust for this, we PrefixEnd the value
to get a true upper bound key for all (system tenant) table keys.
A SQLCodec is a combination of a sqlEncoder and a sqlDecoder.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/tenantKey branch from 3cd16c0 to cffd1aa Compare April 28, 2020 22:05
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Apr 28, 2020
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Apr 28, 2020
@nvanbenschoten
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 29, 2020

Build succeeded

@craig craig bot merged commit 40cf837 into cockroachdb:master Apr 29, 2020
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/tenantKey branch May 6, 2020 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multitenancy Related to multi-tenancy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants