kvserver: authN and authZ for incoming tenant connections #47898
Per our discussion today, it does look like gRPC supports authentication on connection startup. From https://grpc.io/docs/guides/auth/:
I think you're aware of this because you use both in https://github.com/cockroachdb/cockroach/compare/master...tbg:poc/tokenauth?expand=1#diff-14fd680bac02bef9b58a5fd4fed0ff8bR141. So it seems like it should be possible to include the tenant auth token during connection startup and avoid passing an extra 64 bytes on each individual RPC. There would be some complication around wrapping a
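For illustration only, here is a minimal Go sketch (not CockroachDB code) of the "wrapping" alluded to above: custom gRPC transport credentials that write a fixed-length tenant token right after the TLS handshake, so the token travels once per connection instead of once per RPC. The type and field names are made up for this sketch.

```go
package tenantauth

import (
	"context"
	"net"

	"google.golang.org/grpc/credentials"
)

// tokenCreds wraps existing transport credentials (e.g. TLS) and sends a
// fixed-length tenant auth token as the first bytes on every new client
// connection, immediately after the underlying handshake completes.
type tokenCreds struct {
	credentials.TransportCredentials
	token []byte // hypothetical 64-byte tenant auth token
}

// ClientHandshake performs the wrapped handshake and then writes the token.
func (c *tokenCreds) ClientHandshake(
	ctx context.Context, authority string, rawConn net.Conn,
) (net.Conn, credentials.AuthInfo, error) {
	conn, authInfo, err := c.TransportCredentials.ClientHandshake(ctx, authority, rawConn)
	if err != nil {
		return nil, nil, err
	}
	if _, err := conn.Write(c.token); err != nil {
		_ = conn.Close()
		return nil, nil, err
	}
	// A server-side counterpart would read and verify the same number of
	// bytes in ServerHandshake before handing the conn to gRPC.
	return conn, authInfo, nil
}

// Clone preserves the token when gRPC copies the credentials.
func (c *tokenCreds) Clone() credentials.TransportCredentials {
	return &tokenCreds{TransportCredentials: c.TransportCredentials.Clone(), token: c.token}
}
```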
Oh, nice. I will give that a try. Thanks!
That looks like it should work, but it goes back to "why don't we just use client certificates"? It really starts smelling like we're getting into a rebuild of client certs. It looks like maybe we should - the real difficulty with client certs (or equivalently your proposal) is that the expiration check needs to happen per-RPC (plus a little work on the RPCContext to proactively close conns that are close to expiring).

I think we can make information from the conn available to the per-RPC interceptors, though. Everything there is marked as experimental, but let's assume that's fine: the per-RPC interceptor can call https://pkg.go.dev/google.golang.org/grpc/credentials?tab=doc#RequestInfoFromContext, which should really be a https://pkg.go.dev/google.golang.org/grpc/credentials?tab=doc#TLSInfo, which contains https://pkg.go.dev/crypto/tls?tab=doc#ConnectionState, which has the https://pkg.go.dev/crypto/x509?tab=doc#Certificate, and those have the expiration.

I tried this out just now and immediately got a reality check. We're on gRPC v1.21.2, but all of this API is just four months old at the time of writing and (I think, judging from the milestone they gave the PR) is in gRPC v1.27. Possibly good news is that the current release is v1.29 and there's already a v1.27.1 (so v1.27 is possibly a reasonable candidate to upgrade to), but it leaves me uneasy to have a gRPC upgrade in the critical path here.

@petermattis I would be curious how anxious you are about upgrading gRPC. It is about the right time in the cycle to do so, yet it's unclear that we have a strong incentive (aside from this issue). There are a few bug fixes/improvements possibly worth picking up:
Ha! Double-good news. I missed another API that gives us this already, which we are already using today (lines 122 to 123 in f80ec9c).
So it doesn't seem like we need to upgrade after all.
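For reference, a hedged sketch of the per-RPC expiration check this API enables, using `peer.FromContext` and `credentials.TLSInfo`. The function name is made up; the real check would live in the server-side RPC interceptors.

```go
package tenantauth

import (
	"context"
	"errors"
	"fmt"
	"time"

	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/peer"
)

// checkClientCertExpiry returns an error if the peer's TLS client
// certificate is missing or has expired. It is meant to be called from a
// per-RPC (unary or stream) server interceptor.
func checkClientCertExpiry(ctx context.Context) error {
	p, ok := peer.FromContext(ctx)
	if !ok {
		return errors.New("no peer information in context")
	}
	tlsInfo, ok := p.AuthInfo.(credentials.TLSInfo)
	if !ok {
		return errors.New("connection is not using TLS")
	}
	certs := tlsInfo.State.PeerCertificates
	if len(certs) == 0 {
		return errors.New("no client certificate presented")
	}
	if now := time.Now(); now.After(certs[0].NotAfter) {
		return fmt.Errorf("client certificate expired at %s", certs[0].NotAfter)
	}
	return nil
}
```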
I have no objections to upgrading gRPC. This would be the right time to do an upgrade.
OK, I opened a PR: #49114. Note that we don't care about it for this issue.
I added a commit to https://github.com/tbg/cockroach/tree/poc/tokenauth that prototypes the use of client certs using mostly cobbled-together pieces of our existing

I must say that the
Reminder to self: as we do this, also make sure the RPC heartbeats between SQL and KV stop checking the cluster name, and can never bring down a KV node due to clock mismatch.
This commit introduces methods that create a CA certificate for use with the auth broker envisioned in cockroachdb#49105 and, from that CA, client certificates for use in cockroachdb#47898. They are not exposed through the cli (yet), though the certificate loader already supports them. Touches cockroachdb#49105. Touches cockroachdb#47898. Release note: None
This adds a CLI command to start a SQL tenant in a standalone process. The tenant currently communicates with the KV layer "as a node"; this will only change with cockroachdb#47898. As is, the tenant has full access to the KV layer and so a few things may work that shouldn't as a result of that. Additionally, the tenant runs a Gossip client as a temporary stopgap until we have removed the remaining dependencies on it (cockroachdb#49693 has the details on what those are). Apart from that, though, this is the real thing and can be used to start setting up end-to-end testing of ORMs as well as the deployments.

A word on the choice of the `mt` command: `mt` is an abbreviation for `multi-tenant`, which was considered too long; just `tenant` was considered misleading - there will be multiple additional subcommands housing the other services required for running the whole infrastructure, including certs generation, the KV auth broker server, and the SQL proxy. Should a lively bikeshed around naming break out, we can rename the commands later when consensus has been reached.

For those curious to try it out, the following will be handy:

```bash
rm -rf ~/.cockroach-certs cockroach-data && killall -9 cockroach && \
export COCKROACH_CA_KEY=$HOME/.cockroach-certs/ca.key && \
./cockroach cert create-ca && \
./cockroach cert create-client root && \
./cockroach cert create-node 127.0.0.1 && \
./cockroach start --host 127.0.0.1 --background && \
./cockroach sql --host 127.0.0.1 -e 'select crdb_internal.create_tenant(123);'

./cockroach mt start-sql --tenant-id 123 --kv-addrs 127.0.0.1:26257 --sql-addr :5432 & sleep 1 && \
./cockroach sql --host 127.0.0.1 --port 5432
```

This roughly matches what the newly introduced `acceptance/multitenant` roachtest does as well.

Release note: None
As of this commit, tenants would use their proper tenant client certs if it weren't for a manual override that was added. This override exists because the KV layer cannot yet authenticate tenant client certs (this will change soon, in a follow-up to cockroachdb#50503).

However, uncommenting both the override and the hack in `pkg/security/securitytest/test_certs/regenerate.sh` to make the tenant client certs match those used by the KV nodes gives early validation that this "will work" once the KV side plays ball.

Touches cockroachdb#47898.

Release note: None
With this PR, tenant SQL servers use tenant client certificates to connect to the tenant server endpoint at the KV layer. They were previously using the KV-internal node certs.

No validation is performed as of this PR, but this is the obvious next step. Follow-up work will add assertions that make sure that we don't see tenants "accidentally" use the node certs for some operations when they are available (as is typically the case during testing).

Finally, there will be some work on the heartbeats exchanged by the RPC context. We don't want a SQL tenant's time signal to ever trigger KV nodes to crash, for example.

Touches cockroachdb#47898.

Release note: None
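As an illustration of the client side described above, here is a hedged Go sketch of dialing the KV layer's tenant endpoint with a tenant client certificate over mutual TLS. The function name, file-path parameters, and certificate file layout are placeholders, not CockroachDB's actual cert-manager API.

```go
package tenantauth

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// dialKVAsTenant dials the KV layer's tenant endpoint presenting a tenant
// client certificate (mutual TLS) rather than the KV-internal node cert.
func dialKVAsTenant(kvAddr, certFile, keyFile, caFile string) (*grpc.ClientConn, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile) // e.g. a client-tenant cert/key pair
	if err != nil {
		return nil, err
	}
	caPEM, err := os.ReadFile(caFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("could not parse CA certificate")
	}
	tlsConf := &tls.Config{
		Certificates: []tls.Certificate{cert}, // presented to the KV server
		RootCAs:      pool,                    // used to verify the KV server
	}
	return grpc.Dial(kvAddr, grpc.WithTransportCredentials(credentials.NewTLS(tlsConf)))
}
```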
Fixes cockroachdb#47898. Rebased on cockroachdb#51503 and cockroachdb#52034. Ignore all but the last 3 commits.

This commit adds a collection of access control policies for the newly exposed tenant RPC server. These authorization policies ensure that an authenticated tenant is only able to access keys within its keyspace and that no tenant is able to access data from another tenant's keyspace through the tenant RPC server. This is a major step in providing crypto-backed logical isolation between tenants in a multi-tenant cluster.

The existing auth mechanism is retained on the standard RPC server, which means that the system tenant is still able to access any key in the system.
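For intuition, a hedged, much-simplified sketch of the kind of check such an authorization policy performs: verifying that a request's key span lies entirely within the tenant's keyspace. The prefix layout and helper names here are illustrative stand-ins, not the actual `keys`/`roachpb` APIs.

```go
package tenantauth

import "bytes"

// prefixEnd returns the first key beyond every key that starts with prefix,
// assuming the prefix is non-empty and does not end in 0xff (sufficient for
// this sketch).
func prefixEnd(prefix []byte) []byte {
	end := append([]byte(nil), prefix...)
	end[len(end)-1]++
	return end
}

// tenantContainsSpan reports whether the request span [key, endKey) lies
// entirely within the keyspace owned by tenantPrefix. An authorization
// policy along these lines rejects any request for which this returns
// false, so a tenant can never read or write another tenant's keys.
func tenantContainsSpan(tenantPrefix, key, endKey []byte) bool {
	return bytes.Compare(tenantPrefix, key) <= 0 &&
		bytes.Compare(endKey, prefixEnd(tenantPrefix)) <= 0
}
```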
51503: rpc: use tenant client/server certs r=nvanbenschoten a=tbg

With this PR, tenant SQL servers use tenant client certificates to connect to the tenant server endpoint at the KV layer. They were previously using the KV-internal node certs. No validation is performed as of this PR, but this is the obvious next step. Follow-up work will add assertions that make sure that we don't see tenants "accidentally" use the node certs for some operations when they are available (as is typically the case during testing). Finally, there will be some work on the heartbeats exchanged by the RPC context. We don't want a SQL tenant's time signal to ever trigger KV nodes to crash, for example.

Touches #47898.

Release note: None

- cli/flags,config: new flag for tenant KV listen addr
- sql: route tenant KV traffic to tenant KV address
- roachtest: configure --tenant-addr flag in acceptance/multitenant
- rpc: add TenantID to rpc.ContextOptions
- security: slight test improvement
- rpc: pass TenantID to SecurityContext to CertManager
- security: use a single test_certs dir
- security: embed certs for a few hard-coded tenants
- rpc: *almost* use tenant client certs (on tenants)
- rpc: use tenant client/server certs where appropriate

52281: bulkio: Correctly handle exhausting retries when reading from HTTP. r=rohany a=miretskiy

Fixes #52279. Return an error if we exhaust the retry budget when reading from HTTP.

Release Notes: None

Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
51503: rpc: use tenant client/server certs r=nvanbenschoten a=tbg

With this PR, tenant SQL servers use tenant client certificates to connect to the tenant server endpoint at the KV layer. They were previously using the KV-internal node certs. No validation is performed as of this PR, but this is the obvious next step. Follow-up work will add assertions that make sure that we don't see tenants "accidentally" use the node certs for some operations when they are available (as is typically the case during testing). Finally, there will be some work on the heartbeats exchanged by the RPC context. We don't want a SQL tenant's time signal to ever trigger KV nodes to crash, for example.

Touches #47898.

Release note: None

- cli/flags,config: new flag for tenant KV listen addr
- sql: route tenant KV traffic to tenant KV address
- roachtest: configure --tenant-addr flag in acceptance/multitenant
- rpc: add TenantID to rpc.ContextOptions
- security: slight test improvement
- rpc: pass TenantID to SecurityContext to CertManager
- security: use a single test_certs dir
- security: embed certs for a few hard-coded tenants
- rpc: *almost* use tenant client certs (on tenants)
- rpc: use tenant client/server certs where appropriate

52108: sql: cleanup handling of EXPLAIN in the new factory r=yuzefovich a=yuzefovich

**sql: cleanup ordinality code a bit**

Release note: None

**sql: cleanup handling of EXPLAIN in the new factory**

Previously, several EXPLAIN variants were mishandled in the new factory: instead of being wrapped as other planNodes are, they were kept separate, which resulted in a "mixed"-representation plan. This is now fixed by properly wrapping them.

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
51803: cmd/docgen: add HTTP extractor r=mjibson a=mjibson

Add a way to extract docs from the status.proto HTTP endpoint. These can be imported into the docs project as needed.

Release note: None

52083: roachtest: small misc r=andreimatei a=andreimatei

See individual commits.

52094: rpc: implement tenant access control policies at KV RPC boundary r=nvanbenschoten a=nvanbenschoten

Fixes #47898. Rebased on #51503 and #52034. Ignore all but the last 3 commits.

This commit adds a collection of access control policies for the newly exposed tenant RPC server. These authorization policies ensure that an authenticated tenant is only able to access keys within its keyspace and that no tenant is able to access data from another tenant's keyspace through the tenant RPC server. This is a major step in providing crypto-backed logical isolation between tenants in a multi-tenant cluster.

The existing auth mechanism is retained on the standard RPC server, which means that the system tenant is still able to access any key in the system.

52352: sql/pgwire: add regression test for varchar OIDs in RowDescription r=jordanlewis a=rafiss

See issue #51360. The bug described in it was fixed somewhat accidentally, so this test will verify that we don't regress again.

Release note: None

52386: opt: add SerializingProject exec primitive r=RaduBerinde a=RaduBerinde

The top-level projection of a query has a special property - it can project away columns that we want an ordering on (e.g. `SELECT a FROM t ORDER BY b`). The distsql physical planner was designed to tolerate such cases, as they were much more common with the heuristic planner. But the new distsql exec factory does not; it currently relies on a hack: it detects this case by checking if the required output ordering is `nil`. This is fragile and doesn't work in all cases.

This change adds a `SerializingProject` primitive which is like a SimpleProject but forces serialization of all parallel streams into one. The new primitive is used to enforce the final query presentation.

We only need to pass column names for the presentation, so we remove `RenameColumns` and remove the column names argument from `SimpleProject` (simplifying some execbuilder code). We also fix a bug in `ConstructSimpleProject` where we weren't taking the `PlanToStreamColMap` into account when building the projection.

Release note: None

Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
See here for background
SQL tenant servers will make outgoing connections, most prominently to the KV layer. We need to come up with a way to encrypt and authenticate these connections.
This is still in the design phase but generally will be based on some crypto. The design will apply more broadly than to the SQL tenant connections - this issue tracks only the work to apply it to these connections.
An internal, early, work-in-progress strawman is here.
The auth broker is tracked in #49105.