Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: use tenant client/server certs #51503

Merged
merged 7 commits into from
Aug 4, 2020

Conversation

tbg
Copy link
Member

@tbg tbg commented Jul 16, 2020

With this PR, tenant SQL servers use tenant client certificates to
connect to the tenant server endpoint at the KV layer. They were
previously using the KV-internal node certs.

No validation is performed as of this PR, but this is the obvious
next step.

Follow-up work will assertions that make sure that we don't see
tenants "accidentally" use the node certs for some operations
when they are available (as is typically the case during testing).

Finally, there will be some work on the heartbeats exchanged by
the RPC context. We don't want a SQL tenant's time signal to
ever trigger KV nodes to crash, for example.

Touches #47898.

Release note: None

  • cli/flags,config: new flag for tenant KV listen addr
  • sql: route tenant KV traffic to tenant KV address
  • roachtest: configure --tenant-addr flag in acceptance/multitenant
  • rpc: add TenantID to rpc.ContextOptions
  • security: slight test improvement
  • rpc: pass TenantID to SecurityContext to CertManager
  • security: use a single test_certs dir
  • security: embed certs for a few hard-coded tenants
  • rpc: almost use tenant client certs (on tenants)
  • rpc: use tenant client/server certs where appropriate

@tbg tbg requested a review from a team July 16, 2020 12:22
@tbg tbg requested a review from a team as a code owner July 16, 2020 12:22
@tbg tbg requested review from miretskiy and removed request for a team July 16, 2020 12:22
@tbg tbg added the A-multitenancy Related to multi-tenancy label Jul 16, 2020
@tbg tbg requested a review from nvanbenschoten July 16, 2020 12:23
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg tbg removed the request for review from miretskiy July 16, 2020 12:24
@tbg tbg force-pushed the kvserver-tenant-certs branch from dab2e0e to dc54e00 Compare July 16, 2020 12:59
@tbg tbg force-pushed the kvserver-tenant-certs branch from dc54e00 to 145ab26 Compare July 21, 2020 15:12
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! I don't see anything major here that needs to be discussed further, though I'm ignoring the last commit. :lgtm:

Reviewed 24 of 24 files at r1, 1 of 1 files at r2, 7 of 7 files at r3, 29 of 29 files at r4, 28 of 28 files at r5, 2 of 2 files at r6, 5 of 5 files at r7, 1 of 1 files at r8.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/rpc/context.go, line 239 at r7 (raw file):

		}

		// TODO(tbg): do something else if `o.tenant`.

This is the same TODO as the one five lines up, right? There's not multiple "something else"s that you want to do?


pkg/rpc/context.go, line 770 at r7 (raw file):

		// TODO(tbg): remove this override when the KV layer can authenticate tenant
		// client certs.
		const override = false

Why keep the code?


pkg/rpc/tls.go, line 85 at r3 (raw file):

	ctx.lazy.certificateManager.Do(func() {
		var opts []security.Option
		if !roachpb.IsSystemTenantID(ctx.tenID.ToUint64()) {

You can just do ctx.tenID != roachpb.SystemTenantID.


pkg/rpc/tls.go, line 86 at r3 (raw file):

		var opts []security.Option
		if !roachpb.IsSystemTenantID(ctx.tenID.ToUint64()) {
			opts = append(opts, security.ForTenant(ctx.tenID.String()))

Any reason to lose type safety here with the tenantIdentifier?


pkg/rpc/tls.go, line 177 at r7 (raw file):

// certificate for the configured tenant from the cert manager.
func (ctx *SecurityContext) GetTenantClientTLSConfig() (*tls.Config, error) {
	// Early out.

We have almost the exact same logic in each of these methods. Think it's worth generalizing? Something like:

func (ctx *SecurityContext) getTLSConfig(fn (*security.CertificateManager) (*tls.Config, error)) (*tls.Config, error) {
    ...
}

pkg/server/testserver.go, line 638 at r7 (raw file):

		}
	}
	sqlCfg.TenantKVAddrs = []string{ts.TenantAddr()}

Good catch. Should this be ServingTenantAddr though?

@tbg tbg requested a review from nvanbenschoten July 28, 2020 08:53
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! I'll see about this wip commit and try to merge this.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/rpc/tls.go, line 86 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Any reason to lose type safety here with the tenantIdentifier?

It didn't seem useful to have to deal with string/tenantID conversions inside of the security package. Open to revisiting this after this PR.


pkg/rpc/tls.go, line 177 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We have almost the exact same logic in each of these methods. Think it's worth generalizing? Something like:

func (ctx *SecurityContext) getTLSConfig(fn (*security.CertificateManager) (*tls.Config, error)) (*tls.Config, error) {
    ...
}

Yes, that might be a good idea. I would like to get this PR in first, though.

@tbg tbg force-pushed the kvserver-tenant-certs branch from 145ab26 to d9c5fc3 Compare July 28, 2020 08:54
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jul 29, 2020
Fixes cockroachdb#47898.

Rebased on cockroachdb#51503 and cockroachdb#52034. Ignore all but the last 3 commits.

This commit adds a collection of access control policies for the newly
exposed tenant RPC server. These authorization policies ensure that an
authenticated tenant is only able to access keys within its keyspace and
that no tenant is able to access data from another tenant's keyspace
through the tenant RPC server. This is a major step in providing
crypto-backed logical isolation between tenants in a multi-tenant
cluster.

The existing auth mechanism is retained on the standard RPC server,
which means that the system tenant is still able to access any key in
the system.
@tbg tbg force-pushed the kvserver-tenant-certs branch 3 times, most recently from fa00dee to 4c7fca6 Compare July 30, 2020 15:36
@tbg tbg force-pushed the kvserver-tenant-certs branch from 4c7fca6 to 579122a Compare July 31, 2020 12:02
@tbg tbg requested a review from a team as a code owner July 31, 2020 12:02
@tbg
Copy link
Member Author

tbg commented Jul 31, 2020

I rebased onto #52034. This should turn green now.

tbg added 6 commits August 4, 2020 08:57
This is currently unused, but already initialized correctly: it's always
SystemTenantID except on SQL tenant servers, where it's the tenant's ID.

Release note: None
I worried for a short while that this would always return nil, but it
does not.

Release note: None
This makes sure that a certificate manager for an `rpc.Context` for a
given tenant is aware of the tenant ID.

This is not used yet, but a new TODO (to be addressed shortly) hints
at where this will be used during dialing: when we're a tenant, use
the tenant client certs instead of the client certs.

Release note: None
I had previously made sure that the multi-tenancy certs were in a
different subdirectory, but in hindsight this is not worth the
hassle.

Release note: None
We already have tests that want to use more than one tenant.
These will need to have certs for each of the tenants that
they use soon, so pave the way for that by adding a few
hard-coded tenant IDs that have certs embedded.

Release note: None
As of this commit, tenants would use their proper tenant client certs if
it weren't for a manual override that was added.

This override exists because the KV layer can not yet authenticate
tenant client certs (this will change soon, in a follow-up to cockroachdb#50503).

However, uncommenting both the override and the hack in
`pkg/security/securitytest/test_certs/regenerate.sh` to make the tenant
client certs match those used by the KV nodes gives early validation
that this "will work" once the KV side plays ball.

Touches cockroachdb#47898.

Release note: None
With this PR, tenant SQL servers use tenant client certificates to
connect to the tenant server endpoint at the KV layer. They were
previously using the KV-internal node certs.

No validation is performed as of this PR, but this is the obvious
next step.

Follow-up work will assertions that make sure that we don't see
tenants "accidentally" use the node certs for some operations
when they are available (as is typically the case during testing).

Finally, there will be some work on the heartbeats exchanged by
the RPC context. We don't want a SQL tenant's time signal to
ever trigger KV nodes to crash, for example.

Touches cockroachdb#47898.

Release note: None
@tbg tbg force-pushed the kvserver-tenant-certs branch from 579122a to b0c4384 Compare August 4, 2020 06:57
@tbg
Copy link
Member Author

tbg commented Aug 4, 2020

bors r=nvanbenschoten

@craig
Copy link
Contributor

craig bot commented Aug 4, 2020

Build failed:

@tbg
Copy link
Member Author

tbg commented Aug 4, 2020

Failed on pip in examples-orms

bors r=nvanbenschoten

@craig
Copy link
Contributor

craig bot commented Aug 4, 2020

Build failed:

@tbg
Copy link
Member Author

tbg commented Aug 4, 2020

Sigh

[10:50:03]
Cause 6: org.gradle.internal.resolve.ModuleVersionResolveException: Could not resolve javax.annotation:javax.annotation-api:1.2.
[10:50:03]
Required by:
[10:50:03]
project : > org.glassfish.jersey.core:jersey-server:2.25

bors r=nvanbenschoten

@knz
Copy link
Contributor

knz commented Aug 4, 2020

I have skipped the Example-ORMs target so this will be able to go in now

@craig
Copy link
Contributor

craig bot commented Aug 4, 2020

Build failed:

@knz
Copy link
Contributor

knz commented Aug 4, 2020

ok I thought I had skipped example-orms and it seems that I did not.

@knz
Copy link
Contributor

knz commented Aug 4, 2020

oh that's because your bors run started before my conf change. Let's try that again.

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request Aug 4, 2020
51503: rpc: use tenant client/server certs r=nvanbenschoten a=tbg


With this PR, tenant SQL servers use tenant client certificates to
connect to the tenant server endpoint at the KV layer. They were
previously using the KV-internal node certs.

No validation is performed as of this PR, but this is the obvious
next step.

Follow-up work will assertions that make sure that we don't see
tenants "accidentally" use the node certs for some operations
when they are available (as is typically the case during testing).

Finally, there will be some work on the heartbeats exchanged by
the RPC context. We don't want a SQL tenant's time signal to
ever trigger KV nodes to crash, for example.

Touches #47898.

Release note: None

- cli/flags,config: new flag for tenant KV listen addr
- sql: route tenant KV traffic to tenant KV address
- roachtest: configure --tenant-addr flag in acceptance/multitenant
- rpc: add TenantID to rpc.ContextOptions
- security: slight test improvement
- rpc: pass TenantID to SecurityContext to CertManager
- security: use a single test_certs dir
- security: embed certs for a few hard-coded tenants
- rpc: *almost* use tenant client certs (on tenants)
- rpc: use tenant client/server certs where appropriate


52281: bulkio: Correctly handle exhausting retries when reading from HTTP. r=rohany a=miretskiy

Fixes #52279

Return an error if we exhaust the retry budget when reading from HTTP.

Release Notes: None

Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
@craig
Copy link
Contributor

craig bot commented Aug 4, 2020

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 4, 2020

Build succeeded:

@craig craig bot merged commit 5102c97 into cockroachdb:master Aug 4, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Aug 4, 2020
Fixes cockroachdb#47898.

Rebased on cockroachdb#51503 and cockroachdb#52034. Ignore all but the last 3 commits.

This commit adds a collection of access control policies for the newly
exposed tenant RPC server. These authorization policies ensure that an
authenticated tenant is only able to access keys within its keyspace and
that no tenant is able to access data from another tenant's keyspace
through the tenant RPC server. This is a major step in providing
crypto-backed logical isolation between tenants in a multi-tenant
cluster.

The existing auth mechanism is retained on the standard RPC server,
which means that the system tenant is still able to access any key in
the system.
craig bot pushed a commit that referenced this pull request Aug 5, 2020
51803: cmd/docgen: add HTTP extractor r=mjibson a=mjibson

Add a way to extract docs from the status.proto HTTP endpoint. These
can be imported into the docs project as needed.

Release note: None

52083: roachtest: small misc r=andreimatei a=andreimatei

See individual commits.

52094: rpc: implement tenant access control policies at KV RPC boundary r=nvanbenschoten a=nvanbenschoten

Fixes #47898.

Rebased on #51503 and #52034. Ignore all but the last 3 commits.

This commit adds a collection of access control policies for the newly exposed tenant RPC server. These authorization policies ensure that an authenticated tenant is only able to access keys within its keyspace and that no tenant is able to access data from another tenant's keyspace through the tenant RPC server. This is a major step in providing crypto-backed logical isolation between tenants in a multi-tenant cluster.

The existing auth mechanism is retained on the standard RPC server, which means that the system tenant is still able to access any key in the system.

52352: sql/pgwire: add regression test for varchar OIDs in RowDescription r=jordanlewis a=rafiss

See issue #51360. The bug described in it was fixed somewhat
accidentally, so this test will verify that we don't regress again.

Release note: None

52386: opt: add SerializingProject exec primitive r=RaduBerinde a=RaduBerinde

The top-level projection of a query has a special property - it can project away
columns that we want an ordering on (e.g. `SELECT a FROM t ORDER BY b`).

The distsql physical planner was designed to tolerate such cases, as they were
much more common with the heuristic planner. But the new distsql exec factory
does not; it currently relies on a hack: it detects this case by checking if the
required output ordering is `nil`. This is fragile and doesn't work in all
cases.

This change adds a `SerializingProject` primitive which is like a SimpleProject
but it forces serialization of all parallel streams into one. The new primitive
is used to enforce the final query presentation. We only need to pass column
names for the presentation, so we remove `RenameColumns` and remove the column
names argument from `SimpleProject` (simplifying some execbuilder code).

We also fix a bug in `ConstructSimpleProject` where we weren't taking the
`PlanToStreamColMap` into account when building the projection.

Release note: None

Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multitenancy Related to multi-tenancy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants