
rpc,security: allow setting the tenant ID via gRPC metadata #96153

Merged: 2 commits, Feb 4, 2023

Conversation

@knz (Contributor) commented Jan 29, 2023

Fixes #75795.
Fixes #91996.
Epic: CRDB-14537
Informs #77173.

When running with shared-process multitenancy (with secondary tenant
servers running in the same process as the KV node), we want to allow
tenant servers to dial RPC connections with the same TLS client cert
as the KV node.

To make this possible, it becomes necessary for an RPC client
to identify itself as a secondary tenant by another means than
the CN field in the TLS client cert.

This patch does so by introducing a new optional gRPC metadata header,
"client-tenant".

  • When absent, we continue to use the tenant ID in the client
    cert as the claimed tenant identity of the client,
    as previously.

  • When present, it is used as the claimed tenant identity of the
    client. In that case, we allow two TLS situations:

    • either the client is also using a client tenant cert,
      in which case we verify that the tenant ID in the cert
      matches the one in the metadata;

    • or, the client is using a regular client TLS cert,
      in which case we verify that it is using a 'root' or
      'node' cert, since only these principals are allowed
      to perform RPCs in the first place.

The authentication rules are summarized in the following table.

This matrix was designed with the following properties:

  • a rogue client cannot get more access by adding gRPC metadata than
    it would have if it didn't pass gRPC metadata at all. This can be
    checked by observing that for a given TLS client cert, the addition of
    gRPC metadata always results in authn and authz rules that are at
    least as restrictive.

  • the gRPC metadata remains optional, so as to allow previous version
    SQL pods to continue to connect with just a valid TLS cert.

+------------------+---------------+-------------------------------------+--------------------------------------+
|                  |               | Server is system tenant             | Server is secondary tenant           |
+------------------+---------------+---------------------+---------------+-------------------------+------------+
| TLS client cert  | gRPC metadata | Authn result        | Authz rule    | Authn result            | Authz rule |
+------------------+---------------+---------------------+---------------+-------------------------+------------+
| Tenant client    | None          | OK                  | tenant-filter | OK if client tenant     | allow      |
|                  |               |                     |               | ID matches server       |            |
|                  |               |                     |               |                         |            |
+------------------+               +---------------------+---------------+-------------------------+------------+
| `root` or `node` |               | OK                  | allow         | OK if user scope        | allow      |
| client           |               |                     |               | matches server tenant ID|            |
+------------------+               +---------------------+---------------+-------------------------+------------+
| other client     |               | deny                | N/A           | deny                    | N/A        |
+------------------+---------------+---------------------+---------------+-------------------------+------------+
| Tenant client    | Client        | OK if TLS tenant ID | tenant-filter | OK if TLS, MD and       | allow      |
|                  | specified     | matches MD          |               | server tenant IDs match |            |
+------------------+ tenant ID     +---------------------+---------------+-------------------------+------------+
| `root` or `node` |               | OK                  | tenant-filter | OK if MD and server     | allow      |
| client           |               |                     |               | tenant IDs match        |            |
+------------------+               +---------------------+---------------+-------------------------+------------+
| other client     |               | deny                | N/A           | deny                    | N/A        |
+------------------+---------------+---------------------+---------------+-------------------------+------------+

Release note: None
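The authn/authz matrix above can be condensed into a small decision function. The sketch below is illustrative only: the type names, string labels, and the simplified treatment of cert user scopes are assumptions for readability, not the actual CockroachDB identifiers.

```go
package main

import "fmt"

// Inputs to the authn/authz decision, mirroring the matrix in the
// commit message. All names here are illustrative, not the actual
// CockroachDB types.
type peer struct {
	certKind     string // "tenant", "root-or-node", or "other"
	certTenantID uint64 // tenant ID in the TLS cert, 0 if none
	mdTenantID   uint64 // tenant ID in gRPC metadata, 0 if absent
}

type server struct {
	isSystem bool
	tenantID uint64
}

// decide returns the authn result ("ok" or "deny") and the authz rule
// ("tenant-filter", "allow", or "n/a"), following the table row by row.
func decide(p peer, s server) (authn, authz string) {
	deny := func() (string, string) { return "deny", "n/a" }

	switch p.certKind {
	case "tenant":
		// A tenant client cert pins the claimed identity; if metadata
		// is also present, the two must agree.
		if p.mdTenantID != 0 && p.mdTenantID != p.certTenantID {
			return deny()
		}
		if s.isSystem {
			return "ok", "tenant-filter"
		}
		if p.certTenantID == s.tenantID {
			return "ok", "allow"
		}
		return deny()
	case "root-or-node":
		if p.mdTenantID != 0 {
			// Metadata claims a tenant identity on behalf of a
			// privileged cert: the system tenant filters by tenant,
			// a secondary tenant requires an exact ID match.
			if s.isSystem {
				return "ok", "tenant-filter"
			}
			if p.mdTenantID == s.tenantID {
				return "ok", "allow"
			}
			return deny()
		}
		// No metadata: privileged certs get full access (assuming the
		// cert's user scope covers this server's tenant).
		return "ok", "allow"
	default:
		return deny()
	}
}

func main() {
	// A rogue client adding metadata never gains access it lacked.
	fmt.Println(decide(peer{certKind: "tenant", certTenantID: 10, mdTenantID: 11}, server{isSystem: true}))
}
```

Note how, for every cert kind, adding metadata can only keep the outcome the same or make it stricter, which is the invariant the table was designed around.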

@knz knz requested review from a team as code owners January 29, 2023 12:12
@cockroach-teamcity (Member): This change is Reviewable

@knz knz force-pushed the 20230129-tenant-md branch 5 times, most recently from 8cc535b to 48b9c74 on January 29, 2023 18:34
@knz knz requested a review from a team January 29, 2023 18:34
@knz knz force-pushed the 20230129-tenant-md branch from 48b9c74 to a51599d on January 29, 2023 18:50
@stevendanna (Collaborator) left a comment

Thanks for jumping on this. Just one question about whether we need to do any due diligence around other locations where we construct outgoing contexts.

// gateway forward to n2, we don't want n1 to authenticate itself
// as "tenant 10" to n2. At this point, the tenant information
// should not be used any more.
delete(md, rpc.ClientTIDMetadataHeaderKey)
@stevendanna (Collaborator):

Should we audit other calls to NewOutgoingContext?
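For context, the snippet under discussion removes one key before re-sending metadata. Since gRPC's `metadata.MD` is a `map[string][]string` underneath, the operation can be sketched with plain maps; the header name below is taken from the patch description ("client-tenant"), and the function name is hypothetical.

```go
package main

import "fmt"

// Hypothetical stand-in for rpc.ClientTIDMetadataHeaderKey; the patch
// description names the header "client-tenant".
const clientTenantHeader = "client-tenant"

// stripTenantHeader copies incoming metadata minus the tenant-identity
// header, so a fan-out RPC does not re-authenticate to the next node
// as the original tenant.
func stripTenantHeader(md map[string][]string) map[string][]string {
	out := make(map[string][]string, len(md))
	for k, v := range md {
		if k == clientTenantHeader {
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	md := map[string][]string{
		clientTenantHeader: {"10"},
		"user-agent":       {"grpc-go"},
	}
	fmt.Println(len(stripTenantHeader(md))) // only the non-tenant header remains
}
```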

pkg/rpc/auth.go Outdated
Comment on lines 135 to 137
if val := metadata.ValueFromIncomingContext(ctx, ClientTIDMetadataHeaderKey); len(val) > 0 {
var err error
wantedTenantID, err = tenantIDFromString(val[0], "gRPC metadata")
@stevendanna (Collaborator):

[💭 ] I could see having a method in rpc/context for getting and removing the client tID from the context rather than exposing the header key and having the caller do the manipulation. Fine as-is though.

@andreimatei (Contributor) left a comment

All commits except for the last are from #96207

Not the first one.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson and @knz)


-- commits line 2 at r4:
Does this fix #95465 ?


-- commits line 39 at r4:
does this patch mean that shared-proc tenants now work in insecure clusters? Or are they already working today somehow?


pkg/rpc/auth.go line 155 at r1 (raw file):

			// serve requests for a client coming from tenant 456).
			if tid != a.tenant.tenantID {
				return roachpb.TenantID{}, authErrorf("this tenant (%v) cannot serve requests from a server for tenant %v", a.tenant.tenantID, tid)

consider not leaking the server's tenant ID in this error, as a matter of principle


pkg/rpc/auth.go line 158 at r1 (raw file):

			}

			// We return an unset tenant ID, to bypass authorization checks:

I found these semantics (sometimes returning an empty tenant to mean "authorized to do anything") to be dubious, particularly since the method is called authenticate. In other words, I think the distinction between this authenticate and the authorization step later on is confusing. At the very least, consider expanding the comment on this function saying that a tenant ID is returned only for KV nodes (or generally expand that comment; the logic in this function is convoluted enough that I'm not even sure how to summarize it or what to suggest).

Also, I can't tell if the cases where an empty tenant ID is returned match the if localRequest {} case at the top of the function.


pkg/rpc/auth.go line 208 at r4 (raw file):

	}

	// Here are the remaining cases:

Before, there was a checkRootOrNodeInScope() call that was clearly guarding this seemingly dangerous return roachpb.TenantID{}, nil. Now, I can't really tell any more who falls through. It'd be good if we could structure the code to make that more obvious. Unfortunately it's too convoluted for me to make a simple suggestion.


pkg/rpc/context.go line 676 at r4 (raw file):

}

// tenantClientCred is responsible for passing the tenant ID as

I thought I convinced you of 80 cols :)


pkg/rpc/context.go line 1739 at r4 (raw file):

	if rpcCtx.clientCreds != nil {
		dialOpts = append(dialOpts, grpc.WithPerRPCCredentials(rpcCtx.clientCreds))

From what I remember, uses of gRPC metadata are quite expensive in terms of memory allocation. Consider running BenchmarkSQL or such to see if it shows anything.
I think it'd be good if we only used this metadata when we have to. I think we don't have to when calling to the local node (either through the internalClientAdapter or through gRPC), or when using a client certificate.


pkg/server/status.go line 139 at r4 (raw file):

we don't want n1 to authenticate itself as "tenant 10" to n2. At this point, the tenant information

Why not?
This propagateGatewayMetadata function is used by RPCs that do fan-out, right? Wouldn't it make sense for all the nodes to treat the request as coming from a tenant?

@knz knz force-pushed the 20230129-tenant-md branch from a51599d to b4f4d6b on January 30, 2023 23:03
@knz (Contributor, Author) left a comment

Not the first one.

That was already merged as part of #96152.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @jeffswenson, and @stevendanna)


-- commits line 2 at r4:

Previously, andreimatei (Andrei Matei) wrote…

Does this fix #95465 ?

It does! I had forgotten you had filed the issue.
This is PR #96152 btw.


-- commits line 39 at r4:

Previously, andreimatei (Andrei Matei) wrote…

does this patch mean that shared-proc tenants now work in insecure clusters? Or are they already working today somehow?

This is one half of the puzzle. The other half will be fixing this issue: #96215.
I plan to do this together with @stevendanna some time this week.


pkg/rpc/auth.go line 155 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

consider not leaking the server's tenant ID in this error, as a matter of principle

Both tenant IDs in this error message are coming from the client.

However there's another one where your comment applies. I'm changing it as per your recommendation.


pkg/rpc/auth.go line 158 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I found these semantics (sometimes returning an empty tenant to mean "authorized to do anything") to be dubious, particularly since the method is called authenticate. In other words, I think the distinction between this authenticate and the authorization step later on is confusing. At the very least, consider expanding the comment on this function saying that a tenant ID is returned only for KV nodes (or generally expand that comment; the logic in this function is convoluted enough that I'm not even sure how to summarize it or what to suggest).

Also, I can't tell if the cases where an empty tenant ID is returned match the if localRequest {} case at the top of the function.

I have reworked this code to use named/enum types and split the code into multiple functions to clarify. See if you like it better.


pkg/rpc/auth.go line 208 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Before, there was a checkRootOrNodeInScope() call that was clearly guarding this seemingly dangerous return roachpb.TenantID{}, nil. Now, I can't really tell any more who falls through. It'd be good if we could structure the code to make that more obvious. Unfortunately it's too convoluted for me to make a simple suggestion.

I have reworked this to clarify. See if you like it better.


pkg/rpc/context.go line 1739 at r4 (raw file):

From what I remember, uses of gRPC metadata are quite expensive in terms of memory allocation.

My manual inspection of the gRPC code suggests that you are right, because of a poorly designed metadata.FromIncomingContext. However, the proper way to go about this is to patch gRPC to add an alternative impl to FromIncomingContext which avoids these extra allocations (it clones the metadata map that is already in the context. We could consume it as-is instead.). This would be a better use of our time than creating many more conditional paths around here. I plan to look into this during stability period.


pkg/server/status.go line 139 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

we don't want n1 to authenticate itself as "tenant 10" to n2. At this point, the tenant information

Why not?
This propagateGatewayMetadata function is used by RPCs that do fan-out, right? Wouldn't it make sense for all the nodes to treat the request as coming from a tenant?

This is a good question!

Generally this would be incorrect, because we don't only do fan-out for incoming requests. We also do fan-out to handle a request internally to the KV layer / system tenant, in response to a SQL statement.

But then even if we focus specifically on SQL-to-SQL request fanout (within the tenant), we still would not need to propagate the metadata here. This is because the outgoing RPC connection (as initiated by rpc.Dial) would re-inject the tenant ID in the outgoing request anyway.

Because it's such a good question, I extended the comment to explain more.


pkg/server/status.go line 142 at r4 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Should we audit other calls to NewOutgoingContext?

I have audited them all.

@andreimatei (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson, @knz, and @stevendanna)


pkg/rpc/auth.go line 155 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Both tenant IDs in this error message are coming from the client.

However there's another one where your comment applies. I'm changing it as per your recommendation.

a.tenant.tenantID is coming from the client? That's because we went through a demultiplexer in order to even get to this server? Remind me pls, how does the client specify what server it wants to talk to?

@knz (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @jeffswenson, and @stevendanna)


pkg/rpc/auth.go line 155 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

a.tenant.tenantID is coming from the client? That's because we went through a demultiplexer in order to even get to this server? Remind me pls, how does the client specify what server it wants to talk to?

I don't know what you're talking about. In my reviewable session, I see your comment above this code:
[screenshot]

@knz knz added the A-multitenancy Related to multi-tenancy label Jan 30, 2023
@andreimatei (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson, @knz, and @stevendanna)


pkg/rpc/auth.go line 155 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I don't know what you're talking about. In my reviewable session, I see your comment above this code:
image.png

I think maybe you're not looking at the commit on which I commented? The comment is on r1, on the line that reads

return roachpb.TenantID{}, authErrorf("this tenant (%v) cannot serve requests from a server for tenant %v", a.tenant.tenantID, tid)

@knz (Contributor, Author) commented Jan 31, 2023

I think maybe you're not looking at the commit on which I commented? The comment is on r1, on the line that reads

return roachpb.TenantID{}, authErrorf("this tenant (%v) cannot serve requests from a server for tenant %v", a.tenant.tenantID, tid)

Gotcha. Yes i changed that as you suggested.

@knz knz requested a review from andreimatei January 31, 2023 00:02
@andreimatei (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson, @knz, and @stevendanna)


pkg/server/status.go line 139 at r4 (raw file):

Generally this would be incorrect, because we don't only do fan-out for incoming requests. We also do fan-out to handle a request internally to the KV layer / system tenant, in response to a SQL statement.

But this function doesn't seem to be used for KV calling KV, is it? At least not when calling Batch, which is the case where I see why propagating a tenant's identity would be bad (because for example reading from the meta ranges would fail). For RPCs that follow the pattern of having a NodeID in the request and forwarding the request to that node (or fanning out to all nodes if NodeID is zero), it seems to me that maintaining the tenant's identity would be a good thing (so that each node can individually do its own auth rather than rely on only the first KV node to do it).

But then even if we focus specifically on SQL-to-SQL request fanout (within the tenant), we still would not need to propagate the metadata here. This is because the outgoing RPC connection (as initiated by rpc.Dial) would re-inject the tenant ID in the outgoing request anyway.

Well but it'd probably be cheaper for that function to check whether the desired tenant ID is already in the ctx, and not add it again if it is.


Does this function serve any purpose? I think this function would really benefit from a comment explaining what it's about. I think it was introduced in conjunction with server.remote_debugging.mode which has since gone away; I think it was about propagating a piece of metadata injected by the grpc-gateway that was to be checked on all nodes. I suspect we can get rid of the function and then we don't have to argue about what it should propagate and what it shouldn't.
If the function serves some purpose, I would really benefit from a comment explaining what the deal is.

@knz knz force-pushed the 20230129-tenant-md branch from b4f4d6b to 021383a on February 1, 2023 00:05
craig bot pushed a commit that referenced this pull request Feb 1, 2023
95582: loqrecovery,server: apply staged loqrecovery plans on start r=erikgrinaker a=aliher1911

This commit adds loss of quorum recovery plan application functionality
to server startup. Plans are staged during the application phase and then
applied when a rolling restart of the affected nodes is performed.
Plan application performs the same actions that are performed by debug
recovery apply-plan on offline storage. It also updates the loss of
quorum recovery status stored in a store-local key.

Release note: None

Fixes #93121

96126: sql: refactor FunctionProperties and Overload r=mgartner a=mgartner

#### sql: remove FuncExpr.IsGeneratorApplication

`FuncExpr.IsGeneratorApplication` has been removed and its single usage
has been replaced with `FuncExpr.IsGeneratorClass`.

Release note: None

#### sql: move FunctionClass from FunctionProperties to Overload

`FunctionProperties` are attached to a `FunctionDefinition`, which is
simply a collection of overloads that share the same name. Most of the
fields in `FunctionProperties` are, however, overload-specific. They
should be moved to the `Overload` struct. In the long term,
the hierarchy of function definitions, each with child function overloads,
should be flattened to just overloads.

This commit takes one step in this direction.

Epic: CRDB-20370

Release note: None


96207: rpc,security: simplify and enhance the client cert validation r=stevendanna a=knz

Fixes #96174.
Prerequisite for #96153.
Epic: CRDB-14537

TLDR: this change enhances various aspects of TLS client cert
validation. Among other things, it makes the error clearer in case
of authentication failure.

Example, before:
```
ERROR: requested user demo is not authorized for tenant {2}
```

Example, after:
```
ERROR: certificate authentication failed for user "demo"
DETAIL: This is tenant system; cert is valid for "root" on all tenants.
```


Co-authored-by: Oleg Afanasyev <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
}
return authnSuccessPeerIsPrivileged{}, nil
@stevendanna (Collaborator):

Some of the previous discussion is around it being hard to see what cases lead us to hit this case. I think part of the difficulty is tracking when wantedTenantID is set and unset. I wonder if a simple refactoring might help (warning: code written directly in browser). I still don't love it, but it is nice to be able to directly scan for return authnSuccessPeerIsPrivileged. Anyway, perhaps this is just pushing peas around the plate.

// authenticate verifies the credentials of the client and performs
// some consistency check with the information provided.
func (a kvAuth) authenticate(ctx context.Context) (authnResult, error) {
	var ar authnResult
	if clientTenantID, localRequest := grpcutil.IsLocalRequestContext(ctx); localRequest {
		var err error
		ar, err = a.authenticateLocalRequest(ctx, clientTenantID)
		if err != nil {
			return nil, err
		}
	} else {
		var err error
		ar, err = a.authenticateRemoteRequest(ctx)
		if err != nil {
			return nil, err
		}
	}

	switch res := ar.(type) {
	case authnSuccessPeerIsTenantServer:
		if roachpb.TenantID(res) != a.tenant.tenantID {
			log.Ops.Infof(ctx, "rejected incoming request from tenant %v (misconfiguration?)", res)
			return nil, authErrorf("client tenant identity (%v) does not match server", res)
		}
	case authnSuccessPeerIsPrivileged:
	default:
		return nil, errors.AssertionFailedf("programming error: unhandled case %T", ar)
	}

	return ar, nil

}

// Deal with local requests done through the
// internalClientAdapter. There's no TLS for these calls, so the
// regular authentication code path doesn't apply. The clientTenantID
// should be the result of a call to grpcutil.IsLocalRequestContext.
func (a kvAuth) authenticateLocalRequest(ctx context.Context, clientTenantID roachpb.TenantID) (authnResult, error) {
	if clientTenantID.IsSystem() {
		return authnSuccessPeerIsPrivileged{}, nil
	}
	
	// Sanity check: verify that we do not also have gRPC network credentials
	// in the context. This would indicate that metadata was improperly propagated.
	maybeTid, err := getTenantIDFromNetworkCredentials(ctx)
	if err != nil || maybeTid.IsSet() {
		return nil, authErrorf("programming error: network credentials in internal adapter request (%v, %v)", maybeTid, err)
	}
	return authnSuccessPeerIsTenantServer(clientTenantID), nil
}

// authenticateRemoteRequest authenticates requests made over a TLS connection.
func (a kvAuth) authenticateRemoteRequest(ctx context.Context) (authnResult, error) {
	// We will need to look at the TLS cert in any case, so
	// extract it first.
	clientCert, err := getClientCert(ctx)
	if err != nil {
		return nil, err
	}
	tenantIDFromCredentials, err := getTenantIDFromNetworkCredentials(ctx)
	if err != nil {
		return nil, authErrorf("client provided invalid tenant ID: %v", err)
	}
	
	// Did the client peer use a tenant client cert?
	if security.IsTenantCertificate(clientCert) {
		// If the peer is using a client tenant cert, in any case we
		// validate the tenant ID stored in the CN for correctness.
		tlsID, err := tenantIDFromString(clientCert.Subject.CommonName, "Common Name (CN)")
		if err != nil {
			return nil, err
		}

		// If the peer is using a TenantCertificate and also
		// provided a tenant ID via gRPC metadata, they must
		// match.
		if tenantIDFromCredentials.IsSet() && tenantIDFromCredentials != tlsID {
			return nil, authErrorf(
				"client wants to authenticate as tenant %v, but is using TLS cert for tenant %v",
				tenantIDFromCredentials, tlsID)
		}
		return authnSuccessPeerIsTenantServer(tlsID), nil
	} else {
		// We are using TLS, but the peer is not using a client tenant
		// cert. In that case, we only allow RPCs if the principal is
		// 'node' or 'root' and the tenant scope in the cert matches
		// this server (either the cert has scope "global" or its scope
		// tenant ID matches our own).
		//
		// TODO(benesch): the vast majority of RPCs should be limited to just
		// NodeUser. This is not a security concern, as RootUser has access to
		// read and write all data, merely good hygiene. For example, there is
		// no reason to permit the root user to send raw Raft RPCs.
		certUserScope, err := security.GetCertificateUserScope(clientCert)
		if err != nil {
			return nil, err
		}
		if err := checkRootOrNodeInScope(certUserScope, a.tenant.tenantID); err != nil {
			return nil, err
		}
		
		if tenantIDFromCredentials.IsSet() {
			return authnSuccessPeerIsTenantServer(tenantIDFromCredentials), nil
		} else {
			return authnSuccessPeerIsPrivileged{}, nil
		}
	}
}

@knz (Contributor, Author):

Not bad. Doing this simplification in a separate PR: #96338

@knz knz force-pushed the 20230129-tenant-md branch 3 times, most recently from 6cbd72c to 97f60e7 on February 1, 2023 22:01
@knz knz requested review from dhartunian and andreimatei February 1, 2023 22:08
@knz (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @dhartunian, @jeffswenson, and @stevendanna)


pkg/rpc/auth.go line 155 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think maybe you're not looking at the commit on which I commented? The comment is on r1, on the line that reads

return roachpb.TenantID{}, authErrorf("this tenant (%v) cannot serve requests from a server for tenant %v", a.tenant.tenantID, tid)

Done.


pkg/rpc/auth.go line 137 at r4 (raw file):

Previously, stevendanna (Steven Danna) wrote…

[💭 ] I could see having a method in rpc/context for getting and removing the client tID from the context rather than exposing the header key and having the caller do the manipulation. Fine as-is though.

Done.


pkg/server/status.go line 139 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Generally this would be incorrect, because we don't only do fan-out for incoming requests. We also do fan-out to handle a request internally to the KV layer / system tenant, in response to a SQL statement.

But this function doesn't seem to be used for KV calling KV, is it? At least not when calling Batch, which is the case where I see why propagating a tenant's identity would be bad (because for example reading from the meta ranges would fail). For RPCs that follow the pattern of having a NodeID in the request and forwarding the request to that node (or fanning out to all nodes if NodeID is zero), it seems to me that maintaining the tenant's identity would be a good thing (so that each node can individually do its own auth rather than rely on only the first KV node to do it).

But then even if we focus specifically on SQL-to-SQL request fanout (within the tenant), we still would not need to propagate the metadata here. This is because the outgoing RPC connection (as initiated by rpc.Dial) would re-inject the tenant ID in the outgoing request anyway.

Well but it'd probably be cheaper for that function to check whether the desired tenant ID is already in the ctx, and not add it again if it is.


Does this function serve any purpose? I think this function would really benefit from a comment explaining what it's about. I think it was introduced in conjunction with server.remote_debugging.mode which has since gone away; I think it was about propagating a piece of metadata injected by the grpc-gateway that was to be checked on all nodes. I suspect we can get rid of the function and then we don't have to argue about what it should propagate and what it shouldn't.
If the function serves some purpose, I would really benefit from a comment explaining what the deal is.

We've discussed this elsewhere with David and a suggestion was made to do this modification right after authentication, instead of here. I've adapted the PR accordingly.
(And added an explanatory comment here as requested by Andrei)

@knz knz force-pushed the 20230129-tenant-md branch 3 times, most recently from 56d514d to 334d1d7 on February 3, 2023 13:30
@knz (Contributor, Author) commented Feb 3, 2023

I have updated/upgraded this PR based off the changes in #96318, #96451, and some newfound lessons from #96427.

PTAL

In the previous change, the logging tag was just "tenant".
This is somewhat confusing/ambiguous with the tenant identity
of the server itself.

Release note: None
@knz knz force-pushed the 20230129-tenant-md branch from 334d1d7 to 82cbb17 on February 3, 2023 21:07
@knz (Contributor, Author) commented Feb 3, 2023

RFAL, Andrei asked me to not be too prescriptive about the context upon authentication, and instead only erase the excess metadata in the internal client adapter. I've updated the PR accordingly.

TLDR: this patch introduces a mechanism through which a secondary
tenant server can identify itself to its peer over an RPC using a
different mechanism than a TLS "tenant client" certificate.

When running with shared-process multitenancy (with secondary tenant
servers running in the same process as the KV node), we want to allow
tenant servers to dial RPC connections with the same TLS client cert
as the KV node.

To make this possible, it becomes necessary for an RPC client
to identify itself as a secondary tenant by another means than
the CN field in the TLS client cert.

This patch does so by introducing a new optional gRPC metadata header,
"client-tenant".

- When absent, we continue to use the tenant ID in the client
  cert as the claimed tenant identity of the client,
  as previously.

- When present, it is used as the claimed tenant identity of the
  client. In that case, we allow two TLS situations:

  - either the client is _also_ using a client tenant cert,
    in which case we verify that the tenant ID in the cert
    matches the one in the metadata;

  - or, the client is using a regular client TLS cert,
    in which case we verify that it is using a 'root' or
    'node' cert, since only these principals are allowed
    to perform RPCs in the first place.

The authentication rules are summarized in the following table.

This matrix was designed with the following properties:

- *a rogue client cannot get more access by adding gRPC metadata than
  it would have if it didn't pass gRPC metadata at all*. This can be
  checked by observing that for a given TLS client cert, the addition of
  gRPC metadata always results in authn and authz rules that are at
  least as restrictive.

- the gRPC metadata remains optional, so as to allow previous version
  SQL pods to continue to connect with just a valid TLS cert.

```
+------------------+---------------+-------------------------------------+--------------------------------------+
|                  |               | Server is system tenant             | Server is secondary tenant           |
+------------------+---------------+---------------------+---------------+-------------------------+------------+
| TLS client cert  | gRPC metadata | Authn result        | Authz rule    | Authn result            | Authz rule |
+------------------+---------------+---------------------+---------------+-------------------------+------------+
| Tenant client    | None          | OK                  | tenant-filter | OK if client tenant     | allow      |
|                  |               |                     |               | ID matches server       |            |
|                  |               |                     |               |                         |            |
+------------------+               +---------------------+---------------+-------------------------+------------+
| `root` or `node` |               | OK                  | allow         | OK if user scope        | allow      |
| client           |               |                     |               | matches server tenant ID|            |
+------------------+               +---------------------+---------------+-------------------------+------------+
| other client     |               | deny                | N/A           | deny                    | N/A        |
+------------------+---------------+---------------------+---------------+-------------------------+------------+
| Tenant client    | Client        | OK if TLS tenant ID | tenant-filter | OK if TLS, MD and       | allow      |
|                  | specified     | matches MD          |               | server tenant IDs match |            |
+------------------+ tenant ID     +---------------------+---------------+-------------------------+------------+
| `root` or `node` |               | OK                  | tenant-filter | OK if MD and server     | allow      |
| client           |               |                     |               | tenant IDs match        |            |
+------------------+               +---------------------+---------------+-------------------------+------------+
| other client     |               | deny                | N/A           | deny                    | N/A        |
+------------------+---------------+---------------------+---------------+-------------------------+------------+
```
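The decision logic in this matrix can be sketched as a small, self-contained Go function. This is purely illustrative and not the actual CockroachDB code: the type and function names are invented for the sketch, the user-scope check is reduced to a boolean parameter, tenant ID 1 stands in for the system tenant, and `mdTenantID == 0` stands in for "no metadata supplied".

```go
package main

import "fmt"

// certKind models the three TLS client cert categories from the matrix.
type certKind int

const (
	tenantClientCert certKind = iota // TLS "tenant client" cert
	rootOrNodeCert                   // regular client cert for 'root' or 'node'
	otherClientCert                  // any other client cert
)

// authnAllowed reports whether the connection passes authn per the
// matrix above (authz filtering is a separate, later step).
func authnAllowed(cert certKind, certTenantID, mdTenantID, serverTenantID uint64, userScopeMatchesServer bool) bool {
	serverIsSystem := serverTenantID == 1
	switch cert {
	case tenantClientCert:
		if mdTenantID != 0 && mdTenantID != certTenantID {
			return false // metadata must agree with the cert's tenant ID
		}
		// A secondary tenant server only accepts its own tenant.
		return serverIsSystem || certTenantID == serverTenantID
	case rootOrNodeCert:
		if serverIsSystem {
			return true // always accepted; authz may still filter
		}
		if mdTenantID != 0 {
			return mdTenantID == serverTenantID
		}
		return userScopeMatchesServer
	default:
		return false // other client certs are always denied
	}
}

func main() {
	fmt.Println(authnAllowed(tenantClientCert, 2, 3, 1, false)) // TLS/MD mismatch: false
	fmt.Println(authnAllowed(rootOrNodeCert, 0, 2, 2, false))   // MD matches server: true
	fmt.Println(authnAllowed(otherClientCert, 0, 0, 1, false))  // always denied: false
}
```

Note how the first property holds in the sketch: adding metadata can only add a check (the `mdTenantID != certTenantID` and `mdTenantID == serverTenantID` comparisons), never remove one.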

Release note: None
@knz knz force-pushed the 20230129-tenant-md branch from 82cbb17 to f3fa310 on February 3, 2023 at 21:11

@dhartunian dhartunian left a comment


:lgtm: just have a question re: stripping metadata when running rpcs over network from tenant->kv.

the code iterations really made things a lot more readable. now that I walked through the call chain in auth.go things are much clearer.

Reviewed 5 of 9 files at r12, 4 of 4 files at r13, 3 of 11 files at r14, 8 of 8 files at r15, 3 of 7 files at r16, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @jeffswenson, @knz, and @stevendanna)


pkg/rpc/context.go line 822 at r17 (raw file):

```go
			// metadata in the context, but we need to get rid of it
			// before we let the call go through KV.
			ctx = grpcutil.ClearIncomingContext(ctx)
```

Is this also done in the case where the tenant is out-of-process? This code path is for local calls only, right?


@knz knz left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @dhartunian, @jeffswenson, and @stevendanna)


pkg/rpc/context.go line 822 at r17 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Is this also done in the case where the tenant is out-of-process? This code path is for local calls only, right?

This code path is only done for local calls, yes.

We do not need to do this for network calls. gRPC differentiates between "incoming contexts" and "outgoing contexts". The data from an incoming context does not carry over to an outgoing context in a network call automatically.
(Which is why we have this function forwardSQLIdentityThroughRPCCalls in pkg/server to copy the SQL username from the incoming metadata to outgoing context.)


knz commented Feb 3, 2023

TFYR!

bors r=dhartunian


craig bot commented Feb 4, 2023

Build succeeded:

@craig craig bot merged commit 581560b into cockroachdb:master Feb 4, 2023
@knz knz deleted the 20230129-tenant-md branch February 4, 2023 01:27
Labels
A-multitenancy Related to multi-tenancy