-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server, tenant: gate process debugging behind capability #102635
server, tenant: gate process debugging behind capability #102635
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modulo some tiny nits. Nice work on this - easy to reason about!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dhartunian, @renatolabs, and @smg260)
-- commits
line 20 at r1:
nit: is this the right issue ref?
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go
line 399 at r1 (raw file):
} errFn := func() error { return errors.New("client tenant does not have capability to debug the process")
nit: Maybe I'm missing something, but why do we need an error function?
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go
line 404 at r1 (raw file):
switch mode { case authorizerModeOn: break // fallthrough to the next check.
nit: fallthrough
is a bit of an overloaded term here when used in a switch statement. Can we re-word this a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally OK with the architecture proposed. I have a suggestion for a simplification. Also a few comments about the wording.
Reviewed 16 of 16 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, @renatolabs, and @smg260)
-- commits
line 4 at r1:
"all tenant processes" -> "all tenant servers"
-- commits
line 6 at r1:
Add:
"""
This implementation is fine when servers serve just 1 tenant in a given process; that tenant then legitimately has access to all process-level control.
However, it becomes a problem in shared-process multitenancy, when the same process is shared by multiple tenants. In that case, it is undesirable for 1 tenant to have access to & control process properties, as it could influence the well-functioning of other tenants or potentially leak data across tenant boundaries.
"""
-- commits
line 8 at r1:
"This commit gates access to the debug server behind"...
-- commits
line 9 at r1:
"only with shared-process multitenancy"
-- commits
line 9 at r1:
"Tenant servers running within..."
-- commits
line 13 at r1:
remove "on the in-process tenant".
-- commits
line 15 at r1:
What is "parent process" in this sentence? IMHO this is not a thing. Did you mean "HTTP server for the system tenant"?
Previously, abarganier (Alex Barganier) wrote…
nit: is this the right issue ref?
yep I think it's not the right issue.
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go
line 399 at r1 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit: Maybe I'm missing something, but why do we need an error function?
because it's used in 2 places below.
pkg/server/config.go
line 533 at r1 (raw file):
// SystemDebugServer is set if the sql configuration will be running in a shared // process model, allowing for parent process debugging if granted by a tenant
"parent process" is not a thing.
pkg/server/config.go
line 534 at r1 (raw file):
// SystemDebugServer is set if the sql configuration will be running in a shared // process model, allowing for parent process debugging if granted by a tenant // capabitlity.
nit: capability
pkg/server/tenant.go
line 417 at r1 (raw file):
if _, isSharedProcess := deps.instanceIDContainer.OptionalNodeID(); isSharedProcess { debugServer = debug.NewTenantDelegatingServer(
Could you not instantiate TenantDelegatingServer
in every case (including system tenant) so we have just 1 code path?
Presumably, the system tenant has the capability so the cap check would succeed.
pkg/server/testserver.go
line 924 at r1 (raw file):
hts.t.authentication = sqlServerWrapper.authentication hts.t.sqlServer = sqlServer hts.t.tenantName = args.TenantName
Why is this change needed?
21c30cd
to
733fc94
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @knz, @renatolabs, and @smg260)
Previously, knz (Raphael 'kena' Poss) wrote…
"all tenant processes" -> "all tenant servers"
Done.
Previously, knz (Raphael 'kena' Poss) wrote…
Add:
"""
This implementation is fine when servers serve just 1 tenant in a given process; that tenant then legitimately has access to all process-level control.
However, it becomes a problem in shared-process multitenancy, when the same process is shared by multiple tenants. In that case, it is undesirable for 1 tenant to have access to & control process properties, as it could influence the well-functioning of other tenants or potentially leak data across tenant boundaries.
"""
Done.
Previously, knz (Raphael 'kena' Poss) wrote…
"This commit gates access to the debug server behind"...
Done.
Previously, knz (Raphael 'kena' Poss) wrote…
"only with shared-process multitenancy"
Done.
Previously, knz (Raphael 'kena' Poss) wrote…
"Tenant servers running within..."
Done.
Previously, knz (Raphael 'kena' Poss) wrote…
remove "on the in-process tenant".
Done.
Previously, knz (Raphael 'kena' Poss) wrote…
What is "parent process" in this sentence? IMHO this is not a thing. Did you mean "HTTP server for the system tenant"?
clarified to say system tenant, but it does delegate to a separate debug server in this case, not the HTTP server.
Previously, knz (Raphael 'kena' Poss) wrote…
yep I think it's not the right issue.
ty. fixed.
pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go
line 404 at r1 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit:
fallthrough
is a bit of an overloaded term here when used in a switch statement. Can we re-word this a bit?
copied from other functions in this file so I'm gonna keep for consistency.
pkg/server/config.go
line 533 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
"parent process" is not a thing.
Done.
pkg/server/config.go
line 534 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
nit: capability
Done
pkg/server/tenant.go
line 417 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Could you not instantiate
TenantDelegatingServer
in every case (including system tenant) so we have just 1 code path?Presumably, the system tenant has the capability so the cap check would succeed.
All cases now use the delegating server although it doesn't eliminate this particular conditional. not sure if that's possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have dug into this PR (and tried it out) in more details.
I would like to walk back my previous assent on the general architecture proposed; I think this approach is not going to work. (But I have suggestions - keep on reading.)
What this PR effectively does is re-route the handling of the traffic for, say, tenant 123 to the instance of debug.Server that was created for the system tenant.
This approach would only be approximately OK if a debug server only served endpoints that inspect the process's state -- or things that are obviously "across tenant boundaries". But that is not the case. For example, the display of HBA rules is tenant specific. With your current code, if i ask the debug server for tenant 123 its HBA rules, I get the HBA rules of the system tenant. This is not what we want.
Additionally, the epic is about giving secondary tenants access to process-scope inspection, for example pprof. It should not include things that are extremely well scoped to the KV/storage layer. For example, the LSM and closed timestamp inspection do not fall under "can inspect process state". (We could discuss adding a separate capability for that, but as a rule of thumb, if the debug endpoint wouldn't work on a separate-process deployment, it shouldn't be included under the "process debug" capability.)
Here is what we can do to make this PR work:
- keep all the capability code you've already modified - this is OK
- revert the code to instantiate
debug.Server
anew for each tenant server, like previously. - new direction: inject a capability authorizer function (from the server args) into the debug server of secondary tenants.
Here's my proposal: dhartunian#7
Reviewed 8 of 8 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @dhartunian, @renatolabs, and @smg260)
Previously, dhartunian (David Hartunian) wrote…
ty. fixed.
don't forget to update the issue ref in the PR description too.
pkg/server/tenant.go
line 417 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
All cases now use the delegating server although it doesn't eliminate this particular conditional. not sure if that's possible.
I see what you have done here. I don't dislike it, but I think you could have been much simpler. I think you forgot about the existence of http.HandlerFunc
.
See here for a suggestion: dhartunian#6
(but also note this is not the direction we should go to, see my comment at top)
733fc94
to
4fb92bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
took your advice and commits @knz. I like this approach much better, thank you.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @knz, @renatolabs, and @smg260)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also modify configprofiles/profiles.go
and add the new capability on line 105 thanks.
(We'll modify this logic to make it less manual later)
Reviewed 39 of 40 files at r3, 18 of 18 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @dhartunian, @renatolabs, and @smg260)
Alternatively you can ignore my request -- I have #105609 queued. |
dea9eee
to
08741da
Compare
e3af59d
to
fed5caa
Compare
We were previously passing around our AdminURL() return values as `string` types which made manipulating them in tests a bit brittle. Recent addition of tenant selection query params rendered some URL strings challenging to manipulate without parsing them into a url.URL first anyway. This commit migrates to a new struct `serverutils.TestURL` which wraps `url.URL` and adds a helper `WithPath` that handles the common use case of concatenating and hides the error in a panic. All call sites have been migrated. Epic: None Release note: None
Previously, all tenant servers were started with a debug server that granted process-manipulating power via pprof and vmodule HTTP endpoints. This implementation is fine when servers serve just 1 tenant in a given process; that tenant then legitimately has access to all process-level control. However, it becomes a problem in shared-process multitenancy, when the same process is shared by multiple tenants. In that case, it is undesirable for 1 tenant to have access to & control process properties, as it could influence the well-functioning of other tenants or potentially leak data across tenant boundaries. This commit gates access to the debug server behind a capability **only with shared process multitenancy**. Tenant servers running within their own process will contain a debug server with no capability gating since they own their process. The gating is implemented via a tenant authorizer function backed by the capability store that is injected into the debug server upon startup. Shared process tenants are provided this authorizer, while separate process tenants and the system tenant use the no-op authorizer. Epic: CRDB-12100 Resolves: cockroachdb#97946 Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>
fed5caa
to
cc16c03
Compare
TFTRs bors r=knz,abarganier |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 125379f to blathers/backport-release-23.1-102635: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Previously, all tenant servers were started with a debug server
that granted process-manipulating power via pprof and vmodule HTTP
endpoints.
This implementation is fine when servers serve just 1 tenant in
a given process; that tenant then legitimately has access to all
process-level control. However, it becomes a problem in shared-process
multitenancy, when the same process is shared by multiple tenants. In
that case, it is undesirable for 1 tenant to have access to & control
process properties, as it could influence the well-functioning of
other tenants or potentially leak data across tenant boundaries.
This commit gates access to the debug server behind a capability
only with shared process multitenancy. Tenant servers running
within their own process will contain a debug server with no
capability gating since they own their process.
The gating is implemented via a tenant authorizer function backed
by the capability store that is injected into the debug server
upon startup. Shared process tenants are provided this authorizer,
while separate process tenants and the system tenant use the no-op
authorizer.
Epic: CRDB-12100
Resolves: #97946
Release note: None