Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server, tenant: gate process debugging behind capability #102635

Conversation

dhartunian
Copy link
Collaborator

@dhartunian dhartunian commented Apr 28, 2023

Previously, all tenant servers were started with a debug server
that granted process-manipulating power via pprof and vmodule HTTP
endpoints.

This implementation is fine when servers serve just 1 tenant in
a given process; that tenant then legitimately has access to all
process-level control. However, it becomes a problem in shared-process
multitenancy, when the same process is shared by multiple tenants. In
that case, it is undesirable for 1 tenant to have access to & control
process properties, as it could influence the well-functioning of
other tenants or potentially leak data across tenant boundaries.

This commit gates access to the debug server behind a capability
only with shared process multitenancy. Tenant servers running
within their own process will contain a debug server with no
capability gating since they own their process.

The gating is implemented via a tenant authorizer function backed
by the capability store that is injected into the debug server
upon startup. Shared process tenants are provided this authorizer,
while separate process tenants and the system tenant use the no-op
authorizer.

Epic: CRDB-12100
Resolves: #97946

Release note: None

@dhartunian dhartunian requested review from a team as code owners April 28, 2023 19:31
@dhartunian dhartunian requested review from a team, smg260 and renatolabs and removed request for a team April 28, 2023 19:31
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo some tiny nits. Nice work on this - easy to reason about!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dhartunian, @renatolabs, and @smg260)


-- commits line 20 at r1:
nit: is this the right issue ref?


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 399 at r1 (raw file):

	}
	errFn := func() error {
		return errors.New("client tenant does not have capability to debug the process")

nit: Maybe I'm missing something, but why do we need an error function?


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 404 at r1 (raw file):

	switch mode {
	case authorizerModeOn:
		break // fallthrough to the next check.

nit: fallthrough is a bit of an overloaded term here when used in a switch statement. Can we re-word this a bit?

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally OK with the architecture proposed. I have a suggestion for a simplification. Also a few comments about the wording.

Reviewed 16 of 16 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, @renatolabs, and @smg260)


-- commits line 4 at r1:
"all tenant processes" -> "all tenant servers"


-- commits line 6 at r1:
Add:
"""
This implementation is fine when servers serve just 1 tenant in a given process; that tenant then legitimately has access to all process-level control.
However, it becomes a problem in shared-process multitenancy, when the same process is shared by multiple tenants. In that case, it is undesirable for 1 tenant to have access to & control process properties, as it could influence the well-functioning of other tenants or potentially leak data across tenant boundaries.
"""


-- commits line 8 at r1:
"This commit gates access to the debug server behind"...


-- commits line 9 at r1:
"only with shared-process multitenancy"


-- commits line 9 at r1:
"Tenant servers running within..."


-- commits line 13 at r1:
remove "on the in-process tenant".


-- commits line 15 at r1:
What is "parent process" in this sentence? IMHO this is not a thing. Did you mean "HTTP server for the system tenant"?


-- commits line 20 at r1:

Previously, abarganier (Alex Barganier) wrote…

nit: is this the right issue ref?

yep I think it's not the right issue.


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 399 at r1 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: Maybe I'm missing something, but why do we need an error function?

because it's used in 2 places below.


pkg/server/config.go line 533 at r1 (raw file):

	// SystemDebugServer is set if the sql configuration will be running in a shared
	// process model, allowing for parent process debugging if granted by a tenant

"parent process" is not a thing.


pkg/server/config.go line 534 at r1 (raw file):

	// SystemDebugServer is set if the sql configuration will be running in a shared
	// process model, allowing for parent process debugging if granted by a tenant
	// capabitlity.

nit: capability


pkg/server/tenant.go line 417 at r1 (raw file):

	if _, isSharedProcess := deps.instanceIDContainer.OptionalNodeID(); isSharedProcess {
		debugServer = debug.NewTenantDelegatingServer(

Could you not instantiate TenantDelegatingServer in every case (including system tenant) so we have just 1 code path?

Presumably, the system tenant has the capability so the cap check would succeed.


pkg/server/testserver.go line 924 at r1 (raw file):

	hts.t.authentication = sqlServerWrapper.authentication
	hts.t.sqlServer = sqlServer
	hts.t.tenantName = args.TenantName

Why is this change needed?

@dhartunian dhartunian force-pushed the gate-process-debug-http-server-behind-tenant-capabilities branch from 21c30cd to 733fc94 Compare June 12, 2023 19:57
Copy link
Collaborator Author

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @knz, @renatolabs, and @smg260)


-- commits line 4 at r1:

Previously, knz (Raphael 'kena' Poss) wrote…

"all tenant processes" -> "all tenant servers"

Done.


-- commits line 6 at r1:

Previously, knz (Raphael 'kena' Poss) wrote…

Add:
"""
This implementation is fine when servers serve just 1 tenant in a given process; that tenant then legitimately has access to all process-level control.
However, it becomes a problem in shared-process multitenancy, when the same process is shared by multiple tenants. In that case, it is undesirable for 1 tenant to have access to & control process properties, as it could influence the well-functioning of other tenants or potentially leak data across tenant boundaries.
"""

Done.


-- commits line 8 at r1:

Previously, knz (Raphael 'kena' Poss) wrote…

"This commit gates access to the debug server behind"...

Done.


-- commits line 9 at r1:

Previously, knz (Raphael 'kena' Poss) wrote…

"only with shared-process multitenancy"

Done.


-- commits line 9 at r1:

Previously, knz (Raphael 'kena' Poss) wrote…

"Tenant servers running within..."

Done.


-- commits line 13 at r1:

Previously, knz (Raphael 'kena' Poss) wrote…

remove "on the in-process tenant".

Done.


-- commits line 15 at r1:

Previously, knz (Raphael 'kena' Poss) wrote…

What is "parent process" in this sentence? IMHO this is not a thing. Did you mean "HTTP server for the system tenant"?

clarified to say system tenant, but it does delegate to a separate debug server in this case, not the HTTP server.


-- commits line 20 at r1:

Previously, knz (Raphael 'kena' Poss) wrote…

yep I think it's not the right issue.

ty. fixed.


pkg/multitenant/tenantcapabilities/tenantcapabilitiesauthorizer/authorizer.go line 404 at r1 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: fallthrough is a bit of an overloaded term here when used in a switch statement. Can we re-word this a bit?

copied from other functions in this file so I'm gonna keep for consistency.


pkg/server/config.go line 533 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

"parent process" is not a thing.

Done.


pkg/server/config.go line 534 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

nit: capability

Done


pkg/server/tenant.go line 417 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Could you not instantiate TenantDelegatingServer in every case (including system tenant) so we have just 1 code path?

Presumably, the system tenant has the capability so the cap check would succeed.

All cases now use the delegating server although it doesn't eliminate this particular conditional. not sure if that's possible.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have dug into this PR (and tried it out) in more details.

I would like to walk back my previous assent on the general architecture proposed; I think this approach is not going to work. (But I have suggestions - keep on reading.)

What this PR effectively does is re-route the handling of the traffic for, say, tenant 123 to the instance of debug.Server that was created for the system tenant.

This approach would only be approximately OK if a debug server only served endpoints that inspect the process's state -- or things that are obviously "across tenant boundaries". But that is not the case. For example, the display of HBA rules is tenant specific. With your current code, if i ask the debug server for tenant 123 its HBA rules, I get the HBA rules of the system tenant. This is not what we want.

Additionally, the epic is about giving secondary tenants access to process-scope inspection, for example pprof. It should not include things that are extremely well scoped to the KV/storage layer. For example, the LSM and closed timestamp inspection do not fall under "can inspect process state". (We could discuss adding a separate capability for that, but as a rule of thumb, if the debug endpoint wouldn't work on a separate-process deployment, it shouldn't be included under the "process debug" capability.)

Here is what we can do to make this PR work:

  • keep all the capability code you've already modified - this is OK
  • revert the code to instantiate debug.Server anew for each tenant server, like previously.
  • new direction: inject a capability authorizer function (from the server args) into the debug server of secondary tenants.

Here's my proposal: dhartunian#7

Reviewed 8 of 8 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @dhartunian, @renatolabs, and @smg260)


-- commits line 20 at r1:

Previously, dhartunian (David Hartunian) wrote…

ty. fixed.

don't forget to update the issue ref in the PR description too.


pkg/server/tenant.go line 417 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

All cases now use the delegating server although it doesn't eliminate this particular conditional. not sure if that's possible.

I see what you have done here. I don't dislike it, but I think you could have been much simpler. I think you forgot about the existence of http.HandlerFunc.

See here for a suggestion: dhartunian#6

(but also note this is not the direction we should go to, see my comment at top)

@knz knz added the A-multitenancy Related to multi-tenancy label Jun 20, 2023
@dhartunian dhartunian force-pushed the gate-process-debug-http-server-behind-tenant-capabilities branch from 733fc94 to 4fb92bf Compare June 26, 2023 21:00
@dhartunian dhartunian requested a review from a team as a code owner June 26, 2023 21:00
@dhartunian dhartunian requested a review from a team June 26, 2023 21:00
@dhartunian dhartunian requested review from a team as code owners June 26, 2023 21:00
Copy link
Collaborator Author

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took your advice and commits @knz. I like this approach much better, thank you.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @knz, @renatolabs, and @smg260)

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also modify configprofiles/profiles.go and add the new capability on line 105 thanks.

(We'll modify this logic to make it less manual later)

Reviewed 39 of 40 files at r3, 18 of 18 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @dhartunian, @renatolabs, and @smg260)

@knz
Copy link
Contributor

knz commented Jun 27, 2023

Can you also modify configprofiles/profiles.go and add the new capability on line 105 thanks.

Alternatively you can ignore my request -- I have #105609 queued.

@dhartunian dhartunian force-pushed the gate-process-debug-http-server-behind-tenant-capabilities branch 6 times, most recently from dea9eee to 08741da Compare June 28, 2023 19:13
@dhartunian dhartunian force-pushed the gate-process-debug-http-server-behind-tenant-capabilities branch 2 times, most recently from e3af59d to fed5caa Compare June 28, 2023 21:16
dhartunian and others added 2 commits June 28, 2023 17:16
We were previously passing around our AdminURL() return values as
`string` types which made manipulating them in tests a bit brittle.
Recent addition of tenant selection query params rendered some URL
strings challenging to manipulate without parsing them into a url.URL
first anyway.

This commit migrates to a new struct `serverutils.TestURL` which
wraps `url.URL` and adds a helper `WithPath` that handles the
common use case of concatenating and hides the error in a panic.

All call sites have been migrated.

Epic: None

Release note: None
Previously, all tenant servers were started with a debug server
that granted process-manipulating power via pprof and vmodule HTTP
endpoints.

This implementation is fine when servers serve just 1 tenant in
a given process; that tenant then legitimately has access to all
process-level control. However, it becomes a problem in shared-process
multitenancy, when the same process is shared by multiple tenants. In
that case, it is undesirable for 1 tenant to have access to & control
process properties, as it could influence the well-functioning of
other tenants or potentially leak data across tenant boundaries.

This commit gates access to the debug server behind a capability
**only with shared process multitenancy**. Tenant servers running
within their own process will contain a debug server with no
capability gating since they own their process.

The gating is implemented via a tenant authorizer function backed
by the capability store that is injected into the debug server
upon startup. Shared process tenants are provided this authorizer,
while separate process tenants and the system tenant use the no-op
authorizer.

Epic: CRDB-12100
Resolves: cockroachdb#97946

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
@dhartunian dhartunian force-pushed the gate-process-debug-http-server-behind-tenant-capabilities branch from fed5caa to cc16c03 Compare June 28, 2023 21:17
@knz knz added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Jul 4, 2023
@dhartunian
Copy link
Collaborator Author

TFTRs

bors r=knz,abarganier

@craig
Copy link
Contributor

craig bot commented Jul 5, 2023

Build succeeded:

@craig craig bot merged commit 5b3cd0d into cockroachdb:master Jul 5, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jul 5, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 125379f to blathers/backport-release-23.1-102635: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@dhartunian dhartunian removed the backport-23.1.x Flags PRs that need to be backported to 23.1 label Jul 5, 2023
@dhartunian dhartunian deleted the gate-process-debug-http-server-behind-tenant-capabilities branch February 5, 2024 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multitenancy Related to multi-tenancy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

server: gate pprof and vmodule endpoints behind tenant capabilities
4 participants