
feat(security): add SecurityContext to recordings #1188

Draft
wants to merge 95 commits into main from security-context

Conversation

@andrewazores (Member) commented Nov 1, 2022

Related to #760
Fixes #1409

Depends on #1338

@andrewazores added the "feat" (New feature or request) label on Nov 1, 2022
@mergify (bot) added the "safe-to-test" label on Nov 1, 2022
@andrewazores force-pushed the security-context branch 3 times, most recently from d4f6023 to a3033a4 on November 2, 2022 13:51
@andrewazores changed the title from "[WIP] feat(security): add SecurityContext to recordings" to "feat(security): add SecurityContext to recordings" on Nov 2, 2022
@andrewazores force-pushed the security-context branch 6 times, most recently from d0d0814 to 16682ba on November 11, 2022 04:59
@github-actions (bot)

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1188-16682bada52d2d2485548838b272f7c5e35ca6c7 sh smoketest.sh

@github-actions (bot)

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1188-77c1b14338fa5f6fbb470ebafd1ab055103877e4 sh smoketest.sh

@andrewazores force-pushed the security-context branch 2 times, most recently from ccd276c to 41d6dd2 on December 1, 2022 20:21
@github-actions (bot) commented Dec 1, 2022

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1188-41d6dd23f9025551495cbf635018692c697f9735 sh smoketest.sh

@andrewazores (Member Author)

Current issues:

  1. Trying to download an active recording seems to reliably produce a 0-byte file on the first attempt, and then the proper file on subsequent attempts.
  2. Redeploying Cryostat while a sample application stays live results in security context errors when performing API requests against that application. Restarting the sample application fixes it; it seems like the reverse lookup is probably failing.

I haven't been able to construct a scenario with users and targets spread across different namespaces, each with different permissions per namespace, but for the current single-namespace deployment scenario this looks like a transparent change (as expected), apart from the minor bugs listed above.

@andrewazores (Member Author) commented Dec 1, 2022

@tthvo @maxcao13 I think this will be ready for proper review soon. It also needs to be tested together with cryostatio/cryostat-web#707; there are some basic instructions on that PR about how to do so, but it's a pretty standard development workflow.

Please give it a try and see if you can catch any more broken cases that I haven't run into yet. I can also talk through the high-level design here and how it is intended to solve the multi-namespace/multi-tenant security concerns.

I'll fix the unit tests soon as well; I broke them in one of the last commits, where I made things fail faster if the correct request security context cannot be determined.

@andrewazores force-pushed the security-context branch 2 times, most recently from c3a3c73 to 25c4b2d on December 2, 2022 19:57
@github-actions (bot) commented Dec 2, 2022

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1188-25c4b2d4ecf38866f61c5879bfe5a6c7f57f9311 sh smoketest.sh

@andrewazores (Member Author)

Tests seem flaky; it looks like something might not be working properly or not cleaning up properly. There is also at least one itest that I have @Disabled for now, since it's broken in a way that I haven't figured out yet.

@maxcao13 (Member) commented Dec 3, 2022

I've been manually testing the PR on OCP, and so far I can't find anything broken in a normal workflow, but I will continue testing after the weekend.

Looks good so far. I think @tthvo and I would definitely benefit from a walkthrough or write-up on how your changes are able to handle multi-namespace security concerns.

@andrewazores (Member Author) commented Dec 5, 2022

So the concept is centered around the new SecurityContext interface I added. It's an interface because the actual implementation may differ between platforms, much like the AuthManager. The explanation below mostly assumes deployment on OpenShift for brevity, but at the end I'll tie back how this defaults to essentially the current behaviour in other scenarios.

The idea is that currently, on OpenShift, Cryostat only deploys into a single namespace, and every user who accesses Cryostat must already have some level of access to that namespace and the other resources (deployments, pods, endpoints) within it. This has been baked into Cryostat and its integration with OpenShift RBAC for a long time. For other deployments that don't use the OpenShiftAuthManager there is currently no notion of anything like a namespace, but the implementation should be easily extensible enough to support this in the future for other platform implementations.
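
Roughly, the shape is something like this sketch (illustrative only, not the actual classes in this PR; the `properties` map key and the constructor are placeholders):

```java
import java.util.Map;

// Illustrative sketch only: a platform-neutral marker that carries whatever placement
// information a platform needs in order to make authorization decisions.
interface SecurityContext {
    // "No particular context": on OpenShift this stands for Cryostat's own namespace.
    SecurityContext DEFAULT = new SecurityContext() {};
}

// The OpenShift flavour only needs to remember which namespace a resource belongs to.
class OpenShiftSecurityContext implements SecurityContext {
    private final Map<String, String> properties;

    OpenShiftSecurityContext(String namespace) {
        this.properties = Map.of("namespace", namespace);
    }

    public String getNamespace() {
        return properties.get("namespace");
    }
}
```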

So, the goal is to extend the OpenShift platform implementation to support deployment into one namespace but talking to targets across other namespaces. If Cryostat is deployed into Namespace Z and there are target applications in Namespaces A and B, then recordings captured from Target A1 should somehow only be accessible to users who have permissions in Namespace A. For active recordings on presently-visible targets this is pretty easy - we know about the target, we know which Namespace we have observed it in, and so we can ask OpenShift if the auth token associated with the request has the correct permissions in Namespace A. This is already well-supported in Cryostat today.
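
As a rough illustration of that kind of check (not Cryostat's actual code: the real implementation goes through its existing OpenShift auth integration rather than raw HTTP, and maps Cryostat permissions onto its own resources and verbs), a Kubernetes SelfSubjectAccessReview posted with the requesting user's own token answers whether that token holds a given permission in a given namespace. The API server URL, the `pods`/`get` attributes, and the string-matching of the response are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class NamespaceAccessCheck {
    // Ask the cluster whether the bearer of userToken may "get pods" in the given namespace.
    static boolean canGetPodsIn(String apiServer, String userToken, String namespace)
            throws Exception {
        String body =
                """
                {"apiVersion":"authorization.k8s.io/v1","kind":"SelfSubjectAccessReview",
                 "spec":{"resourceAttributes":{"namespace":"%s","verb":"get","resource":"pods"}}}
                """
                        .formatted(namespace);
        HttpRequest req =
                HttpRequest.newBuilder(
                                URI.create(apiServer
                                        + "/apis/authorization.k8s.io/v1/selfsubjectaccessreviews"))
                        .header("Authorization", "Bearer " + userToken)
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build();
        HttpResponse<String> resp =
                HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
        // The review's status.allowed field answers the question; a real implementation
        // would parse the JSON rather than string-match it.
        return resp.body().contains("\"allowed\":true");
    }
}
```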

The difficulty comes from archived recordings. Here we take the data out of Target A and write it to a file on the PVC that Cryostat controls. Before this PR, this operation loses some critical information. We still know the JMX service URL of the target that this data was collected from, but we don't know the Namespace that the target was located in. If we are lucky, that information may actually be embedded within the hostname of the JMX service URL, but we can't necessarily rely on that.

The Security Context is meant to be a small additional data packet that can contain this kind of information. In the OpenShift case, the required information is just that Namespace name, so the OpenShiftSecurityContext implementation internally has a Map, but the only key-value pair actually used at the moment records the namespace. The AuthManager interface has the new contextFor(AbstractNode | ServiceRef) methods, which produce a Security Context for either of those two types. Either way, these two things must be located somewhere in the deployment tree.

We also know that the KubeApiPlatformClient already tags ServiceRefs with the AnnotationKey.NAMESPACE within the annotations.cryostat map when it discovers targets, so to determine which Namespace a given ServiceRef is in we can just check that property. To determine the Namespace of some AbstractNode there is a simple trick. If it's a TargetNode then there is an embedded ServiceRef that we can check directly. If it is an EnvironmentNode we have a choice: either ascend back up the tree toward the root until we encounter a Namespace node (which may be the node we're already looking at), or descend down the tree until we encounter a TargetNode and then check that same property.

This last way works because, if the node we were given is "in between" the Namespace and the target, ex. it's a Deployment, then every possible target leaf node we find below this node must be under the same Namespace. This does break if the given node is between the root and a Namespace, ex. it's a plugin realm or it's the whole universe (the root), but in that case there is no reasonable Namespace to select for permissions modelling. Currently the implementation defaults to using Cryostat's own deployment Namespace in that case.
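
As a sketch of that lookup (with simplified stand-in types rather than the real discovery model classes, and an illustrative "NAMESPACE" key in place of AnnotationKey.NAMESPACE):

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Simplified stand-ins for the discovery types, just to keep the sketch self-contained.
interface ServiceRef {
    // The platform client records the namespace here at discovery time.
    Map<String, String> cryostatAnnotations();
}

interface AbstractNode {}

interface TargetNode extends AbstractNode {
    ServiceRef target();
}

interface EnvironmentNode extends AbstractNode {
    List<AbstractNode> children();
}

class NamespaceResolver {
    private final String cryostatNamespace;

    NamespaceResolver(String cryostatNamespace) {
        this.cryostatNamespace = cryostatNamespace;
    }

    // For a target, read the namespace annotation applied by the platform client.
    String namespaceOf(ServiceRef ref) {
        return ref.cryostatAnnotations().getOrDefault("NAMESPACE", cryostatNamespace);
    }

    // For a tree node, descend until any TargetNode is found: every target below an
    // "in between" node (ex. a Deployment) must live in the same namespace. Nodes above
    // the namespace level (realm, universe) fall back to Cryostat's own namespace.
    String namespaceOf(AbstractNode node) {
        return findAnyTarget(node).map(this::namespaceOf).orElse(cryostatNamespace);
    }

    private Optional<ServiceRef> findAnyTarget(AbstractNode node) {
        if (node instanceof TargetNode) {
            return Optional.of(((TargetNode) node).target());
        }
        if (node instanceof EnvironmentNode) {
            for (AbstractNode child : ((EnvironmentNode) node).children()) {
                Optional<ServiceRef> found = findAnyTarget(child);
                if (found.isPresent()) {
                    return found;
                }
            }
        }
        return Optional.empty();
    }
}
```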

Okay, all that said: we already had a way to determine the Namespace that a target belongs to, and now we have a Security Context that can hold that information. That Security Context is added as a field of the recording Metadata object. I applied a Gson trick so that this field is included when the metadata is serialized to a file for preservation on disk, but excluded when it is serialized into an API response. The metadata already gets copied and preserved along with the recording JFR data when an active recording is copied to the archives, so now the Security Context also follows the recording data around.
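
The snippet below is only one plausible way to get that include-on-disk / exclude-in-responses split with Gson (the `securityContext` field name and the factory class are assumptions): one Gson instance for metadata files, and one for API responses that carries an ExclusionStrategy skipping the field:

```java
import com.google.gson.ExclusionStrategy;
import com.google.gson.FieldAttributes;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

class MetadataGsonFactory {
    // Used when writing metadata files to the archives: keep every field.
    static Gson forDisk() {
        return new GsonBuilder().create();
    }

    // Used when building API responses: drop the securityContext field on serialization.
    static Gson forApiResponses() {
        return new GsonBuilder()
                .addSerializationExclusionStrategy(
                        new ExclusionStrategy() {
                            @Override
                            public boolean shouldSkipField(FieldAttributes f) {
                                return "securityContext".equals(f.getName());
                            }

                            @Override
                            public boolean shouldSkipClass(Class<?> clazz) {
                                return false;
                            }
                        })
                .create();
    }
}
```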

Now we have that information attached to both active and archived recordings. The last thing that needs to be done is to make sure we actually check the requesting user's permissions against that Security Context. This is again implemented by the AuthManager; it's pretty simple and slots in well with the existing system. We already know how to check whether the given user token has permissions in Cryostat's Namespace, so now we just check whether the user token has permissions in the Namespace recorded in the Security Context. API requests (v1, v2, and GraphQL fetchers/mutators) now all implement an additional method that returns the correct Security Context for the request given the current request parameters/context. So for a mutator that starts a new recording on a target, the security context is the context for that target's ServiceRef. For a handler that streams the JFR data of an archived recording, the security context is the one stored on disk within that archived recording's metadata. These are the security contexts that the AuthManager checks to ensure the currently requesting user has the right permissions (already implemented) within the required namespace.
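
As a sketch of that per-handler hook (hypothetical, simplified types rather than the real handler API): each handler answers "which SecurityContext governs this request?", here by looking the context up from the metadata stored alongside an archived recording. The SecurityContext stand-in from the earlier sketch is re-declared so the snippet stands alone:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface SecurityContext {
    SecurityContext DEFAULT = new SecurityContext() {};
}

interface RequestParameters {
    String pathParam(String name);
}

interface RequestHandler {
    // Each handler reports which SecurityContext governs the incoming request.
    SecurityContext securityContext(RequestParameters params);
}

// Example: downloading an archived recording. The governing context is the one stored
// on disk with that recording's metadata, not Cryostat's own namespace.
class ArchivedRecordingDownloadHandler implements RequestHandler {
    // Stand-in for "metadata read back from the archives on the PVC".
    private final Map<String, SecurityContext> archivedMetadataContexts = new ConcurrentHashMap<>();

    @Override
    public SecurityContext securityContext(RequestParameters params) {
        String recordingName = params.pathParam("recordingName");
        return archivedMetadataContexts.getOrDefault(recordingName, SecurityContext.DEFAULT);
    }
}
```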

So, what about API requests that don't touch any target or archived recording? Those don't have any explicit namespacing. Things like simply pinging Cryostat on /health/liveness, or downloading web-client assets to display in the browser, or adding JMX credentials to the keyring. This is what SecurityContext.DEFAULT represents - in OpenShift it represents the Namespace that Cryostat itself is deployed into. To add new JMX credentials the user must be able to CREATE_CREDENTIALS within this Namespace, for example, since it's Cryostat itself that the user is operating on. The other AuthManager implementations, like BasicAuthManager, always return SecurityContext.DEFAULT on the contextFor calls, and simply do a no-op/always-pass on any Security Context. This ends up behaving the same as things are now without the Security Contexts at all.
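
A sketch of that fallback, again with illustrative method names rather than the real AuthManager API: a platform with no namespace model hands back SecurityContext.DEFAULT from every contextFor call and treats every context check as a pass, which collapses to today's behaviour:

```java
interface SecurityContext {
    SecurityContext DEFAULT = new SecurityContext() {};
}

class BasicLikeAuthManager {
    SecurityContext contextFor(Object serviceRefOrNode) {
        // No namespace concept on this platform: everything maps to the default context.
        return SecurityContext.DEFAULT;
    }

    boolean validateWithinContext(String userToken, SecurityContext ctx) {
        // No-op/always-pass on the context itself; the usual credential checks still apply elsewhere.
        return true;
    }
}
```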


Lastly, there is one security hole in this implementation that I just figured out while typing this. Automated Rules allow permissions to leak across Namespaces, which could end up looking like privilege escalation. Users need the permissions CREATE_RULE, READ_TARGET, CREATE_RECORDING, UPDATE_RECORDING, READ_TEMPLATE to create rules, but right now these are checked against SecurityContext.DEFAULT. So a user may create a rule while only having these permissions within Cryostat's deployment Namespace and nowhere else. But when Cryostat processes this rule, it will use its own permissions to act upon it and start recordings, which may include targets in other Namespaces where the user does not have their own permissions.

I will think about and experiment with this some more; I'm not sure what to do about it at face value. Rules should probably be able to apply across namespaces. We could include a list of namespaces along with the rule definition and check that the user has permissions in all of those namespaces at rule creation time, but the user might lose those permissions later on in the external RBAC system, and then the rule "should" also lose permissions but would not. Maybe that's OK, and we require the admin user (whoever removed the first user's permissions) to also manually go and clean up the rules. Or, since we already have a specific CREATE_RULE permission, maybe this permission should be mapped to something cluster-scoped in the OpenShift RBAC, so that a user must have a sort of superuser level of permissions in order to create any rules at all. Any thoughts on this @ebaron? I think the cluster-scoped CREATE_RULE mapping probably works well and is pretty easy to implement all around. Maybe there is something we can do to have privileged and unprivileged rules, where privileged rules require cluster-level permissions and apply to any namespace, whereas unprivileged rules must be scoped to a single namespace at rule creation time.

On the bright side, the targets and recordings still fall into the Security Context system described above, and so users would still need the appropriate permissions within the correct namespaces to actually be able to access any of the JFR data that would result from the rule. But, an unprivileged user should not be able to use rules to start recordings on targets, especially since recordings can incur performance overhead penalties depending on the selected event template.

src/main/java/io/cryostat/MainModule.java (review thread resolved)
} else {
ns = ((OpenShiftSecurityContext) securityContext).getNamespace();
}
// FIXME remove
Member:

remove? Or maybe logger.trace?

Member Author:

Maybe setting it to trace makes sense. I'll leave it as-is for now since it's handy during development.

JsonElement json, Type typeOfT, JsonDeserializationContext context)
throws JsonParseException {
if (!(auth.get() instanceof OpenShiftAuthManager)) {
// FIXME actually deserialize, don't make this assumption
Member:

Should these be fixed?

Member Author:

Yes, I'll get around to that later. I think I'll need to add some logic or a migration script somewhere to upgrade existing stored metadata to include security contexts too.

@github-actions (bot)

Test image available:

$ CRYOSTAT_IMAGE=ghcr.io/cryostatio/cryostat:pr-1188-2cf3796b698507ee23e92672197ab2acc032c293 sh smoketest.sh

@ebaron (Member) commented Mar 24, 2023

I've been able to track down why the OAuth grant is failing for multiple namespaces.

Here is the OAuthClient derived from the Service Account in namespace c:

metadata:
  name: system:serviceaccount:c:clustercryostat-sample
additionalSecrets:
  - <service_account_token>
redirectURIs:
  - https://clustercryostat-sample-c.apps.example.com
grantMethod: prompt
scopeRestrictions:
  - literals:
      - user:info
      - user:check-access
      - user:list-scoped-projects
      - user:list-projects
  - clusterRole:
      roleNames:
        - '*'
      namespaces:
        - c
      allowEscalation: true

This is the error from ValidateScopeRestrictions when trying to create a scoped token for namespaces a, b, and c:

error: '[role:cryostat-operator-oauth-client:a not found in [user:info user:check-access user:list-scoped-projects user:list-projects], role:cryostat-operator-oauth-client:a does not use an approved namespace, role:cryostat-operator-oauth-client:b not found in [user:info user:check-access user:list-scoped-projects user:list-projects], role:cryostat-operator-oauth-client:b does not use an approved namespace]'
errorCauses:
  - error: '[role:cryostat-operator-oauth-client:a not found in [user:info user:check-access user:list-scoped-projects user:list-projects], role:cryostat-operator-oauth-client:a does not use an approved namespace]'
    errorCauses:
      - error: role:cryostat-operator-oauth-client:a not found in [user:info user:check-access user:list-scoped-projects user:list-projects]
      - error: role:cryostat-operator-oauth-client:a does not use an approved namespace
  - error: '[role:cryostat-operator-oauth-client:b not found in [user:info user:check-access user:list-scoped-projects user:list-projects], role:cryostat-operator-oauth-client:b does not use an approved namespace]'
    errorCauses:
      - error: role:cryostat-operator-oauth-client:b not found in [user:info user:check-access user:list-scoped-projects user:list-projects]
      - error: role:cryostat-operator-oauth-client:b does not use an approved namespace

Looking at the OAuthClient, it has a scope restriction requiring the role scopes be in c. The docs indicate this is by design:

You can use a service account as a constrained form of OAuth client. Service accounts can request only a subset of scopes that allow access to some basic user information and role-based power inside of the service account’s own namespace

It seems like in order to get a scoped token to work for multiple namespaces, we can't use the service account as an OAuth client. We would have to manage our own OAuthClient(s): https://docs.openshift.com/container-platform/4.12/authentication/configuring-oauth-clients.html#oauth-register-additional-client_configuring-oauth-clients.

@andrewazores (Member Author)

Nice digging, that's great information to have on hand. I think that definitely pushes me over the edge of saying that this PR and #1409 will be pushed to 2.4.0, since it expands the scope of work required to get this functionality properly working. I will put this PR on the backburner again until after 2.3.0 is out the door.

Status: Pushed to 2.4.0

Successfully merging this pull request may close these issues:

[Story] Multi-tenant security