OCPBUGS-17157: pkg/controller: label RBAC with content hash #3034


@stevekuznetsov (Member) commented Sep 15, 2023

When a CSV is processed, it is assumed that the InstallPlan has already run, or that a user who creates a CSV as their entrypoint into the system has otherwise met all the preconditions for the CSV to exist.

As part of validating these preconditions, the CSV logic today uses cluster-scoped listers for all RBAC resources, sets up an in-memory authorizer for these, and queries the CSV install strategy's permissions against those.

We would like to restrict the amount of memory OLM uses, and part of that is not caching the world. For the lister-based check above to keep working against a cache restricted to labelled objects, all RBAC objects fulfilling CSV permission preconditions would need to be labelled. When a user creates a CSV manually, this will not be the case.

We can use the SubjectAccessReview API to check for the presence of permissions without caching the world, but since a PolicyRule has slices of verbs, resources, subjects, etc. and the SAR endpoint accepts only one of each, there will be (in the general case) a combinatorial explosion of calls needed to issue enough SARs to cover the full set of permissions.
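To make the fan-out concrete, here is a minimal sketch of how one rule expands into single-valued checks. The `PolicyRule` and `ResourceAttributes` types below are simplified, hypothetical stand-ins for the Kubernetes API types (the real ones carry more fields), and `expand` is an illustrative helper, not code from this PR.

```go
package main

import "fmt"

// PolicyRule is a simplified, hypothetical stand-in for rbacv1.PolicyRule.
type PolicyRule struct {
	Verbs     []string
	APIGroups []string
	Resources []string
}

// ResourceAttributes models the single-valued shape that one
// SubjectAccessReview can ask about (again simplified for illustration).
type ResourceAttributes struct {
	Namespace string
	Verb      string
	Group     string
	Resource  string
}

// expand returns one ResourceAttributes per (verb, group, resource)
// combination, which is how many reviews would have to be issued.
func expand(namespace string, rules []PolicyRule) []ResourceAttributes {
	var out []ResourceAttributes
	for _, rule := range rules {
		for _, verb := range rule.Verbs {
			for _, group := range rule.APIGroups {
				for _, resource := range rule.Resources {
					out = append(out, ResourceAttributes{
						Namespace: namespace,
						Verb:      verb,
						Group:     group,
						Resource:  resource,
					})
				}
			}
		}
	}
	return out
}

func main() {
	rules := []PolicyRule{{
		Verbs:     []string{"get", "list", "watch"},
		APIGroups: []string{"", "apps"},
		Resources: []string{"pods", "deployments", "configmaps"},
	}}
	// 3 verbs x 2 groups x 3 resources
	fmt.Println(len(expand("operators", rules))) // prints 18
}
```

A single modest rule already needs 18 reviews in this sketch, and the count multiplies again for every additional slice dimension and every additional rule.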

Therefore, we need to limit the number of times we take that action. A simple optimization is to check for permissions created directly by OLM, as that's by far the most common entrypoint into the system (a user creates a Subscription, which triggers an InstallPlan, which creates the RBAC).

As OLM chose to name the RBAC objects with random strings of characters, it's not possible to look at a list of permissions in a CSV and know which resources OLM would have created. Therefore, this PR adds a label to all relevant RBAC resources with the hash of their content. The objects are already labelled with the name of the CSV, but since CSV content is ostensibly mutable, the name alone is not enough.
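The content-hash labelling can be sketched in a self-contained way as follows. This is an illustration under assumptions: `PolicyRule` is a simplified stand-in for `rbacv1.PolicyRule`, and `toBase62` is an assumed implementation built on `math/big`; the real helper is not shown in this conversation.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"math/big"
)

// PolicyRule is a simplified, hypothetical stand-in for rbacv1.PolicyRule.
type PolicyRule struct {
	Verbs     []string `json:"verbs"`
	APIGroups []string `json:"apiGroups,omitempty"`
	Resources []string `json:"resources,omitempty"`
}

// toBase62 is an assumed implementation: it renders the 224-bit digest as
// a base-62 number, which uses only alphanumeric characters.
func toBase62(hash [28]byte) string {
	var i big.Int
	i.SetBytes(hash[:])
	return i.Text(62)
}

// policyRuleHashLabelValue hashes the JSON encoding of the rules.
func policyRuleHashLabelValue(rules []PolicyRule) (string, error) {
	raw, err := json.Marshal(rules)
	if err != nil {
		return "", err
	}
	return toBase62(sha256.Sum224(raw)), nil
}

// mustHash is a small helper for the demo in main.
func mustHash(rules []PolicyRule) string {
	value, err := policyRuleHashLabelValue(rules)
	if err != nil {
		panic(err)
	}
	return value
}

func main() {
	a := mustHash([]PolicyRule{{Verbs: []string{"get", "list"}, Resources: []string{"pods"}}})
	b := mustHash([]PolicyRule{{Verbs: []string{"get", "list"}, Resources: []string{"pods"}}})
	c := mustHash([]PolicyRule{{Verbs: []string{"list", "get"}, Resources: []string{"pods"}}})

	fmt.Println("same content:", a == b)    // same content: true
	fmt.Println("reordered verbs:", a == c) // reordered verbs: false
}
```

A 224-bit digest needs at most 38 base-62 characters, so the value fits within the 63-character limit on Kubernetes label values. Because the hash is taken over the JSON encoding, it is sensitive to the order of entries within each slice.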

@openshift-ci-robot added the jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type.) and jira/valid-bug (Indicates that a referenced Jira bug is valid for the branch this PR is targeting.) labels Sep 15, 2023
@openshift-ci-robot
Collaborator

@stevekuznetsov: This pull request references Jira Issue OCPBUGS-17157, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jianzhangbjz

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


@@ -108,7 +108,6 @@ type Operator struct {
	client versioned.Interface
	dynamicClient dynamic.Interface
	lister operatorlister.OperatorLister
	k8sLabelQueueSets map[schema.GroupVersionResource]workqueue.RateLimitingInterface
Member Author


I added this in a previous commit but was in a copy-pasta mode and it's not needed.

	Name: gvr.String(),
})
queueInformer, err := queueinformer.NewQueueInformer(
	ctx,
	queueinformer.WithQueue(queue),
Member Author


I forgot this in the original PR to add the labeler functionality.

Member

@awgreene awgreene left a comment


The code looks fine, but I have a question regarding a possible edge case. If we're using the role/roleBinding's spec to create the label, is there an opportunity for a collision if two operators specify roles with the same spec?

Comment on lines +70 to +76
func PolicyRuleHashLabelValue(rules []rbacv1.PolicyRule) (string, error) {
	raw, err := json.Marshal(rules)
	if err != nil {
		return "", err
	}
	return toBase62(sha256.Sum224(raw)), nil
}
Member


It seems like this could cause an issue if two operators define the same rules for a clusterRole.

@stevekuznetsov
Member Author

The code looks fine, but I have a question regarding a possible edge case. If we're using the role/roleBinding's spec to create the label, is there an opportunity for a collision if two operators specify roles with the same spec?

The code for creating the objects has not changed, so this PR should not have any effect on that problem. In any case, random values are used to name the objects, so it does not seem like there would be any collisions.

The RBAC objects are labelled with a) the CSV that they are created for and b) a hash of the spec. So yes, if someone duplicated the list of permissions they asked for in one CSV, an approach that looked at these labels would not be able to tell those apart - but neither would the previous approach of using the authorizer. Since the question we want to be able to answer is "is the permission satisfied," it does not seem important to be able to distinguish between two identical specs.
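As a rough illustration of answering "is the permission satisfied" from labels alone, the sketch below matches on the owning CSV and the content hash. The label keys, the `labelled` type, and `satisfiedByLabels` are hypothetical names invented for this example; real code would more likely use a label selector against a lister or the API server.

```go
package main

import "fmt"

// Hypothetical label keys; the keys OLM actually uses are not shown here.
const (
	ownerLabel = "example.com/owner-csv"
	hashLabel  = "example.com/content-hash"
)

// labelled is a minimal stand-in for any RBAC object with metadata labels.
type labelled struct {
	Name   string
	Labels map[string]string
}

// satisfiedByLabels reports whether some object was created for the given
// CSV with exactly the given content hash. It does not care which object
// matches, so two identical specs are interchangeable.
func satisfiedByLabels(objs []labelled, csvName, hash string) bool {
	for _, obj := range objs {
		if obj.Labels[ownerLabel] == csvName && obj.Labels[hashLabel] == hash {
			return true
		}
	}
	return false
}

func main() {
	objs := []labelled{
		{Name: "my-operator-x7k2p", Labels: map[string]string{ownerLabel: "my-operator.v1.0.0", hashLabel: "abc123"}},
		{Name: "my-operator-q9w4z", Labels: map[string]string{ownerLabel: "my-operator.v1.0.0", hashLabel: "abc123"}},
	}
	fmt.Println(satisfiedByLabels(objs, "my-operator.v1.0.0", "abc123")) // true
	fmt.Println(satisfiedByLabels(objs, "my-operator.v1.0.0", "def456")) // false
}
```

When no labelled object matches, for example because a user created the CSV and its RBAC by hand, a caller could presumably fall back to the more expensive SubjectAccessReview checks.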

@awgreene
Member

The code for creating the objects has not changed, so this PR should not have any effect on that problem. In any case, random values are used to name the objects, so it does not seem like there would be any collisions.

Good point.

The RBAC objects are labelled with a) the CSV that they are created for and b) a hash of the spec. So yes, if someone duplicated the list of permissions they asked for in one CSV, an approach that looked at these labels would not be able to tell those apart - but neither would the previous approach of using the authorizer. Since the question we want to be able to answer is "is the permission satisfied," it does not seem important to be able to distinguish between two identical specs.

Okay cool, if the RBAC has a unique name, a CSV label, and a hash label, there's a clear way to identify the owner and we can answer the "is the permission satisfied" question.

/approve

@openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Sep 18, 2023
@stevekuznetsov added the lgtm label (Indicates that a PR is ready to be merged.) Sep 18, 2023
@openshift-ci bot removed the lgtm label (Indicates that a PR is ready to be merged.) Sep 18, 2023
@openshift-ci

openshift-ci bot commented Sep 18, 2023

New changes are detected. LGTM label has been removed.

Signed-off-by: Steve Kuznetsov <[email protected]>
@stevekuznetsov added the lgtm label (Indicates that a PR is ready to be merged.) Sep 18, 2023
@awgreene
Member

/approve

@awgreene awgreene self-requested a review September 18, 2023 16:11
@openshift-ci

openshift-ci bot commented Sep 18, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awgreene, stevekuznetsov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 8eb4f3e into operator-framework:master Sep 18, 2023
15 checks passed
@openshift-ci-robot
Collaborator

@stevekuznetsov: Jira Issue OCPBUGS-17157: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-17157 has not been moved to the MODIFIED state.

