Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-17157: pkg/controller: use a metadata watch for CRDs #3014

Conversation

stevekuznetsov
Copy link
Member

Using a full LIST+WATCH is an optimization, with trade-offs. Holding the state of the world for CRDs in memory when we rarely, if ever, actually need to access them is a bad use of that trade-off, especially when the sum total size of CRDs on even the most basic cluster is O(20MiB).

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Aug 17, 2023
@openshift-ci-robot
Copy link
Collaborator

@stevekuznetsov: This pull request references Jira Issue OCPBUGS-17157, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jianzhangbjz

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Using a full LIST+WATCH is an optimization, with trade-offs. Holding the state of the world for CRDs in memory when we rarely, if ever, actually need to access them is a bad use of that trade-off, especially when the sum total size of CRDs on even the most basic cluster is O(20MiB).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@@ -64,7 +64,7 @@ ifeq (, $(wildcard $(KUBEBUILDER_ASSETS)/kube-apiserver))
endif

cover.out:
go test $(MOD_FLAGS) -tags "json1" -v -race -coverprofile=cover.out -covermode=atomic \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary but the DX of watching 10k lines of output scroll by on success makes it impossible to find test failures and should be an anti-pattern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1000

@@ -1183,7 +1193,7 @@ func (a *Operator) handleClusterServiceVersionDeletion(obj interface{}) {
}

for i, crdName := range desc.ConversionCRDs {
crd, err := a.lister.APIExtensionsV1().CustomResourceDefinitionLister().Get(crdName)
crd, err := a.opClient.ApiextensionsInterface().ApiextensionsV1().CustomResourceDefinitions().Get(context.TODO(), crdName, metav1.GetOptions{})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the crux of the trade-off. In the rare case that a CSV is deleted, we incur server cost and latency to do a live GET for the data we need. Better than holding onto 20MiB of every CRD ever when we would only ever be concerned with a tiny subset, and with the spec of that tiny subset in an even smaller set of cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's an improvement...

@tmshort
Copy link
Contributor

tmshort commented Aug 17, 2023

/retest

Copy link
Contributor

@tmshort tmshort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
(Assuming CI passes)

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 17, 2023
@openshift-ci
Copy link

openshift-ci bot commented Aug 17, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stevekuznetsov, tmshort

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 17, 2023
@tmshort
Copy link
Contributor

tmshort commented Aug 17, 2023

/retest

@stevekuznetsov
Copy link
Member Author

Let me dig into e2e, this passes locally for me ... :|

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 18, 2023
@stevekuznetsov stevekuznetsov force-pushed the skuznets/crd-metadata-watch branch 2 times, most recently from a5b9f39 to 3f8b4f1 Compare August 21, 2023 12:49
@tmshort
Copy link
Contributor

tmshort commented Aug 21, 2023

/retest

@stevekuznetsov
Copy link
Member Author

Reproducing the failure locally, looking into it.

Using a full LIST+WATCH is an optimization, with trade-offs. Holding the
state of the world for CRDs in memory when we rarely, if ever, actually
need to access them is a bad use of that trade-off, especially when the
sum total size of CRDs on even the most basic cluster is O(20MiB).

Signed-off-by: Steve Kuznetsov <[email protected]>
@stevekuznetsov
Copy link
Member Author

@tmshort should be good to go now

@tmshort
Copy link
Contributor

tmshort commented Aug 21, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 21, 2023
@openshift-merge-robot openshift-merge-robot merged commit 12437b3 into operator-framework:master Aug 21, 2023
15 checks passed
@openshift-ci-robot
Copy link
Collaborator

@stevekuznetsov: Jira Issue OCPBUGS-17157: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-17157 has not been moved to the MODIFIED state.

In response to this:

Using a full LIST+WATCH is an optimization, with trade-offs. Holding the state of the world for CRDs in memory when we rarely, if ever, actually need to access them is a bad use of that trade-off, especially when the sum total size of CRDs on even the most basic cluster is O(20MiB).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants