ROX-17987: Remove unused Operator deployments #1118

Merged
kurlov merged 8 commits into main from akurlov/ROX-17987-add-delete-operator-method on Jun 29, 2023

Conversation

Member

@kurlov kurlov commented Jun 21, 2023

Description

Add a method for removing unused operator deployments based on a desired slice of operator images. Any deployed operator whose image is not present in the desired slice is removed.

Checklist (Definition of Done)

  • Unit and integration tests added
    - [ ] Added test description under Test manual
    - [ ] Documentation added if necessary (i.e. changes to dev setup, test execution, ...)
  • CI and all relevant tests are passing
  • Add the ticket number to the PR title if available, i.e. ROX-12345: ...
    - [ ] Discussed security and business related topics privately. Will move any security and business related topics that arise to private communication channel.
    - [ ] Add secret to app-interface Vault or Secrets Manager if necessary
    - [ ] RDS changes were e2e tested manually
    - [ ] Check AWS limits are reasonable for changes provisioning new resources

Test manual

# To run tests locally run:
make deploy/dev-fast

@kurlov kurlov temporarily deployed to development June 21, 2023 15:15 — with GitHub Actions Inactive
@@ -129,6 +130,24 @@ func (u *ACSOperatorManager) ListVersionsWithReplicas(ctx context.Context) (map[
return versionWithReplicas, nil
}

// DeleteOperator removes specified operator deployment from the cluster
func (u *ACSOperatorManager) DeleteOperator(ctx context.Context, version string) error {
Member

@SimonBaeumer SimonBaeumer Jun 22, 2023

Where is this function called, and where are the operator versions passed to it as an input parameter?

@@ -129,6 +130,24 @@ func (u *ACSOperatorManager) ListVersionsWithReplicas(ctx context.Context) (map[
return versionWithReplicas, nil
}

// DeleteOperator removes specified operator deployment from the cluster
func (u *ACSOperatorManager) DeleteOperator(ctx context.Context, version string) error {
depName := operatorDeploymentPrefix + "-" + version
Member

I expected a function that receives all expected operator versions, lists all installed deployments, and tries to delete every deployment that is no longer used.
Similar to a garbage collector.

Additionally, the computation of the name does not match the computation in the Helm chart (i.e. the chart truncates the deployment name).
To keep the logic in one place, please move the name computation from the Helm template to Go and reuse the function.

Member Author

I've updated the method. It can handle multiple deletions and expects a slice of images to delete. I think it's convenient to use images because they are the real representation of the operator deployment, and images are already used to install multiple operator versions. So it's reasonable to pass images to the delete function as well.

Also, a deploymentName chart value is added.

@kurlov kurlov temporarily deployed to development June 22, 2023 15:45 — with GitHub Actions Inactive
@kurlov kurlov requested a review from SimonBaeumer June 22, 2023 15:49
Member

@SimonBaeumer SimonBaeumer left a comment

Changes look good overall, only some nit-picks.

fleetshard/pkg/central/operator/upgrade.go Outdated (4 resolved threads)

// delete multiple versions
err = u.InstallOrUpgrade(ctx, operatorImages, crdTag1)
require.NoError(t, err)
Member

Can you assert that the operators were installed?
IMHO it is not always necessary to execute InstallOrUpgrade; you can also pass a Deployment object directly to the fake client.

Member Author

I really like your suggestion about passing a Deployment object to the fake client.
I also added a small check that only the deployment is deleted (e.g. the serviceAccount still persists).
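
The testing idea from this exchange (seed the fake with existing objects instead of running the install step, then check that only the Deployment disappears) can be illustrated with a standard-library-only analogue. The `fakeCluster` type and its methods below are hypothetical and hand-rolled for this sketch; they are not the fake client used in the PR.

```go
package main

import "fmt"

// objectKey identifies an object by kind and name (namespace omitted for brevity).
type objectKey struct {
	Kind string
	Name string
}

// fakeCluster is a hypothetical in-memory stand-in for a fake Kubernetes client.
type fakeCluster struct {
	objects map[objectKey]struct{}
}

// newFakeCluster seeds the fake with pre-existing objects, so a test does not
// need to run an install step before exercising deletion.
func newFakeCluster(seed ...objectKey) *fakeCluster {
	c := &fakeCluster{objects: make(map[objectKey]struct{})}
	for _, k := range seed {
		c.objects[k] = struct{}{}
	}
	return c
}

// exists reports whether an object of the given kind and name is present.
func (c *fakeCluster) exists(kind, name string) bool {
	_, ok := c.objects[objectKey{Kind: kind, Name: name}]
	return ok
}

// deleteDeployment removes only the Deployment with the given name and
// leaves every other kind untouched.
func (c *fakeCluster) deleteDeployment(name string) error {
	key := objectKey{Kind: "Deployment", Name: name}
	if _, ok := c.objects[key]; !ok {
		return fmt.Errorf("deployment %q not found", name)
	}
	delete(c.objects, key)
	return nil
}

func main() {
	c := newFakeCluster(
		objectKey{Kind: "Deployment", Name: "rhacs-operator-4.0.0"},
		objectKey{Kind: "ServiceAccount", Name: "rhacs-operator-4.0.0"},
	)
	if err := c.deleteDeployment("rhacs-operator-4.0.0"); err != nil {
		fmt.Println("delete failed:", err)
		return
	}
	fmt.Println("deployment exists:", c.exists("Deployment", "rhacs-operator-4.0.0"))
	fmt.Println("serviceAccount exists:", c.exists("ServiceAccount", "rhacs-operator-4.0.0"))
}
```

Seeding the state directly keeps the test focused on the deletion logic and independent of how installation works.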

@kurlov kurlov temporarily deployed to development June 23, 2023 12:35 — with GitHub Actions Inactive
@kurlov kurlov requested a review from SimonBaeumer June 23, 2023 12:37
fleetshard/pkg/central/operator/upgrade.go Outdated (resolved thread)
for _, deployment := range deployments.Items {
	for _, container := range deployment.Spec.Template.Spec.Containers {
		if container.Name == "manager" && slices.Contains(images, container.Image) {
			deleteDeps = append(deleteDeps, deployment.Name)
Member

I think you can assume that all deployments are operator deployments because you've selected them by using the label rhacs-operator.
To check whether an operator should be garbage collected, you can re-compute the name instead of using the images to track an unused deployment.
Alternatively, you could maintain an internal data structure that maps deployment names to images before the Helm chart is applied, like a cache.

WDYT about these approaches?

Member Author

I've changed the method so that it now cleans up unused operators. However, I still stick to using images for checking what to delete, because parsing deployment names introduces new potential errors to check.
Do you think it's better to call parseOperatorImages?

Member

> Do you think it's better to call parseOperatorImages?

What do you mean? 🤔

> However, I still stick to using images for checking what to delete, because parsing deployment names introduces new potential errors to check.

Sounds good!

@kurlov kurlov temporarily deployed to development June 26, 2023 12:47 — with GitHub Actions Inactive
@kurlov kurlov temporarily deployed to development June 26, 2023 12:51 — with GitHub Actions Inactive
@kurlov kurlov temporarily deployed to development June 28, 2023 09:15 — with GitHub Actions Inactive
@kurlov kurlov requested a review from SimonBaeumer June 28, 2023 13:06
@kurlov kurlov changed the title ROX-17987: Add delete Operator deployment ROX-17987: Remove unused Operator deployments Jun 28, 2023
fleetshard/pkg/central/operator/upgrade.go Outdated (resolved thread)
@@ -115,6 +117,46 @@ func (u *ACSOperatorManager) ListVersionsWithReplicas(ctx context.Context) (map[
return versionWithReplicas, nil
}

// RemoveUnusedOperators removes unused operator deployments from the cluster
func (u *ACSOperatorManager) RemoveUnusedOperators(ctx context.Context, desiredImages []string) error {
Member

Curiosity: what was the reason for not going with GarbageCollectOperators or similar?

Member Author

I didn't want to put Garbage next to the Operator.
Seriously though, garbage collection sounds like it's about interacting with memory, with cycles/references/generations. But we are just deleting deployment(s) and letting k8s do the rest for us.

@openshift-ci openshift-ci bot added the lgtm label Jun 29, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jun 29, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kurlov, SimonBaeumer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [SimonBaeumer,kurlov]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Jun 29, 2023

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci bot removed the lgtm label Jun 29, 2023
@kurlov kurlov temporarily deployed to development June 29, 2023 08:27 — with GitHub Actions Inactive
@kurlov kurlov temporarily deployed to development June 29, 2023 08:28 — with GitHub Actions Inactive
@kurlov kurlov merged commit ac38434 into main Jun 29, 2023
@kurlov kurlov deleted the akurlov/ROX-17987-add-delete-operator-method branch June 29, 2023 09:20
2 participants