Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-1.5] feat: external load balancer garbage collection (part 2) - new gc service #3632

Merged

Conversation

richardcase
Copy link
Member

This is a manual cherry-pick of #3610

What type of PR is this?

/kind feature

What this PR does / why we need it:

This introduces a new garbage collection service which will be used later by the controllers to do clean up of AWS resources for workload clusters that were created via the CCM. Initially, we support deleting load balancers (classic elb, nlb, alb), target groups and security groups.

The service is design to be called by the infra cluster reconcilers when a cluster is delete. The entry point is ReconcileDelete into the service.

When a cluster is deleted the cluster controllers will call ReconcileDelete. The first job is to determine if the workload cluster being deleted should be garbage collected. A cluster will be garbage collected if either of these are true:

  • the aws.cluster.x-k8s.io/external-resource-gc annotation is absent
  • the aws.cluster.x-k8s.io/external-resource-gc annotation exists and its value is set to true

If a cluster is to be garbage collected then we need to identify the AWS resources that where created by the CCM in the workload cluster. This is done by using the AWS resource tagging api and getting resources with the Kubernetes cluster owned label (i.e. kubernetes.io/cluster/[CLUSTERNAME]). The resources that are returned are converted and then the list of them is passed to the defined cleanup functions in order. The order of the functions matter because some AWS resources cannot be delete before others. For example, target groups can only be deleted after load balancers. The cleanup function will then iterate over the resources and decide if it wants to handle that resource and delete it from AWS.

The service will be use by a later PR to the controllers and so is initially unused.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Relates #1718

Special notes for your reviewer:

This part 2 of 4 of changes (i.e. a stack) to implement the garbage collection. This is work to split up the original pr #3518

Checklist:

  • squashed commits
  • adds unit tests

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 29, 2022
@k8s-ci-robot
Copy link
Contributor

@richardcase: This issue is currently awaiting triage.

If CAPA/CAPI contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 29, 2022
@richardcase richardcase changed the title [release-1.5] 1718 gc new service 1 5 [release-1.5] feat: external load balancer garbage collection (part 2) - new gc service Jul 29, 2022
@sedefsavas
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 29, 2022
@sedefsavas
Copy link
Contributor

/test ?

@k8s-ci-robot
Copy link
Contributor

@sedefsavas: The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-aws-build
  • /test pull-cluster-api-provider-aws-build-release-1-5
  • /test pull-cluster-api-provider-aws-e2e-blocking-release-1-5
  • /test pull-cluster-api-provider-aws-test-release-1-5
  • /test pull-cluster-api-provider-aws-verify-release-1-5

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-aws-apidiff-release-1-5
  • /test pull-cluster-api-provider-aws-e2e-conformance-release-1-5
  • /test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts-release-1-5
  • /test pull-cluster-api-provider-aws-e2e-eks-release-1-5
  • /test pull-cluster-api-provider-aws-e2e-release-1-5

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-aws-apidiff-release-1-5
  • pull-cluster-api-provider-aws-build
  • pull-cluster-api-provider-aws-build-release-1-5
  • pull-cluster-api-provider-aws-e2e-blocking-release-1-5
  • pull-cluster-api-provider-aws-test-release-1-5
  • pull-cluster-api-provider-aws-verify-release-1-5

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sedefsavas
Copy link
Contributor

We might need to backport a linter update: #3617

@sedefsavas
Copy link
Contributor

#3637 should fix it.

This introduces a new garbage collection service which will be used later
by the controllers to do clean up of AWS resources for child/tenant
clusters that where created via the CCM. Initially we support clearing
up load balancers (classic elb, nlb, alb) and security groups.

It works in 2 phases depending on if a child cluster is being
created/update (i.e. **Reconcile**) or deleted (i.e.
**ReconcileDelete**).

When a cluster is created the cluster controllers will call **Reconcile**. Its
purpose is to deteremine if the cluster should be garbage collected and
if it should then it needs marked. If the gc feature is enabled we will
operate an opt-out model, so all clusters will be garbage collected
unless they exlicitly opt out. To opt out the infra cluster must have
the `aws.cluster.x-k8s.io/external-resource-gc` annotation with a value
of false. Otherwise a cluster is marked as requiring gc by adding the
`awsexternalresourcegc.infrastructure.cluster.x-k8s.io` finalizer to the
infra cluster.

When a cluster is deleted the cluster controllers will call
**ReconcileDelete**. The first job is to identify the AWS resources that
where created by the CCM in the child/tenant cluster. This is done by
using the AWS resource tagging api and getting resources with the
kubernetes cluster owned label. The resources that are returned are then
put into buckets for each AWS resource type (i.e. ec2,
elasticloadbalancing) and then these buckets of resource ARNs are passed
to a function for that specific AWS service which will do teh actual API
calls to clear up the AWS resources. The reason we use buckets is that
the order in which you delete services can matter, for example load
balancers must be deleted before target groups.

The service will be use by a later change to the controllers.

Signed-off-by: Richard Case <[email protected]>
Changed the gc service based on the lateset proposal update & review
feedback. It now uses a model where "the user decides whether
to enable the gc at any time before they delete the cluster.".

This translates into the `ReconcileDelete` now:
- Checks to see ig the cluster should be garbage collected by looking at
  the gc annotation (if it exists).
- Does resource cleanup

Signed-off-by: Richard Case <[email protected]>
The the cleanup functions now accepted the full list of AWS resources
and then they filter which resources they want to delete themselves.

Signed-off-by: Richard Case <[email protected]>
@richardcase richardcase force-pushed the 1718_gc_new_service_1-5 branch from 74c8471 to a65ccb2 Compare August 1, 2022 10:11
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 1, 2022
@richardcase
Copy link
Member Author

/test pull-cluster-api-provider-aws-e2e-blocking-release-1-5

@richardcase
Copy link
Member Author

/test pull-cluster-api-provider-aws-verify-release-1-5

@Ankitasw
Copy link
Member

Ankitasw commented Aug 1, 2022

@richardcase some issue with multitenancy test case 🤔 ?

@richardcase
Copy link
Member Author

/test pull-cluster-api-provider-aws-e2e-blocking-release-1-5

@richardcase
Copy link
Member Author

All passing now :)

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: richardcase

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 3, 2022
@Skarlso
Copy link
Contributor

Skarlso commented Aug 3, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 3, 2022
@k8s-ci-robot k8s-ci-robot merged commit 6e634b9 into kubernetes-sigs:release-1.5 Aug 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants