
🐛 Fix ClusterCacheTracker memory leak #9543


ionutbalutoiu
Contributor

Decouple the code that creates the uncached client from the code that creates the cached client. This way, the cache is not started until the cached client is created.

This avoids scenarios where the ClusterCacheTracker cache is started when only an uncached client is needed.
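To illustrate the idea, here is a minimal, hypothetical Go sketch of a per-cluster accessor that starts its cache only when a cached client is actually requested. The names `clusterAccessor`, `UncachedClient`, `CachedClient`, and `CacheRunning` are assumptions made for illustration, not the real `ClusterCacheTracker` API, and a cancellable goroutine stands in for a controller-runtime informer cache.

```go
package main

import (
	"context"
	"sync"
)

// uncachedClient and cachedClient are hypothetical placeholders for real API clients.
type uncachedClient struct{ endpoint string }
type cachedClient struct{ endpoint string }

// clusterAccessor is a hypothetical per-cluster accessor. Its "cache" is
// modeled as a goroutine that runs until its context is cancelled.
type clusterAccessor struct {
	mu       sync.Mutex
	endpoint string
	cancel   context.CancelFunc // non-nil only while the cache is running
	done     chan struct{}      // closed when the cache goroutine exits
	starts   int                // number of times a cache was started
}

// UncachedClient talks to the API server directly and never starts a cache.
func (a *clusterAccessor) UncachedClient() uncachedClient {
	return uncachedClient{endpoint: a.endpoint}
}

// CachedClient starts the cache lazily, the first time it is needed.
func (a *clusterAccessor) CachedClient() cachedClient {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.cancel == nil {
		ctx, cancel := context.WithCancel(context.Background())
		done := make(chan struct{})
		a.cancel, a.done = cancel, done
		a.starts++
		go func() {
			<-ctx.Done() // a real cache would run its informers here
			close(done)
		}()
	}
	return cachedClient{endpoint: a.endpoint}
}

// CacheRunning reports whether the cache has been started and not yet stopped.
func (a *clusterAccessor) CacheRunning() bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.cancel != nil
}

// Stop cancels the cache, if any, and waits for its goroutine to exit.
func (a *clusterAccessor) Stop() {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.cancel != nil {
		a.cancel()
		<-a.done
		a.cancel, a.done = nil, nil
	}
}
```

Because the accessor owns at most one cache and only the cached path creates it, `Stop` always has exactly one thing to cancel; nothing started on the uncached path can be left behind.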

There is an existing code path, described in the original issue, where the ClusterCacheTracker cache is started when only an uncached client is needed. The cache is re-created later, but the initial cache is never stopped and keeps running in the background.

What this PR does / why we need it:

Fixes the memory leak described in issue #9542, which occurs when the cluster kubeconfig is rotated frequently.
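How the leak grows under frequent kubeconfig rotation can be modeled with a simple counter. This is a simplified, hypothetical Go model (`fakeCache`, `buildAccessor`, and `rotate` are illustrative names, not cluster-api code): in the leaky ordering, building an accessor also starts one extra cache that nothing keeps a reference to, so every rotation leaves one more cache running.

```go
package main

import "sync/atomic"

// fakeCache is a hypothetical stand-in for an informer cache: starting it
// increments a shared counter and stopping it decrements the counter.
type fakeCache struct {
	running *atomic.Int64
	stopped atomic.Bool
}

func startCache(running *atomic.Int64) *fakeCache {
	running.Add(1)
	return &fakeCache{running: running}
}

// Stop is idempotent, so calling it twice does not double-decrement.
func (c *fakeCache) Stop() {
	if c.stopped.CompareAndSwap(false, true) {
		c.running.Add(-1)
	}
}

// accessor keeps a reference only to the cache behind its cached client.
type accessor struct{ cache *fakeCache }

// buildAccessor models creating both clients for one workload cluster.
// With leaky=true, the uncached-client step also starts a cache that is
// never stored anywhere, so nothing can ever stop it.
func buildAccessor(running *atomic.Int64, leaky bool) *accessor {
	if leaky {
		startCache(running) // orphaned: no reference is kept
	}
	// The cached client is the only step that should start a cache.
	return &accessor{cache: startCache(running)}
}

// rotate simulates n kubeconfig rotations. Each rotation stops the current
// accessor's cache and builds a new accessor; it returns how many caches
// are still running at the end.
func rotate(n int, leaky bool) int64 {
	var running atomic.Int64
	acc := buildAccessor(&running, leaky)
	for i := 0; i < n; i++ {
		acc.cache.Stop()
		acc = buildAccessor(&running, leaky)
	}
	return running.Load()
}
```

With five rotations, `rotate(5, true)` ends with 7 running caches (the current one plus six orphans), while `rotate(5, false)` ends with exactly 1, no matter how many rotations happen.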

Which issue(s) this PR fixes:
Fixes #9542

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 12, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/needs-area PR is missing an area label label Oct 12, 2023
@k8s-ci-robot
Contributor

Welcome @ionutbalutoiu!

It looks like this is your first PR to kubernetes-sigs/cluster-api 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 12, 2023
@k8s-ci-robot
Contributor

Hi @ionutbalutoiu. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 12, 2023
Contributor

@killianmuldoon killianmuldoon left a comment


/area clustercachetracker

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. area/clustercachetracker Issues or PRs related to the clustercachetracker and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. do-not-merge/needs-area PR is missing an area label labels Oct 12, 2023
@ionutbalutoiu ionutbalutoiu changed the title Fix ClusterCacheTracker memory leak 🐛 Fix ClusterCacheTracker memory leak Oct 12, 2023
@ionutbalutoiu ionutbalutoiu force-pushed the fix/cluster-cache-tracker-memory-leak branch from 36c0b2c to 88e53de Compare October 12, 2023 07:57
@ionutbalutoiu ionutbalutoiu force-pushed the fix/cluster-cache-tracker-memory-leak branch from 88e53de to 9dee01c Compare October 12, 2023 08:28
@killianmuldoon
Contributor

/test

@k8s-ci-robot
Contributor

@killianmuldoon: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-main
  • /test pull-cluster-api-e2e-full-dualstack-and-ipv6-main
  • /test pull-cluster-api-e2e-full-main
  • /test pull-cluster-api-e2e-main
  • /test pull-cluster-api-e2e-mink8s-main
  • /test pull-cluster-api-e2e-workload-upgrade-1-28-latest-main
  • /test pull-cluster-api-test-main
  • /test pull-cluster-api-test-mink8s-main
  • /test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-main
  • /test pull-cluster-api-e2e-scale-main-experimental

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-apidiff-main
  • pull-cluster-api-build-main
  • pull-cluster-api-e2e-main
  • pull-cluster-api-test-main
  • pull-cluster-api-verify-main

In response to this:

/test


@killianmuldoon
Contributor

/test pull-cluster-api-e2e-main

@killianmuldoon
Contributor

/test pull-cluster-api-e2e-workload-upgrade-1-28-latest-main

@vincepri
Member

/assign

Would like some time to review this PR in a bit more detail.

@ionutbalutoiu ionutbalutoiu force-pushed the fix/cluster-cache-tracker-memory-leak branch from 9dee01c to 8cf7a66 Compare October 15, 2023 16:47
@ionutbalutoiu
Contributor Author

@vincepri

When you get the chance, please see the updated PR.

It would be nice to have this fix included in the next CAPI release.

@ionutbalutoiu ionutbalutoiu force-pushed the fix/cluster-cache-tracker-memory-leak branch from 8cf7a66 to 5009688 Compare November 16, 2023 17:06
@ionutbalutoiu
Contributor Author

ionutbalutoiu commented Nov 16, 2023

@vincepri I've implemented the changes from your most recent code review.

LE: Also fixed the linter errors introduced by the above changes.

Thank you!

@ionutbalutoiu ionutbalutoiu force-pushed the fix/cluster-cache-tracker-memory-leak branch from 5009688 to a1e613e Compare November 16, 2023 18:56
Member

@sbueringer sbueringer left a comment


Some smaller findings; I also replied in the discussion with some more context.

controllers/remote/cluster_cache_tracker.go (outdated, resolved)
controllers/remote/cluster_cache_tracker.go (outdated, resolved)
controllers/remote/cluster_cache_tracker.go (outdated, resolved)
controllers/remote/cluster_cache_tracker.go (outdated, resolved)
controllers/remote/cluster_cache_tracker.go (resolved)
controllers/remote/cluster_cache_tracker.go (resolved)
@sbueringer
Member

@ionutbalutoiu Sorry for the long delay. I sort of lost track of this PR a bit. Happy to quickly review again and merge once the open points are addressed.

Decouple the code that creates the uncached client from the code that
creates the cached client. This way, we don't start the cache until
the cached client is created.

This avoids scenarios where the `ClusterCacheTracker` cache is started
when only an uncached client is needed.

There is an existing code path, described in the original issue, where
the `ClusterCacheTracker` cache is started when only an uncached client
is needed. The cache is re-created later, and the initial cache is
still running continuously in the background.

Signed-off-by: Ionut Balutoiu <[email protected]>
@ionutbalutoiu ionutbalutoiu force-pushed the fix/cluster-cache-tracker-memory-leak branch from a1e613e to 97534b8 Compare January 26, 2024 15:22
@ionutbalutoiu
Contributor Author


Thank you for taking the time to review this! I've addressed all the remaining open review items. I also rebased onto the latest main branch.

Please take a look at the latest PR changes and let me know if there's anything else.

@sbueringer
Member

Thank you very much!

Let's see if we have some luck with cherry-picking

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 26, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 71997f2a16ee7e4a9234d20c20f8fe449eee9a18

@sbueringer
Member

/cherry-pick release-1.6

@k8s-infra-cherrypick-robot

@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.6 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.6


@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sbueringer
Member

/cherry-pick release-1.5

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 26, 2024
@k8s-infra-cherrypick-robot

@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.5 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.5


@k8s-ci-robot k8s-ci-robot merged commit 1bef042 into kubernetes-sigs:main Jan 26, 2024
20 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.7 milestone Jan 26, 2024
@k8s-infra-cherrypick-robot

@sbueringer: new pull request created: #10064

In response to this:

/cherry-pick release-1.6


@k8s-infra-cherrypick-robot

@sbueringer: new pull request created: #10065

In response to this:

/cherry-pick release-1.5


@ionutbalutoiu ionutbalutoiu deleted the fix/cluster-cache-tracker-memory-leak branch January 26, 2024 16:46
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/clustercachetracker Issues or PRs related to the clustercachetracker cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
6 participants