🌱 Cache Pods for KCP #11453

@fabriziopandini fabriziopandini commented Nov 20, 2024

What this PR does / why we need it:
This PR enables caching in KCP, limited to the kubeadm static Pods.

The main motivation for this improvement is to make KCP rollouts more robust when temporary connection issues happen.

With the previous setup, Pods were not cached, so on every KCP reconcile a series of API calls was issued to gather the status of the control plane components in the workload cluster.

However, in case of temporary connection issues, KCP was "freezing" because it got stuck in a series of get Pod --> wait for the 10s timeout.
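
To make the "freezing" concrete, here is a minimal sketch of how a series of uncached reads adds up when the API server is unreachable. It is not KCP code: `getPod` and `readAllSequentially` are hypothetical names, and it assumes the reads happen one after another, each bounded by its own timeout.

```go
package main

import (
	"context"
	"time"
)

// getPod is a hypothetical stand-in for one uncached read against the
// workload cluster. It simulates an unreachable API server: it never
// answers and returns only when the per-call timeout expires.
func getPod(ctx context.Context, name string) error {
	<-ctx.Done()
	return ctx.Err()
}

// readAllSequentially issues one read per Pod, each bounded by
// perCallTimeout, and reports how many reads failed and how long the
// whole pass took.
func readAllSequentially(pods []string, perCallTimeout time.Duration) (failed int, elapsed time.Duration) {
	start := time.Now()
	for _, name := range pods {
		ctx, cancel := context.WithTimeout(context.Background(), perCallTimeout)
		if err := getPod(ctx, name); err != nil {
			failed++
		}
		cancel()
	}
	return failed, time.Since(start)
}
```

Under these assumptions, a pass over the four control plane Pods of one machine with a 10s timeout blocks for at least 40s before the reconcile can continue, and the same wait repeats on the next reconcile while the connection problem lasts.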

While looking at options to fix this issue, we figured out that by selectively caching the kubeadm Pods we can achieve a good trade-off between:

  • stability (we keep reading from the cache without errors, while the informers behind the scenes take care of the connection problems)
  • speed (cache reads are much faster than API calls, at the cost of a small delay in Pod status updates, which seems acceptable)
  • memory (we cache up to 4 Pods for every CP machine, which also seems acceptable; additionally, in the future we might consider applying transformations to drop most of the Pod content, since we only care about a few fields)
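
The stability and speed points can be illustrated with a small, stdlib-only sketch. It is a simplification and not the controller-runtime cache: real informers use list and watch, while this version replaces the whole content on each successful sync. `PodStatus`, `podCache`, and `errUnreachable` are hypothetical names introduced for the example.

```go
package main

import (
	"errors"
	"sync"
)

// errUnreachable simulates a temporary connection problem.
var errUnreachable = errors.New("workload cluster unreachable")

// PodStatus is a hypothetical, trimmed-down view of a Pod that keeps only
// the few fields a health check would need.
type PodStatus struct {
	Name  string
	Ready bool
}

// podCache holds the last Pod state observed by a background sync loop.
type podCache struct {
	mu   sync.RWMutex
	pods map[string]PodStatus
}

func newPodCache() *podCache {
	return &podCache{pods: make(map[string]PodStatus)}
}

// Sync is what the background loop would call. If fetch fails, the previous
// content is kept, so readers are not exposed to the connection problem.
func (c *podCache) Sync(fetch func() ([]PodStatus, error)) error {
	pods, err := fetch()
	if err != nil {
		return err
	}
	fresh := make(map[string]PodStatus, len(pods))
	for _, p := range pods {
		fresh[p.Name] = p
	}
	c.mu.Lock()
	c.pods = fresh
	c.mu.Unlock()
	return nil
}

// Get answers from memory and never touches the network, so it cannot
// block on a timeout. The value may be slightly stale.
func (c *podCache) Get(name string) (PodStatus, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	p, ok := c.pods[name]
	return p, ok
}
```

The reconcile path would only call `Get`, while a separate goroutine calls `Sync` and deals with retries. The cost is the one named above: a reader may see a Pod status that is a little out of date.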

/area provider/control-plane-kubeadm

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area PR is missing an area label labels Nov 20, 2024
@k8s-ci-robot
Contributor

@fabriziopandini: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-main
  • /test pull-cluster-api-e2e-blocking-main
  • /test pull-cluster-api-e2e-conformance-ci-latest-main
  • /test pull-cluster-api-e2e-conformance-main
  • /test pull-cluster-api-e2e-latestk8s-main
  • /test pull-cluster-api-e2e-main
  • /test pull-cluster-api-e2e-mink8s-main
  • /test pull-cluster-api-e2e-upgrade-1-31-1-32-main
  • /test pull-cluster-api-test-main
  • /test pull-cluster-api-test-mink8s-main
  • /test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-apidiff-main
  • pull-cluster-api-build-main
  • pull-cluster-api-e2e-blocking-main
  • pull-cluster-api-test-main
  • pull-cluster-api-verify-main

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Nov 20, 2024
@fabriziopandini
Member Author

/test?

@fabriziopandini
Member Author

/test pull-cluster-api-e2e-main

@fabriziopandini fabriziopandini force-pushed the improve-control-plane-rollout branch from f784a78 to 4e9cb6f November 20, 2024 20:14
@fabriziopandini fabriziopandini changed the title from "[WIP] 🌱 Improve control-plane rollout" to "🌱 Cache Pods for KCP" Nov 20, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 20, 2024
Comment on lines 345 to 347
req1, _ := labels.NewRequirement("tier", selection.Equals, []string{"control-plane"})
req2, _ := labels.NewRequirement("component", selection.In, []string{"kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd"})
podSelector := labels.NewSelector().Add(*req1, *req2)
Member

We did elsewhere:

Suggested change
-req1, _ := labels.NewRequirement("tier", selection.Equals, []string{"control-plane"})
-req2, _ := labels.NewRequirement("component", selection.In, []string{"kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd"})
-podSelector := labels.NewSelector().Add(*req1, *req2)
+must := func(r *labels.Requirement, err error) labels.Requirement {
+	if err != nil {
+		panic(err)
+	}
+	return *r
+}
+podSelector := labels.NewSelector().Add(
+	must(labels.NewRequirement("tier", selection.Equals, []string{"control-plane"})),
+	must(labels.NewRequirement("component", selection.In, []string{"kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd"})),
+)

That way we also make sure it stays correct when the code changes :-)
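
For readers unfamiliar with the pattern, here is a stdlib-only sketch of the same idea. `requirement` and `newRequirement` are hypothetical stand-ins for `labels.Requirement` and `labels.NewRequirement` (the real constructor also takes an operator), and the generic `must` is just one possible shape of the helper, assuming Go 1.18 or later.

```go
package main

import "fmt"

// requirement is a hypothetical stand-in for labels.Requirement: the label
// value under key must be one of values.
type requirement struct {
	key    string
	values []string
}

// newRequirement mimics a validating constructor that returns (value, error).
func newRequirement(key string, values ...string) (*requirement, error) {
	if key == "" {
		return nil, fmt.Errorf("label key must not be empty")
	}
	if len(values) == 0 {
		return nil, fmt.Errorf("requirement %q needs at least one value", key)
	}
	return &requirement{key: key, values: values}, nil
}

// must unwraps a (pointer, error) pair and panics on error, so a broken
// hard-coded requirement fails loudly instead of being silently dropped.
func must[T any](v *T, err error) T {
	if err != nil {
		panic(err)
	}
	return *v
}

// matches reports whether the given Pod labels satisfy the requirement.
func (r requirement) matches(podLabels map[string]string) bool {
	got, ok := podLabels[r.key]
	if !ok {
		return false
	}
	for _, v := range r.values {
		if v == got {
			return true
		}
	}
	return false
}
```

Compared with `req, _ := ...`, a typo introduced in a later change surfaces as a panic the first time the code runs (for example in unit tests) rather than as a nil pointer dereference or a selector that quietly matches the wrong Pods.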

Member Author

Fixed.
But we still have 4 places where we use "req, _ := labels.NewRequirement"; I will follow up.

@fabriziopandini fabriziopandini added the area/provider/control-plane-kubeadm Issues or PRs related to KCP label Nov 21, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-area PR is missing an area label label Nov 21, 2024
@fabriziopandini fabriziopandini force-pushed the improve-control-plane-rollout branch from 4e9cb6f to 6e059a2 November 21, 2024 08:48
Member

@chrischdi chrischdi left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 21, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: b68b8ad8e833d3f737e76112ec301ed4bc762580

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrischdi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 21, 2024
@k8s-ci-robot k8s-ci-robot merged commit b7eb8f7 into kubernetes-sigs:main Nov 21, 2024
18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.9 milestone Nov 21, 2024
@fabriziopandini fabriziopandini deleted the improve-control-plane-rollout branch December 2, 2024 11:06
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/control-plane-kubeadm Issues or PRs related to KCP cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
3 participants