Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-4210: Add ImageGCMaximumAge KEP #4211

Merged
merged 2 commits into from
Oct 5, 2023

Conversation

haircommander
Copy link
Contributor

  • One-line PR description:
    Add Kubelet option to specify the maximum age an image will be kept around before it is garbage collected
  • Other comments:

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Sep 14, 2023
@haircommander haircommander marked this pull request as draft September 14, 2023 21:00
@haircommander haircommander changed the title WIP KEP-4210: Begin scaffolding ImageMaxGCAge KEP KEP-4210: Add ImageGCMaximumAge KEP Sep 14, 2023
Copy link
Member

@BenTheElder BenTheElder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the current system critical images?
Won't images like kube-proxy etc wind up getting marked eventually? Do we have a way to exclude these yet other than the sandbox image?

@pacoxu
Copy link
Member

pacoxu commented Sep 15, 2023

What about the current system critical images?
Won't images like kube-proxy etc wind up getting marked eventually? Do we have a way to exclude these yet other than the sandbox image?

We may pin images with cluster-critical and node-crictil priority like kubernetes/kubernetes#118544 the sandbox image.

@kannon92
Copy link
Contributor

I was curious. Since this is mostly adding a new policy to ImageGC and it should be completely opt-in. Would there be possibility in skipping the alpha phase?

I remember that there are some examples of KEPs that go directly into beta phase. Its mostly a thought for @haircommander and sig-node about the complexity of this KEP once it is finished.

@haircommander
Copy link
Contributor Author

What about the current system critical images?
Won't images like kube-proxy etc wind up getting marked eventually? Do we have a way to exclude these yet other than the sandbox image?

I don't think so. Here I define used as "no container running with that image in the specified time". The idea is to define a default so that any "critical" images would be used more frequently than the default. Besides, in the bad case here the image will be made available by an image pull

We may pin images with cluster-critical and node-crictil priority like kubernetes/kubernetes#118544 the sandbox image.

This is up to the CRI to do, though both CRI implementations have support for pinning images AFAIU so if users wanted this feature but were anxious about a very big image or something they could pin it

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 18, 2023
@haircommander haircommander marked this pull request as ready for review September 19, 2023 15:18
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 19, 2023
@haircommander haircommander force-pushed the max-image-gc branch 2 times, most recently from 57dffa7 to 9dacaf0 Compare September 19, 2023 15:44
ImageMaximumGCAge metav1.Duration
```

To begin, this option will be set to the maximum duration (math.MaxInt64), which effectively disables.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense that if this isn't explicitly set, then just don't run this policy?

That seems to be more aligned with the API way of adding new fields?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe there is an unset for it, as metav1.Duration just wraps and marshals a time.Duration, which is just an int64--no pointers involved. Other fields have +optional on, but they default to 0. We could default to 0 in this case, interpreting it as "off" but that would prevent users from setting ImageMaximumGCAge to zero: meaning GC immediately

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to set it to 0 by default and add +optional, along with adding a note that says 0 will disable the field

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. API review might point out that all new fields should be pointers but I am not sure if that is followed for configurations. THis would be the case for new API fields.

#### Beta

- Gather feedback from users
- Depending on PRR review and SIG-Node opt-in, this feature could start as a disabled Beta.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not a good practice. Let's not do it


Currently, all image garbage collection the Kubelet is triggered by disk usage going over a threshold (ImageGCLowThresholdPercent).
However, there are cases that additional conditions could be considered useful. One such condition is maximum age of an image.
If an image is unused for a long time (the exact amount of time will be decided, but on the order of weeks is what comes to mind),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that time for the default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah

@haircommander
Copy link
Contributor Author

updated based on feedback--I added some sections to Alternatives and to the Proposal

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keps/sig-node/4210-max-image-gc-age/README.md Show resolved Hide resolved
keps/sig-node/4210-max-image-gc-age/README.md Show resolved Hide resolved

###### How can this feature be enabled / disabled in a live cluster?

- [ ] Feature gate (also fill in values in `kep.yaml`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaict we still require a feature gate, even for optional fields like this one.

@haircommander
Copy link
Contributor Author

thanks @soltysh ! updated

Copy link
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment but I am ok with this for PRR

keps/sig-node/4210-max-image-gc-age/README.md Outdated Show resolved Hide resolved
- Good testing will mitigate/fix any errors
- New, undiscovered races
- If the max image gc age is set very low, will the kubelet race with itself and remove the image right after pulling it?
- May need to define a minimum maximum gc age to prevent races like this.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is important question to answer on design stage. We can explicitly say that current iteration will set the high threshold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean by the high threshold @SergeyKanzhelev ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry. Let's set the minimul available as 30 minuutes in alpha. And say we can allow less in future

@kannon92
Copy link
Contributor

kannon92 commented Oct 5, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 5, 2023
Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

there is still a small comment on the behavior of sandbox image that wasn't marked as Pinned. But it feels to be something we may discover and protect from in beta.

If you can put it explicitly as something to protect from - it will be great.

@johnbelamaric
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, johnbelamaric, mrunalp, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 5, 2023
@haircommander
Copy link
Contributor Author

there is still a small comment on the behavior of sandbox image that wasn't marked as Pinned. But it feels to be something we may discover and protect from in beta.

I would be shocked if the pause image managed to be unused for long enough to qualify TBH

@SergeyKanzhelev
Copy link
Member

there is still a small comment on the behavior of sandbox image that wasn't marked as Pinned. But it feels to be something we may discover and protect from in beta.

I would be shocked if the pause image managed to be unused for long enough to qualify TBH

How will kubelet know it is used? Kubelet only knows about the user containers, not the sandbox?

@haircommander
Copy link
Contributor Author

there is still a small comment on the behavior of sandbox image that wasn't marked as Pinned. But it feels to be something we may discover and protect from in beta.

I would be shocked if the pause image managed to be unused for long enough to qualify TBH

How will kubelet know it is used? Kubelet only knows about the user containers, not the sandbox?

ah true good point. yeah I will investigate that and double check it isn't qualified.

@k8s-ci-robot k8s-ci-robot merged commit a1fa456 into kubernetes:master Oct 5, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.