KEP-4210: Add ImageGCMaximumAge KEP #4211
What about the current system critical images?
Won't images like kube-proxy etc wind up getting marked eventually? Do we have a way to exclude these yet other than the sandbox image?
We may pin images with |
I was curious: since this mostly adds a new policy to ImageGC and should be completely opt-in, would it be possible to skip the alpha phase? I remember some examples of KEPs that went directly to beta. It's mostly a thought for @haircommander and sig-node about the complexity of this KEP once it is finished.
I don't think so. Here I define "used" as "no container running with that image in the specified time". The idea is to define a default such that any "critical" image would be used more frequently than the default. Besides, in the bad case here the image will simply be made available again by an image pull.
This is up to the CRI to do, though both CRI implementations have support for pinning images AFAIU, so if users wanted this feature but were anxious about a very big image or something, they could pin it.
```
ImageMaximumGCAge metav1.Duration
```

To begin, this option will be set to the maximum duration (math.MaxInt64), which effectively disables it.
Wouldn't it make more sense to simply not run this policy if the field isn't explicitly set?
That seems to be more aligned with the API way of adding new fields?
I don't believe there is an "unset" for it, as metav1.Duration just wraps and marshals a time.Duration, which is just an int64, with no pointers involved. Other fields have +optional set, but they default to 0. We could default to 0 in this case and interpret it as "off", but that would prevent users from setting ImageMaximumGCAge to zero to mean "GC immediately".
I've decided to set it to 0 by default and add +optional, along with adding a note that says 0 will disable the field.
SGTM. API review might point out that all new fields should be pointers, but I am not sure whether that is followed for configurations. This would be the case for new API fields.
#### Beta

- Gather feedback from users
- Depending on PRR review and SIG-Node opt-in, this feature could start as a disabled Beta.
It is not a good practice. Let's not do it.
Currently, all image garbage collection in the Kubelet is triggered by disk usage going over a threshold (ImageGCLowThresholdPercent).
However, there are cases where additional conditions could be useful. One such condition is the maximum age of an image.
If an image is unused for a long time (the exact amount of time will be decided, but on the order of weeks is what comes to mind),
Is that time for the default?
yeah
Updated based on feedback: I added some sections to Alternatives and to the Proposal.
A few nits, the main concern is lack of a feature gate as explained in https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/feature-gates.md
###### How can this feature be enabled / disabled in a live cluster?

- [ ] Feature gate (also fill in values in `kep.yaml`)
afaict we still require a feature gate, even for optional fields like this one.
Thanks @soltysh! Updated.
Minor comment but I am ok with this for PRR
- Good testing will mitigate/fix any errors
- New, undiscovered races
- If the max image gc age is set very low, will the kubelet race with itself and remove the image right after pulling it?
- May need to define a minimum maximum gc age to prevent races like this.
I think this is an important question to answer at the design stage. We can explicitly say that the current iteration will set a high threshold.
what do you mean by the high threshold @SergeyKanzhelev ?
Sorry. Let's set the minimum allowed value to 30 minutes in alpha, and say we can allow less in the future.
Signed-off-by: Peter Hunt <[email protected]>
/lgtm
/lgtm
There is still a small comment on the behavior of a sandbox image that wasn't marked as Pinned, but it feels like something we may discover and protect against in beta.
If you can list it explicitly as something to protect against, that would be great.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: haircommander, johnbelamaric, mrunalp, SergeyKanzhelev
I would be shocked if the pause image managed to be unused for long enough to qualify TBH.
How will kubelet know it is used? Kubelet only knows about the user containers, not the sandbox?
Ah true, good point. Yeah, I will investigate that and double check it isn't qualified.
Add Kubelet option to specify the maximum age an image will be kept around before it is garbage collected