
crictl images prune #399

Closed

steven-sheehy opened this issue Nov 2, 2018 · 15 comments · Fixed by #555

Comments
@steven-sheehy

I just switched to CRI-O and crictl and I'm trying to find an equivalent of the docker command that deletes unused images to clean up disk space. In docker, I would just run docker image prune -a -f. I can't seem to find an equivalent in crictl, so is there one, and if not, can one be added? Something like crictl images prune, or a new --prune flag on the existing crictl rmi?

crictl: v1.12.0
crio: 1.11.7

@steven-sheehy
Author

And is there any workaround for now to accomplish the same thing with a combination of commands?

@feiskyer
Member

feiskyer commented Nov 3, 2018

There is a cleanup function in the cri-o repo: https://github.com/kubernetes-sigs/cri-o/blob/master/test/helpers.bash#L301

@steven-sheehy
Author

@feiskyer I tried the referenced function. It seems to try to delete all images, not just unused images. Of course, crictl rmi will fail if the image is still being used, so it can serve as a workaround. A single command that prunes unused images would still be preferred. I've shortened the workaround to a one-liner:

crictl images -q | xargs -n 1 crictl rmi 2>/dev/null
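
(The 2>/dev/null just hides the errors crictl rmi prints for images that are still in use; drop it to see which images were skipped.)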

@feiskyer
Member

feiskyer commented Nov 3, 2018

@steven-sheehy Yep, there is no such single command yet as it's not part of CRI.

My concern is that it may break the kubelet container lifecycle. For example, the kubelet pulls an image before creating a container; if the prune happens in between, the container creation may fail. So I think it's better to also consider image pull time when pruning, but that information is not part of CRI.

@steven-sheehy
Author

I think the main use case of a prune command would be to run it manually or after a helm upgrade as part of continuous deployment to free up space. So it would most likely not run often enough to encounter the scenario you mention. And if it does happen to run between the image pull and the container creation, won't the kubelet just retry the pull and creation after the back-off period?

@Random-Liu
Contributor

This is possible to implement in crictl. We can:

  1. List all images and iterate through them;
  2. ListContainers and check whether each image is being used.

There might be some race condition, but that should be rare, and we should get eventual consistency. (A sketch of this approach follows below.)
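
For illustration, a minimal shell sketch of that two-step approach. It assumes a crictl recent enough to support JSON output (-o json), that each container entry in the JSON exposes an imageRef field, and that jq is available on the node:

# Collect the image refs used by existing containers, then remove every
# listed image that is not in that set.
used=$(crictl ps -a -o json | jq -r '.containers[].imageRef' | sort -u)
for img in $(crictl images -q); do
  echo "$used" | grep -qF "$img" || crictl rmi "$img"
done

This only narrows the race window mentioned above; an image pulled between the two listings can still be removed before its container is created.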

@Random-Liu
Contributor

Random-Liu commented Nov 9, 2018

> I think the main use case of a prune command would be to run it manually or after a helm upgrade as part of continuous deployment to free up space. So it would most likely not run often enough to encounter the scenario you mention. And if it does happen to run between the image pull and the container creation, won't the kubelet just retry the pull and creation after the back-off period?

Kubernetes keeps unused images on the node and considers image locality during scheduling. I'm not sure whether crictl rmi --prune is a pattern we should recommend users casually run on a Kubernetes node.

@steven-sheehy
Author

You guys are right, there is some race condition between the kubelet, the container runtime, and my workaround prune command above. I've since switched to containerd 1.2.0, and right after I do a helm upgrade I perform the prune; now the cluster can neither terminate nor start some pods. That much is to be expected, except that it never resolves itself by re-pulling the images. The errors in kubectl describe pod show:

Warning  FailedSync  13s (x48 over 1h)   kubelet, edge1  error determining status: rpc error: code = Unknown desc = failed to get image "sha256:7a344aad0fdbe8fd3ebd3ace7268d59946408503db1fe7c171bdb016a51729b7": does not exist

journalctl -fu containerd

Nov 27 17:21:19 edge2 containerd[18897]: time="2018-11-27T17:21:19.928243513Z" level=error msg="ContainerStatus for \"319749e2f6c3d6004e2404549d11c09fef96b31dc98f57e0191dab3d115c73b2\" failed" error="failed to get image \"sha256:fb885d89ea5c35ac02acf79a398b793555cbb3216900f03f4b5f7dc31e595e31\": does not exist"
Nov 27 17:21:19 edge2 containerd[18897]: time="2018-11-27T17:21:19.926931932Z" level=error msg="ContainerStatus for \"8fd4a8b1cc7a4f76a1b6edc34c7d69bf596dbea79da49659568e9c543054c0b2\" failed" error="failed to get image \"sha256:0171babfae580e71d90e38c9dd1f8fe8b95532d95425c54c4920e98631656212\": does not exist"

journalctl -fu kubelet

Nov 27 17:29:46 edge2 kubelet[18959]: E1127 17:29:46.001742   18959 remote_runtime.go:278] ContainerStatus "ae57ce88d169a01a30d571b0af8a349ebf95caff56a55559764d6aaa47c87b16" from runtime service failed: rpc error: code = Unknown desc = failed to get image "sha256:f4969b2fd68d5c6201512af2af252f415adf8e1d2c25c1640dcca43483d7b22a": does not exist
Nov 27 17:29:46 edge2 kubelet[18959]: E1127 17:29:46.001783   18959 kuberuntime_container.go:391] ContainerStatus for ae57ce88d169a01a30d571b0af8a349ebf95caff56a55559764d6aaa47c87b16 error: rpc error: code = Unknown desc = failed to get image "sha256:f4969b2fd68d5c6201512af2af252f415adf8e1d2c25c1640dcca43483d7b22a": does not exist
Nov 27 17:29:46 edge2 kubelet[18959]: E1127 17:29:46.001804   18959 kuberuntime_manager.go:873] getPodContainerStatuses for pod "egress-bc2lq_production(89681573-dfa8-11e8-ae36-00505691cd09)" failed: rpc error: code = Unknown desc = failed to get image "sha256:f4969b2fd68d5c6201512af2af252f415adf8e1d2c25c1640dcca43483d7b22a": does not exist
Nov 27 17:29:46 edge2 kubelet[18959]: E1127 17:29:46.001822   18959 generic.go:271] PLEG: pod egress-bc2lq/production failed reinspection: rpc error: code = Unknown desc = failed to get image "sha256:f4969b2fd68d5c6201512af2af252f415adf8e1d2c25c1640dcca43483d7b22a": does not exist

@Random-Liu Should I open an issue with containerd? You guys may not recommend pruning, but this issue didn't occur with CRI-O, and I think the cluster should reach eventual consistency, as you mentioned.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2019
@towolf

towolf commented Apr 27, 2019

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2019
praveenkumar added a commit to crc-org/snc that referenced this issue Jan 18, 2020

As part of CRC we disable some operators (monitoring, machine-config, etc.), but their images are always present on the node, since there is currently no knob on the installer side to keep those operators from starting (IIRC). The overall disk size has therefore increased: up to 4.2 our final disk size was around 2GB, but with 4.3 it is around 3GB, because all the other images are added as part of the CVO payload.

`crictl images` will list all images, even the ones being used, but `crictl rmi` will only be able to remove the unused images and will error out on the others.

The crictl version used in RHCOS is `0.1.0`, which doesn't have the fix for kubernetes-sigs/cri-tools#399 yet, so we use kubernetes-sigs/cri-tools#399 (comment) as a workaround.
@louygan

louygan commented Aug 26, 2021

An equivalent of "docker system prune":

sudo crictl ps -a | grep -v Running | awk '{print $1}' | xargs sudo crictl rm && sudo crictl rmi --prune
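
Note that grep -v Running lets the header line of crictl ps -a through, so the literal word CONTAINER is passed to crictl rm; the resulting non-zero exit status can then short-circuit the && so the prune never runs. A variant that skips the header and tolerates an empty container list (a sketch, assuming GNU xargs for -r):

sudo crictl ps -a | grep -v Running | awk 'NR>1 {print $1}' | xargs -r sudo crictl rm && sudo crictl rmi --prune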
