-
Notifications
You must be signed in to change notification settings - Fork 348
Add option for ignoring volumes defined in images #1504
Add option for ignoring volumes defined in images #1504
Conversation
Hi @lorenz. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment
we also have a config document..
https://github.com/containerd/cri/blob/master/docs/config.md
@lorenz sounds like this was a concrete issue you ran into. Can you provide link(s)? |
ff69933
to
398fb1e
Compare
I've added a debug log output if we're ignoring volumes and documented the option in config.md. I looked at the commit for the last-added option (TolerateMissingHugePagesCgroupController) and added mine to all the files it touched, but it wasn't added to The main reason why I've been running this patch for quite some time is because of the issue where incorrectly configured volumes end up writing data into an ephemeral containerd-allocated volume and misbehaving containers writing lots of data into the containerd directory. |
/test pull-cri-containerd-verify |
yeah I figured as much.. Which of course was my bad, either I missed it .. or just let it go in that pr review and flagged it as a docs issue to be worked on later, can't remember which was a week ago and seems like months :-) |
Signed-off-by: Lorenz Brun <[email protected]>
398fb1e
to
5a1d49b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@mikebrow Do we actually need the volume here? The user from eBay filed issue about this before https://github.com/containerd/containerd/issues/3511.
The volume handled by CRI plugin doesn't provide interface for user to handle this. The CRI-API spec defines volumes from kubelet, not image config. If the user uses devicemapper as snapshotter with disk quota, the size of rootfs will be ensured with quota. But volume from image config will be leak point.
cc @wenlxie
Hmm. Reading the other issue (I re-assigned it to cri) it sounds like @Random-Liu already had some ideas for properly handling image specified volumes (report disk usage and /or ask kublet to manage these volumes). Maybe we can recruit him for a discussion at a future sig-node call. Either way I think this flag provides a good ignore option for now at least until we have a more elaborate solution to map and/or report usage for the image volumes. |
SGTM |
Is this waiting on something to get merged? |
Just giving maintainers a sufficient opportunity to weigh in. |
This unbreaks bbolt (as part of containerd) on 1.14+ (see etcd-io/bbolt#201 and etcd-io/bbolt#220), pulls in my patch to ignore image-defined volumes (containerd/cri#1504) and gets us some robustness fixes in containerd CNI/CRI integration (containerd/cri#1405). This also updates K8s at the same time since they share a lot of dependencies and only updating one is very annoying. On the K8s side we mostly get the standard stream of fixes plus some patches that are no longer necessary. One annoying on the K8s side (but with no impact to the functionality) are these messages in the logs of various components: ``` W0714 11:51:26.323590 1 warnings.go:67] policy/v1beta1 PodSecurityPolicy is deprecated in v1.22+, unavailable in v1.25+ ``` They are caused by KEP-1635, but there's not explanation why this gets logged so aggressively considering the operators cannot do anything about it. There's no newer version of PodSecurityPolicy and you are pretty much required to use it if you use RBAC. Test Plan: Covered by existing tests Bug: T753 X-Origin-Diff: phab/D597 GitOrigin-RevId: f6c447da1de037c27646f9ec9f45ebd5d6660ab0
This adds a new config option for ignoring volumes defined in image metadata.
In Kubernetes volumes (even temporary ones) are generally specified in the pod configuration. Mapping the ones in the image metadata can lead to resource accounting/exhaustion issues (think writing an unlimited amount of inodes or data into the volume) and policy enforcement issues (a PSP cannot disable these).
Specifically for containers started with
ReadOnlyRootFilesystem
it can also lead to unexpected behavior, for example if there's a typo in the volume mount path the container will still start since the process can write to the data location, but the data ends up in the volume defined by the image which is ephemeral and subsequently gets lost when the pod gets descheduled. The user expected the process in the container to fail since there's no volume mounted at the data path and the root filesystem is readonly, but since the image volume is mounted there this works.