can't upgrade to latest containerd #1637
Comments
Quicker testing using this Dockerfile:
FROM kindest/node:v1.18.2
ARG CONTAINERD_VERSION="v1.4.0-beta.0-2-g6312b52d"
RUN echo "Installing containerd ..." \
&& export ARCH=$(dpkg --print-architecture | sed 's/ppc64el/ppc64le/' | sed 's/armhf/arm/') \
&& export CONTAINERD_BASE_URL="https://github.com/kind-ci/containerd-nightlies/releases/download/containerd-${CONTAINERD_VERSION#v}" \
&& curl -sSL --retry 5 --output /tmp/containerd.tgz "${CONTAINERD_BASE_URL}/containerd-${CONTAINERD_VERSION#v}.linux-${ARCH}.tar.gz" \
&& tar -C /usr/local -xzvf /tmp/containerd.tgz \
&& rm -rf /tmp/containerd.tgz \
&& rm -f /usr/local/bin/containerd-stress /usr/local/bin/containerd-shim-runc-v1 \
&& curl -sSL --retry 5 --output /usr/local/sbin/runc "${CONTAINERD_BASE_URL}/runc.${ARCH}" \
&& chmod 755 /usr/local/sbin/runc \
&& containerd --version
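
For anyone swapping versions while bisecting, the download URL in the Dockerfile above is derived from the version string by stripping the leading `v`. A minimal sketch of that construction follows; the function names `normalize_arch` and `nightly_url` are made up for illustration, and only the URL pattern and the `sed` mappings come from the Dockerfile.

```shell
# Hypothetical helpers mirroring the Dockerfile's URL construction.

# Map Debian architecture names to the names used in the release assets,
# using the same sed substitutions as the Dockerfile.
normalize_arch() {
  echo "$1" | sed 's/ppc64el/ppc64le/' | sed 's/armhf/arm/'
}

# Build the containerd tarball URL for a nightly version and architecture.
nightly_url() {
  version="$1"                      # e.g. v1.4.0-beta.0-2-g6312b52d
  arch="$(normalize_arch "$2")"     # e.g. amd64
  base="https://github.com/kind-ci/containerd-nightlies/releases/download/containerd-${version#v}"
  echo "${base}/containerd-${version#v}.linux-${arch}.tar.gz"
}

nightly_url "v1.4.0-beta.0-2-g6312b52d" "amd64"
```

Whether a given nightly actually exists at that URL still has to be checked against the releases page; the sketch only builds the string.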
|
Narrowed it to something in containerd/containerd 97ca1be0..7a5fcf61, we don't appear to have any nightly builds between these. |
Noting references to hugetlb in the diff ... https://github.com/containerd/containerd/compare/97ca1be0..7a5fcf61#diff-205a6ca18f070382cf15fbec12476bc5R558 It seems like perhaps the CRI change there started to require hugetlb ... |
cc @dims @mikebrow @bg-chun -- is it expected that systems without hugetlb cgroup cannot create containers / k8s pods now? Failure looks like this: #1634 (comment)
Suspect containerd/cri#1332 |
@dims suggested trying different runc but using this
FROM kindest/node:v1.18.2
ARG CONTAINERD_VERSION="v1.4.0-beta.0-2-g6312b52d"
RUN echo "Installing runc ..." \
&& export ARCH=$(dpkg --print-architecture | sed 's/ppc64el/ppc64le/' | sed 's/armhf/arm/') \
&& export CONTAINERD_BASE_URL="https://github.com/kind-ci/containerd-nightlies/releases/download/containerd-${CONTAINERD_VERSION#v}" \
&& curl -sSL --retry 5 --output /usr/local/sbin/runc "${CONTAINERD_BASE_URL}/runc.${ARCH}" \
&& chmod 755 /usr/local/sbin/runc \
&& runc --version
To run with:
(aka what runc containerd HEAD is specifying, but with containerd 1.3.X) shows no issues. |
Same issue when using runc rc9, rc10, and runc from HEAD ( |
@bg-chun i believe, this broke after the containerd/cri got updated with https://github.com/containerd/cri/pull/1332/files#diff-97925bda0bb83f5a74a5c8152bc190ceR451-R456 Can you please take a look? |
@dims my read of the code: kubelet sent containerd-CRI a runtime.CreateContainerRequest that contained an array of HugepageLimit. containerd-CRI returned the error because the system does not support the request. It worked before only because we were ignoring the array of HugepageLimit. We'll need a switch in containerd/cri to tell containerd to ignore the array of HugepageLimit in container requests (create and update). Or kubelet could hold the switch to ignore them and filter before sending. Probably best to skip hugepage tests on platforms without support, or somehow configure them to expect failure on said platform(s)? |
@mikebrow ack. Please see below: So it looks like we need to detect if the cgroups controller for hugetlb is present and active in two places
#1 is definitely the longer-term fix: kubelet should not be asking containerd for hugetlb stuff if we already know that it does not exist. Looks like we can borrow code from cri-o: |
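
To make "present and active" concrete on a cgroup v1 host, one way to check is the `enabled` column for `hugetlb` in `/proc/cgroups`. This is only a rough sketch, not the cri-o or containerd code; the function name `has_hugetlb_controller` is hypothetical, and on a cgroup v2 host you would instead look for `hugetlb` in `/sys/fs/cgroup/cgroup.controllers`.

```shell
# Hypothetical helper: succeeds if the hugetlb controller is listed and enabled.
# Argument: path to a file in /proc/cgroups format (defaults to /proc/cgroups).
# Columns in that file: subsys_name hierarchy num_cgroups enabled
has_hugetlb_controller() {
  cgroups_file="${1:-/proc/cgroups}"
  [ -r "$cgroups_file" ] || return 1
  awk '$1 == "hugetlb" && $4 == "1" { found = 1 } END { exit(found ? 0 : 1) }' "$cgroups_file"
}

if has_hugetlb_controller; then
  echo "hugetlb cgroup controller: present"
else
  echo "hugetlb cgroup controller: absent"
fi
```

Running this inside the node container (for example via `docker exec kind-control-plane sh`) rather than on the host shows what the kubelet and containerd in that node actually see.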
@dims I'm torn on what the default should be, have to think it over. |
I see cri-o is silently ignoring.. tyranny of the first to ignore... will also ignore by default. |
@mikebrow i threw a PR for @BenTheElder to try when he gets a chance - containerd/cri#1501 i can't really test as i don't have access to the environment (gLinux) |
@BenTheElder can you please try containerd/cri#1501 and let us know? |
Sorry for the delay, AFAICT containerd/cri#1501 does the trick. Full details ...
Using this to test:
FROM kindest/node:v1.18.2
COPY runc /usr/local/sbin/runc
COPY ctr containerd* /usr/local/bin/
RUN echo "Installing containerd ..." \
&& containerd --version \
&& runc --version
Binaries are coming from This fails, but when I swap the
$ docker exec kind-control-plane containerd --version
time="2020-06-02T05:02:33.919837002Z" level=warning msg="This customized containerd is only for CI test, DO NOT use it for distribution."
containerd github.com/containerd/containerd d7ce093d-TEST
To further sanity check, the |
thanks @BenTheElder |
update for those following along: containerd/cri#1501 is in and seems to mitigate, we'll need to get it into containerd/containerd next, and then it perhaps makes sense for kubelet to mitigate as well. |
Well, this fix is nearly in, but sadly we now can't update because CI became much less stable about deleting containers somewhere between v1.3.3-14-g449e9269 and v1.3.4-12-g1e902b2d. We've rolled back to v1.3.3-14-g449e9269 for now 😞 |
We've been reproducing / bisecting this in GCE kube-up.sh based (not KIND) CI and currently suspect containerd/containerd@b26e6ab |
kind-ci/containerd-nightlies#13 vendor.conf format changed, breaking the runc commit lookup. @dims figured it out and fixed it 🙏 |
we are back up to HEAD |
See: #1634 for prior context.
In short: when running on gLinux, upgrading containerd to the latest 1.4.X development nightlies breaks cluster bringup. Containers can't come up because setting the hugetlb limit fails. gLinux does not appear to have hugetlb on the host. This works fine under docker for mac and in our CI.
Using https://github.com/kind-ci/containerd-nightlies I'm searching for the break (NOTE: the commit is really the only relevant part, the "1.3.0" binaries here are not necessarily from the 1.3 branch).
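
Since the commit is the relevant part of each nightly version, it helps to pull it out of the version string when comparing two builds. A small sketch, assuming the versions follow the `<tag>-<count>-g<commit>` shape seen in this thread; `nightly_commit` and `compare_url` are illustrative names, not part of any tool.

```shell
# Hypothetical helper: print the abbreviated commit from a nightly version.
# Strips everything up to and including the last "-g". A version without
# "-g" is printed unchanged.
nightly_commit() {
  version="$1"
  echo "${version##*-g}"
}

# Hypothetical helper: GitHub compare URL between two containerd commits,
# in the same form as the compare link used earlier in this thread.
compare_url() {
  echo "https://github.com/containerd/containerd/compare/${1}..${2}"
}

nightly_commit "v1.3.3-14-g449e9269"          # prints 449e9269
nightly_commit "v1.4.0-beta.0-2-g6312b52d"    # prints 6312b52d
compare_url "97ca1be0" "7a5fcf61"
```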