kindest/node entrypoint sometimes fails with exit code 141 (SIGPIPE) #2178
Comments
we've seen something like this before in kubernetes kubernetes/kubernetes#57805, I'm guessing piping to we should use the heredoc approach here as well, it's a bit cheaper / simpler
thanks for tracking this down and filing the very detailed report :-)
You're welcome :) How/when is the new version of the script included in the images? I assume something like:
Thx for the fast fix
we push images as needed, the base image is updated in the fix PR, I'd push a node image but we're already working on revamped node images in #2176 which will hopefully be ready to go soon. If it doesn't we'll push one anyhow, we keep one fairly up to date for HEAD. A full set of images is pushed on release, we were hoping for a release next week, but with people out it may slip another week (I am out all week, it looks like others are out some as well).
Ah perfect. I think we'll wait for the full set, but a few weeks is what I hoped for so that's great. We're currently using 1.19 and for upgrade tests also a 1.18 image (I'll have to check why not 1.20 and 1.19).
What happened:
Some context:
Most of it shouldn't matter, but it might be helpful context. From time to time we're seeing that the kindest/node image fails on startup. According to docker inspect, it fails with exit code 141 (logs). docker logs shows the following:
To me it looks like it fails here: https://github.com/kubernetes-sigs/kind/blob/v0.9.0/images/base/files/usr/local/bin/entrypoint#L82
The only reasonable explanation I could come up with is:
- echo sends the docker_cgroup_mounts into the pipe (maybe line by line?)
- head -n 1 exits after the first line
- echo cannot send the other lines
I suspect it happens rarely because this might be caused by a CPU context switch (or something like this?) at the wrong moment, because of the high concurrency we have during our tests.
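The suspected sequence can be sketched in isolation. This is a minimal reproduction under stated assumptions, not the entrypoint itself: big_input is a hypothetical stand-in for docker_cgroup_mounts, made far larger than a pipe buffer so that echo is always still writing when head exits. The real input is presumably much smaller, so whether echo is still writing at that moment would depend on timing. errexit is left off here so the status can be inspected instead of aborting the script.

```shell
#!/usr/bin/env bash
# Sketch only: reproduces the suspected failure mode outside the entrypoint.
set -o pipefail   # errexit intentionally left off so the status can be read

# Hypothetical stand-in for docker_cgroup_mounts: about 1.3 MB of numbered
# lines, larger than a pipe buffer, so echo cannot finish before head exits.
big_input=$(seq 1 200000)

# Same shape as the failing line: echo is a member of the pipeline.
first_line=$(echo "${big_input}" | head -n 1)
status=$?

# For statuses above 128, bash's kill -l maps them back to a signal name.
signal_name=$(kill -l "${status}")

echo "first_line=${first_line} status=${status} signal=${signal_name}"
```

With default SIGPIPE handling, the status is expected to be 141, which is 128 + 13 (SIGPIPE on Linux). With errexit enabled as well, a status like this on the assignment would stop the script.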
What you expected to happen:
kindest/node should not fail on startup.
How to reproduce it (as minimally and precisely as possible):
Really hard to reproduce. In our case we can be sure it's gone after about 10-20 successful test runs in a row.
Anything else we need to know?:
I changed the following, and this seems to fix the issue in my case:
Before:
docker_cgroup=$(echo "${docker_cgroup_mounts}" | head -n 1 | cut -d' ' -f 4)
After:
docker_cgroup=$( < <(echo "${docker_cgroup_mounts}") head -n 1 | cut -d' ' -f 4)
The idea is to move the echo out of the pipeline, so that a SIGPIPE in echo should no longer be able to fail the command via pipefail.
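A rough way to check this reasoning is to run the process-substitution form next to a here-string form on deliberately oversized input. This is a sketch under assumptions: the input lines are made up (only their space-separated layout matters for cut), and the here-string is one possible reading of the "heredoc approach" mentioned in the comments, not necessarily what the fix uses.

```shell
#!/usr/bin/env bash
# Sketch only: compares two ways of feeding head without echo in the pipeline.
set -o pipefail

# Hypothetical input: made-up lines, well over a pipe buffer in total size.
docker_cgroup_mounts=$(
  for i in $(seq 1 30000); do
    echo "${i} 0 0:0 /docker/example-${i} /sys/fs/cgroup/example rw"
  done
)

# Process substitution: echo runs outside the pipeline that pipefail inspects,
# so its exit status is not part of the result.
procsub=$( < <(echo "${docker_cgroup_mounts}") head -n 1 | cut -d' ' -f 4)
procsub_status=$?

# Here-string: the shell supplies the input itself, with no writer process
# in the pipeline at all.
herestr=$(head -n 1 <<<"${docker_cgroup_mounts}" | cut -d' ' -f 4)
herestr_status=$?

echo "procsub=${procsub} status=${procsub_status}"
echo "herestr=${herestr} status=${herestr_status}"
```

Both forms should report status 0 and the fourth field of the first line, even though head stops reading early. The here-string variant also avoids starting an extra process for echo.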
The code in kind on the main branch changed a bit since the version 0.9 we're currently using, but there's one line left which should have the same issue:
https://github.com/kubernetes-sigs/kind/blob/main/images/base/files/usr/local/bin/entrypoint#L232
For reference, the respective cluster-api issue: kubernetes-sigs/cluster-api#4405
Environment:
- kind version: 0.9
- kubectl version: 1.19.1
- docker info:
- /etc/os-release: not sure, but the image we're using in the ProwJob is: gcr.io/k8s-testimages/kubekins-e2e:v20210330-fadf59c-go-canary