Pods Terminating forever due to Docker 17.09-ce Bug #1135
Glad someone else reproduced this. I've also tried k8s 1.9.x and have so far been unable to get a stable dev cluster, partly due to this problem.
CoreOS releases jumped the Docker version from 1.12 to 17.09, so 17.03, the only version validated for k8s 1.8, is not available.
Downgrading to 1.12 fixes that issue.
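For anyone who wants to try the downgrade on a running node first, here is a minimal sketch of the Container Linux flag-file mechanism (the same one the Packer provisioner shared further down in this thread uses; note that wiping /var/lib/docker discards all local images and containers):

```sh
# Ask Container Linux to run the bundled Docker 1.12 instead of the default:
echo yes | sudo tee /etc/coreos/docker-1.12
sudo systemctl stop docker.service
sudo rm -rf /var/lib/docker        # 17.09 on-disk state is not usable by 1.12
sudo systemctl start docker.service
docker version                     # should now report 1.12.x
```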
Related to #1106
I was looking at this further for some time but I didn't get far in diagnosing the issue. I know @mumoshu has been running at least a dev cluster on the latest Container Linux AMI with Docker 17.09 and that's been stable. For the last month or so I've been unable to get a healthy cluster in dev that I can promote to other environments. I'm considering downgrading for now as well. Related comments:
So I basically believe that:
And for @c-knowles's case, we should just make sure K8S 1.9.x is compatible with the newer Docker? @c-knowles Would you mind sharing a detailed configuration of your cluster, so that we can diagnose the root cause(s) together, possibly with folks from upstream?
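To make that joint diagnosis concrete, here is a sketch of the data that would help, assuming standard kubectl/systemd tooling on a kube-aws node (pod and namespace names are placeholders):

```sh
# Which pods are stuck, and what the control plane thinks about them:
kubectl get pods --all-namespaces | grep Terminating
kubectl describe pod <stuck-pod> -n <namespace>    # check the Events section
# On the affected node:
docker ps -a | head -n 20
sudo journalctl -u docker.service --since "-1h" | tail -n 100
sudo journalctl -u kubelet.service --since "-1h" | tail -n 100
```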
@mumoshu sure, I'd be happy to diagnose with some help. My dev setup is here; the only bit I simplified was etcd, to remove some custom units which install Datadog (etcd seems stable anyway). If anyone is interested in the setup for Packer, this is the provisioner you need:

```json
{
  "type": "shell",
  "inline": [
    "sudo mkdir -p /etc/coreos",
    "echo yes | sudo tee /etc/coreos/docker-1.12",
    "sudo systemctl stop docker.service",
    "sudo rm -rf /var/lib/docker"
  ]
}
```
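For completeness, a hypothetical usage of that provisioner (the template filename is a placeholder, not from the thread):

```sh
packer validate coreos-docker112.json   # sanity-check the template
packer build coreos-docker112.json      # bakes an AMI with Docker pinned to 1.12
```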
@c-knowles Thx for sharing! I did read through your config. I suspect your t2.medium nodes are loaded with too many workloads.
@mumoshu I can try to roll another cluster side by side with this one, as I've already changed it back to Docker 1.12. The same node config works fine on Docker 1.12, by the way. For workloads, the only things this cluster is running are the pods that kube-aws creates plus a few nginx/traefik containers that aren't receiving much if any traffic (I'm acceptance testing some basic deploys/helm charts).
Thx! Then, my best guess at the moment is that the newer Docker somehow consumes a little more CPU than before, or has some race-related issue triggered when CPU is low. Anyway, that was the only thing I could read from the config. If it isn't a t2 issue at all, I guess asking the Docker devs for more assistance would be the only way.
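If the t2 hypothesis is worth testing, CPU credit exhaustion shows up in CloudWatch; a sketch with a placeholder instance ID and time range:

```sh
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 --metric-name CPUCreditBalance \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2018-01-01T00:00:00Z --end-time 2018-01-02T00:00:00Z \
  --period 3600 --statistics Minimum
# A balance pinned near zero means the node is being CPU-throttled.
```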
@mumoshu not sure if you spotted the
Thx! I didn't realize that but probably it
This looks related to moby/moby#36048 / moby/moby#36010, which was a bug in RunC (see opencontainers/runc#1698); that bug had been around for a long time, but wasn't triggered until the Meltdown/Spectre patches began to roll out.
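A quick sketch for checking whether a node falls in the affected window (docker-runc is the binary name Docker 17.x shipped, though paths can vary; the vulnerabilities sysfs files exist only on kernels carrying the Meltdown/Spectre patches):

```sh
docker version --format '{{.Server.Version}}'      # 17.09.x-ce is in the window
docker-runc --version 2>/dev/null || runc --version
# Mitigations that exposed the latent runc bug:
grep . /sys/devices/system/cpu/vulnerabilities/* 2>/dev/null
```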
I made a PR, #1201, to make AmiID required in cluster.yaml. It will not fix the issue but will prevent random AMI updates during cluster updates.
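As a sketch of what that pinning looks like in practice (the AMI ID below is a placeholder):

```sh
# cluster.yaml should carry an explicit node AMI so that `kube-aws update`
# cannot silently roll to a newer Container Linux release:
grep -n 'amiId' cluster.yaml
# expected output (placeholder ID):
#   amiId: ami-0123456789abcdef0
```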
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
We have a cluster created with kube-aws v0.9.9, k8s 1.8.4, and Docker version 17.09.1-ce.
Steps to Reproduce
SSH into a node and run `docker ps`: it is still displaying the ubuntu container.

Related
moby/moby#33820
Looks like the docker version is not compatible with k8s yet.
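A hypothetical end-to-end reproduction of the symptom (pod name and image are placeholders; the original repro steps in this issue were truncated):

```sh
kubectl run ubuntu --image=ubuntu --restart=Never -- sleep 3600
kubectl delete pod ubuntu
kubectl get pod ubuntu        # stays in "Terminating" indefinitely
# On the node that was running the pod:
docker ps | grep ubuntu       # the container is still listed
```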
How can I find the correct CoreOS AMI with the validated Docker version 17.03.2? Thanks in advance.
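Since Container Linux jumped straight from Docker 1.12 to 17.09 (per the comments above), an AMI shipping 17.03.2 may simply not exist. As a hedged sketch, you can list the available CoreOS stable AMIs and check what each ships (595879546273 is the well-known CoreOS AWS account ID; verify the Docker version by booting the AMI and running `docker version`):

```sh
aws ec2 describe-images --owners 595879546273 \
  --filters 'Name=name,Values=CoreOS-stable-*' \
  --query 'sort_by(Images,&CreationDate)[].[Name,ImageId,CreationDate]' \
  --output table
```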