Host reboot, connectivity lost #34
Same behaviour with 2 hosts. I have an unknown "tunl0" interface:
|
Hi Smana, I'm trying to reproduce this behavior locally. Can you show the output of the following commands?
Regarding |
I've managed to reproduce this issue locally - It looks to me like the Kubelet isn't informing the Calico plugin that the pods have been recreated, and so Calico doesn't know to re-create the caliXXX veths. What I would expect to happen is this:
I'm not seeing step 3 occur, which smells like a bug in the Kubelet to me. We're going to do some more digging on this to see what codepaths are getting hit in the Kubelet, and raise a bug on the Kubernetes GitHub if appropriate. |
Thank you, I'll stay tuned. |
In the meantime, as a workaround, is there a way to force step 3?
|
Sorry for the delay. To work around this, you can simply delete the pods. If they are not managed by a ReplicationController, then re-create the pods after deleting them. |
Hi, any news? Regards, |
Hi Smana, I believe @Symmetric was looking into this, but he's on vacation now. I'll pick up the investigation of this problem today and post my findings to the issue. Casey |
Hi Smana, I've made some progress on this.
Could you try following these steps and let me know if they work? This is what I had to do to fix this issue.
Getting this working with Docker IPAM likely requires changes to the Kubelet. I'll have to do some more thinking on what the correct behavior should be there. |
Hi Casey, |
Well, as a first step I've upgraded Kubernetes (kubelet included) to the master version.
It didn't help. When I rebooted my host, the pods were created but without an IP address. Now I need to figure out how to upgrade |
Hi Smana - I suspect we will be releasing a new version of calico-kubernetes in the next 1-2 days. So, if you can wait that long before testing, that might be the easiest option. If you'd like to test sooner than that, you can follow these steps to build and install calico-kubernetes. To build from source:
To manually install the binary, copy it from
You'll want to remove the Let me know if you have any trouble with these steps and I'll be glad to help. |
Thanks Casey, Regards, |
By the way, could you please tell me which version of calico-docker is supposed to be used with the latest version of the calico-kubernetes plugin? |
I'd recommend using calico-docker v0.5.5 or later. |
Hi @Smana - we've released calico-kubernetes v0.2.0 here. You can download and install the plugin like so:
To enable calico IPAM, edit
Then, restart the kubelet
A version of calico-docker which includes this binary is expected to be released later this week. In the meantime, calico-docker v0.5.5 or later will work fine. |
Thanks Casey, :) |
Well my first tests using the
I downloaded the calico-kubernetes plugin
The network-environment has been updated
Finally, the kubelet is restarted. Then, to test it out, I create a busybox pod and try to ping an IP address on another node and check name resolution.
Note: the IP address 10.233.0.1 is on another node. Using the
I still have the same issue on host reboot. Regards, |
I forgot to mention that I was using Kubernetes v1.6.0. |
Hi Smana, I've so far been unable to reproduce your issue on my cluster using the following:
Could you run the following commands to generate a Calico diagnostics bundle on both the source compute host as well as the destination compute host? This will gather all the logs and routing diagnostics to help me debug.
Once you've generated the bundles, could you please send them to me at [email protected] so I can take a look? One thing I did not mention above is that when enabling Calico IPAM, you should make sure that your Docker bridge (cbr0) is not configured with an IP address within the Calico IP pool. This prevents conflicts between the docker bridge (which is still required for running docker) and the IPs assigned by Calico.
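As a rough illustration of the bridge check described above, the following sketch tests whether a bridge address falls inside a Calico IP pool. The function name and all addresses are hypothetical and chosen only for the example; substitute the values from your own cbr0 configuration and pool.

```python
import ipaddress


def bridge_conflicts_with_pool(bridge_cidr: str, pool_cidr: str) -> bool:
    """Return True if the bridge's own address lies inside the Calico pool."""
    # ip_interface accepts "address/prefix" and keeps the host address.
    bridge_ip = ipaddress.ip_interface(bridge_cidr).ip
    pool = ipaddress.ip_network(pool_cidr)
    return bridge_ip in pool


# Hypothetical values for illustration only.
print(bridge_conflicts_with_pool("192.168.0.1/24", "192.168.0.0/16"))
print(bridge_conflicts_with_pool("172.17.42.1/16", "192.168.0.0/16"))
```

The first call reports a conflict because the bridge address sits inside the pool; the second does not.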
|
That was exactly what I needed; container networking is now configured correctly after a reboot.
But now I can't control the subnet assigned to each node; it is set randomly. Regards, |
Glad to hear it's working! Currently, it is not possible to manually assign subnets to each node when using Calico's IPAM. Is this a required feature for you? Calico IPAM does currently assign address blocks to nodes in order to aggregate routes. |
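To make the route-aggregation point concrete, here is a minimal sketch of per-node address blocks. It is not Calico's allocation logic: the /26 block size, the in-order assignment, the node names, and the pool are all assumptions made for illustration. The idea it shows is only that one route per block can cover every pod IP on a node.

```python
import ipaddress


def allocate_blocks(pool_cidr, nodes, block_prefix=26):
    """Hand each node one block carved out of the pool (assumed /26, in order)."""
    pool = ipaddress.ip_network(pool_cidr)
    # zip stops at the shorter input, so only len(nodes) blocks are consumed.
    return dict(zip(nodes, pool.subnets(new_prefix=block_prefix)))


def node_for(pod_ip, assignments):
    """Find the node whose block contains pod_ip, or None if no block matches."""
    ip = ipaddress.ip_address(pod_ip)
    for node, block in assignments.items():
        if ip in block:
            return node
    return None


# Hypothetical pool and node names.
assignments = allocate_blocks("192.168.0.0/16", ["node-a", "node-b"])
for node, block in assignments.items():
    print(node, block)
print(node_for("192.168.0.70", assignments))
```

With this layout, other hosts need a single route per block rather than one route per pod.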
Actually, that would give us more control over IP assignment. My second concern is that I prefer to use a released version of Kubernetes for stability reasons. Anyway, I'll open a new ticket if needed. Thank you. |
I can't close this ticket because I'm not the issuer. |
Actually, since I'm using DNS (managed by Kubernetes) for service name resolution, I need to solve this issue. The workaround I described above is not a solution, because if the DNS server can't be reached my apps won't work properly. More generally, how do we deal with Kubernetes service addresses? Again, thank you @caseydavenport |
@Smana the service VIPs should be in a different subnet than the IP pools used for pods. This is because the kube-proxy on each node replaces the service VIP with the IP of one of the pods that is backing the service, and so that Service VIP could resolve to an IP in any pool in the cluster. Automatically assigned Service ClusterIPs are controlled by the following kube-apiserver flag:
I'm not sure if you're allowed to manually configure a ClusterIP outside of that range; can you check what you've configured for the service-cluster-ip-range on your API server? |
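One way to sanity-check the separation between service VIPs and pod IP pools is sketched below. The function name and the CIDR values are hypothetical; the service range would be whatever is set for service-cluster-ip-range on your API server, and the pools would be your Calico IP pools.

```python
import ipaddress


def overlapping_pools(service_range, pod_pools):
    """Return the pod pools that overlap the service ClusterIP range."""
    service_net = ipaddress.ip_network(service_range)
    return [
        pool
        for pool in pod_pools
        if service_net.overlaps(ipaddress.ip_network(pool))
    ]


# Hypothetical ranges for illustration only.
conflicts = overlapping_pools(
    "10.233.0.0/18",
    ["10.233.64.0/18", "10.233.0.0/24"],
)
print(conflicts)
```

An empty result means the service range and the pod pools are disjoint, which is the layout recommended above.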
Are you referring to routes back to your containers here? Do you mean your router does not support BGP and thus you had to manually install routes? Or that BGP distributed routes were not sufficient in some way?
But that clusterIP shouldn't actually be routable in your DC because of the proxying behavior @Symmetric mentioned above.
Understood - Kubernetes v1.1 has just entered code-freeze and will contain the code you need. It should be released in about a month, and then you'll be all set :) |
I've changed the Kubernetes service address subnet
Yes, I configured the routes back. Anyway, it is specific to my lab, and I will have the target platform very soon (with physical nodes and BGP)
Cool! I look forward to it. Well, that's fine; as I said, after a reboot the pod IPs are now configured
So the ticket can be closed. There are 2 remaining problems, but they're not directly related to the reboot issue:
Thank you both :) I will soon share my work on how to build a full cluster (Kubernetes + Calico) on Debian Jessie with Ansible. |
I've raised #49 to discuss your first bullet. The second one doesn't sound like a Calico issue to me, but a Kubernetes issue.
Sounds great, looking forward to it! |
Can you please open a separate issue for this? It could be a Calico issue, so report to us first. Start kubelet with the |
From #33