[Calico] Fix delay setting up ip routes in new nodes #4588

Closed
9 changes: 9 additions & 0 deletions docs/networking.md
@@ -190,6 +190,15 @@ For help with Calico or to report any issues:

Calico currently uses etcd as a backend for storing information about workloads and policies. Calico does not interfere with normal etcd operations and does not require special handling when upgrading etcd. For more information please visit the [etcd Docs](https://coreos.com/etcd/docs/latest/)

#### Calico troubleshooting
##### New nodes take minutes to sync IP routes, and new pods on them can't reach kube-dns
This is caused by entries in the Calico etcd nodestore that refer to nodes which no longer exist. Due to the ephemeral nature of AWS EC2 instances, new nodes are brought up with different hostnames, and nodes that are taken offline remain in the Calico nodestore. This is unlike most datacentre deployments, where the hostnames in a cluster are mostly static. Read more about this issue at https://github.com/kubernetes/kops/issues/3224
This has been fixed in kops 1.8.2. No action is needed for newly created clusters, but if the cluster was created with an earlier kops version, take the following steps:
* Use kops to update the cluster: ```kops update cluster <name> --yes```
* Delete all calico-node pods in the kube-system namespace, so that they are recreated with the new env variable CALICO_K8S_NODE_REF and update the current nodes in etcd
* Decommission all invalid nodes, [see here](https://docs.projectcalico.org/v2.6/usage/decommissioning-a-node)
* After these steps, nodes that are deleted from the cluster should be cleaned from Calico's etcd storage automatically, and the delay in programming routes should be resolved.
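
To find the invalid nodes mentioned above, compare the node names registered in Calico with the nodes Kubernetes currently knows about. The following Go program is a minimal sketch of that comparison only. It is not Calico's or kops' implementation, the function name `staleNodes` and the sample hostnames are hypothetical, and it assumes you have already collected both name lists yourself.

```go
package main

import (
	"fmt"
	"sort"
)

// staleNodes returns the names that appear in calicoNodes but not in
// k8sNodes, sorted alphabetically. Hypothetical helper for illustration;
// it only computes a set difference and talks to no datastore.
func staleNodes(calicoNodes, k8sNodes []string) []string {
	live := make(map[string]struct{}, len(k8sNodes))
	for _, n := range k8sNodes {
		live[n] = struct{}{}
	}
	var stale []string
	for _, n := range calicoNodes {
		if _, ok := live[n]; !ok {
			stale = append(stale, n)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Sample data (made up): names registered in Calico vs. current cluster nodes.
	calico := []string{"ip-10-0-1-12", "ip-10-0-1-47", "ip-10-0-2-99"}
	k8s := []string{"ip-10-0-1-47"}
	for _, n := range staleNodes(calico, k8s) {
		fmt.Println("stale:", n)
	}
}
```

Each name reported as stale is a candidate for the manual decommissioning step linked above. Verify it against the running cluster before deleting anything.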

### Canal Example for CNI and Network Policy

Canal is a project that combines [Flannel](https://github.com/coreos/flannel) and [Calico](http://docs.projectcalico.org/latest/getting-started/kubernetes/installation/hosted/) for CNI Networking. It uses Flannel for networking pod traffic between hosts via VXLAN and Calico for network policy enforcement and pod to pod traffic.
@@ -158,6 +158,11 @@ spec:
# Disable file logging so `kubectl logs` works.
- name: CALICO_DISABLE_FILE_LOGGING
  value: "true"
# Set noderef for node controller.
- name: CALICO_K8S_NODE_REF
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
# Auto-detect the BGP IP address.
- name: IP
  value: ""
@@ -322,6 +327,13 @@ spec:
  requests:
    cpu: 10m
env:
  # By default only policy, profile and workloadendpoint are enabled. The
  # node controller decommissions nodes that no longer exist. This and
  # CALICO_K8S_NODE_REF in calico-node fix #3224, but invalid nodes already
  # registered in Calico need to be deleted manually, see
  # https://docs.projectcalico.org/v2.6/usage/decommissioning-a-node
  - name: ENABLED_CONTROLLERS
    value: policy,profile,workloadendpoint,node
  # The location of the Calico etcd cluster.
  - name: ETCD_ENDPOINTS
    valueFrom:
@@ -169,6 +169,11 @@ spec:
# Disable file logging so `kubectl logs` works.
- name: CALICO_DISABLE_FILE_LOGGING
  value: "true"
# Set noderef for node controller.
- name: CALICO_K8S_NODE_REF
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
# Auto-detect the BGP IP address.
- name: IP
  value: ""
@@ -287,6 +292,13 @@ spec:
  requests:
    cpu: 10m
env:
  # By default only policy, profile and workloadendpoint are enabled. The
  # node controller decommissions nodes that no longer exist. This and
  # CALICO_K8S_NODE_REF in calico-node fix #3224, but invalid nodes already
  # registered in Calico need to be deleted manually, see
  # https://docs.projectcalico.org/v2.6/usage/decommissioning-a-node
  - name: ENABLED_CONTROLLERS
    value: policy,profile,workloadendpoint,node
  # The location of the Calico etcd cluster.
  - name: ETCD_ENDPOINTS
    valueFrom:
4 changes: 2 additions & 2 deletions upup/pkg/fi/cloudup/bootstrapchannelbuilder.go
@@ -476,8 +476,8 @@ func (b *BootstrapChannelBuilder) buildManifest() (*channelsapi.Addons, map[stri
 	key := "networking.projectcalico.org"
 	versions := map[string]string{
 		"pre-k8s-1.6": "2.4.1",
-		"k8s-1.6":     "2.6.2",
-		"k8s-1.7":     "2.6.2",
+		"k8s-1.6":     "2.6.6-kops.1",
+		"k8s-1.7":     "2.6.6-kops.1",
 	}

{