
podCIDR allocation is not working as expected #5231

Closed
sohnaeo opened this issue Oct 2, 2019 · 12 comments · Fixed by #6580
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


sohnaeo commented Oct 2, 2019

Problem:

Pods are not getting IPs from the podCIDR assigned to their nodes.

1- Checkout master branch

2- Create the inventory, changing only 3 variables:

a) Change the etcd deployment to host

b) Change the pod subnets and service addresses

kube_service_addresses: 10.242.0.0/21
kube_pods_subnet: 10.242.64.0/21
kube_network_node_prefix: 24
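For reference, with kube_network_node_prefix: 24 this /21 pod subnet should split into 2^(24-21) = 8 per-node /24 blocks; a quick shell sanity check of the expected ranges:

    # Expected per-node podCIDR blocks carved out of 10.242.64.0/21 with /24 node prefixes
    for i in $(seq 64 71); do echo "10.242.${i}.0/24"; done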

3- Once the cluster is up, check the podCIDR assigned to each node:
/usr/local/bin/kubectl get nodes node1 -ojsonpath='{.spec.podCIDR}'
node1-->10.242.64.0/24
node3-->10.242.65.0/24
node4-->10.242.66.0/24
node5-->10.242.67.0/24

4- kubectl apply -f nginx.yml with 6 replicas:

nginx-5754944d6c-8kzhj 1/1 Running 0 66m 10.242.70.2 node5
nginx-5754944d6c-b2tvh 1/1 Running 0 66m 10.242.67.3 node4
nginx-5754944d6c-dj4qq 1/1 Running 0 66m 10.242.66.1 node3
nginx-5754944d6c-wbhdb 1/1 Running 0 66m 10.242.70.3 node5
nginx-5754944d6c-x7gdq 1/1 Running 0 66m 10.242.66.2 node3
nginx-5754944d6c-z9vcv 1/1 Running 0 66m 10.242.67.2 node4

As shown above, the pods are getting IPs from ranges that are not assigned to their hosts. This worked fine with Kubernetes 1.9.
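One way to see the mismatch in one place, using plain kubectl (nothing Kubespray-specific assumed):

    # Each node's Kubernetes-allocated podCIDR ...
    kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR
    # ... versus the IPs actually handed out to the pods on each node
    kubectl get pods -o wide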

Environment: master branch

  • Cloud provider or hardware configuration: AWS

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"

    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"

  • Version of Ansible (ansible --version):
    ansible 2.7.12
    config file = /home/farhan/workspaces/kubespray-orignal/ansible.cfg
    configured module search path = ['/home/farhan/workspaces/kubespray-orignal/library']
    ansible python module location = /usr/lib/python3.7/site-packages/ansible
    executable location = /usr/bin/ansible
    python version = 3.7.4 (default, Jul 16 2019, 07:12:58) [GCC 9.1.0]

Kubespray version (commit) (git rev-parse --short HEAD): 86cc703

Network plugin used: default

Copy of your inventory file:
[all]
node1 ansible_host=13.x.x.x ip=13.211.170.14 # ip=10.3.0.1 etcd_member_name=etcd1
node2 ansible_host=3.x.x.x ip=3.104.120.158 # ip=10.3.0.2 etcd_member_name=etcd2
node3 ansible_host=13.x.x.x ip=13.210.80.241 # ip=10.3.0.3 etcd_member_name=etcd3

[kube-master]
node1

[etcd]
node5

[kube-node]
node2
node3
node4

[calico-rr]

[k8s-cluster:children]
kube-master
kube-node
calico-rr

@sohnaeo sohnaeo added the kind/bug Categorizes issue or PR as related to a bug. label Oct 2, 2019

mattymo commented Oct 2, 2019

@sohnaeo I'm not sure this is a bug. It looks like you're probably using calico here. Calico assigns blocks of IPs to a node and then if it fills up, it assigns another block from the 10.242.64.0/21 pool. All the IPs here are from that range, so I don't see what the problem is.
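For what it's worth, Calico's own per-node block allocations (as opposed to Node.Spec.PodCIDR) can be inspected with calicoctl; a sketch, assuming calicoctl is installed, configured against the cluster datastore, and recent enough to support --show-blocks:

    # List the configured IP pool(s)
    calicoctl get ippool -o wide
    # Show the per-node address blocks Calico's IPAM has assigned (/26 blocks by default)
    calicoctl ipam show --show-blocks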


sohnaeo commented Oct 2, 2019

@mattymo

Thanks for the quick reply. I dug into this some more, and it seems the IP addresses given to pods are managed by the chosen CNI IPAM plugin. Calico's IPAM plugin doesn't respect the values given to Node.Spec.PodCIDR and instead manages its own per-node allocations.

In our private network we can't use BIRD (BGP) and have to rely on static routes, so we need to know exactly which routes to add on the control planes and nodes. But with Calico's newer block-allocation behaviour, each node can end up with pods from anywhere in the larger 10.242.64.0/21 range. We would like podCIDR to be honoured for each node, so that every node only runs pods from the CIDR assigned to it. For example:

/usr/local/bin/kubectl get nodes node1 -ojsonpath='{.spec.podCIDR}'
node1-->10.242.64.0/24
So we would like node1's pods to get IPs from the 10.242.64.0/24 subnet, so that we can add a route for that subnet on the other nodes. I hope that makes it clear.
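For example, on every other node (and on the control plane) we would add a static route along these lines; the addresses below are placeholders, not values taken from this issue:

    # Route node1's /24 pod block via node1's internal address (placeholder IP)
    ip route add 10.242.64.0/24 via 10.3.0.1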


sohnaeo commented Oct 2, 2019

@mattymo

I fixed this issue by hacking the template below:

network_plugin/calico/templates/cni-calico.conflist.j2

FROM:

{% else %}
      "ipam": {
        "type": "calico-ipam",
        "assign_ipv4": "true",
        "ipv4_pools": ["{{ calico_pool_cidr | default(kube_pods_subnet) }}"]
      },

TO:

{% else %}
      "ipam": {
        "type": "host-local",
        "subnet": "usePodCidr"
      },

Is it possible to provide an option to use "host-local" when etcd is the datastore as well? Could I raise a PR for this?
In our case it makes sense to use etcd, and the IPAM type should be host-local so that usePodCidr is respected.


mattymo commented Oct 2, 2019

I'm not sure this is a supported way for Calico to operate here. Maybe you should switch to Flannel, which respects the node podCIDR allocations.


sohnaeo commented Oct 2, 2019

@mattymo

We can't use Flannel for security reasons, since it is an overlay network; we have to use Calico as a layer 3 solution. We also can't run BIRD/BGP, which is why we need to add static routes so that pods are reachable via the podCIDR allocated to each node.


radut commented Dec 20, 2019

I encountered a similar issue. This is the behaviour since calico > 3.6: projectcalico/calico#2592

Thanks for the hack @sohnaeo
I am still looking for a clear config though...

Edit: For me the hack you provided didn't work.
Instead, it worked like this (kubespray v2.12.0 ships calico v3.7.3; see https://docs.projectcalico.org/v3.7/reference/cni-plugin/configuration#using-host-local-ipam):

      "ipam": {
         "type": "host-local",
         "ranges": [
                    [
                      { "subnet": "usePodCidr" }
                    ]
                   ]
       },

+1
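A rough way to verify the change once the updated conflist is in place: the CNI config is only consulted when a pod is created, so the test pods have to be recreated first (kubectl rollout restart assumes kubectl 1.15+):

    # Recreate the nginx pods so they go through the updated IPAM config,
    # then check that each pod IP now falls inside its node's .spec.podCIDR
    kubectl rollout restart deployment nginx
    kubectl get pods -o wide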

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 20, 2020

radut commented Mar 20, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 20, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 18, 2020

radut commented Jun 18, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 18, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 16, 2020

radut commented Sep 16, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 16, 2020