kubeasz causes pods on the deploy machine to be unable to communicate #1224

Closed
bogeit opened this issue Jan 29, 2023 · 4 comments

@bogeit

bogeit commented Jan 29, 2023

KUBEASZ VERSION: 3.1.1
K8S VERSION: v1.22.14
OS SYSTEM: Ubuntu 18.04.6 LTS
DOCKER VERSION: 20.10.8

Problem analysis:

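The problem node (ip-172-31-27-189) had both the master role and the deploy role when the cluster was deployed with kubeasz, so the ezdown script ran its install_docker function on this node. As a result, the Docker systemd unit on this node does not contain the iptables allow rule (ExecStartPost=/sbin/iptables -I FORWARD -s 0.0.0.0/0 -j ACCEPT). Consequently, none of the pods running on this node can communicate normally: they can only reach other pods and the internal IP of the node they run on.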
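Whether a node received the affected unit can be checked by looking for that `ExecStartPost` line. The sketch below assumes bash and the unit path `/etc/systemd/system/docker.service` used on both nodes in this issue; the function name `has_forward_accept` is hypothetical and not part of kubeasz.

```shell
# Hypothetical helper (not part of kubeasz): succeed only if the given
# docker.service unit contains the FORWARD allow rule found on healthy nodes.
has_forward_accept() {
  local unit="${1:-/etc/systemd/system/docker.service}"
  # Anchored extended regex so a commented-out line (#ExecStartPost=...) does not count.
  grep -Eq '^ExecStartPost=/sbin/iptables -I FORWARD -s 0\.0\.0\.0/0 -j ACCEPT[[:space:]]*$' "$unit"
}
```

For example, `has_forward_accept || echo "FORWARD ACCEPT rule missing from docker.service"` prints a warning on a node like ip-172-31-27-189, whose unit only sets `-P FORWARD ACCEPT`.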

Listing the iptables FORWARD chain reveals a DROP rule like this one (xxx xxxx DROP all -- * * 0.0.0.0/0 0.0.0.0/0):
# iptables -nv -L FORWARD
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
365M 177G cali-FORWARD all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:wUHhoiAYhphO9Mso */
4187K 253M KUBE-FORWARD all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */
4187K 253M DOCKER-USER all -- * * 0.0.0.0/0 0.0.0.0/0
4187K 253M DOCKER-ISOLATION-STAGE-1 all -- * * 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 DOCKER all -- * docker0 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- docker0 docker0 0.0.0.0/0 0.0.0.0/0
4187K 253M ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0
xxx xxxx DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* DROP rule present on the problem node; deleting it restores normal traffic */
0 0 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:S93hcgKJrXEqnTfs */ /* Policy explicitly accepted packet. */ mark match 0x10000/0x10000
0 0 MARK all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:mp77cMpurHhyjLrM */ MARK or 0x10000
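Spotting such a rule can also be scripted. A minimal sketch, assuming bash, awk, and the `iptables -nv -L` column layout shown above (pkts, bytes, target, prot, opt, in, out, source, destination); `find_catchall_drop` is a hypothetical name, not a kubeasz or iptables command. It prints every DROP rule with no interface, address, or extra match restriction and exits 0 if it found one; rules carrying extra columns (a comment, `ctstate`, etc.) are deliberately skipped.

```shell
# Hypothetical helper: read `iptables -nv -L FORWARD` output on stdin and print
# catch-all DROP rules (prot all, any in/out interface, 0.0.0.0/0 -> 0.0.0.0/0,
# and exactly nine columns, i.e. no extra match criteria).
find_catchall_drop() {
  awk 'NF == 9 && $3 == "DROP" && $4 == "all" && $6 == "*" && $7 == "*" &&
       $8 == "0.0.0.0/0" && $9 == "0.0.0.0/0" { print; found = 1 }
       END { exit (found ? 0 : 1) }'
}
```

On a node, `iptables -nv -L FORWARD | find_catchall_drop` (as root) would list the offending rule. To delete it as the reporter did, `iptables -n -L FORWARD --line-numbers` shows its position and `iptables -D FORWARD <number>` removes it.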

Docker unit file on the problem node:

root@ip-172-31-27-189:/var/log# cat /etc/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.io
[Service]
Environment="PATH=/opt/kube/bin:/bin:/sbin:/usr/bin:/usr/sbin"
ExecStartPre=/sbin/iptables -F
ExecStartPre=/sbin/iptables -X
ExecStartPre=/sbin/iptables -F -t nat
ExecStartPre=/sbin/iptables -X -t nat
ExecStartPre=/sbin/iptables -F -t raw
ExecStartPre=/sbin/iptables -X -t raw
ExecStartPre=/sbin/iptables -F -t mangle
ExecStartPre=/sbin/iptables -X -t mangle
ExecStart=/opt/kube/bin/dockerd
ExecStartPost=/sbin/iptables -P INPUT ACCEPT
ExecStartPost=/sbin/iptables -P OUTPUT ACCEPT
ExecStartPost=/sbin/iptables -P FORWARD ACCEPT
ExecReload=/bin/kill -s HUP $MAINPID
Restart=on-failure
RestartSec=5
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
Delegate=yes
KillMode=process
[Install]
WantedBy=multi-user.target

Docker unit file on a healthy node:

root@ip-172-31-23-122:~# cat /etc/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.io

[Service]
Environment="PATH=/opt/kube/bin:/bin:/sbin:/usr/bin:/usr/sbin"
ExecStart=/opt/kube/bin/dockerd
ExecStartPost=/sbin/iptables -I FORWARD -s 0.0.0.0/0 -j ACCEPT
ExecReload=/bin/kill -s HUP $MAINPID
Restart=always
RestartSec=5
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target

@gjmzj
Collaborator

gjmzj commented Jan 30, 2023

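Thanks for pointing this out. You can replace the corresponding content in the ezdown script with the Docker unit configuration from the healthy node.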
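A minimal sketch of that change, assuming bash: the unit text is copied verbatim from the healthy node's docker.service above, while the function name `write_docker_unit` and its target-path argument are hypothetical and do not reflect how ezdown's `install_docker` is actually structured.

```shell
# Hypothetical sketch: write the docker.service unit used on healthy nodes.
# The quoted 'EOF' delimiter keeps $MAINPID literal, so systemd expands it, not the shell.
write_docker_unit() {
  local target="${1:-/etc/systemd/system/docker.service}"
  cat > "$target" <<'EOF'
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.io

[Service]
Environment="PATH=/opt/kube/bin:/bin:/sbin:/usr/bin:/usr/sbin"
ExecStart=/opt/kube/bin/dockerd
ExecStartPost=/sbin/iptables -I FORWARD -s 0.0.0.0/0 -j ACCEPT
ExecReload=/bin/kill -s HUP $MAINPID
Restart=always
RestartSec=5
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target
EOF
}
```

After writing the file as root, `systemctl daemon-reload` followed by `systemctl restart docker` would load the new unit. Restarting dockerd affects the containers it manages, so on a live node this is best done after draining it.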

@bogeit
Author

bogeit commented Jan 30, 2023

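Yes, that is how I ended up fixing it. The problem is hard to spot, though: the node itself can communicate normally, and it only came to light when a business pod was scheduled onto this node and needed to reach the public internet. Tracking it down also took some time. I hope the installation script can prevent this class of problem.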

kubeasz pushed 3 commits that referenced this issue Feb 9, 2023
@github-actions

github-actions bot commented Mar 1, 2023

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Mar 1, 2023
@github-actions

github-actions bot commented Mar 9, 2023

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Mar 9, 2023