Calico no connectivity from windows to linux pods #5539
Comments
Looks likely to be a general Windows networking issue - @song-jiang @lmm should be able to help
@MBcom This log:
means that no default ippool CIDR was defined with If you have an existing ippool you want to use, then you can use the env var |
Actually, I see now that in your linked PR , you've already set
Was this the manifest you used to produce this issue? |
@lmm thanks for your response, and sorry for my late reply. I have now set the environment variable you mentioned, but the connection problem persists. The startup message mentioned above is gone now. Yes, the linked PR contains the working manifest for the Linux cluster nodes.
@MBcom are you seeing the connection problem now with new Windows pods that are using your |
@lmm Yes, the connection problem is still there. Do you need some more logs?
@MBcom do you see any errors in the calico-felix log file in |
Hi @lmm,
I can't see any errors in kube-proxy.log, but kube-proxy restarts very often (~8200 times in 10 h), so it looks like it is crashing.
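To put that restart count in perspective, here is a quick back-of-the-envelope calculation. It only uses the two figures quoted above (about 8200 restarts in about 10 hours) and assumes the restarts were spread evenly, which the logs may not bear out.

```python
# Figures taken from the comment above: roughly 8200 restarts over 10 hours.
# Assumption: restarts are evenly distributed across the window.
restarts = 8200
window_hours = 10

seconds_per_restart = window_hours * 3600 / restarts
restarts_per_minute = restarts / (window_hours * 60)

print(f"about one restart every {seconds_per_restart:.1f} s")
print(f"about {restarts_per_minute:.1f} restarts per minute")
```

A restart every few seconds is more consistent with a service that exits immediately and gets restarted by its supervisor than with an occasional crash under load.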
Thanks @lmm In summary:
I'm not sure whether opening a PR with the above changes is the best solution, because there could be things I missed.
Hi @MBcom, thanks you are right: our kube-proxy configuration is broken for k8s 1.23+. I tested this myself and kube-proxy will just loop forever in this state doing nothing. This explains why the egress traffic from your Windows pods was not working. The issue that adding So I think the immediate thing to PR is to remove |
@MBcom as for your other notes:
Could you please explain more about the issue you're seeing here? My understanding of the
Yes, I think we should do this, but perhaps in a separate PR. Maybe @song-jiang might have ideas about this too.
@MBcom would you be up for putting up the fix to remove the |
@MBcom I've put up the PR for the kube-proxy service fix.
@lmm thanks for creating the PR. I was too busy to come back here. Regarding the other problem:
Hi, we have exactly the same problem. We have Calico 3.22.1 on Windows Server 2019. We tried VXLAN and BGP. If I can provide any useful logs, I'll be happy to post them. Greetings
I looked a bit further and that warning log is actually quite old (5 years old).
@MBcom that's strange, the log level should not be preventing kubelet from doing its work. Is there anything in the kubelet logs that suggest anything? What does a |
@ckruegerkpmg could you please provide logs? See this comment: #5539 (comment) The kube-proxy config bug should have been fixed in Calico v3.22.1
@lmm
PS: A much newer version (2 years old) is available as a nuget package: |
@lmm: The bug IS fixed with 3.22.1 but the It doesn't download the new version even if specified via
Is the install-calico-windows.ps1 somewhere available in a public git repo? |
@ckruegerkpmg yeah you're right, that file is pretty old. Thanks, I didn't know that nuget package existed! We'll have to see if that package works as a drop-in replacement.
Ugh... the script has the wrong version embedded for the v3.22.1 release: https://github.com/projectcalico/calico/blob/v3.22.1/calico/scripts/install-calico-windows.ps1#L23-L24 Thanks for raising that, I'll fix that now. I'll also need to fix how our install script is handled during our release. (This is all fallout from this which I should consider reverting.) It should be possible to override
The original kube-proxy service bug has been fixed.
Expected Behavior
Current Behavior
Possible Solution
Steps to Reproduce (for bugs)
We changed the log level in C:\k\cni\config\10-calico.conf to error because Calico reports that a file named 'mtu' was not found, which makes kubelet think there is an error. Then we run:
Context
Our windows pods need to access S3 storages deployed on linux nodes therefore they need access to other kubernetes services/ pods.
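For anyone trying to reproduce this, a small TCP reachability check run from inside a Windows pod can help separate "no route to the Linux pod" from application-level problems. The following is only a sketch: the function name `can_connect` is made up for illustration, and the host and port in the usage comment are placeholders, not values from this cluster.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example usage (hypothetical service name and port, replace with your own):
#   print(can_connect("my-s3.my-namespace.svc.cluster.local", 9000))
```

Running the same check against both the service name and a Linux pod IP directly may show whether the failure is in service routing (kube-proxy) or in pod-to-pod networking.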
HNSEndpoints looks good:
We saw
startup/startup.go 907: Selected default IP pool is '192.168.0.0/16'
in calico-node.log, but the IPPool is configured as
Your Environment