
UDP connections from pods to daemonset are lost when daemonset is replaced #373

Closed
Suckzoo opened this issue Apr 3, 2019 · 7 comments

Suckzoo commented Apr 3, 2019

Background

  • Kubernetes version: 1.11
  • vpc-cni version: 1.3.2
  • EKS version: eks.2

Problem

We deployed a daemonset that accepts UDP packets through hostPort 8125. At first, we observed that other pods were sending packets to the daemonset's pods correctly: each pod sends its UDP packets to its host, and the host redirects them to the daemonset pod running on that host.

Then we replaced and redeployed the daemonset using a YAML file identical to the previous daemonset's. After the redeploy, the replaced daemonset no longer receives packets from the pods. The pods keep sending their packets, but the packets are never delivered to the daemonset.

How to Reproduce

  1. Deploy a daemonset that accepts UDP packets through a hostPort. A UDP echo server is enough (see the sketch after this list).
  2. Deploy pods that continuously send UDP packets to their host's designated hostPort.
  3. Verify that packets are arriving at the daemonset.
  4. Replace the daemonset, e.g. kubectl replace --force -f daemonset.yml
  5. Packets from the pods (specifically, from the long-running process started by the container image) no longer reach the newly deployed daemonset.
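To make step 1 concrete, here is a minimal sketch of the kind of UDP echo server we mean (illustrative only; our real daemonset runs a different workload, and 8125 just has to match the containerPort/hostPort pair with protocol UDP in the daemonset manifest):

```go
// udp-echo.go: minimal UDP echo server listening on the container port
// that the daemonset exposes via hostPort (8125 here, purely illustrative).
package main

import (
	"log"
	"net"
)

func main() {
	addr, err := net.ResolveUDPAddr("udp", ":8125")
	if err != nil {
		log.Fatal(err)
	}
	conn, err := net.ListenUDP("udp", addr)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	buf := make([]byte, 65535)
	for {
		n, remote, err := conn.ReadFromUDP(buf)
		if err != nil {
			log.Printf("read error: %v", err)
			continue
		}
		// Echo the datagram back so the client can confirm the packet
		// actually reached the daemonset pod.
		if _, err := conn.WriteToUDP(buf[:n], remote); err != nil {
			log.Printf("write error: %v", err)
		}
	}
}
```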

Expected Behavior

The replaced daemonset should also receive the packets. In other words, the CNI must reroute the packets to the newly deployed daemonset.

Trivia

  • We ran a shell inside the pods from step 2 and sent UDP packets to the host manually. We captured all traffic arriving at the daemonset with tcpdump and observed that the manually sent packets reach the daemonset correctly.
  • We deleted the running pods from step 2. Likewise, we observed that packets sent from the newly created pods reach the daemonset correctly.

Should you need more information, please let me know by mentioning me.
Thanks in advance.

sethp-nr commented Apr 3, 2019

When we observed similar behavior, it wasn't the fault of the CNI. In our case, the client was doing two unusual things:

  1. Calling connect on a UDP socket
  2. Caching the DNS resolution in a static member variable and never re-resolving it

The effect of [1] was to cause the kernel to "pin" the UDP flow: it only went through the iptables rules once at connect time. When the client sent packets on that socket, they followed that flow. The effect of [2] was exactly what you'd expect from a no-TTL DNS cache, just harder to find (which is why I mention it).

Unfortunately, I don't recall how we solved it: UDP has no in-band way to signal that the receiver's gone away and the client should try reconnecting. Not connecting would have caused a performance hit, but maybe that's acceptable in your case?
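For illustration, a rough Go sketch of the two client behaviors I'm describing (entirely hypothetical; our client wasn't written in Go, and the host IP and port here are made up):

```go
// Two ways a client can send UDP datagrams to its host's port 8125.
package main

import (
	"log"
	"net"
	"time"
)

func main() {
	dst, err := net.ResolveUDPAddr("udp", "10.0.0.1:8125") // host IP, illustrative
	if err != nil {
		log.Fatal(err)
	}

	// (1) "Connected" UDP socket: connect() fixes the destination once,
	// and every later Write reuses that flow (the pinning described above).
	connected, err := net.DialUDP("udp", nil, dst)
	if err != nil {
		log.Fatal(err)
	}
	defer connected.Close()

	// (2) Unconnected socket: each WriteToUDP names the destination
	// explicitly, at the cost of a little extra per-packet work.
	unconnected, err := net.ListenUDP("udp", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer unconnected.Close()

	for i := 0; i < 5; i++ {
		if _, err := connected.Write([]byte("via connected socket")); err != nil {
			log.Printf("connected send: %v", err)
		}
		if _, err := unconnected.WriteToUDP([]byte("via unconnected socket"), dst); err != nil {
			log.Printf("unconnected send: %v", err)
		}
		time.Sleep(time.Second)
	}
}
```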

Suckzoo commented Apr 4, 2019

In our case, we're sending the UDP flows to the host's IP, not to a server name, so we think DNS is irrelevant to our issue.

sethp-nr commented Apr 4, 2019

That makes sense. From #153, it seems to me the hostPort handling is delegated out to the upstream portmap plugin. That's billed as:

portmap: An iptables-based portmapping plugin. Maps ports from the host's address space to the container.

Which suggests to me that the behavior is "expected," or at least an issue more readily fixed in one of the upstream bits (portmap or the Linux kernel, maybe). We never got around to trying it, but another thought we had was to turn down nf_conntrack_udp_timeout to see if that made the kernel forget about "connected" UDP sockets fast enough for an acceptable amount of packet loss.
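Concretely, I mean something along these lines on the node (value in seconds; pick whatever loss window is tolerable in your case):

```
sysctl -w net.netfilter.nf_conntrack_udp_timeout=5
```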

Suckzoo commented Apr 8, 2019

@sethp-nr Thanks for suggesting the workaround :)
We tried setting nf_conntrack_udp_timeout to 0. The result was, as you expected, that packets started flowing into the newly created container. However, there is a big problem: name resolution stops working entirely; the container fails to resolve any address.

Suckzoo commented Apr 8, 2019

For now, I think we can make one of these choices:

  1. Detect container destruction in vpc-cni and flush the stale entries from portmap's iptables rules.
  2. Detect container destruction in portmap and flush the stale entries from portmap's iptables rules.

As I'm not familiar with either portmap or vpc-cni, I'm not sure which to fix. Which would be the best option?
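Whichever layer it ends up in, I picture the cleanup looking roughly like this. This is only a sketch under my own assumptions: it shells out to the node's conntrack tool instead of using any real vpc-cni or portmap API, and the port is hard-coded for illustration.

```go
// Sketch: when a container that owned a hostPort mapping is torn down,
// delete any conntrack entries still steering UDP traffic for that port
// to the old pod, so the next packet re-traverses the DNAT rules.
package main

import (
	"fmt"
	"os/exec"
)

// flushUDPHostPort removes conntrack entries whose original destination
// port matches the hostPort. Requires the conntrack CLI on the node.
// Note: conntrack may exit non-zero when nothing matched; a real
// implementation would want to tolerate that case.
func flushUDPHostPort(port int) error {
	cmd := exec.Command("conntrack", "-D", "-p", "udp", "--dport", fmt.Sprint(port))
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("conntrack -D failed: %v: %s", err, out)
	}
	return nil
}

func main() {
	if err := flushUDPHostPort(8125); err != nil {
		fmt.Println(err)
	}
}
```

If I understand correctly, this is similar in spirit to what kube-proxy does for UDP services when their endpoints change.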

sethp-nr commented Apr 8, 2019

Well, since this CNI delegates to portmap's implementation for host ports, it seems to me that the right place would be the upstream project.

In fact, it looks like there's already an issue about this case: containernetworking/plugins#123

Suckzoo commented Apr 9, 2019

Seems I should go there to discuss :) Closing this issue since it seems the portmap plugin is responsible for this. Thanks a lot @sethp-nr :)
