Lighthouse DNS unable to handle TCP retries after truncation #3218

Closed
t0lya opened this issue Sep 9, 2024 · 1 comment · Fixed by #3220
Assignees
Labels
bug Something isn't working priority:medium

Comments

@t0lya
Contributor

t0lya commented Sep 9, 2024

What happened:

We have >200 pods behind a headless service in our fleet of clusters.
We are making DNS requests for the headless service (<service>.<namespace>.svc.clusterset.local) which is supposed to return >200 A records (one for every pod).
dig <service>.<namespace>.svc.clusterset.local @<pod IP of lighthouse DNS> works as expected.
dig <service>.<namespace>.svc.clusterset.local @<clusterIP of lighthouse DNS> does not work, returns SERVFAIL.

What you expected to happen:

Per RFC 1035 and RFC 2181, DNS messages over UDP are limited to 512 bytes. Messages above the limit are truncated: the server sends back a response with the TC bit set, after which the client is supposed to retry over TCP.
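To put rough numbers on the limit, here is a minimal sketch of the size arithmetic and of the TC bit check. It assumes every answer is an A record whose owner name is a 2-byte compression pointer and that no EDNS0 OPT record is present; the service name is a made-up placeholder, and the helper names are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	headerSize  = 12  // fixed DNS header (RFC 1035, section 4.1.1)
	udpLimit    = 512 // UDP message limit without EDNS0
	aRecordSize = 16  // 2 name pointer + 2 TYPE + 2 CLASS + 4 TTL + 2 RDLENGTH + 4 IPv4
)

// questionSize returns the encoded size of the question section for name:
// the wire-format name takes len(name)+2 bytes (one length byte per label
// plus the root terminator), followed by 4 bytes of QTYPE and QCLASS.
func questionSize(name string) int {
	return len(strings.TrimSuffix(name, ".")) + 2 + 4
}

// responseSize estimates the size of a response carrying `records` A records.
func responseSize(name string, records int) int {
	return headerSize + questionSize(name) + records*aRecordSize
}

// maxRecordsOverUDP estimates how many A records fit under the 512-byte limit.
func maxRecordsOverUDP(name string) int {
	return (udpLimit - headerSize - questionSize(name)) / aRecordSize
}

// isTruncated reports whether the TC bit (0x02 in the third header byte) is set.
func isTruncated(msg []byte) bool {
	return len(msg) >= headerSize && msg[2]&0x02 != 0
}

func main() {
	name := "nginx.default.svc.clusterset.local" // hypothetical service name
	fmt.Printf("200 A records need about %d bytes\n", responseSize(name, 200))
	fmt.Printf("about %d A records fit in %d bytes\n", maxRecordsOverUDP(name), udpLimit)
}
```

Under these assumptions a 200-record answer is several times larger than a single UDP message allows, so the TCP retry is the only way to get the full record set.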

But the Lighthouse DNS clusterIP service exposes only a UDP port, not a TCP port:

Ports: []corev1.ServicePort{{
	Name:     "udp",
	Protocol: "UDP",
	Port:     53,
	TargetPort: intstr.IntOrString{
		Type:   intstr.Int,
		IntVal: 53,
	},
}},

We manually created a copy of the clusterIP service with a TCP port added (updating the existing service was not possible because the controller would override our changes). After that, dig <service>.<namespace>.svc.clusterset.local @<copied clusterIP of lighthouse DNS> began working.

Can we make a change in submariner-operator to include a TCP port in the submariner-lighthouse-coredns clusterIP service?
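For illustration, the requested port list might look roughly like the sketch below. It assumes the operator keeps building the ports the way the snippet above does; `dnsServicePorts` is a hypothetical helper name, not taken from the operator's code or from the eventual fix. It needs the `k8s.io/api` and `k8s.io/apimachinery` modules to build.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// dnsServicePorts is a hypothetical helper returning port 53 for both
// protocols, so that truncated UDP answers can be retried over TCP.
func dnsServicePorts() []corev1.ServicePort {
	target := intstr.IntOrString{Type: intstr.Int, IntVal: 53}

	return []corev1.ServicePort{
		{Name: "udp", Protocol: corev1.ProtocolUDP, Port: 53, TargetPort: target},
		{Name: "tcp", Protocol: corev1.ProtocolTCP, Port: 53, TargetPort: target},
	}
}

func main() {
	for _, p := range dnsServicePorts() {
		fmt.Printf("%s %d/%s\n", p.Name, p.Port, p.Protocol)
	}
}
```

A Service may list the same port number once per protocol as long as the port names differ, which is why the two entries are named "udp" and "tcp".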

How to reproduce it (as minimally and precisely as possible):
Create many pods (>200) behind a headless service.
dig <service>.<namespace>.svc.clusterset.local @<pod IP of lighthouse DNS> works as expected.
dig <service>.<namespace>.svc.clusterset.local @<clusterIP of lighthouse DNS> does not work, returns SERVFAIL.
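The same reproduction can be sketched without dig. The program below sends a plain A query over UDP and retries over TCP when the TC bit is set, which is the step that fails when the service has no TCP port. It is a simplified sketch using only the Go standard library: it sends no EDNS0 option, does not check that the response ID matches, and the function names and the 3-second timeout are arbitrary choices.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"net"
	"os"
	"strings"
	"time"
)

const timeout = 3 * time.Second // arbitrary choice for this sketch

// buildQuery encodes a minimal A/IN query with the RD flag set.
func buildQuery(id uint16, name string) ([]byte, error) {
	msg := make([]byte, 12, 64)
	binary.BigEndian.PutUint16(msg[0:], id)
	msg[2] = 0x01                          // RD
	binary.BigEndian.PutUint16(msg[4:], 1) // QDCOUNT = 1
	for _, label := range strings.Split(strings.TrimSuffix(name, "."), ".") {
		if len(label) == 0 || len(label) > 63 {
			return nil, errors.New("invalid label in name")
		}
		msg = append(msg, byte(len(label)))
		msg = append(msg, label...)
	}
	return append(msg, 0, 0, 1, 0, 1), nil // root label, QTYPE=A, QCLASS=IN
}

// truncated reports whether the TC bit is set in a DNS message header.
func truncated(msg []byte) bool {
	return len(msg) >= 12 && msg[2]&0x02 != 0
}

// exchange sends query over "udp" or "tcp" and returns the raw response.
// Over TCP each message carries a 2-byte length prefix.
func exchange(network, server string, query []byte) ([]byte, error) {
	conn, err := net.DialTimeout(network, server, timeout)
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return nil, err
	}
	if network == "udp" {
		if _, err := conn.Write(query); err != nil {
			return nil, err
		}
		buf := make([]byte, 512)
		n, err := conn.Read(buf)
		if err != nil {
			return nil, err
		}
		return buf[:n], nil
	}
	framed := append([]byte{byte(len(query) >> 8), byte(len(query))}, query...)
	if _, err := conn.Write(framed); err != nil {
		return nil, err
	}
	var size [2]byte
	if _, err := io.ReadFull(conn, size[:]); err != nil {
		return nil, err
	}
	resp := make([]byte, binary.BigEndian.Uint16(size[:]))
	if _, err := io.ReadFull(conn, resp); err != nil {
		return nil, err
	}
	return resp, nil
}

// lookup queries over UDP first and falls back to TCP on truncation.
func lookup(server, name string) ([]byte, error) {
	query, err := buildQuery(0x1234, name)
	if err != nil {
		return nil, err
	}
	resp, err := exchange("udp", server, query)
	if err == nil && truncated(resp) {
		resp, err = exchange("tcp", server, query) // fails if only UDP is exposed
	}
	if err == nil && len(resp) < 12 {
		err = errors.New("short response")
	}
	return resp, err
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: dnsprobe <server:port> <name>")
		return
	}
	resp, err := lookup(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("answer count:", binary.BigEndian.Uint16(resp[6:8]))
}
```

Running it against `<clusterIP of lighthouse DNS>:53` and then against `<pod IP of lighthouse DNS>:53` should show whether the TCP retry is reachable through each address.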

Anything else we need to know?:

Environment:

  • Diagnose information (use subctl diagnose all):
  • Gather information (use subctl gather)
  • Cloud provider or hardware configuration:
  • Install tools:
  • Others:
t0lya added the bug label on Sep 9, 2024
t0lya changed the title from "Lighthouse DNS unable to handle TCP retries during truncation" to "Lighthouse DNS unable to handle TCP retries after truncation" on Sep 9, 2024
@tpantelis
Contributor

Can we make a change in submariner-operator to include a TCP port in the submariner-lighthouse-coredns clusterIP service?

We welcome contributions. Please submit a PR.


3 participants