What happened:
We have more than 200 pods behind a headless service in our fleet of clusters.
We make DNS requests for the headless service (`<service>.<namespace>.svc.clusterset.local`), which should return more than 200 A records (one for every pod). `dig <service>.<namespace>.svc.clusterset.local @<pod IP of lighthouse DNS>` works as expected. `dig <service>.<namespace>.svc.clusterset.local @<clusterIP of lighthouse DNS>` does not work and returns SERVFAIL.
What you expected to happen:
Per RFC 1035 and RFC 2181, DNS messages over UDP are limited to 512 bytes (unless the client negotiates a larger size via EDNS0). A larger response is truncated: the server sends back a response with the TC bit set, after which the client is supposed to retry over TCP.
However, the lighthouse DNS clusterIP service exposes only a UDP port, not a TCP port, so the TCP retry cannot reach the server.
We manually created a copy of the clusterIP service with a TCP port added (updating the existing service was not possible because the controller would override our changes). After that, `dig <service>.<namespace>.svc.clusterset.local @<copied clusterIP of lighthouse DNS>` began working.
Can we change submariner-operator to include a TCP port in the `submariner-lighthouse-coredns` clusterIP service?
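To make the size argument concrete, here is a rough back-of-the-envelope sketch in Go. It assumes every answer name is encoded as a 2-byte compression pointer back to the question (common, but not verified for lighthouse here), that the response carries only A records, and it uses a made-up service name. The function and constant names are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Fixed DNS header size (RFC 1035, section 4.1.1).
	headerLen = 12
	// UDP message size limit without EDNS0 (RFC 1035, section 2.3.4).
	classicUDPMax = 512
	// One A record with a compressed name (assumption):
	// 2 (name pointer) + 2 (type) + 2 (class) + 4 (TTL) + 2 (rdlength) + 4 (IPv4).
	compressedALen = 16
)

// wireNameLen returns the uncompressed wire length of a domain name:
// one length byte per label plus the terminating root byte.
func wireNameLen(name string) int {
	name = strings.TrimSuffix(name, ".")
	if name == "" {
		return 1
	}
	return len(name) + 2
}

// estimateResponseLen approximates the size of a response carrying n A records
// for qname, under the compression assumption described above.
func estimateResponseLen(qname string, n int) int {
	question := wireNameLen(qname) + 4 // name + QTYPE + QCLASS
	return headerLen + question + n*compressedALen
}

func main() {
	// Hypothetical name; substitute the real service and namespace.
	qname := "myservice.mynamespace.svc.clusterset.local"
	for _, n := range []int{10, 50, 200} {
		size := estimateResponseLen(qname, n)
		fmt.Printf("%d A records: ~%d bytes, over 512-byte limit: %v\n",
			n, size, size > classicUDPMax)
	}
}
```

Under these assumptions each additional pod adds about 16 bytes, so a response for 200 pods is several times larger than the classic 512-byte limit and has to be fetched over TCP.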
How to reproduce it (as minimally and precisely as possible):
Create many pods (more than 200) behind a headless service. `dig <service>.<namespace>.svc.clusterset.local @<pod IP of lighthouse DNS>` works as expected. `dig <service>.<namespace>.svc.clusterset.local @<clusterIP of lighthouse DNS>` does not work and returns SERVFAIL.
Anything else we need to know?:
Environment:
Diagnose information (use subctl diagnose all):
Gather information (use subctl gather)
Cloud provider or hardware configuration:
Install tools:
Others:
t0lya changed the title from "Lighthouse DNS unable to handle TCP retries during truncation" to "Lighthouse DNS unable to handle TCP retries after truncation" on Sep 9, 2024.
Code referenced in the issue: submariner-operator/controllers/servicediscovery/servicediscovery_controller.go, lines 388 to 396 in ee76b52.