
Support NodeLocal DNSCache with AntreaProxy #2137

Closed
antoninbas opened this issue Apr 29, 2021 · 6 comments · Fixed by #2882
Labels: good first issue, kind/feature, priority/important-soon

Comments

@antoninbas
Contributor

Describe the problem/challenge you have
NodeLocal DNSCache improves the performance of DNS queries in a K8s cluster by running a DNS cache on each Node. DNS queries are intercepted by a local instance of CoreDNS. In case of a cache miss, the local instance forwards the request to the cluster CoreDNS (for cluster-local queries) or to the upstream DNS server.

The way it works (normally) is by assigning the kube-dns ClusterIP to a local "dummy" interface, and installing iptables rules to disable connection tracking for the queries and bypass kube-proxy. The local CoreDNS instance is configured to bind to that address and can therefore intercept queries. In case of a cache miss, queries can be sent to the cluster CoreDNS Pods through a "shadow" Service, which exposes the CoreDNS Pods under a new ClusterIP. Additional local IPs can be assigned to the "dummy" interface and used to query the local CoreDNS instance. However, with the default Pod DNS configuration (ClusterFirst), Pods use the kube-dns ClusterIP, so the local IP doesn't seem to play an important role. The exception is when kube-proxy runs in IPVS mode: the kube-dns ClusterIP is then already assigned to a different interface, and the Pods' DNS configuration needs to be changed to use the local IP.

When AntreaProxy is enabled (default), Pod DNS queries to the kube-dns ClusterIP will be load-balanced directly by AntreaProxy to a CoreDNS Pod endpoint. This means that NodeLocal DNSCache is completely bypassed, which is probably not acceptable for users who want to leverage this feature to improve DNS performance in their clusters. While these users can update the Pod configuration to use the local IP assigned by NodeLocal DNSCache to the dummy interface, this is not always ideal in the context of CaaS, as it can require everyone running Pods in the cluster to be aware of the situation.
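
To make the cases above concrete, here is a minimal Go sketch of when a Pod's DNS query reaches the node-local cache, based only on the behavior described in this issue. The `proxyMode` type, its constants, and `usesLocalCache` are hypothetical names for illustration; they are not part of Antrea or NodeLocal DNSCache.

```go
package main

import "fmt"

// proxyMode is a hypothetical enum for the Service proxy handling Pod traffic.
type proxyMode int

const (
	kubeProxyIptables proxyMode = iota // kube-proxy in iptables mode
	kubeProxyIPVS                      // kube-proxy in IPVS mode
	antreaProxy                        // AntreaProxy enabled, no bypass rule
)

// usesLocalCache reports whether a Pod DNS query reaches the node-local
// CoreDNS instance, following the description in this issue:
//   - queries sent to the local IP always reach the local instance;
//   - queries sent to the kube-dns ClusterIP are only intercepted when
//     kube-proxy runs in iptables mode.
func usesLocalCache(mode proxyMode, podUsesLocalIP bool) bool {
	if podUsesLocalIP {
		return true
	}
	return mode == kubeProxyIptables
}

func main() {
	// Default Pod DNS configuration (ClusterFirst): kube-dns ClusterIP is used.
	fmt.Println("iptables:", usesLocalCache(kubeProxyIptables, false))
	fmt.Println("IPVS:", usesLocalCache(kubeProxyIPVS, false))
	fmt.Println("AntreaProxy:", usesLocalCache(antreaProxy, false))
	// Pod DNS configuration changed to the local IP.
	fmt.Println("AntreaProxy + local IP:", usesLocalCache(antreaProxy, true))
}
```

The AntreaProxy row with the default Pod configuration is the gap this issue is about.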

Thanks @alex-vmw for bringing this to my attention!

Describe the solution you'd like
One solution would be to add a special rule in OVS to bypass AntreaProxy for kube-dns ClusterIP traffic, when NodeLocal DNSCache is used. This can be done via a configuration parameter, or hopefully through automated detection.
When AntreaProxy replaces kube-proxy completely and kube-proxy is removed from the cluster, Pod DNS queries will take the following path (assuming a cache miss): Pod -> OVS (AntreaProxy bypass) -> host netns -> local CoreDNS instance -> OVS (ClusterIP load balancing to "shadow" DNS Service) -> egress.
We also need to think about what it will mean for NetworkPolicy enforcement.
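
As a rough illustration of the path described above, the following Go sketch lists the hops of a Pod DNS query with and without the proposed bypass. It assumes AntreaProxy has fully replaced kube-proxy; `dnsQueryPath` and its parameters are hypothetical, and the hop labels are copied from this issue rather than from any Antrea code.

```go
package main

import (
	"fmt"
	"strings"
)

// dnsQueryPath returns the hops taken by a Pod DNS query addressed to the
// kube-dns ClusterIP. It is a model of the description in this issue, not of
// actual OVS flows.
func dnsQueryPath(bypassAntreaProxy, cacheHit bool) []string {
	if !bypassAntreaProxy {
		// Current behavior: AntreaProxy load-balances straight to a CoreDNS
		// Pod, so the node-local cache is never consulted.
		return []string{"Pod", "OVS (AntreaProxy load balancing)", "CoreDNS Pod endpoint"}
	}
	path := []string{"Pod", "OVS (AntreaProxy bypass)", "host netns", "local CoreDNS instance"}
	if !cacheHit {
		// On a cache miss the local instance queries the "shadow" Service,
		// which AntreaProxy load-balances as a regular ClusterIP.
		path = append(path, `OVS (ClusterIP load balancing to "shadow" DNS Service)`, "egress")
	}
	return path
}

func main() {
	fmt.Println(strings.Join(dnsQueryPath(false, false), " -> "))
	fmt.Println(strings.Join(dnsQueryPath(true, true), " -> "))
	fmt.Println(strings.Join(dnsQueryPath(true, false), " -> "))
}
```

Note that with the bypass, a cache miss crosses OVS twice, which is one reason NetworkPolicy enforcement needs a closer look.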

Another, longer-term solution could be to provide our own NodeLocal DNSCache functionality. NodeLocal DNSCache is a very simple piece of software that mostly takes care of 1) configuring the network (dummy interface, iptables rules) and 2) configuring / running a local CoreDNS instance. We can provide similar functionality in Antrea, with no need for iptables rules. We could eliminate a DaemonSet and potentially provide some value this way.

@antoninbas antoninbas added the kind/feature and priority/important-soon labels Apr 29, 2021
@jianjuns
Contributor

jianjuns commented May 6, 2021

I feel we might make it more generic and let AntreaProxy ignore configurable ClusterIP Services.

@antoninbas
Contributor Author

I feel we might make it more generic and let AntreaProxy ignore configurable ClusterIP Services.

This is certainly the most straightforward / generic solution, and easy to implement.
I wonder if users will expect NodeLocal DNSCache to work out of the box with Antrea and AntreaProxy (since it is enabled by default), without requiring an extra configuration step in Antrea and a potential Antrea Agent restart. Maybe I can check with @alex-vmw, since he is the first user we know of who wants to deploy NodeLocal DNSCache with AntreaProxy.

@antoninbas
Contributor Author

I chatted with @alex-vmw and he thinks the config option with a list of Service names / ClusterIPs is good enough.

I'm tagging this as a good first issue because the scope is quite small and doesn't require too much prior knowledge about Antrea.

@antoninbas antoninbas added the good first issue label May 7, 2021
@luolanzone luolanzone self-assigned this May 31, 2021
@luolanzone
Contributor

luolanzone commented Jun 2, 2021

@antoninbas @jianjuns do you think it's OK to add a string slice 'SkipServices' to antrea-agent.conf, like below?

SkipServices: ["kube-system/kube-dns", "10.1.11.4"]
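
For discussion, here is a minimal Go sketch of how such a list could be parsed and matched against a Service. It assumes each entry is either a `namespace/name` string or a ClusterIP, as in the example above. The `skipSet` type, `parseSkipServices`, and `shouldSkip` are hypothetical names and not existing Antrea code, and the `10.96.0.10` address in `main` is a made-up ClusterIP.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// skipSet holds the parsed entries of the proposed SkipServices option.
type skipSet struct {
	names map[string]struct{}     // "namespace/name" entries
	ips   map[netip.Addr]struct{} // ClusterIP entries
}

// parseSkipServices classifies each entry as a ClusterIP or a namespaced
// Service name, and rejects anything that is neither.
func parseSkipServices(entries []string) (skipSet, error) {
	s := skipSet{names: map[string]struct{}{}, ips: map[netip.Addr]struct{}{}}
	for _, e := range entries {
		if addr, err := netip.ParseAddr(e); err == nil {
			s.ips[addr] = struct{}{}
			continue
		}
		parts := strings.Split(e, "/")
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return skipSet{}, fmt.Errorf("invalid SkipServices entry %q", e)
		}
		s.names[e] = struct{}{}
	}
	return s, nil
}

// shouldSkip reports whether a Service matches the list, either by its
// namespaced name or by its ClusterIP.
func (s skipSet) shouldSkip(namespace, name, clusterIP string) bool {
	if _, ok := s.names[namespace+"/"+name]; ok {
		return true
	}
	addr, err := netip.ParseAddr(clusterIP)
	if err != nil {
		return false
	}
	_, ok := s.ips[addr]
	return ok
}

func main() {
	s, err := parseSkipServices([]string{"kube-system/kube-dns", "10.1.11.4"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s.shouldSkip("kube-system", "kube-dns", "10.96.0.10")) // matched by name
	fmt.Println(s.shouldSkip("default", "web", "10.1.11.4"))           // matched by ClusterIP
	fmt.Println(s.shouldSkip("default", "web", "10.1.11.5"))           // no match
}
```

Parsing the IPs once up front means `10.1.11.4` is compared as an address rather than as a string, and invalid entries are reported at startup instead of being silently ignored.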

@antoninbas
Contributor Author

@luolanzone I would say yes, but it's probably better to keep this issue for a new contributor IMO, since it's not urgent and quite straightforward to do.

@luolanzone
Contributor

@antoninbas OK, I was planning to go through the code to get familiar with the AntreaProxy-related code. If it's not urgent, I will leave it for now.

@luolanzone luolanzone removed their assignment Jun 2, 2021
luolanzone added a commit to luolanzone/antrea that referenced this issue Oct 11, 2021
luolanzone added a commit to luolanzone/antrea that referenced this issue Oct 12, 2021
luolanzone added a commit to luolanzone/antrea that referenced this issue Oct 13, 2021
luolanzone added a commit to luolanzone/antrea that referenced this issue Oct 14, 2021
luolanzone added a commit to luolanzone/antrea that referenced this issue Oct 15, 2021
luolanzone added a commit to luolanzone/antrea that referenced this issue Oct 19, 2021
Add a skipServices in antrea-agent.conf so AntreaProxy can be configured
to skip proxying kube-dns service which allow user to use NodeLocal DNSCache

Resolves antrea-io#2137

Signed-off-by: Lan Luo <[email protected]>
tnqn pushed a commit that referenced this issue Oct 19, 2021
Add a skipServices in antrea-agent.conf so AntreaProxy can be configured
to skip proxying kube-dns service which allow user to use NodeLocal DNSCache

Resolves #2137

Signed-off-by: Lan Luo <[email protected]>