Support NodeLocal DNSCache with AntreaProxy #2137
Comments
I feel we might make it more generic by letting AntreaProxy ignore a configurable list of ClusterIP Services.
This is certainly the most straightforward / generic solution, and easy to implement.
I chatted with @alex-vmw and he thinks the config option with a list of Service names / Cluster IPs is good enough. I'm tagging this as a good first issue because the scope is quite small and doesn't require too much prior knowledge about Antrea.
@antoninbas @jianjuns do you think it's Ok to add a string slice 'SkipServices' to antrea-agent.conf like below?
@luolanzone I would say yes, but it's probably better to keep this issue for a new contributor IMO, since it's not urgent and quite straightforward to do.
@antoninbas OK, I was planning to go through the code and get familiar with the AntreaProxy-related code. If it's not urgent, I will leave it for now.
Resolves antrea-io#2137 Signed-off-by: Lan Luo <[email protected]>
Add a skipServices option to antrea-agent.conf so that AntreaProxy can be configured to skip proxying the kube-dns Service, which allows users to use NodeLocal DNSCache. Resolves antrea-io#2137 Signed-off-by: Lan Luo <[email protected]>
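For illustration only, here is a minimal Go sketch of how a skip list of Service names / Cluster IPs (as discussed in the comments) could be matched against a Service. This is not the code from the referenced commits: the `skipSet` type, the `newSkipSet` and `shouldSkip` helpers, and the `namespace/name` entry format are all assumptions made for the example.

```go
package main

import (
	"net"
	"strings"
)

// skipSet is a hypothetical helper (not Antrea's actual implementation).
// It holds Services that the proxy should not load-balance.
type skipSet struct {
	names map[string]struct{} // entries assumed to be "namespace/name"
	ips   map[string]struct{} // ClusterIPs in canonical string form
}

// newSkipSet splits raw config entries into Service names and ClusterIPs.
// Anything that parses as an IP is treated as a ClusterIP; everything else
// is treated as a Service name.
func newSkipSet(entries []string) skipSet {
	s := skipSet{names: map[string]struct{}{}, ips: map[string]struct{}{}}
	for _, e := range entries {
		e = strings.TrimSpace(e)
		if e == "" {
			continue
		}
		if ip := net.ParseIP(e); ip != nil {
			s.ips[ip.String()] = struct{}{}
			continue
		}
		s.names[e] = struct{}{}
	}
	return s
}

// shouldSkip reports whether the Service matches the skip list, either by
// its "namespace/name" or by its ClusterIP.
func (s skipSet) shouldSkip(namespace, name, clusterIP string) bool {
	if _, ok := s.names[namespace+"/"+name]; ok {
		return true
	}
	if ip := net.ParseIP(clusterIP); ip != nil {
		_, ok := s.ips[ip.String()]
		return ok
	}
	return false
}
```

With a list such as `[]string{"kube-system/kube-dns"}`, a call like `shouldSkip("kube-system", "kube-dns", ...)` would return true, and the proxy could then leave that ClusterIP traffic to NodeLocal DNSCache.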
Describe the problem/challenge you have
NodeLocal DNSCache improves performance of DNS queries in a K8s cluster by running a DNS cache on each Node. DNS queries are intercepted by a local instance of CoreDNS which, in case of a cache miss, forwards the requests to the cluster CoreDNS (cluster-local queries) or to the upstream DNS server.
The way it works (normally) is by assigning the kube-dns ClusterIP to a local "dummy" interface, and installing iptables rules to disable connection tracking for the queries and bypass kube-proxy. The local CoreDNS instance is configured to bind to that address and can therefore intercept queries. In case of a cache miss, queries can be sent to the cluster CoreDNS Pods thanks to a "shadow" Service, which exposes the CoreDNS Pods through a new ClusterIP. Additional local IPs can be assigned to the "dummy" interface and be used to query the local CoreDNS instance. However, with a default Pod DNS configuration (ClusterFirst), the kube-dns ClusterIP will be used by Pods and the local IP doesn't seem to play an important role. The exception is when IPVS is used for kube-proxy: in that case the kube-dns ClusterIP is already assigned to a different interface, and the Pods' DNS configuration needs to be changed to use the local IP.
When AntreaProxy is enabled (default), Pod DNS queries to the kube-dns ClusterIP will be load-balanced directly by AntreaProxy to a CoreDNS Pod endpoint. This means that NodeLocal DNSCache is completely bypassed, which is probably not acceptable for users who want to leverage this feature to improve DNS performance in their clusters. While these users can update the Pod configuration to use the local IP assigned by NodeLocal DNSCache to the dummy interface, this is not always ideal in the context of CaaS, as it can require everyone running Pods in the cluster to be aware of the situation.
Thanks @alex-vmw for bringing this to my attention!
Describe the solution you'd like
One solution would be to add a special rule in OVS to bypass AntreaProxy for kube-dns ClusterIP traffic, when NodeLocal DNSCache is used. This can be done via a configuration parameter, or hopefully through automated detection.
When AntreaProxy replaces kube-proxy completely and kube-proxy can be removed from the cluster, Pod DNS queries will take the following path (assuming a cache miss): Pod -> OVS (AntreaProxy bypass) -> host netns -> local CoreDNS instance -> OVS (ClusterIP load balancing to "shadow" DNS Service) -> egress.
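To make the difference between the two paths concrete, here is a small Go sketch that returns the hops a Pod DNS query would take. It only models the description in this issue. The `dnsQueryPath` function and its parameters are hypothetical, and the cache-hit case (the local CoreDNS instance answers directly) is an assumption, since only the cache-miss path is spelled out above.

```go
package main

// dnsQueryPath is a hypothetical model of the hops described in this issue.
// bypassKubeDNS: whether AntreaProxy is bypassed for the kube-dns ClusterIP.
// cacheHit: whether the local CoreDNS instance can answer from its cache
// (assumed to end the path at the local instance).
func dnsQueryPath(bypassKubeDNS, cacheHit bool) []string {
	if !bypassKubeDNS {
		// Current behavior: AntreaProxy load-balances straight to a CoreDNS
		// Pod endpoint, so NodeLocal DNSCache is never consulted.
		return []string{"Pod", "OVS (AntreaProxy load balancing)", "CoreDNS Pod endpoint"}
	}
	path := []string{"Pod", "OVS (AntreaProxy bypass)", "host netns", "local CoreDNS instance"}
	if !cacheHit {
		// Cache miss: the local instance queries the "shadow" DNS Service.
		path = append(path, `OVS (ClusterIP load balancing to "shadow" DNS Service)`, "egress")
	}
	return path
}
```

Calling `dnsQueryPath(true, false)` yields the six-hop cache-miss path listed above, while `dnsQueryPath(false, false)` yields the three-hop path in which the local cache is skipped entirely.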
We also need to think about what it will mean for NetworkPolicy enforcement.
Another, longer-term solution could be to provide our own NodeLocal DNSCache functionality. NodeLocal DNSCache is a very simple piece of software that mostly takes care of 1) configuring the network (dummy interface, iptables rules) and 2) configuring / running a local CoreDNS instance. We can provide similar functionality in Antrea, with no need for iptables rules. We could eliminate a DaemonSet and potentially provide some value this way.