Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add skipServices to support NodeLocal DNSCache with AntreaProxy #2882

Merged
merged 1 commit into from
Oct 19, 2021

Conversation

luolanzone
Copy link
Contributor

@luolanzone luolanzone commented Oct 11, 2021

Add a skipServices in antrea-agent.conf so AntreaProxy can be configured to skip proxying kube-dns service which allow user to use NodeLocal DNSCache

Resolves #2137

Signed-off-by: Lan Luo [email protected]

@luolanzone luolanzone requested review from antoninbas and tnqn October 11, 2021 10:34
@luolanzone
Copy link
Contributor Author

@antoninbas @tnqn I didn't find an efficient way to check the cache hit inside a DNS pod, I used a prometheus locally to check the metric coredns_cache_hits_total from NodeLocal DNSCache, it looks fine from the metrics, Could you help to take a look? thanks.

@codecov-commenter
Copy link

codecov-commenter commented Oct 11, 2021

Codecov Report

Merging #2882 (aeb8266) into main (95e836d) will decrease coverage by 20.23%.
The diff coverage is 50.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #2882       +/-   ##
===========================================
- Coverage   61.18%   40.94%   -20.24%     
===========================================
  Files         284      157      -127     
  Lines       23494    19586     -3908     
===========================================
- Hits        14374     8020     -6354     
- Misses       7565    10803     +3238     
+ Partials     1555      763      -792     
Flag Coverage Δ
kind-e2e-tests ?
unit-tests 40.94% <50.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/proxy/proxier.go 41.81% <0.00%> (-19.01%) ⬇️
pkg/agent/proxy/service.go 100.00% <100.00%> (ø)
pkg/ovs/openflow/default.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/util/runtime/runtime.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/agent/cniserver/pod_configuration_linux.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/ovs/openflow/logs.go 9.52% <0.00%> (-90.48%) ⬇️
pkg/apis/controlplane/register.go 0.00% <0.00%> (-90.00%) ⬇️
pkg/agent/nodeportlocal/k8s/annotations.go 0.00% <0.00%> (-83.88%) ⬇️
pkg/agent/agent_linux.go 0.00% <0.00%> (-80.00%) ⬇️
pkg/agent/client.go 0.00% <0.00%> (-77.42%) ⬇️
... and 232 more

@@ -190,3 +190,6 @@ antreaProxy:
# (e.g. 1.2.3.0/24, 1.2.3.4/32). An empty string slice is meant to select all host IPv4/IPv6 addresses.
# Note that the option is only valid when proxyAll is true.
#nodePortAddresses: []
# A string array of values which specifies the service list should skip proxying. Values can be a valid ClusterIP (e.g. 10.11.1.2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line is a bit long

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 164 to 166
if o.config.AntreaProxy.SkipServices != nil {
util.SkipServices = o.config.AntreaProxy.SkipServices
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting a global variable in a separate package like this is not good practice IMO. Please add a method on the proxier to configure this, or pass the list during proxier initialization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done via pass the list through proxier initialization

@@ -77,6 +81,21 @@ func ShouldSkipService(service *v1.Service) bool {
klog.V(3).Infof("Skipping service %s in namespace %s due to Type=ExternalName", service.Name, service.Namespace)
return true
}

if matchSkipServices(service.Namespace+"/"+service.Name) || matchSkipServices(service.Spec.ClusterIP) {
klog.Infof("Skipping service %s in namespace %s due to it matches SkipService List ", service.Name, service.Namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use structured logging

klog.InfoS("Skipping Service because it matches skipServices list ", "service", klog.KObj(svc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -3953,6 +3953,9 @@ data:
# (e.g. 1.2.3.0/24, 1.2.3.4/32). An empty string slice is meant to select all host IPv4/IPv6 addresses.
# Note that the option is only valid when proxyAll is true.
#nodePortAddresses: []
# A string array of values which specifies the service list should skip proxying. Values can be a valid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An array of string values to specify a list of Services which should be ignored by AntreaProxy (traffic to these Services will not be load-balanced).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -203,14 +203,15 @@ func run(o *Options) error {
v4Enabled := config.IsIPv4Enabled(nodeConfig, networkConfig.TrafficEncapMode)
v6Enabled := config.IsIPv6Enabled(nodeConfig, networkConfig.TrafficEncapMode)
proxyAll := o.config.AntreaProxy.ProxyAll
skipServices := o.config.AntreaProxy.SkipServices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should there be a check in options.go with a warning if AntreaProxy is disabled by len(skipServices) > 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 169 to 184
makeTestService(svc1PortName.Namespace, svc1PortName.Name, func(svc *corev1.Service) {
svc.Spec.ClusterIP = "10.96.10.12"
svc.Spec.Ports = []corev1.ServicePort{{
Name: svc1PortName.Port,
Port: int32(svc1Port),
Protocol: corev1.ProtocolTCP,
}}
}),
makeTestService(svc2PortName.Namespace, svc2PortName.Name, func(svc *corev1.Service) {
svc.Spec.ClusterIP = "192.168.1.2"
svc.Spec.Ports = []corev1.ServicePort{{
Name: svc2PortName.Port,
Port: int32(svc2Port),
Protocol: corev1.ProtocolTCP,
}}
}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this should be a separate test case because with the current code, it's not clear what all what the test is actually verifying. I assume that the point of the test is that even though we add these Services, because they are skipped, we don't expect any new call to InstallServiceFlows?

But these Services don't seem to have any Endpoints, and IIRC we skip such Services completely in installServices. So not sure what we are testing here. At the very least there should be an explanatory comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it expects with no new call to InstallServiceFlows, let me check if a separate case works or a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added a new test case TestClusterSkipServices

@luolanzone luolanzone force-pushed the local-dns-cache branch 2 times, most recently from 144aa05 to 9263f5a Compare October 13, 2021 04:01
@@ -232,6 +232,10 @@ func (o *Options) setDefaults() {
}

func (o *Options) validateAntreaProxyConfig() error {
if !features.DefaultFeatureGate.Enabled(features.AntreaProxy) && len(o.config.AntreaProxy.SkipServices) > 0 {
klog.Warningf("skipServices will be ignored due to AntreaProxy is disabled")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately (?) there is no klog.WarningS and the recommendation is to use klog.InfoS (kubernetes/klog#184)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is klog.Warning, but I'd like to update it as klog.Warningf("skipServices %v will be ignored due to AntreaProxy is disabled",o.config.AntreaProxy.SkipServices)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luolanzone my point was that we are transitioning to structured logging everywhere, so we should not use klog.Warning / klog.Warningf at all. We should use klog.InfoS instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, Ok, I am not aware of this kind of transition, I have changed it to use klog.InfoS now.

}),
)

fullServices := append(extraSvcs, makeTestService(svcPortName.Namespace, svcPortName.Name, func(svc *corev1.Service) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/fullServices/allServices

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pkg/agent/proxy/proxier_test.go Show resolved Hide resolved
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except for the use of klog.Warning and one comment improvement


recorder record.EventRecorder
recorder record.EventRecorder
// skipServices indicates the service list which should skip proxying
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// skipServices indicates the service list which should skip proxying
// skipServices indicates the service list for which we should skip proxying

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this title doesn't reflect what the patch does properly, which will confuse code reader. Instead of supporting NodeLocal DNSCache with AntreaProxy, it just provides a way that can support NodeLocal DNSCache and user still needs to configure Antrea to achieve it.

@@ -232,6 +232,10 @@ func (o *Options) setDefaults() {
}

func (o *Options) validateAntreaProxyConfig() error {
if !features.DefaultFeatureGate.Enabled(features.AntreaProxy) && len(o.config.AntreaProxy.SkipServices) > 0 {
klog.InfoS("skipServices will be ignored due to AntreaProxy is disabled", "skipServices", o.config.AntreaProxy.SkipServices)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "due to AntreaProxy is disabled" correct grammatically? I think it should be "due to AntreaProxy being disabled" or "when AntreaProxy is disabled" @antoninbas

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes you're correct. Either solution works, or even "skipServices will be ignored because AntreaProxy is disabled"

Copy link
Contributor Author

@luolanzone luolanzone Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, change it to skipServices will be ignored because AntreaProxy is disabled and updated PR's summary and description

return false
}
if matchSkipServices(service.Namespace+"/"+service.Name, skipServices) || matchSkipServices(service.Spec.ClusterIP, skipServices) {
klog.InfoS("Skipping service because it matches skipServices list ", "service", klog.KObj(service))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
klog.InfoS("Skipping service because it matches skipServices list ", "service", klog.KObj(service))
klog.InfoS("Skipping service because it matches skipServices list", "service", klog.KObj(service))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

recorder record.EventRecorder
// skipServices indicates the service list for which we should skip proxying
// it will be initialized from antrea-agent.conf
skipServices []string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be worthy to use a set to keep the services which provides O(1) time for query, otherwise having N skipServices means the code needs to compare 2*N times for each service.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't point this out, because my assumption was that the list of services (provided in the antrea-agent configuration) would always be small (< 10 or so), in which case a slice works fine and may even be faster than a map / set. However, if we think someone may want to use this with more services, then we should definitely use a set. That's the least risky option since we don't have too much control over how people may use this option :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@antoninbas antoninbas added this to the Antrea v1.4 release milestone Oct 18, 2021
@luolanzone luolanzone changed the title Support NodeLocal DNSCache with AntreaProxy Add skipServices to support NodeLocal DNSCache with AntreaProxy Oct 19, 2021
@luolanzone luolanzone force-pushed the local-dns-cache branch 2 times, most recently from 4cae243 to ff855d3 Compare October 19, 2021 02:44
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit-picking comment. And you should update the commit message as well to reflect the real change

return &ServiceChangeTracker{
items: make(map[types.NamespacedName]*serviceChange),
makeServiceInfo: makeServiceInfo,
recorder: recorder,
ipFamily: ipFamily,
processServiceMapChange: processServiceMapChange,
skipServices: skipSvcSet,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
skipServices: skipSvcSet,
skipServices: sets.NewString(skipServices...),

Copy link
Contributor Author

@luolanzone luolanzone Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done and updated the commit msg.

Add a skipServices in antrea-agent.conf so AntreaProxy can be configured
to skip proxying kube-dns service which allow user to use NodeLocal DNSCache

Resolves antrea-io#2137

Signed-off-by: Lan Luo <[email protected]>
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Oct 19, 2021

/test-all

@tnqn tnqn merged commit 9e7f20b into antrea-io:main Oct 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support NodeLocal DNSCache with AntreaProxy
4 participants