Support EndpointSlice in Antrea proxy #1703
Codecov Report
@@ Coverage Diff @@
## main #1703 +/- ##
=======================================
Coverage ? 52.80%
=======================================
Files ? 188
Lines ? 16347
Branches ? 0
=======================================
Hits ? 8632
Misses ? 6856
Partials ? 859
Overall LGTM. What is the behavior if someone enables the Antrea EndpointSlice
feature gate in a K8s cluster which does not support the EndpointSlice API?
pkg/agent/proxy/proxier.go
Outdated
@@ -105,6 +109,7 @@ func (p *proxier) removeStaleServices() {
}
}
delete(p.serviceInstalledMap, svcPortName)
delete(p.endpointInstalledMap, svcPortName)
Is this a bug fix for a memory leak?
@tnqn Yes.
@weiqiangt is it still supposed to remove the endpointInstalledMap here given that you also do it in removeStaleEndpoints?
@hongliangl please reply here so we have the context to discuss.
Moving your reply here:
is it still supposed to remove the endpointInstalledMap here given that you also do it in removeStaleEndpoints?
This removes all endpoints associated with a stale Service. The function removeStaleEndpoints removes stale endpoints of a normal Service.
@weiqiangt From my understanding of #1815, you need the endpointInstalledMap to be there so that you can identify stale endpoints to delete, right? I think deleting it here would cause an OpenFlow leak?
OK, I see that in this version, endpoints are still removed in removeStaleServices, so it makes sense to delete p.endpointInstalledMap.
There will be a conflict between this PR and #1815. @weiqiangt could you cooperate with @hongliangl to see how to merge these PRs?
is it still supposed to remove the endpointInstalledMap here given that you also do it in removeStaleEndpoints?
This removes all endpoints associated with a stale Service. The function removeStaleEndpoints removes stale endpoints of a normal Service.
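To make the distinction in this thread concrete, here is a minimal Go sketch of the two cleanup paths. It is not the code from pkg/agent/proxy/proxier.go: the servicePortName type, the map value types, the function parameters, and the newDemoProxier helper are all simplified assumptions made for illustration. It only shows why dropping the endpointInstalledMap entry together with the serviceInstalledMap entry keeps the bookkeeping from growing when a Service disappears, while removeStaleEndpoints prunes individual endpoints of a Service that still exists.

```go
package main

import "fmt"

// servicePortName is a hypothetical stand-in for the key type of the proxier maps.
type servicePortName string

// proxier keeps only the two bookkeeping maps discussed in this thread.
// The value types are simplified assumptions, not Antrea's real types.
type proxier struct {
	serviceInstalledMap  map[servicePortName]string
	endpointInstalledMap map[servicePortName]map[string]struct{}
}

// removeStaleServices drops every installed Service that is no longer desired,
// together with all of its endpoint bookkeeping, so neither map leaks entries.
func (p *proxier) removeStaleServices(desired map[servicePortName]struct{}) {
	for name := range p.serviceInstalledMap {
		if _, ok := desired[name]; ok {
			continue
		}
		delete(p.serviceInstalledMap, name)
		delete(p.endpointInstalledMap, name)
	}
}

// removeStaleEndpoints prunes individual endpoints of a Service that still exists.
func (p *proxier) removeStaleEndpoints(name servicePortName, current map[string]struct{}) {
	for ep := range p.endpointInstalledMap[name] {
		if _, ok := current[ep]; !ok {
			delete(p.endpointInstalledMap[name], ep)
		}
	}
}

// newDemoProxier builds a small, made-up state with two Services.
func newDemoProxier() *proxier {
	return &proxier{
		serviceInstalledMap: map[servicePortName]string{
			"web": "10.96.0.10:80",
			"db":  "10.96.0.20:5432",
		},
		endpointInstalledMap: map[servicePortName]map[string]struct{}{
			"web": {"10.0.0.1:8080": {}, "10.0.0.2:8080": {}},
			"db":  {"10.0.0.3:5432": {}},
		},
	}
}

func main() {
	p := newDemoProxier()
	// "db" no longer exists; "web" still exists but lost one backend.
	p.removeStaleServices(map[servicePortName]struct{}{"web": {}})
	p.removeStaleEndpoints("web", map[string]struct{}{"10.0.0.1:8080": {}})
	fmt.Println(len(p.serviceInstalledMap), len(p.endpointInstalledMap), len(p.endpointInstalledMap["web"]))
}
```

In this sketch, skipping the second delete in removeStaleServices would leave the "db" entry in endpointInstalledMap forever, which is the kind of leak asked about above.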
Code LGTM. I will defer to @tnqn for final approval.
One of my comments has not been addressed: I do not see any mention anywhere of the behavior if 1) the AntreaProxy and EndpointSlice feature gates are enabled; 2) the EndpointSlice API is disabled in K8s. IMO, the behavior should at least be mentioned in docs/feature-gates.md and should probably also be included in the commit message.
Overall LGTM
@@ -106,6 +108,11 @@ case $key in
PROXY=false
shift
;;
--endpointslice)
PROXY=true
@antoninbas AntreaProxy and EndpointSlice can be enabled at the same time. However, it is unnecessary to pass both arguments: as we can see here, if EndpointSlice is enabled, AntreaProxy will also be set to enabled.
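The dependency described in this comment can be sketched in Go as follows. This is only an illustration of the option semantics, not a port of the script: the options struct, the applyArgs function, and the endpointSlice field are hypothetical names, and only the `--endpointslice` argument and the fact that it sets the proxy option to true come from the hunk above.

```go
package main

import "fmt"

// options is a hypothetical container for the two settings discussed here.
type options struct {
	proxy         bool // corresponds to PROXY in the script
	endpointSlice bool // assumed counterpart for the EndpointSlice setting
}

// applyArgs applies command-line arguments on top of the given starting options.
// Passing --endpointslice implies the proxy option, so callers do not need to
// request both explicitly.
func applyArgs(opts options, args []string) options {
	for _, arg := range args {
		switch arg {
		case "--endpointslice":
			opts.proxy = true
			opts.endpointSlice = true
		}
	}
	return opts
}

func main() {
	opts := applyArgs(options{}, []string{"--endpointslice"})
	fmt.Printf("proxy=%t endpointSlice=%t\n", opts.proxy, opts.endpointSlice)
}
```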
Sorry, is this the answer to one of my questions? I'm confused
One of my comments has not been addressed: I do not see any mention anywhere of the behavior if 1) AntreaProxy and EndpointSlice feature gates are enabled; 2) the EndpointSlice API is disabled in K8s. IMO, the behavior should at least be mentioned in docs/feature-gates.md and should probably also be included in the commit message.
@antoninbas This is the answer to question 1).
I am asking about the behavior when EndpointSlice is enabled in Antrea but the EndpointSlice API is disabled / not supported in Kubernetes. Is there a crash? Is there an error log message? Is it a no-op?
@antoninbas The restriction on using EndpointSlice in Antrea should be explained in the docs. The explanation is that this feature can only be enabled on Kubernetes 1.16 (alpha) or later with EndpointSlice enabled. Antrea users should avoid the situation where EndpointSlice is enabled in Antrea but the EndpointSlice API is disabled / not supported in Kubernetes. So far, there has been no exact evaluation of the unwanted behavior.
So far, there is not an exact evaluation about the unwanted behavior.
Could we run a quick test to get a clear idea of that?
I've tested this scenario on a single-node minikube cluster with Kubernetes v1.16.1.
The agent did not crash, but no Services/Endpoints could be watched, and thus the agent could not connect to the controller.
Logs in the agent indicate that the EndpointSlice resource is unavailable:
E0202 04:53:35.852020 1 reflector.go:178] pkg/mod/github.com/tnqn/[email protected]/tools/cache/reflector.go:125: Failed to list *v1beta1.EndpointSlice: the server could not find the requested resource (get endpointslices.discovery.k8s.io)
I think we can first try to check if the EndpointSlice resource exists before really starting Antrea Proxy with EndpointSlice enabled.
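A minimal Go sketch of such a pre-check is shown below. To stay self-contained it does not use client-go: the resourceLister interface and the fakeLister type are hypothetical stand-ins for a Kubernetes discovery client, and the function names are made up. Only the group/version discovery.k8s.io/v1beta1 and the kind EndpointSlice come from the log line above. What the agent should do when the check fails is left to the caller and is not decided here.

```go
package main

import (
	"errors"
	"fmt"
)

// resourceLister is a hypothetical, simplified stand-in for a Kubernetes
// discovery client: it reports the kinds served under one group/version.
type resourceLister interface {
	ResourceKinds(groupVersion string) ([]string, error)
}

// endpointSliceGroupVersion is the API group/version seen in the agent log.
const endpointSliceGroupVersion = "discovery.k8s.io/v1beta1"

// endpointSliceAvailable reports whether the API server serves the
// EndpointSlice kind under discovery.k8s.io/v1beta1.
func endpointSliceAvailable(l resourceLister) bool {
	kinds, err := l.ResourceKinds(endpointSliceGroupVersion)
	if err != nil {
		// The group/version is not served, or discovery failed.
		return false
	}
	for _, kind := range kinds {
		if kind == "EndpointSlice" {
			return true
		}
	}
	return false
}

// fakeLister is an in-memory resourceLister used only for this demonstration.
type fakeLister map[string][]string

func (f fakeLister) ResourceKinds(groupVersion string) ([]string, error) {
	kinds, ok := f[groupVersion]
	if !ok {
		return nil, errors.New("the server could not find the requested resource")
	}
	return kinds, nil
}

func main() {
	withAPI := fakeLister{endpointSliceGroupVersion: {"EndpointSlice"}}
	withoutAPI := fakeLister{}
	fmt.Println("cluster with the API:", endpointSliceAvailable(withAPI))
	fmt.Println("cluster without the API:", endpointSliceAvailable(withoutAPI))
}
```

Running the check once at startup, before any EndpointSlice watcher is created, would let the agent report a single clear error instead of repeated reflector failures.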
@@ -0,0 +1,364 @@
/*
I would agree!
I am also wondering whether we should add comments for the changes we made to the original copied files, e.g. at the function level? Or would there be too many changes to track this way?
./skip-all
pkg/agent/proxy/proxier.go
Outdated
@@ -276,6 +281,9 @@ func (p *proxier) installServices() {
}

p.serviceInstalledMap[svcPortName] = svcPort
for _, endpoint := range endpointUpdateList {
Why is this needed? Haven't L210-L213 done it?
Why is this needed? Haven't L210-L213 done it?
Agreed, it is not needed since endpointUpdateList is not updated after L210-L213.
The EndpointSlice API version that AntreaProxy supports is v1beta1 for now, and other EndpointSlice API versions are not supported. The endpoint conditions Serving and Terminating, as well as ServiceTopology, are not supported in this commit.
Thanks, LGTM
I just have some nits on comments, logs, and documentation. I am fine to merge the PR first, given Hongliang should be taking holiday leave already.
docs/feature-gates.md
Outdated

`EndpointSlice` enables Service EndpointSlice support in AntreaProxy. The
EndpointSlice API was introduced in Kubernetes 1.16 (alpha) and it is enabled
by default in Kubernetes 1.17 (beta). This flag will take no effect if AntreaProxy
Probably clearer to replace "This flag" with "The EndpointSlice feature gate".
Thanks for the advice, I will handle it today.
docs/feature-gates.md
Outdated
by default in Kubernetes 1.17 (beta). This flag will take no effect if AntreaProxy
is not enabled. The endpoint conditions of `Serving` and `Terminating` are not
supported currently. ServiceTopology is not supported either. Refer to this [link](https://kubernetes.io/docs/tasks/administer-cluster/enabling-endpointslices/)
for more information. EndpointSlice API version that AntreaProxy supports is v1beta1
The EndpointSlice API version
docs/feature-gates.md
Outdated
for more information. EndpointSlice API version that AntreaProxy supports is v1beta1
currently, and other EndpointSlice API versions are not supported. If EndpointSlice is
enabled in AntreaProxy, but EndpointSlice API is disabled in Kubernetes or EndpointSlice
API version v1beta1 is not supported in Kubernetes, error messages will be logged by
Probably "Antrea Agent will log an error message and ..."
pkg/agent/proxy/endpoints.go
Outdated
}

if _, _, err := endpointSliceCacheKeys(endpointSlice); err != nil {
klog.Warningf("Error getting endpoint slice cache keys: %v", err)
Better to use the same format for EndpointSlice everywhere (rather than mixing endpointSlice, EndpointSlice, endpoint slice...)
// limitations under the License.
//
// Original file https://raw.githubusercontent.com/kubernetes/kubernetes/0c0d4fea8dd6bdcd16b9e1d35da3f7d209341a6f/pkg/proxy/endpointslicecache.go
// If this file is located in third_party, there will be an import cycle issue when build Antrea as this file import
when build -> when building
import -> imports
/test-all
All the tests have passed. @hongliangl let me know if you only make documentation changes and log message changes (please consider pushing a separate commit instead of amending the existing one) and I will not need to run the IPv6 and dual-stack tests again.
Including files: - docs/feature-gates.md - pkg/agent/proxy/endpoints.go - pkg/agent/proxy/endpointslicecache.go
I have pushed a separate commit that only includes documentation and log message changes. @antoninbas
Thanks for addressing the comments!
/skip-all
Thanks @hongliangl I will merge this - enjoy your holiday!
Thanks for everyone's help and advice, I've learned a lot.
Support EndpointSlice in AntreaProxy, ServiceTopology is not supported in this commit