-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graduate NodePortLocal feature from Alpha to Beta #2924
Graduate NodePortLocal feature from Alpha to Beta #2924
Conversation
@@ -3850,7 +3850,7 @@ data: | |||
# Traceflow: true | |||
|
|||
# Enable NodePortLocal feature to make the pods reachable externally through NodePort | |||
# NodePortLocal: false | |||
# NodePortLocal: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is a problem NPL controllers, etc. will always be started even no NPL configuration at all in most deployments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NPL controller watches for Services and local Pods. Watching for Services shouldn't have an impact since we already watch them for kube-proxy and the informer is shared. For Pods, that do mean we create an extra watch for each Agent, since we do not otherwise watch Pods. However, given that we filter based on the Node, and the filtering is done server-side, I think it's acceptable.
I imagine that as we add watches, it can increase memory usage of the K8s apiserver. I hope it's fairly low impact, given the number of watches we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm.. Still I do like to add unnecessary cost. But do we have other features (except secondary network support) that will watch Pods as well? @tnqn : comments on the Pod watch overhead?
What you think adding a config parameter to enable this feature (but integrated distros can enable it by default if they need NPL)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a cost for every watch that we start (e.g. the watches for Egress resources). Is this something we are going to do for every feature as we graduate them to Beta?
Will wait for @tnqn to comment, but after that I can add a config parameter to enable / disable NPL, like we did for IPAM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it would be better to have a parameter to enable NPL as it's not used for most clusters (is it true? or it can be useful even without specific LB?), and there will be some overhead in cpu (serialization and deserialization), memory (Pod will be cached in agents) and network on both kube-apiserver and antrea-agent side even user is not using the feature. For features like Egress, the overhead is much less (just dummy watchers with no objects transmitted) if users is not using it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback guys, I'll update the PR accordingly
Codecov Report
@@ Coverage Diff @@
## main #2924 +/- ##
==========================================
- Coverage 40.59% 40.57% -0.02%
==========================================
Files 158 158
Lines 19952 19950 -2
==========================================
- Hits 8099 8095 -4
Misses 11082 11082
- Partials 771 773 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
|
211de6f
to
377e471
Compare
@@ -164,6 +166,14 @@ func (o *Options) validate(args []string) error { | |||
} | |||
} | |||
} | |||
if features.DefaultFeatureGate.Enabled(features.NodePortLocal) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you validate feature gate must be enabled if NodePortLocalConfig.Enable is true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -173,6 +173,8 @@ func (o *Options) validate(args []string) error { | |||
} | |||
o.nplStartPort = startPort | |||
o.nplEndPort = endPort | |||
} else if o.config.NodePortLocal.Enable { | |||
klog.InfoS("The nodePortLocal.enable config option is set to true, but it will be ignored because the NodePortLocal feature gate is disabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We din't use warnings with structured logging any more. The recommendation is to use InfoS
with the default verbosity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
type configChange interface { | ||
ApplyChange(content string) string | ||
} | ||
|
||
type configChangeParam struct { | ||
field string | ||
value string | ||
} | ||
|
||
func (cg *configChangeParam) ApplyChange(content string) string { | ||
r := regexp.MustCompile(fmt.Sprintf(`(?m)#?.*%s:.*$`, cg.field)) | ||
return r.ReplaceAllString(content, fmt.Sprintf("%s: %s", cg.field, cg.value)) | ||
} | ||
|
||
type configChangeFeatureGate struct { | ||
name string | ||
enabled bool | ||
} | ||
|
||
func (cg *configChangeFeatureGate) ApplyChange(content string) string { | ||
r := regexp.MustCompile(fmt.Sprintf(`(?m)#? %s:.*$`, cg.name)) | ||
value := "false" | ||
if cg.enabled { | ||
value = "true" | ||
} | ||
return r.ReplaceAllString(content, fmt.Sprintf(" %s: %s", cg.name, value)) | ||
} | ||
|
||
type configChangeRaw struct { | ||
fn func(content string) string | ||
} | ||
|
||
func (cg *configChangeRaw) ApplyChange(content string) string { | ||
return cg.fn(content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is getting less than ideal. We could revisit the idea of moving the definition of the AntreaConfig
struct to a package that can be easily imported. @tnqn what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. This was planned several times when we dealt with similar problems before.
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, a typo
docs/node-port-local.md
Outdated
in `nplPortRange`. If the value of `nplPortRange` is not specified, the range | ||
`61000-62000` will be used by default. | ||
In addition to enabling the NodePortLocal feature gate (if needed), you need to | ||
ensure that the `nodePortLocal.enable` flag is set so true in the Antrea Agent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ensure that the `nodePortLocal.enable` flag is set so true in the Antrea Agent | |
ensure that the `nodePortLocal.enable` flag is set to true in the Antrea Agent |
type configChange interface { | ||
ApplyChange(content string) string | ||
} | ||
|
||
type configChangeParam struct { | ||
field string | ||
value string | ||
} | ||
|
||
func (cg *configChangeParam) ApplyChange(content string) string { | ||
r := regexp.MustCompile(fmt.Sprintf(`(?m)#?.*%s:.*$`, cg.field)) | ||
return r.ReplaceAllString(content, fmt.Sprintf("%s: %s", cg.field, cg.value)) | ||
} | ||
|
||
type configChangeFeatureGate struct { | ||
name string | ||
enabled bool | ||
} | ||
|
||
func (cg *configChangeFeatureGate) ApplyChange(content string) string { | ||
r := regexp.MustCompile(fmt.Sprintf(`(?m)#? %s:.*$`, cg.name)) | ||
value := "false" | ||
if cg.enabled { | ||
value = "true" | ||
} | ||
return r.ReplaceAllString(content, fmt.Sprintf(" %s: %s", cg.name, value)) | ||
} | ||
|
||
type configChangeRaw struct { | ||
fn func(content string) string | ||
} | ||
|
||
func (cg *configChangeRaw) ApplyChange(content string) string { | ||
return cg.fn(content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. This was planned several times when we dealt with similar problems before.
All the tests have passed. I will fix the typo and merge. |
NodePortLocal will be graduated to Beta and enabled by default starting with v1.4. On the e2e test side, we follow the same logic as for other Beta features: if the feature has been disabled, we skip all the relevant tests that depend on that feature instead of updating the ConfigMap as part of the test (which is the approach we take for Alpha features). The NPL e2e tests can still update the ConfigMap if the nplPortRange needs to be updated. This commit changes the mutateAntreaConfigMap so that the function is a no-op in case there is no actual change. Fixes antrea-io#2923 Signed-off-by: Antonin Bas <[email protected]>
In addition to the feature gate. NPL options are grouped under the `nodePortLocal` configuration option. Signed-off-by: Antonin Bas <[email protected]>
Signed-off-by: Antonin Bas <[email protected]>
Signed-off-by: Antonin Bas <[email protected]>
Signed-off-by: Antonin Bas <[email protected]>
cb00174
to
94e19c6
Compare
NodePortLocal will be graduated to Beta and enabled by default starting with v1.4. In order to avoid CPU & memory overhead in the K8s control plane and in the Antrea Agent, a configuration parameter is provided (in addition to the feature gate) to control enablement of the feature. All NPL options (enable and portRange) are grouped under one top-level config option (nodePortLocal) like we have done for other features. By default, the feature is disabled and nodePortLocal.enable needs to be set to true. Note that eventually the feature gate will go GA and the config option will be the only way to enable / disable NPL. On the e2e test side, we follow the same logic as for other Beta features: if the feature has been disabled, we skip all the relevant tests that depend on that feature instead of updating the ConfigMap as part of the test (which is the approach we take for Alpha features). The NPL e2e tests still need to update the ConfigMap to enable the feature and change the portRange. This commit changes the mutateAntreaConfigMap so that the function is a no-op in case there is no actual change. Fixes antrea-io#2923 Signed-off-by: Antonin Bas <[email protected]>
NodePortLocal will be graduated to Beta and enabled by default starting
with v1.4.
In order to avoid CPU & memory overhead in the K8s control plane and
in the Antrea Agent, a configuration parameter is provided (in addition
to the feature gate) to control enablement of the feature. All NPL
options (enable and portRange) are grouped under one top-level config
option (nodePortLocal) like we have done for other features. By default,
the feature is disabled and nodePortLocal.enable needs to be set to
true.
Note that eventually the feature gate will go GA and the config option
will be the only way to enable / disable NPL.
On the e2e test side, we follow the same logic as for other Beta
features: if the feature has been disabled, we skip all the relevant
tests that depend on that feature instead of updating the ConfigMap as
part of the test (which is the approach we take for Alpha
features). The NPL e2e tests still need to update the ConfigMap to
enable the feature and change the portRange. This commit changes the
mutateAntreaConfigMap so that the function is a no-op in case there is
no actual change.
Fixes #2923
Signed-off-by: Antonin Bas [email protected]