Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graduate NodePortLocal feature from Alpha to Beta #2924

Merged
merged 5 commits into from
Oct 27, 2021

Conversation

antoninbas
Copy link
Contributor

@antoninbas antoninbas commented Oct 21, 2021

NodePortLocal will be graduated to Beta and enabled by default starting
with v1.4.

In order to avoid CPU & memory overhead in the K8s control plane and
in the Antrea Agent, a configuration parameter is provided (in addition
to the feature gate) to control enablement of the feature. All NPL
options (enable and portRange) are grouped under one top-level config
option (nodePortLocal) like we have done for other features. By default,
the feature is disabled and nodePortLocal.enable needs to be set to
true.
Note that eventually the feature gate will go GA and the config option
will be the only way to enable / disable NPL.

On the e2e test side, we follow the same logic as for other Beta
features: if the feature has been disabled, we skip all the relevant
tests that depend on that feature instead of updating the ConfigMap as
part of the test (which is the approach we take for Alpha
features). The NPL e2e tests still need to update the ConfigMap to
enable the feature and change the portRange. This commit changes the
mutateAntreaConfigMap so that the function is a no-op in case there is
no actual change.

Fixes #2923

Signed-off-by: Antonin Bas [email protected]

@antoninbas antoninbas requested review from jianjuns and tnqn October 21, 2021 20:10
@antoninbas antoninbas added the area/proxy/nodeportlocal Issues or PRs related to the NodePortLocal feature label Oct 21, 2021
@antoninbas antoninbas added this to the Antrea v1.4 release milestone Oct 21, 2021
@@ -3850,7 +3850,7 @@ data:
# Traceflow: true

# Enable NodePortLocal feature to make the pods reachable externally through NodePort
# NodePortLocal: false
# NodePortLocal: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is a problem NPL controllers, etc. will always be started even no NPL configuration at all in most deployments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NPL controller watches for Services and local Pods. Watching for Services shouldn't have an impact since we already watch them for kube-proxy and the informer is shared. For Pods, that do mean we create an extra watch for each Agent, since we do not otherwise watch Pods. However, given that we filter based on the Node, and the filtering is done server-side, I think it's acceptable.

I imagine that as we add watches, it can increase memory usage of the K8s apiserver. I hope it's fairly low impact, given the number of watches we have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm.. Still I do like to add unnecessary cost. But do we have other features (except secondary network support) that will watch Pods as well? @tnqn : comments on the Pod watch overhead?

What you think adding a config parameter to enable this feature (but integrated distros can enable it by default if they need NPL)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a cost for every watch that we start (e.g. the watches for Egress resources). Is this something we are going to do for every feature as we graduate them to Beta?

Will wait for @tnqn to comment, but after that I can add a config parameter to enable / disable NPL, like we did for IPAM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it would be better to have a parameter to enable NPL as it's not used for most clusters (is it true? or it can be useful even without specific LB?), and there will be some overhead in cpu (serialization and deserialization), memory (Pod will be cached in agents) and network on both kube-apiserver and antrea-agent side even user is not using the feature. For features like Egress, the overhead is much less (just dummy watchers with no objects transmitted) if users is not using it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback guys, I'll update the PR accordingly

@codecov-commenter
Copy link

codecov-commenter commented Oct 21, 2021

Codecov Report

Merging #2924 (94e19c6) into main (5dea7b5) will decrease coverage by 0.01%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2924      +/-   ##
==========================================
- Coverage   40.59%   40.57%   -0.02%     
==========================================
  Files         158      158              
  Lines       19952    19950       -2     
==========================================
- Hits         8099     8095       -4     
  Misses      11082    11082              
- Partials      771      773       +2     
Flag Coverage Δ
unit-tests 40.57% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/nodeportlocal/npl_agent_init.go 77.27% <0.00%> (+6.43%) ⬆️
pkg/apiserver/storage/ram/store.go 76.86% <0.00%> (-1.50%) ⬇️
pkg/controller/traceflow/controller.go 66.29% <0.00%> (-1.13%) ⬇️

@antoninbas antoninbas force-pushed the graduate-NodePortLocal-to-Beta branch from 211de6f to 377e471 Compare October 25, 2021 21:26
@antoninbas
Copy link
Contributor Author

@jianjuns @tnqn I have added the config option. Changes are in a separate commit for ease of review. I have also updated the PR description.

cmd/antrea-agent/util.go Outdated Show resolved Hide resolved
cmd/antrea-agent/util.go Outdated Show resolved Hide resolved
@@ -164,6 +166,14 @@ func (o *Options) validate(args []string) error {
}
}
}
if features.DefaultFeatureGate.Enabled(features.NodePortLocal) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you validate feature gate must be enabled if NodePortLocalConfig.Enable is true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

jianjuns
jianjuns previously approved these changes Oct 26, 2021
@@ -173,6 +173,8 @@ func (o *Options) validate(args []string) error {
}
o.nplStartPort = startPort
o.nplEndPort = endPort
} else if o.config.NodePortLocal.Enable {
klog.InfoS("The nodePortLocal.enable config option is set to true, but it will be ignored because the NodePortLocal feature gate is disabled")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be a warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We din't use warnings with structured logging any more. The recommendation is to use InfoS with the default verbosity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

Comment on lines +191 to +224
type configChange interface {
ApplyChange(content string) string
}

type configChangeParam struct {
field string
value string
}

func (cg *configChangeParam) ApplyChange(content string) string {
r := regexp.MustCompile(fmt.Sprintf(`(?m)#?.*%s:.*$`, cg.field))
return r.ReplaceAllString(content, fmt.Sprintf("%s: %s", cg.field, cg.value))
}

type configChangeFeatureGate struct {
name string
enabled bool
}

func (cg *configChangeFeatureGate) ApplyChange(content string) string {
r := regexp.MustCompile(fmt.Sprintf(`(?m)#? %s:.*$`, cg.name))
value := "false"
if cg.enabled {
value = "true"
}
return r.ReplaceAllString(content, fmt.Sprintf(" %s: %s", cg.name, value))
}

type configChangeRaw struct {
fn func(content string) string
}

func (cg *configChangeRaw) ApplyChange(content string) string {
return cg.fn(content)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is getting less than ideal. We could revisit the idea of moving the definition of the AntreaConfig struct to a package that can be easily imported. @tnqn what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. This was planned several times when we dealt with similar problems before.

@antoninbas
Copy link
Contributor Author

/test-all

tnqn
tnqn previously approved these changes Oct 27, 2021
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a typo

in `nplPortRange`. If the value of `nplPortRange` is not specified, the range
`61000-62000` will be used by default.
In addition to enabling the NodePortLocal feature gate (if needed), you need to
ensure that the `nodePortLocal.enable` flag is set so true in the Antrea Agent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ensure that the `nodePortLocal.enable` flag is set so true in the Antrea Agent
ensure that the `nodePortLocal.enable` flag is set to true in the Antrea Agent

Comment on lines +191 to +224
type configChange interface {
ApplyChange(content string) string
}

type configChangeParam struct {
field string
value string
}

func (cg *configChangeParam) ApplyChange(content string) string {
r := regexp.MustCompile(fmt.Sprintf(`(?m)#?.*%s:.*$`, cg.field))
return r.ReplaceAllString(content, fmt.Sprintf("%s: %s", cg.field, cg.value))
}

type configChangeFeatureGate struct {
name string
enabled bool
}

func (cg *configChangeFeatureGate) ApplyChange(content string) string {
r := regexp.MustCompile(fmt.Sprintf(`(?m)#? %s:.*$`, cg.name))
value := "false"
if cg.enabled {
value = "true"
}
return r.ReplaceAllString(content, fmt.Sprintf(" %s: %s", cg.name, value))
}

type configChangeRaw struct {
fn func(content string) string
}

func (cg *configChangeRaw) ApplyChange(content string) string {
return cg.fn(content)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. This was planned several times when we dealt with similar problems before.

@antoninbas
Copy link
Contributor Author

All the tests have passed. I will fix the typo and merge.

NodePortLocal will be graduated to Beta and enabled by default starting
with v1.4.

On the e2e test side, we follow the same logic as for other Beta
features: if the feature has been disabled, we skip all the relevant
tests that depend on that feature instead of updating the ConfigMap as
part of the test (which is the approach we take for Alpha
features). The NPL e2e tests can still update the ConfigMap if the
nplPortRange needs to be updated. This commit changes the
mutateAntreaConfigMap so that the function is a no-op in case there is
no actual change.

Fixes antrea-io#2923

Signed-off-by: Antonin Bas <[email protected]>
In addition to the feature gate.
NPL options are grouped under the `nodePortLocal` configuration option.

Signed-off-by: Antonin Bas <[email protected]>
Signed-off-by: Antonin Bas <[email protected]>
Signed-off-by: Antonin Bas <[email protected]>
@antoninbas antoninbas force-pushed the graduate-NodePortLocal-to-Beta branch from cb00174 to 94e19c6 Compare October 27, 2021 16:46
@antoninbas antoninbas merged commit ae2b53d into antrea-io:main Oct 27, 2021
@antoninbas antoninbas deleted the graduate-NodePortLocal-to-Beta branch October 27, 2021 17:23
wenqiq pushed a commit to wenqiq/antrea that referenced this pull request Oct 29, 2021
NodePortLocal will be graduated to Beta and enabled by default starting
with v1.4.

In order to avoid CPU & memory overhead in the K8s control plane and
in the Antrea Agent, a configuration parameter is provided (in addition
to the feature gate) to control enablement of the feature. All NPL
options (enable and portRange) are grouped under one top-level config
option (nodePortLocal) like we have done for other features. By default,
the feature is disabled and nodePortLocal.enable needs to be set to
true.
Note that eventually the feature gate will go GA and the config option
will be the only way to enable / disable NPL.

On the e2e test side, we follow the same logic as for other Beta
features: if the feature has been disabled, we skip all the relevant
tests that depend on that feature instead of updating the ConfigMap as
part of the test (which is the approach we take for Alpha
features). The NPL e2e tests still need to update the ConfigMap to
enable the feature and change the portRange. This commit changes the
mutateAntreaConfigMap so that the function is a no-op in case there is
no actual change.

Fixes antrea-io#2923

Signed-off-by: Antonin Bas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/proxy/nodeportlocal Issues or PRs related to the NodePortLocal feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Graduate the NodePortLocal feature to Beta
4 participants