
Introducing NodePortLocal in Antrea Agent #1459

Merged: 16 commits merged into antrea-io:master on Jan 12, 2021

Conversation

@monotosh-avi (Contributor) commented Oct 29, 2020

This commit introduces NodePortLocal (NPL) in the Antrea agent.
With NPL, a Pod port can be directly reached from the external network through a port of the Node on which the Pod is running.
NPL programs iptables rules to send incoming traffic on a specific port of a Node to a Pod port.
This information is exposed through a Pod annotation for consumption by other agents. An example of the Pod annotation is given below.
metadata:
  annotations:
    npl.k8s.io/endpoints: '[{"podport":"8080","nodeip":"10.102.47.229","nodeport":"40002"}]'

To use this feature, the following changes are required in the Antrea configuration (a combined sketch is shown below):

  • Set NodePortLocal: true under featureGates in antrea-agent.conf in the Antrea ConfigMap.
  • In the Antrea Agent DaemonSet, configure the port range for NPL (NPL_PORT_RANGE) in the env, e.g.:
    - name: NPL_PORT_RANGE
      value: 40000-46000
  • Add the capability for Pod update in the antrea-agent ClusterRole.
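
Putting these steps together, a minimal sketch of the relevant manifest fragments might look like this (the surrounding layout is abbreviated; keys and values follow the steps above):

    # antrea-agent.conf (in the Antrea ConfigMap)
    featureGates:
      NodePortLocal: true

    # antrea-agent container in the Antrea DaemonSet (excerpt)
    env:
      - name: NPL_PORT_RANGE
        value: "40000-46000"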

Note: the NPL agent is currently not supported on Windows.

Upcoming changes in future commits:

  • Label-based Service filtering to select backend Pods for NPL
  • Option to select the port range for NPL through the ConfigMap
  • Bootup sync: compare programmed iptables rules with Pod annotations during bootup and add/delete rules only if required.
    • Currently we delete all the rules and add new rules for all the Pods while booting up the Antrea agent.

Signed-off-by: Hemant Shaw [email protected]
Signed-off-by: Manu Dilip Shah [email protected]
Signed-off-by: Monotosh Das [email protected]
Signed-off-by: Shubham Chauhan [email protected]
Signed-off-by: Sudipta Biswas [email protected]

@antrea-bot (Collaborator):

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-whole-conformance: to trigger all conformance tests on linux.
  • /skip-whole-conformance: to skip all conformance tests on linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-windows-networkpolicy: to trigger windows networkpolicy tests.
  • /skip-windows-networkpolicy: to skip windows networkpolicy tests.
  • /test-hw-offload: to trigger ovs hardware offload test.
  • /skip-hw-offload: to skip ovs hardware offload test.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

@codecov-io commented Oct 29, 2020

Codecov Report

Merging #1459 (b7f2021) into master (9d3d10b) will decrease coverage by 1.66%.
The diff coverage is 45.57%.


@@            Coverage Diff             @@
##           master    #1459      +/-   ##
==========================================
- Coverage   63.31%   61.65%   -1.67%     
==========================================
  Files         170      193      +23     
  Lines       14250    16961    +2711     
==========================================
+ Hits         9023    10458    +1435     
- Misses       4292     5445    +1153     
- Partials      935     1058     +123     
Flag             Coverage Δ
e2e-tests        46.21% <38.28%> (?)
kind-e2e-tests   51.54% <37.23%> (-3.85%) ⬇️
unit-tests       40.39% <36.77%> (-0.88%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
cmd/antrea-agent/agent.go 0.00% <0.00%> (ø)
.../agent/apiserver/handlers/networkpolicy/handler.go 58.33% <ø> (ø)
...gent/controller/noderoute/node_route_controller.go 60.98% <0.00%> (+14.51%) ⬆️
pkg/agent/nodeportlocal/k8s/annotations.go 0.00% <0.00%> (ø)
pkg/agent/nodeportlocal/k8s/controller.go 0.00% <0.00%> (ø)
pkg/agent/nodeportlocal/portcache/port_table.go 0.00% <0.00%> (ø)
pkg/agent/nodeportlocal/rules/iptable_rule.go 0.00% <0.00%> (ø)
pkg/agent/nodeportlocal/rules/rules.go 0.00% <0.00%> (ø)
pkg/agent/nodeportlocal/util/parse_port.go 0.00% <0.00%> (ø)
pkg/agent/proxy/proxier.go 77.20% <ø> (+2.39%) ⬆️
... and 159 more

@antoninbas (Contributor) left a comment:

some initial comments

I'm not 100% sure (would like to hear from others) but it seems maybe the entire npl code could go under pkg/agent/controller to match other parts of the agent's implementation.

"testing"
"time"

"github.com/onsi/gomega"
Contributor:

we don't use gomega for unit tests in this repo, we use the Go testing framework directly + github.com/stretchr/testify for test assertions

Contributor Author:

Sure. I'll check it out.

package lib

const (
NPLEPAnnotation = "npl.k8s.io/endpoints"
Contributor:

I am not an expert there, but it seems wrong to use k8s.io for an Antrea-specific feature. The K8s documentation states:

The kubernetes.io/ and k8s.io/ prefixes are reserved for Kubernetes core components.

It seems that it should be npl.antrea.tanzu.vmware.com to be consistent with the Antrea APIs

Contributor:

Agreed. Or for short, we can consider antrea or antrea.io too.


const (
NPLEPAnnotation = "npl.k8s.io/endpoints"
IPTableRule = "IPTABLE"
Contributor:

Based on usage in pkg/agent/nplagent/rules/rules.go, it seems that this should be an enum or you should define a type for this. For example:

type NPLRuleImplementation string
const NPLRuleImplementationIptable NPLRuleImplementation = "Iptable"

Comment on lines 33 to 31
func Initrules(args ...string) PodPortRules {
    var ruletype string
    if len(args) == 0 {
        // By default use iptable
        ruletype = nplutils.IPTableRule
    } else {
        ruletype = args[0]
    }
    switch ruletype {
    case "IPTABLE":
        ipt, err := iptables.New()
        if err != nil {
            return nil
        }
        iptRule := IPTableRule{
            name:  "NPL",
            table: ipt,
        }

        iptRule.DeleteAllRules()
        iptRule.CreateChains()
        return &iptRule
    }
    return nil
}
Contributor:

maybe I'm missing something, but this seems to really defeat the point of having an interface.

there should be a NewIPTableRule constructor and the client should be responsible for calling Init on the returned object, which would in turn take care of creating new chains, etc if necessary.
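
A minimal sketch of that shape (NewIPTableRule and Init are illustrative names, not the final API; iptables.New comes from the go-iptables library already used above):

    // NewIPTableRule returns an iptables-backed implementation of
    // PodPortRules without touching the system yet.
    func NewIPTableRule() (*IPTableRule, error) {
        ipt, err := iptables.New()
        if err != nil {
            return nil, err
        }
        return &IPTableRule{name: "NPL", table: ipt}, nil
    }

    // Init performs the one-time setup, e.g. creating the NPL chain.
    func (ipt *IPTableRule) Init() error {
        return ipt.CreateChains()
    }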

Contributor Author:

Ideally we should be able to figure out the type of implementation for PodPortRules from the underlying system's capabilities or the specified configuration. In this case, as we are only supporting one kind of implementation, we don't need to do that.
Though I get your point about adding an abstraction. I'll update it.

"k8s.io/klog"
)

type IPTableRule struct {
Contributor:

don't know if this is a great name, this doesn't seem to represent a single rule

Comment on lines 29 to 30
klog.Errorf("NodePortLocal is currently not supported in Windows Platform ")
return errors.New("Windows Platform not supported fot NPL")
Contributor:

same comment as in another place: do not log and return the same error, it will end up being logged twice

@@ -0,0 +1,43 @@
// +build !windows
Contributor:

I feel like there is no need for a bootstrap package here, these files can go in `pkg/agent/nplagent/` directly

Contributor Author:

Makes sense. I'll move the file.

// CreateChains : Create the chain NODE-PORT-LOCAL in NAT table
// All DNAT rules for NPL would be added in this chain
func (ipt *IPTableRule) CreateChains() error {
exists, err := ipt.table.Exists("nat", "NODE-PORT-LOCAL")
Contributor:

"NODE-PORT-LOCAL" should maybe be a constant (nodePortLocalChain)

Comment on lines 26 to 30
AddRule(port int, podip string) (bool, error)
DeleteRule(port int, podip string) (bool, error)
SyncState(podPort map[int]string) bool
GetAllRules(podPort map[int]string) (bool, error)
DeleteAllRules() (bool, error)
Contributor:

I don't understand why most of these interface functions return both a boolean (to indicate success) and an error, instead of just an error?

Contributor Author:

Makes sense. I'll just return one value here.
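
For illustration, the interface with error-only returns could look like this (a sketch based on the method names above, with podIP renamed per Go style; not the final code):

    type PodPortRules interface {
        AddRule(port int, podIP string) error
        DeleteRule(port int, podIP string) error
        SyncState(podPort map[int]string) error
        GetAllRules(podPort map[int]string) error
        DeleteAllRules() error
    }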

pt.tableLock.RLock()
defer pt.tableLock.RUnlock()
for i := pt.StartPort; i <= pt.EndPort; i++ {
    if _, ok := pt.Table[i]; !ok && nplutils.IsPortAvailable(i) {
Contributor:

I'm trying to understand the usefulness of the IsPortAvailable port here. Once we select a port number for NPL and install the iptables NAT rule, is there anything preventing a process running on the host to bind to this port as well?

Contributor Author:

We do a unix socket bind in IsPortAvailable(). So, at least we should be able to check if any other process is already using this port and skip it.
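
A minimal sketch of such a point-in-time check (assuming a plain TCP bind test; the actual helper may differ):

    import (
        "fmt"
        "net"
    )

    // IsPortAvailable reports whether the given port can currently be
    // bound on the Node. Another process may still grab the port later,
    // which is the race the reviewer is pointing at.
    func IsPortAvailable(port int) bool {
        ln, err := net.Listen("tcp", fmt.Sprintf(":%d", port))
        if err != nil {
            return false
        }
        ln.Close()
        return true
    }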

@antoninbas antoninbas requested a review from tnqn October 29, 2020 20:38
@jianjuns (Contributor) left a comment:

Yes, currently Agent K8s controllers are all under pkg/agent/controller, but I can understand if a feature is big you might not want to put all code under controller/. Is it possible to keep only K8s controller code in controller/?
@tnqn

pkg/agent/nplagent/bootstrap/npl_agent_init_windows.go (outdated; resolved)
build/yamls/antrea.yml (outdated; resolved)
@monotosh-avi (Contributor Author):

I have added a commit with the changes requested

@jianjuns (Contributor) left a comment:

Sorry, have not reviewed the majority of the code yet. Some comments for the commit message:

This commit introduces NodePortLocal (NPL) in Antrea agent.
With NPL, a pod pod port can be directly reached from external network through a port of the node on which the pod is running.

"a pod pod port" -> "a Pod port"?
You used both "pod" and "POD". I suggest to use Pod (Antrea code/documentation capitalize the first letter of a K8s resource type, like NetworkPolicy, Service, Namespace, etc.).

NPL programs IPTABLE rules to send incoming traffic on a specific port of a node to a POD port.

IPTABLE -> IPTABLES or iptables

This information is exposed through annotation of the POD for consumption of other agents. An example of the pod annotation is given bellow.

bellow -> below

metadata:
annotations:
npl.k8s.io/endpoints: '[{"podport":"8080","nodeip":"10.102.47.229","nodeport":"40002"}]'
More details about Node-Port-Local can be found in the design document: https://docs.google.com/document/d/1SoAKBvbbt2QMnGmt53E8ja0ZxCv_T4r1zLVnMzpykTg

I suggest not to add google doc links to commit messages or code, as the links can become invalid.

To use this feature, following changes are required in Antrea configuration:

  • Set NodePortLocal: true under featureGates in antrea-agent.conf in Antrea Configmap.
  • In the Antrea Agent DaemonSet configure the port range for NPL (NPL_PORT_RANGE) in the env. e.g.
    - name: NPL_PORT_RANGE
    value: 40000-46000
  • Add capability for pod update in antrea-agent cluster role

Note: NPL agent is currently not supported in Windows platform

Upcoming changes in future commits:

  • Label based service filtering to select backend pods for NPL
  • Option to select port range for NPL though Configmap
  • Bootup sync : Compare programmed IPtable rules with pod annotation during bootup and add/delete new rules in iptable only if required.

IPTable/iptable -> IPTABLES or iptables

- Currently we are deleting all the rules and adding new rules for all the pods while booting up Antrea agent.

build/yamls/base/conf/antrea-agent.conf (outdated; resolved)
pkg/features/antrea_features.go (outdated; resolved)
pkg/agent/nplagent/rules/rules.go (outdated; resolved)
pkg/agent/nplagent/k8s/annotations.go (outdated; resolved)
pkg/agent/nplagent/lib/constants.go (outdated; resolved)
package lib

const (
NPLEPAnnotation = "npl.antrea.io"
Contributor:

This could be in k8s/annotation.go? Does it need to be public?

Contributor Author:

We are using this in tests. I have moved this to k8s/annotations.go, but keeping it public.

Contributor:

Ok

pkg/agent/nplagent/lib/constants.go (outdated; resolved)
pkg/agent/nplagent/lib/utils.go (outdated; resolved)
pkg/agent/nplagent/lib/utils.go (outdated; resolved)
pkg/agent/nplagent/lib/utils.go (outdated; resolved)
@antrea-bot (Collaborator):

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-all-features-conformance: to trigger conformance tests with all alpha features enabled.
  • /skip-all-features-conformance: to skip conformance tests with all alpha features enabled.
  • /test-whole-conformance: to trigger all conformance tests on linux.
  • /skip-whole-conformance: to skip all conformance tests on linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-windows-networkpolicy: to trigger windows networkpolicy tests.
  • /skip-windows-networkpolicy: to skip windows networkpolicy tests.
  • /test-hw-offload: to trigger ovs hardware offload test.
  • /skip-hw-offload: to skip ovs hardware offload test.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

@monotosh-avi (Contributor Author) left a comment:

Thanks Jianjun for reviewing. I have updated with a commit.

pkg/features/antrea_features.go (outdated; resolved)
pkg/agent/nplagent/rules/rules.go (outdated; resolved)
pkg/agent/nplagent/lib/utils.go (outdated; resolved)
pkg/agent/nplagent/lib/utils.go (outdated; resolved)

pkg/agent/nplagent/lib/constants.go (outdated; resolved)
pkg/agent/nplagent/k8s/annotations.go (outdated; resolved)
build/yamls/base/conf/antrea-agent.conf (outdated; resolved)
pkg/agent/nplagent/rules/rules.go (outdated; resolved)
pkg/agent/nplagent/lib/constants.go (outdated; resolved)
pkg/agent/nplagent/rules/iptable_rule.go (outdated; resolved)
pkg/agent/nplagent/rules/iptable_rule.go (outdated; resolved)
pkg/agent/nplagent/rules/iptable_rule.go (outdated; resolved)
pkg/agent/nplagent/k8s/k8s_controller.go (outdated; resolved)
@monotosh-avi (Contributor Author) left a comment:

Updated with the requested changes.
I'll wait for the review of @tnqn for more changes.

pkg/agent/nplagent/rules/iptable_rule.go (outdated; resolved)
@tnqn (Member) left a comment:

Haven't finished review yet.

@@ -97,3 +100,6 @@ featureGates:
# the flow collector.
# Flow export frequency should be greater than or equal to 1.
#flowExportFrequency: 12

# Provide the port range used by NodePortLocal for programming IPTABLES rules. This is only used if NodePortLocal is enabled.
#nplPortRange: 40000=41000

if features.DefaultFeatureGate.Enabled(features.NodePortLocal) {
    err := npl.InitializeNPLAgent(k8sClient, informerFactory, o.config.NPLPortRange)
    if err != nil {
        klog.Warningf("Failed to start NPL agent: %v", err)
Member:

Should the process exit with error if it fails to initialize a requested feature?

Contributor Author:

Makes sense. I'll add it.

type NPLEPAnnotation struct {
    PodPort string `json:"Podport"`
    NodeIP string `json:"Nodeip"`
    NodePort string `json:"Nodeport"`
Member:

The json keys are normally in this style: podPort, nodeIP, nodePort.
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L61

Contributor Author:

I'll update it.
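
With the suggested key style, the annotation struct would look like this (sketch):

    type NPLEPAnnotation struct {
        PodPort  string `json:"podPort"`
        NodeIP   string `json:"nodeIP"`
        NodePort string `json:"nodePort"`
    }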

}

func Stringify(serialize interface{}) string {
    json_marshalled, _ := json.Marshal(serialize)
Member:

jsonMarshalled to keep golang style?

Contributor Author:

I'll change it.


// RemoveNPLAnnotationFromPods : Removes npl annotations from all pods
func (c *Controller) RemoveNPLAnnotationFromPods() {
    podList, err := c.KubeClient.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
Member:

This will be expensive; at large scale, a single such call will cause considerable overhead on the kube-apiserver.
Currently antrea-agent only lists Pods of this Node once on start, and it uses "resourceVersion=0" in the options to avoid reading from etcd, which caused performance issues in scale clusters. See #1044 for details.
I think if it's a one-time job, it could reuse the result of https://github.com/vmware-tanzu/antrea/blob/714af4df1c826131075d4dc65d50f4c6f7049902/pkg/agent/cniserver/server.go#L606 which lists Pods of this Node only.

Contributor Author:

Yes, this is a one-time operation only. I'll follow your suggestion.
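
For illustration, a Node-scoped list along those lines might look like this (a sketch; fields is k8s.io/apimachinery/pkg/fields, and nodeName is assumed to be known to the agent):

    podList, err := c.KubeClient.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
        // Only Pods scheduled to this Node.
        FieldSelector: fields.OneTermEqualSelector("spec.nodeName", nodeName).String(),
        // ResourceVersion "0" serves the list from the apiserver cache
        // rather than etcd, which is much cheaper in large clusters.
        ResourceVersion: "0",
    })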

klog.Infof("Removing all NPL annotation from pod: %s, ns: %s", pod.Name, pod.Namespace)
delete(podAnnotation, NPLAnnotationStr)
pod.Annotations = podAnnotation
c.KubeClient.CoreV1().Pods(pod.Namespace).Update(context.TODO(), &pod, metav1.UpdateOptions{})
Member:

Will there be problem if the update fails?

Contributor Author:

There will be a problem in that case. I'll add a log here.

klog.Warningf("Unable to update pod %s with annotation: %+v", pod.Name, err)
return err
}
klog.Infof("Successfully updated pod %s %s annotation", pod.Name, pod.Namespace)
Member:

Suggested change
klog.Infof("Successfully updated pod %s %s annotation", pod.Name, pod.Namespace)
klog.Infof("Successfully updated pod %s/%s annotation", pod.Namespace, pod.Name)

to be more readable and natural?

Contributor Author:

@tnqn I have updated incorporating the changes you suggested. Please let me know further comments for the other files.

@antoninbas antoninbas added this to the Antrea v0.12.0 release milestone Nov 5, 2020
@monotosh-avi monotosh-avi force-pushed the monotosh-npl branch 6 times, most recently from 5b01a1f to 327e9bf Compare November 12, 2020 10:38
@monotosh-avi (Contributor Author):

@tnqn @antoninbas @jianjuns Let me know about the further changes required. After that, I'll raise another PR to add changes required for supporting service selector.

cmd/antrea-agent/agent.go (outdated; resolved)
pkg/agent/nplagent/k8s/annotations.go (outdated; resolved)
current = pod.Annotations
}

klog.Infof("Building annotation for pod: %s\tport: %s --> %s:%s", pod.Name, containerPort, nodeIP, nodePort)
Member:

Is the log temporary for development or supposed to be V(0)? not sure if it's too verbose to be default level.

Contributor Author:

Sure, changed to V(0).

pkg/agent/nplagent/k8s/annotations.go (outdated; resolved)
pkg/agent/nplagent/k8s/annotations.go (outdated; resolved)
defer o.Unlock()
ports := []int32{}
if !o.IsPodInStore(pod.Name) {
    o.podPortsMap[pod.Name] = ports
Member:

At the end of this function, the Pod will be inserted into the map anyway, so why do we insert an empty slice here?

}
}
return false
}
Member:

I didn't see the 3 methods in this file used anywhere; is the code complete?

Contributor Author:

You are right. I was planning to use this when we do a sync during boot up. This is not required now. I have removed this file.

// See the License for the specific language governing permissions and
// limitations under the License.

package lib
Member:

could we add this file only when we need one?

Contributor Author:

Sure, removed.

"k8s.io/klog"
)

// HasElem : check if a slice contains an element
Member:

Suggested change
// HasElem : check if a slice contains an element
// HasElem checks if a slice contains an element.

Contributor Author:

Done

return false
}

// ParsePortsRange : parse port range and check if valid
Member:

ditto

Contributor Author:

Done

@monotosh-avi (Contributor Author) left a comment:

Thanks a lot Quan for your detailed review. I have addressed most of the points, except a few which I'm not sure about or am planning to take care of in the next PR.

})
if err != nil {
    klog.Warningf("Unable to list Pods, err: %v", err)
    return
Contributor Author:

We need this during boot up.
In the next PR, I'll add the change to compare existing iptables rules with Pod annotations and apply only the required changes during boot up. Then we can remove this function.


pkg/agent/nplagent/k8s/handlers.go (outdated; resolved)
pkg/agent/nplagent/k8s/handlers.go (outdated; resolved)
pkg/agent/nplagent/k8s/k8s_controller.go (outdated; resolved)
pkg/agent/nplagent/k8s/k8s_controller.go (outdated; resolved)
objOnce.Do(func() {
    objStore = ObjectStore{}
})
return &objStore
Contributor Author:

Removed this file now.

@monotosh-avi monotosh-avi force-pushed the monotosh-npl branch 2 times, most recently from 21490da to 78fc586 Compare November 17, 2020 03:47
@antoninbas (Contributor) left a comment:

some more nits, otherwise LGTM

I think @tnqn should review as well before we merge

@@ -120,4 +120,6 @@ type AgentConfig struct {
// Flow export frequency should be greater than or equal to 1.
// Defaults to "12".
FlowExportFrequency uint `yaml:"flowExportFrequency,omitempty"`
// Provide the port range used by NodePortLocal for programming IPTABLES rules. This is only used if NodePortLocal is enabled.
Contributor:

could we match the comment from build/yamls/base/conf/antrea-agent.conf?

klog.Warningf("Unable to update Pod %s with annotation: %+v", pod.Name, err)
return err
}
klog.V(2).Infof("Successfully updated Pod %s/%s annotation", pod.Namespace, pod.Name)
Contributor:

Suggested change
klog.V(2).Infof("Successfully updated Pod %s/%s annotation", pod.Namespace, pod.Name)
klog.V(2).Infof("Successfully updated annotation for Pod %s/%s", pod.Namespace, pod.Name)

return
}
}
podKey := pod.Namespace + "/" + pod.Name
Contributor:

maybe define a local helper function for that, e.g. func podKeyFunc(pod *corev1.Pod) string
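
Such a helper could be as simple as (sketch; corev1 is k8s.io/api/core/v1):

    func podKeyFunc(pod *corev1.Pod) string {
        return pod.Namespace + "/" + pod.Name
    }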

return nil
}

// HandleDeletePod Removes rules from port table and
Contributor:

s/Removes/removes

if !c.portTable.RuleExists(podIP, int(cport.ContainerPort)) {
    err = c.addRuleForPod(newPod)
    if err != nil {
        return fmt.Errorf("Failed to add rule for Pod %s/%s: %s", newPod.Namespace, newPod.Name, err.Error())
Contributor:

s/Failed/failed for consistency

same below

const resyncPeriod = 60 * time.Minute

// InitializeNPLAgent initializes the NodePortLocal (NPL) agent.
// It initializes the port table cache to keep rack of Node ports available for use by NPL,
Contributor:

s/keep rack/keep track

var data string
var found bool

a := assert.New(t)
Contributor:

nit: I think the style should match for the assert & require assertions

so you can define r := require.New(t) as well, or switch to using assert.<Func>(t, ...)

if err != nil {
    return fmt.Errorf("IPTABLES rule creation failed for NPL with error: %v", err)
}
klog.Infof("successfully added rule for pod %s: %d", podip, port)
Contributor:

s/pod/Pod

}

// AddRule appends a DNAT rule in NodePortLocalChain chain of NAT table
func (ipt *IPTableRules) AddRule(port int, podip string) error {
Contributor:

suggest renaming podip to podIP

tableLock sync.RWMutex
}

func NewPortTable(start, end int) (*PortTable, bool) {
Contributor:

idea for a future PR: we can do the same thing as kube-proxy for NodePorts and actually "hold" the port by binding a socket to it and keeping it alive

see https://github.com/kubernetes/kubernetes/blob/86f8c3ee91b6faec437f97e3991107747d7fc5e8/pkg/proxy/iptables/proxier.go#L1664

I know you had something like this originally, but IIRC you were just trying to bind to the port to check if it was available at that point in time, and not actually reserving the port indefinitely

Contributor Author:

Sure, Antonin. I'll take care of this in the next PR.
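
A sketch of that idea, holding the allocated port by keeping a listener open for the lifetime of the NPL rule (an assumed design mirroring the kube-proxy approach linked above; holdPort is a hypothetical name):

    // holdPort binds the port and keeps the listener open so that no
    // other process on the Node can bind it while the rule exists. The
    // caller must Close() the returned listener when the rule is deleted.
    func holdPort(port int) (net.Listener, error) {
        return net.Listen("tcp", fmt.Sprintf(":%d", port))
    }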

- Updated logs and comments
- Added a check for existence of NPL chain before deleting all rules from iptables
build/yamls/base/conf/antrea-agent.conf (outdated; resolved)
cmd/antrea-agent/config.go (outdated; resolved)
cmd/antrea-agent/options.go (outdated; resolved)
pkg/agent/nodeportlocal/k8s/annotations.go (outdated; resolved)
pkg/agent/nodeportlocal/k8s/controller.go (outdated; resolved)
pkg/agent/nodeportlocal/k8s/controller.go (outdated; resolved)
pkg/agent/nodeportlocal/nplagent_test.go (outdated; resolved)
return nil
}

func (pt *PortTable) getFreePort(podip string, podport int) int {
Member:

Suggested change
func (pt *PortTable) getFreePort(podip string, podport int) int {
func (pt *PortTable) getFreePort(podIP string, podPort int) int {

and so do below

pkg/agent/nodeportlocal/rules/rules.go (outdated; resolved)
pkg/features/antrea_features.go (outdated; resolved)
podIP := pod.Status.PodIP
if podIP == "" {
klog.Infof("IP address not found for pod: %s/%s", pod.Namespace, pod.Name)
klog.Infof("Got delete event for pod: %s", key)
Contributor:

Do you want to generate an INFO log message for every Pod add/delete event? If not, consider adding a verbosity level, like klog.V(2).Infof().

Contributor Author:

Would it be fine for delete events to have logs? @antoninbas had suggested this.

@tnqn (Member) left a comment:

LGTM, thanks for addressing my comments. One last comment left.

}
assignPodAnnotation(newPod, newPod.Status.HostIP, port, nodePort)

if err != nil {
Member:

this is a duplicate of L196?

Contributor Author:

Thanks, corrected.

@tnqn (Member) left a comment:

LGTM

@tnqn (Member) commented Jan 8, 2021

/test-all

@monotosh-avi (Contributor Author):

/jenkins-networkpolicy

@monotosh-avi (Contributor Author):

/test-networkpolicy

@antoninbas antoninbas merged commit 90719c3 into antrea-io:master Jan 12, 2021
antoninbas pushed a commit that referenced this pull request Jan 13, 2021
This commit introduces NodePortLocal (NPL) in the Antrea agent.
With NPL, a Pod port can be directly reached from the external network through a port of the Node on which the Pod is running.
NPL programs iptables rules to send incoming Node traffic to the target Pod port.
This information is exposed through an annotation on the Pod object, for consumption by other entities, such as external load-balancers. An example of the Pod annotation is given below:

metadata:
  annotations:
    npl.k8s.io/endpoints: '[{"podport":"8080","nodeip":"10.102.47.229","nodeport":"40002"}]'

To use this feature, the following changes are required in the Antrea configuration:
- Enable the "NodePortLocal" featureGate in the antrea-agent configuration
- If desired, change the value of "nplPortRange" in the antrea-agent configuration

When using this feature, the Antrea Agent needs to mutate Pod objects, which is why access to the Pod API was added to the antrea-agent ClusterRole.

Note: the NPL agent is currently not supported on Windows.

Future changes:
- Label-based Service filtering to select backend Pods for NPL
- Sync on start: compare programmed iptables rules with Pod annotations during bootup and add/delete rules in iptables only if required. Currently we delete all the rules and add new rules for all the Pods while booting up the Antrea agent.
- Add e2e test
- Add documentation

See #969 

Signed-off-by: Hemant Shaw <[email protected]>
Signed-off-by: Manu Dilip Shah <[email protected]>
Signed-off-by: Monotosh Das <[email protected]>
Signed-off-by: Shubham Chauhan <[email protected]>
Signed-off-by: Sudipta Biswas <[email protected]>
hemantavi added a commit to hemantavi/antrea that referenced this pull request Jan 18, 2021

Currently we delete all the iptables rules when the antrea-agent
is restarted. This commit does the following:
- Evaluates the iptables rules in the Node Port Local chain and
  checks if each rule is still required. If a Pod got deleted while
  the antrea-agent was restarting, that rule is no longer needed and
  has to be deleted.
- The above evaluation logic also creates a cache of all the
  Pods retrieved from the iptables rules.
- Evaluates the Kubernetes Pods, and for each Pod, checks if it is
  part of the above cache. If yes, only the Pod's annotations need
  to be verified. If not, a new Node port has to be allocated and a
  rule has to be added.

One of the TODOs mentioned
[here](antrea-io#1459).
antoninbas pushed a commit that referenced this pull request Feb 11, 2021
When the Antrea Agent restarts, instead of removing the NPL annotations
from all Pods before starting the NPL Controller, we reconcile existing
Pod annotations with iptables rules using iptables-restore. We also
re-build the port table / port cache. We then start running the NPL
Controller. If the Pod NPL annotations are no longer up-to-date
(e.g. because a Service selector changed), it will be handled when the
Controller is running, just like any normal update.

This is a follow-up work item from #1459.