Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to setup pod network using VLANs #1125

Merged
merged 2 commits into from
Aug 10, 2020
Merged

Conversation

SaranBalaji90
Copy link
Contributor

@SaranBalaji90 SaranBalaji90 commented Aug 7, 2020

Description of changes:

Issue:
aws/containers-roadmap#177, #208, aws/containers-roadmap#398

Changes included in this PR are as follows,

  • Creates vlan for pod requesting unique security group.
  • Skips SNAT for pkts flowing out of Vlan tags, this is to prevent outbound connections from bypassing the egress check.
  • Supports NodePort for pod backed using Vlan dev.
  • Adds packet verifier binary to validate the packet flow as part of integration tests.

High level design diagram

image

Follow-up PRs

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@SaranBalaji90 SaranBalaji90 changed the title Add support to setup pod network based on Vlan ID Add support to setup pod network using VLANs Aug 7, 2020
@SaranBalaji90 SaranBalaji90 force-pushed the vlan-cni branch 2 times, most recently from 5ef3b1f to 4dd10e8 Compare August 7, 2020 19:46
* Create vlan for pod requesting unique security group.
* Adding packet verifier binary to validate the packet flow as part of integration tests.
Copy link
Contributor

@mogren mogren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

cmd/routed-eni-cni-plugin/driver/driver.go Show resolved Hide resolved
return errors.Wrapf(err, "SetupPodENINetwork failed to setup veth pair.")
}

vlanTableId := vlanId + 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Table names are currently based on Device Index for attached interfaces. Since the maximum number of ENIs for an instance is currently 50, this should be safe.

Comment on lines +341 to +343
// Prepare the Desired Rule for SNAT Rule for non-pod ENIs
snatRule := []string{"!", "-o", "vlan+",
"-m", "comment", "--comment", "AWS, SNAT",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important, we can't SNAT traffic from pods with Branch ENIs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, what does this affect? Does this mean we can't send traffic outside the immediate VPC from these pods? (that seems bad)

Aside: my quick search just turned up https://docs.aws.amazon.com/eks/latest/userguide/external-snat.html which says:

SNAT is necessary because the internet gateway only knows how to translate between the primary private and public or elastic IP address assigned to the primary elastic network interface of the Amazon EC2 instance node that pods are running on.

Huh, I wouldn't have guessed this limitation, and I couldn't find more info in the VPC docs after a brief search. Is this an IGW limitation, or some side effect of routes that we configure for secondary addresses within CNI? (as in: how will this affect vlan interfaces, if they are configured with the vpc subnet gateway?)

Copy link
Contributor Author

@SaranBalaji90 SaranBalaji90 Aug 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ingress and Egress permission on the security group is applied to the traffic outside the instance, which means that we can't do any NAT within the instance for these pods. This will bypass the security group check.

If these pods uses private subnet which has NAT gateway associated then traffic will flow out to the internet. Otherwise it can talk only to resources within the VPC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense. Does that mean we could do the same (NAT at ngw) for secondary IPs, and remove the iptables snat?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can remove if worker nodes subnets have NAT attached, then we don't have to do this SNAT at the host.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they have a NAT Gateway, they can set the wonderfully named AWS_VPC_K8S_CNI_EXTERNALSNAT=true, and then aws-node won't set up any SNAT rules (Through this tricky code...).

Copy link
Contributor

@anguslees anguslees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, nice code.

Some minor comments, but nothing serious.

@@ -240,7 +240,7 @@ lint:

# Run go vet on source code.
vet:
go vet ./...
go vet $(ALLPKGS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why shouldn't packet-verifier pass go vet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was seeing the following errors in pcap.go, so going to follow up separately on this with upstream.

# github.com/google/gopacket/pcap
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:30:22: undefined: pcapErrorNotActivated
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:52:17: undefined: pcapTPtr
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:64:10: undefined: pcapPkthdr
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:102:7: undefined: pcapBpfProgram
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:103:7: undefined: pcapPkthdr
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:261:33: undefined: pcapErrorActivated
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:262:33: undefined: pcapWarningPromisc
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:263:33: undefined: pcapErrorNoSuchDevice
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:264:33: undefined: pcapErrorDenied
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:265:33: undefined: pcapErrorNotUp

cmd/routed-eni-cni-plugin/cni_test.go Outdated Show resolved Hide resolved
cmd/routed-eni-cni-plugin/driver/driver.go Outdated Show resolved Hide resolved

// 1. clean up if vlan already exists (necessary when trunk ENI changes).
if oldVlan, err := os.netLink.LinkByName(vlanLink.Name); err == nil {
if err = os.netLink.LinkDel(oldVlan); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we ignore 'not found' error here? Or will we end up retrying anyway in that (rare!) case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If err != nil it will proceed further. Do you mean to say proceed only if err != nil and isNotFound? I can add but thinking it might not be necessary.

cmd/routed-eni-cni-plugin/driver/driver.go Outdated Show resolved Hide resolved
test/cmd/packet-verifier/example/packetverifier_pod.yaml Outdated Show resolved Hide resolved
test/cmd/packet-verifier/packet-verifier.go Outdated Show resolved Hide resolved
test/cmd/packet-verifier/packet-verifier.go Outdated Show resolved Hide resolved
RUN go build -o packet-verifier packet-verifier.go

FROM amazonlinux:2
RUN yum install -y libpcap-devel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need all of libpcap-devel here, or will just the runtime libraries in libpcap do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially tried with just libpcap but that didn't help. Had to pull in couple of dependency from libpcap-devel as well.

@SaranBalaji90
Copy link
Contributor Author

lgtm, nice code.

Some minor comments, but nothing serious.

Thank you :)

Comment on lines 487 to 501
// 2. delete two ip rules associated with the vlan
vlanRule := os.netLink.NewRule()
vlanRule.Table = vlanId + 100
vlanRule.Priority = vlanRulePriority

for {
if err := os.netLink.RuleDel(vlanRule); err != nil {
if !containsNoSuchRule(err) {
return errors.Wrapf(err, "TeardownPodENINetwork: failed to delete container rule for %d", vlanId)
}
break
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly tricky and it would be nice to have a comment explaining how it works. Basically, we create a new route table for each vlan in SetupPodENINetwork(). The table only has two rules, defined in buildRoutesForVlan(). When we tear down, we want to delete both those rules. Calling netLink.RuleDel with only the table name and priority will delete the first rule matching that, that's why we have the for-loop.

@mogren mogren merged commit ad4802b into aws:master Aug 10, 2020
mogren pushed a commit to mogren/amazon-vpc-cni-k8s that referenced this pull request Aug 11, 2020
* Changes include:
* Create vlan for pod requesting unique security group.
* Adding packet verifier binary to validate the packet flow as part of integration tests.
SaranBalaji90 added a commit that referenced this pull request Aug 11, 2020
* Changes include:
* Create vlan for pod requesting unique security group.
* Adding packet verifier binary to validate the packet flow as part of integration tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants