Add support to setup pod network using VLANs #1125
5ef3b1f to 4dd10e8
* Create vlan for pod requesting unique security group.
* Adding packet verifier binary to validate the packet flow as part of integration tests.
LGTM
		return errors.Wrapf(err, "SetupPodENINetwork failed to setup veth pair.")
	}

	vlanTableId := vlanId + 100
Table names are currently based on Device Index for attached interfaces. Since the maximum number of ENIs for an instance is currently 50, this should be safe.
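To illustrate the reasoning, here is a minimal sketch of why the `+ 100` offset keeps vlan tables clear of the device-index tables. The constant and function names are hypothetical (they are not from this PR), and the sketch assumes device-index-based table IDs never exceed the 50-ENI limit quoted above.

```go
package main

import "fmt"

// Illustrative names only; none of these identifiers come from the PR.
const (
	maxENIsPerInstance = 50  // upper bound quoted in the review comment
	vlanTableOffset    = 100 // offset used in the diff: vlanTableId := vlanId + 100
)

// vlanTableID mirrors the expression in the diff.
func vlanTableID(vlanID int) int {
	return vlanID + vlanTableOffset
}

// collidesWithDeviceTables reports whether a table ID falls inside the range
// that device-index-based tables are assumed to occupy (0..maxENIsPerInstance).
func collidesWithDeviceTables(tableID int) bool {
	return tableID >= 0 && tableID <= maxENIsPerInstance
}

func main() {
	for _, vlanID := range []int{0, 1, 42} {
		id := vlanTableID(vlanID)
		fmt.Printf("vlan %d -> table %d (collides: %v)\n",
			vlanID, id, collidesWithDeviceTables(id))
	}
}
```

Since vlan IDs are non-negative, the smallest vlan table ID is 100, which stays above the assumed device-index range.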
	// Prepare the Desired Rule for SNAT Rule for non-pod ENIs
	snatRule := []string{"!", "-o", "vlan+",
		"-m", "comment", "--comment", "AWS, SNAT",
This is important, we can't SNAT traffic from pods with Branch ENIs.
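For readers unfamiliar with the `"!", "-o", "vlan+"` match: in iptables, an interface name ending in `+` matches any interface whose name begins with that prefix, and `!` inverts the match. A rough model of that behaviour follows; the helper names and the interface names used in the example are made up for illustration and are not from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesPattern models iptables interface matching: a trailing "+" turns the
// pattern into a prefix match, otherwise the name must match exactly.
func matchesPattern(pattern, iface string) bool {
	if strings.HasSuffix(pattern, "+") {
		return strings.HasPrefix(iface, strings.TrimSuffix(pattern, "+"))
	}
	return pattern == iface
}

// snatApplies models the inverted match "! -o vlan+": the rule applies only
// when the outgoing interface does NOT start with "vlan".
func snatApplies(outIface string) bool {
	return !matchesPattern("vlan+", outIface)
}

func main() {
	// Hypothetical interface names, chosen only to exercise the match.
	for _, iface := range []string{"eth0", "eth1", "vlan.eth.1"} {
		fmt.Printf("%-12s SNAT applies: %v\n", iface, snatApplies(iface))
	}
}
```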
Right, what does this affect? Does this mean we can't send traffic outside the immediate VPC from these pods? (that seems bad)
Aside: my quick search just turned up https://docs.aws.amazon.com/eks/latest/userguide/external-snat.html which says:
SNAT is necessary because the internet gateway only knows how to translate between the primary private and public or elastic IP address assigned to the primary elastic network interface of the Amazon EC2 instance node that pods are running on.
Huh, I wouldn't have guessed this limitation, and I couldn't find more info in the VPC docs after a brief search. Is this an IGW limitation, or some side effect of routes that we configure for secondary addresses within CNI? (as in: how will this affect vlan interfaces, if they are configured with the vpc subnet gateway?)
Ingress and egress rules on the security group are applied to traffic outside the instance, which means that we can't do any NAT within the instance for these pods: doing so would bypass the security group check.
If these pods use a private subnet that has a NAT gateway associated, then traffic will flow out to the internet. Otherwise they can talk only to resources within the VPC.
Thanks, that makes sense. Does that mean we could do the same (NAT at ngw) for secondary IPs, and remove the iptables snat?
Yes, we can remove it: if the worker nodes' subnets have a NAT gateway attached, we don't have to do this SNAT at the host.
If they have a NAT Gateway, they can set the wonderfully named `AWS_VPC_K8S_CNI_EXTERNALSNAT=true`, and then `aws-node` won't set up any SNAT rules (through this tricky code...).
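As a rough sketch of that kind of toggle (not the actual `aws-node` code): the helper name and the use of `strconv.ParseBool` are assumptions, and only the environment variable name comes from the comment above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Environment variable name taken from the review comment.
const envExternalSNAT = "AWS_VPC_K8S_CNI_EXTERNALSNAT"

// useExternalSNAT is a hypothetical helper. It takes the lookup function as a
// parameter so it can be exercised without touching the process environment.
// Unset or unparsable values are treated as false (host SNAT stays enabled).
func useExternalSNAT(lookup func(string) (string, bool)) bool {
	value, ok := lookup(envExternalSNAT)
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(value)
	return err == nil && enabled
}

func main() {
	if useExternalSNAT(os.LookupEnv) {
		fmt.Println("external SNAT enabled: skipping host SNAT rules")
	} else {
		fmt.Println("external SNAT disabled: host SNAT rules would be installed")
	}
}
```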
lgtm, nice code.
Some minor comments, but nothing serious.
@@ -240,7 +240,7 @@ lint:

 # Run go vet on source code.
 vet:
-	go vet ./...
+	go vet $(ALLPKGS)
Why shouldn't packet-verifier pass `go vet`?
I was seeing the following errors in pcap.go, so I'm going to follow up on this separately with upstream.
# github.com/google/gopacket/pcap
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:30:22: undefined: pcapErrorNotActivated
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:52:17: undefined: pcapTPtr
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:64:10: undefined: pcapPkthdr
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:102:7: undefined: pcapBpfProgram
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:103:7: undefined: pcapPkthdr
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:261:33: undefined: pcapErrorActivated
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:262:33: undefined: pcapWarningPromisc
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:263:33: undefined: pcapErrorNoSuchDevice
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:264:33: undefined: pcapErrorDenied
../../../../pkg/mod/github.com/google/[email protected]/pcap/pcap.go:265:33: undefined: pcapErrorNotUp
	// 1. clean up if vlan already exists (necessary when trunk ENI changes).
	if oldVlan, err := os.netLink.LinkByName(vlanLink.Name); err == nil {
		if err = os.netLink.LinkDel(oldVlan); err != nil {
Should we ignore 'not found' error here? Or will we end up retrying anyway in that (rare!) case?
If `err != nil` it will proceed further. Do you mean to proceed only if `err != nil` and `isNotFound`? I can add that, but I think it might not be necessary.
RUN go build -o packet-verifier packet-verifier.go

FROM amazonlinux:2
RUN yum install -y libpcap-devel
Do we need all of `libpcap-devel` here, or will just the runtime libraries in `libpcap` do?
I initially tried with just libpcap, but that didn't help. I had to pull in a couple of dependencies from libpcap-devel as well.
Thank you :)
	// 2. delete two ip rules associated with the vlan
	vlanRule := os.netLink.NewRule()
	vlanRule.Table = vlanId + 100
	vlanRule.Priority = vlanRulePriority

	for {
		if err := os.netLink.RuleDel(vlanRule); err != nil {
			if !containsNoSuchRule(err) {
				return errors.Wrapf(err, "TeardownPodENINetwork: failed to delete container rule for %d", vlanId)
			}
			break
		}
	}
This is slightly tricky and it would be nice to have a comment explaining how it works. Basically, we create a new route table for each vlan in `SetupPodENINetwork()`. The table only has two rules, defined in `buildRoutesForVlan()`. When we tear down, we want to delete both those rules. Calling `netLink.RuleDel` with only the table name and priority will delete the first rule matching that, which is why we have the for-loop.
57f9585 to 130b4e2
* Changes include:
  * Create vlan for pod requesting unique security group.
  * Adding packet verifier binary to validate the packet flow as part of integration tests.
Description of changes:
Issue:
aws/containers-roadmap#177, #208, aws/containers-roadmap#398
Changes included in this PR are as follows,
High level design diagram
Follow-up PRs
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.