Should AWS-SNAT-CHAIN-0 be vpcCIDRs or should it be subnetCIDRs? With.... #550
Comments
After speaking directly with support, I was informed that there's no real way to make the AWS VPC CNI environment "feel" like an on-prem or vanilla Kubernetes solution wherein the only access to the Pod IP network is via the kubelet's node, ideally through the Services/Endpoints framework. Sure, it could be simulated with security groups on the ENIConfigs, but anything in the same VPC is going to think it can route straight to a Pod, and unless we can usurp the AWS-SNAT-CHAIN-0 rule, which prevents SNATing anything destined for a vpcCIDR, such a security group would be problematic. If this were a feature request, I'd ask for a flag to force all Pod outbound traffic to SNAT behind the kubelet IP unless it was headed for another Pod IP on the same cluster. Thoughts?
@mogren Actually, I kind of wanted the opposite: forced SNATing unless the Pod is talking to another Pod on the same cluster. Basically, like it would be with an on-premise solution not leveraging the Amazon VPC, but without building a VXLAN fabric or scope-limited BGP peering for the Pod network. It does not make sense to me that some random EC2 instance within my VPC can speak directly to a Pod when that Pod initiates the connection. It seems like I can use security groups on the ENIConfig-allocated subnets to simulate non-reachability except through kube-proxy/iptables/IPVS, but the outbound side is literally short-circuited by the iptables rule at the head of the SNAT chains, since that rule will not match packets destined for the VPC CIDR, and therefore Pod traffic is not SNATed as it leaves the bridge for the real network.
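For context, here's a minimal sketch of the contrast being described (the CIDRs and node IP are illustrative placeholders, not values from any real cluster):

```bash
# What the CNI installs today (sketch): any destination inside the VPC CIDR
# fails the "! -d" match, falls out of the chain, and is never SNATed.
iptables -t nat -A AWS-SNAT-CHAIN-0 ! -d 10.0.0.0/16 -j AWS-SNAT-CHAIN-1
iptables -t nat -A AWS-SNAT-CHAIN-1 -m addrtype ! --dst-type LOCAL \
    -j SNAT --to-source 10.0.1.5   # node's primary private IP

# The behavior requested above (hypothetical): only traffic headed for the
# cluster's own Pod CIDR bypasses SNAT; everything else hides behind the node.
iptables -t nat -A AWS-SNAT-CHAIN-0 ! -d 100.64.0.0/10 -j AWS-SNAT-CHAIN-1
```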
That said, we're actually stepping away from the CUSTOM side of this CNI plugin, because they simply decided to allow us a larger set of private subnets, which obviates the need to use the CGNAT 100.64/10 space in the first place. But not before I wrote a big old thing bifurcating the eksctl CloudFormation instantiation of the control plane and the nodes, inserting the DaemonSet envs, and then doing a big old loop in a... bash script... bit-shifting ((( >> 24 )), etc.) on the actual 32-bit addresses to find the "next available" CIDR block of size /22, spin up subnets, tag them correctly owned and named, associate them with the VPC, and so on (plus the delete_cluster functionality to reverse it). I think they liked giving me more RFC 1918 space better than giving some script rights to totally mess up the VPC in the event of a bug :P And let's be honest, the Pod initiating outbound to other services in the same VPC without SNATing at the kubelet is probably loads more efficient. It just violates the principle that the Pod IPs are literally not real... they are now real within the VPC routing domain. :) R.I.P.

```bash
# Convert a 32-bit decimal IP (in ${ipdecwork}) back to dotted-quad notation
# by peeling off one octet at a time, most significant first.
for i in $(seq 0 3); do
    dottedquad[${i}]=$(( ipdecwork / (2 ** (8 * (3 - i))) )) || \
        error "Unable to convert the $(( i + 1 ))th octet: ${ipdecwork}"
    ipdecwork=$(( ipdecwork - (dottedquad[i] * (2 ** (8 * (3 - i)))) )) || \
        error "Unable to carry remainder to next octet"
done
echo "${dottedquad[0]}.${dottedquad[1]}.${dottedquad[2]}.${dottedquad[3]}"
```
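For what it's worth, a self-contained sanity check of that conversion loop (the error handler here is a hypothetical stand-in for the real script's):

```bash
#!/usr/bin/env bash
error() { echo "error: $*" >&2; exit 1; }   # stand-in for the script's handler

# 10.0.2.15 packed into a 32-bit integer: 167772687
ipdecwork=$(( (10 << 24) + (0 << 16) + (2 << 8) + 15 ))
for i in $(seq 0 3); do
    dottedquad[${i}]=$(( ipdecwork / (2 ** (8 * (3 - i))) )) || error "octet ${i}"
    ipdecwork=$(( ipdecwork - (dottedquad[i] * (2 ** (8 * (3 - i)))) ))
done
echo "${dottedquad[0]}.${dottedquad[1]}.${dottedquad[2]}.${dottedquad[3]}"   # -> 10.0.2.15
```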
Howdy,
Looking at this whole thing... using CGNAT space and custom subnets... shouldn't AWS-SNAT-CHAIN-0 be more specific? As written, it skips SNAT for anything destined to the entire VPC. This means that Pods talking to anything else in the VPC will NOT be SNATed, so their traffic and return packets do not traverse the kubelet's node at all. It also means that the IP addresses behind the kubelet (i.e. the CGNAT space) have to be routable. This seems to me to be a violation of the very idea of the Pod network being a virtual network that does not exist in reality.
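To make the title's question concrete, a hypothetical subnet-scoped version of the chain might exclude only the cluster's Pod subnets rather than the whole VPC (all CIDRs illustrative):

```bash
# Skip SNAT only for the cluster's own Pod subnets; anything else in the VPC
# gets SNATed behind the node, the same as traffic leaving the VPC.
iptables -t nat -A AWS-SNAT-CHAIN-0 ! -d 100.64.0.0/18  -j AWS-SNAT-CHAIN-1
iptables -t nat -A AWS-SNAT-CHAIN-1 ! -d 100.64.64.0/18 -j AWS-SNAT-CHAIN-2
iptables -t nat -A AWS-SNAT-CHAIN-2 -m addrtype ! --dst-type LOCAL \
    -j SNAT --to-source 10.0.1.5   # node's primary private IP (illustrative)
```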
I've read the code and I do not see anything that looks like I could easily fix this, other than perhaps disabling SNAT and then manually adding my own rules that SNAT at the kubelet any traffic originating from the Pod network unless the destination is also the Pod network. Or am I clinging to non-EKS models too much?
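A rough sketch of that workaround, assuming the CNI's AWS_VPC_K8S_CNI_EXTERNALSNAT=true setting is used to suppress the built-in SNAT rules (the CIDR and node IP are placeholders):

```bash
# 1) On the aws-node DaemonSet, set AWS_VPC_K8S_CNI_EXTERNALSNAT=true so the
#    CNI stops installing the AWS-SNAT-CHAIN-* rules.
# 2) On each node, SNAT Pod-originated traffic behind the node's primary IP
#    unless the destination is also inside the Pod network:
POD_CIDR="100.64.0.0/10"   # placeholder: the cluster's Pod/CGNAT range
NODE_IP="10.0.1.5"         # placeholder: this node's primary private IP
iptables -t nat -A POSTROUTING -s "${POD_CIDR}" ! -d "${POD_CIDR}" \
    -j SNAT --to-source "${NODE_IP}"
```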
It just feels wrong to have Pod <-> out-of-cluster communications NOT traverse the kubelet, instead routing directly through the VPC's virtual routers.
We wind up with this rule being first, and there seems to be no way to usurp it. X.X.0.0/16 is the CIDR of the VPC. Also notable is the misspelled "CHAN" vs. "CHAIN". Reading the source, this should not be a thing... perhaps I'm using an older version?
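For reference, the rule being described has this general shape in iptables-save output (a reconstructed sketch, with X.X.0.0/16 standing in for the actual VPC CIDR):

```bash
# Anything NOT destined for the VPC CIDR is passed along toward the SNAT
# target; anything destined for the VPC falls through un-NATed. Note the
# "CHAN" typo in the comment string.
-A AWS-SNAT-CHAIN-0 ! -d X.X.0.0/16 -m comment --comment "AWS SNAT CHAN" -j AWS-SNAT-CHAIN-1
```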