-
Notifications
You must be signed in to change notification settings - Fork 742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid caching dynamic VPC CIDR info #1113
Conversation
Note to reviewer:
|
a9e2b5c
to
8cec203
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pkg/ec2metadata/client.go
Outdated
@@ -22,6 +24,7 @@ import ( | |||
// EC2Metadata wraps the methods from the amazon-sdk-go's ec2metadata package | |||
type EC2Metadata interface { | |||
GetMetadata(path string) (string, error) | |||
GetMetadataWithContext(ctx context.Context, path string) (string, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah... We already set the default config to have WithMaxRetries(10)
...
pkg/awsutils/awsutils.go
Outdated
return eni, 0, nil | ||
} | ||
// 0 is reserved for primary ENI, the rest of them has to +1 to avoid collision at 0 | ||
return eni, int(deviceNum + 1), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I removed the +1 thing here, since we're already assuming that 0 == primary (and the previous code did too).
I think (with high confidence) that that's ok. I think (with much less confidence) that's it's also ok across upgrade. Upgrade testing/failures should look here for the cause - especially upgrade tests involving multiple interfaces...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started to do something similar in #1071, but was not 100% sure that the primary ENI is always at index 0: https://github.com/aws/amazon-vpc-cni-k8s/pull/1071/files#diff-b55f09f925e40eba84053fdae2096d9cR648
pkg/awsutils/imds.go
Outdated
return strconv.Atoi(data) | ||
} | ||
|
||
// GetDeviceNumber returns the unique device number associated with an interface. The primary interface is 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the primary interface always 0? I have kind of assumed it is, but is that documented somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not explicitly afaics :( Primary ENI is assumed to be always "eth0" in all our scripts/docs/tutorials and the metadata docs for device-number say "eg eth3 is device-number 3" (*). Primary ENI is also the first/only interface attached at boot, so assuming indices go up then the first is also always 0.
(*) .. which is obviously nonsense in the general case, because you can't know how the OS is going to do device enumeration - but I believe this is the fantasy that the original Xen API was based around.
I can remove this comment if you want. The rest of our code already assumes 0==primary (eg here and here) and this hasn't caused us any problems - yet.
Mostly I wanted to be more consistent across our codebase instead of using a mix of .Primary
and .DeviceIndex==0
, and here seemed a reasonable place to declare this assumption. I can certainly soften it to say "vpc-cni assumes that primary interface is always 0"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slightly different:
// TODO: Verify that the first IP is always the primary
primary := isFirst
This is checking with IP should not be used by pods. I've checked in the aws/amazon-ec2-net-util repo where they also use index 0 to be the "attchment IP". We should definitely see if we can get rid of this limitation though, and just reserve the "node IP".
The aws.Int64Value(ec2res.Attachment.DeviceIndex) == 0
check to add that log line was to help customers who don't have delete_on_termination
set for the primary ENI. #1023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Oh yes, you're right, one of my examples was primary IP not primary ENI. Anything that calls this getENIDeviceNumber()
function (ie: anything that calls GetAttachedENIs()
) is another example of code that always sees primaryENI as DeviceNumber==0.)
I'm happy to remove this, and instead make the code track primary ENI separately, and allow any DeviceNumber (including 0) for any ENI. What I don't want to do is have a function called EC2InstanceMetadataCache.getENIDeviceNumber()
that is hard-coded to return 0 for the primaryENI and a value that is subtly not the real IMDS device-number for everything else :P
Shall I put in a PR ahead of this one that tracks "primary-ness" explicitly and calculates our legacy "table number" without calling it device-number?
pkg/awsutils/awsutils.go
Outdated
return eni, 0, nil | ||
} | ||
// 0 is reserved for primary ENI, the rest of them has to +1 to avoid collision at 0 | ||
return eni, int(deviceNum + 1), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started to do something similar in #1071, but was not 100% sure that the primary ENI is always at index 0: https://github.com/aws/amazon-vpc-cni-k8s/pull/1071/files#diff-b55f09f925e40eba84053fdae2096d9cR648
40db7fb
to
0870553
Compare
Rebased. With the additional changes, I think this is too big now. I'm going to split out the first commit into a new PR.. Edit: -> #1225 Will rebase this PR on that PR if/when it's merged. |
Rather than cache and then update the cache - just don't cache :)
0870553
to
d4ee76c
Compare
Drastically reduced the scope of this PR, and reworded the PR description accordingly. You can ignore most of the previous discussion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One less go-routine... Nice!
Triggered another e2e test: https://github.com/aws/amazon-vpc-cni-k8s/runs/1152850334?check_suite_focus=true
func (cache *EC2InstanceMetadataCache) GetVPCIPv4CIDRs() ([]string, error) { | ||
ctx := context.TODO() | ||
|
||
ipnets, err := cache.imds.GetVPCIPv4CIDRBlocks(ctx, cache.primaryENImac) | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
||
// TODO: keep as net.IPNet and remove this round-trip to/from string | ||
asStrs := make([]string, len(ipnets)) | ||
for i, ipnet := range ipnets { | ||
asStrs[i] = ipnet.String() | ||
} | ||
|
||
return asStrs, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main change. Just always fetch the CIDR blocks from the metadata, instead of having a loop to update a cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, as @mogren mentioned one less thread :)
We expect the list of VPC CIDRs to change. Rather than cache the VPC CIDRs and then asynchronously update that cache, just don't cache :) and fetch the latest VPC CIDRs on demand.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.