Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix device number and update table name the device index #1071
Fix device number and update table name the device index #1071
Changes from all commits
4396c3b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return 0 instead? (logger will help us to debug if something is not right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, better to always return
0
here, together with an error log.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SaranBalaji90 Before this PR,
eth1
getstable 2
because of this code, since it has device number 1.eth2
getstable 3
and so on.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? Because on a c5.18xlarge eth1 got 15 but rest of them got similar number as device number - for eg eth2 got route table number 2 and so on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the device index of
eth1
14 at that time? The device index does not always match the interface name, especially when deleting and attaching a lot of ENIs. I have also seeneth4
gettable 3
since it had the device number 2.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have the node now but I think looking at the code it should be 14. I wonder how is this number populated? Because of pod ENI feature we don't want this number to go beyond 100.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we're assuming that currently, but do we enforce that in the code anywhere? It sounds like this warrants a nice loud/obvious error message - if we ever accidentally trigger it.
Fwiw, I think the relevant code currently always looks for the lowest unoccupied DeviceNumber, so it will continue to reuse the same low numbers (bounded indirectly by max-eni) (good). In addition, I note the relevant bit of code imposes an artifical max of 128, for no particular reason (the code uses a fixed size array, whereas it could have dynamically resized the array, or used a sparse hash). We should probably lower
maxENIs
to 100, with a comment explaining why 100 is actually a real limit we have.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point Gus. For some reason I missed that we were setting the device index. I agree we should change it to 100 and raise error if there is no available index within 100.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, set 100 as a hard limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to be deviceNumber +1 to maintain backward compatibility otherwise how would this work on existing clusters? existing clusters might already have route for deviceNumber.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right that we can't change the table name now and stay backwards compatible on nodes that are already running. We are stuck using the name we currently have.
I have a different suggestion on how to solve it though. Let's use
tableNumber := deviceNumber + 1
where we set up the routes, and stop having a variable calleddeviceNumber
that doesn't actually store the device number. I'll update the PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 If we're stuck with the +1 thing, we should absolutely stop calling it deviceNumber
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should have failed some unit test right? For eg instead of returning x now it will be returning x-1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SaranBalaji90 Yes, this test had to be updated:
https://github.com/aws/amazon-vpc-cni-k8s/pull/1071/files/6d847db2bcd471c1b180fd8d7e81712ec6dbd942#diff-52f7ba503946a19ecad23c70607d5112R825
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This didn't fail any test is it? Because now the value that is passed to this method and table number are different, I would expect mock to fail?