-
Notifications
You must be signed in to change notification settings - Fork 742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Force ENI/IP reconciliation to delete from the datastore #733
Conversation
* seems about right * force removal tests
Oops, this was a bad merge from my 1.5.5 branch. I'll fix that now. |
@tatatodd this is showing merge conflicts. Mind having a whack at those? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few femto-nits inline. But, big picture, I'm concerned about the comment that @mogren had on an earlier revision about the ec2 instance metadata service often containing stale information. If that is the case, do we really want to use the IMDS as the source of truth instead of the internal ipamd datastore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look @jaypipes
I've fixed the merge and addressed your comments.
WRT the higher-level question of whether the local metadata should be the source-of-truth, my 2 cents is that at the moment it's already treated as the source of truth. AFAICT that seems to be the point of the reconciliation process. And while it's unfortunate that the local metadata can be so stale, if it reports that an ENI is detached, it doesn't seem useful for ipamd
to ignore the local metadata and to continue to assign IPs from that ENI.
Another way to think of it is that it's weird to treat the local metadata as the source of truth for ENI attachment, but to ignore it for ENI detachment, which is what HEAD currently does.
That said, I don't know what other architectural changes you folks might have in mind, so I'll leave this up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making those small changes, @tatatodd, appreciated.
I think your explanation/justification is on point, so I'm cool with pushing this through. I do wonder if we should attempt to reassign the active pod's IP addresses from the detached ENI to an attached ENI (or alternately bring up a new ENI automatically on the instance and assign the pods an IP from that newly-attached ENI). But, this patch doesn't need to address that; we can discuss in a future patch whether that's something we should tackle.
@mogren if you're cool with this, feel free to send it on through. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this makes sense, thanks!
@tatatodd I felt bad because I merged something ahead of you that cause merge conflicts with your PR, so I checked out your commits, rebased and fixed the conflicts and have pushed a PR here: #754. Closing this PR out (you still get credit for all your excellent contributions, just in the #754 instead of here). |
Fixes #732
The ENI/IP reconciliation logic fails to delete from the datastore, if any IPs are already assigned to pods. This is wrong; the AWS local metadata is the source of truth for what ENIs/IPs are actually attached to the EC2 instance. By failing to delete from the datastore, ipamd will assign IPs from ENIs that aren't actually attached to the EC2 instance.
This PR fixes this by forcing the reconciliation logic to delete from the datastore. I've also added prometheus counters to track how often this force-deletion is occurring, to aid debugging.
The unittests pass, and I've been running an image built from this on my clusters, and have seen that it fixes the problem.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.