Attempt manual removal of CNI IP allocations on refresh #5434

Merged

Conversation

mheon
Member

@mheon mheon commented Mar 9, 2020

We previously attempted to do this within CNI, without success, so let's do it manually. We know where the files should live, so we can remove them ourselves. This fixes the case where a sudden reboot gives containers no time to fully tear themselves down, leaving behind IP address allocations which, for various reasons, are not stored in tmpfs and therefore persist through the reboot.
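
A minimal sketch of the manual cleanup described above, not the code merged in this PR. It assumes the CNI IPAM plugin keeps one reservation file per allocated address, named after the IP, under `<dir>/<network>/`. The names `removeIPReservations` and `demo`, and the sample addresses, are hypothetical, and the demo runs against a temporary directory rather than /var/lib/cni/networks/.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// removeIPReservations deletes the reservation file for every (network, IP)
// pair under baseDir and returns how many files it actually removed.
// Hypothetical helper; assumes the layout <baseDir>/<network>/<ip>.
func removeIPReservations(baseDir string, ipsByNetwork map[string][]string) (int, error) {
	removed := 0
	for network, ips := range ipsByNetwork {
		for _, ip := range ips {
			candidate := filepath.Join(baseDir, network, ip)
			err := os.Remove(candidate)
			switch {
			case err == nil:
				removed++
			case errors.Is(err, os.ErrNotExist):
				// Already gone (the container tore down cleanly): not an error.
			default:
				return removed, fmt.Errorf("removing IP reservation %q: %w", candidate, err)
			}
		}
	}
	return removed, nil
}

// demo simulates a leftover reservation in a temporary directory and reports
// how many files were removed and whether an unrelated reservation survived.
func demo() (int, bool, error) {
	base, err := os.MkdirTemp("", "cni-networks-")
	if err != nil {
		return 0, false, err
	}
	defer os.RemoveAll(base)

	netDir := filepath.Join(base, "podman")
	if err := os.MkdirAll(netDir, 0o755); err != nil {
		return 0, false, err
	}
	stale := filepath.Join(netDir, "10.88.0.5") // left behind by our container
	other := filepath.Join(netDir, "10.88.0.6") // belongs to another container
	for _, f := range []string{stale, other} {
		if err := os.WriteFile(f, []byte("container-id\n"), 0o644); err != nil {
			return 0, false, err
		}
	}

	// 10.88.0.7 has no file on disk; it must be skipped silently.
	removed, err := removeIPReservations(base, map[string][]string{
		"podman": {"10.88.0.5", "10.88.0.7"},
	})
	if err != nil {
		return removed, false, err
	}
	_, statErr := os.Stat(other)
	return removed, statErr == nil, nil
}

func main() {
	removed, otherKept, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("removed=%d otherKept=%t\n", removed, otherKept)
}
```

Treating a missing file as success keeps the cleanup idempotent, so it is safe to run on every refresh whether or not the previous shutdown was clean.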

Fixes #5433

@openshift-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon


@openshift-ci-robot added the "approved" label (Indicates a PR has been approved by an approver from all required OWNERS files.) Mar 9, 2020
func (c *Container) removeIPv4Allocations() error {
	// Using a hardcoded path here is VERY gross, but the lack of a good
	// interface into CNI means there's no non-hacky way to do this.
	const cniNetworksDir = "/var/lib/cni/networks/"
Member

I'm on the fence: should there be a function like getCNINetworksDir that returns this string on a Linux distro, and "unknown" or something similar on Windows/others? Or maybe define this in containers.conf?

Member

Yes, this should be handled in container_internal_linux.go.

Member Author

Fixed
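
For illustration only, the reviewers' suggestion could look roughly like the sketch below. It is not the code that was pushed. A real implementation would more likely live in build-tagged files such as container_internal_linux.go; this sketch switches on `runtime.GOOS` only so that it fits in a single runnable file. `cniNetworksDirFor` and `errUnsupportedPlatform` are hypothetical names, and the path is the one hardcoded in the hunk above.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// linuxCNINetworksDir is the path hardcoded in the reviewed hunk.
const linuxCNINetworksDir = "/var/lib/cni/networks/"

// errUnsupportedPlatform is a hypothetical sentinel error for platforms
// where no CNI networks directory is defined.
var errUnsupportedPlatform = errors.New("CNI networks directory is not defined on this platform")

// cniNetworksDirFor returns the CNI networks directory for the given GOOS
// value. Taking the OS as a parameter keeps the function easy to test.
func cniNetworksDirFor(goos string) (string, error) {
	if goos == "linux" {
		return linuxCNINetworksDir, nil
	}
	return "", fmt.Errorf("%w: %s", errUnsupportedPlatform, goos)
}

// getCNINetworksDir resolves the directory for the platform the binary
// is running on.
func getCNINetworksDir() (string, error) {
	return cniNetworksDirFor(runtime.GOOS)
}

func main() {
	dir, err := getCNINetworksDir()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dir)
}
```

Returning an error on other platforms, rather than a placeholder string like "unknown", stops callers from building file paths out of a value that was never a real directory.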

@mheon force-pushed the force_delete_cni_netreg branch 3 times, most recently from fc6bc4c to 0be7934, March 16, 2020 13:46
@mheon
Member Author

mheon commented Mar 16, 2020

I really want to test this, but the whole system-restart thing kind of puts a kibosh on that. I don't think we can really mock the circumstances involved, either.

@TomSweeneyRedHat
Member

LGTM assuming happy tests

@mheon
Member Author

mheon commented Mar 16, 2020

Restarted flakes

@rhatdan
Member

rhatdan commented Mar 16, 2020

@mheon Needs a rebase

@mheon
Member Author

mheon commented Mar 16, 2020

I'm not seeing that?

@mheon force-pushed the force_delete_cni_netreg branch from 0be7934 to bd98679, March 19, 2020 21:06
@mheon
Member Author

mheon commented Mar 19, 2020

Rebased to pick up CI fixes

@rh-atomic-bot
Collaborator

☔ The latest upstream changes (presumably #5088) made this pull request unmergeable. Please resolve the merge conflicts.

We previously attempted to work within CNI to do this, without
success. So let's do it manually, instead. We know where the
files should live, so we can remove them ourselves instead. This
solves issues around sudden reboots where containers do not have
time to fully tear themselves down, and leave IP address
allocations which, for various reasons, are not stored in tmpfs
and persist through reboot.

Fixes containers#5433

Signed-off-by: Matthew Heon <[email protected]>
@mheon force-pushed the force_delete_cni_netreg branch from bd98679 to b695475, March 19, 2020 21:20
@rhatdan
Member

rhatdan commented Mar 23, 2020

/lgtm

@openshift-ci-robot added the "lgtm" label (Indicates that a PR is ready to be merged.) Mar 23, 2020
@openshift-merge-robot openshift-merge-robot merged commit e34ec61 into containers:master Mar 23, 2020
@github-actions bot added the "locked - please file new issue/PR" label (Assist humans wanting to comment on an old issue or PR with locked comments.) Sep 25, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2023
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.
Development

Successfully merging this pull request may close these issues.

Dangling file in /var/lib/cni/networks/podman after hard restart
6 participants