Inconsistent handling of container failure to start #4294
Comments
putting into 1.3 and marking high pri, per @pdaigle
@stuclem A container that times out while starting will result in an error message "context deadline exceeded". When that occurs the containerVM is not powered off but is left in state "Starting" and may not have a configured network interface. A secondary consequence is that
Thanks @hickeng. I added your writeup to the RNs.
This is for handling error path, lowering to p2. Root cause has been fixed.
Removed from 1.4 release and OKR - Customer Environments.
I think the fix #8445 supports solution 1. We will not unbind container if we found error later.
PR #8445 has already been merged. Closing it.
I don't think that this affects the core user doc, but it does need to be marked as a resolved issue in the 1.5.2 release notes.
This was identified during investigation of #4289
The containerVM in question is not setting the `session.started` flag, and as such the property collector times out. The reason why it's not being set is unimportant in this issue.
The problem is how we handle that failure: we Unbind the network config, but we do not power down the container. This means that if the container process didn't start successfully, we have a useless containerVM consuming resources. If it was a network hiccup, VC queuing, or another control plane issue, then we've just disconnected a functioning containerVM from the network it required.
Solutions:
In either case, the Unbind can be triggered by the VM power off instead of explicitly.
Notes:
This impacts cVMs that start after the current 3min timeout has expired. The timeout for cVM start was added to address failure scenarios and because `docker run -it` inherits the awkward blocking behaviour of the standard docker client when attach is used (interception of Ctrl-C, et al.), meaning it cannot be easily escaped. The correct solution to this is:
This likely means changing the power state operations to be async and then waiting on events (either the expected status change or an error). I've updated the estimate to encompass a possible shift to async power operations, but it does not include raising a PR for the signal forwarding behaviour of the docker client.