Add network restore to support docker live restore container #1135
Conversation
```
@@ -233,6 +257,121 @@ func (c *controller) makeDriverConfig(ntype string) map[string]interface{} {
	return config
}

func (c *controller) registerNetwork(n Network) error {
	if err := c.addNetwork(n.(*network)); err != nil && !strings.Contains(err.Error(), "exists") {
```
This should not be needed. On daemon restart the networks are supposed to be present already.
Are you trying to populate some in-memory state with this?
@aboch yes, this is not needed; it will be removed later.
@coolljt0725 I think we should design the restore functionality with an objective to avoid maintaining states in the db. But this PR is adding more states, and that would cause more issues. Shall we start exchanging notes on the design before getting into the code review?
@mavenugo sorry, I didn't show a clear design at the beginning. Here is the design (I hope I can describe it clearly; apologies for my English :) )

@mavenugo most of the code in this design adds more states of
@coolljt0725 thanks for the investigation. Yes, reconstructing states from the running containers is preferred. Also, instead of a flag determining the need for a global restore functionality, we think it would be appropriate to decide the restore or cleanup per container. Either we could pass in all the containers that need to be restored (rather than cleaned up), or libnetwork could query the daemon for every sandbox (in sandboxCleanup & cleanupLocalEndpoints) before deciding to restore or clean up. This will make the code consistent for ungraceful host restart, ungraceful daemon restart, containerd failure, etc. For cases where there is no other option for restoring the states, we can discuss the best way to handle storing & restoring the states. It would be good to keep that to an absolute minimum.
Having this flag in this design is just because docker only supports live restore in the experimental build, so the flag just lets libnetwork know that the daemon is an experimental build and wants container live restoring. The flag could be removed once live-container-restore graduates from experimental.
This is exactly what this design is doing (https://github.com/docker/libnetwork/pull/1135/files#diff-e30be89bfd41a0c219178028b9971a32R286 and https://github.com/docker/docker/pull/22248/files#diff-1a1f3e7ad9b1d7584e2d3e7d0c4c3db9R330): the daemon passes all the containers that need to be restored with
Thanks for helping @coolljt0725! In addition to what @mavenugo already said, @crosbymichael also told us the daemon would not have the notion that it started after a graceful or ungraceful shutdown, therefore it would not be possible to set a "restore" option in `Config`.
@coolljt0725 I think it would be better not to have a separate API for the sandbox restore, but rather have that be done from inside
I don't even know how a sandbox can be restored from container namespace state. They are very different data models.
@aboch as I have posted above, the
This is a way; I tried it at the beginning. The hard thing is on the daemon side: the network should be initialized before container restoring because the

@mrjana There are already several fields of the sandbox stored in the store, the most important field
This is another way, but I am concerned it would add more complication: in this approach we need to restore states on driver init, and we also need a cleanup functionality to clean up unused objects (endpoints, for example). IMO, using replaying
I'm not in favor of doing this. It's coupling libnetwork with docker.
I am not sure it is going to be simpler. In the restore case, you will need checks everywhere to avoid doing any netlink, userland-proxy (in another package) and iptables programming. Also consider that you cannot just replay the join with exposed ports and hope the resulting host port mapping will turn out the same (again residing in another package, portmap), unless somehow you replay the joins in the same order they happened in the past (no way). Also, please consider this has to be done for the ipvlan, macvlan, and most importantly the overlay driver. The restore strategy has to be workable for any driver: remote drivers may be containers, which means they do not need a replay.
@aboch thank you for the detailed explanation. I'll consider this and also @mavenugo's suggestion: if we can reconstruct it, reconstruct it; if not, save it.
So looking through the libnetwork code and talking to a couple of people, we were thinking about changing how the controller is initialized and how it cleans things up. @mrjana let me know if I'm saying anything wrong. Here is where the controller cleans things up whenever it is started: https://github.com/docker/libnetwork/blob/master/controller.go#L199 What I'm thinking is that these functions need to take some type of state so they know what sandboxes, networks, and endpoints are still in use, so that they do not clean those up. Instead of persisting this type of information to disk we can reconstruct what is in use from docker. Docker should create some type of state object for libnetwork that will let it know what is currently still in use. At least this is the general idea of the design for how to do restore. I'm still looking through the code and learning about it to see what else we need to do. What do you think?
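A minimal sketch of that idea, under the assumption of a hypothetical state object (`activeState`, `cleanupOrKeep`, and the sandbox IDs below are illustrative names, not libnetwork's actual API): the daemon hands libnetwork the set of sandboxes that still back running containers, and the startup cleanup skips anything in that set.

```go
package main

import "fmt"

// activeState stands in for the "state object" described above: the daemon
// tells libnetwork which sandboxes (and therefore which endpoints and
// networks) are still backing running containers.
type activeState struct {
	Sandboxes map[string]struct{} // sandbox IDs of still-running containers
}

// cleanupOrKeep walks the sandbox IDs loaded from the datastore on controller
// start and skips anything the daemon reports as in use, instead of cleaning
// up everything unconditionally.
func cleanupOrKeep(stored []string, state activeState) {
	for _, id := range stored {
		if _, inUse := state.Sandboxes[id]; inUse {
			fmt.Println("keep (restore) sandbox", id)
			continue
		}
		fmt.Println("remove stale sandbox", id)
	}
}

func main() {
	state := activeState{Sandboxes: map[string]struct{}{"sbx-1": {}}}
	cleanupOrKeep([]string{"sbx-1", "sbx-2"}, state)
}
```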
@crosbymichael thank you for your response. The current design restores all the networks, endpoints, and sandboxes on controller initialization rather than cleaning them up; the docker daemon sends the sandboxes that need to be restored to libnetwork (once we know the sandboxes in use, we know the endpoints in use; once we know the endpoints in use, we know the networks in use), libnetwork restores those sandboxes, and after restoring it cleans up the sandboxes, endpoints, and networks that are not in use.

@crosbymichael @mavenugo @aboch @mrjana @chenchun let me summarize what we have to come to an agreement on about the design (maybe more, correct me if I'm wrong :-) ):

1. When/how the network controller restores the networks and cleans up (https://github.com/docker/libnetwork/blob/master/controller.go#L199). Quoting @crosbymichael: "What I'm thinking is that these functions need to take some type of state so they know what sandboxes, networks, and endpoints are still in use so that they do not clean those up." IMO there is no big difference between these approaches, and the current design is easy to change to that approach.
2. How to restore the sandbox.
3. How to restore driver states.
@mavenugo @aboch @mrjana @crosbymichael I have updated the design.
For now, the bridge driver has been implemented and I am working on the others. What do you think?
@mavenugo @aboch @mrjana @crosbymichael
```
if len(endpoint.extConnConfig.PortBindings) > 0 {
	// TODO: daemon pass the ports to restore the portMapping may be a better way
	endpoint.portMapping, err = n.allocatePorts(endpoint, n.config.DefaultBindingIP, d.config.EnableUserlandProxy)
```
Won't this cause a new port mapping to be done for each exposed port, and a new instance of docker-proxy to be spawned when there is already one running?

Also, if the user specified an explicit port mapping (-p X:x) for the container, won't this call fail when the proxy tries to bind to an already-used host port?
Also, the old mapped ports are lost. If the container is stopped, nobody will release the ports mapped during the previous daemon life.

I am afraid we cannot avoid storing the driver's endpoint to the store.

Also, given that portmapper is not storing the ports to the store, it could give out already-reserved ports to new containers (I know portmapper does not give out a port that it has just freed, it keeps moving to the next one, but we should not rely on that) while designing the daemon reload changes.
@aboch thanks for your response

> Won't this cause a new port mapping to be done for each exposed port, and a new instance of docker-proxy to be spawned when there is already one running?

docker-proxy will die once the daemon process dies, because it is sent SIGTERM when the daemon process dies (see https://github.com/docker/docker/blob/master/vendor/src/github.com/docker/libnetwork/portmapper/proxy.go#L110), so we have to spawn a new one (a minimal sketch of this mechanism follows below).
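For reference, a minimal sketch of the mechanism being described, assuming the linked proxy.go ties the proxy's lifetime to the daemon via the Linux parent-death signal; the binary path and the absence of real flags below are placeholders.

```go
package main

import (
	"os/exec"
	"syscall"
)

// startProxy starts a child process with the Linux parent-death signal set,
// so the kernel delivers SIGTERM to the child when the parent (the daemon)
// dies. This is why a restarted daemon has to spawn a new proxy.
func startProxy(bin string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(bin, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Pdeathsig: syscall.SIGTERM, // Linux-only
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

func main() {
	// Hypothetical invocation; the real docker-proxy takes -proto, -host-ip,
	// -host-port and -container-* flags.
	if cmd, err := startProxy("/usr/bin/docker-proxy"); err == nil {
		_ = cmd.Process.Kill()
	}
}
```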
> Also, if the user specified an explicit port mapping (-p X:x) for the container, won't this call fail when the proxy tries to bind to an already-used host port?

This also doesn't happen, since the old proxy has already exited and released the host port.
> Also, the old mapped ports are lost. If the container is stopped, nobody will release the ports mapped during the previous daemon life.
>
> I am afraid we cannot avoid storing the driver's endpoint to the store.
>
> Also, given that portmapper is not storing the ports to the store, it could give out already-reserved ports to new containers (I know portmapper does not give out a port that it has just freed, it keeps moving to the next one, but we should not rely on that) while designing the daemon reload changes.

The old mapped ports are not lost: n.allocatePorts(endpoint, n.config.DefaultBindingIP, d.config.EnableUserlandProxy) will re-allocate the ports of the container and store this information in the PortMapper, and to avoid duplicate iptables rules we add a check (https://github.com/docker/libnetwork/pull/1135/files/88e1c42a523211f8aba604be3373d98bac96b7c4#diff-b90cadcd0928c1e490272f4761a52bacR350) before inserting (sketched below). So n.allocatePorts here is used to re-construct the PortMapper.

The problem is that the order in which the sandboxes are restored can change (for example, container A started first and then B, but during restore B may come first), so randomly assigned ports may not end up the same as before.

But I still think we can avoid storing the driver's endpoint to the store: we can pass the port mapping information from the daemon (which we can see in docker ps) and use that information to re-construct the PortMapper. I am working on this locally, I'm sure it can work, and I'll update this PR when I'm finished.
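A sketch of the check-before-insert idea mentioned above for avoiding duplicate iptables rules. The helper names and the example rule are illustrative, and the sketch shells out to iptables purely for clarity; the PR's actual check lives in the bridge driver code linked above.

```go
package main

import (
	"fmt"
	"os/exec"
)

// ruleExists uses `iptables -C` (check) to test whether a rule is already
// present; iptables exits 0 when it is.
func ruleExists(table, chain string, ruleSpec ...string) bool {
	args := append([]string{"-t", table, "-C", chain}, ruleSpec...)
	return exec.Command("iptables", args...).Run() == nil
}

// ensureRule appends the rule only when it is missing, so replaying the
// port-mapping setup after a daemon restart does not create duplicates.
func ensureRule(table, chain string, ruleSpec ...string) error {
	if ruleExists(table, chain, ruleSpec...) {
		return nil
	}
	args := append([]string{"-t", table, "-A", chain}, ruleSpec...)
	if out, err := exec.Command("iptables", args...).CombinedOutput(); err != nil {
		return fmt.Errorf("iptables append failed: %v (%s)", err, out)
	}
	return nil
}

func main() {
	// Hypothetical DNAT rule for a restored mapping of host port 8080 to 172.17.0.2:80.
	_ = ensureRule("nat", "DOCKER",
		"-p", "tcp", "--dport", "8080",
		"-j", "DNAT", "--to-destination", "172.17.0.2:80")
}
```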
Are you planning to keep this behavior? I was thinking app networking should not be disrupted if the daemon goes down. If an app relies on the userland-proxy functionality, I thought we should not kill the proxy.
@aboch I tend not to keep this behavior, but the hard thing is to restore the userland-proxy when restoring: we have to restore it so that we can kill the userland-proxy once the container exits. I'm trying to find a good way to restore the userland-proxy.
ping @chenchun. Could you please give your feedback?
@coolljt0725 I think you have to preserve

But if users disabled the userland proxy, there is no need to worry about this.
I also stored bridge endpoints in the store instead of doing some replay work. I think it is the simpler way. @mavenugo are you worried about all the possible ungraceful shutdown issues as more states are going to be stored?
I restored all of them because I think after invoking
@chenchun the
@chenchun @aboch I updated the userland-proxy restore. In this implementation, we find the pid of the userland proxy and use

@aboch and for the port re-allocation, we use
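A minimal sketch of reattaching to an already-running proxy by pid, assuming the pid can be discovered out of band (for example recorded by the daemon or found by scanning /proc); the helper names are hypothetical, not the PR's actual functions.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// reattachProxy takes the pid of an already-running docker-proxy and returns
// a process handle for it, so the restored endpoint can terminate the proxy
// later when the container stops.
func reattachProxy(pid int) (*os.Process, error) {
	proc, err := os.FindProcess(pid) // on Linux this never fails
	if err != nil {
		return nil, err
	}
	// Signal 0 only checks that the process is still alive.
	if err := proc.Signal(syscall.Signal(0)); err != nil {
		return nil, fmt.Errorf("proxy pid %d is not running: %v", pid, err)
	}
	return proc, nil
}

// stopProxy uses the same shutdown path as a normally tracked proxy.
func stopProxy(proc *os.Process) error {
	return proc.Signal(syscall.SIGTERM)
}

func main() {
	if proc, err := reattachProxy(12345); err == nil { // hypothetical pid
		_ = stopProxy(proc)
	}
}
```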
The daemon changes for network restore in this design are in coolljt0725/docker@f880a92
Signed-off-by: Lei Jitang <[email protected]>
Signed-off-by: Lei Jitang <[email protected]>
```
@@ -57,6 +57,9 @@ type Driver interface {
	// Leave method is invoked when a Sandbox detaches from an endpoint.
	Leave(nid, eid string) error

	// Restore reconstructs the driver's internal state for an existing endpoint.
	Restore(nid, eid string, sboxKey string, ifInfo InterfaceInfo, options map[string]interface{}) error
```
It should be possible to not add this new method, and instead reuse CreateEndpoint(..., restore bool) and Join(..., restore bool).

I still think it is better to follow the existing logic where the network driver takes care of restoring the resources it owns, therefore restoring its endpoints as it does for the networks. I got briefed more about the daemon restart use case, and it looks like this is meant mainly for daemon upgrade and other corner cases which are expected to complete in a reasonably small time window. Based on the above thinking, I pushed a PR for the bridge driver to take care of the endpoint and port mapping restore. I see this will simplify your libnetwork PR and also your docker-side changes (no more need to re-build the sandbox options for the port bindings). So at the moment the plan for daemon reload network support is to go with your changes up to the libnetwork level plus the drivers' changes to manage the endpoint restore.
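For context, here is a minimal sketch of how a driver might satisfy the Restore method added in the diff above. The driver type, its fields, and the stand-in InterfaceInfo are assumptions for illustration; a real driver (bridge, overlay, ...) keeps considerably more state, such as port mappings and iptables rules.

```go
package main

import "fmt"

// InterfaceInfo is a stand-in for the driver API type of the same name; only
// the pieces needed for this sketch are modeled.
type InterfaceInfo interface {
	Address() string
	MacAddress() string
}

// endpointState is the in-memory record this hypothetical driver keeps per endpoint.
type endpointState struct {
	addr, mac, sboxKey string
}

type sketchDriver struct {
	endpoints map[string]endpointState // keyed by endpoint ID
}

// Restore rebuilds the driver's in-memory endpoint record from data the
// controller already has, instead of reprogramming the network, which is what
// the Restore hook in the diff above is meant to allow.
func (d *sketchDriver) Restore(nid, eid, sboxKey string, ifInfo InterfaceInfo,
	options map[string]interface{}) error {
	if _, ok := d.endpoints[eid]; ok {
		return fmt.Errorf("endpoint %s already present on network %s", eid, nid)
	}
	d.endpoints[eid] = endpointState{
		addr:    ifInfo.Address(),
		mac:     ifInfo.MacAddress(),
		sboxKey: sboxKey,
	}
	return nil
}

// staticIf is a trivial InterfaceInfo implementation for the example.
type staticIf struct{ a, m string }

func (s staticIf) Address() string    { return s.a }
func (s staticIf) MacAddress() string { return s.m }

func main() {
	d := &sketchDriver{endpoints: map[string]endpointState{}}
	err := d.Restore("net-1", "ep-1", "/var/run/docker/netns/abc",
		staticIf{a: "172.17.0.2/16", m: "02:42:ac:11:00:02"}, nil)
	fmt.Println("restored:", err == nil)
}
```

Whether this lives in a dedicated Restore method or in CreateEndpoint/Join with a restore flag, as suggested above, only changes where the branch sits, not what state gets rebuilt.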
Changes from this PR have been moved to #1244
Signed-off-by: Lei Jitang [email protected]
Fix #975
For now, it works for the bridge driver.
TODO: add more driver support