
Add network restore to support docker live restore container #1135

Closed · wants to merge 2 commits

Conversation

coolljt0725
Contributor

Signed-off-by: Lei Jitang [email protected]

Fixes #975
For now, it works for the bridge driver.
TODO: add support for more drivers.

@@ -233,6 +257,121 @@ func (c *controller) makeDriverConfig(ntype string) map[string]interface{} {
return config
}

func (c *controller) registerNetwork(n Network) error {
if err := c.addNetwork(n.(*network)); err != nil && !strings.Contains(err.Error(), "exists") {
Contributor

This should not be needed. On daemon restart the networks are supposed to be present already.
Are you trying to populate some in-memory state with this?

Contributor Author

@aboch Yes, this is not needed; it will be removed later.

@mavenugo
Contributor

mavenugo commented May 2, 2016

@coolljt0725 I think we should design the restore functionality with the objective of avoiding maintaining state in the db. But this PR adds more state, and that would cause more issues.

Shall we start exchanging notes on the design before getting into the code review?

@coolljt0725
Contributor Author

@mavenugo Sorry, I didn't show a clear design at the beginning. Here is the design (I hope I can describe it clearly, despite my poor English :) )

  1. If docker is an experimental build, the network controller will restore endpoints, networks, and sandboxes from the store rather than removing them on controller initialization, and while restoring networks and endpoints it will reconstruct the driver's state (add the networks and endpoints back to the driver).
  2. On the daemon side, after container restore, if there are old running containers, call RestoreSandbox. This will reconstruct ExternalConnectivity if necessary, restart the resolver if necessary, reattach the sandbox to the driver, and clean up the sandboxes and endpoints that are not used by the old running containers.

@coolljt0725
Contributor Author

@mavenugo Most of the code in this design adds more sandbox state to the store. After more thought and investigation, I think it is possible to add functionality to reconstruct the sandbox without adding more state to the store. I'll try that approach.

@mavenugo
Contributor

mavenugo commented May 6, 2016

@coolljt0725 thanks for the investigation. Yes, reconstructing state from the running containers is preferred.
btw, I synced up with @crosbymichael & @aboch on the possible design and we are all in sync with respect to reconstructing state as much as possible rather than saving it.

Also, instead of a flag determining the need for a global restore functionality, we think it would be appropriate to decide the restore or cleanup per-container. Either we could pass all the containers that need to be restored (rather than cleaned up), or libnetwork can query the daemon for every sandbox (in sandboxCleanup & cleanupLocalEndpoints) before deciding to restore or clean up. This will make the code consistent for ungraceful host restart, ungraceful daemon restart, containerd failure, etc...

For cases where there is no other option for restoring the states, we can discuss the best way to handle storing & restoring the states. It would be good to keep it to absolute minimum.

@coolljt0725
Contributor Author

@mavenugo

Also, instead of a flag determining the need for a global restore functionality, we think it would be appropriate to decide the restore or cleanup per-container.

This flag exists in this design only because docker supports live restore only on experimental builds, so it just lets libnetwork know that the daemon is an experimental build and wants container live restore. This flag can be removed once live-container-restore graduates from experimental.

Either we could pass all the containers that need to be restored (rather than cleaned up), or libnetwork can query the daemon for every sandbox (in sandboxCleanup & cleanupLocalEndpoints) before deciding to restore or clean up.

This is exactly what this design does (https://github.com/docker/libnetwork/pull/1135/files#diff-e30be89bfd41a0c219178028b9971a32R286, https://github.com/docker/docker/pull/22248/files#diff-1a1f3e7ad9b1d7584e2d3e7d0c4c3db9R330): the daemon passes all the containers that need to be restored via RestoreSandbox(sbids map[string]interface{}) (map[string]interface{}, error), which returns the containers that were restored successfully. The unused sandboxes and endpoints are cleaned up in RestoreSandbox, and the daemon kills the containers whose networking failed to restore.
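For illustration, a minimal Go sketch of how the daemon side could drive this API under the design above (only the RestoreSandbox signature comes from this PR; the helper and everything around it are assumptions):

package restoresketch

import "fmt"

// NetworkController is a stub for the libnetwork controller; only the
// RestoreSandbox method from the proposed design is modeled here.
type NetworkController interface {
    // RestoreSandbox takes the sandbox IDs of still-running containers
    // and returns the ones that were restored successfully.
    RestoreSandbox(sbids map[string]interface{}) (map[string]interface{}, error)
}

// restoreNetworks is a hypothetical daemon-side helper: it asks libnetwork
// to restore the sandboxes of live containers and kills any container
// whose networking could not be restored.
func restoreNetworks(c NetworkController, running map[string]interface{}, kill func(id string)) error {
    restored, err := c.RestoreSandbox(running)
    if err != nil {
        return fmt.Errorf("sandbox restore failed: %v", err)
    }
    for id := range running {
        if _, ok := restored[id]; !ok {
            kill(id) // networking was lost; the container cannot keep running
        }
    }
    return nil
}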

@aboch
Contributor

aboch commented May 6, 2016

Thanks for helping, @coolljt0725!

In addition to what @mavenugo already said, @crosbymichael also told us the daemon would not have any notion of whether it started after a graceful or ungraceful shutdown, therefore it would not be possible to set a "restore" option in `Config`.
Also, we discussed with @mrjana and he suggested it would be better to have the network drivers store what is needed (the endpoint structures for bridge, for example), instead of replaying the create/join endpoint with a "restore" option.

@aboch
Contributor

aboch commented May 6, 2016

@coolljt0725 I think it would be better not to have a separate API for the sandbox restore, but rather have that done from inside libnetwork.New(), so that when the controller is created, it has already done the restore/cleanup and is ready to be used.
I am thinking the list of active sandboxes could be passed to New() as a config parameter or via a newly added parameter.
IOW, I am suggesting your function be an unexported method func (c *controller) restoreSandbox() which would be called from inside New().
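A rough Go sketch of the shape being suggested (OptionActiveSandboxes and the controller internals are illustrative assumptions, not the actual libnetwork API):

package newsketch

// Option configures the controller at creation time.
type Option func(c *controller)

// OptionActiveSandboxes is a hypothetical config option passing the
// sandboxes of still-running containers into New().
func OptionActiveSandboxes(sandboxes map[string]interface{}) Option {
    return func(c *controller) { c.activeSandboxes = sandboxes }
}

type controller struct {
    activeSandboxes map[string]interface{}
}

// New creates the controller and performs the restore/cleanup before
// returning, so the caller gets a controller that is ready to be used.
func New(opts ...Option) (*controller, error) {
    c := &controller{}
    for _, o := range opts {
        o(c)
    }
    // Unexported restore, as suggested: it runs inside New() instead of
    // being a separate public API.
    if err := c.restoreSandbox(); err != nil {
        return nil, err
    }
    return c, nil
}

// restoreSandbox restores the sandboxes listed in c.activeSandboxes and
// cleans up the rest (body elided).
func (c *controller) restoreSandbox() error { return nil }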

@mrjana
Contributor

mrjana commented May 6, 2016

I don't even know how a sandbox can be restored from container namespace state. They are very different data models.

@coolljt0725
Contributor Author

@aboch As I posted above, the restore flag just tells libnetwork that the daemon is an experimental build and wants network restore, since the daemon supports live container restore only on experimental builds. We can remove this flag once live-container-restore graduates from experimental.

I am thinking the list of active sandboxes could be passed to New() as a config parameter or via a newly added parameter.

This is one way, and I tried it at the beginning. The hard part is on the daemon side: the network has to be initialized before container restore, because container restore will start containers that have a restart policy, so I decided to use a separate API. But it's possible to change the daemon initialization flow so that libnetwork.New restores the active sandboxes.

@mrjana There are already several fields of the sandbox stored in the store. The most important field, osSbox, can be restored from the container namespace and the endpoint, and the field containerConfig can be reconstructed from the container config passed from the daemon, so it's possible to restore the sandbox rather than adding more state to the store.

@aboch

Also, we discussed with @mrjana and he suggested it would be better to have the network drivers store what is needed (the endpoint structures for bridge, for example), instead of replaying the create/join endpoint with a "restore" option.

This is another way, but I'm concerned it would add more complication: we would need to restore state on driver init, and we would also need cleanup functionality for unused objects (endpoints, for example). IMO, replaying CreateEndpoint/Join with a restore option to reconstruct the driver is much simpler and ensures we restore only what we really need.

@chenchun
Contributor

chenchun commented May 6, 2016

the field containerConfig can be reconstructed from the container config passed from the daemon, so it's possible to restore the sandbox rather than adding more state to the store.

I'm not in favor of doing this. It couples libnetwork with docker.

@aboch
Contributor

aboch commented May 6, 2016

@coolljt0725

IMO, replaying CreateEndpoint/Join with a restore option to reconstruct the driver is much simpler and ensures we restore only what we really need.

I am not sure it is going to be simpler. In the restore case, you will need checks everywhere to avoid doing any netlink, userland-proxy (in another package), or iptables programming. Also consider that you cannot just replay the join with exposed ports and hope the resulting host port mapping will turn out the same (again, this resides in another package, portmap), unless you somehow replay the joins in the same order they happened in the past (no way).

Also, please consider this has to be done for ipvlan, macvlan, and most importantly overlay driver.

The restore strategy has to be workable for any driver: remote drivers may be containers, which means they do not need a replay.

@coolljt0725
Contributor Author

@aboch thank you for the detailed explanation. I'll consider this along with @mavenugo's suggestion: if we can reconstruct it, reconstruct it; if not, save it.
And thank you @aboch @mavenugo @mrjana @chenchun for making the design clearer. I'll update the design this weekend and then we can discuss it in more detail.

@crosbymichael
Contributor

So, looking through the libnetwork code and talking to a couple of people, we were thinking about changing how the controller is initialized and how it cleans things up. @mrjana, let me know if I'm saying anything wrong.

Here is where the controller cleans things up whenever it is started:

https://github.com/docker/libnetwork/blob/master/controller.go#L199

What I'm thinking is that these functions need to take some type of state so they know what sandboxes, networks, and endpoints are still in use so that they do not clean those up.

Instead of persisting this type of information to disk, we can reconstruct what is in use from docker. Docker should create some type of state object for libnetwork that will let it know what is currently still in use. At least this is the general idea of the design for how to do the restore. I'm still looking through the code and learning about it to see what else we need to do.
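As a rough illustration of this idea (all names here are assumptions), the cleanup helpers would take the in-use set and skip over it:

package cleanupsketch

// storedSandbox stands in for a sandbox record read back from the store.
type storedSandbox struct{ ID string }

// sandboxCleanup deletes only the sandboxes that are not in the in-use
// set reconstructed from the daemon's running containers.
func sandboxCleanup(fromStore []storedSandbox, inUse map[string]struct{}) {
    for _, sb := range fromStore {
        if _, ok := inUse[sb.ID]; ok {
            continue // still used by a live container: restore it, don't delete it
        }
        // delete sb from the store and release its resources (elided)
    }
}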

What do you think?

@coolljt0725
Contributor Author

@crosbymichael thank you for your response. The current design restores all the networks, endpoints, and sandboxes on controller initialization rather than cleaning them up, and the docker daemon sends the sandboxes that need to be restored to libnetwork (if we know the sandboxes in use, we know the endpoints in use; if we know the endpoints in use, we know the networks in use). libnetwork then restores these sandboxes and, after restoring, cleans up the sandboxes, endpoints, and networks that are not in use.

@crosbymichael @mavenugo @aboch @mrjana @chenchun let me summarize the points of this discussion where we need to reach agreement on the design (maybe there are more; correct me if I'm wrong :-))

1 when/how the network controller restores the network and cleans up
a. The current design uses a separate API, RestoreSandbox: the daemon calls it after restoring containers, passing the containers whose networking needs to be restored. RestoreSandbox restores the in-use sandboxes, endpoints, and networks, cleans up the unused ones, and returns the container ids that were restored successfully; based on the return value, the daemon kills the containers whose networking failed to restore.
b. As @aboch said above: "I am thinking the list of active sandboxes could be passed to New() as a config parameter or via a newly added parameter. IOW, I am suggesting your function be an unexported method func (c *controller) restoreSandbox() which would be called from inside New()" (#1135 (comment))
c. As @crosbymichael said above: "Here is where the controller cleans things up whenever it is started.

https://github.com/docker/libnetwork/blob/master/controller.go#L199

What I'm thinking is that these functions need to take some type of state so they know what sandboxes, networks, and endpoints are still in use so that they do not clean those up."

IMO there is no big difference between these three approaches; the current design is easy to change to approach b or c.

2 how to restore the sandbox
For now, the sandbox state is not fully stored in the store; fields such as containerConfig, EpPriority, RefCnt, and osSbox are missing.
a. Store these states in the store, which is what this design does.
b. Reconstruct these states from objects passed from the daemon; I have implemented this approach locally.
Approach a can restore all of the sandbox state and seems much easier. Approach b, as @chenchun said, couples libnetwork with docker and can't restore all of the state (these states are never used again once the container has started, but we can't be sure whether a future feature will need them).

3 how to restore driver state
a. Replay CreateEndpoint/Join with a restore flag, which is what this design does. This way we can reconstruct the driver state without storing it, but @aboch has some concerns about this (#1135 (comment)), e.g. "Also consider that you cannot just replay the join with exposed ports and hope the resulting host port mapping will turn out the same (again, this resides in another package, portmap), unless you somehow replay the joins in the same order they happened in the past (no way)". I think it does replay the joins in the same order they happened in the past (I have checked, but maybe I'm wrong). For the other drivers (ipvlan, macvlan, overlay) this design has implementations that should work (they need more tests and checks), and for the remote and windows drivers we just return early if we are restoring.
b. Store the driver state.

@coolljt0725
Contributor Author

@mavenugo @aboch @mrjana @crosbymichael I have updated the design.
In this design:

  1. The daemon passes the sandbox options that need to be restored to New on network controller creation; sandboxCleanup will clean up unused sandboxes and restore the needed ones, and then call restoreSandbox. restoreSandbox will reconstruct the sandbox from the sandbox options passed from the daemon and, after the sandbox is restored, restore the endpoints in that sandbox to the driver.
  2. Add a new Restore API to the driver; Restore will reconstruct the driver's internal state (a sketch follows below).
  3. At the end of New, return the sandboxes that were restored successfully, so the daemon can kill the containers whose networking failed to restore.
  4. We only add the endpoint's joinInfo to the store.

For now, the bridge driver is implemented and the others are in progress.
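A minimal Go sketch of how a driver might implement the proposed Restore API (InterfaceInfo and the endpoint bookkeeping are simplified stand-ins for the real libnetwork types, not this PR's code):

package driversketch

// InterfaceInfo is a simplified stand-in for libnetwork's interface info.
type InterfaceInfo interface {
    Address() string    // endpoint IPv4 address, CIDR form
    MacAddress() string // endpoint MAC address
}

type endpoint struct {
    id, sboxKey, addr, mac string
}

type driver struct {
    endpoints map[string]*endpoint
}

// Restore rebuilds the driver's in-memory endpoint state from the
// information libnetwork recovered, instead of re-running CreateEndpoint
// and Join against the live network.
func (d *driver) Restore(nid, eid, sboxKey string, ifInfo InterfaceInfo, options map[string]interface{}) error {
    d.endpoints[eid] = &endpoint{
        id:      eid,
        sboxKey: sboxKey,
        addr:    ifInfo.Address(),
        mac:     ifInfo.MacAddress(),
    }
    return nil
}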

what do you think?

@coolljt0725 coolljt0725 changed the title [WIP] Add network restore to support docker live restore container Add network restore to support docker live restore container May 17, 2016
@coolljt0725
Contributor Author

@mavenugo @aboch @mrjana @crosbymichael
All the drivers have been implemented (remote and windows are untested), and this design is much clearer.
This design only adds joinInfo to the store; the sandbox is reconstructed from options passed from the daemon.
All drivers implement a Restore API to restore their endpoints.
I'm still testing heavily locally to see whether everything works as designed.
Please take a look. :)


if len(endpoint.extConnConfig.PortBindings) > 0 {
// TODO: having the daemon pass the ports to restore the portMapping may be a better way
endpoint.portMapping, err = n.allocatePorts(endpoint, n.config.DefaultBindingIP, d.config.EnableUserlandProxy)
Contributor

Won't this cause a new port mapping to be done for each exposed port, and a new instance of docker-proxy to be spawned when there is already one running?

Also, if the user specified an explicit port mapping (-p X:x) for the container, won't this call fail when the proxy tries to bind to an already-used host port?

Contributor

aboch commented May 19, 2016

Also, the old mapped ports are lost. If the container is stopped, nobody will release the ports mapped during the previous daemon life.

I am afraid we cannot avoid storing the driver's endpoint in the store.

Also, given that portmapper does not persist the ports to the store, it could give out already-reserved ports to new containers (I know portmapper does not give out a port it has just freed, it keeps moving to the next one, but we should not rely on that) while designing the daemon reload changes.

Contributor Author

@aboch thanks for your response

Won't this cause a new port mapping to be done for each exposed port, and a new instance of docker-proxy to be spawned when there is already one running?

docker-proxy dies once the daemon process dies, because we send SIGTERM to it when the daemon process dies (see https://github.com/docker/docker/blob/master/vendor/src/github.com/docker/libnetwork/portmapper/proxy.go#L110), so we have to spawn a new one.

Also, if the user specified an explicit port mapping (-p X:x) for the container, won't this call fail when the proxy tries to bind to an already-used host port?

This also doesn't happen.

Also, the old mapped ports are lost. If the container is stopped, nobody will release the ports mapped during the previous daemon life.
I am afraid we cannot avoid storing the driver's endpoint in the store.
Also, given that portmapper does not persist the ports to the store, it could give out already-reserved ports to new containers (I know portmapper does not give out a port it has just freed, it keeps moving to the next one, but we should not rely on that) while designing the daemon reload changes.

The old mapped ports are not lost: the n.allocatePorts(endpoint, n.config.DefaultBindingIP, d.config.EnableUserlandProxy) call re-allocates the container's ports and stores that information in PortMapper, and to avoid duplicate iptables rules we add a check (https://github.com/docker/libnetwork/pull/1135/files/88e1c42a523211f8aba604be3373d98bac96b7c4#diff-b90cadcd0928c1e490272f4761a52bacR350) before inserting. So n.allocatePorts here reconstructs the portMapper.
The problem is that the order in which sandboxes are restored can change (for example, container A started before B, but during restore B may come first), so random ports may end up different from before.
But I still think we can avoid storing the driver's endpoint in the store: we can pass the port mapping information from the daemon (which we can see in docker ps) and use it to reconstruct the portMapper. I'm working on this locally; I'm sure it can work, and I'll update this PR when I finish.

@aboch
Contributor

aboch commented May 23, 2016

@coolljt0725

docker-proxy dies once the daemon process dies, because we send SIGTERM to it when the daemon process dies

Are you planning to keep this behavior? I was thinking app networking should not be disrupted if the daemon goes down. If an app relies on the userland-proxy functionality, I thought we should not kill the proxy.

@coolljt0725
Contributor Author

@aboch I tend not to keep this behavior, but the hard part is restoring the userland-proxy during restore: we have to restore it so that we can kill the userland-proxy once the container exits. I'm trying to find a good way to restore the userland-proxy.

@mavenugo
Contributor

ping @chenchun. Could you please give your feedback?

@chenchun
Contributor

chenchun commented May 27, 2016

@coolljt0725 I think you have to preserve docker-proxy, or containers running on the same bridge network will not be able to reach each other via mapped ports.
I can think of two ways to achieve this:
a) Make the docker-proxy process a child of the containerd-shim process. But I don't know if there is a good way to do that.
b) Preserve the pids of the docker-proxy processes along with the allocated ports in the store.
I took the second way.

type portMappingState struct {
    dbIndex   uint64 // datastore bookkeeping
    dbExists  bool   // datastore bookkeeping
    Pid       int    // pid of the preserved docker-proxy process
    Host      string // the key in store, hostip:port/tcp
    Container string // container side of the mapping
    Proto     string // tcp or udp
}
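For illustration, the (de)serialization such a store object needs could be plain JSON over the exported fields; this sketch assumes the Value/SetValue methods libnetwork's KV-store objects expose, and is not the actual change:

// continues the struct above; requires: import "encoding/json"

// Value serializes the exported fields for the KV store.
func (p *portMappingState) Value() []byte {
    b, err := json.Marshal(p)
    if err != nil {
        return nil
    }
    return b
}

// SetValue restores the state read back from the KV store.
func (p *portMappingState) SetValue(value []byte) error {
    return json.Unmarshal(value, p)
}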

But if users disabled the userland proxy, there is no need to worry about this.

@chenchun
Contributor

chenchun commented May 27, 2016

I also stored the bridge endpoints in the store instead of doing the replay work. I think it is the simpler way. @mavenugo, are you worried about all the possible ungraceful-shutdown issues as more state gets stored?

@chenchun
Contributor

the daemon passes the sandbox options that need to be restored to New on network controller creation; sandboxCleanup will clean up unused sandboxes and restore the needed ones.

I restored all of them because I think that, after invoking the new network controller, the daemon is supposed to remove the sandboxes of containers that stopped during shutdown.

@coolljt0725
Contributor Author

@chenchun docker-proxy is the hardest part of avoiding storing the bridge endpoint.

@coolljt0725
Contributor Author

@chenchun @aboch I updated the userland-proxy restore. In this implementation, we find the pid of the userland proxy and use os.FindProcess to restore the proxy process, so we can kill it once the container exits.

@aboch And for the port re-allocation, we use container.NetworkSettings.Ports to populate endpoint.extConnConfig.PortBindings (the code changes are on the daemon side), so we can reconstruct the portMapper exactly as before.
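A Go sketch of the proxy re-adoption described here (pid discovery is elided; note that on Unix os.FindProcess always succeeds, so a zero-signal probe is used to confirm the process is actually alive):

package proxysketch

import (
    "fmt"
    "os"
    "syscall"
)

// reattachProxy re-adopts a userland-proxy process left over from the
// previous daemon life, so it can be killed once the container exits.
// Discovering the pid (e.g. by matching the docker-proxy command line
// for the mapped port) is assumed to have happened already.
func reattachProxy(pid int) (*os.Process, error) {
    p, err := os.FindProcess(pid) // on Unix this never fails
    if err != nil {
        return nil, err
    }
    // Signal 0 checks existence/permission without delivering a signal.
    if err := p.Signal(syscall.Signal(0)); err != nil {
        return nil, fmt.Errorf("proxy pid %d is gone: %v", pid, err)
    }
    return p, nil
}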

@coolljt0725
Contributor Author

The daemon changes for network restore in this design are in coolljt0725/docker@f880a92

@@ -57,6 +57,9 @@ type Driver interface {
// Leave method is invoked when a Sandbox detaches from an endpoint.
Leave(nid, eid string) error

// Restore reconstructs the driver's internal state
Restore(nid, eid string, sboxKey string, ifInfo InterfaceInfo, options map[string]interface{}) error
Contributor

It should be possible to avoid adding this new method and instead reuse CreateEndpoint(..., restore bool) and Join(..., restore bool).

@aboch
Contributor

aboch commented Jun 10, 2016

@coolljt0725

I still think it is better to follow the existing logic, where the network driver takes care of restoring the resources it owns, and therefore restores its endpoints as it does its networks.

I got briefed more on the daemon restart use case, and it looks like this is meant mainly for daemon upgrades and other corner cases which are expected to complete in a reasonably small time window.
Given this, I would like to take back my comment about not killing the userland proxy on daemon shutdown. I think it is fine, also given that the embedded DNS will of course be down during the reload.

Based on the above thinking, I pushed a PR for bridge driver to take care of the endpoint and port mapping restore.

I see this will simplify your libnetwork PR and also your docker-side changes (no more need to re-build the sandbox options for the port bindings).

So at the moment the plan for daemon reload network support is to go with your changes up to the libnetwork level, plus the drivers' changes to manage the endpoint restore.

@aboch
Contributor

aboch commented Jun 11, 2016

Changes from this PR have been moved to #1244

@coolljt0725
Contributor Author

@aboch Thank you; I'm going to close this. Let's focus on #1244.
