Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: network not found #1473

Merged

Conversation

shaloulcy
Copy link
Contributor

@shaloulcy shaloulcy commented Jun 5, 2018

the network manager must be initialized after container manager has restored all containers, as the network manager need to get the real ActiveSandboxes which are returned by container manager?

Signed-off-by: Eric Li [email protected]

Ⅰ. Describe what this PR did

the network manager must be initialized after container manager has restored all containers, as the network manager need to get the real ActiveSandboxes which are returned by container manager?

what is real ActiveSandboxes?

For example, a container is running. Then we reboot the OS, the container status in metadata is running. But actually it is stopped. We must restore all containers, then we can get the real container status

Ⅱ. Does this pull request fix one issue?

fixed #1439

Ⅲ. Describe how you did it

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

@pouchrobot pouchrobot added areas/network kind/bug This is bug report for project size/S labels Jun 5, 2018
@codecov-io
Copy link

codecov-io commented Jun 5, 2018

Codecov Report

Merging #1473 into master will decrease coverage by 0.03%.
The diff coverage is 33.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1473      +/-   ##
==========================================
- Coverage   41.58%   41.55%   -0.04%     
==========================================
  Files         267      267              
  Lines       17339    17343       +4     
==========================================
- Hits         7211     7207       -4     
- Misses       9241     9247       +6     
- Partials      887      889       +2
Impacted Files Coverage Δ
daemon/mgr/container.go 50.66% <0%> (-0.16%) ⬇️
daemon/daemon.go 56.25% <0%> (ø) ⬆️
daemon/mgr/network.go 67.99% <100%> (-0.34%) ⬇️
ctrd/image.go 76.57% <0%> (-2.86%) ⬇️
ctrd/container.go 50% <0%> (+0.7%) ⬆️

@@ -49,9 +49,6 @@ import (
type ContainerMgr interface {
// 1. the following functions are related to regular container management

// Restore containers from meta store to memory and recover those container.
Restore(ctx context.Context) error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we should remove this part. Restore action is one of the essential functionality of ContainerManager. I think maybe your removing this is because the adding restore action in NewContainerManager . While I will add my reason in the following comments.

@@ -182,7 +179,7 @@ func NewContainerManager(ctx context.Context, store *meta.Store, cli ctrd.APICli

go mgr.execProcessGC()

return mgr, nil
return mgr, mgr.Restore(ctx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the intialization of container manager, I suggest we should not add restore action here. It will break the pure functionality of function NewContainerManager .
As for adding this feature, I am wondering if we could add this function in

	containerMgr, err := internal.GenContainerMgr(ctx, d)
	if err != nil {
		return err
	}
	d.containerMgr = containerMgr

As a result, the code will be like

	containerMgr, err := internal.GenContainerMgr(ctx, d)
	if err != nil {
		return err
	}
	d.containerMgr = containerMgr
    if err := containerMgr.Restore(); err != nil {
        return err
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine!

@shaloulcy shaloulcy force-pushed the network_not_found branch 2 times, most recently from 50a63b2 to 9bd27c5 Compare June 8, 2018 05:30
daemon/daemon.go Outdated
}

// ensure all containers have been restored
time.Sleep(100 * time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.Sleep is a tricky method to wait network init to be finished, how about use a channel to wait ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the Sleep is to prevent the situation described #1421. I want to catch the dead container exit status from statusCh. But the healthy container will never return exit status unless it exited.

networkMgr, err := internal.GenNetworkMgr(d.config, d)
if err != nil {
return err
}
d.networkMgr = networkMgr
containerMgr.(*mgr.ContainerManager).NetworkMgr = networkMgr

// Notes(ziren): we must call containerMgr.Restore after NetworkMgr initialized,
// otherwize will panic
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If change this, won't panic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the NetworkManange is nil, markStoppedAndRelease will return directly. So it won't panic

the network manager must be initialized after container manager has restored all containers,
as the network manager need get the real ActiveSandboxes which are returned by container manager

Signed-off-by: Eric Li <[email protected]>
@shaloulcy
Copy link
Contributor Author

@rudyfly @HusterWan I have update the codes, PTAL, thanks~

@rudyfly
Copy link
Collaborator

rudyfly commented Jun 21, 2018

LGTM

@pouchrobot pouchrobot added the LGTM one maintainer or community participant agrees to merge the pull reuqest. label Jun 21, 2018
@allencloud allencloud merged commit 8b1f8f3 into AliyunContainerService:master Jun 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
areas/network kind/bug This is bug report for project LGTM one maintainer or community participant agrees to merge the pull reuqest. size/S
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bug] network not found
6 participants