-
Notifications
You must be signed in to change notification settings - Fork 949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bugfix: network not found #1473
bugfix: network not found #1473
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1473 +/- ##
==========================================
- Coverage 41.58% 41.55% -0.04%
==========================================
Files 267 267
Lines 17339 17343 +4
==========================================
- Hits 7211 7207 -4
- Misses 9241 9247 +6
- Partials 887 889 +2
|
daemon/mgr/container.go
Outdated
@@ -49,9 +49,6 @@ import ( | |||
type ContainerMgr interface { | |||
// 1. the following functions are related to regular container management | |||
|
|||
// Restore containers from meta store to memory and recover those container. | |||
Restore(ctx context.Context) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think we should remove this part. Restore action is one of the essential functionality of ContainerManager. I think maybe your removing this is because the adding restore action in NewContainerManager
. While I will add my reason in the following comments.
daemon/mgr/container.go
Outdated
@@ -182,7 +179,7 @@ func NewContainerManager(ctx context.Context, store *meta.Store, cli ctrd.APICli | |||
|
|||
go mgr.execProcessGC() | |||
|
|||
return mgr, nil | |||
return mgr, mgr.Restore(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the intialization of container manager, I suggest we should not add restore action here. It will break the pure functionality of function NewContainerManager
.
As for adding this feature, I am wondering if we could add this function in
containerMgr, err := internal.GenContainerMgr(ctx, d)
if err != nil {
return err
}
d.containerMgr = containerMgr
As a result, the code will be like
containerMgr, err := internal.GenContainerMgr(ctx, d)
if err != nil {
return err
}
d.containerMgr = containerMgr
if err := containerMgr.Restore(); err != nil {
return err
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine!
50a63b2
to
9bd27c5
Compare
daemon/daemon.go
Outdated
} | ||
|
||
// ensure all containers have been restored | ||
time.Sleep(100 * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time.Sleep
is a tricky method to wait network init to be finished, how about use a channel to wait ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the Sleep is to prevent the situation described #1421. I want to catch the dead container exit status from statusCh. But the healthy container will never return exit status unless it exited.
networkMgr, err := internal.GenNetworkMgr(d.config, d) | ||
if err != nil { | ||
return err | ||
} | ||
d.networkMgr = networkMgr | ||
containerMgr.(*mgr.ContainerManager).NetworkMgr = networkMgr | ||
|
||
// Notes(ziren): we must call containerMgr.Restore after NetworkMgr initialized, | ||
// otherwize will panic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If change this, won't panic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the NetworkManange is nil, markStoppedAndRelease will return directly. So it won't panic
the network manager must be initialized after container manager has restored all containers, as the network manager need get the real ActiveSandboxes which are returned by container manager Signed-off-by: Eric Li <[email protected]>
9bd27c5
to
8eae5bb
Compare
@rudyfly @HusterWan I have update the codes, PTAL, thanks~ |
LGTM |
the network manager must be initialized after container manager has restored all containers, as the network manager need to get the real ActiveSandboxes which are returned by container manager?
Signed-off-by: Eric Li [email protected]
Ⅰ. Describe what this PR did
the network manager must be initialized after container manager has restored all containers, as the network manager need to get the real ActiveSandboxes which are returned by container manager?
what is real ActiveSandboxes?
For example, a container is running. Then we reboot the OS, the container status in metadata is running. But actually it is stopped. We must restore all containers, then we can get the real container status
Ⅱ. Does this pull request fix one issue?
fixed #1439
Ⅲ. Describe how you did it
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews