Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] upgrade/restart container failed on AliOS4.9 #1470

Closed
Letty5411 opened this issue Jun 5, 2018 · 3 comments
Closed

[bug] upgrade/restart container failed on AliOS4.9 #1470

Letty5411 opened this issue Jun 5, 2018 · 3 comments
Assignees
Labels
kind/bug This is bug report for project

Comments

@Letty5411
Copy link
Contributor

Ⅰ. Issue Description

Due to PR:#1307 upgrade/restart related integration tests fail.

Ⅱ. Describe what happened

The error message is similar, so I'll take restart cmd as example.
Run the following script:

#sh -x 1.sh
+ set -x
+ pouch run -d --name test1 busybox
7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
+ pouch restart -t 1 test1
Error: {"message":"failed to create task for container(7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf): OCI runtime create failed: container_linux.go:265: starting container process caused \"process_linux.go:348: container init caused \\\"process_linux.go:331: running prestart hook 0 caused \\\\\\\"error running hook: exit status 1, stdout: , stderr: time=\\\\\\\\\\\\\\\"2018-06-05T19:01:49+08:00\\\\\\\\\\\\\\\" level=fatal msg=\\\\\\\\\\\\\\\"no sandbox present for 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf\\\\\\\\\\\\\\\" \\\\\\\\n\\\\\\\"\\\"\": unknown"}

The error will happen when this two CMDs run quickly one after the other in script, it can't be reproduced manually.

Log of pouchd:

DEBU[2018-06-05T19:01:49.019584085+08:00] Calling GET /v1.24/images/busybox/json, client @
DEBU[2018-06-05T19:01:49.01975094+08:00] Get a grpc client                             elapsed=6.867µs
INFO[2018-06-05T19:01:49.027294262+08:00] Calling POST /v1.24/containers/create?name=test1, client @
INFO[2018-06-05T19:01:49.027527851+08:00] create container with args: {"Cmd":null,"Entrypoint":[],"Env":null,"Image":"busybox","OnBuild":null,"Shell":null,"HostConfig":{"NetworkMode":"bridge","OomScoreAdj":-500,"RestartPolicy":{"Name":"no"},"BlkioDeviceReadBps":null,"BlkioDeviceReadIOps":null,"BlkioDeviceWriteBps":null,"BlkioDeviceWriteIOps":null,"BlkioWeightDevice":null,"DeviceCgroupRules":null,"Devices":[],"MemoryExtra":0,"MemorySwappiness":-1,"MemoryWmarkRatio":0,"OomKillDisable":false,"Ulimits":null},"NetworkingConfig":{}}
DEBU[2018-06-05T19:01:49.027614569+08:00] Get a grpc client                             elapsed=1.172µs
DEBU[0271] prepare snapshot                              key=7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf module="containerd/snapshot" parent="sha256:432b65032b9466b4dadcc5c7b11701e71d21c18400aae946b101ad16be62333a"
DEBU[0271] event published                               module="containerd/snapshot" ns=default topic="/snapshot/prepare" type=containerd.events.SnapshotPrepare
DEBU[2018-06-05T19:01:49.031876446+08:00] Get a grpc client                             elapsed=915ns
DEBU[0271] get snapshot mounts                           key=7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf module="containerd/snapshot"
DEBU[2018-06-05T19:01:49.032194936+08:00] Get a grpc client                             elapsed=888ns
DEBU[2018-06-05T19:01:49.035978988+08:00] bind volumes: []
DEBU[2018-06-05T19:01:49.035999159+08:00] Get a grpc client                             elapsed=1.012µs
DEBU[0271] stat snapshot                                 key=7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf module="containerd/snapshot"
DEBU[2018-06-05T19:01:49.036257307+08:00] Get a grpc client                             elapsed=897ns
INFO[2018-06-05T19:01:49.038856883+08:00] Calling POST /v1.24/containers/test1/start, client @
DEBU[2018-06-05T19:01:49.038947031+08:00] create endpoint for container [7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf] on network [bridge]
DEBU[2018-06-05T19:01:49.03902409+08:00] Assigning addresses for endpoint 7ba712d6's interface on network bridge
DEBU[2018-06-05T19:01:49.039043578+08:00] RequestAddress(LocalDefault/172.17.0.0/24, <nil>, map[])
DEBU[2018-06-05T19:01:49.041043437+08:00] Assigning addresses for endpoint 7ba712d6's interface on network bridge
DEBU[2018-06-05T19:01:49.042951347+08:00] Programming external connectivity on endpoint 7ba712d6 (6ddb9b11e080d087e5d64a126e6137e436ebd3a128ca4b25cf03b62a03af71e6)
DEBU[2018-06-05T19:01:49.043399984+08:00] Get a grpc client                             elapsed=1.19µs
INFO[2018-06-05T19:01:49.043446565+08:00] start to subscribe io, backend: jsonfile, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
INFO[2018-06-05T19:01:49.043474315+08:00] start to subscribe io, backend: jsonfile, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
INFO[2018-06-05T19:01:49.043643304+08:00] success to get image registry.hub.docker.com/library/busybox:latest, container id 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
DEBU[2018-06-05T19:01:49.043666661+08:00] Get a grpc client                             elapsed=1.068µs
DEBU[0271] stat snapshot                                 key=7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf module="containerd/snapshot"
DEBU[0271] get snapshot mounts                           key=7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf module="containerd/snapshot"
DEBU[0271] event published                               module="containerd/containers" ns=default topic="/containers/create" type=containerd.events.ContainerCreate
INFO[2018-06-05T19:01:49.045780117+08:00] success to new container: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
DEBU[0271] get snapshot mounts                           key=7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf module="containerd/snapshot"
INFO[0271] shim containerd-shim started                  address="/containerd-shim/default/7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf/shim.sock" debug=false module="containerd/tasks" pid=11965
INFO[0000] Firewalld running: false
DEBU[2018-06-05T19:01:49.257610762+08:00] sandbox set key processing took 117.550858ms for container 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
DEBU[0271] event published                               module="containerd/tasks" ns=default topic="/tasks/create" type=containerd.events.TaskCreate
INFO[2018-06-05T19:01:49.325172503+08:00] success to create task(pid=11981) in container(7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf)
DEBU[0271] event published                               module="containerd/tasks" ns=default topic="/tasks/start" type=containerd.events.TaskStart
INFO[2018-06-05T19:01:49.338689619+08:00] success to start task in container(7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf)
INFO[2018-06-05T19:01:49.338713881+08:00] success to add container, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
DEBU[2018-06-05T19:01:49.33954004+08:00] Calling GET /v1.24/containers/test1/json, client @
INFO[2018-06-05T19:01:49.379813872+08:00] the task has quit, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf, err: <nil>, exitcode: 0, time: 2018-06-05 11:01:49.379486687 +0000 UTC
DEBU[0271] event published                               module="containerd/events" ns=default topic="/tasks/exit" type=containerd.events.TaskExit
INFO[2018-06-05T19:01:49.408530839+08:00] Calling POST /v1.24/containers/test1/restart?t=1, client @
DEBU[2018-06-05T19:01:49.408603639+08:00] Get a grpc client                             elapsed=2.314µs
INFO[0271] shim reaped                                   id=7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf module="containerd/tasks"
DEBU[0271] event published                               module="containerd/containers" ns=default topic="/containers/delete" type=containerd.events.ContainerDelete
INFO[2018-06-05T19:01:49.421445522+08:00] success to destroy container: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
INFO[2018-06-05T19:01:49.421950564+08:00] close containerio backend: jsonfile, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
INFO[2018-06-05T19:01:49.421956289+08:00] finished to subscribe io, backend: jsonfile, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
INFO[2018-06-05T19:01:49.421966454+08:00] close containerio backend: jsonfile, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
DEBU[2018-06-05T19:01:49.421993707+08:00] remove endpoint(6ddb9b11e080d087e5d64a126e6137e436ebd3a128ca4b25cf03b62a03af71e6) on network(bridge)
DEBU[2018-06-05T19:01:49.422058146+08:00] Revoking external connectivity on endpoint 7ba712d6 (6ddb9b11e080d087e5d64a126e6137e436ebd3a128ca4b25cf03b62a03af71e6)
INFO[2018-06-05T19:01:49.421964651+08:00] finished to subscribe io, backend: jsonfile, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
DEBU[0271] event published                               module="containerd/tasks" ns=default topic="/tasks/delete" type=containerd.events.TaskDelete
ERRO[2018-06-05T19:01:49.431106789+08:00] failed to delete container, container id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf: container "7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf" in namespace "default": not found
DEBU[2018-06-05T19:01:49.484179559+08:00] Releasing addresses for endpoint 7ba712d6's interface on network bridge
DEBU[2018-06-05T19:01:49.484201879+08:00] ReleaseAddress(LocalDefault/172.17.0.0/24, 172.17.0.2)
DEBU[2018-06-05T19:01:49.500422311+08:00] start container 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf when restarting
DEBU[2018-06-05T19:01:49.500504067+08:00] create endpoint for container [7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf] on network [bridge]
DEBU[2018-06-05T19:01:49.500572154+08:00] Assigning addresses for endpoint 7ba712d6's interface on network bridge
DEBU[2018-06-05T19:01:49.500590902+08:00] RequestAddress(LocalDefault/172.17.0.0/24, <nil>, map[])
DEBU[2018-06-05T19:01:49.502208544+08:00] Assigning addresses for endpoint 7ba712d6's interface on network bridge
DEBU[2018-06-05T19:01:49.50392161+08:00] Programming external connectivity on endpoint 7ba712d6 (2fa3c9cfae3b1f592353836d0b231cb02de785394b7099abc25dc55ce650ebe2)
INFO[2018-06-05T19:01:49.504365396+08:00] start to subscribe io, backend: jsonfile, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
INFO[2018-06-05T19:01:49.504398369+08:00] close containerio backend: jsonfile, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
INFO[2018-06-05T19:01:49.504416361+08:00] close containerio backend: jsonfile, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
INFO[2018-06-05T19:01:49.504416251+08:00] start to subscribe io, backend: jsonfile, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
DEBU[2018-06-05T19:01:49.504419613+08:00] Get a grpc client                             elapsed=1.864µs
INFO[2018-06-05T19:01:49.504436509+08:00] finished to subscribe io, backend: jsonfile, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
INFO[2018-06-05T19:01:49.50443665+08:00] finished to subscribe io, backend: jsonfile, id: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
DEBU[2018-06-05T19:01:49.504428036+08:00] remove endpoint(2fa3c9cfae3b1f592353836d0b231cb02de785394b7099abc25dc55ce650ebe2) on network(bridge)
DEBU[2018-06-05T19:01:49.504546111+08:00] Revoking external connectivity on endpoint 7ba712d6 (2fa3c9cfae3b1f592353836d0b231cb02de785394b7099abc25dc55ce650ebe2)
INFO[2018-06-05T19:01:49.504768919+08:00] success to get image registry.hub.docker.com/library/busybox:latest, container id 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
DEBU[2018-06-05T19:01:49.504796217+08:00] Get a grpc client                             elapsed=1.403µs
DEBU[0271] stat snapshot                                 key=7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf module="containerd/snapshot"
DEBU[0271] get snapshot mounts                           key=7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf module="containerd/snapshot"
DEBU[0271] event published                               module="containerd/containers" ns=default topic="/containers/create" type=containerd.events.ContainerCreate
INFO[2018-06-05T19:01:49.507122935+08:00] success to new container: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf
DEBU[0271] get snapshot mounts                           key=7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf module="containerd/snapshot"
INFO[0271] shim containerd-shim started                  address="/containerd-shim/default/7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf/shim.sock" debug=false module="containerd/tasks" pid=12128
DEBU[2018-06-05T19:01:49.548333004+08:00] Releasing addresses for endpoint 7ba712d6's interface on network bridge
DEBU[2018-06-05T19:01:49.548372478+08:00] ReleaseAddress(LocalDefault/172.17.0.0/24, 172.17.0.2)
DEBU[2018-06-05T19:01:49.549740869+08:00] receive event: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf exit
INFO[2018-06-05T19:01:49.549763892+08:00] handle event: 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf exit
INFO[0271] shim reaped                                   id=7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf module="containerd/tasks"
DEBU[0271] event published                               module="containerd/containers" ns=default topic="/containers/delete" type=containerd.events.ContainerDelete
ERRO[2018-06-05T19:01:49.666298334+08:00] failed to create new containerd container: failed to create task for container(7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf): OCI runtime create failed: container_linux.go:265: starting container process caused "process_linux.go:348: container init caused \"process_linux.go:331: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: time=\\\\\\\"2018-06-05T19:01:49+08:00\\\\\\\" level=fatal msg=\\\\\\\"no sandbox present for 7ba712d6cde9bd55871ccefa5e8ba9d2cca68052db6b933ef588482d7bb0a1cf\\\\\\\" \\\\n\\\"\"": unknown
DEBU[2018-06-05T19:01:49.666739141+08:00] remove endpoint() on network(bridge)

Ⅲ. Describe what you expected to happen

Ⅳ. How to reproduce it (as minimally and precisely as possible)

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • pouch version (use pouch version):
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@pouchrobot pouchrobot added the kind/bug This is bug report for project label Jun 5, 2018
@HusterWan HusterWan added this to the v0.5.1-milestone milestone Jun 7, 2018
@HusterWan
Copy link
Contributor

@Letty5411 Hi, i have analysed the log you posted, the reason why your test case failed is the container exited immediately after run it.

create a containerd container have two part:

  • create a containerd container
  • run task

we have a hook to delete network endpoint and containerd container when the running container exit. so it will fail when use a deleted network sanbox to create a containerd task.

cc @allencloud

@allencloud
Copy link
Collaborator

Today @HusterWan and I discussed about it offline, and we have drawn a picture of it:
dingtalk_im_2782500773

And we found that in the test case of restart, here we have a scenario that when a have ran a container, we have no idea about when the container exits. If the container exits immediately, then container would trigger the hook in pouchd, and pouchd would execute the function markStoppendAndRelease. However, if the execution time is durationA, if in this duration, pouchd accepts another request to stop or start the container, some unexpected situation would happen.

markStoppendAndRelease would excute after exit of container and release the container IO and release the container endpoint or related network items. And we must guarantee the atomic of markStoppendAndRelease.

@allencloud
Copy link
Collaborator

I think this has been fixed in #1540

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is bug report for project
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants