criu int test failures #2475

Closed
kolyshkin opened this issue Jun 16, 2020 · 5 comments · Fixed by #3546

kolyshkin commented Jun 16, 2020

Saw some C/R test failures while working on #2411. Excerpt from logs at https://travis-ci.org/github/opencontainers/runc/jobs/699060158:

not ok 12 checkpoint --lazy-pages and restore
# (from function `__runc' in file tests/integration/helpers.bash, line 54,
#  in test file tests/integration/checkpoint.bats, line 183)
#   `__runc --criu "$CRIU" restore -d --work-path ./image-dir --image-path ./image-dir --lazy-pages test_busybox_restore <&60 >&51 2>&51' failed
# runc list (status=0):
# ID          PID         STATUS      BUNDLE      CREATED     OWNER
# runc spec (status=0):
#
# runc state test_busybox (status=0):
# {
#   "ociVersion": "1.0.2-dev",
#   "id": "test_busybox",
#   "pid": 3967,
#   "status": "running",
#   "bundle": "/tmp/busyboxtest",
#   "rootfs": "/tmp/busyboxtest/rootfs",
#   "created": "2020-06-16T19:53:26.677415824Z",
#   "owner": ""
# }
# Warn  (criu/kerndat.c:869): Can't keep kdat cache on non-tempfs
# runc list (status=0):
# ID             PID         STATUS      BUNDLE             CREATED                          OWNER
# test_busybox   3967        running     /tmp/busyboxtest   2020-06-16T19:53:26.677415824Z   root
# runc kill test_busybox KILL (status=0):
#
# runc delete test_busybox (status=0):
#
not ok 13 checkpoint and restore in external network namespace
# (from function `teardown_busybox' in file tests/integration/helpers.bash, line 453,
#  from function `teardown' in test file tests/integration/checkpoint.bats, line 14)
#   `teardown_busybox' failed
# runc list (status=0):
# ID          PID         STATUS      BUNDLE      CREATED     OWNER
# runc spec (status=0):
#
# runc run -d --console-socket /tmp/console.sock test_busybox (status=0):
#
# runc state test_busybox (status=0):
# {
#   "ociVersion": "1.0.2-dev",
#   "id": "test_busybox",
#   "pid": 4207,
#   "status": "running",
#   "bundle": "/tmp/busyboxtest",
#   "rootfs": "/tmp/busyboxtest/rootfs",
#   "created": "2020-06-16T19:53:28.545165269Z",
#   "owner": ""
# }
# runc --criu /usr/local/sbin/criu checkpoint --work-path ./work-dir test_busybox (status=0):
#
# runc state test_busybox (status=1):
# time="2020-06-16T19:53:28Z" level=error msg="container \"test_busybox\" does not exist"
# runc --criu /usr/local/sbin/criu restore -d --work-path ./work-dir --console-socket /tmp/console.sock test_busybox (status=0):
#
# (00.057051) Unlock network
# (00.057055) Running network-unlock scripts
# (00.057056) 	RPC
# iptables-restore v1.8.2 (nf_tables):
# line 5: CHAIN_USER_DEL failed (Device or resource busy): chain CRIU
# (00.071076) Error (criu/util.c:618): exited, status=4
# ip6tables-restore v1.8.2 (nf_tables):
# line 5: CHAIN_USER_DEL failed (Device or resource busy): chain CRIU
# (00.083080) Error (criu/util.c:618): exited, status=4
# runc state test_busybox (status=0):
# {
#   "ociVersion": "1.0.2-dev",
#   "id": "test_busybox",
#   "pid": 4295,
#   "status": "running",
#   "bundle": "/tmp/busyboxtest",
#   "rootfs": "/tmp/busyboxtest/rootfs",
#   "created": "2020-06-16T19:53:28.844266921Z",
#   "owner": ""
# }
# old network namespace inode 4026532560
# new network namespace inode 4026532560
# runc --criu /usr/local/sbin/criu checkpoint --work-path ./work-dir test_busybox (status=0):
#
# runc state test_busybox (status=1):
# time="2020-06-16T19:53:29Z" level=error msg="container \"test_busybox\" does not exist"
# runc --criu /usr/local/sbin/criu restore -d --work-path ./work-dir --console-socket /tmp/console.sock test_busybox (status=0):
#
# (00.053987) Unlock network
# (00.053991) Running network-unlock scripts
# (00.053993) 	RPC
# iptables-restore v1.8.2 (nf_tables):
# line 5: CHAIN_USER_DEL failed (Device or resource busy): chain CRIU
# (00.067055) Error (criu/util.c:618): exited, status=4
# ip6tables-restore v1.8.2 (nf_tables):
# line 5: CHAIN_USER_DEL failed (Device or resource busy): chain CRIU
# (00.087037) Error (criu/util.c:618): exited, status=4
# runc state test_busybox (status=0):
# {
#   "ociVersion": "1.0.2-dev",
#   "id": "test_busybox",
#   "pid": 4414,
#   "status": "running",
#   "bundle": "/tmp/busyboxtest",
#   "rootfs": "/tmp/busyboxtest/rootfs",
#   "created": "2020-06-16T19:53:29.181115942Z",
#   "owner": ""
# }
# old network namespace inode 4026532560
# new network namespace inode 4026532560
# runc list (status=0):
# ID             PID         STATUS      BUNDLE             CREATED                          OWNER
# test_busybox   0           stopped     /tmp/busyboxtest   2020-06-16T19:53:29.181115942Z   root
# runc kill test_busybox KILL (status=1):
# time="2020-06-16T19:53:29Z" level=error msg="container not running"
# runc delete test_busybox (status=0):
#
@kolyshkin

#2476 to show CRIU errors

@kolyshkin

Got another very similar failure while running CI for #2487 (unrelated!). Here's an excerpt from https://travis-ci.org/github/opencontainers/runc/jobs/716225208:

not ok 12 checkpoint --lazy-pages and restore
# (from function `__runc' in file tests/integration/helpers.bash, line 57,
#  in test file tests/integration/checkpoint.bats, line 194)
#   `__runc --criu "$CRIU" restore -d --work-path ./image-dir --image-path ./image-dir --lazy-pages test_busybox_restore <&${in_r} >&${out_w} 2>&${out_w}' failed
# runc list (status=0):
# ID          PID         STATUS      BUNDLE      CREATED     OWNER
# runc spec (status=0):
# 
# runc state test_busybox (status=0):
# {
#   "ociVersion": "1.0.2-dev",
#   "id": "test_busybox",
#   "pid": 4178,
#   "status": "running",
#   "bundle": "/tmp/busyboxtest",
#   "rootfs": "/tmp/busyboxtest/rootfs",
#   "created": "2020-08-09T02:02:58.166214958Z",
#   "owner": ""
# }
# Warn  (criu/kerndat.c:869): Can't keep kdat cache on non-tempfs
# runc list (status=0):
# ID             PID         STATUS      BUNDLE             CREATED                          OWNER
# test_busybox   4178        running     /tmp/busyboxtest   2020-08-09T02:02:58.166214958Z   root
# runc kill test_busybox KILL (status=0):
# 
# runc delete test_busybox (status=0):
# 
# /usr/local/libexec/bats-core/bats-exec-test: line 260: 18: Bad file descriptor
not ok 13 checkpoint and restore in external network namespace

Alas, the fix in #2476 is wrong :( Here's a (supposedly) correct one: #2548

@kolyshkin

Another failure (from https://travis-ci.org/github/opencontainers/runc/jobs/725647287), another bug in the test code :(

not ok 12 checkpoint --lazy-pages and restore
# (in test file tests/integration/checkpoint.bats, line 202)
#   `[ $ret -eq 0 ]' failed
# runc list (status=0):
# ID          PID         STATUS      BUNDLE      CREATED     OWNER
# runc spec (status=0):
# 
# runc state test_busybox (status=0):
# {
#   "ociVersion": "1.0.2-dev",
#   "id": "test_busybox",
#   "pid": 4104,
#   "status": "running",
#   "bundle": "/tmp/busyboxtest",
#   "rootfs": "/tmp/busyboxtest/rootfs",
#   "created": "2020-09-09T16:31:16.67180128Z",
#   "owner": ""
# }
# grep: ./work-dir/restore.log: No such file or directory
# Warn  (criu/kerndat.c:869): Can't keep kdat cache on non-tempfs
# runc list (status=0):
# ID             PID         STATUS      BUNDLE             CREATED                         OWNER
# test_busybox   4104        running     /tmp/busyboxtest   2020-09-09T16:31:16.67180128Z   root
# runc kill test_busybox KILL (status=0):
# 
# runc delete test_busybox (status=0):
# 
/usr/local/libexec/bats-core/bats-exec-test: line 271:  4119 Killed                  __runc --criu "$CRIU" checkpoint --lazy-pages --page-server 0.0.0.0:${port} --status-fd ${lazy_w} --work-path ./work-dir --image-path ./image-dir test_busybox  (wd: /tmp/busyboxtest)
/usr/local/libexec/bats-core/bats-exec-test: line 271:  4181 Killed                  ${CRIU} lazy-pages --page-server --address 127.0.0.1 --port ${port} -D image-dir  (wd: /tmp/busyboxtest)
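
The `grep: ./work-dir/restore.log: No such file or directory` line above shows the error-reporting step itself failing, which hides whatever CRIU actually logged. A minimal sketch of a more tolerant helper follows, assuming CRIU marks error lines with `Error (` as in the earlier excerpts. The function name `show_criu_errors` is hypothetical and this is not the change made in any of the referenced PRs.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print CRIU error lines from a log file,
# tolerating a log that was never created.
show_criu_errors() {
	local log="$1"
	if [ ! -f "$log" ]; then
		# Report the missing log on stderr instead of failing.
		echo "no CRIU log at $log" >&2
		return 0
	fi
	# grep exits non-zero when nothing matches; do not treat that as a failure.
	grep -F 'Error (' "$log" || true
}
```

A test could call `show_criu_errors ./work-dir/restore.log` after a failed restore. Which directory the log actually lands in depends on the `--work-path` passed to runc, so the path here is only an example.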

@kolyshkin

#2581 to (finally?) fix showing errors 😅

@kolyshkin

Filed as checkpoint-restore/criu#1338
