Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race in tty integration test with slow container startup #2186

Merged
merged 1 commit into from
Dec 26, 2019

Conversation

liggitt
Copy link
Contributor

@liggitt liggitt commented Dec 18, 2019

This fixes a race condition in the tty runc run [terminal=false] integration test (found while testing #2185).

# Make sure that the handling of non-detached IO is done properly. See #1354.
(
__runc run test_busybox
) &
wait_for_container 15 1 test_busybox
testcontainer test_busybox running

Currently, the test assumes that as soon as runc state returns information about the container, it is running and testcontainer test_busybox running should succeed.

When the container start is slow (because of load or synthetic delay), the runc state command can return a created state, allowing wait_for_container to succeed, but letting the testcontainer test_busybox running command fail.

This race can be demonstrated by synthetically slowing down the transition from created to running:

diff --git a/libcontainer/container_linux.go b/libcontainer/container_linux.go
index 61195572..9afe71b3 100644
--- a/libcontainer/container_linux.go
+++ b/libcontainer/container_linux.go
@@ -302,7 +302,7 @@ func awaitProcessExit(pid int, exit <-chan struct{}) <-chan struct{} {
                        select {
                        case <-exit:
                                return
-                       case <-time.After(time.Millisecond * 100):
+                       case <-time.After(time.Millisecond * 200):
                                stat, err := system.Stat(pid)
                                if err != nil || stat.State == system.Zombie {
                                        close(isDead)
@@ -318,6 +318,7 @@ func awaitFifoOpen(path string) <-chan openResult {
        fifoOpened := make(chan openResult)
        go func() {
                f, err := os.OpenFile(path, os.O_RDONLY, 0)
+               time.Sleep(100*time.Millisecond)
                if err != nil {
                        fifoOpened <- openResult{err: newSystemErrorWithCause(err, "open exec fifo for reading")}
                        return

Without this change:

$ make integration TESTPATH=/tty.bats
...
not ok 9 runc run [terminal=false]
# (from function `testcontainer' in file tests/integration/helpers.bash, line 258,
#  in test file tests/integration/tty.bats, line 209)
#   `testcontainer test_busybox running' failed
# runc list (status=0):
# ID          PID         STATUS      BUNDLE      CREATED     OWNER
# runc spec (status=0):
# 
# runc state test_busybox (status=0):
# {
#   "ociVersion": "1.0.1-dev",
#   "id": "test_busybox",
#   "pid": 5317,
#   "status": "created",
#   "bundle": "/tmp/busyboxtest",
#   "rootfs": "/tmp/busyboxtest/rootfs",
#   "created": "2019-12-18T16:57:42.087082814Z",
#   "owner": ""
# }
# runc state test_busybox (status=0):
# {
#   "ociVersion": "1.0.1-dev",
#   "id": "test_busybox",
#   "pid": 5317,
#   "status": "created",
#   "bundle": "/tmp/busyboxtest",
#   "rootfs": "/tmp/busyboxtest/rootfs",
#   "created": "2019-12-18T16:57:42.087082814Z",
#   "owner": ""
# }
# runc list (status=0):
# ID             PID         STATUS      BUNDLE             CREATED                          OWNER
# test_busybox   5317        created     /tmp/busyboxtest   2019-12-18T16:57:42.087082814Z   root
# runc kill test_busybox KILL (status=0):
# 
# Command "eval __runc state 'test_busybox' | grep -q 'stopped'" failed 10 times. Output: time="2019-12-18T16:57:51Z" level=error msg="container \"test_busybox\" does not exist"
# container "test_busybox" does not exist
# runc delete test_busybox (status=1):
# time="2019-12-18T16:57:52Z" level=error msg="container \"test_busybox\" does not exist"
# container "test_busybox" does not exist
...
Makefile:79: recipe for target 'localintegration' failed
make: *** [localintegration] Error 1
make: *** [Makefile:76: integration] Error 2

With this change:

$ make integration TESTPATH=/tty.bats
...
ok 1 runc run [tty ptsname]
ok 2 runc run [tty owner]
ok 3 runc run [tty owner] ({u,g}id != 0)
ok 4 runc exec [tty ptsname]
ok 5 runc exec [tty owner]
ok 6 runc exec [tty owner] ({u,g}id != 0)
ok 7 runc exec [tty consolesize]
ok 8 runc create [terminal=false]
ok 9 runc run [terminal=false]
ok 10 runc run -d [terminal=false]

This PR allows providing a desired status to wait_for_container and additionally waiting until runc state includes that status.

cc @mrunalp @Random-Liu

@liggitt
Copy link
Contributor Author

liggitt commented Dec 18, 2019

looks like the travis job that is failing is also failing against master with the same error messages - https://travis-ci.org/opencontainers/runc/builds/626104372?utm_medium=notification&utm_source=github_status

@AkihiroSuda
Copy link
Member

#2175 tracks CI failure

@crosbymichael crosbymichael merged commit 52951a7 into opencontainers:master Dec 26, 2019
@liggitt liggitt deleted the tty-test-race branch December 27, 2019 04:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants