test_pid_file failure #3734

Open
jrbourbeau opened this issue Apr 21, 2020 · 0 comments
Labels
flaky test Intermittent failures on CI.

Comments

@jrbourbeau
Member

I've started seeing distributed/cli/tests/test_dask_scheduler.py::test_pid_file sporadically fail in CI builds (xref https://travis-ci.org/github/dask/distributed/jobs/677794876#L1468-L1549)

Traceback:
________________________________ test_pid_file _________________________________

loop = <tornado.platform.asyncio.AsyncIOLoop object at 0x7f07c59270b8>

    def test_pid_file(loop):
        def check_pidfile(proc, pidfile):
            start = time()
            while not os.path.exists(pidfile):
                sleep(0.01)
                assert time() < start + 5

            text = False
            start = time()
            while not text:
                sleep(0.01)
                assert time() < start + 5
                with open(pidfile) as f:
                    text = f.read()
            pid = int(text)
            if sys.platform.startswith("win"):
                # On Windows, `dask-XXX` invokes the dask-XXX.exe
                # shim, but the PID is written out by the child Python process
                assert pid
            else:
                assert proc.pid == pid

        with tmpfile() as s:
            with popen(["dask-scheduler", "--pid-file", s, "--no-dashboard"]) as sched:
>               check_pidfile(sched, s)

distributed/cli/tests/test_dask_scheduler.py:207:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../miniconda/envs/dask-distributed/lib/python3.6/contextlib.py:88: in __exit__
    next(self.gen)
distributed/utils_test.py:1041: in popen
    terminate_process(proc)
distributed/utils_test.py:1005: in terminate_process
    proc.wait(10)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <subprocess.Popen object at 0x7f07c592cda0>, timeout = 10
endtime = 190.384243671

    def wait(self, timeout=None, endtime=None):
        """Wait for child process to terminate.  Returns returncode
        attribute."""
        if self.returncode is not None:
            return self.returncode

        if endtime is not None:
            warnings.warn(
                "'endtime' argument is deprecated; use 'timeout'.",
                DeprecationWarning,
                stacklevel=2)
        if endtime is not None or timeout is not None:
            if endtime is None:
                endtime = _time() + timeout
            elif timeout is None:
                timeout = self._remaining_time(endtime)

        if endtime is not None:
            # Enter a busy loop if we have a timeout.  This busy loop was
            # cribbed from Lib/threading.py in Thread.wait() at r71065.
            delay = 0.0005 # 500 us -> initial delay of 1 ms
            while True:
                if self._waitpid_lock.acquire(False):
                    try:
                        if self.returncode is not None:
                            break  # Another thread waited.
                        (pid, sts) = self._try_wait(os.WNOHANG)
                        assert pid == self.pid or pid == 0
                        if pid == self.pid:
                            self._handle_exitstatus(sts)
                            break
                    finally:
                        self._waitpid_lock.release()
                remaining = self._remaining_time(endtime)
                if remaining <= 0:
>                   raise TimeoutExpired(self.args, timeout)
E                   subprocess.TimeoutExpired: Command '['/home/travis/miniconda/envs/dask-distributed/bin/dask-scheduler', '--pid-file', '/tmp/tmp3afymwkx.', '--no-dashboard']' timed out after 10 seconds

../../../miniconda/envs/dask-distributed/lib/python3.6/subprocess.py:1469: TimeoutExpired
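
For reference, the exception at the bottom of the traceback comes from `Popen.wait(timeout)` being called on a child that is still alive when the timeout elapses. Below is a minimal sketch of that failure mode. It assumes a sleeping Python child as a stand-in for the `dask-scheduler` process, and `wait_or_kill` is a hypothetical helper written for illustration, not the `terminate_process` function from `distributed/utils_test.py`.

```python
import subprocess
import sys


def wait_or_kill(proc, timeout):
    """Hypothetical helper: wait for `proc`, and kill it if the wait times out.

    Returns True if the process exited on its own within `timeout` seconds.
    """
    try:
        proc.wait(timeout)
        return True
    except subprocess.TimeoutExpired:
        # Same exception type as in the traceback: the child was still
        # running when the timeout elapsed.
        proc.kill()
        proc.wait()  # reap the child so it does not linger as a zombie
        return False


# Stand-in for a child process that does not shut down promptly (assumption).
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exited_cleanly = wait_or_kill(child, timeout=0.5)
```

Whether escalating to `kill()` is appropriate for this test is a separate question; the sketch only shows how the `TimeoutExpired` arises and one way it could be handled.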