Windows/Python 2.7 tests of dask-distributed failing on master/v0.10.0 #1738

Closed
shoyer opened this issue Nov 23, 2017 · 12 comments · Fixed by #2261
Comments

@shoyer
Member

shoyer commented Nov 23, 2017

Python 2.7 builds on Windows are failing:
https://ci.appveyor.com/project/shoyer/xray/build/1.0.3018

The tests that are failing are all variations of test_dask_distributed_integration_test. Example error message:

=================================== ERRORS ====================================
_____ ERROR at teardown of test_dask_distributed_integration_test[scipy] ______
    @pytest.fixture
    def loop():
        with pristine_loop() as loop:
            # Monkey-patch IOLoop.start to wait for loop stop
            orig_start = loop.start
            is_stopped = threading.Event()
            is_stopped.set()
            def start():
                is_stopped.clear()
                try:
                    orig_start()
                finally:
                    is_stopped.set()
            loop.start = start
    
            yield loop
            # Stop the loop in case it's still running
            try:
                loop.add_callback(loop.stop)
            except RuntimeError as e:
                if not re.match("IOLoop is clos(ed|ing)", str(e)):
                    raise
            else:
>               is_stopped.wait()
C:\Python27-conda64\envs\test_env\lib\site-packages\distributed\utils_test.py:102: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
C:\Python27-conda64\envs\test_env\lib\contextlib.py:24: in __exit__
    self.gen.next()
C:\Python27-conda64\envs\test_env\lib\site-packages\distributed\utils_test.py:139: in pristine_loop
    loop.close(all_fds=True)
C:\Python27-conda64\envs\test_env\lib\site-packages\tornado\ioloop.py:716: in close
    self.remove_handler(self._waker.fileno())
C:\Python27-conda64\envs\test_env\lib\site-packages\tornado\platform\common.py:91: in fileno
    return self.reader.fileno()
C:\Python27-conda64\envs\test_env\lib\socket.py:228: in meth
    return getattr(self._sock,name)(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
args = (<socket._closedsocket object at 0x00000000131F27F0>, 'fileno')
    def _dummy(*args):
>       raise error(EBADF, 'Bad file descriptor')
E       error: [Errno 9] Bad file descriptor
C:\Python27-conda64\envs\test_env\lib\socket.py:174: error
---------------------------- Captured stderr call -----------------------------
distributed.scheduler - INFO -   Scheduler at:      tcp://127.0.0.1:1094
distributed.worker - INFO -       Start worker at:       tcp://127.0.0.1:1096
distributed.worker - INFO -       Start worker at:       tcp://127.0.0.1:1095
distributed.worker - INFO -          Listening to:       tcp://127.0.0.1:1096
distributed.worker - INFO -          Listening to:       tcp://127.0.0.1:1095
distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:1094
distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:1094
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          1
distributed.worker - INFO -               Threads:                          1
distributed.worker - INFO -                Memory:                    2.00 GB
distributed.worker - INFO -                Memory:                    2.00 GB
distributed.worker - INFO -       Local Directory: C:\projects\xray\_test_worker-4043f797-3668-459a-9d5b-017dbc092ad5\worker-ozlw8t
distributed.worker - INFO -       Local Directory: C:\projects\xray\_test_worker-0b2d640d-07ba-493f-967c-f8d8de38e3b5\worker-_xbrz6
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - -------------------------------------------------
distributed.scheduler - INFO - Register tcp://127.0.0.1:1096
distributed.worker - INFO -         Registered to:       tcp://127.0.0.1:1094
distributed.worker - INFO - -------------------------------------------------
distributed.scheduler - INFO - Register tcp://127.0.0.1:1095
distributed.worker - INFO -         Registered to:       tcp://127.0.0.1:1094
distributed.worker - INFO - -------------------------------------------------
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:1095
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:1096
distributed.scheduler - INFO - Receive client connection: Client-06708a40-ce25-11e7-898c-00155d57f2dd
distributed.scheduler - INFO - Connection to client Client-06708a40-ce25-11e7-898c-00155d57f2dd broken
distributed.scheduler - INFO - Remove client Client-06708a40-ce25-11e7-898c-00155d57f2dd
distributed.scheduler - INFO - Close client connection: Client-06708a40-ce25-11e7-898c-00155d57f2dd
distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:1095
distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:1096
distributed.scheduler - INFO - Remove worker tcp://127.0.0.1:1095
distributed.scheduler - INFO - Remove worker tcp://127.0.0.1:1096
distributed.scheduler - INFO - Lost all workers
distributed.worker - INFO - Close compute stream
distributed.worker - INFO - Close compute stream
distributed.scheduler - INFO - Scheduler closing...
distributed.scheduler - INFO - Scheduler closing all comms
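
The exception at the bottom of this traceback comes from CPython's own socket module rather than from distributed itself: under Python 2.7, once a socket has been closed, any further use of it (including calling fileno()) hits a stub that raises error(EBADF, 'Bad file descriptor'). A minimal sketch of that behaviour, with no tornado or distributed involved:

    import socket

    sock = socket.socket()
    sock.close()
    # Python 2.7 swaps the closed socket's methods for stubs that raise
    # socket.error(EBADF, 'Bad file descriptor'); Python 3 would return -1 here.
    sock.fileno()  # socket.error: [Errno 9] Bad file descriptor

So the traceback suggests that pristine_loop() is calling loop.close(all_fds=True) after the loop's waker socket has already been closed, rather than pointing at a problem in the test body itself.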

@mrocklin any guesses about what this could be?

@mrocklin
Contributor

At first glance, no. Does this happen both on the latest release and on git master?

@shoyer
Member Author

shoyer commented Nov 23, 2017 via email

@mrocklin
Contributor

If I may generalize the question, did this start happening recently? Do we know what triggered this change?

@shoyer
Member Author

shoyer commented Nov 23, 2017 via email

@spencerahill
Contributor

FWIW aospy is having similar failures starting roughly at the same time: spencerahill/aospy#238

@mrocklin
Contributor

mrocklin commented Nov 23, 2017 via email

@shoyer
Member Author

shoyer commented Nov 23, 2017

Comparing a failed build to the last successful build on master, I find the following difference in the dependencies:

# passed, just before the v0.10.0 release
ca-certificates           2017.7.27.1                   0    conda-forge
dask                      0.15.4                     py_0    conda-forge
dask-core                 0.15.4                     py_0    conda-forge
distributed               1.19.3                   py27_0    conda-forge
setuptools                36.6.0                   py27_1    conda-forge

# failed, at the v0.10.0 release
ca-certificates           2017.11.5                     0    conda-forge
dask                      0.16.0                     py_0    conda-forge
dask-core                 0.16.0                     py_0    conda-forge
distributed               1.20.0                   py27_0    conda-forge
setuptools                36.7.2                   py27_0    conda-forge

It looks like the dask 0.16.0 or distributed 1.20.0 release is the most likely culprit. There were no Python changes to xarray in the v0.10.0 release commit (only setup.py and documentation changes).
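
One way to confirm that hypothesis (a sketch of a local check, not something this build did) would be to pin the last-known-good versions, e.g. conda install dask=0.15.4 distributed=1.19.3, and rerun the failing tests. Printing the versions actually in use at test time also makes passing and failing CI runs easier to compare:

    # Hypothetical snippet, not part of the xarray test suite: report the versions
    # that matter for this issue so logs from different builds can be diffed.
    import dask
    import distributed

    print("dask:", dask.__version__)
    print("distributed:", distributed.__version__)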

@mrocklin
Contributor

mrocklin commented Nov 23, 2017 via email

@pitrou

pitrou commented Nov 23, 2017

@shoyer is this using an old-ish Python 2.7 version?

@shoyer
Member Author

shoyer commented Nov 23, 2017

This is Python 2.7.14.

We do have a build against git master for dask, but only on Linux (Appveyor only gives us one free simultaneous build).

@pitrou

pitrou commented Nov 23, 2017

The error probably means the loop was already closed. I don't know why that is.

@shoyer
Member Author

shoyer commented Nov 29, 2017

One useful clue: when I run these tests on OS X, I get the following warnings:

========================================= warnings summary ==========================================
xarray/tests/test_distributed.py::test_dask_distributed_integration_test[scipy]
  /Users/shoyer/conda/envs/xarray-py36/lib/python3.6/site-packages/distributed/utils_test.py:528: UserWarning: This test leaked 8 file descriptors
    warnings.warn("This test leaked %d file descriptors" % diff)
  /Users/shoyer/conda/envs/xarray-py36/lib/python3.6/site-packages/distributed/utils_test.py:538: UserWarning: This test leaked 15 MB of memory
    warnings.warn("This test leaked %d MB of memory" % diff)

xarray/tests/test_distributed.py::test_dask_distributed_integration_test[netcdf4]
  /Users/shoyer/conda/envs/xarray-py36/lib/python3.6/site-packages/distributed/utils_test.py:528: UserWarning: This test leaked 6 file descriptors
    warnings.warn("This test leaked %d file descriptors" % diff)

-- Docs: http://doc.pytest.org/en/latest/warnings.html
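
These warnings come from distributed's own per-test leak checks in utils_test. For local debugging, a rough equivalent (a sketch only, assuming psutil is available, which distributed itself depends on) is to compare the process's open file descriptors before and after the suspect code:

    # Rough sketch of a file-descriptor leak check, similar in spirit to the one
    # in distributed.utils_test but not the same code. POSIX-only as written; on
    # Windows, psutil.Process().num_handles() is the closer analogue.
    import psutil

    def run_suspect_code():
        # Placeholder standing in for whatever test body is being measured.
        return [open(__file__) for _ in range(3)]  # deliberately holds fds open

    proc = psutil.Process()
    before = proc.num_fds()
    handles = run_suspect_code()
    after = proc.num_fds()
    print("file descriptors still open after the call:", after - before)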
