Windows/Python 2.7 tests of dask-distributed failing on master/v0.10.0 #1738

Closed
shoyer opened this issue Nov 23, 2017 · 12 comments · Fixed by #2261
Comments

@shoyer
Member

shoyer commented Nov 23, 2017

Python 2.7 builds on Windows are failing:
https://ci.appveyor.com/project/shoyer/xray/build/1.0.3018

The tests that are failing are all variations of test_dask_distributed_integration_test. Example error message:

=================================== ERRORS ====================================
_____ ERROR at teardown of test_dask_distributed_integration_test[scipy] ______
    @pytest.fixture
    def loop():
        with pristine_loop() as loop:
            # Monkey-patch IOLoop.start to wait for loop stop
            orig_start = loop.start
            is_stopped = threading.Event()
            is_stopped.set()
            def start():
                is_stopped.clear()
                try:
                    orig_start()
                finally:
                    is_stopped.set()
            loop.start = start
    
            yield loop
            # Stop the loop in case it's still running
            try:
                loop.add_callback(loop.stop)
            except RuntimeError as e:
                if not re.match("IOLoop is clos(ed|ing)", str(e)):
                    raise
            else:
>               is_stopped.wait()
C:\Python27-conda64\envs\test_env\lib\site-packages\distributed\utils_test.py:102: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
C:\Python27-conda64\envs\test_env\lib\contextlib.py:24: in __exit__
    self.gen.next()
C:\Python27-conda64\envs\test_env\lib\site-packages\distributed\utils_test.py:139: in pristine_loop
    loop.close(all_fds=True)
C:\Python27-conda64\envs\test_env\lib\site-packages\tornado\ioloop.py:716: in close
    self.remove_handler(self._waker.fileno())
C:\Python27-conda64\envs\test_env\lib\site-packages\tornado\platform\common.py:91: in fileno
    return self.reader.fileno()
C:\Python27-conda64\envs\test_env\lib\socket.py:228: in meth
    return getattr(self._sock,name)(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
args = (<socket._closedsocket object at 0x00000000131F27F0>, 'fileno')
    def _dummy(*args):
>       raise error(EBADF, 'Bad file descriptor')
E       error: [Errno 9] Bad file descriptor
C:\Python27-conda64\envs\test_env\lib\socket.py:174: error
---------------------------- Captured stderr call -----------------------------
distributed.scheduler - INFO -   Scheduler at:      tcp://127.0.0.1:1094
distributed.worker - INFO -       Start worker at:       tcp://127.0.0.1:1096
distributed.worker - INFO -       Start worker at:       tcp://127.0.0.1:1095
distributed.worker - INFO -          Listening to:       tcp://127.0.0.1:1096
distributed.worker - INFO -          Listening to:       tcp://127.0.0.1:1095
distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:1094
distributed.worker - INFO - Waiting to connect to:       tcp://127.0.0.1:1094
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          1
distributed.worker - INFO -               Threads:                          1
distributed.worker - INFO -                Memory:                    2.00 GB
distributed.worker - INFO -                Memory:                    2.00 GB
distributed.worker - INFO -       Local Directory: C:\projects\xray\_test_worker-4043f797-3668-459a-9d5b-017dbc092ad5\worker-ozlw8t
distributed.worker - INFO -       Local Directory: C:\projects\xray\_test_worker-0b2d640d-07ba-493f-967c-f8d8de38e3b5\worker-_xbrz6
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - -------------------------------------------------
distributed.scheduler - INFO - Register tcp://127.0.0.1:1096
distributed.worker - INFO -         Registered to:       tcp://127.0.0.1:1094
distributed.worker - INFO - -------------------------------------------------
distributed.scheduler - INFO - Register tcp://127.0.0.1:1095
distributed.worker - INFO -         Registered to:       tcp://127.0.0.1:1094
distributed.worker - INFO - -------------------------------------------------
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:1095
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:1096
distributed.scheduler - INFO - Receive client connection: Client-06708a40-ce25-11e7-898c-00155d57f2dd
distributed.scheduler - INFO - Connection to client Client-06708a40-ce25-11e7-898c-00155d57f2dd broken
distributed.scheduler - INFO - Remove client Client-06708a40-ce25-11e7-898c-00155d57f2dd
distributed.scheduler - INFO - Close client connection: Client-06708a40-ce25-11e7-898c-00155d57f2dd
distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:1095
distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:1096
distributed.scheduler - INFO - Remove worker tcp://127.0.0.1:1095
distributed.scheduler - INFO - Remove worker tcp://127.0.0.1:1096
distributed.scheduler - INFO - Lost all workers
distributed.worker - INFO - Close compute stream
distributed.worker - INFO - Close compute stream
distributed.scheduler - INFO - Scheduler closing...
distributed.scheduler - INFO - Scheduler closing all comms
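
The exception at the bottom of this traceback comes from CPython's own socket module rather than from distributed itself: under Python 2.7, once a socket has been closed, any further use of it (including calling fileno()) hits a stub that raises error(EBADF, 'Bad file descriptor'). A minimal sketch of that behaviour, with no tornado or distributed involved:

    import socket

    sock = socket.socket()
    sock.close()
    # Python 2.7 swaps the closed socket's methods for stubs that raise
    # socket.error(EBADF, 'Bad file descriptor'); Python 3 would return -1 here.
    sock.fileno()  # socket.error: [Errno 9] Bad file descriptor

So the traceback suggests that pristine_loop() is calling loop.close(all_fds=True) after the loop's waker socket has already been closed, rather than pointing at a problem in the test body itself.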

@mrocklin any guesses about what this could be?

@mrocklin
Contributor

At first glance, no. Does this happen both on the latest release and on git master?

@shoyer
Member Author

shoyer commented Nov 23, 2017 via email

@mrocklin
Contributor

If I may generalize the question, did this start happening recently? Do we know what triggered this change?

@shoyer
Member Author

shoyer commented Nov 23, 2017 via email

@spencerahill
Contributor

FWIW aospy is having similar failures starting roughly at the same time: spencerahill/aospy#238

@mrocklin
Contributor

mrocklin commented Nov 23, 2017 via email

@shoyer
Member Author

shoyer commented Nov 23, 2017

Comparing a failed build to the last successful build on master, I find the following difference in the dependencies:

# passed, just before the v0.10.0 release
ca-certificates           2017.7.27.1                   0    conda-forge
dask                      0.15.4                     py_0    conda-forge
dask-core                 0.15.4                     py_0    conda-forge
distributed               1.19.3                   py27_0    conda-forge
setuptools                36.6.0                   py27_1    conda-forge

# failed, at the v0.10.0 release
ca-certificates           2017.11.5                     0    conda-forge
dask                      0.16.0                     py_0    conda-forge
dask-core                 0.16.0                     py_0    conda-forge
distributed               1.20.0                   py27_0    conda-forge
setuptools                36.7.2                   py27_0    conda-forge

It looks like the dask 0.16.0 or distributed 1.20.0 release is the most likely culprit. There were no Python changes to xarray in the v0.10.0 release commit (only setup.py and documentation changes).
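
One way to confirm that hypothesis (a sketch of a local check, not something this build did) would be to pin the last-known-good versions, e.g. conda install dask=0.15.4 distributed=1.19.3, and rerun the failing tests. Printing the versions actually in use at test time also makes passing and failing CI runs easier to compare:

    # Hypothetical snippet, not part of the xarray test suite: report the versions
    # that matter for this issue so logs from different builds can be diffed.
    import dask
    import distributed

    print("dask:", dask.__version__)
    print("distributed:", distributed.__version__)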

@mrocklin
Contributor

mrocklin commented Nov 23, 2017 via email

@pitrou

pitrou commented Nov 23, 2017

@shoyer is this using an old-ish Python 2.7 version?

@shoyer
Member Author

shoyer commented Nov 23, 2017

This is Python 2.7.14.

We do have a build against git master for dask, but only on Linux (Appveyor only gives us one free simultaneous build).

@pitrou

pitrou commented Nov 23, 2017

The error probably means the loop was already closed. I don't know why that is.

@shoyer
Member Author

shoyer commented Nov 29, 2017

One useful clue: when I run these tests on OS X, I get the following warnings:

========================================= warnings summary ==========================================
xarray/tests/test_distributed.py::test_dask_distributed_integration_test[scipy]
  /Users/shoyer/conda/envs/xarray-py36/lib/python3.6/site-packages/distributed/utils_test.py:528: UserWarning: This test leaked 8 file descriptors
    warnings.warn("This test leaked %d file descriptors" % diff)
  /Users/shoyer/conda/envs/xarray-py36/lib/python3.6/site-packages/distributed/utils_test.py:538: UserWarning: This test leaked 15 MB of memory
    warnings.warn("This test leaked %d MB of memory" % diff)

xarray/tests/test_distributed.py::test_dask_distributed_integration_test[netcdf4]
  /Users/shoyer/conda/envs/xarray-py36/lib/python3.6/site-packages/distributed/utils_test.py:528: UserWarning: This test leaked 6 file descriptors
    warnings.warn("This test leaked %d file descriptors" % diff)

-- Docs: http://doc.pytest.org/en/latest/warnings.html
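
These warnings come from distributed's own per-test leak checks in utils_test. For local debugging, a rough equivalent (a sketch only, assuming psutil is available, which distributed itself depends on) is to compare the process's open file descriptors before and after the suspect code:

    # Rough sketch of a file-descriptor leak check, similar in spirit to the one
    # in distributed.utils_test but not the same code. POSIX-only as written; on
    # Windows, psutil.Process().num_handles() is the closer analogue.
    import psutil

    def run_suspect_code():
        # Placeholder standing in for whatever test body is being measured.
        return [open(__file__) for _ in range(3)]  # deliberately holds fds open

    proc = psutil.Process()
    before = proc.num_fds()
    handles = run_suspect_code()
    after = proc.num_fds()
    print("file descriptors still open after the call:", after - before)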
