
Windows Subsystem for Linux "Could not set timeout on TCP stream: [Errno 92] Protocol not available" #2543

Closed
jwilson8767 opened this issue Feb 25, 2019 · 15 comments


@jwilson8767

distributed/comm/tcp.py gives the warning "Could not set timeout on TCP stream: [Errno 92] Protocol not available" when spawning workers under Windows Subsystem for Linux, because it expects the TCP_USER_TIMEOUT socket option to work, and WSL does not appear to support it. I have opened an issue over at microsoft/WSL#3873 since I think this is an OS-level problem, but I wanted to raise it here as well to discuss workarounds and proper documentation. Currently it appears (though I'm having trouble confirming conclusively) that this allows worker processes to be orphaned following exceptions; while some get cleaned up with the message "distributed.comm.tcp - WARNING - Closing dangling stream in", many others remain running after the program has terminated.

What can we do to mitigate this or how do we better document this limitation?
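
For context, the warning comes from Distributed's attempt to set the TCP_USER_TIMEOUT socket option on each stream. Below is a minimal sketch of that failing call (illustrative only, not the actual distributed/comm/tcp.py code; the 30-second value is an arbitrary example):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Linux-only socket option; WSL1's network stack does not implement it
    sock.setsockopt(socket.SOL_TCP, socket.TCP_USER_TIMEOUT, 30 * 1000)  # milliseconds
except OSError as e:
    # On WSL1 this reports: [Errno 92] Protocol not available
    print(f"Could not set timeout on TCP stream: {e}")
finally:
    sock.close()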

Host OS: Windows 10.0.17134.619
Container OS: Ubuntu 18.04.2 LTS

jwilson@M:~$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> platform.version()
'#523-Microsoft Mon Dec 31 17:49:00 PST 2018'
>>> import sys
>>> sys.platform
'linux'

Thanks

@jwilson8767
Author

Could this be related to socket reuse, as it was for py-amqp in this issue: celery/py-amqp#145?

@mrocklin
Member

Thanks for raising the issue @jwilson8767 .

I personally have no particular experience here and am unlikely to be able to help. I just wanted to make sure that your well-worded issue here was acknowledged and appreciated.

Perhaps @beberg has suggestions? (no worries if not)

@jwilson8767
Author

Just to follow up, this issue hasn't progressed on the WSL side. I'm hoping the recently announced WSL2 will offer some changes that will make multiprocessing as a whole a lot more efficient, but I'm unsure what changes to the network stack may help this particular issue.

jurreht added a commit to jurreht/distributed that referenced this issue Jun 25, 2019
This works around an issue in WSL that causes Distributed with the
default config to crash on initialization. The fix checks whether
the platform is WSL and, if so, proceeds as if Windows is running,
which avoids the problem.
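
A minimal sketch of that kind of WSL check (the helper name is_wsl is hypothetical and not necessarily what #2802 uses):

import platform

def is_wsl() -> bool:
    # WSL kernels include "Microsoft"/"microsoft" in the release string,
    # e.g. '4.4.0-18362-Microsoft' (WSL1) or '...-microsoft-standard' (WSL2)
    return "microsoft" in platform.uname().release.lower()

# When is_wsl() is true, skip the TCP_USER_TIMEOUT setsockopt call,
# as is already done on Windows.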
@jurreht

jurreht commented Jun 25, 2019

After encountering this issue myself, I have added a workaround in #2802. This resolves the issue on WSL, as can be verified by running the test suite on a WSL setup. (WSL is not available in Travis, correct?)

@JSKenyon

#2802 has been closed for now, but it would be particularly awesome if the distributed scheduler could work under WSL. Unfortunately, I lack the expertise to be of much assistance. Did you manage to find a workaround @jwilson8767?

@jwilson8767
Author

jwilson8767 commented Aug 26, 2019

No. Because I couldn't use Dask reliably in my development environment, I haven't used it at all.

I do still have hope that WSL2 will not have the same issues at the networking level as WSL1.

@StianOvrevage

What is the status of this?

I'm getting

distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available

on dask 2.9.2 and it does not appear to be working.

@jwilson8767
Author

@StianOvrevage It hasn't been touched. I'm waiting for WSL2 to be fully released before I even attempt to use Dask on it: its networking stack appears to be both more robust and more complex, which has caused some people issues migrating tools/workflows from WSL1 to WSL2.

@StianOvrevage

Ok, thanks for the update! Looking forward to WSL2 then :)

@aidanmontare-edu

WSL 2 is available for some people:
https://docs.microsoft.com/en-us/windows/wsl/install-win10#update-to-wsl-2

I can't install it since I don't yet have the newest Windows 10 build, but has anyone tried this?

@dazzag24

I have WSL2 and I just re-ran a notebook in which I had previously observed this problem.

Whereas previously with WSL 1 I would observe this:

from dask.distributed import Client
client = Client()
----------------
distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available
distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available
distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available
distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available
distributed.comm.tcp - WARNING - Could not set timeout on TCP stream: [Errno 92] Protocol not available

with WSLv2 I now observe this:

from dask.distributed import Client
client = Client()
client
-----------------------------
Client    Scheduler: tcp://127.0.0.1:33239   Dashboard: http://127.0.0.1:8787/status 
Cluster    Workers: 4   Cores: 8   Memory: 13.22 GB

@aidanmontare-edu

That's cool!

I'm considering upgrading to WSL 2; the only problem is that I store all my files on the Windows file system, and Microsoft's FAQ says that version 2 doesn't have great performance across file systems.

lucventurini added a commit to lucventurini/pytritex that referenced this issue Aug 6, 2020
@zhpfu

zhpfu commented Oct 21, 2020

Is there a solution for the WSL1 version?

@jwilson8767
Author

@zhpfu Definitely not, as the issue is with WSL1's network stack itself.

@GenevieveBuckley
Contributor

Closing this issue. Anyone with this problem should try upgrading to WSL 2 (see here).

Thank you for the report @jwilson8767
