How do I request and free a WSL interop socket? #10812

Closed
grepwood opened this issue Nov 22, 2023 · 20 comments

@grepwood

I'm using WSL2 in an enterprise environment. A recent "update" in WSL across the whole company has made it impossible for us to work.

We have a tailored systemd service that needs to run all the time. One of its child processes is a Windows binary, so it relies on WSL interop. Since the latest update, having /init registered with binfmt_misc as the interpreter for Windows binaries is no longer enough: the process also needs the environment variable WSL_INTEROP set to the path of a working WSL interop socket. These sockets usually live in /run/WSL, and the filename is the PID of the process that owns the socket followed by _interop, such as 666_interop.

I have tried three workarounds for this issue, and all of them are terrible.

  1. When the parent process starts, it locates the most recently modified interop socket (by mtime) and parasitizes it. This proved error-prone: as soon as the rightful owner of the socket exits, the socket disappears and our Windows binary becomes unusable.
  2. I noticed that systemd, being PID 1, spawns a child process /init with PID 2 that stays alive for the whole lifetime of the WSL2 instance. Parasitizing its socket was even worse, because that socket is special in that it does not work at all.
  3. Each time WSL interop is needed, the parent process repeats the search from point 1, so that new processes always get a fresh interop socket. This proved slightly more reliable, but it is still vulnerable: there can be moments when no valid socket exists.
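
For readers who want to see what the search in points 1 and 3 amounts to, here is a minimal sketch. It assumes only what is described above (sockets named `<pid>_interop` under /run/WSL); the function name `newest_interop` is made up for illustration, and none of this is a documented WSL interface.

```sh
# Print the most recently modified "<pid>_interop" entry in a directory.
# Defaults to /run/WSL, the location mentioned in this issue.
# The body runs in a subshell so its variables do not leak to the caller.
newest_interop() (
    dir="${1:-/run/WSL}"
    newest=""
    for f in "$dir"/*_interop; do
        [ -e "$f" ] || continue      # an unmatched glob stays literal; skip it
        # -nt is true when $f has a newer modification time than $newest
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    # Exit status is non-zero when no candidate was found.
    [ -n "$newest" ] && printf '%s\n' "$newest"
)
```

A caller would then do something like `WSL_INTEROP=$(newest_interop) some-binary.exe`. The check uses `-e` so the sketch can be tried on plain files; on a real system `-S` (is a socket) would be stricter. The race described in point 3 remains: the socket can disappear between the search and the exec.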

So my question is: how can I properly request an exclusive WSL interop socket when I need one, and free it when my process no longer needs it? Like good old-fashioned malloc and free in C.

@OneBlue
Collaborator

OneBlue commented Nov 22, 2023

@grepwood: WSL_INTEROP isn't required to run windows binaries.

$ unset WSL_INTEROP
$ wsl_build cmd.exe echo foo
Microsoft Windows [Version 10.0.22621.2715]
(c) Microsoft Corporation. All rights reserved.

C:\Users\piboulay\Desktop\repos\wsl_build>

Can you share repro instructions for your situation where windows binaries aren't executed properly ?

@grepwood
Author

grepwood commented Nov 22, 2023

> @grepwood: WSL_INTEROP isn't required to run windows binaries.

I'm sorry, but that doesn't match what we see on our end. Other team members were able to work around this by placing the Windows binary in question on a Windows drive, so that WSL executes it from a location under /mnt/c, where C:\ is mounted. There is also another way, which involves editing Windows registry entries, but we fear that editing the registry will aggravate our company's IT Security, so we would rather not rely on it.

> Can you share repro instructions for your situation where windows binaries aren't executed properly ?

Of course.

  1. Download https://github.com/sakai135/wsl-vpnkit/releases/download/v0.4.1/wsl-vpnkit.tar.gz - this is the systemd service we need, but we're going to tailor it a bit to better fit the enterprise environment I'm working in.
  2. Extract these files from the tarball to these locations:
app/wsl-gvproxy.exe -> /usr/local/bin/wsl-gvproxy.exe
app/wsl-vm -> /usr/local/bin/wsl-vm
app/wsl-vpnkit -> /usr/local/bin/wsl-vpnkit
app/wsl-vpnkit.service -> /lib/systemd/system/wsl-vpnkit.service
  3. Create a symlink /etc/systemd/system/multi-user.target.wants/wsl-vpnkit.service that points to /lib/systemd/system/wsl-vpnkit.service.
  4. Delete the downloaded tarball. It is no longer needed.
  5. Edit /lib/systemd/system/wsl-vpnkit.service because it's messy. It should look more like this:
[Unit]
Description=Provides network connectivity to WSL 2 when blocked by VPN
After=network.target

[Service]
ExecStart=/usr/local/bin/wsl-vpnkit
Restart=always
KillMode=control-group

[Install]
WantedBy=multi-user.target
  6. Edit /usr/local/bin/wsl-vpnkit. Since we're following the Filesystem Hierarchy Standard, we don't need hard-coded absolute paths to any wsl-vpnkit binaries. We also need to adjust some variables for the company. Here's a pseudo-patch for this file:
-VMEXEC_PATH=${VMEXEC_PATH:-/app/wsl-vm}
-GVPROXY_PATH=${GVPROXY_PATH:-/app/wsl-gvproxy.exe}
+VMEXEC_PATH=${VMEXEC_PATH:-$(command -v wsl-vm)}
+GVPROXY_PATH=${GVPROXY_PATH:-$(command -v wsl-gvproxy.exe)}
...
-CHECK_HOST=${CHECK_HOST:-example.com}
-CHECK_DNS=${CHECK_DNS:-1.1.1.1}
+CHECK_HOST=${CHECK_HOST:-corporate.intranet.domain}
+CHECK_DNS=${CHECK_DNS:-corporate_dns_server_ipv4_here}
  7. Once it's all adjusted, proceed with service wsl-vpnkit start.
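
The file-placement part of the steps above (extracting the four files and creating the symlink) could be scripted roughly as follows. This is only a sketch: the function name `install_wsl_vpnkit` and the optional staging prefix argument are hypothetical, added so the layout can be tried in a scratch directory before touching a real system. It assumes the tarball has already been extracted into the directory passed as the first argument.

```sh
# Copy the wsl-vpnkit files into place and enable the unit via a symlink.
#   $1  directory where the tarball was extracted (contains app/)
#   $2  optional staging prefix (hypothetical); empty means the live system
install_wsl_vpnkit() (
    src="$1"
    root="${2:-}"
    mkdir -p "$root/usr/local/bin" \
             "$root/lib/systemd/system" \
             "$root/etc/systemd/system/multi-user.target.wants" || exit 1
    for f in wsl-gvproxy.exe wsl-vm wsl-vpnkit; do
        cp "$src/app/$f" "$root/usr/local/bin/$f" || exit 1
        chmod 0755 "$root/usr/local/bin/$f" || exit 1
    done
    cp "$src/app/wsl-vpnkit.service" \
       "$root/lib/systemd/system/wsl-vpnkit.service" || exit 1
    chmod 0644 "$root/lib/systemd/system/wsl-vpnkit.service" || exit 1
    # The link target is the final absolute path, not the staged one.
    ln -sf /lib/systemd/system/wsl-vpnkit.service \
        "$root/etc/systemd/system/multi-user.target.wants/wsl-vpnkit.service"
)
```

On a real distribution this would be run as root with a single argument; the edits to the unit file and to /usr/local/bin/wsl-vpnkit still have to be made by hand.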

Expected result: wsl-vpnkit should start without issues.

Actual result: wsl-vpnkit fails the check at line 152 of https://github.com/sakai135/wsl-vpnkit/blob/dcc6b97809fb013a7bab7d03639b818c30925055/wsl-vpnkit, which corresponds to the version we're running. Relevant code:

$GVPROXY_PATH -help 2>/dev/null
if [ $? -eq 1 ]; then
    echo "$GVPROXY_PATH is not executable due to WSL interop settings or Windows permissions"
    exit 1
fi

In https://github.com/sakai135/wsl-vpnkit/tree/dcc6b97809fb013a7bab7d03639b818c30925055#wsl-gvproxyexe-is-not-executable-due-to-wsl-interop-settings-or-windows-permissions we found confirmation that the location of the Windows executable plays a part in this issue. We also determined that you can still run Windows binaries outside of those directories (whatever they are) if and only if the process runs with WSL_INTEROP pointing to a valid WSL interop socket.
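
A quick way to check the claim above on a given machine is to run the same command twice, once with WSL_INTEROP removed from the environment and once with it left as is, and compare the exit statuses. The helper below is a sketch; `compare_interop` is a made-up name, and what counts as a "good" exit status depends on the binary being probed.

```sh
# Run a command with and without WSL_INTEROP and report both exit statuses.
compare_interop() (
    without=0
    # Inner subshell: the unset only affects this one invocation.
    ( unset WSL_INTEROP; "$@" ) >/dev/null 2>&1 || without=$?
    with=0
    "$@" >/dev/null 2>&1 || with=$?
    printf 'without WSL_INTEROP: %s, with WSL_INTEROP: %s\n' "$without" "$with"
)
```

For example, `compare_interop "$GVPROXY_PATH" -help` from an interactive shell and from the service's environment should show whether the variable makes a difference there.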

Our preferred solution keeps the Windows binary inside the filesystem that belongs to the WSL instance, because we want to avoid scenarios where users accidentally remove wsl-gvproxy.exe from their C:\ drive and are then surprised when wsl-vpnkit stops working.

@OneBlue
Collaborator

OneBlue commented Nov 22, 2023

Ok thank you for the detailed explanation. I think I might know what's happening, but to be sure can you collect /logs of a repro, and share an strace of the execution of the windows executable ? (strace -f GVPROXY_PATH)

@nealey

nealey commented Nov 27, 2023

> Ok thank you for the detailed explanation. I think I might know what's happening, but to be sure can you collect /logs of a repro, and share an strace of the execution of the windows executable ? (strace -f GVPROXY_PATH)

I can provide that!

gvproxy.log

@nealey

nealey commented Nov 27, 2023

> Hello! Could you please provide more logs to help us better diagnose your issue?

I can provide that, too!

WslLogs-2023-11-27_11-57-21.zip

@OneBlue
Collaborator

OneBlue commented Nov 27, 2023

Thank you @nealey. Looking at the logs I can see that the execution request makes it to the Windows side, but fails with access denied:

38  True  Microsoft.Windows.Subsystem.Lxss  LxssException  0  11-27-2023 10:57:37.150
Code:
File: D:\a\1\s\src\windows\common\SubProcess.cpp
FunctionName:
HRESULT: 0x80070005
Line number: 206
Message: "ApplicationName: \\wsl.localhost\Debian\opt\wsl-vpnkit\app\wsl-gvproxy.exe, CommandLine: wsl-gvproxy.exe -help"

Can you make sure that your distribution's default user has read & execute access to that file ?

@nealey

nealey commented Nov 27, 2023

> Can you make sure that your distribution's default user has read & execute access to that file ?

Yes:

WE47763:/home/neale % id
uid=1000(neale) gid=1000(neale) groups=1000(neale),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),112(docker)
WE47763:/home/neale % ls -l /opt/wsl-vpnkit/app/wsl-gvproxy.exe
-rwxr-xr-x 1 neale neale 10302976 Apr  4  2023 /opt/wsl-vpnkit/app/wsl-gvproxy.exe*
WE47763:/home/neale % /opt/wsl-vpnkit/app/wsl-gvproxy.exe -help 2>&1 | head
Usage of wsl-gvproxy.exe:
  -debug
        Print debug info
  -forward-dest value
        Forwards a unix socket to the guest virtual machine over SSH
  -forward-identity value
        Path to SSH identity key for forwarding
  -forward-sock value
        Forwards a unix socket to the guest virtual machine over SSH
  -forward-user value

However, based on the strace I attached, it appears /init is using /run/WSL/1_interop to launch \\wsl.localhost\Debian\opt\wsl-vpnkit\app\wsl-gvproxy.exe. When I specify that socket, I get the same failure:

WE47763:/home/neale % WSL_INTEROP=/run/WSL/1_interop /opt/wsl-vpnkit/app/wsl-gvproxy.exe -help
/opt/wsl-vpnkit/app/wsl-gvproxy.exe: Invalid argument
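
The manual test above can be generalized into a loop that tries every candidate socket in a directory and reports the first one for which a probe command succeeds. This is a sketch under assumptions: `first_working_interop` is a hypothetical helper, and treating exit status 0 as success is a choice that may need adjusting for a particular binary (the upstream wsl-vpnkit check quoted earlier only treats status 1 as failure).

```sh
# Print the first "<pid>_interop" entry in directory $1 for which the
# remaining arguments, run as a command with WSL_INTEROP set to that
# entry, exit with status 0.
first_working_interop() (
    dir="$1"
    shift
    for sock in "$dir"/*_interop; do
        [ -e "$sock" ] || continue   # skip the literal pattern if no match
        if WSL_INTEROP="$sock" "$@" >/dev/null 2>&1; then
            printf '%s\n' "$sock"
            exit 0                   # leaves only this function's subshell
        fi
    done
    exit 1
)
```

Usage would look like `first_working_interop /run/WSL /opt/wsl-vpnkit/app/wsl-gvproxy.exe -help`. As with any borrowed socket, the result is only valid for as long as its owning process lives.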

@blakeduffey

As a reminder - this behavior started in version 2.0.5. It was not present in 2.0.4 and earlier.

sakai135/wsl-vpnkit#246

@OneBlue
Collaborator

OneBlue commented Nov 29, 2023

Thank you @blakeduffey. I did some digging and here's the root cause of the behavior change:

In 2.0.5 we enabled the NoRemoteImages flag in our process mitigation policy for wslservice.exe.

This causes execution of Windows binaries stored in the Linux filesystem to fail (because internally we end up executing \\wsl.localhost\distro\path\to\program).

This mitigation policy only applies to processes that are children of wslservice.exe (so it doesn't apply to the wsl.exe process tree), therefore if you have a valid WSL_INTEROP env variable that points to a wsl.exe process tree, the process mitigation policy is not set and the execution succeeds.

If not, WSL will default to the wslhost.exe that is started with the distribution by wslservice.exe, which will fail given its process mitigation policy.
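
Given that explanation, one heuristic for an interactive shell is to walk up the process tree and look for a socket named after an ancestor PID, on the assumption that such a socket belongs to the session started from wsl.exe. The sketch below is not an official mechanism: `ancestor_interop` is a made-up name, the `<pid>_interop` naming is taken from the description at the top of this issue, and PID 1 is skipped on purpose because /run/WSL/1_interop is the default that fails here. It likely does not help a systemd service, whose ancestry leads straight to PID 1.

```sh
# Walk from a starting PID (default: the current shell) towards PID 1 and
# print the first "$dir/<pid>_interop" that exists. Requires Linux /proc.
ancestor_interop() (
    dir="${1:-/run/WSL}"
    pid="${2:-$$}"
    while [ "$pid" -gt 1 ]; do
        if [ -e "$dir/${pid}_interop" ]; then
            printf '%s\n' "$dir/${pid}_interop"
            exit 0
        fi
        # The PPid line of /proc/<pid>/status holds the parent PID.
        pid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status" 2>/dev/null)
        [ -n "$pid" ] || exit 1
    done
    exit 1
)
```

A shell profile could then run `WSL_INTEROP=$(ancestor_interop) && export WSL_INTEROP`, leaving the variable untouched when nothing is found.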

@grepwood
Author

@nealey I cannot thank you enough for providing the logs! Fantastic job! My workplace policy basically prevents me from using personal accounts on the workstation, and the security is pretty tight - as in, can't even plug a pendrive and have it work. That's why I would have to pretty much rewrite the logs by hand while reading them off the screen.

@OneBlue thank you! This gives me an idea for a little hacky workaround until this issue is resolved properly. I'll let you all know if it worked.

@OneBlue
Collaborator

OneBlue commented Nov 30, 2023

> @OneBlue thank you! This gives me an idea for a little hacky workaround until this issue is resolved properly. I'll let you all know if it worked.

FYI I'm working on a fix for this. It should be in the next WSL release.

@blakeduffey

> > @OneBlue thank you! This gives me an idea for a little hacky workaround until this issue is resolved properly. I'll let you all know if it worked.
>
> FYI I'm working on a fix for this. It should be in the next WSL release.

awesome - happy to test once released.

@grepwood
Author

I am able to request and free WSL interop sockets. More soon, need to run the groceries

@grepwood
Author

And it's done https://gist.github.com/grepwood/8e42b2bd0e56cd964cbd77b6d182aff0

This could be used to work around the shortcomings of this bug. When it gets properly resolved, this could be used for something else.

@OneBlue
Collaborator

OneBlue commented Nov 30, 2023

> And it's done https://gist.github.com/grepwood/8e42b2bd0e56cd964cbd77b6d182aff0

Ah ! That's a good hack. Of course I'd recommend not to use it because the behavior of our interop socket isn't an API so it might change anytime.

To give an update we're planning to publish an update with the fix for this bug either today or tomorrow.

@OneBlue
Collaborator

OneBlue commented Dec 1, 2023

Fixed in 2.0.14
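
Putting the version numbers from this thread together (behavior first reported in 2.0.5, fix in 2.0.14), a script that wants to warn users could test a version string against that range. This is a sketch: both function names are hypothetical, it relies on `sort -V` (available in GNU coreutils), and how the version string is obtained is left to the caller.

```sh
# True if version $1 sorts strictly before version $2.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# True if the version lies in the range reported in this thread:
# from 2.0.5 (first affected) up to, but not including, 2.0.14 (fixed).
wsl_interop_affected() {
    ! version_lt "$1" 2.0.5 && version_lt "$1" 2.0.14
}
```

For example, `wsl_interop_affected 2.0.9 && echo "update WSL to 2.0.14 or later"`.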

@puetzk

puetzk commented Dec 2, 2023

Indeed. I was just debugging a similar problem (with a different executable, my own https://github.com/puetzk/wsl-mount-helpers rather than wsl-vpnkit) that started failing with 2.0.9, or at least started failing after I updated to 2.0.9; I don't know what version I had before, so I can't pinpoint a specific change. The 2.0.14 pre-release seems to fix it. Thanks!

@xgalaxy

xgalaxy commented Dec 11, 2023

@XA21X

XA21X commented Dec 29, 2023

I've been running OmniSSHAgent's omni-socat via a systemd user service using NixOS-WSL. I couldn't figure out why it suddenly broke (probably after an OS upgrade I forgot about). It only affected the systemd unit (different process hierarchy etc). I suspect this was the cause, as it failed with an "Invalid argument" error, and upgrading to the latest WSL 2.0.14 instantly fixed it. 🎉

What luckily led me to this GitHub issue was the /run/WSL/1_interop string in the strace while I was debugging it. 😅

Dec 29 11:55:44 nixos run-omni-socat[6719]: connect(5, {sa_family=AF_UNIX, sun_path="/run/WSL/1_interop"}, 110) = 0
