Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Network unreachable after upgrading to v0.3.5 #165

Closed
F21 opened this issue Nov 16, 2022 · 27 comments · Fixed by #166
Closed

Network unreachable after upgrading to v0.3.5 #165

F21 opened this issue Nov 16, 2022 · 27 comments · Fixed by #166

Comments

@F21
Copy link

F21 commented Nov 16, 2022

I am using wsl-vpnkit on Windows 11 22H2 Build 22621.819.
Ubuntu is Ubuntu 22.04.1 LTS 5.15.74.2-microsoft-standard-WSL2 in WSL2.

I am receiving Network is unreachable errors when sshing into a machine over a private network connected to Cloudflare WARP (via zero trust):

$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit restart
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit start
 * start-stop-daemon: wsl-vpnkit is already running
$ ssh myhost
ssh: connect to host 10.0.0.1 port 22: Network is unreachable

Downgrading to v0.3.4 allows me to connect again.

Logs from powershell:

PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit stop
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is not running
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit start
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is not running
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app tail -f /var/log/wsl-vpnkit.log
udhcpc: started, v1.35.0
udhcpc: broadcasting discover
time="2022-11-17T08:52:21+11:00" level=info msg="new connection from remote to 14996"
udhcpc: broadcasting select for 192.168.67.3, server 192.168.67.1
time="2022-11-16T21:52:21Z" level=fatal msg="cannot read size from socket: EOF"
udhcpc: sendto: No such device or address
udhcpc: read error: Network is down, reopening socket
sh: can't kill pid 62: No such process
wsl-vm exited
udhcpc: bind: No such device
@F21
Copy link
Author

F21 commented Nov 16, 2022

@sakai135 Still having issues after building from ed4f38c

$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit start
+ test -f /var/run/wsl-vpnkit.pid
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
    6 root      0:00 /init
    7 root      0:00 /init
    8 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit start
    9 root      0:00 ps
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --make-pidfile --background --stdout /var/log/wsl-vpnkit.log --stderr /var/log/wsl-vpnkit.log --exec wsl-vpnkit --verbose --start
 * start-stop-daemon: fopen `/var/run/wsl-vpnkit.pid': No such file or directory
 * Detaching to start `wsl-vpnkit' ...
+ ret=0
+ exit 0
$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit status
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
11
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
   12 root      0:00 /init
   13 root      0:00 /init
   14 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit status
   16 root      0:00 ps
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
+ pgrep -P 11
+ ret=1
+ '[' 1 '=' 0 ]
+ echo Service wsl-vpnkit is not running
Service wsl-vpnkit is not running
+ exit 1
$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit restart
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
11
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
   19 root      0:00 /init
   20 root      0:00 /init
   21 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
   23 root      0:00 ps
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --retry 2 --verbose --stop
 * Will stop PID 11
 * start-stop-daemon: no matching processes found
+ ret=0
+ pgrep wsl-vm
+ start
+ _debug_check
+ '[' -n 1 ]
+ set -x
+ test -f /var/run/wsl-vpnkit.pid
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
   19 root      0:00 /init
   20 root      0:00 /init
   21 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
   26 root      0:00 ps
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --make-pidfile --background --stdout /var/log/wsl-vpnkit.log --stderr /var/log/wsl-vpnkit.log --exec wsl-vpnkit --verbose --start
 * start-stop-daemon: fopen `/var/run/wsl-vpnkit.pid': No such file or directory
 * Detaching to start `wsl-vpnkit' ...
+ ret=0
+ ret=0
+ exit 0
$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit status
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
28
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
   29 root      0:00 /init
   30 root      0:00 /init
   31 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit status
   33 root      0:00 ps
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
+ pgrep -P 28
+ ret=1
+ '[' 1 '=' 0 ]
+ echo Service wsl-vpnkit is not running
Service wsl-vpnkit is not running
+ exit 1

@sakai135 sakai135 reopened this Nov 16, 2022
@sakai135
Copy link
Owner

sakai135 commented Nov 16, 2022

@F21 Just to check, does running wsl-vpnkit in the foreground work?

wsl.exe -d wsl-vpnkit --cd /app wsl-vpnkit

Can you also verify that the --wait is gone from:

wsl.exe -d wsl-vpnkit --cd /app cat wsl-vpnkit.service

@F21
Copy link
Author

F21 commented Nov 16, 2022

@sakai135 Yes, running it in the foreground works for me:

$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit status
+ test -f /var/run/wsl-vpnkit.pid
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
    6 root      0:00 /init
    7 root      0:00 /init
    8 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit status
    9 root      0:00 ps
+ test -f /var/run/wsl-vpnkit.pid
+ ret=1
+ '[' 1 '=' 0 ]
+ echo Service wsl-vpnkit is not running
Service wsl-vpnkit is not running
+ exit 1
$ wsl.exe -d wsl-vpnkit --cd /app wsl-vpnkit
starting wsl-vpnkit
cleaning up iptables...
iptables cleanup done
restoring WSL 2 ip route...
ip: RTNETLINK answers: File exists
restored WSL 2 ip route
wsl-gvproxy.exe exists at /mnt/c/Users/user/wsl-vpnkit/wsl-gvproxy.exe
starting gvproxy at /mnt/c/Users/user/wsl-vpnkit/wsl-gvproxy.exe...
started gvproxy
waiting for dhcp...
INFO[0000] waiting for packets...
udhcpc: started, v1.35.0
time="2022-11-17T10:35:45+11:00" level=info msg="new connection from remote to 5020"
udhcpc: broadcasting discover
udhcpc: broadcasting select for 192.168.67.3, server 192.168.67.1
udhcpc: lease of 192.168.67.3 obtained from 192.168.67.1, lease time 3600
default via 192.168.67.1 dev eth1  metric 207
dhcp completed
removing WSL 2 ip route...
removed WSL 2 ip route
adding rules to iptables...
iptables done
check: ✔️ ping success to IPv4 WSL 2 gateway / Windows host (172.28.16.1)
check: ✔️ ping success to IPv4 Windows host (192.168.67.2)
check: ✔️ ping success to IPv4 gateway (192.168.67.1)
check: ✔️ nslookup success for example.com A using 192.168.67.1
check: ✔️ nslookup success for example.com A using 192.168.67.1
check: ✔️ nslookup success for example.com A using 172.28.16.1
check: ✔️ nslookup success for example.com A using 1.1.1.1
check: ✔️ ping success to IPv4 external host (example.com)
check: ✔️ nslookup success for example.com AAAA using 192.168.67.1
check: ✔️ nslookup success for example.com AAAA using 192.168.67.1
check: ✔️ nslookup success for example.com AAAA using 172.28.16.1
check: ✔️ nslookup success for example.com AAAA using 1.1.1.1
ping: bad address 'example.com'
check: ➖ ping fail to IPv6 external host (example.com)
check: ✔️ wget success for https://example.com

Contents of wsl-vpnkit.service:

$ wsl.exe -d wsl-vpnkit --cd /app cat wsl-vpnkit.service
#! /bin/sh

EXECUTABLE="wsl-vpnkit"                 # starting point
SUB_PROCESS="wsl-vm"                    # forked sub processes
PID_PATH="/var/run/$EXECUTABLE.pid"
LOG_PATH="/var/log/$EXECUTABLE.log"

ret=0

_debug_check() {
    if [ -n "$DEBUG" ]; then
        set -x
        test -f $PID_PATH && cat $PID_PATH
        ps
    fi
}

start() {
    _debug_check
    start-stop-daemon \
        --pidfile $PID_PATH \
        --make-pidfile \
        --background \
        --stdout $LOG_PATH \
        --stderr $LOG_PATH \
        --exec $EXECUTABLE \
        ${DEBUG+--verbose} \
        --start
    ret=$?
}

stop() {
    _debug_check
    start-stop-daemon \
        --pidfile $PID_PATH \
        --retry 2 \
        ${DEBUG+--verbose} \
        --stop
    ret=$?
    # kill sub processes if still running
    pgrep $SUB_PROCESS >/dev/null && pkill -9 $SUB_PROCESS
}

status() {
    _debug_check
    test -f $PID_PATH && pgrep -P $(cat $PID_PATH) &>/dev/null
    ret=$?
    if [ $ret = 0 ]; then
        echo Service $EXECUTABLE is running
    else
        echo Service $EXECUTABLE is not running
    fi
}

restart() {
    stop
    start
    ret=$?
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    status)
        status
        ;;
    *)
        echo "Usage: $EXECUTABLE {start|stop|restart|status}"
        exit 1
esac

exit $ret

@sakai135
Copy link
Owner

Can you show what's in the logs when starting the service?

wsl.exe -d wsl-vpnkit --cd /app rm /var/log/wsl-vpnkit.log
wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit restart
sleep 5
wsl.exe -d wsl-vpnkit --cd /app cat /var/log/wsl-vpnkit.log

Also if you could provide the WSL version as well.

wsl.exe --version

@F21
Copy link
Author

F21 commented Nov 16, 2022

The logs seem to be empty. I am doing this in powershell:

PS C:\Users\user> wsl --shutdown
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app rm /var/log/wsl-vpnkit.log
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit restart
 * start-stop-daemon: no matching processes found
PS C:\Users\user> sleep 5
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app cat /var/log/wsl-vpnkit.log
PS C:\Users\user> wsl.exe --version
WSL version: 1.0.0.0
Kernel version: 5.15.74.2
WSLg version: 1.0.47
MSRDC version: 1.2.3575
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.819
PS C:\Users\user>
PS C:\Users\user> wsl -l -v
  NAME                   STATE           VERSION
* Ubuntu                 Running         2
  wsl-vpnkit             Stopped         2
  docker-desktop         Stopped         2
  docker-desktop-data    Stopped         2

@sakai135
Copy link
Owner

Thanks for the info. I just realized that WSL is now at 1.0.0... Unfortunately, updating it didn't let me recreate the issue.

Can I ask you to try it with removing the indicated lines from wsl-vpnkit.service? This should revert the start part of the script back to what is in v0.3.4.

wsl.exe -d wsl-vpnkit --cd /app notepad.exe wsl-vpnkit.service
start() {
    _debug_check                              <<=====
    start-stop-daemon \
        --pidfile $PID_PATH \
        --make-pidfile \
        --background \
        --stdout $LOG_PATH \
        --stderr $LOG_PATH \
        --exec $EXECUTABLE \
        ${DEBUG+--verbose} \                  <<=====
        --start
    ret=$?
}

@F21
Copy link
Author

F21 commented Nov 17, 2022

Hmm. It's still not starting for me. The logs appear to be empty as well:

PS C:\Users\user> wsl --shutdown
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app rm /var/log/wsl-vpnkit.log
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit restart
 * start-stop-daemon: no matching processes found
PS C:\Users\user> sleep 5
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app cat /var/log/wsl-vpnkit.log
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app cat wsl-vpnkit.service
#! /bin/sh

EXECUTABLE="wsl-vpnkit"                 # starting point
SUB_PROCESS="wsl-vm"                    # forked sub processes
PID_PATH="/var/run/$EXECUTABLE.pid"
LOG_PATH="/var/log/$EXECUTABLE.log"

ret=0

_debug_check() {
    if [ -n "$DEBUG" ]; then
        set -x
        test -f $PID_PATH && cat $PID_PATH
        ps
    fi
}

start() {
    start-stop-daemon \
        --pidfile $PID_PATH \
        --make-pidfile \
        --background \
        --stdout $LOG_PATH \
        --stderr $LOG_PATH \
        --exec $EXECUTABLE \
        --start
    ret=$?
}

stop() {
    _debug_check
    start-stop-daemon \
        --pidfile $PID_PATH \
        --retry 2 \
        ${DEBUG+--verbose} \
        --stop
    ret=$?
    # kill sub processes if still running
    pgrep $SUB_PROCESS >/dev/null && pkill -9 $SUB_PROCESS
}

status() {
    _debug_check
    test -f $PID_PATH && pgrep -P $(cat $PID_PATH) &>/dev/null
    ret=$?
    if [ $ret = 0 ]; then
        echo Service $EXECUTABLE is running
    else
        echo Service $EXECUTABLE is not running
    fi
}

restart() {
    stop
    start
    ret=$?
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    status)
        status
        ;;
    *)
        echo "Usage: $EXECUTABLE {start|stop|restart|status}"
        exit 1
esac

exit $ret
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit status
+ test -f /var/run/wsl-vpnkit.pid
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
    6 root      0:00 /init
    7 root      0:00 /init
    8 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit status
    9 root      0:00 ps
+ test -f /var/run/wsl-vpnkit.pid
+ ret=1
+ '[' 1 '=' 0 ]
+ echo Service wsl-vpnkit is not running
Service wsl-vpnkit is not running
+ exit 1
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit start
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit status
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
14
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
   15 root      0:00 /init
   16 root      0:00 /init
   17 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit status
   19 root      0:00 ps
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
+ pgrep -P 14
+ ret=1
+ '[' 1 '=' 0 ]
+ echo Service wsl-vpnkit is not running
Service wsl-vpnkit is not running
+ exit 1
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit restart
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
14
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
   22 root      0:00 /init
   23 root      0:00 /init
   24 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
   26 root      0:00 ps
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --retry 2 --verbose --stop
 * Will stop PID 14
 * start-stop-daemon: no matching processes found
+ ret=0
+ pgrep wsl-vm
+ start
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --make-pidfile --background --stdout /var/log/wsl-vpnkit.log --stderr /var/log/wsl-vpnkit.log --exec wsl-vpnkit --start
+ ret=0
+ ret=0
+ exit 0
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit status
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
30
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
   31 root      0:00 /init
   32 root      0:00 /init
   33 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit status
   35 root      0:00 ps
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
+ pgrep -P 30
+ ret=1
+ '[' 1 '=' 0 ]
+ echo Service wsl-vpnkit is not running
Service wsl-vpnkit is not running
+ exit 1
PS C:\Users\user> wsl.exe -d wsl-vpnkit --cd /app cat /var/log/wsl-vpnkit.log
PS C:\Users\user>

@sakai135
Copy link
Owner

Does running the command to start the service from within your Ubuntu distro make any difference?

@F21
Copy link
Author

F21 commented Nov 17, 2022

Same issue using my Ubuntu distro:

$  wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit status
..+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
13
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
   14 root      0:00 /init
   15 root      0:00 /init
   16 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit status
   18 root      0:00 ps
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
+ pgrep -P 13
+ ret=1
+ '[' 1 '=' 0 ]
+ echo Service wsl-vpnkit is not running
Service wsl-vpnkit is not running
+ exit 1
$  wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit start
$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit status
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
25
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
   26 root      0:00 /init
   27 root      0:00 /init
   28 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit status
   30 root      0:00 ps
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
+ pgrep -P 25
+ ret=1
+ '[' 1 '=' 0 ]
+ echo Service wsl-vpnkit is not running
Service wsl-vpnkit is not running
+ exit 1
$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit restart
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
25
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
   33 root      0:00 /init
   34 root      0:00 /init
   35 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
   37 root      0:00 ps
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --retry 2 --verbose --stop
 * Will stop PID 25
 * start-stop-daemon: no matching processes found
+ ret=0
+ pgrep wsl-vm
+ start
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --make-pidfile --background --stdout /var/log/wsl-vpnkit.log --stderr /var/log/wsl-vpnkit.log --exec wsl-vpnkit --start
+ ret=0
+ ret=0
+ exit 0
$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit status
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
41
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
   42 root      0:00 /init
   43 root      0:00 /init
   44 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit status
   46 root      0:00 ps
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
+ pgrep -P 41
+ ret=1
+ '[' 1 '=' 0 ]
+ echo Service wsl-vpnkit is not running
Service wsl-vpnkit is not running
+ exit 1
$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit start
$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit status
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
53
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
   54 root      0:00 /init
   55 root      0:00 /init
   56 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit status
   58 root      0:00 ps
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
+ pgrep -P 53
+ ret=1
+ '[' 1 '=' 0 ]
+ echo Service wsl-vpnkit is not running
Service wsl-vpnkit is not running
+ exit 1

@sakai135
Copy link
Owner

Thanks for testing. I'm out of ideas at the moment. I'll look into it again in a few days. Let me know if you figure anything out in the meantime.

@F21
Copy link
Author

F21 commented Nov 17, 2022

Since running it in the foreground works perfectly, the problem must be in the call to start-stop-daemon or the binary is not able to be executed by start-stop-daemon. If we have a way to get more logs or debugging info out of start-stop-daemon, it should help alot.

@F21
Copy link
Author

F21 commented Nov 17, 2022

I think there's a problem with the status script here:

test -f $PID_PATH && pgrep -P $(cat $PID_PATH) &>/dev/null

Debugging it gives:

$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 pgrep -P $(cat /var/run/wsl-vpnkit.pid)
cat: /var/run/wsl-vpnkit.pid: No such file or directory
pgrep: option requires an argument: P
BusyBox v1.35.0 (2022-08-01 15:14:44 UTC) multi-call binary.

Usage: pgrep [-flanovx] [-s SID|-P PPID|PATTERN]

Display process(es) selected by regex PATTERN

        -l      Show command name too
        -a      Show command line too
        -f      Match against entire command line
        -n      Show the newest process only
        -o      Show the oldest process only
        -v      Negate the match
        -x      Match whole name (not substring)
        -s      Match session ID (0 for current)
        -P      Match parent process ID
        -u EUID Match against effective UID
        -U UID  Match against UID
$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 cat /var/run/wsl-vpnkit.pid
28

I think the problem is not in the status command. I've noticed that the PID file will appear on when running start and then disappear after a minute or so (probably because the binary didn't start properly).

The problem is most likely in the start command.

@F21
Copy link
Author

F21 commented Nov 17, 2022

This is the result of starting it with strace:

$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 strace start-stop-daemon --verbose --pidfile /var/run/wsl-vpnkit.pid --make-pidfile --background --stdout /var/log/wsl-vpnkit.log --stderr /var/log/wsl-vpnkit.log --exec wsl-vpnkit --start
execve("/sbin/start-stop-daemon", ["start-stop-daemon", "--verbose", "--pidfile", "/var/run/wsl-vpnkit.pid", "--make-pidfile", "--background", "--stdout", "/var/log/wsl-vpnkit.log", "--stderr", "/var/log/wsl-vpnkit.log", "--exec", "wsl-vpnkit", "--start"], 0x7fffca42a4f0 /* 22 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x7fdb28211b48) = 0
set_tid_address(0x7fdb28211fb0)         = 16
brk(NULL)                               = 0x561d42004000
brk(0x561d42006000)                     = 0x561d42006000
mmap(0x561d42004000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x561d42004000
open("/etc/ld-musl-x86_64.path", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib/libeinfo.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libeinfo.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/libeinfo.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fstat(3, {st_mode=S_IFREG|0755, st_size=30448, ...}) = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 960) = 960
mmap(NULL, 36864, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fdb28172000
mmap(0x7fdb28174000, 12288, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0x2000) = 0x7fdb28174000
mmap(0x7fdb28177000, 4096, PROT_READ, MAP_PRIVATE|MAP_FIXED, 3, 0x5000) = 0x7fdb28177000
mmap(0x7fdb28179000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x6000) = 0x7fdb28179000
close(3)                                = 0
open("/lib/librc.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/librc.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/librc.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fstat(3, {st_mode=S_IFREG|0755, st_size=55016, ...}) = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 960) = 960
mmap(NULL, 61440, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fdb28163000
mmap(0x7fdb28166000, 28672, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0x3000) = 0x7fdb28166000
mmap(0x7fdb2816d000, 8192, PROT_READ, MAP_PRIVATE|MAP_FIXED, 3, 0xa000) = 0x7fdb2816d000
mmap(0x7fdb28170000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0xc000) = 0x7fdb28170000
close(3)                                = 0
mprotect(0x7fdb28179000, 4096, PROT_READ) = 0
mprotect(0x7fdb28170000, 4096, PROT_READ) = 0
mprotect(0x7fdb2820e000, 4096, PROT_READ) = 0
mprotect(0x561d40459000, 4096, PROT_READ) = 0
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x561d40453406, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fdb281c1c9f}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {sa_handler=0x561d40453406, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fdb281c1c9f}, NULL, 8) = 0
rt_sigaction(SIGTERM, {sa_handler=0x561d40453406, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fdb281c1c9f}, NULL, 8) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdb28162000
mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdb2815e000
stat("/usr/local/sbin/wsl-vpnkit", 0x7ffecc589850) = -1 ENOENT (No such file or directory)
munmap(0x7fdb2815e000, 16384)           = 0
mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdb2815e000
stat("/usr/local/bin/wsl-vpnkit", 0x7ffecc589850) = -1 ENOENT (No such file or directory)
munmap(0x7fdb2815e000, 16384)           = 0
mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdb2815e000
stat("/usr/sbin/wsl-vpnkit", {st_mode=S_IFREG|0755, st_size=5860, ...}) = 0
munmap(0x7fdb28162000, 4096)            = 0
stat("/usr/sbin/wsl-vpnkit", {st_mode=S_IFREG|0755, st_size=5860, ...}) = 0
open("/var/run/wsl-vpnkit.pid", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
ioctl(2, TIOCGWINSZ, {ws_row=49, ws_col=209, ws_xpixel=0, ws_ypixel=0}) = 0
ioctl(2, TIOCGWINSZ, {ws_row=49, ws_col=209, ws_xpixel=0, ws_ypixel=0}) = 0
writev(2, [{iov_base=" \33[1m\33[33m*\33[m ", iov_len=15}, {iov_base=NULL, iov_len=0}], 2 * ) = 15
writev(2, [{iov_base="", iov_len=0}, {iov_base=NULL, iov_len=0}], 2) = 0
writev(2, [{iov_base="start-stop-daemon: fopen `/var/r"..., iov_len=77}, {iov_base=NULL, iov_len=0}], 2start-stop-daemon: fopen `/var/run/wsl-vpnkit.pid': No such file or directory) = 77
ioctl(2, TIOCGWINSZ, {ws_row=49, ws_col=209, ws_xpixel=0, ws_ypixel=0}) = 0
writev(2, [{iov_base="", iov_len=0}, {iov_base="\33[K", iov_len=3}], 2) = 3
ioctl(1, TIOCGWINSZ, {ws_row=49, ws_col=209, ws_xpixel=0, ws_ypixel=0}) = 0
writev(1, [{iov_base="", iov_len=0}, {iov_base="\n", iov_len=1}], 2
) = 1
open("/proc", O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_DIRECTORY) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdb2815c000
stat("/proc/self/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
open("/proc/self/status", O_RDONLY|O_LARGEFILE) = 4
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdb28162000
read(4, "Name:\tstart-stop-daem\nUmask:\t002"..., 1024) = 1024
read(4, "_allowed:\t1\nMems_allowed_list:\t0"..., 1024) = 92
read(4, "", 1024)                       = 0
close(4)                                = 0
stat("/proc/self/ns/pid", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
readlink("/proc/self/ns/pid", "pid:[4026532459]", 29) = 16
getdents64(3, 0x7fdb2815c078 /* 62 entries */, 2048) = 1808
getdents64(3, 0x7fdb2815c078 /* 0 entries */, 2048) = 0
munmap(0x7fdb28162000, 4096)            = 0
close(3)                                = 0
munmap(0x7fdb2815c000, 8192)            = 0
ioctl(1, TIOCGWINSZ, {ws_row=49, ws_col=209, ws_xpixel=0, ws_ypixel=0}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=49, ws_col=209, ws_xpixel=0, ws_ypixel=0}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=49, ws_col=209, ws_xpixel=0, ws_ypixel=0}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=49, ws_col=209, ws_xpixel=0, ws_ypixel=0}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=49, ws_col=209, ws_xpixel=0, ws_ypixel=0}) = 0
writev(1, [{iov_base=" \33[1m\33[32m*\33[m Detaching to star"..., iov_len=53}, {iov_base="\n", iov_len=1}], 2 * Detaching to start `wsl-vpnkit' ...
) = 54
unlink("/var/run/wsl-vpnkit.pid")       = -1 ENOENT (No such file or directory)
rt_sigaction(SIGCHLD, {sa_handler=0x561d40453406, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fdb281c1c9f}, NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [], 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[], ~[KILL STOP RTMIN RT_1 RT_2], 8) = 0
fork()                                  = 17
rt_sigprocmask(SIG_SETMASK, ~[KILL STOP RTMIN RT_1 RT_2], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
open("/etc/rc.conf", O_RDONLY|O_LARGEFILE) = 3
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdb28162000
read(3, "# Global OpenRC configuration se"..., 1024) = 1024
read(3, "dency\n# or do we want all of the"..., 1024) = 1024
read(3, " allows net.wlan and any service"..., 1024) = 1024
read(3, "# then child barfs on a configur"..., 1024) = 1024
read(3, "wait for a remote server to resp"..., 1024) = 1024
read(3, "a per service basis, like the ni"..., 1024) = 1024
read(3, "ies.\n# This is mainly used for s"..., 1024) = 1024
read(3, "mode Linux\n# \"vserver\"        - "..., 1024) = 1024
read(3, "being used.\n# Controllers listed"..., 1024) = 1024
read(3, "tting turns on the memory.use_hi"..., 1024) = 1024
read(3, "blkio=\"\"\n\n# Set the cpu controll"..., 1024) = 1024
read(3, "n\n# /etc/conf.d/<service>.\n# To "..., 1024) = 1024
read(3, " to all processes in the\n# cgrou"..., 1024) = 58
read(3, "", 1024)                       = 0
close(3)                                = 0
munmap(0x7fdb28162000, 4096)            = 0
stat("/etc/conf.d/rc", 0x7ffecc589750)  = -1 ENOENT (No such file or directory)
open("/etc/rc.conf.d", O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
stat("/proc/cmdline", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
open("/proc/cmdline", O_RDONLY|O_LARGEFILE) = 3
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdb28162000
read(3, "initrd=\\initrd.img WSL_ROOT_INIT"..., 1024) = 106
close(3)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

The strange thing is that if I start it with strace, it starts and runs properly. status says it's running and I can stop it with stop. But if I start it using start, it fails to start properly.

If I edit the service file to use strace, it works properly:

start() {
    _debug_check
    strace start-stop-daemon \
        --verbose \
        --pidfile $PID_PATH \
        --make-pidfile \
        --background \
        --stdout $LOG_PATH \
        --stderr $LOG_PATH \
        --exec $EXECUTABLE \
        ${DEBUG+--verbose} \
        --start
    ret=$?
}

@F21
Copy link
Author

F21 commented Nov 17, 2022

Upon further investigation, it seems that the executable can't start in background mode.

If I remove these lines, it starts properly: https://github.com/sakai135/wsl-vpnkit/blob/main/distro/scripts/wsl-vpnkit.service#L23-L25

Unfortunately that starts it in the foreground.

@sakai135
Copy link
Owner

The issue does seem to be on the start command. WSL is somehow deciding that the wsl-vpnkit process is abandoned and kills the process and/or the wsl-vpnkit distro.

What I'm mystified by is that v0.3.4 uses nearly the same start-stop-daemon command, which works in your environment.

Can you try maybe replacing $EXECUTABLE with wsl-vpnkit?

@F21
Copy link
Author

F21 commented Nov 17, 2022

Same result (still doesn't start properly) if I replace $EXECUTABLE with wsl-vpnkit. I also tried replacing the contents of wsl-vpnkit with something simple like echo "hello!" exit 0. The strange thing is that the first time I did that, I can see hello! printed in the logs. However, if I stop and start the service again, there are no new entries in the logs.

@josesa-xx
Copy link
Contributor

josesa-xx commented Nov 20, 2022

Please notice that start-stop-daemon is buggy in the way it doesn't get consistently the pid of the daemon running in background, so the option in start --wait 1000 --progress was the way I found to get proper pid of forked process to be written in file in order to have consistent control with multiple restarts (run ./test.sh to see)

Likewise when doing stop it doesn't quite stop the background processes so I'm killing the ones named wsl-vm, so that the windows process wsl-gvproxy stops also.

v0.3.4 and all previous releases never worked across restarts because of that.

However that sill doesn't look sufficient as it doesn't kill them fast enough, while still leaving wsl-gvproxy.exe processes laying around, so I'm replacing that part with

    # kill sub processes if still running
    while pgrep -a wsl-vm >/dev/null; do pkill -9 wsl-vm; sleep 0.1; done
    while pgrep -a wsl-gvproxy >/dev/null; do pkill -9 wsl-gvproxy; sleep 0.1; done
    ${DEBUG+ps}

On v0.3.5 I can keep doing multiple restarts and the processes are almost cleanly shutdown and started.
Almost because on Win11 21H2 with wsl 1.0.0.0 (from store) it never breaks, however in Win10 21H2 with system wsl it breaks occasionally (the process wsl-gvproxy doesn't start on windows side, although it exists inside the wsl-vpnkit distro), and I see write failed 32: Broken pipemessages in the log.

After a few service restarts eventually it gets started on windows side and starts working again.

@F21
Copy link
Author

F21 commented Nov 20, 2022

I tried adding --wait 1000 --progress to the start() function in a074e1c, but it still doesn't work for me.

status will report it as running after starting, but after the timeout period is elapsed, it will go back to not running:

$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is not running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit start
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app ps -a
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
   10 root      0:00 /init
   13 root      0:00 {wsl-vpnkit} /bin/sh /usr/sbin/wsl-vpnkit progress --start
   97 root      0:00 ping -4 -c 1 172.29.0.1
  108 root      0:00 /init
  109 root      0:00 /init
  110 root      0:00 ps -a
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit start
 * start-stop-daemon: wsl-vpnkit is already running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is not running
$ wsl.exe -d wsl-vpnkit --cd /app ps -a
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
  186 root      0:00 /init
  187 root      0:00 /init
  188 root      0:00 ps -a

Perhaps the best way would be to do the forking within wsl-vpnkit, rather than relying on start-stop-daemon's --background?

@josesa-xx
Copy link
Contributor

Add also the block to kill the processes in the stop, those 2 will allow consistent restart

Also run with
wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit restart

@F21
Copy link
Author

F21 commented Nov 20, 2022

Still seeing the same issue though. In my case, it's that wsl-vpnkit doesn't even start properly, it will eventually fail and go back to "not started" status.

After adding --progress --wait 1000 and the 2 lines to shutdown, I am able to get some logs:

$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is not running
$ wsl.exe -d wsl-vpnkit --cd /app rm /var/log/wsl-vpnkit.log
$ wsl.exe -d wsl-vpnkit --cd /app cat /var/log/wsl-vpnkit.log
cat: can't open '/var/log/wsl-vpnkit.log': No such file or directory
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit start
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app ps -a
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
  227 root      0:00 /init
  230 root      0:00 {wsl-vpnkit} /bin/sh /usr/sbin/wsl-vpnkit progress --start
  380 root      0:00 wget --spider -q https://example.com
  381 root      0:00 /init
  382 root      0:00 /init
  383 root      0:00 ps -a
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is not running
$ wsl.exe -d wsl-vpnkit --cd /app cat /var/log/wsl-vpnkit.log
starting wsl-vpnkit
cleaning up iptables...
iptables cleanup done
restoring WSL 2 ip route...
ip: RTNETLINK answers: File exists
restored WSL 2 ip route
wsl-gvproxy.exe exists at /mnt/c/Users/user/wsl-vpnkit/wsl-gvproxy.exe
starting gvproxy at /mnt/c/Users/user/wsl-vpnkit/wsl-gvproxy.exe...
started gvproxy
waiting for dhcp...
time="2022-11-20T06:01:46Z" level=info msg="waiting for packets..."
udhcpc: started, v1.35.0
time="2022-11-20T17:01:46+11:00" level=info msg="new connection from remote to 15980"
udhcpc: broadcasting discover
udhcpc: broadcasting select for 192.168.67.3, server 192.168.67.1
udhcpc: lease of 192.168.67.3 obtained from 192.168.67.1, lease time 3600
default via 192.168.67.1 dev eth1  metric 214
dhcp completed
removing WSL 2 ip route...
removed WSL 2 ip route
adding rules to iptables...
iptables done
write failed 32: Broken pipe
time="2022-11-20T06:01:46Z" level=fatal msg="cannot read size from socket: EOF"
check: ❌ ping fail to IPv4 WSL 2 gateway / Windows host (172.29.0.1)
ping: sendto: Network unreachable
check: ❌ ping fail to IPv4 Windows host (192.168.67.2)
ping: sendto: Network unreachable
check: ❌ ping fail to IPv4 gateway (192.168.67.1)
nslookup: can't connect to remote host (192.168.67.1): Network unreachable
check: ❌ nslookup fail for example.com A using 192.168.67.1
nslookup: can't connect to remote host (192.168.67.1): Network unreachable
check: ❌ nslookup fail for example.com A using 192.168.67.1
nslookup: write to '172.29.0.1': Network unreachable
check: ❌ nslookup fail for example.com A using 172.29.0.1
nslookup: can't connect to remote host (1.1.1.1): Network unreachable
check: ❌ nslookup fail for example.com A using 1.1.1.1
ping: bad address 'example.com'
check: ❌ ping fail to IPv4 external host (example.com)
nslookup: can't connect to remote host (192.168.67.1): Network unreachable
check: ❌ nslookup fail for example.com AAAA using 192.168.67.1
nslookup: can't connect to remote host (192.168.67.1): Network unreachable
check: ❌ nslookup fail for example.com AAAA using 192.168.67.1
nslookup: write to '172.29.0.1': Network unreachable
check: ❌ nslookup fail for example.com AAAA using 172.29.0.1
nslookup: can't connect to remote host (1.1.1.1): Network unreachable
check: ❌ nslookup fail for example.com AAAA using 1.1.1.1
ping: bad address 'example.com'
check: ➖ ping fail to IPv6 external host (example.com)
wget: bad address 'example.com'
check: ❌ wget fail for https://example.com
cleaning up iptables...
iptables cleanup done
restoring WSL 2 ip route...
ip: RTNETLINK answers: File exists
restored WSL 2 ip route
stopped wsl-vpnkit

@josesa-xx
Copy link
Contributor

Can you send the output of wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit restart ?

Run it a couple of times (3 or 4 in rapid succession) to see if it makes a difference.

@F21
Copy link
Author

F21 commented Nov 20, 2022

Here's what I got:

$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit restart
+ test -f /var/run/wsl-vpnkit.pid
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
    6 root      0:00 /init
    7 root      0:00 /init
    8 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
    9 root      0:00 ps
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --retry 2 --verbose --stop
 * start-stop-daemon: fopen `/var/run/wsl-vpnkit.pid': No such file or directory
+ ret=0
+ pgrep -a wsl-vm
+ pgrep -a wsl-gvproxy
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
    6 root      0:00 /init
    7 root      0:00 /init
    8 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
   13 root      0:00 ps
+ start
+ _debug_check
+ '[' -n 1 ]
+ set -x
+ test -f /var/run/wsl-vpnkit.pid
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
    6 root      0:00 /init
    7 root      0:00 /init
    8 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
   14 root      0:00 ps
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --make-pidfile --background --stdout /var/log/wsl-vpnkit.log --stderr /var/log/wsl-vpnkit.log --exec wsl-vpnkit --verbose --wait 1000 -- progress --start
 * start-stop-daemon: fopen `/var/run/wsl-vpnkit.pid': No such file or directory
 * Detaching to start `wsl-vpnkit' ...
+ ret=0
+ ret=0
+ exit 0
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit restart
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
16
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
    7 root      0:00 /init
   16 root      0:00 {wsl-vpnkit} /bin/sh /usr/sbin/wsl-vpnkit progress --start
   98 root      0:00 ping -4 -c 1 172.29.0.1
  104 root      0:00 /init
  105 root      0:00 /init
  106 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
  108 root      0:00 ps
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --retry 2 --verbose --stop
 * Will stop PID 16
 * Sending signal 15 to PID 16 ...                                                                                                                                                                         [ ok ]
+ ret=0
+ pgrep -a wsl-vm
+ pgrep -a wsl-gvproxy
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
    7 root      0:00 /init
   98 root      0:00 ping -4 -c 1 172.29.0.1
  104 root      0:00 /init
  105 root      0:00 /init
  106 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
  112 root      0:00 ps
+ start
+ _debug_check
+ '[' -n 1 ]
+ set -x
+ test -f /var/run/wsl-vpnkit.pid
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
    7 root      0:00 /init
   98 root      0:00 ping -4 -c 1 172.29.0.1
  104 root      0:00 /init
  105 root      0:00 /init
  106 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
  113 root      0:00 ps
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --make-pidfile --background --stdout /var/log/wsl-vpnkit.log --stderr /var/log/wsl-vpnkit.log --exec wsl-vpnkit --verbose --wait 1000 -- progress --start
 * start-stop-daemon: fopen `/var/run/wsl-vpnkit.pid': No such file or directory
 * Detaching to start `wsl-vpnkit' ...
+ ret=0
+ ret=0
+ exit 0
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit restart
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
115
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
  105 root      0:00 /init
  115 root      0:00 {wsl-vpnkit} /bin/sh /usr/sbin/wsl-vpnkit progress --start
  202 root      0:00 ping -4 -c 1 172.29.0.1
  208 root      0:00 /init
  209 root      0:00 /init
  210 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
  212 root      0:00 ps
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --retry 2 --verbose --stop
 * Will stop PID 115
 * Sending signal 15 to PID 115 ...                                                                                                                                                                        [ ok ]
+ ret=0
+ pgrep -a wsl-vm
+ pgrep -a wsl-gvproxy
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
  105 root      0:00 /init
  202 root      0:00 ping -4 -c 1 172.29.0.1
  208 root      0:00 /init
  209 root      0:00 /init
  210 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
  216 root      0:00 ps
+ start
+ _debug_check
+ '[' -n 1 ]
+ set -x
+ test -f /var/run/wsl-vpnkit.pid
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
  105 root      0:00 /init
  202 root      0:00 ping -4 -c 1 172.29.0.1
  208 root      0:00 /init
  209 root      0:00 /init
  210 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
  217 root      0:00 ps
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --make-pidfile --background --stdout /var/log/wsl-vpnkit.log --stderr /var/log/wsl-vpnkit.log --exec wsl-vpnkit --verbose --wait 1000 -- progress --start
 * start-stop-daemon: fopen `/var/run/wsl-vpnkit.pid': No such file or directory
 * Detaching to start `wsl-vpnkit' ...
+ ret=0
+ ret=0
+ exit 0
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit restart
+ test -f /var/run/wsl-vpnkit.pid
+ cat /var/run/wsl-vpnkit.pid
219
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
  105 root      0:00 /init
  202 root      0:00 ping -4 -c 1 172.29.0.1
  209 root      0:00 /init
  219 root      0:00 {wsl-vpnkit} /bin/sh /usr/sbin/wsl-vpnkit progress --start
  319 root      0:00 ping -4 -c 1 example.com
  320 root      0:00 /init
  321 root      0:00 /init
  322 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
  324 root      0:00 ps
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --retry 2 --verbose --stop
 * Will stop PID 219
 * Sending signal 15 to PID 219 ...                                                                                                                                                                        [ ok ]
+ ret=0
+ pgrep -a wsl-vm
+ pgrep -a wsl-gvproxy
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
  105 root      0:00 /init
  202 root      0:00 ping -4 -c 1 172.29.0.1
  209 root      0:00 /init
  319 root      0:00 ping -4 -c 1 example.com
  320 root      0:00 /init
  321 root      0:00 /init
  322 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
  328 root      0:00 ps
+ start
+ _debug_check
+ '[' -n 1 ]
+ set -x
+ test -f /var/run/wsl-vpnkit.pid
+ ps
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
  105 root      0:00 /init
  202 root      0:00 ping -4 -c 1 172.29.0.1
  209 root      0:00 /init
  319 root      0:00 ping -4 -c 1 example.com
  320 root      0:00 /init
  321 root      0:00 /init
  322 root      0:00 {wsl-vpnkit} /bin/sh /etc/init.d/wsl-vpnkit restart
  329 root      0:00 ps
+ start-stop-daemon --pidfile /var/run/wsl-vpnkit.pid --make-pidfile --background --stdout /var/log/wsl-vpnkit.log --stderr /var/log/wsl-vpnkit.log --exec wsl-vpnkit --verbose --wait 1000 -- progress --start
 * start-stop-daemon: fopen `/var/run/wsl-vpnkit.pid': No such file or directory
 * Detaching to start `wsl-vpnkit' ...
+ ret=0
+ ret=0
+ exit 0
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is running
$ wsl.exe -d wsl-vpnkit --cd /app ps -a
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    3 root      0:00 {init} plan9 --control-socket 5 --log-level 4 --server-fd 6 --pipe-fd 8 --log-truncate
  498 root      0:00 /init
  499 root      0:00 /init
  500 root      0:00 ps -a
$ wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit status
Service wsl-vpnkit is not running

@josesa-xx
Copy link
Contributor

josesa-xx commented Nov 20, 2022

Also do the same round as before where you start the service and keep checking the status, but this time run the status with additional debug
wsl.exe -d wsl-vpnkit --cd /app DEBUG=1 service wsl-vpnkit status

From powershell side see if wsl-gvproxy.exe process is running
ps -ea silent wsl-gvproxy

@F21
Copy link
Author

F21 commented Nov 20, 2022

Judging from the logs, I think --wait 1000 works around start-stop-daemon's bugginess. However, I think the problem is that wsl-vpnkit itself fails to start due to the broken pipe from my previous logs:

rite failed 32: Broken pipe
time="2022-11-20T06:01:46Z" level=fatal msg="cannot read size from socket: EOF"
check: ❌ ping fail to IPv4 WSL 2 gateway / Windows host (172.29.0.1)
ping: sendto: Network unreachable
check: ❌ ping fail to IPv4 Windows host (192.168.67.2)
ping: sendto: Network unreachable
check: ❌ ping fail to IPv4 gateway (192.168.67.1)
nslookup: can't connect to remote host (192.168.67.1): Network unreachable
check: ❌ nslookup fail for example.com A using 192.168.67.1
nslookup: can't connect to remote host (192.168.67.1): Network unreachable
check: ❌ nslookup fail for example.com A using 192.168.67.1
nslookup: write to '172.29.0.1': Network unreachable
check: ❌ nslookup fail for example.com A using 172.29.0.1
nslookup: can't connect to remote host (1.1.1.1): Network unreachable
check: ❌ nslookup fail for example.com A using 1.1.1.1
ping: bad address 'example.com'
check: ❌ ping fail to IPv4 external host (example.com)
nslookup: can't connect to remote host (192.168.67.1): Network unreachable
check: ❌ nslookup fail for example.com AAAA using 192.168.67.1
nslookup: can't connect to remote host (192.168.67.1): Network unreachable
check: ❌ nslookup fail for example.com AAAA using 192.168.67.1
nslookup: write to '172.29.0.1': Network unreachable
check: ❌ nslookup fail for example.com AAAA using 172.29.0.1
nslookup: can't connect to remote host (1.1.1.1): Network unreachable
check: ❌ nslookup fail for example.com AAAA using 1.1.1.1
ping: bad address 'example.com'
check: ➖ ping fail to IPv6 external host (example.com)
wget: bad address 'example.com'
check: ❌ wget fail for https://example.com
cleaning up iptables...
iptables cleanup done
restoring WSL 2 ip route...
ip: RTNETLINK answers: File exists
restored WSL 2 ip route
stopped wsl-vpnkit

I just started it in the foreground again, and there was no broken pipe and it started properly. So I think there's something causing the broken pipe (permissions maybe?) when starting it using start-stop-daemon.

@josesa-xx
Copy link
Contributor

I've re-written the service init script without start-stop-daemon see if this works for you #169

@F21
Copy link
Author

F21 commented Nov 21, 2022

After a bit of poking around, I noticed that the EOF error is from https://github.com/sakai135/wsl-vpnkit/blob/main/cmd/vm/main.go#L151

Sometimes, reading the size results in an EOF. I tried changing https://github.com/sakai135/wsl-vpnkit/blob/main/cmd/vm/main.go#L152 to ignore the EOF by prepending it with:

if err != nil && errors.Is(err, io.EOF) {
  continue
}

But that causes issues with the Tx() go routine. The code seems to be identical as the one in gvisor-tap-vsock as well: https://github.com/containers/gvisor-tap-vsock/blob/main/cmd/vm/main_linux.go, so not sure if the code is the problem or gvisor itself.

The strange thing is that running in the foreground always works reliably, but using the service causes issues even with #169, which removes the start-stop-daemon.

@F21 F21 mentioned this issue Dec 1, 2022
@F21
Copy link
Author

F21 commented Dec 6, 2022

Closed per #169

@F21 F21 closed this as completed Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants