Routing is broken for some VPN clients #3091

Closed
jandubois opened this issue Oct 5, 2022 · 12 comments
Labels
area/vpn kind/bug Something isn't working platform/windows triage/needs-information Further information is requested


@jandubois
Member

jandubois commented Oct 5, 2022

Actual Behavior

Rancher Desktop Host Resolver currently acts as a stub resolver within Rancher Desktop. While DNS works fine, some users still encounter connectivity issues when running certain VPN clients. The core of the issue is routing: packets are not routed beyond the WSL network interface.

These issues are directly related to WSL infrastructure and are actively being tracked by the WSL project.

Below is a list of known VPN clients that may encounter routing issues over VPN.

PulseSecure
Cisco AnyConnect
Checkpoint

Steps to Reproduce

Connect to the VPN using Cisco AnyConnect, then, from a shell inside the distro, try to resolve a domain within your VPN network (e.g. a private registry):

nslookup registry.domain.com

You will notice that name resolution works fine. However, if you try curl, ping, or traceroute, you will find that no packets get past the WSL IP interface:

ping registry.domain.com

Result

No connection is made.
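The repro above boils down to "the name resolves, but nothing connects". A minimal Python sketch that checks those two halves separately might look like the following. It uses a TCP connect as a stand-in for ping/curl (ICMP usually needs elevated privileges), the `diagnose` helper and `Diagnosis` type are hypothetical names, and it assumes Python is available wherever you run it.

```python
"""Separate 'does the name resolve?' from 'can we actually connect?'.

Sketch only: a TCP connect stands in for ping/curl. Host and port are
placeholders supplied by the caller.
"""
import socket
from dataclasses import dataclass, field
from typing import List


@dataclass
class Diagnosis:
    resolved: bool = False
    addresses: List[str] = field(default_factory=list)
    connected: bool = False


def diagnose(host: str, port: int, timeout: float = 3.0) -> Diagnosis:
    result = Diagnosis()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return result  # DNS failed: a different problem from the one in this issue
    result.resolved = True
    result.addresses = sorted({info[4][0] for info in infos})
    try:
        # A routing problem like this one typically surfaces here as a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            result.connected = True
    except OSError:  # covers timeouts, refusals and unreachable networks
        pass
    return result
```

With the VPN up and this bug present, something like `diagnose("registry.domain.com", 443)` would be expected to report `resolved=True` but `connected=False`.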

Expected Behavior

Connection works.

Additional Information

No response

Rancher Desktop Version

1.6.0-RC2

Rancher Desktop K8s Version

n/a

Which container engine are you using?

containerd (nerdctl)

What operating system are you using?

Windows

Operating System / Build Version

Windows 10

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

No response

Windows User Only

Cisco AnyConnect
PulseSecure
Checkpoint

@jandubois jandubois added the kind/bug Something isn't working label Oct 5, 2022
@jandubois jandubois added the triage/next-candidate Discuss if it should be moved to "Next" milestone label Oct 5, 2022
@Nino-K Nino-K changed the title Routing is broken when using Cisco Anyconnect Routing is broken for some VPN clients Oct 5, 2022
@Nino-K
Member

Nino-K commented Oct 5, 2022

After the 1.6 release, users who are still experiencing VPN connectivity issues should try the following workaround.

Try our latest 1.6 release (although the following steps should work on 1.5 and 1.5.1).
The following steps assume the VPN client is already up and running.

Set the metric on the WSL interface to the lowest possible value to emulate VPN interface behavior:

Get-NetIPInterface -InterfaceAlias "vEthernet (WSL)" | Set-NetIPInterface -InterfaceMetric 1

Now find the interface name for the VPN (mine is “Ethernet 2”) and set the metric on the VPN interface to a higher value by running:

Get-NetIPInterface -InterfaceAlias "Ethernet 2" | Set-NetIPInterface -InterfaceMetric 5001
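One way to see why swapping the metrics helps is a toy model of route selection. The Python sketch below assumes routes are picked by longest-prefix match first and then by the lowest effective metric (route metric + interface metric); it is an illustration, not the Windows routing implementation, and the subnet, the VPN-pushed route and all metric values are hypothetical.

```python
"""Toy model of route selection, to illustrate the metric workaround.

Assumption: longest-prefix match first, then the lowest effective metric
(route metric + interface metric). All addresses and metrics are hypothetical.
"""
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    destination: IPv4Network
    interface: str
    route_metric: int
    interface_metric: int

    @property
    def effective_metric(self) -> int:
        return self.route_metric + self.interface_metric


def select_route(routes: List[Route], dest: str) -> Optional[Route]:
    addr = IPv4Address(dest)
    candidates = [r for r in routes if addr in r.destination]
    if not candidates:
        return None
    # Most specific prefix first; the lowest effective metric breaks ties.
    return min(candidates, key=lambda r: (-r.destination.prefixlen, r.effective_metric))


def example_table(wsl_if_metric: int, vpn_if_metric: int) -> List[Route]:
    wsl_subnet = IPv4Network("172.20.0.0/20")  # hypothetical WSL subnet
    return [
        Route(IPv4Network("0.0.0.0/0"), "Ethernet 2", 0, vpn_if_metric),
        Route(wsl_subnet, "vEthernet (WSL)", 256, wsl_if_metric),
        # Hypothetical route the VPN client pushes for the same subnet.
        Route(wsl_subnet, "Ethernet 2", 256, vpn_if_metric),
    ]


before = select_route(example_table(wsl_if_metric=5000, vpn_if_metric=1), "172.20.5.10")
after = select_route(example_table(wsl_if_metric=1, vpn_if_metric=5001), "172.20.5.10")
print(before.interface if before else None)  # Ethernet 2
print(after.interface if after else None)    # vEthernet (WSL)
```

In this model both subnet routes have the same prefix length, so the interface metrics decide; lowering the WSL metric and raising the VPN metric flips the winner back to the WSL interface.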

Note: If you have not installed Rancher Desktop you can do so at this point.

Now try curling one of the domains that you would like to access within your VPN from inside WSL:

wsl -d rancher-desktop
curl myPrivateRegistry.Domain.ca

Or you can try pulling a docker image from your private registry.

Please make sure to restore the WSL interface metric if you are planning to stop the VPN client, by running the following:

Get-NetIPInterface -InterfaceAlias "vEthernet (WSL)" | Set-NetIPInterface -InterfaceMetric 0

Please note, setting the metric value to 0 means Automatic Metric.

If the above steps do not work for you, please provide the following information so that we can investigate this issue further.

Print the routes for the WSL interface, “vEthernet (WSL)”:

route print [WSL_IP]

Print the routes for the VPN interface (on my machine, “Ethernet 2”):

route print [VPN_IP]

@gaktive gaktive added triage/needs-information Further information is requested and removed triage/next-candidate Discuss if it should be moved to "Next" milestone labels Oct 18, 2022
@jcageman

jcageman commented Oct 19, 2022

The workaround worked for me. This morning I suddenly lost connectivity from within WSL (DNS works fine).
I found out that disconnecting and reconnecting to the VPN is what breaks it. You need to reapply the above workaround every time you change something (VPN / Rancher Desktop). The settings are not remembered after reboots either; in my case it looked as if they were, but they weren't.

P.S. While it was working, I found out that to reach localhost from a container you need to use 172.17.0.1; host.docker.internal didn't work. It's fixed in v1.6.1 :)

@byjrack

byjrack commented Oct 26, 2022

For those who can talk to their central team: the next hop on the WSL interface is 0.0.0.0 (the on-link gateway), so some solution providers may allow routes that don't leave the host to be handled differently from a standard route. Basically, the question becomes whether the provider can exclude Hyper-V networks from their route control policies. Not all can, and split tunnels make this harder, but more providers are seeing the need for these types of controls given the increase in virtualization.

Rather than moving the VPN interface down the stack, I just remove the broken route on the WSL interface (in my case it's the subnetIP/.0 route that gets set to 1/VPN gateway, so traffic never reaches the virtual gateway). Then I restart the WSL interface, and it is back in action on its higher metric, matched with the gateway. Another option that has worked is leaving the route as it is and just doing a route change to replace the VPN gateway with the WSL gateway IP. However, if your route control policy doesn't allow adjusting existing local routes, then raising the VPN route metric, fixing the gateway, or removing the bad route from WSL will not work anyway. I have tried toying with ICS (which handles the NAT for WSL) but haven't had success compared to the simple route operations.
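The diagnosis described in the comment above (a route for the WSL subnet whose next hop is the VPN gateway instead of on-link) can be sketched as a filter over a route table. In this Python sketch the table is a hand-written list rather than parsed `route print` output, the helper names are hypothetical, and the subnet and gateway addresses are made up for illustration.

```python
"""Flag routes inside the WSL subnet whose next hop is not on-link.

Sketch only: the route table is a hand-written list, and all addresses
below are hypothetical.
"""
from dataclasses import dataclass, replace
from ipaddress import IPv4Network
from typing import List

ON_LINK = "0.0.0.0"  # next hop for routes that stay on the WSL interface


@dataclass(frozen=True)
class RouteEntry:
    destination: IPv4Network
    gateway: str
    interface: str


def find_broken_routes(routes: List[RouteEntry], wsl_subnet: IPv4Network) -> List[RouteEntry]:
    """Routes within the WSL subnet that point at a gateway instead of on-link."""
    return [r for r in routes if r.destination.subnet_of(wsl_subnet) and r.gateway != ON_LINK]


def repaired(route: RouteEntry, wsl_gateway: str) -> RouteEntry:
    """The 'route change' option: keep the route, swap in the WSL gateway."""
    return replace(route, gateway=wsl_gateway)


table = [
    RouteEntry(IPv4Network("0.0.0.0/0"), "10.8.0.1", "Ethernet 2"),
    RouteEntry(IPv4Network("172.20.0.0/20"), "10.8.0.1", "vEthernet (WSL)"),  # hijacked
    RouteEntry(IPv4Network("172.20.0.1/32"), ON_LINK, "vEthernet (WSL)"),
]
for bad in find_broken_routes(table, IPv4Network("172.20.0.0/20")):
    print(bad.destination, "via", bad.gateway, "->", repaired(bad, "172.20.0.1").gateway)
# 172.20.0.0/20 via 10.8.0.1 -> 172.20.0.1
```

The default route is not flagged because 0.0.0.0/0 is not a subnet of the WSL network; only routes scoped to the WSL subnet are checked.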

Having folks do this is a challenge, though, because of all the cases where the VPN will reapply the broken route policy, and because it needs an elevated prompt. Yes, creating a self-service fix or a scheduled task can help mask the problem for many users, but it's far from ideal. So we all need to be working with our VPN suppliers on controls we can use to exclude these virtual networks from policies in the first place.

@EvertonSA
Regarding the workaround above: it's worth mentioning that admin access is required, which is not always available in corporate environments.

@byjrack

byjrack commented Jan 3, 2023

I know that in our environment our VPN provider also keeps an eye on route metrics and will restore them, since changing them bypasses the forced-tunnel control. I need to test again with 1.7.0, but using wsl-vpnkit and a prov script has been the most reliable option for us to keep internal routes out of the host route table.

Side benefit is no admin needed.

Minor note: I just retested on 1.7.0 and everything still seems to work as expected. The ProvScript just runs the status || start command from the README, and flows seem to be very happy. I thought the Rancher team was looking at including the gvisor-tap approach as a feature flag, but I'm not 100% sure.

Note that this doesn't fix the issue where you have a proxy config in scope and K8s doesn't complete its start routine.

@EvertonSA
Hi @byjrack, what do you mean by "ProvScript"? I've never seen this term before. Is it like "provisioning scripts"?

I managed to get it working with wsl-vpnkit on 1.7.0 as well, and it looks very promising, but K8s doesn't complete its start routine for me either:


If I change the kube/config to localhost, I'm able to use Kubernetes, so it's not a big deal (but not nice). I'll be using OpenLens, so not having the Rancher dashboard is not a big deal (for me).

I tried to auto-start wsl-vpnkit whenever I open Rancher Desktop by adding the following:

[boot]
command="wsl.exe -d wsl-vpnkit --cd /app service wsl-vpnkit start"

but this was not persisted, and I'm not sure why. @byjrack, could you share how you managed it, or let me know if I'm getting something wrong here? Thanks!

@byjrack

byjrack commented Jan 4, 2023

Yup, you got it; I'm just a lazy typer.

https://docs.rancherdesktop.io/how-to-guides/provisioning-scripts/

It avoids any wslconfig changes and keeps things at the app level.

Are you using a proxy in your scope via WSLENV or another mechanism? The IP-based connections for K8s, rather than a .internal namespace I could add to no_proxy, are a big hang-up for me, since CIDRs aren't supported in Alpine's no_proxy. RD + K8s + WSL2 + proxy has just not been a winning mix for me thus far.
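To make the no_proxy limitation concrete, here is a simplified Python sketch of no_proxy matching with and without CIDR support. It is not how any particular client (Alpine's tools included) implements it; real implementations differ in details, the `bypasses_proxy` helper is hypothetical, and the K8s address and no_proxy value are made up for illustration.

```python
"""Why CIDR entries in no_proxy matter for IP-based K8s traffic.

Simplified sketch of no_proxy matching; real clients differ in details.
All hosts and entries used with it here are hypothetical.
"""
from ipaddress import ip_address, ip_network


def bypasses_proxy(host: str, no_proxy: str, cidr_support: bool) -> bool:
    name = host.lower()
    for raw in no_proxy.split(","):
        entry = raw.strip().lower()
        if not entry:
            continue
        if cidr_support and "/" in entry:
            try:
                if ip_address(host) in ip_network(entry, strict=False):
                    return True
            except ValueError:
                pass  # host is a name, or the entry is not a valid CIDR
            continue
        # Suffix-only matching: exact name or a subdomain of the entry.
        suffix = entry.lstrip(".")
        if name == suffix or name.endswith("." + suffix):
            return True
    return False
```

With suffix-only matching, an entry such as 10.43.0.0/16 is just an unmatched string, so a request to an IP like 10.43.0.1 would still go through the proxy; a .internal name, by contrast, is covered either way.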

Now vpnkit may impact some of the intra-node flows as well, but I may just be misassigning blame to my proxy needs when vpnkit is fixing one thing and breaking another.

@EvertonSA

@byjrack, thanks a lot for the link; it will help me a lot.

I don't have proxy constraints, but some other people from my company might have it.

@Nino-K
Member

Nino-K commented Jun 19, 2023

We have implemented a gVisor-based network stack in Rancher Desktop to tackle the routing issue when RD is used behind a VPN. I will close this issue, since it should be addressed in our latest release.

@Nino-K Nino-K closed this as completed Jun 19, 2023
@62mkv
Copy link

62mkv commented Sep 25, 2023

Hi @Nino-K, Windows 10 user with a VPN (Check Point Endpoint Security) here, on Rancher Desktop 1.10.0. I have issues accessing the public internet (such as docker.io) from the rancher-desktop WSL VM when the VPN is connected (but not when it's off).

I don't seem to have host-switch.exe running when Rancher Desktop is started. Is there any special configuration I should or could apply?

Or is it just the dance with routes on every VPN reconnect, as this comment suggests?

Thanks in advance!

@Nino-K
Member

Nino-K commented Sep 25, 2023

@62mkv, the fact that host-switch.exe is not running could indicate that the new network tunnel is not enabled. Can you verify with rdctl list-settings that it is actually enabled?
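If you want to script that check, one option is to parse the JSON that rdctl list-settings prints. In the Python sketch below, the dotted key path for the tunnel flag is an assumption (the setting's location may differ between Rancher Desktop versions), so compare it against your own output first; `get_setting` and `tunnel_enabled` are hypothetical helper names.

```python
"""Check a boolean flag in the JSON printed by `rdctl list-settings`.

ASSUMPTION: TUNNEL_KEY is a guess at where the networking-tunnel flag lives;
verify it against your own `rdctl list-settings` output.
"""
import json
from typing import Any

TUNNEL_KEY = "experimental.virtualMachine.networkingTunnel"  # assumed key path


def get_setting(settings_json: str, dotted_path: str) -> Any:
    node: Any = json.loads(settings_json)
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(dotted_path)
        node = node[part]
    return node


def tunnel_enabled(settings_json: str, key: str = TUNNEL_KEY) -> bool:
    try:
        return get_setting(settings_json, key) is True
    except KeyError:
        return False  # missing key: treat as not enabled
```

The input could come from `subprocess.run(["rdctl", "list-settings"], capture_output=True, text=True, check=True).stdout`; a missing key is reported as "not enabled" rather than raising, which keeps the check usable across versions with different setting layouts.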

@62mkv

62mkv commented Sep 26, 2023

Thanks @Nino-K, it was indeed disabled. I enabled it manually under the "Network" tab, and now this "unable to pull with VPN on" issue seems to be resolved! Thank you so much!


7 participants