WSL DNS issue - interesting! #5058

Closed
bbulkow opened this issue Apr 7, 2020 · 9 comments

@bbulkow

bbulkow commented Apr 7, 2020

I have two different WSL1 installations, one based on ubuntu 16.04, one based on ubuntu 18.04.

I've seen a lot of interesting points about DNS updates based on VPN, and it seems a touchy issue, and I've seen the responses. How to

What's interesting is the auto-generate in my U16 is different from my U18.

After VPN connection, in U16 --- this is what I expect

etc$ cat resolv.conf
# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network] 
# generateResolvConf = false
nameserver 10.10.10.20
nameserver 10.10.10.21
nameserver 192.168.4.1
search hsd1.ca.comcast.net      

After VPN connection, in U18 --- this is not what I expect

etc$ cat resolv.conf
# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateResolvConf = false
nameserver 192.168.4.1
nameserver 2601:647:5a01:bc38:9610:3eff:fe13:8902
nameserver 10.10.10.20
search hsd1.ca.comcast.net

Since these are both auto-generated, why would they be different? Is there something I can do to make the 18.04 list correct? Does it have something to do with an IPv6 list being in the middle? I believe 16.04 doesn't have IPv6 by default, so maybe there's an issue with the resolv.conf generator if there's an IPv6 in it?
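
One way to compare the two generated files is to parse each one and list its nameserver entries in order, together with their address family. A minimal Python sketch, assuming the file text has already been read into a string; `nameserver_order` is a hypothetical helper name, not anything provided by WSL:

```python
import ipaddress

def nameserver_order(resolv_conf_text):
    """Return (address, family) pairs in the order they appear in the file."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Skip blanks, comments, and other directives such as "search".
        if len(fields) < 2 or fields[0] != "nameserver":
            continue
        addr = ipaddress.ip_address(fields[1])
        servers.append((fields[1], "IPv6" if addr.version == 6 else "IPv4"))
    return servers

# The U18 listing from this post, with the WSL comment header omitted.
u18 = """\
nameserver 192.168.4.1
nameserver 2601:647:5a01:bc38:9610:3eff:fe13:8902
nameserver 10.10.10.20
search hsd1.ca.comcast.net
"""

for position, (addr, family) in enumerate(nameserver_order(u18), start=1):
    print(position, family, addr)
```

Running the same function over the U16 file and comparing the two lists makes it easy to see where the VPN resolvers land in each.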

Please fill out the below information:

  • Your Windows build number: (Type ver at a Windows Command Prompt)
    Microsoft Windows [Version 10.0.19587.1000]
    Linux hostname 4.4.0-19587-Microsoft #1000-Microsoft Thu Mar 13 16:39:00 PST 2020 x86_64 x86_64 x86_64 GNU/Linux

  • What you're doing and what's happening: (Copy&paste the full set of specific command-line steps necessary to reproduce the behavior, and their output. Include screen shots if that helps demonstrate the problem.)
    Add a VPN route ( happened to use Forticlient, happens to have an IPv6 resolve host ), see if there are differences between an Ubuntu 16.04 and 18.04 host. In the case I've seen, the U18 host order is incorrect, with the VPN entries at the end not the beginning --- but U16 is correct.

  • What's wrong / what should be happening instead:
    I expect the different resolv.conf files to be the same ( or very similar ) if they are all autogenerated.

  • Strace of the failing command, if applicable:
    Not an Strace amenable situation.

  • For WSL launch issues, please [collect detailed logs]
    Not a launch issue.

@bbulkow
Author

bbulkow commented Apr 7, 2020

I just had a different case, where U18 succeeded and U16 did not, and this time, for an unknown reason, there was no IPv6 in the generated resolv.conf. Perhaps this has more to do with IPv6 and less to do with the version of Ubuntu? And why would the IP addresses of the resolver be different at different times? Very strange, will keep testing.

@therealkenc
Collaborator

> I just had a different case, where U18 succeeded and U16 did not

That was an experiment I almost asked you to try... more than one run on the same distro. But I couldn't think of a way to introduce a variable between runs, and asking you to just run it over and over seemed cruel and unusual.

It isn't the distro, 16.04 vs 18.04. The /etc/resolv.conf file is updated by WSL /init, which is the first thing run before /bin/bash. WSL /init is the same for every distro. Also /init is statically linked, so even your distro's libc doesn't come into play.

Here is what I suspect, though I can't speculate on specifics. Something is changing, silently, on the Windows side between runs; we just don't know what. If you have third-party networking software installed, like VPN, firewall, AV, whatever... that could be it. Or, you know, not 🤷.

@bbulkow
Author

bbulkow commented Apr 8, 2020

Thanks for taking the time to reply.

I am running a bit more to determine what the difference is. I believe it is a timing issue, and the result is not deterministic, which is common in networking, right? Something like: if the IPv6 entry is before the 10. entry, it doesn't work; if the 10. entry is above the IPv6 one, it works; and the list is ordered by how those two addresses come off the network? Or something different. I am suspicious of IPv6 because it's the most obvious difference between the sometimes-failing U18 and the U16 which always works. I'm wondering how to block the IPv6 addition to the resolver, or maybe even turn off IPv6 totally on the Windows host ( partially as a test ).

This morning, a new circumstance: it seems the first invocation of WSL U18 worked, and subsequent ones did not. I am starting to think there's a window right after I establish my VPN when it's OK, then it goes bad? Will keep track and update.

When does /init run? Every shell invocation? More frequently, like an underlying network reconfig?

Do I have third party networking installed? Sure! Everyone does.

I had docker for a while, which inserts network components and has even done so in a way that impacts WSL, because I have a 'docker.internal' network showing up in /etc/hosts in these shells even though I've uninstalled docker. I should go on a hunt and root out the remaining bits, but this problem predates my reinstall of docker.

I have VMware, which inserts all kinds of networking fingers into the stack. Someday it will play nice with Windows virtualization, or someday I will figure out why the one remaining project I have won't compile on WSL or Docker, but until that day, it stays. ( I tried to revive an older machine in my home office to run docker and k8, but it didn't succeed, too old I think )

I have a third-party VPN ( fortinet ) because that's what my company uses now. We have been told to switch to it from OpenVPN and thus it is required for this project.

I have open source VPN ( OpenVPN ) because that's what the world uses and I get scripts for it now and again. Not running.

I have a third-party VPN service that I rarely use, I think it's built on a different copy of OpenVPN and shouldn't be in the way of anything.

@therealkenc
Collaborator

therealkenc commented Apr 8, 2020

> I believe it is a timing issue, and the result is not deterministic, which is common in networking, right?

Might be; can't rule out a timing thing. But... you'd think the WSL startup sequence would be deterministic even if the time it takes varies. It is going to make a bunch of calls asking "what's the upstream DNS?" and for some reason those calls are returning different things on different runs.

> I have a third-party VPN ( fortinet )

You might try uninstalling Fortinet temporarily and see if the problem goes away. Not suggesting that as a fix, but it will eliminate the variable.

> I have VMware, that inserts all kinds of networking fingers into the stack.

Lots of people run VMware with WSL1, so that probably isn't it. But that would be the second thing to uninstall for the same reason.

> I have open source VPN ( OpenVPN )

I use OpenVPN myself and it has been okay. Yes, the OSS version versus the third-party wrappers shouldn't make a difference. Still, if it were the only thing installed, I'd be asking you to uninstall it to eliminate the variable.

Since you are dependent on third-party stuff and are confined to WSL1, your submission is effectively a dupe of the ever-open #416. Not all third-party networking stacks are supported. There is #4474, which was due to "some bad assumptions about network state" (message). The standard workaround is to put [network] generateResolvConf = false in /etc/wsl.conf and stick 8.8.8.8 in /etc/resolv.conf (message).
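
For reference, that workaround comes down to two small files: /etc/wsl.conf with generation disabled (the key named in the header WSL writes into resolv.conf), and a hand-maintained /etc/resolv.conf. A rough Python sketch that only renders the text of both; the function names are hypothetical, and 8.8.8.8 is simply the resolver mentioned above:

```python
import configparser
import io

def render_wsl_conf():
    """Text for /etc/wsl.conf asking WSL to stop regenerating resolv.conf."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the key's mixed case as shown in the WSL header
    parser["network"] = {"generateResolvConf": "false"}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()

def render_resolv_conf(nameservers):
    """Text for a hand-maintained /etc/resolv.conf, one nameserver per line."""
    return "".join(f"nameserver {ns}\n" for ns in nameservers)

print(render_wsl_conf())
print(render_resolv_conf(["8.8.8.8"]))
```

A public resolver such as 8.8.8.8 would presumably not know names that exist only behind a VPN, so in that case the list passed to `render_resolv_conf` would more likely start with the VPN's own resolvers.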

@bbulkow
Author

bbulkow commented Apr 13, 2020

If I uninstall fortinet forticlient, I won't have this problem because I won't be running a VPN. The issue, again, is that having the DNS resolver over the VPN far down in the list ( after IPv6 ) means it never resolves.
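
If the position of the VPN resolvers in the list is what matters, one possible stopgap is to rewrite the file with those entries moved to the front. A minimal Python sketch, assuming the VPN resolvers sit in 10.10.10.0/24 (inferred from 10.10.10.20 and 10.10.10.21 in the original post) and that automatic generation is disabled so the result is not overwritten; `prefer_network` is a hypothetical helper:

```python
import ipaddress

def prefer_network(resolv_conf_text, network):
    """Move nameserver lines inside `network` ahead of the other nameservers."""
    net = ipaddress.ip_network(network)

    def is_nameserver(line):
        fields = line.split()
        return len(fields) >= 2 and fields[0] == "nameserver"

    def in_network(line):
        addr = ipaddress.ip_address(line.split()[1])
        # Compare versions first so an IPv6 address is never tested against an IPv4 network.
        return addr.version == net.version and addr in net

    lines = resolv_conf_text.splitlines()
    servers = [line for line in lines if is_nameserver(line)]
    # sorted() is stable, so entries keep their relative order within each group.
    reordered = iter(sorted(servers, key=lambda line: not in_network(line)))
    # Each nameserver slot is refilled in the new order; other lines stay where they were.
    out = [next(reordered) if is_nameserver(line) else line for line in lines]
    return "\n".join(out) + "\n"

u18 = """\
nameserver 192.168.4.1
nameserver 2601:647:5a01:bc38:9610:3eff:fe13:8902
nameserver 10.10.10.20
search hsd1.ca.comcast.net
"""

print(prefer_network(u18, "10.10.10.0/24"))
```

This only changes the order of the text; whether the reordered file actually fixes resolution over the VPN is something to test rather than assume.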

I am surprised you don't mention disabling IPv6. The problem seems to correlate well with IPv6 ( my non-IPv6 Ubuntu 16.04 is fine ). I will try that and report back.

I am working around this problem presently by adding the entry to /etc/hosts ( the host I am trying to connect to that's supposed to be served over the VPN ) whenever I hit the problem. Pretty soon I will invest in a script that does that, to make it easier.
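
A script for that /etc/hosts workaround could look roughly like the following Python sketch. It only transforms the file's text and appends the entry if the hostname is not already mapped; `add_hosts_entry`, the hostname, and the address are all made up for illustration:

```python
def add_hosts_entry(hosts_text, ip, hostname):
    """Return hosts_text with an `ip hostname` line appended unless hostname is already mapped."""
    for line in hosts_text.splitlines():
        entry = line.split("#", 1)[0].split()  # ignore trailing comments
        if len(entry) >= 2 and hostname in entry[1:]:
            return hosts_text  # already present, leave the text untouched
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip}\t{hostname}\n"

# Hypothetical values; substitute the real VPN-side host and its address.
current = "127.0.0.1\tlocalhost\n"
print(add_hosts_entry(current, "10.10.10.50", "build.corp.example"))
```

Writing the result back to /etc/hosts needs root, and since WSL can also regenerate /etc/hosts on its own, the entry may have to be reapplied unless that generation is turned off as well.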

@therealkenc
Collaborator

therealkenc commented Apr 13, 2020

> I am surprised you don't mention disabling IPv6.

I haven't mentioned a few suggestions in the "VPN doesn't work" submissions.

This one survived summary execution by dupe because (paraphrasing) OP "my networking works in U16 but not U18" is/was novel enough to let play out.

@bbulkow
Author

bbulkow commented Apr 13, 2020

Extra data.

Removing IPv6 from the device driver ( the one providing general internet services ) didn't help. There are still IPv6 elements in Ubuntu, as expected, because IPv6 is still generally enabled ( the IPv6 nameserver is a local broadcast, fec0:0:0:ffff::1 )

Setting
[network]
generateResolvConf = false
did not have the effect I expected. I set that, edited resolv.conf, saved it, closed the WSL window in question, and reopened it, and the old form of the resolv.conf is still there, which doesn't resolve correctly. Guess I have to go back and see how setting that wsl.conf variable is expected to work.
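
A quick way to confirm that state is to check both files together: whether wsl.conf really carries the setting, and whether resolv.conf still begins with the WSL-generated header. A small Python sketch operating on the text of each file; both helper names are hypothetical, and the header string is copied from the listings earlier in this thread:

```python
import configparser

WSL_HEADER = "# This file was automatically generated by WSL."

def generation_disabled(wsl_conf_text):
    """True if [network] generateResolvConf is set to false in the given wsl.conf text."""
    parser = configparser.ConfigParser()
    parser.read_string(wsl_conf_text)
    # configparser lower-cases option names by default, so this check ignores the
    # key's case; whether WSL itself does the same is an assumption of this sketch.
    value = parser.get("network", "generateResolvConf", fallback="true")
    return value.strip().lower() == "false"

def looks_generated(resolv_conf_text):
    """True if the resolv.conf text still starts with the WSL-generated header."""
    return resolv_conf_text.lstrip().startswith(WSL_HEADER)

wsl_conf = "[network]\ngenerateResolvConf = false\n"
resolv_conf = WSL_HEADER + " To stop automatic generation ...\nnameserver 192.168.4.1\n"

if generation_disabled(wsl_conf) and looks_generated(resolv_conf):
    print("setting is present, but resolv.conf is still the generated one")
```

If both conditions hold, the setting has been written but has not taken effect in the running WSL instance.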

Changing the usual sysctls in /etc/sysctl.conf didn't disable IPv6 within that environment. Not entirely unexpected.

@nash8114

> Setting
> [network]
> generateResolvConf = false
> did not have the effect I expected. I set that, edited resolv.conf, saved it, closed the WSL window in question, and reopened it, and the old form of the resolv.conf is still there

After I changed my /etc/wsl.conf, I didn't see a change either. Every newly opened session would overwrite my /etc/resolv.conf. This was 'fixed' by restarting WSL entirely (the LxssManager service), though this may be overkill.
Do you still have the described issue? Or were you able to resolve it?

Contributor

This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request.

Thank you!
