This repository has been archived by the owner on Jun 4, 2021. It is now read-only.

WireGuard/GCE - standard MTU causes fragmentation, some Google sites fail. #1089

Closed
Ronaldkornblow opened this issue Dec 8, 2017 · 18 comments

Comments

@Ronaldkornblow

Expected behavior:

Browse Internet.

Actual Behavior:

Only some DNS responses happen? I am really not sure what is going on. Symptoms are that when I browse internet, not all sites work.
But when I: dig @10.192.122.1 google.com I get:
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22030
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com. IN A

;; ANSWER SECTION:
google.com. 299 IN A 74.125.124.102
google.com. 299 IN A 74.125.124.139
google.com. 299 IN A 74.125.124.138
google.com. 299 IN A 74.125.124.113
google.com. 299 IN A 74.125.124.101
google.com. 299 IN A 74.125.124.100

;; Query time: 60 msec
;; SERVER: 10.192.122.1#53(10.192.122.1)
;; WHEN: Fri Dec 08 11:58:57 PST 2017
;; MSG SIZE rcvd: 135

Steps to Reproduce:

  1. sudo wg-quick up wg0-client
    This used to error out until I installed nscd and unbound. After installing them, wg-quick up completes with no error, but browsing still fails with ERR_TIMED_OUT.

I do not
[ contents of streisand-diagnostics.md here ]

Additional Details:

Log output from Ansible or other relevant services (link to Gist for longer output):

Target Cloud Provider: gcp
Operating System of target host: ubuntu 16.04
Operating System of client: ubuntu 16.04
Version of Ansible, using ansible --version : ansible 2.3.0.0
Output from git rev-parse HEAD in your Streisand directory : 8730537
cpu self-assigned this on Dec 8, 2017
cpu changed the title from "dns doesn't work? - wg-quick up wg0-client" to "WireGuard/GCP - intermittent DNS failures." on Dec 8, 2017
cpu changed the title from "WireGuard/GCP - intermittent DNS failures." to "WireGuard/GCE - intermittent DNS failures." on Dec 8, 2017
@cpu
Collaborator

cpu commented Dec 8, 2017

Hi @Ronaldkornblow, thanks for filing the issue :-)

Only some DNS responses happen? I am really not sure what is going on. Symptoms are that when I browse internet, not all sites work.

What I recommend we try first is to have you bring down the wireguard interface, bring it back up, browse to a site that doesn't work, and then check the status of a few things. Can you please run these commands and share the output?

  1. sudo wg-quick down wg0-client
  2. cat /etc/resolv.conf
  3. sudo wg-quick up wg0-client
  4. cat /etc/resolv.conf (same as before, please share the output from both)
  5. Browse to a broken website
  6. sudo wg (you might want to edit the endpoint: line to remove the server IP address)
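The checklist above could be collected in one pass with a small script. This is only a sketch: `redact_endpoint` and `collect_wg_diagnostics` are hypothetical helper names, and it assumes the interface is called wg0-client as in this thread.

```shell
#!/bin/sh
# Mask the peer address on the "endpoint:" line of `wg` output.
redact_endpoint() {
  sed -E 's/^([[:space:]]*endpoint:[[:space:]]*).*/\1<redacted>/'
}

# Run steps 1-4 and 6 from the list; step 5 (browsing) stays manual.
collect_wg_diagnostics() {
  iface=${1:-wg0-client}
  sudo wg-quick down "$iface"
  echo "--- resolv.conf with $iface down ---"; cat /etc/resolv.conf
  sudo wg-quick up "$iface"
  echo "--- resolv.conf with $iface up ---"; cat /etc/resolv.conf
  printf 'Browse to a broken website, then press Enter: '; read -r reply
  sudo wg | redact_endpoint
}

# Usage (run by hand on the client):
#   collect_wg_diagnostics wg0-client
```

Only the endpoint line is masked here; the public keys in the `wg` output are left alone and would still need editing by hand if you want them hidden.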

When you say "not all sites work" - can you give some examples? Do they consistently fail? Have you tried using OpenVPN or one of the other VPN services, if so, do the same websites fail or do they only fail for WireGuard?

I've heard that Google's Cloud Env can have MTU problems that manifest as persistent network errors to some sites but we should rule out the easier stuff before we chase that rumour down.
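If it does come down to MTU, one way to check is to send pings with the don't-fragment bit set and compare which sizes get through the tunnel. A minimal sketch, assuming Linux iputils `ping` on the client; `icmp_payload_for_mtu` and `probe_mtu` are hypothetical helper names and the default target host is only an example.

```shell
# ICMP payload size that produces an IPv4 packet of exactly <mtu> bytes:
# subtract the 20-byte IPv4 header and the 8-byte ICMP header.
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Send three pings that may not be fragmented (-M do) at the given packet size.
probe_mtu() {
  mtu=$1 host=${2:-google.com}
  ping -M do -c 3 -W 2 -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}

# Usage (run by hand while the tunnel is up, trying a few candidate sizes):
#   probe_mtu 1300
#   probe_mtu 1420
```

If small sizes get replies and larger ones time out, that points at a path MTU problem rather than DNS.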

Output from git rev-parse HEAD in your Streisand directory : 8730537

This is a commit from May - was that when you created your Streisand server?

Apologies for the barrage of questions :-)

cpu added the status/information-needed label (For items missing required information) on Dec 8, 2017
@Ronaldkornblow
Author

Ronaldkornblow commented Dec 8, 2017

Hi @cpu, Thank you for the quick response.

cat /etc/resolv.conf

blahblah:~/streisand$ cat /etc/resolv.conf
Generated by resolvconf
nameserver 127.0.0.1

cat /etc/resolv.conf

blahblah):~/streisand$ cat /etc/resolv.conf
Generated by resolvconf
nameserver 10.192.122.1

sudo wg

interface: wg0-client
public key: blahblah
private key: (hidden)
listening port: 60590
fwmark: 0xca6c

peer: blahblah
endpoint: blahblah:51820
allowed ips: 0.0.0.0/0
latest handshake: 1 minute, 16 seconds ago
transfer: 5.07 KiB received, 5.14 KiB sent

I replaced the identifying values with blahblah.

When you say "not all sites work" - can you give some examples? Do they consistently fail? Have you tried using OpenVPN or one of the other VPN services, if so, do the same websites fail or do they only fail for WireGuard?

duckduckgo.com works.
The failures are constant.
google.com does not work and many other sites do not work. Most other sites do not work.
No, I have not tried other VPNs through Streisand.

This is a commit from May - was that when you created your Streisand server?

Yes. I did it back then.

Do you want netstat from the sites that do not work?

Thank you so much for your help

@cpu
Collaborator

cpu commented Dec 8, 2017

google.com does not work and many other sites do not work.

Ok! I'll try setting up a Streisand server from the current code in master in GCE and seeing if I'm able to access google.com - hopefully the answer is "no" and I can iterate on troubleshooting faster.

No, I have not tried other VPNs through Streisand.

Ok. That might be useful to know but if you want to wait and see if I'm able to reproduce the problem myself to save yourself some time/energy that's fine with me :-)

Do you want netstat from the sites that do not work?

That's ok for now. I'm going to see if I can repro and if not we can come back to the drawing board.

Thank you so much for your help

Happy to help!

cpu removed the status/information-needed label (For items missing required information) on Dec 8, 2017
@cpu
Collaborator

cpu commented Dec 8, 2017

google.com does not work and many other sites do not work.

Ok! I'll try setting up a Streisand server from the current code in master in GCE and seeing if I'm able to access google.com - hopefully the answer is "no" and I can iterate on troubleshooting faster.

@Ronaldkornblow : Good news - I was able to reproduce the problem and I can recommend a short-term workaround while I spend some time figuring out the best solution longer term.

In practice it seems an MTU larger than 1360 results in UDP fragmentation on the GCE network. Out of the box my server and client both had an MTU of 1420. You should be able to resolve this by updating the MTU on the wg0-server interface on the server and the wg0-client interface on your client machine.

You can do so by running:
sudo ifconfig wg0-server mtu 1360 on the server
sudo ifconfig wg0-client mtu 1360 on the client
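On systems where ifconfig is not installed, the same temporary change can probably be made with iproute2. A sketch only: `set_wg_mtu` is a hypothetical wrapper, and setting DRY_RUN=1 makes it print the command instead of running it. Like the ifconfig commands, this is a temporary change to the live interface.

```shell
# Set the MTU of a WireGuard interface with `ip link`.
# With DRY_RUN=1 the command is printed instead of executed.
set_wg_mtu() {
  iface=$1 mtu=$2
  case $mtu in
    ''|*[!0-9]*) echo "MTU must be a number: $mtu" >&2; return 1 ;;
  esac
  if [ -n "${DRY_RUN:-}" ]; then
    echo "ip link set dev $iface mtu $mtu"
  else
    sudo ip link set dev "$iface" mtu "$mtu"
  fi
}

# Usage:
#   set_wg_mtu wg0-server 1360   # on the server
#   set_wg_mtu wg0-client 1360   # on the client
```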

I'll have to spend some time thinking about how best to address this permanently. Thanks for reporting!

@Ronaldkornblow
Author

@cpu Thank you for investigating.

sudo ifconfig wg0-server mtu 1360 on the server

sorry for noob question, but... How do I SSH into my streisand server?
The SSH button on my gce dashboard is disabled.

Thank you for the support.

@cpu
Collaborator

cpu commented Dec 8, 2017

sorry for noob question, but... How do I SSH into my streisand server?

No apologies needed :-) From a command line you should be able to do ssh ubuntu@<the server ip>. I believe this will work out of the box based on the default settings and the SSH key used during provisioning.

Edit: Make sure to use the External IP you see in the GCE interface, not the internal 10.xx.xx.xx one

@Ronaldkornblow
Author

ubuntu@streisand:/home$ sudo ifconfig wg0-server mtu 1360

sudo: unable to resolve host streisand

@cpu
Collaborator

cpu commented Dec 9, 2017

@Ronaldkornblow That warning is OK. If you want to confirm it worked you can run ifconfig wg0-server on the server - you should see that it now says "mtu 1360" on the end of the first line of output.
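On Linux the same check can be scripted by reading the value from sysfs. A minimal sketch; `iface_mtu` is a hypothetical helper, and its second argument exists only so the sysfs root can be overridden.

```shell
# Print the MTU of an interface by reading /sys/class/net/<iface>/mtu.
iface_mtu() {
  iface=$1 root=${2:-/sys/class/net}
  cat "$root/$iface/mtu"
}

# Usage on the server:
#   [ "$(iface_mtu wg0-server)" = 1360 ] && echo "MTU change applied"
```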

@Ronaldkornblow
Author

@cpu got it thank you.

@cpu
Collaborator

cpu commented Dec 9, 2017

@Ronaldkornblow - Did the temporary MTU change fix your problems accessing Google websites?

@Ronaldkornblow
Author

Ronaldkornblow commented Dec 9, 2017

@Ronaldkornblow - Did the temporary MTU change fix your problems accessing Google websites?

@cpu
Yes thank you.

What is a good way to make streisand start on boot along with the MTU changes?

@cpu
Collaborator

cpu commented Dec 10, 2017

Yes thank you.

Great! Glad to hear.

What is a good way to make streisand start on boot along with the MTU changes?

I'll get back to you on this - I'm not sure yet.

@cpu
Collaborator

cpu commented Dec 10, 2017

@Ronaldkornblow In the [Interface] section of /etc/wireguard/wg0-client.conf on your client machine add MTU = 1360 after the Address line. I believe you'll have to bring the interface down and then back up with: sudo wg-quick down wg0-client && sudo wg-quick up wg0-client if you're already connected for the change to take effect.

You can do the same thing on the server by adding MTU = 1360 after the Address line in the [Interface] section of /etc/wireguard/wg0-server.conf. You can bring down/up that interface with sudo wg-quick down wg0-server && sudo wg-quick up wg0-server
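The config edit described above could be scripted roughly as follows. This is a sketch under assumptions: `add_mtu` is a hypothetical helper, it prints the modified config to stdout rather than editing the file, and it assumes the [Interface] section contains an Address line.

```shell
# Print a wg-quick config with "MTU = <value>" placed after the Address line
# of the [Interface] section, replacing any MTU line already in that section.
add_mtu() {
  conf=$1 mtu=$2
  awk -v mtu="$mtu" '
    /^\[/ { in_iface = ($0 == "[Interface]") }        # track the current section
    in_iface && /^[ \t]*MTU[ \t]*=/ { next }          # drop an existing MTU line
    { print }
    in_iface && !added && /^[ \t]*Address[ \t]*=/ {   # add MTU once, after Address
      print "MTU = " mtu; added = 1
    }
  ' "$conf"
}

# Usage (work on a copy, review it, then put it in place as root):
#   add_mtu wg0-client.conf 1360 > wg0-client.conf.new
```

For the start-on-boot half of the question, wireguard-tools ships a wg-quick@.service systemd unit, so something like `sudo systemctl enable wg-quick@wg0-client` may be enough, assuming the client uses systemd.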

@cpu
Collaborator

cpu commented Dec 10, 2017

@Ronaldkornblow I did a little bit more testing. After provisioning a fresh GCE instance with Streisand, the wg0-server interface on the server was auto-configured by wg-quick to have an MTU of 1380. When I ran a config through wg-quick on my laptop, the wg0-client interface came up with an MTU of 1420. This is a reasonable default but it doesn't play nicely with the GCE environment: Google pages didn't load, as expected. Adding MTU = 1380 in /etc/wireguard/wg0-client.conf to match the server's MTU was all that was required to fix the problem persistently. After adding that and reconnecting, all of the Google websites worked again.

I think I initially A) suggested the wrong MTU because I was calculating the overhead of WireGuard incorrectly (see this post for a better accounting), and B) suggested changing the server's wg0-server interface MTU, which I don't believe is necessary. wg-quick seems to do the job correctly based on the MTU of the ens4 outbound interface on a fresh GCE instance.
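For reference, the numbers in this thread are consistent with wg-quick subtracting 80 bytes from the MTU of the outbound interface: up to 40 bytes of outer IP header (the IPv6 case), 8 bytes of UDP header and 32 bytes of WireGuard framing. The sketch below just does that arithmetic. `wg_mtu_for` is a hypothetical helper, and the 1460 figure for ens4 on GCE is an assumption based on Google's documented default network MTU, not something stated in this issue.

```shell
# Tunnel MTU for a given underlying MTU, using the conservative
# 80-byte overhead (40 IPv6 + 8 UDP + 32 WireGuard).
wg_mtu_for() {
  echo $(( $1 - 40 - 8 - 32 ))
}

wg_mtu_for 1500   # typical Ethernet link: prints 1420
wg_mtu_for 1460   # assumed GCE ens4 MTU: prints 1380
```

That would explain why a laptop on an ordinary 1500-byte link picks 1420 while the GCE server picks 1380, and why the client has to be told about the smaller value explicitly.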

@cpu
Collaborator

cpu commented Dec 10, 2017

I'm going to close this issue for now. I think adding the MTU to your client profile is the correct solution.

Unfortunately I don't think there is anything Streisand can do to generate client profiles with a correct hardcoded MTU that will be appropriate for all cases. If someone more clever than myself has an idea for an improvement I'm all ears :-)

@Ronaldkornblow Thanks for reporting the problem.

cpu closed this as completed on Dec 10, 2017
cpu changed the title from "WireGuard/GCE - intermittent DNS failures." to "WireGuard/GCE - standard MTU causes fragmentation, some Google sites fail." on Dec 10, 2017
@x8x
Contributor

x8x commented Mar 7, 2018

@cpu A note on the generated server instructions for wireguard would be helpful for non techies, had same issue with GCE, had to change the MTU on the client to 1380. Another long-standing issue I have: the client can connect to the server, but if I disconnect the client for a day and then try to reconnect, it doesn't work and I have to restart WireGuard on the server. I don't have many ideas about this one. I think the server on GCE is NATted and the clients are NATted too, so maybe it is a firewall rule, connection tracking, or hole punching issue. The client sends data but the server doesn't answer back. Any clue welcome! :-)
Wondering if it's somehow related to this https://lists.zx2c4.com/pipermail/wireguard/2017-November/002062.html

@x8x
Contributor

x8x commented Mar 10, 2018

@cpu @zx2c4 I did some more checking. Monitoring both endpoints with tcpdump shows that the server receives the connection attempt from the client but doesn't answer! After restarting the WireGuard server, the endpoints connect fine again. I'd appreciate any help in debugging this. I suppose the first step is to look at iptables and ufw; any suggestions?
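One low-effort thing to capture before restarting the server next time is whether WireGuard itself considers the handshake stale. A sketch only: it assumes `wg show <interface> latest-handshakes` prints one `<public key> <unix timestamp>` pair per line, and `stale_peers` is a hypothetical helper name.

```shell
# Read "<public key> <unix timestamp>" lines on stdin and print the peers whose
# last handshake is older than <max_age> seconds, or that never completed one.
stale_peers() {
  max_age=$1 now=${2:-$(date +%s)}
  awk -v max="$max_age" -v now="$now" '
    $2 == 0        { print $1, "never"; next }
    now - $2 > max { print $1, (now - $2) "s ago" }
  '
}

# Usage on the server, alongside a packet capture on the WireGuard port:
#   sudo wg show wg0-server latest-handshakes | stale_peers 180
#   sudo tcpdump -ni any udp port 51820
```

If packets show up in the capture while the handshake time never advances, that at least narrows the problem to the server side rather than the path between the two NATs.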

@cpu
Collaborator

cpu commented Mar 13, 2018

@cpu A note on the generated server instructions for wireguard would be helpful for non techies, had same issue with GCE, had to change the MTU on the client to 1380

Sounds like a good idea 👍

I'm behind on Streisand notifications/threads, apologies. I won't be able to look at this for a while yet.
