Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: poor outgoing network performance on Mac #20062

Merged
merged 7 commits into from
Feb 23, 2022

Conversation

moyara-twilio
Copy link
Contributor

@moyara-twilio moyara-twilio commented Feb 5, 2022

  • Closes

User facing changelog

Fixed poor network issue in MacOS on initial requests when using VMware CBCloud.

Additional details

I have been experiencing tests hangs when run cypress locally and always timed out. Digging into the issue, finally narrowed down net.connect makes super slow when http agent makes requests.
This eventually turned out that NodeJS bug in MacOS. I have reported in nodejs/node#41699 for more details.

In Cypress, when outgoing requests happen, http agent first lookup DNS records and find out the fasted IP family.
https://github.com/cypress-io/cypress/blob/develop/packages/network/lib/connect.ts#L24
When there are 4 hosts in a DNS record like www.google.com, it tries 4 net.connect() for the hosts which eventually makes 4x slower than usual request without net.connect() in my experiment.

Simplified sample code to reproduce the issue from https://github.com/cypress-io/cypress/blob/develop/packages/server/lib/request.js#L434

// test.js
let r = require('@cypress/request')
const { agent } = require("./packages/network");

const url = 'https://www.google.com'
const defaults = {
  agent,
  headers: {
    'Connection': 'keep-alive',
  },
  proxy: null, // upstream proxying is handled by CombinedAgent
}

console.time(url)
const reqStream = r({url, ...defaults});
reqStream.on('request', (req) => {
  console.timeLog(url, "on request")
})
reqStream.on('socket', socket => {
  console.timeLog(url, 'socket created')
  socket.on('lookup', () => {
    console.timeLog(url, 'socket lookup')
  })
  socket.on('connect', () => {
    console.timeLog(url, 'socket connected')
  })
})
reqStream.on('response', (incomingRes) => {
  incomingRes.on('data', data => {
    console.timeLog(url, 'data received')
  })
})

before fix

$ DEBUG=cypress:network:connect node test.js
cypress:network:connect beginning getAddress { hostname: 'www.google.com', port: 443 } +0ms
https://www.google.com: 9.085ms on request
cypress:network:connect got addresses { hostname: 'www.google.com', port: 443, addresses: [ { address: '172.253.62.103', family: 4 }, { address: '172.253.62.147', family: 4 }, { address: '172.253.62.104', family: 4 }, { address: '172.253.62.105', family: 4 }, { address: '172.253.62.106', family: 4 }, { address: '172.253.62.99', family: 4 }, { address: '2607:f8b0:4004:c07::63', family: 6 }, { address: '2607:f8b0:4004:c07::67', family: 6 }, { address: '2607:f8b0:4004:c07::69', family: 6 }, { address: '2607:f8b0:4004:c07::93', family: 6 } ] } +6ms
https://www.google.com: 95.359ms socket created
https://www.google.com: 96.893ms socket lookup
https://www.google.com: 2.167s socket connected
https://www.google.com: 10.292s data received
https://www.google.com: 10.293s data received
https://www.google.com: 10.293s data received
https://www.google.com: 10.293s data received
https://www.google.com: 10.293s data received

After fix

  cypress:network:connect beginning getAddress { hostname: 'www.google.com', port: 443 } +0ms
https://www.google.com: 10.707ms on request
  cypress:network:connect got addresses { hostname: 'www.google.com', port: 443, addresses: [ { address: '172.253.62.99', family: 4 }, { address: '172.253.62.103', family: 4 }, { address: '172.253.62.104', family: 4 }, { address: '172.253.62.105', family: 4 }, { address: '172.253.62.106', family: 4 }, { address: '172.253.62.147', family: 4 }, { address: '2607:f8b0:4004:835::2004', family: 6 } ] } +79ms
https://www.google.com: 247.343ms socket created
https://www.google.com: 256.244ms socket lookup
https://www.google.com: 325.534ms socket connected
https://www.google.com: 509.304ms data received
https://www.google.com: 509.633ms data received
https://www.google.com: 509.929ms data received
https://www.google.com: 510.103ms data received
https://www.google.com: 510.251ms data received

How has the user experience changed?

I have observed http requests hangs or pending status initially and randomly fails with timeout. Once the website has 3rd party dependencies such as Segment, Heap, Google Analytics etcs, it all goes slow for the net.connect issue and finally made the tests unreliable.

I verified with my colleagues that it works and this improves network performance significantly running tests in MacOS. This would solve most problem with timeouts or flakiness due to tests hang for Mac users.

Also verified the issue does not happen in Debian and Centos. Have not verified in Windows and Mac M1 chips.

PR Tasks

  • Have tests been added/updated?
  • Has the original issue (or this PR, if no issue exists) been tagged with a release in ZenHub? (user-facing changes only)
  • Has a PR for user-facing changes been opened in cypress-documentation?
  • Have API changes been updated in the type definitions?
  • Have new configuration options been added to the cypress.schema.json?

@cypress-bot
Copy link
Contributor

cypress-bot bot commented Feb 5, 2022

Thanks for taking the time to open a PR!

@moyara-twilio moyara-twilio marked this pull request as ready for review February 7, 2022 17:51
@moyara-twilio moyara-twilio requested a review from a team as a code owner February 7, 2022 17:51
@moyara-twilio moyara-twilio requested review from jennifer-shehane and removed request for a team February 7, 2022 17:51
@jennifer-shehane
Copy link
Member

@moyara-twilio Thanks for opening this issue, we are always looking to optimize improvements to the performance of Cypress. The system-tests here failing is unrelated to your work and related to #20075 just for context.

@moyara-twilio
Copy link
Contributor Author

Hello team! Any feedback or update on this?

@moyara-twilio
Copy link
Contributor Author

I verified with my old macbook that 10.14 Mojave does not have this issue.
Confirmed 11 Big Sur and 12 Monterey have this issue.
Have no machine to test over 10.15 Catalina. But seems started from Catalina or big Sur.

@brian-mann
Copy link
Member

Would be curious your thoughts @flotwig

@moyara
Copy link

moyara commented Feb 15, 2022

Hello, any update? this would improve developer experiences significantly in local development for MacOS users.

Copy link
Contributor

@flotwig flotwig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @moyara @moyara-twilio, thank you for opening a PR. Sorry it's taken so long to get to this one.

So it seems like the main change here is to always send servername when we are making an HTTPS request. Is that understanding correct? Can you confirm that this issue only happens with HTTPS, not HTTP? Trying to understand what the core issue is and how this fixes it.

packages/network/lib/connect.ts Outdated Show resolved Hide resolved
packages/network/lib/connect.ts Outdated Show resolved Hide resolved
@moyara
Copy link

moyara commented Feb 17, 2022

Hey @moyara @moyara-twilio, thank you for opening a PR. Sorry it's taken so long to get to this one.

So it seems like the main change here is to always send servername when we are making an HTTPS request. Is that understanding correct? Can you confirm that this issue only happens with HTTPS, not HTTP? Trying to understand what the core issue is and how this fixes it.

Hi @flotwig , it does not matter whether http or https. It seems like net.connect and immediate close triggers the issue. And this is MacOS specific. I specified more details in nodejs/node#41699.

servername is just for TLS connection. Some hosts such as load balancers rejected the connection without specifying the servername.

@ryanthemanuel
Copy link
Collaborator

@moyara-twilio we would like to have a little more information about exactly what is going on here. We have tried on Macs on our side and couldn't reproduce this issue. We tried monkey patching dns.lookup to return the exact addresses that you have in your logs and we still could not reproduce.

I was wondering if you could try a couple of things to help us narrow down the exact root cause in your exact set up. In the existing code without your change, could you:

  • Add additional console.timeLog statements in the onConnect callback to see how long each of the addresses take to resolve. I'm curious to see if one is particularly slow
  • Monkeypatch dns.lookup so that we can see which address is being picked on any dns lookups that are occurring. We know that we do a lookup for all addresses for a given hostname, but it appears that when we actually make the https connection it does a subsequent call to dns.lookup and I'm curious to see which address it's picking.

@moyara
Copy link

moyara commented Feb 17, 2022

@ryanthemanuel hmm interesting. I tried 4 different Mac from my colleagues and confirmed all happen.
Can you try below code? It is the simplified version what happens with the connection I believe.

const Url = require('url')
const net = require('net');
const tls = require('tls');
const https = require('https');

const url = "https://www.google.com"

// Does not matter port or host in net.connect
console.time('1.1.1.1')
net.connect({ port: 443, host: '1.1.1.1' }, function () {
  console.timeLog('1.1.1.1', 'connected')
  this.end();
}).on('close', () => {console.timeLog('1.1.1.1', 'closed')});
console.time(url)
https.request(Url.parse(url), (res) => {
  console.timeLog(url, 'response')
  res.on('data', () => {
    console.timeLog(url, 'data');
  })
}).on('socket', (socket) => {
  console.timeLog(url, 'connected')
  socket.on('lookup', () => console.timeLog(url, 'lookup'))
}).end();
https://www.google.com: 16.372ms connected
https://www.google.com: 20.92ms lookup
1.1.1.1: 24.846ms connected
1.1.1.1: 1.037s closed
https://www.google.com: 1.235s response
https://www.google.com: 1.235s data
https://www.google.com: 1.236s data
https://www.google.com: 1.236s data

When .destroy() and use tls.connect()

https://www.google.com: 14.742ms connected
https://www.google.com: 18.816ms lookup
1.1.1.1: 377.003ms connected
1.1.1.1: 381.022ms closed
https://www.google.com: 579.698ms response
https://www.google.com: 580.581ms data
https://www.google.com: 580.831ms data

It still happened even put the https.request() in onConnect callback right after .end() as well.

@moyara
Copy link

moyara commented Feb 17, 2022

Added some more logs in connect.ts

export function byPortAndAddress (hostname: string, port: number, address: net.Address) {
  // https://nodejs.org/api/net.html#net_net_connect_port_host_connectlistener
  return new Bluebird<net.Address>((resolve, reject) => {
    const onConnect = () => {
      console.timeLog('https://www.google.com', 'onConnect to', address);
      client.end()
      resolve(address)
    }

    let client = net.connect({ port, host: address.address }, onConnect)

    client.on('error', reject)
    client.on('close', () => {
      console.timeLog('https://www.google.com', 'closed', address);
    });
  })
}
https://www.google.com: 8.94ms on request
https://www.google.com: 148.957ms onConnect to { address: '142.251.33.196', family: 4 }
https://www.google.com: 163.701ms socket created
https://www.google.com: 164.96ms onConnect to { address: '2607:f8b0:4004:837::2004', family: 6 }
https://www.google.com: 165.383ms socket lookup
https://www.google.com: 1.238s socket connected
https://www.google.com: 1.239s closed { address: '142.251.33.196', family: 4 }
https://www.google.com: 2.240s closed { address: '2607:f8b0:4004:837::2004', family: 6 }
https://www.google.com: 2.362s data received
https://www.google.com: 2.362s data received
https://www.google.com: 2.362s data received
https://www.google.com: 2.362s data received

for more information
MacOS Monterey 12.2.1 (21D62)
Node v16.13.0

@ryanthemanuel
Copy link
Collaborator

ryanthemanuel commented Feb 18, 2022

@moyara I just updated to MacOS 12.2.1 and I'm still not seeing the issue (see below). I'll reach out to a few colleagues tomorrow to see if they can reproduce on their setup. I'm sure there's something different going on between our setups. Out of curiosity, what are your DNS settings?

https://www.google.com: 22.442ms connected
1.1.1.1: 39.597ms connected
1.1.1.1: 55.324ms closed
https://www.google.com: 52.886ms lookup
https://www.google.com: 191.72ms response
https://www.google.com: 192.608ms data
https://www.google.com: 192.921ms data
https://www.google.com: 193.215ms data
https://www.google.com: 193.48ms data
https://www.google.com: 193.754ms data
https://www.google.com: 194.038ms data
https://www.google.com: 194.214ms data
https://www.google.com: 194.61ms data
https://www.google.com: 194.863ms data
https://www.google.com: 195.221ms data
https://www.google.com: 195.477ms data
https://www.google.com: 195.628ms data
https://www.google.com: 195.755ms data
https://www.google.com: 210.175ms data
https://www.google.com: 210.932ms data
https://www.google.com: 211.15ms data
https://www.google.com: 211.323ms data
https://www.google.com: 211.464ms data
https://www.google.com: 211.823ms data
https://www.google.com: 212.003ms data
https://www.google.com: 212.145ms data
https://www.google.com: 212.436ms data
https://www.google.com: 220.975ms data
https://www.google.com: 221.322ms data
https://www.google.com: 222.237ms data
https://www.google.com: 222.572ms data
https://www.google.com: 222.893ms data
https://www.google.com: 223.115ms data
https://www.google.com: 223.238ms data
https://www.google.com: 223.359ms data
https://www.google.com: 224.619ms data
https://www.google.com: 224.941ms data
https://www.google.com: 225.386ms data
https://www.google.com: 225.548ms data
https://www.google.com: 225.69ms data
https://www.google.com: 225.817ms data
https://www.google.com: 225.936ms data
https://www.google.com: 226.055ms data
https://www.google.com: 227.166ms data
https://www.google.com: 227.311ms data
https://www.google.com: 228.275ms data
https://www.google.com: 228.394ms data

@moyara
Copy link

moyara commented Feb 18, 2022

@ryanthemanuel I have no specific DNS setting. I could reproduce it using my old Macbook(Big Sur). But another Mac with 10.14 Mojave does not have this issue in the same network.

  • 10.14 Mojave does not have the issue
  • 11 Big Sur has the issue
  • 12 Monterey has the issue

I'll also ask people outside my company to see if it only happens for the company laptops.

@moyara
Copy link

moyara commented Feb 18, 2022

OK turned out that my company wide software VMware CBCloud made the issue. So issue is only for the users who use the CBCloud.
But still have a proposal to replace client.end() with client.destroy(). The byPortAndAddress() method is to find out the fastest IP family to be cached and used in requests(https://github.com/cypress-io/cypress/blob/develop/packages/network/lib/agent.ts#L137). And this does not need to wait for FIN by server to be closed. This will somewhat resolve the issue with CBCloud as well. Does it make sense?

@ryanthemanuel
Copy link
Collaborator

@moyara we'd be good with replacing the call to client.end() with client.destroy(). Can you make sure and add a test that covers that scenario?

@moyara
Copy link

moyara commented Feb 18, 2022

@ryanthemanuel update the PR. Please review :)

@tbiethman tbiethman requested a review from flotwig February 21, 2022 16:36
@andreasonny83
Copy link

Looking forward to see this fix merged in. This issue has bothered me for a very long time.
Thanks a lot

@flotwig
Copy link
Contributor

flotwig commented Feb 23, 2022

@moyara-twilio @moyara thank you for the contribution and thank you for working with us to isolate the root of this issue :)

@flotwig flotwig merged commit 14d0b93 into cypress-io:develop Feb 23, 2022
@cypress-bot
Copy link
Contributor

cypress-bot bot commented Feb 28, 2022

Released in 9.5.1.

This comment thread has been locked. If you are still experiencing this issue after upgrading to
Cypress v9.5.1, please open a new issue.

@cypress-bot cypress-bot bot locked as resolved and limited conversation to collaborators Feb 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants