Control networkidle wait time #1353

Closed
feus4177 opened this issue Nov 10, 2017 · 17 comments

Comments

@feus4177

For our application, there is a significant gap (~1-2s) between when the main app requests finish and when the data resource requests start. Currently the networkidle option waits for the network to be idle for 500ms. It would be nice to be able to configure this value so that we can wait for all the requests to finish.

@zhangyy62

zhangyy62 commented Nov 16, 2017

I have the same question. My web application sends several requests, but I can't even set the response time.

@demian85

+1

I don't think the new values networkidle0 and networkidle2 are enough to cover all cases. They don't make much sense either. It sounds like there should be a single networkidle value plus other config options for the number of requests and the seconds to wait. That would be more scalable.

@tommedema

This is very important. I just updated puppeteer and the options networkIdleTimeout and networkIdleInflight disappeared. It's now hardcoded at 500ms for the timeout and 0 or 2 for the in-flight count.

Any reason this is now hardcoded?

@aslushnikov
Contributor

Any reason this is now hardcoded?

@tommedema The implementation of networkidle0 and networkidle2 was moved to chromium.

The good news is that it's easy to implement a self-baked network-idle watcher, for example:

function waitForNetworkIdle(page, timeout, maxInflightRequests = 0) {
  page.on('request', onRequestStarted);
  page.on('requestfinished', onRequestFinished);
  page.on('requestfailed', onRequestFinished);

  let inflight = 0;
  let fulfill;
  let promise = new Promise(x => fulfill = x);
  let timeoutId = setTimeout(onTimeoutDone, timeout);
  return promise;

  function onTimeoutDone() {
    page.removeListener('request', onRequestStarted);
    page.removeListener('requestfinished', onRequestFinished);
    page.removeListener('requestfailed', onRequestFinished);
    fulfill();
  }

  function onRequestStarted() {
    ++inflight;
    if (inflight > maxInflightRequests)
      clearTimeout(timeoutId);
  }
  
  function onRequestFinished() {
    if (inflight === 0)
      return;
    --inflight;
    if (inflight === maxInflightRequests)
      timeoutId = setTimeout(onTimeoutDone, timeout);
  }
}

// Example
await Promise.all([
  page.goto('https://google.com'),
  waitForNetworkIdle(page, 500, 0), // equivalent to 'networkidle0'
]);

@tommedema

Thanks @aslushnikov, I guess I will publish that as an npm package once I find some time.

@simonhaenisch

I'm doing this now, which I think should also work in a lot of cases:

await Promise.all([
  page.waitForNavigation({ waitUntil: 'networkidle0' }),
  page.evaluate(() => history.pushState(null, null, '#')),
]);

@bbourn

bbourn commented Apr 26, 2019

I couldn't get @aslushnikov's example to catch all the request finishes and failures, so I modified it.

/**
 * A self-baked network-idle watcher.
 * consoleDev and consoleWarn are logging helpers and moment is presumably
 * Moment.js; none of them is defined in this snippet.
 * @param {*} page
 * @param {*} timeout
 */
export const waitForNetworkIdle2 = (page, timeout) => {
  let lastTime
  let inFlight = 0
  function onTimeoutDone () {
    timers.shift()
    if (timers.length === 0) {
      if (inFlight) {
        consoleDev('function_helper.waitForNetworkIdle.onTimeoutDone inFlight: ', inFlight, ' continuing')
        timers.push(setTimeout(onTimeoutDone, timeout))
      } else {
        consoleDev('function_helper.waitForNetworkIdle.onTimeoutDone fulfill()')
        page.removeListener('request', onRequest)
        page.removeListener('requestfinished', onRequestFinishOrFail)
        page.removeListener('requestfailed', onRequestFinishOrFail)
        fulfill()
        consoleDev('function_helper.waitForNetworkIdle.onTimeoutDone took ', lastTime - moment().format('X'), ' seconds')
      }
    } else {
      consoleDev('function_helper.waitForNetworkIdle.onTimeoutDone timers: ', timers.length)
    }
  }

  const onRequest = async () => {
    inFlight++
    consoleWarn('function_helper.waitForNetworkIdle.onRequest inFlight: ', inFlight)
    // clear timeout, wait for request to finish
    clearTimeout(timers.shift())
    timers.push(setTimeout(onTimeoutDone, timeout))
    consoleWarn('function_helper.waitForNetworkIdle.onRequest timers: ', timers.length)
    lastTime = moment().format('X')
  }

  const onRequestFinishOrFail = async () => {
    consoleWarn('function_helper.waitForNetworkIdle.onRequestFinishOrFail inFlight: ', inFlight)
    // clear timeout, wait for request to finish
    clearTimeout(timers.shift())
    timers.push(setTimeout(onTimeoutDone, timeout))
    consoleWarn('function_helper.waitForNetworkIdle.onRequestFinishOrFail timers: ', timers.length)
    lastTime = moment().format('X')
    if (inFlight > 0) inFlight--
  }

  page.on('request', onRequest)
  page.on('requestfinished', onRequestFinishOrFail)
  page.on('requestfailed', onRequestFinishOrFail)

  const timers = []
  let fulfill
  let promise = new Promise((resolve, reject) => {
    fulfill = resolve
  })
  consoleWarn('function_helper.waitForNetworkIdle setTimeout()')
  timers.push(setTimeout(onTimeoutDone, timeout))

  return promise
}

@abratnap

The waitForNetworkIdle implementations above don't always work.

@abratnap

@aslushnikov I am trying to fetch multiple dynamic pages using puppeteer. The default networkidle0 and networkidle2 aren't useful to me because I need to wait longer. I was using the waitForNetworkIdle implementation as below.

await Promise.all([
  page.goto('https://google.com'),
  waitForNetworkIdle(page, 3000, 0), 
]);

I found that no response comes back and no timeout error is thrown, because the promise from waitForNetworkIdle never fulfills. I think this happens when there are requests that never receive a response: inflight stays above maxInflightRequests, so the timeout is cleared and never set again. Is there any way to handle this? What is the correct way to solve it? Does support for a custom networkidle need to come from Chromium?
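One way to keep the watcher from hanging forever when a request never completes is to race it against an overall deadline. The following is a minimal sketch, not part of Puppeteer: withDeadline and its parameters are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helper (not a Puppeteer API): reject if `promise` has not
// settled within `ms` milliseconds, otherwise pass its result through.
function withDeadline(promise, ms, message = 'Timed out waiting for network idle') {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  // Whichever settles first wins; the deadline timer is always cleared.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Possible usage with the watcher quoted above (assumes `page` and
// `waitForNetworkIdle` are in scope):
//
//   await Promise.all([
//     page.goto('https://google.com'),
//     withDeadline(waitForNetworkIdle(page, 3000, 0), 30000),
//   ]);
```

Note that this only bounds the wait. If the deadline fires first, the listeners registered by waitForNetworkIdle stay attached to the page, because that function removes them only when its own timer completes.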

@erkankarabulut

Re: @aslushnikov's waitForNetworkIdle example above, called as waitForNetworkIdle(page, 500, 0):

This waits until the timeout anyway!

@mifi

mifi commented May 15, 2020

I couldn't get @aslushnikov's example to work.

  1. it never resolves (only times out)
  2. it doesn't handle the case where there are no requests.

Here is my working implementation:

function waitForNetworkIdle({ page, timeout = 30000, waitForFirstRequest = 1000, maxInflightRequests = 0 }) {
  let inflight = 0;
  let resolve;
  let reject;
  let firstRequestTimeoutId;
  let timeoutId;

  function cleanup() {
    clearTimeout(timeoutId);
    clearTimeout(firstRequestTimeoutId);
    /* eslint-disable no-use-before-define */
    page.removeListener('request', onRequestStarted);
    page.removeListener('requestfinished', onRequestFinished);
    page.removeListener('requestfailed', onRequestFinished);
    /* eslint-enable no-use-before-define */
  }

  function check() {
    if (inflight === 0 || inflight <= maxInflightRequests) {
      cleanup();
      resolve();
    }
  }

  function onRequestStarted() {
    clearTimeout(firstRequestTimeoutId);
    inflight += 1;
  }

  function onRequestFinished() {
    inflight -= 1;
    check();
  }

  function onTimeout() {
    cleanup();
    reject(new Error('Timeout'));
  }

  function onFirstRequestTimeout() {
    cleanup();
    resolve();
  }

  page.on('request', onRequestStarted);
  page.on('requestfinished', onRequestFinished);
  page.on('requestfailed', onRequestFinished);

  timeoutId = setTimeout(onTimeout, timeout);
  firstRequestTimeoutId = setTimeout(onFirstRequestTimeout, waitForFirstRequest);

  return new Promise((res, rej) => { resolve = res; reject = rej; });
}

waitForFirstRequest specifies how long to wait before resolving if no network requests are ever triggered. Remember to wrap the await in try/catch: the promise rejects on timeout.
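A minimal sketch of that try/catch pattern follows. gotoAndSettle is a hypothetical wrapper name, and the watcher is passed in as an argument so the sketch stays self-contained; it assumes the watcher accepts the { page, timeout } object used above.

```javascript
// Hypothetical wrapper: `waitForIdle` is any watcher with the
// ({ page, timeout }) signature used in the comment above.
async function gotoAndSettle(page, url, waitForIdle, timeout = 30000) {
  // Start watching before navigating so the first requests are counted.
  const idle = waitForIdle({ page, timeout });
  // Mark the promise as handled in case it rejects while goto is pending.
  idle.catch(() => {});

  const response = await page.goto(url);
  try {
    await idle;
    return { response, idle: true };
  } catch (err) {
    // The watcher rejects on timeout; report "not idle" instead of
    // failing the whole navigation.
    return { response, idle: false };
  }
}
```

Returning a flag rather than rethrowing is a design choice for callers that can still use a page that never went fully quiet; rethrow in the catch block if a timeout should abort the run.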

@DevBrent

DevBrent commented Jun 23, 2020

@mifi Your version leaves out the behaviour of networkidle0, which waits 500ms after the last request before returning so that delayed or rate-limited requests are caught. There are good reasons not to assume that no further requests will follow as soon as the count hits zero. We wanted to reduce this 500ms minimum to 50-200ms to cut PDF generation times.

Here's an update of your version with a waitForLastRequest time added.

function waitForNetworkIdle({ page, timeout = 30000, waitForFirstRequest = 1000, waitForLastRequest = 200, maxInflightRequests = 0 }) {
  let inflight = 0;
  let resolve;
  let reject;
  let firstRequestTimeoutId;
  let lastRequestTimeoutId;
  let timeoutId;
  maxInflightRequests = Math.max(maxInflightRequests, 0);

  function cleanup() {
    clearTimeout(timeoutId);
    clearTimeout(firstRequestTimeoutId);
    clearTimeout(lastRequestTimeoutId);
    /* eslint-disable no-use-before-define */
    page.removeListener('request', onRequestStarted);
    page.removeListener('requestfinished', onRequestFinished);
    page.removeListener('requestfailed', onRequestFinished);
    /* eslint-enable no-use-before-define */
  }

  function check() {
    if (inflight <= maxInflightRequests) {
      clearTimeout(lastRequestTimeoutId);
      lastRequestTimeoutId = setTimeout(onLastRequestTimeout, waitForLastRequest);
    }
  }

  function onRequestStarted() {
    clearTimeout(firstRequestTimeoutId);
    clearTimeout(lastRequestTimeoutId);
    inflight += 1;
  }

  function onRequestFinished() {
    inflight -= 1;
    check();
  }

  function onTimeout() {
    cleanup();
    reject(new Error('Timeout'));
  }

  function onFirstRequestTimeout() {
    cleanup();
    resolve();
  }

  function onLastRequestTimeout() {
    cleanup();
    resolve();
  }

  page.on('request', onRequestStarted);
  page.on('requestfinished', onRequestFinished);
  page.on('requestfailed', onRequestFinished);

  timeoutId = setTimeout(onTimeout, timeout); // Overall page timeout
  firstRequestTimeoutId = setTimeout(onFirstRequestTimeout, waitForFirstRequest);

  return new Promise((res, rej) => { resolve = res; reject = rej; });
}
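The counters in the versions above can drift, for example when a finish event arrives for a request that started before the listeners were attached. One possible variation, sketched below under stated assumptions, tracks the request objects themselves in a Set instead of a number. waitForNetworkQuiet and its option names are made up for illustration; the sketch relies only on page.on, page.removeListener and the request, requestfinished and requestfailed events used throughout this thread, and assumes each event passes the request object to its listener.

```javascript
// Hypothetical variant: resolves once at most `maxInflightRequests` tracked
// requests have been in flight, with no start/finish activity, for `idleTime` ms.
// Rejects after `timeout` ms overall.
function waitForNetworkQuiet(page, { idleTime = 500, timeout = 30000, maxInflightRequests = 0 } = {}) {
  return new Promise((resolve, reject) => {
    const inflight = new Set();
    let idleTimer = null;

    const cleanup = () => {
      clearTimeout(idleTimer);
      clearTimeout(overallTimer);
      page.removeListener('request', onStarted);
      page.removeListener('requestfinished', onDone);
      page.removeListener('requestfailed', onDone);
    };

    // Restart the quiet window on every event; arm it only while the
    // in-flight count is within the limit.
    const update = () => {
      clearTimeout(idleTimer);
      idleTimer = null;
      if (inflight.size <= maxInflightRequests) {
        idleTimer = setTimeout(() => { cleanup(); resolve(); }, idleTime);
      }
    };

    const onStarted = (request) => { inflight.add(request); update(); };
    // Deleting an unknown request is a no-op, so the count never goes negative.
    const onDone = (request) => { inflight.delete(request); update(); };

    const overallTimer = setTimeout(() => {
      cleanup();
      reject(new Error(`Network not idle after ${timeout} ms`));
    }, timeout);

    page.on('request', onStarted);
    page.on('requestfinished', onDone);
    page.on('requestfailed', onDone);
    update(); // covers the case where no request is ever made
  });
}
```

This sketch is stricter than counting alone: any request start or finish restarts the quiet window, so it is not claimed to match the built-in networkidle0 or networkidle2 exactly.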

@Kikobeats
Contributor

Kikobeats commented Jun 23, 2020

For anyone interested, I have been testing a waitUntil=auto implementation for a while, based on #1353 (comment)

It's simple and works like a charm:

  1. set waitUntil to 'load':
    https://github.com/microlinkhq/browserless/blob/master/packages/goto/src/index.js#L196

  2. wait for the remaining requests at page level with networkidle2, wrapped in a timeout:
    https://github.com/microlinkhq/browserless/blob/master/packages/goto/src/index.js#L282

  3. ensure the window state is updated:
    https://github.com/microlinkhq/browserless/blob/master/packages/goto/src/index.js#L283

👌
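One way to read the three steps above is sketched here, reusing the pushState trick from earlier in this thread. This is not the browserless implementation: gotoAuto and idleTimeout are hypothetical names, and the sketch uses the timeout option of page.waitForNavigation as the "wrap with a timeout" step.

```javascript
// Hypothetical sketch of a "waitUntil=auto" style navigation.
async function gotoAuto(page, url, { idleTimeout = 5000 } = {}) {
  // Step 1: navigate and wait only for the load event.
  const response = await page.goto(url, { waitUntil: 'load' });

  // Steps 2 and 3: trigger a same-document navigation with pushState so that
  // waitForNavigation has something to observe, then wait for networkidle2,
  // bounded by idleTimeout.
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2', timeout: idleTimeout }),
    page.evaluate(() => history.pushState(null, null, null)),
  ]).catch(() => {
    // Deliberately ignored: if the network never goes idle in time (or the
    // evaluate call fails), carry on with the page as it is.
  });

  return response;
}
```

Swallowing the error means a busy page is still returned after at most idleTimeout milliseconds; log or rethrow inside the catch if failures need to be visible.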

@lehni

lehni commented Nov 6, 2020

await Promise.all([
  page.waitForNavigation({ waitUntil: 'networkidle0' }),
  page.evaluate(() => history.pushState(null, null, '#')),
]);

☝️ this great snippet by @simonhaenisch worked for most cases for me, but replacing the path with null then covered the few cases where it wouldn't work as expected:

await Promise.all([
  page.waitForNavigation({ waitUntil: 'networkidle0' }),
  page.evaluate(() => history.pushState(null, null, null)),
]);

@Kikobeats
Contributor

@lehni thanks for the trick!

Can you tell us a public URL to see the difference?

Kikobeats added a commit to microlinkhq/browserless that referenced this issue Nov 6, 2020
@lehni

lehni commented Nov 8, 2020

@Kikobeats sadly I can't. I tried the solution by @simonhaenisch in our e2e tests and noticed some strange failures in some situations that I thought could be linked to the use of the hash (just a hunch). I replaced it with null and those cases started working as expected, so I figured I should share the observation here. I haven't really debugged the original cause of the difference. It could be linked to pages that actually react to changes of the hash.

@God-damnit-all

God-damnit-all commented Dec 19, 2021

Re: @Kikobeats's waitUntil=auto comment above.

I corrected the links to point to this particular time in your repo's history:

  1. waitUntil sets to 'load':
    https://github.com/microlinkhq/browserless/blob/c49b817f8daebc96abac0a34bc0c8cf88d935470/packages/goto/src/index.js#L196
  2. wait remaining requests on page level with networkidle2 and wrap with a timeout:
    https://github.com/microlinkhq/browserless/blob/c49b817f8daebc96abac0a34bc0c8cf88d935470/packages/goto/src/index.js#L282
  3. ensure window state is updated:
    https://github.com/microlinkhq/browserless/blob/c49b817f8daebc96abac0a34bc0c8cf88d935470/packages/goto/src/index.js#L283
