Fix wait for networkIdle lifecycle event #578

ankur22 · 2022-10-10T17:04:38Z

Fixes: #597

Details

There are three lifecycle types that are emitted from the browser over CDP:

load
DOMContentLoaded
networkIdle

We should be able to to wait on any of these and proceed with the test once we receive them and action on them at an appropriate time. Currently (4485c6d) the wait for networkIdle waits indefinitely on some websites e.g. google.com.

IMO, we should be able to rely on the browser to send us the CDP lifecycle events, which in my tests it does do. What we seem to have (and I can only guess the reasoning behind it) are three redundancies in place incase we don't receive these events from the browser at all or within a set timeout (the timeout scenario is only for networkIdle) or not in the order we expect (if a webpage has a hierarchy of frames, then currently we wait for the child frames to receive and action on the lifecycle events before the main one).

The first redundancy we have is the timeout for networkIdle. This timer should fire when:

500ms have passed;
There are 0 inflight requests (actually this condition is needed for the timer to start)
We've not already received the networkIdle lifecycle event.

Unfortunately what happens in this issue is that another event is received (EventFrameStoppedLoading) which sets networkIdle in the frame's lifecycleEvents, and when all requests finally complete (which is after we receive EventFrameStoppedLoading) the time cannot be started. Therefore there is nothing in place to recalculate the frame's lifecycle, and unblock waiting actions. I also think 500ms is not long enough for a timeout and I've seen situations where the timeout has fired, but then we receive the networkIdle lifecycle CDP event a very short while later.

The second redundancy in place is listening to EventFrameStoppedLoading and setting all the lifecycle types to the frame's lifecycleEvents, and when a subsequent new lifecycle event is received from the browser it should recalculateLifecycle which would unblock any waits. The problem with this currently is that this event can be received but we then don't receive and action on any other lifecycle events after -- we actually do receive networkIdle after but we're not calling recalculateLifecycle for an unknown reason.

The fix for unblocking when waiting on networkIdle is:

Call recalculateLifecycle once we see networkIdle lifecycle events.
Increase the timeout to 1sec so that we're not timing out too early.

After this fix, I found another issue -- if we navigate to google.com, we correctly receive networkIdle and recalculateLifecycle. If we now navigate again to google.com, we unblock the wait as soon as we receive any lifecycle event. We should be waiting on the correct networkIdle event before unblocking waits. The issue is in clearLifecycle. We're looping through lifecycleEvents and setting the entry values to false. This actually doesn't help later when recalculateLifecycle is called as it doesn't take into account the value of the false entry, and instead emits all the lifecycle events it has seen so far (which were from a previous navigation) and unblocks waits too early.

The fix is:

Completely clear the frame's lifecycleEvents in clearLifecycle. This will ensure we unblock waits when a new lifecycle event is received for the new navigation.

There is now a third redundancy and a third question that needs answering -- if we navigate to youtube.com, it contains a hierarchy of frames, and so the "main" frame currently needs to wait for its child frames to receive the lifecycle events separately before it will itself action on lifecycle events and unblock waits. Again, i've not seen evidence where the child frames don't receive the events but the main frame does or vice-a-verse. So I think having this dependency on the main/child frames is unrequired, and we should add it back in once we have a real reason to do so.

Unanswered questions

The browser may send multiple lifecycle events to the same frame with the same type (e.g. DOMContentLoaded) when navigating to a page (e.g. youtube.com). We're currently receiving and actioning on the first event, but ignore all the same subsequent events -- should we be doing something with all the other events? I think we can only action on the first event as we can't predicate how the browser will behave and how many times the DOMContentLoaded will be fired.
By removing all three of the redundancies, it would make the lifecycle code a lot simpler, and we would just need to rely on the CDP events to dictate when to unblock waits. I have a feeling that this was added in due to out of order CDP events, which has now been resolved. If everyone agrees then I can go ahead and remove those redundancies and make things a lot simpler.

Issue

Test script:

import { chromium } from 'k6/x/browser';

export const options = {
  vus: 1,
  iterations: 1,
};

export default function () {
  const browser = chromium.launch({
    headless: false,
  });

  const context = browser.newContext();

  const page = context.newPage();

  page.goto('https://www.google.com', { waitUntil: 'networkidle' }).then(() => {
    console.log('done')
    return page.goto('https://www.google.com', { waitUntil: 'networkidle' }).then(() => {
      console.log('done')
    });
  }).finally(() => {
    browser.close();
  });
}

Version of xk6-browser where this test fails with the following error: 4485c6d

ERRO[0030] communicating with browser: websocket: close 1006 (abnormal closure): unexpected EOF  category=cdp elapsed="0 ms" goroutine=53
ERRO[0030] process with PID 41718 unexpectedly ended: signal: killed  category=browser elapsed="0 ms" goroutine=75
ERRO[0030] Uncaught (in promise) navigating to "https://www.google.com": timed out after 30s  executor=shared-iterations scenario=default

After the fix we don't see this timeout and xk6-browser correctly unblocks on wait when it receives the networkIdle lifecycle event.

common/frame_session.go

inancgumus · 2022-10-11T13:24:08Z

Before the fix the test would time out since the networkIdle event had already fired before the frame registered to the event loop. After the fix, the test passes.

Hey, Ankur! I couldn't reproduce the issue (neither before the fix nor after)—there was no timeout. I've changed the URL to the complex website's root URL. Could you help me to reproduce the problem?

imiric · 2022-10-11T15:29:26Z

@ankur22 I'm not sure I understand the description, and will have to try reproducing it myself to confirm. 🤯

But just wanted to say: kudos on trying to understand this mess of an informal state machine... 😮‍💨 IMO Frame.recalculateLifecycle(), Frame.onLifecycleEvent(), Frame.hasLifecycleEventFired(), Frame.hasSubtreeLifecycleEventFired(), and all related methods should be either heavily refactored or removed altogether. They're inherently racy, and incredibly difficult to reason about. It only becomes apparent they're broken when testing a complex site like that script has, but it's surprising they work even for simpler scripts.

imiric · 2022-10-17T10:52:53Z

I find all of this super confusing 😅

My main question is: why do we need {start,stop}NetworkIdleTimer() / clearLifecycle() / recalculateLifecycle() / has{Subtree,}LifecycleEventFired() at all? All of this seems to be done so we can keep track of the lifecycle state of each frame.

Frame lifecycle states are sequential and ordered: commit -> DOMContentLoaded -> load -> networkAlmostIdle -> networkIdle. This can be easily expressed as a state machine. They can be interrupted and loop again, as in the case of SPAs (also mentioned below), but this order should always be maintained. If we don't receive an event, or skip an event, then that's an error state and should be raised.

So I vote for a larger refactor to drop Frame.{lifecycle,subtreeLifecycle}Events and all these racy state calculation functions (Frame.onLoadingStopped() seems super hacky...), in favor of a simpler FSM implementation.

I looked at the proposed changes here, and they only raise more questions. E.g. why would increasing LifeCycleNetworkIdleTimeout to 1s be enough, and couldn't we run into the same issues when it's exceeded?

As for your questions:

The browser may send multiple lifecycle events to the same frame with the same type [...] should we be doing something with all the other events?

Yes, we need to be able to process those as well. Imagine a SPA that emits DOMContentLoaded, or maybe even networkIdle. Any internal navigation triggered by JS would cause us to receive those events again. (I assume so, and haven't confirmed it, but would be surprised if that isn't the case. We should have a SPA example site in that examples repo we have so that this can be reproduced.)

By removing all three of the redundancies, it would make the lifecycle code a lot simpler, and we would just need to rely on the CDP events to dictate when to unblock waits.

I think we should remove all of it, and not have any blocking waits. Easier said than done, of course, given how central this functionality is. But I think we would end up with a simpler, less racy, and more intuitive way of keeping track of frame states.

BTW, let's create an issue for this where we can discuss possible solutions, instead of doing this in a PR.

ankur22 · 2022-10-17T12:03:26Z

I find all of this super confusing 😅

Me too 😞 There's a lot of guess work as to why it's done this way and I think we should just remove most of it and and fix any issues that we may find -- which i doubt we will

My main question is: why do we need {start,stop}NetworkIdleTimer() / clearLifecycle() / recalculateLifecycle() / has{Subtree,}LifecycleEventFired() at all? All of this seems to be done so we can keep track of the lifecycle state of each frame.

Very good questions, here's what I think they all mean. NOTE: my answers are based on the fact that once upon a time CDP events were received out of order prior to #555 and a lot these are "fixes" to try and bring it back in order. I also think we can remove them all and just forward all incoming frame lifecycle CDP events to the handlers that are interested in them:

{start,stop}NetworkIdleTimer() -- The timer is started when we stop seeing requests being made by the page the test is navigating to. After 500ms it will try to emit networkIdle event to unblock any waiting navigation actions. The timer is stopped when the frame is destroyed, cleared, detached, and when the web page makes another request within the 500ms timeout. It's just a fail safe in place in case networkIdle isn't received via CDP from the browser after the last request made by the page.
clearLifecycle -- it's called once the page navigation ends. The assumption is that once the page has been navigated to, there's no need for that data now, and it prepares the frame for the next navigation.
recalculateLifecycle -- this will receive lifecycle CDP events, and allow these events to be forwarded to handlers if:
- Its child frames have already received the lifecycle event in question and emitted the event themselves first.
- It itself hasn't already forwarded (emitted) the lifecycle event to handlers.
- Forwards (emits) the lifecycle events to the main page -- Probably can remove this since no handlers are listening to this event.
has{Subtree,}LifecycleEventFired() -- this is needed to prevent emitting more than one of the lifecycle events to handlers, and also to help ensure that the child frames emit an event first before the main frame emits the event.

Frame lifecycle states are sequential and ordered: commit -> DOMContentLoaded -> load -> networkAlmostIdle -> networkIdle. This can be easily expressed as a state machine. They can be interrupted and loop again, as in the case of SPAs (also mentioned below), but this order should always be maintained. If we don't receive an event, or skip an event, then that's an error state and should be raised.

Agreed, and this is exactly how we're receiving events from the browser via CDP -- they're in order and all looks good. Why would we need a state machine?

So I vote for a larger refactor to drop Frame.{lifecycle,subtreeLifecycle}Events and all these racy state calculation functions (Frame.onLoadingStopped() seems super hacky...), in favor of a simpler FSM implementation.

Yep, I like this suggestion. Let me know if i should go ahead with it 🙂 Also, why do we need a FSM if the order of CDP events from the browser is correct?

I looked at the proposed changes here, and they only raise more questions. E.g. why would increasing LifeCycleNetworkIdleTimeout to 1s be enough, and couldn't we run into the same issues when it's exceeded?

I only increased it since it was firing too early. Yep, you're correct, it could still fire too early. I would prefer to remove it.

Yes, we need to be able to process those as well. Imagine a SPA that emits DOMContentLoaded, or maybe even networkIdle. Any internal navigation triggered by JS would cause us to receive those events again. (I assume so, and haven't confirmed it, but would be surprised if that isn't the case. We should have a SPA example site in that examples repo we have so that this can be reproduced.)

What's a SPA?

I think we should remove all of it, and not have any blocking waits. Easier said than done, of course, given how central this functionality is. But I think we would end up with a simpler, less racy, and more intuitive way of keeping track of frame states.

👍 One thing to point out, I don't know why we would need to keep track of the frame state. As long as we forward all relevant CDP lifecycle events to the frames, then they can unblock on waits and all's well. Am i missing something?

Further discussion for removing and simplifying lifecycle event in frames can be done in #593.

For now, it would be good if this was either approved if we're happy that the waits are unblocked when we receive a networkIdle lifecycle event or closed to go ahead with the larger refactor.

imiric · 2022-10-17T13:58:07Z

@ankur22 Thanks for the summary of the functionality. 👍

Some notes:

What's a SPA?

A single-page application. Though repeated lifecycle events could also be sent in a traditional multi-page application, if the frame navigates normally.
{start,stop}NetworkIdleTimer(): Right, so the timer is a workaround for either a) us missing the event, as seems to happen for google.com, which is why increasing the timeout works, or b) the case when the browser doesn't send the event, which could theoretically happen for SPA frames, where the network is never really idle.

The second is a tough problem we don't handle well at all now, and it would be good seeing how Playwright / Puppeteer handle this. Puppeteer, for example, has additional logic and custom networkidle0 and networkidle2 states to use depending on the site. This is understandably not flexible enough, and users were asking to be able to specify a custom time or number of responses to wait for. There are some JS workarounds in that thread, but we can consider a better implementation of this.

In any case, I think this is a secondary issue to what we're discussing here, and deserves a separate GH issue we can work on after this.
clearLifecycle(): Right, this seems to only be needed to "reset" the state, but it calls recalculateLifecycle(), so who really knows what the resulting state will be... 😕

In any case, it shouldn't be needed if we just leave the state according to whatever the last received event was, and assume that this could change at any point after a navigation event.
recalculateLifecycle(): Yeah, I would drop this altogether. Instead, just store the state according to whatever event was received. Unless I'm mistaken, no calculations or complex logic should be required...
has{Subtree,}LifecycleEventFired(): I'm still not sure why this is needed. 😕 We should emit as many lifecycle events as CDP events are received, and I'm not sure why we need the distinction of having to emit events from child frames first... We need some example sites and E2E tests for this, as I'm not following why we would need to do this.

Anyway, the whole idea of a method to determine whether a lifecycle event was fired seems redundant and racy. Redundant in the sense that it should be obvious that if the current state is networkIdle, then it can be assumed that DOMContentLoaded was received before it (otherwise we would've raised an error). So that check boils down to frame.state >= DOMContentLoaded. And racy in the sense that it doesn't specify for which document the event was expected. I.e. like we mentioned above, an event can be received multiple times, so are we checking that DOMContentLoaded was fired for the current document, or that it fired at any point in the past? From what I remember, I've run into bugs where this check passed because the event was received for some document in the past, but since we didn't clear it on time, it returned true for the current document, even though we hadn't received it yet. 😓

So, yeah, I'm uncertain about many of these assertions and hypotheses, so take them with a grain of salt, but I think we should try to simplify and remove all of this mess.

Agreed, and this is exactly how we're receiving events from the browser via CDP -- they're in order and all looks good. Why would we need a state machine?

I mean, it's implicitly a state machine anyway, which is what this spaghetti tries to model. We might as well make that explicit and be able to fail if, e.g., a wrong state transition is attempted. But you're right, the implementation doesn't have to be an FSM for this to work.

One thing to point out, I don't know why we would need to keep track of the frame state. As long as we forward all relevant CDP lifecycle events to the frames, then they can unblock on waits and all's well. Am i missing something?

Some structure around this would be helpful, if nothing else so that we can query in what state the frame is at any point in time. But you're right that we might not need it, so maybe the first step should be giving that barebones approach a try first.

For now, it would be good if this was either approved if we're happy that the waits are unblocked when we receive a networkIdle lifecycle event or closed to go ahead with the larger refactor.

I would rather not merge this as is, since we're not sure about the overall impact it might have on other sites, given the fragile state of this core feature. So my vote is to close and proceed with a larger refactor.

ankur22 · 2022-10-17T14:16:02Z

@imiric I think we're on the same page. Lots of what you have said are things that I've discovered myself, and somethings which are new (e.g. the links to puppeteer gives us some information as to why 500ms was possibly used in the first place).

I'm definitely up for removing as much as we can while maintaining the current behaviour. We will need to create new issues for SPAs when we come across them and can easily replicate them.

@inancgumus what do you think?

ankur22 · 2022-10-18T16:40:09Z

@inancgumus I think this will help you understand what the issues were and why the fixes were done. This is a table of events which I observed before any fixes:

| Log                                                                                        | nav to      | description                                                                                                                                                                                                                                                                                                                                                                               |
|--------------------------------------------------------------------------------------------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| FrameSession:onPageLifecycle event:DOMContentLoaded                                        | about:blank | We recieve a DOMContentLoaded CDP event from the browser.                                                                                                                                                                                                                                                                                                                                 |
| Frame:recalculateLifecycle emit EventFrameAddLifecycle event:domcontentloaded              | about:blank | We now emit domcontentloaded to any listening handlers.                                                                                                                                                                                                                                                                                                                                   |
| FrameSession:onPageLifecycle event:load                                                    | about:blank | We recieve a load CDP event from the browser.                                                                                                                                                                                                                                                                                                                                             |
| Frame:recalculateLifecycle will not emit event:domcontentloaded                            | about:blank | Since we loop through all the events that we've recieved (which i believe is a reduancy that is unnecessary if we just work with the CDP events and not keep track of them) we won't emit this again.                                                                                                                                                                                     |
| Frame:recalculateLifecycle emit EventFrameAddLifecycle event:load                          | about:blank | We now emit load to any listening handler.                                                                                                                                                                                                                                                                                                                                                |
| FrameSession:onPageLifecycle event:networkIdle                                             | about:blank | We recieve a networkIdle CDP event from the browser.                                                                                                                                                                                                                                                                                                                                      |
| clearLifecycle                                                                             | about:blank | Navigation to about:blank is complete so clear the frame's lifecycle which in turn calls recalculateLifecycle.                                                                                                                                                                                                                                                                            |
| Frame:recalculateLifecycle will not emit SubtreeLifecycleEventFired event:domcontentloaded | about:blank | No need to emit domcontentloaded as we've cleared the lifecycle. If we stop tracking these events internally and just work with the CDP events we can completely remove this clear method.                                                                                                                                                                                                |
| Frame:recalculateLifecycle will not emit SubtreeLifecycleEventFired event:load             | about:blank | No need to emit load as we've cleared the lifecycle.                                                                                                                                                                                                                                                                                                                                      |
| FrameSession:onPageLifecycle event:DOMContentLoaded                                        | google.com  | We recieve a DOMContentLoaded CDP event from the browser.                                                                                                                                                                                                                                                                                                                                 |
| Frame:recalculateLifecycle emit EventFrameAddLifecycle event:domcontentloaded              | google.com  | We now emit domcontentloaded to any listening handler.                                                                                                                                                                                                                                                                                                                                    |
| Frame:recalculateLifecycle emit EventFrameAddLifecycle event:load                          | google.com  | We now emit load to any listening handler. This is wrong, we've not received the load CDP event yet. This issue exists because we're not clearing the lifecycleEvents correctly (2a6f6f63db74d0224d2baeeb408b67c43c890884 fixes this).                                                                                                                                                    |
| FrameSession:onPageLifecycle event:load                                                    | google.com  | We recieve a load CDP event from the browser.                                                                                                                                                                                                                                                                                                                                             |
| Frame:recalculateLifecycle will not emit SubtreeLifecycleEventFired event:domcontentloaded | google.com  | Obviously not going to emit domcontentloaded again as we've already done so.                                                                                                                                                                                                                                                                                                              |
| Frame:recalculateLifecycle will not emit EventFrameAddLifecycle event:load                 | google.com  | Not going to emit load again as we've already done so, but that's wrong.                                                                                                                                                                                                                                                                                                                  |
| FrameSession:onFrameStoppedLoading                                                         | google.com  | This is also a bug. We shouldn't rely on onFrameStoppedLoading which will set load, domcontentloaded and networkidle in lifecycleEvents, which it shouldn't need to do. Soon after this we get the response from the last request made by the page, but the networkIdle timeout cannot be started since networkIdle in lifecycleEvents has already been set due to onFrameStoppedLoading. |
| FrameSession:onPageLifecycle event:networkIdle                                             | google.com  | We ge this, but do nothing with it. We should call `fs.manager.frameLifecycleEvent(event.FrameID, LifecycleEventDOMContentLoad)` from `FrameSession` (fixed in 45cd802e94205e617eaf2bbd9870b4cbb5e06de1).                                                                                                                                                                                 |
| **TIMEOUT after 30secs**                                                                   |             |                                                                                                                                                                                                                                                                                                                                                                                           |

After the fixes have been applied we see this:

| log                                                                           | nav to      | description                                                                                                                                                                                                                                                   |
|-------------------------------------------------------------------------------|-------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| FrameSession:onPageLifecycle event:DOMContentLoaded                           | about:blank | We receive a DOMContentLoaded CDP event from the browser.                                                                                                                                                                                                     |
| Frame:recalculateLifecycle emit EventFrameAddLifecycle event:domcontentloaded | about:blank | We now emit domcontentloaded to any listening handlers.                                                                                                                                                                                                       |
| FrameSession:onPageLifecycle event:load                                       | about:blank | We receive a load CDP event from the browser.                                                                                                                                                                                                                 |
| Frame:recalculateLifecycle will not emit event:domcontentloaded               | about:blank | We don't re-emit domcontentloaded. Shouldn't even need to loop over received CDP events like this.                                                                                                                                                            |
| Frame:recalculateLifecycle emit EventFrameAddLifecycle event:load             | about:blank | We now emit load to any listening handlers.                                                                                                                                                                                                                   |
| FrameSession:onPageLifecycle event:networkIdle                                | about:blank | We receive a networkIdle CDP event from the browser. We're now doing something with this event.                                                                                                                                                               |
| Frame:recalculateLifecycle emit EventFrameAddLifecycle event:networkidle      | about:blank | We now emit networkidle to any listening handlers.                                                                                                                                                                                                            |
| Frame:recalculateLifecycle will not emit event:domcontentloaded               | about:blank | We don't re-emit domcontentloaded.                                                                                                                                                                                                                            |
| Frame:recalculateLifecycle will not emit event:load                           | about:blank | We don't re-emit load.                                                                                                                                                                                                                                        |
| clearLifecycle                                                                | about:blank | This time when we recalculateLifecycle is called, it has no events to loop over since lifecycleEvent has been completed cleared. In recalculateLifecycle, subtreeLifecycleEvents is left empty and not populated like before since the events slice is empty. |
| FrameSession:onPageLifecycle event:DOMContentLoaded                           | google.com  | We receive a DOMContentLoaded CDP event from the browser.                                                                                                                                                                                                     |
| Frame:recalculateLifecycle emit EventFrameAddLifecycle event:domcontentloaded | google.com  | We now emit domcontentloaded to any listening handlers.                                                                                                                                                                                                       |
| FrameSession:onPageLifecycle event:load                                       | google.com  | We receive a load CDP event from the browser.                                                                                                                                                                                                                 |
| Frame:recalculateLifecycle will not emit event:domcontentloaded               | google.com  | We don't re-emit domcontentloaded.                                                                                                                                                                                                                            |
| Frame:recalculateLifecycle emit EventFrameAddLifecycle event:load             | google.com  | We now emit load to any listening handlers.                                                                                                                                                                                                                   |
| FrameSession:onFrameStoppedLoading                                            | google.com  | In my local version of the this fix i've stopped it from setting load, domcontentloaded and networkIdle in lifecycleEvents. This solves another problem to do with simple sites which perform lots of requests.                                               |
| FrameSession:onPageLifecycle event:networkIdle                                | google.com  | We receive a networkIdle CDP event from the browser.                                                                                                                                                                                                          |
| Frame:recalculateLifecycle will not emit event:load                           | google.com  | We don't re-emit load.                                                                                                                                                                                                                                        |
| Frame:recalculateLifecycle emit EventFrameAddLifecycle event:networkidle      | google.com  | We now emit networkidle to any listening handlers.                                                                                                                                                                                                            |
| Frame:recalculateLifecycle will not emit event:domcontentloaded               | google.com  | We don't re-emit domcontentloaded.                                                                                                                                                                                                                            |
| clearLifecycle                                                                | google.com  | lifecycleEvent and subtreeLifecycleEvents completely cleared for the frame when navigation ends.                                                                                                                                                              |
| FrameSession:onPageLifecycle event:DOMContentLoaded                           | google.com  | We receive a DOMContentLoaded CDP event from the browser.                                                                                                                                                                                                     |
| Frame:recalculateLifecycle emit EventFrameAddLifecycle event:domcontentloaded | google.com  | We now emit domcontentloaded to any listening handlers.                                                                                                                                                                                                       |
| FrameSession:onPageLifecycle event:load                                       | google.com  | We receive a load CDP event from the browser.                                                                                                                                                                                                                 |
| Frame:recalculateLifecycle will not emit event:domcontentloaded               | google.com  | We don't re-emit domcontentloaded.                                                                                                                                                                                                                            |
| Frame:recalculateLifecycle emit EventFrameAddLifecycle event:load             | google.com  | We now emit load to any listening handlers.                                                                                                                                                                                                                   |
| FrameSession:onFrameStoppedLoading                                            | google.com  | In the local version this does nothing now.                                                                                                                                                                                                                   |
| FrameSession:onPageLifecycle event:networkIdle                                | google.com  | We receive a networkIdle CDP event from the browser.                                                                                                                                                                                                          |
| Frame:recalculateLifecycle will not emit event:domcontentloaded               | google.com  | We don't re-emit domcontentloaded.                                                                                                                                                                                                                            |
| Frame:recalculateLifecycle will not emit event:load                           | google.com  | We don't re-emit load.                                                                                                                                                                                                                                        |
| Frame:recalculateLifecycle emit EventFrameAddLifecycle event:networkidle      | google.com  | We now emit networkIdle to any listening handlers.                                                                                                                                                                                                            |
| wait in unblocked | google.com | test ends |

I also noticed that even without the fix in this PR, we could make the script pass by applying the following diff on 4485c6d.

Yes that does fix this issue. By doing this what will happen in this scenario (google.com) is that we will receive the last request response which the page has made (success or failure) and the timeout timer will start. After 500ms it will fire and recalculateLifecycle will be called, which will emit the networkIdle event, and will unblock the wait. Inbetween the last response and the timeout firing, we will have received a CDP networkIdle event from the browser, which will be ignored.

Although this works, I don't think we should be relying on the timeout in this scenario (i can see a scenario where a timeout would be necessary and I'm working on a test for it). We should be working directly with the CDP browser events where possible. We will definitely need to improve this implementation though where there are websites which poll frequently or are quite chatty for other reasons -- and in those scenarios it might be best to allow the user to determine what a safe timeout is instead of forcing 500ms.

Out of curiosity, I experimented with tracking frame lifecycle event addition and removal in this #594.

I don't believe you need to track them, and that solution has just makes things more complicated. If we clear lifecycleEvents by overwriting it with a new map (as i've done in 2a6f6f6) then when recalculateLifecycle is called straight after, the subtreeLifecycleEvents is cleared, allowing future navigations to correctly emit new load, DOMContentLoad and networkIdle events to the internal handlers, which in turn will unblock waits.

There are many issues with lifecycle events. In this PR i've tried to fix the networkIdle issue without a complete refactor. But, I've also come to the realisation that this lifecycle events code we currently have is too complicated and most of it should be deleted. I just need to create the tests to prove it 🙂

I would like to close this PR and create a new PR with three fixes (two of which you have already pointed out in your patch in this comment:

Clear the lifecycleEvents in clearLifecycle -- this will ensure that we don't emit the wrong events when navigating from about:blank to google.com or google.com to another site.
Call fs.manager.frameLifecycleEvent(event.FrameID, LifecycleEventNetworkIdle) from onPageLifecycle when we receive networkIdle CDP event -- this will ensure that we emit networkidle events to the internal handlers, which will unblock waits.
Not do anything with onFrameStoppedLoading -- this will ensure that we don't start emitting events to early before they're needed.

All fixes will have a test to lock in the behaviour.

Once that is done I will move onto refactoring the lifecycle events code, adding more tests where necessary to ensure that we don't regresse.

WDYT @inancgumus & @imiric?

inancgumus · 2022-10-18T19:17:51Z

The diff I shared in this comment was about demonstrating one part of the issue that I discovered after printing the events as you did. It wasn't a solution at all ;) I appreciated the detailed explanation ❤️

And #594 was a proper possible fix for our "current pubsub architecture". However, I agree that the solution was as complicated as the current architecture. So I'm OK to refactor and simplify the current architecture as long as we have proper tests in place 👍

I would like to close this PR and create a new PR with three fixes (two of which you have already pointed out in your patch in this comment:

It's up to you, but IMO, it's OK to merge this PR just with the diff I shared. Then you can continue where you left off in another PR.

By the way, I believe it's more effective to create issues first for laying out a problem, its possible solutions, and discussions around it. Then we can send as many PRs as possible until the issue closes.

ankur22 · 2022-10-19T09:12:53Z

It's up to you, but IMO, it's OK to merge this PR just with the #578 (comment) I shared. Then you can continue where you left off in another PR.

Cool, i'll just do the fix here and clean up the commits so that we only have the most relevant fixes and tests.

By the way, I believe it's more effective to create issues first for laying out a problem, its possible solutions, and discussions around it. Then we can send as many PRs as possible until the issue closes.

Yep, I will endeavour to open a new issue next time, even if the fix initially feels trivial.

inancgumus

I'll review this probably tomorrow: It seems like we have good tests now 👏 After skimming it a little, the only trivial thing that caught my eye was counters. IMO, we can make them atomic.Int64 or something instead of a sync.RWMutex.

When we test using WaitUntil, it is transformed from a go struct into a goja value. If we don't provide a struct tag, goja will transform the name to snake case (wait_until). Adding the js struct tag fixes this.

When a site that is navigated to emits a DOMContentLoaded well before load, this is a signal that we're waiting on assets to load over the network. In such a case (e.g. navigating to google.com) when we goto with waitUntil = networkIdle, due to how we handle such lifecycle events, we end up in a situation where the networkIdle event is ignored. This change ensure that we action on these events too. Partly fixes: https://github.com/orgs/grafana/projects/80/views/14

When we navigate to a page, lifecycle events were being stored. If a new navigation occurred, the previous events would be used and fired off to the internal handles. This is incorrect as it means that WaitUntil will fail to wait for the correct event after the initial navigation to a page. Partly fixes: https://github.com/orgs/grafana/projects/80/views/14

Some websites may have many network requests which are performed after the page has fully loaded and we receive a FrameStoppedLoading. For now we want to wait for the networkIdle event, and in the future if we need, we should add a timer for very chatty websites where we may never see a networkIdle event. Partly fixes: https://github.com/orgs/grafana/projects/80/views/14

inancgumus

It's awesome that we have tests for this behavior now 👏 🥳 LGTM!

I made some opinionated changes to make tests more readable and maintainable in #603. Please feel free to skip them, I don't mind. I thought it would be better to create another branch instead of reviewing the PR in place.

Changes:

Uses an atomic counter.
Moves home assertion to a helper closure.
IMO, editor support is useful when dealing with long HTMLs and scripts (especially when debugging later). So I moved two of the home page tests to static files.

ankur22 · 2022-10-24T12:28:04Z

Thanks for these changes.

Uses an atomic counter.

What's the benefit of this change?

Moves home assertion to a helper closure.

Looks good thanks, i'll cherry pick this change from your PR.

IMO, editor support is useful when dealing with long HTMLs and scripts (especially when debugging later). So I moved two of the home page tests to static files.

Not too sure about this, tbh. I prefer things inline. I'll make the change though if you have a very strong opinion on this.

inancgumus · 2022-10-24T12:41:06Z

Sure 👍

Not too sure about this, tbh. I prefer things inline. I'll make the change though if you have a very strong opinion on this.

IMHO, with HTMLs in a static folder, the test flow becomes clear, and we get editor support like syntax highlighting, completion, etc.—which can be useful when debugging it later. The downside is we will need to open another tab to see what's in the HTMLs. However, that's an easy problem to solve, we can always open the HTML in a side-by-side editor. It seems like a minor problem, at least for me.

All of my changes were suggestions, not strong opinions. As I explained in my previous comment, feel free to apply or skip them. When I have a strong opinion on something, I request changes instead of approving PRs ;)

imiric

Great work @ankur22! 👏

I'm not sure I understand the flow for all of these changes, but the sample script in #597 passes on this branch, but not on main, so it seems like an improvement.

FWIW the doesn't_timeout_waiting_for_networkIdle test passes for me on main, so it seems susceptible to timing. Nice job on the tests though 👍

ankur22 · 2022-10-26T10:41:03Z

Thanks for the review @imiric.

FWIW the doesn't_timeout_waiting_for_networkIdle test passes for me on main, so it seems susceptible to timing. Nice job on the tests though 👍

This is an annoying issue, which has caught me a third time now 😅 (including just now when i was debugging it after you mentioned it). The WaitUntil: common.LifecycleEventNetworkIdle, isn't being serialised to waitUntil and is instead being serialised to wait_until. On main that test isn't waiting for networkidle, and is instead waiting on load so that's why it passes. I had to first fix that serialisation issue in cd3f227 before being able to replicate the timeout issue.

ankur22 marked this pull request as draft October 10, 2022 17:04

ankur22 changed the title ~~Fix wait in networkIdle when event already fired~~ Fix wait for networkIdle when event already fired Oct 10, 2022

ankur22 added the bug Something isn't working label Oct 10, 2022

ankur22 self-assigned this Oct 10, 2022

ankur22 added this to the v0.6.0 milestone Oct 10, 2022

ankur22 mentioned this pull request Oct 11, 2022

The "complex" test fails on the very first step (page.goto) #542

Closed

ankur22 force-pushed the fix/networkidle branch from c818b3c to ff9f89d Compare October 11, 2022 12:54

ankur22 marked this pull request as ready for review October 11, 2022 13:00

ankur22 requested review from inancgumus, imiric and mstoykov October 11, 2022 13:00

ankur22 commented Oct 11, 2022

View reviewed changes

common/frame_session.go Show resolved Hide resolved

ankur22 force-pushed the fix/networkidle branch from ff9f89d to 519c86f Compare October 11, 2022 13:05

ankur22 changed the title ~~Fix wait for networkIdle when event already fired~~ Fix wait for networkIdle for complex scenarios Oct 11, 2022

ankur22 force-pushed the fix/networkidle branch 2 times, most recently from 90b87ab to 0887571 Compare October 11, 2022 13:18

ankur22 marked this pull request as draft October 11, 2022 15:01

ankur22 changed the title ~~Fix wait for networkIdle for complex scenarios~~ Draft: Fix wait for networkIdle for complex scenarios Oct 12, 2022

ankur22 force-pushed the fix/networkidle branch 3 times, most recently from 31f0952 to 37a64e3 Compare October 14, 2022 10:29

ankur22 changed the title ~~Draft: Fix wait for networkIdle for complex scenarios~~ Fix wait for networkIdle lifecycle event Oct 14, 2022

ankur22 marked this pull request as ready for review October 14, 2022 14:15

ankur22 mentioned this pull request Oct 17, 2022

Simplify lifecycle events for frames #593

Closed

inancgumus mentioned this pull request Oct 18, 2022

Add frame lifecycle event listener #594

Closed

ankur22 force-pushed the fix/networkidle branch from 37a64e3 to 98f7557 Compare October 19, 2022 14:19

ankur22 closed this Oct 19, 2022

ankur22 force-pushed the fix/networkidle branch from 5828d3e to 4485c6d Compare October 19, 2022 14:23

ankur22 reopened this Oct 19, 2022

ankur22 force-pushed the fix/networkidle branch 2 times, most recently from e8186bf to d81d9aa Compare October 19, 2022 16:03

ankur22 requested a review from inancgumus October 19, 2022 16:27

inancgumus reviewed Oct 20, 2022

View reviewed changes

ankur22 added 4 commits October 24, 2022 11:11

Update WaitUntil struct tag

cd3f227

When we test using WaitUntil, it is transformed from a go struct into a goja value. If we don't provide a struct tag, goja will transform the name to snake case (wait_until). Adding the js struct tag fixes this.

ankur22 force-pushed the fix/networkidle branch from f5158dd to 4a55816 Compare October 24, 2022 10:11

inancgumus approved these changes Oct 24, 2022

View reviewed changes

inancgumus added 2 commits October 25, 2022 15:38

Add assertHome helper to TestLifecycleNetworkIdle

abdd521

Move pages to static in TestLifecycleNetworkIdle

657bd49

inancgumus mentioned this pull request Oct 25, 2022

Fix/networkidle review #603

Closed

imiric approved these changes Oct 26, 2022

View reviewed changes

ankur22 merged commit 8f704ab into main Oct 26, 2022

ankur22 deleted the fix/networkidle branch October 26, 2022 12:52

imiric mentioned this pull request Oct 27, 2022

Navigation timeout with device_emulation.js example #538

Closed

imiric linked an issue Oct 27, 2022 that may be closed by this pull request

Navigation timeout with device_emulation.js example #538

Closed

ankur22 mentioned this pull request Nov 8, 2022

Refactor the frame's lifecycle event handling code #644

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix wait for networkIdle lifecycle event #578

Fix wait for networkIdle lifecycle event #578

ankur22 commented Oct 10, 2022 •

edited

Loading

inancgumus commented Oct 11, 2022 •

edited

Loading

imiric commented Oct 11, 2022

imiric commented Oct 17, 2022

ankur22 commented Oct 17, 2022 •

edited

Loading

imiric commented Oct 17, 2022

ankur22 commented Oct 17, 2022 •

edited

Loading

ankur22 commented Oct 18, 2022 •

edited

Loading

inancgumus commented Oct 18, 2022 •

edited

Loading

ankur22 commented Oct 19, 2022

inancgumus left a comment •

edited

Loading

inancgumus left a comment •

edited

Loading

ankur22 commented Oct 24, 2022

inancgumus commented Oct 24, 2022 •

edited

Loading

imiric left a comment

ankur22 commented Oct 26, 2022

Fix wait for networkIdle lifecycle event #578

Fix wait for networkIdle lifecycle event #578

Conversation

ankur22 commented Oct 10, 2022 • edited Loading

Details

Unanswered questions

Issue

inancgumus commented Oct 11, 2022 • edited Loading

imiric commented Oct 11, 2022

imiric commented Oct 17, 2022

ankur22 commented Oct 17, 2022 • edited Loading

imiric commented Oct 17, 2022

ankur22 commented Oct 17, 2022 • edited Loading

ankur22 commented Oct 18, 2022 • edited Loading

inancgumus commented Oct 18, 2022 • edited Loading

ankur22 commented Oct 19, 2022

inancgumus left a comment • edited Loading

Choose a reason for hiding this comment

inancgumus left a comment • edited Loading

Choose a reason for hiding this comment

ankur22 commented Oct 24, 2022

inancgumus commented Oct 24, 2022 • edited Loading

imiric left a comment

Choose a reason for hiding this comment

ankur22 commented Oct 26, 2022

ankur22 commented Oct 10, 2022 •

edited

Loading

inancgumus commented Oct 11, 2022 •

edited

Loading

ankur22 commented Oct 17, 2022 •

edited

Loading

ankur22 commented Oct 17, 2022 •

edited

Loading

ankur22 commented Oct 18, 2022 •

edited

Loading

inancgumus commented Oct 18, 2022 •

edited

Loading

inancgumus left a comment •

edited

Loading

inancgumus left a comment •

edited

Loading

inancgumus commented Oct 24, 2022 •

edited

Loading