Timeout error getting DevTools URL during browser launch #559

imiric · 2022-09-30T16:08:40Z

This happens very rarely on current main (db80f94), even on Cloud test runs.

In some cases while running the test script from #510 we see the following event logged:

launching browser: getting DevTools URL: timed out after 30s
	at reflect.methodValueCall (native)
	at file:///tmp/PvwCGT/script.js:28:34(6)
	at native
 executor=constant-vus scenario=default		test_run_id: 141246

The effect is that the iteration fails, which shows up as a pause in execution for the VU while it waits for the default 30s timeout (see related issue #502). There's no reason this particular timeout should be that long, so we can shorten it, but we should also fix the root cause, which seems to be a race condition between when the process starts and when we attach the stdout listener to get the DevTools URL. We should also look into a more robust way of getting the URL that doesn't involve parsing stdout.


inancgumus · 2022-09-30T16:35:42Z

Related: #491?

imiric · 2022-10-03T09:51:04Z

On second thought, I think this issue is a duplicate of #491. That "context deadline exceeded" error is likely the same as this timeout one, just from before the logging improvements we made after v0.4.0.

So I'll close this issue and we can track it in #491, since it also happens on Linux, though apparently much more rarely than on WSL2. If it's easily reproducible on WSL2, that would help us resolve it more quickly.

imiric · 2022-10-04T13:38:51Z

I'm reopening this issue, as the root cause is different from #491. While the errors are the same, #491 is caused by incorrectly handling when the browser exits with a non-0 exit code, in general, or maybe something specific to Snap. In either case, it's a different root cause than this issue, even though the errors are the same.

While #491 is always reproducible, this issue happens very rarely on some iterations, and we've only seen it in the Cloud. Since we ruled out the chances of a race condition in #563, the only explanation is that some environments do see a >30s delay to actually start the browser process. So it's likely that we can't do anything about it in the code, and may need to address this on the infra side. Let's leave this issue open until we decide.

inancgumus · 2022-10-05T07:44:58Z

"So it's likely that we can't do anything about it in the code, and may need to address this on the infra side. Let's leave this issue open until we decide."

💡❓

In this case, one (previously discussed) idea is to use Device Farms on AWS, so that we have a farm of pre-launched browsers to connect to at each test start, which would mitigate the long time it takes to launch a browser. To do that, we'll also need to work on #17.

If using Device Farms turns out to be unfruitful, we might also want to evaluate/discuss pre-launched browsers without using Device Farms. For example, we can pre-launch the browsers and then connect to them for each test start. Instead of launching a new one for every test, we can connect to one of the available instances.

imiric · 2022-10-05T09:10:37Z

@inancgumus That architecture might address this issue, but it's still a long ways off, and will not be part of the initial public beta release.

Your second point is about instance reuse, which is a sensitive topic, as we must not allow data from previous test runs to be accessible to subsequent ones that use the same instance. Since we've had issues properly cleaning the data directory (#403, #484), instance reuse is currently disabled in the Cloud beta (i.e. new instances are created for each test run, and are terminated when the run ends). If we're going to reuse browsers between test runs, then it's even more critical that we handle this properly.

What we might want to explore is launching the browser before we start k6, and then connecting to it when the test starts.

There's a problem with this, though: BrowserType.connect() is a JS API, and depends on scripts actually using it. What do we do if the script calls launch() on every iteration, as they do now? Try to force the connection to an existing browser process anyway? Similarly with Browser.close(), do we disregard it and only disconnect? Or do we simply block users from using launch() and close() in the Cloud? 😕

So there are many open questions and issues we need to resolve before we can consider #17 as a solution to this.

What I meant by addressing it on the infra side is just using a different EC2 instance type. Instead of a general purpose one like m5, pick an IO-optimized one that ensures (or minimizes the chances of) not having to wait 30s for the browser process to launch. From my tests, CPU and RAM usage was fine, so it's likely this is storage IO related. I still have to run tests to confirm this, but in any case, it's worth talking to DevOps, as there might be some Linux optimizations we can consider as well.

inancgumus · 2022-10-05T12:53:48Z

Thanks, @imiric; these are all good points. It seems like there are a lot of things we need to consider, research, and evaluate.

imiric · 2022-10-11T10:01:25Z

After #555, we no longer see this issue in the Cloud 🎉 So I'm closing this.

That said, it's very unintuitive that #555 would be related to this issue, as during browser launch the event system hasn't been initialized yet, and no event handlers take part in this process... 😕 But maybe it impacted the launch indirectly? 🤷‍♂️ In any case, we can reopen this if it pops up again.

imiric added the bug Something isn't working label Sep 30, 2022

imiric closed this as not planned Won't fix, can't repro, duplicate, stale Oct 3, 2022

imiric mentioned this issue Oct 3, 2022

Timeout launching browser installed as a Snap package #491

Open

imiric reopened this Oct 4, 2022

imiric linked a pull request Oct 4, 2022 that will close this issue

Get DevTools URL from data dir file instead of by parsing stdout #563

Closed

imiric removed a link to a pull request Oct 4, 2022

Get DevTools URL from data dir file instead of by parsing stdout #563

Closed

imiric closed this as completed Oct 11, 2022

imiric added this to the v0.6.0 milestone Oct 11, 2022

imiric linked a pull request Oct 11, 2022 that will close this issue

Add a queue per handler to keep events in order #555

Merged

imiric mentioned this issue Oct 12, 2022

Move all not implemented APIs to return promises #586

Merged

inancgumus closed this as not planned Won't fix, can't repro, duplicate, stale Oct 26, 2022

inancgumus removed this from the v0.6.0 milestone Oct 26, 2022
