Default chromium headless user agent is blocked by some sites #1497

roobre · 2024-10-21T15:45:23Z

Feature Description

A customer reported that Synthetic Monitoring browser checks were not working. Instead of an error, the check returned metrics for a chrome-error:// URL:

probe_browser_web_vital_ttfb{url="chrome-error://chromewebdata/",scenario="ui",rating="good"} 0.39999999990686774

Upon investigation, we found that the URL they were trying to hit sits behind some kind of protection that denylists the user agent Chromium sends when running in headless mode:

17:27:00 ~ $> curl -vH 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/130.0.0.0 Safari/537.36' $REDACTED
> GET /ch/de/ HTTP/1.1
> Host: $REDACTED
> Accept: */*
> User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/130.0.0.0 Safari/537.36
>
* Request completely sent off
* Recv failure: Connection reset by peer
* OpenSSL SSL_read: Connection reset by peer, errno 104
* closing connection #0
curl: (56) Recv failure: Connection reset by peer

This does not happen with other user agents.

Investigating the Chromium source code, we figured out that:

The "product" name (browser name) for headless mode is hardcoded: https://source.chromium.org/chromium/chromium/src/+/main:headless/lib/browser/headless_browser_impl.cc;l=57?q=%5C%22Headless&ss=chromium
There is a test asserting that it is not possible to change the user agent when running in headless mode: https://source.chromium.org/chromium/chromium/src/+/main:headless/lib/browser/headless_browser_impl_unittest.cc;l=40?q=%5C%22HeadlessChrome&ss=chromium

Experimentally, we verified that:

Using --headless causes Chromium to send a different user agent than it normally does: Chrome/130.0.0.0 is replaced by HeadlessChrome/130.0.0.0.
Specifying --headless together with --user-agent (which otherwise allows setting a custom user agent) results in an empty string being sent as the User-Agent header, which is also blocked by some sites, such as the one reported by the user.
EDIT: The above is not true. I drew that conclusion from a mistake in how I supplied those options to Chromium.

It looks like k6 allows configuring the user agent in code: https://grafana.com/docs/k6/latest/javascript-api/k6-browser/newcontext/

As such, the user can work around this issue by creating a browser context like:

  const context = await browser.newContext({
    userAgent: "Totally legal Chromium/130.0.0"
  });
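For illustration, here is a minimal sketch of how a script could derive a less conspicuous user agent instead of hardcoding one. It assumes the only difference between the headless and regular user agent is the HeadlessChrome/ token, as observed above. The helper names stripHeadless and contextOptionsFor are hypothetical and not part of k6.

```javascript
// Hypothetical helper (not part of k6): derive a non-headless user agent
// from the one Chromium reports in headless mode.
function stripHeadless(userAgent) {
  // "HeadlessChrome/130.0.0.0" becomes "Chrome/130.0.0.0"; other tokens are untouched.
  return userAgent.replace(/\bHeadlessChrome\//g, "Chrome/");
}

// Hypothetical helper: build an options object that could be passed to
// browser.newContext(), using the userAgent option shown above.
function contextOptionsFor(userAgent) {
  return { userAgent: stripHeadless(userAgent) };
}

// The headless user agent from the curl reproduction in this issue.
const headlessUA =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/130.0.0.0 Safari/537.36";

console.log(contextOptionsFor(headlessUA).userAgent);
```

In a k6 browser script, the resulting object would be passed as `browser.newContext(contextOptionsFor(headlessUA))`. How the script obtains the default user agent in the first place is left open here.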

However:

It is very difficult to arrive at the fix, given the lack of errors received.
Similar tools (reportedly, Dynatrace) do not have this problem (I suspect they are overriding the user agent as well), which leads users to believe our tool is faulty.
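Until the missing error is addressed, a script author could make this failure visible by checking where the page ended up after navigation. The following is only a sketch: isChromeErrorUrl and assertNavigated are hypothetical helpers, and the chrome-error:// prefix is taken from the metric label shown above.

```javascript
// Hypothetical helper: true when the browser landed on Chromium's
// internal error page instead of the requested site.
function isChromeErrorUrl(url) {
  return typeof url === "string" && url.startsWith("chrome-error://");
}

// Hypothetical helper: throw instead of silently reporting metrics
// for the error page; returns the URL unchanged otherwise.
function assertNavigated(url) {
  if (isChromeErrorUrl(url)) {
    throw new Error(`navigation failed: page ended up at ${url}`);
  }
  return url;
}
```

In a k6 browser script, this could be called as `assertNavigated(page.url())` right after `await page.goto(...)`, assuming `page.url()` returns the current page URL as a string.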

Already existing or connected issues / PRs (optional)

https://github.com/grafana/synthetic-monitoring/issues/165

Tasks

Remove Headless from user agent #1536
Update docs
Update k6 release notes and tasks

roobre · 2024-10-31T16:43:35Z

An update on this:

Specifying --headless and --user-agent (which otherwise allows to use a custom user agent) yields to an empty string being sent as a user agent header, which is also blocked by some sites such as that reported by the user.

That turned out to not be true, but rather a mishap in my testing. Without that limitation, crocochrome now circumvents it by itself, so this is not a blocker for us: grafana/crocochrome#43

I still think however that this might be worth considering from the browser perspective.

inancgumus · 2024-11-01T13:44:03Z

@roobre We've discussed this, and we'll remove Headless from user agents in the next version.

inancgumus · 2024-11-13T14:46:17Z

There are two problems with this issue as currently written:

It makes more sense to remove Headless by default, as discussed internally. If this becomes a problem, we can add the K6_BROWSER_OMIT_HEADLESS option later on top of this change.
We need to repurpose this issue and narrow its scope, as adding K6_BROWSER_USER_AGENT_EXTRA (or K6_BROWSER_USER_AGENT_SUFFIX) is now tracked in the issue "Allow adding a custom suffix to the default user agent" #1509.

Other than these, the rest is fine, and #1536 is being reviewed. It will solve issues for many users by default.

Note: The issue description is updated.

@roobre

roobre added the feature A new feature label Oct 21, 2024

inancgumus added the next Might be eligible for the next planning (not guaranteed!) label Oct 21, 2024

inancgumus self-assigned this Nov 1, 2024

roobre mentioned this issue Nov 4, 2024

Allow adding a custom suffix to the default user agent #1509

Open

inancgumus mentioned this issue Nov 12, 2024

Remove Headless from user agent #1536

Merged

3 tasks
