Default chromium headless user agent is blocked by some sites #1497

Open
1 of 3 tasks
roobre opened this issue Oct 21, 2024 · 3 comments
Assignees
Labels
feature A new feature next Might be eligible for the next planning (not guaranteed!)

Comments

@roobre
Member

roobre commented Oct 21, 2024

Feature Description

A customer reported that Synthetic Monitoring browser checks were not working, and instead of an error, metrics were returned for a chrome-error:// url:

probe_browser_web_vital_ttfb{url="chrome-error://chromewebdata/",scenario="ui",rating="good"} 0.39999999990686774

Upon investigation, we found that the particular URL they were trying to hit had some kind of protection that denylisted the user agent Chromium uses when running in headless mode:

17:27:00 ~ $> curl -vH 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/130.0.0.0 Safari/537.36' $REDACTED
> GET /ch/de/ HTTP/1.1
> Host: $REDACTED
> Accept: */*
> User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/130.0.0.0 Safari/537.36
>
* Request completely sent off
* Recv failure: Connection reset by peer
* OpenSSL SSL_read: Connection reset by peer, errno 104
* closing connection #0
curl: (56) Recv failure: Connection reset by peer

This does not happen with other user agents.

Investigating the Chromium source code, we figured out that:

Experimentally, we verified that:

  • Using --headless causes Chromium to send a different user agent than usual: Chrome/130.0.0.0 is replaced by HeadlessChrome/130.0.0.0.
  • Specifying --headless and --user-agent (which otherwise allows setting a custom user agent) results in an empty string being sent as the User-Agent header, which is also blocked by some sites, such as the one reported by the user.
  • EDIT: The above is not true. I misread the result because of a mistake in how I supplied those options to Chromium.
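
To make the difference concrete, here is a minimal sketch of the substitution described in the first bullet, run in reverse. `stripHeadlessToken` is a hypothetical helper written for illustration only; it is not part of k6 or Chromium.

```javascript
// Hypothetical helper (not a k6 or Chromium API): rewrites the
// "HeadlessChrome/<version>" product token back to "Chrome/<version>".
function stripHeadlessToken(userAgent) {
  return userAgent.replace(/\bHeadlessChrome\//g, "Chrome/");
}

// The headless user agent quoted in the curl reproduction above.
const headlessUA =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) " +
  "HeadlessChrome/130.0.0.0 Safari/537.36";

console.log(stripHeadlessToken(headlessUA));
```

Everything else in the string (platform, engine, version) is left untouched, so only the token that the denylist appears to match on changes.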

It looks like k6 allows configuring the user agent in code: https://grafana.com/docs/k6/latest/javascript-api/k6-browser/newcontext/

As such, the user can work around this issue by creating a browser context like:

  const context = await browser.newContext({
    userAgent: "Totally legal Chromium/130.0.0"
  });

However:

  • It is very difficult to arrive at the fix, given the lack of errors received
  • Similar tools (reportedly, Dynatrace) do not have this problem (I suspect they are overriding the user agent as well), which leads users to believe our tool is faulty
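
Until the default changes, a script could at least turn the silent failure into a visible one. The sketch below assumes the failure shows up as a chrome-error:// URL, as in the metric above; `isChromeErrorURL` is a hypothetical helper, and feeding it the value of `page.url()` after navigation is an assumption about how a script would use it.

```javascript
// Hypothetical helper: returns true when the browser ended up on Chromium's
// internal error page instead of the requested site.
function isChromeErrorURL(url) {
  return typeof url === "string" && url.startsWith("chrome-error://");
}

// Assumed usage inside a k6 browser script, after page.goto(...):
//   if (isChromeErrorURL(page.url())) { throw new Error("navigation failed"); }
console.log(isChromeErrorURL("chrome-error://chromewebdata/"));
console.log(isChromeErrorURL("https://example.com/"));
```

Failing the check explicitly would point users at the navigation itself rather than leaving them with web vitals for an error page.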

Suggested Solution (optional)

SaaS offerings using xk6-browser would benefit from a way to change this user agent on behalf of the user (while still respecting a user override if specified in the options). This would help us:

  • Avoid falling into catch-all filters that users may not know are in place, which can happen in large organizations or as part of "firewalling" or "protections" from third parties
  • Add our own parts to the user agent, e.g. Grafana-Synthetic-Monitoring, so users and systems can identify these requests.

Ideally, this mechanism should not alter the rest of the user agent. As an example, we could define two new environment variables: (Updated by #1509).

Change the default user agent for headless chromium:

1c1
< Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/130.0.0.0 Safari/537.36
---
> Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 Grafana-Synthetic-Monitoring/1.0.0

In this solution, if the user specifies userAgent in the browser.newContext options, the user's specified UA would take precedence.
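
The precedence described here can be sketched as follows. This is an illustration under assumptions, not the xk6-browser implementation: `resolveUserAgent` and its `suffix` option are hypothetical names, and only `userAgent` corresponds to an option mentioned in this issue.

```javascript
// Hypothetical sketch of the proposed precedence (not xk6-browser code):
// 1. an explicit userAgent from the script wins unchanged;
// 2. otherwise the default UA loses its "Headless" marker and
//    an optional suffix is appended.
function resolveUserAgent(defaultUA, options = {}) {
  const { userAgent, suffix } = options;
  if (userAgent) {
    return userAgent;
  }
  const base = defaultUA.replace("HeadlessChrome/", "Chrome/");
  return suffix ? `${base} ${suffix}` : base;
}

const defaultUA =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) " +
  "HeadlessChrome/130.0.0.0 Safari/537.36";

// Default path: produces the second line of the diff above.
console.log(resolveUserAgent(defaultUA, { suffix: "Grafana-Synthetic-Monitoring/1.0.0" }));
// Override path: the user's value is returned as-is.
console.log(resolveUserAgent(defaultUA, { userAgent: "Totally legal Chromium/130.0.0" }));
```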

Already existing or connected issues / PRs (optional)

https://github.com/grafana/synthetic-monitoring/issues/165

Tasks

  1. inancgumus
@roobre roobre added the feature A new feature label Oct 21, 2024
@inancgumus inancgumus added the next Might be eligible for the next planning (not guaranteed!) label Oct 21, 2024
@roobre
Member Author

roobre commented Oct 31, 2024

An update on this:

Specifying --headless and --user-agent (which otherwise allows to use a custom user agent) yields to an empty string being sent as a user agent header, which is also blocked by some sites such as that reported by the user.

That turned out to not be true, but rather a mishap in my testing. Without that limitation, crocochrome now circumvents it by itself, so this is not a blocker for us: grafana/crocochrome#43

I still think, however, that this might be worth considering from the browser perspective.

@inancgumus
Member

@roobre We've discussed this, and we'll remove Headless from user agents in the next version.

@inancgumus
Member

inancgumus commented Nov 13, 2024

There are two issues with the current issue:

  1. It makes more sense to remove Headless by default, as discussed internally. If this becomes a problem, we can add the K6_BROWSER_OMIT_HEADLESS option later on top of this change.
  2. We need to repurpose this issue and narrow its scope, as adding K6_BROWSER_USER_AGENT_EXTRA (or K6_BROWSER_USER_AGENT_SUFFIX) is now tracked in issue #1509, "Allow adding a custom suffix to the default user agent".

Other than these, the rest is fine, and #1536 is being reviewed. It will solve issues for many users by default.

Note: The issue description is updated.

@roobre
