Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(plugin-stealth): Add support for UA hints #413

Merged
merged 8 commits into from
Feb 2, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,11 @@
const { PuppeteerExtraPlugin } = require('puppeteer-extra-plugin')

/**
* Fixes the UserAgent info (composed of UA string, Accept-Language, Platform).
* Fixes the UserAgent info (composed of UA string, Accept-Language, Platform, and UA hints).
*
* If you don't provide any values this plugin will default to using the regular UserAgent string (while stripping the headless part).
* Default language is set to "en-US,en", default platform is "win32".
* Default language is set to "en-US,en", the other settings match the UserAgent string.
* If you are running on Linux, it will mask the settins to look like Windows. This behavior can be disabled with the `maskLinux` option.
*
* By default puppeteer will not set a `Accept-Language` header in headless:
* It's (theoretically) possible to fix that using either `page.setExtraHTTPHeaders` or a `--lang` launch arg.
Expand All @@ -28,43 +29,147 @@ const { PuppeteerExtraPlugin } = require('puppeteer-extra-plugin')
*
* // Stealth plugins are just regular `puppeteer-extra` plugins and can be added as such
* const UserAgentOverride = require("puppeteer-extra-plugin-stealth/evasions/user-agent-override")
* // Define custom UA, locale and platform
* const ua = UserAgentOverride({ userAgent: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)", locale: "de-DE,de;q=0.9", platform: "Win32" })
* // Define custom UA and locale
* const ua = UserAgentOverride({ userAgent: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)", locale: "de-DE,de" })
* puppeteer.use(ua)
*
* @param {Object} [opts] - Options
Niek marked this conversation as resolved.
Show resolved Hide resolved
* @param {string} [opts.userAgent] - The user agent to use (default: browser.userAgent())
* @param {string} [opts.locale] - The locale to use in `Accept-Language` header and in `navigator.languages` (default: `en-US,en;q=0.9`)
* @param {string} [opts.platform] - The platform to use in `navigator.platform` (default: `Win32`)
* @param {string} [opts.locale] - The locale to use in `Accept-Language` header and in `navigator.languages` (default: `en-US,en`)
* @param {boolean} [opts.maskLinux] - Wether to hide Linux as platform in the user agent or not - true by default
*
*/
class Plugin extends PuppeteerExtraPlugin {
constructor(opts = {}) {
super(opts)

this._headless = false
}

get name() {
return 'stealth/evasions/user-agent-override'
}

get dependencies() {
return new Set(['user-preferences'])
}

get defaults() {
return {
userAgent: null,
locale: 'en-US,en',
platform: 'Win32'
maskLinux: true
}
}

async onPageCreated(page) {
// Determine the full user agent string, strip the "Headless" part
let ua =
this.opts.userAgent ||
(await page.browser().userAgent()).replace('HeadlessChrome/', 'Chrome/')

if (
this.opts.maskLinux &&
ua.includes('Linux') &&
!ua.includes('Android') // Skip Android user agents since they also contain Linux
) {
ua = ua.replace(/\(([^)]+)\)/, '(Windows NT 10.0; Win64; x64)') // Replace the first part in parentheses with Windows data
}

// Full version number from Chrome
const uaVersion = ua.includes('Chrome/')
? ua.match(/Chrome\/([\d|.]+)/)[1]
: (await page.browser().version()).match(/\/([\d|.]+)/)[1]

// Get platform identifier (short or long version)
const _getPlatform = (extended = false) => {
if (ua.includes('Mac OS X')) {
return extended ? 'Mac OS X' : 'MacIntel'
} else if (ua.includes('Android')) {
return 'Android'
} else if (ua.includes('Linux')) {
return 'Linux'
} else {
return extended ? 'Windows' : 'Win32'
}
}

// Source in C++: https://source.chromium.org/chromium/chromium/src/+/master:chrome/browser/chrome_content_browser_client.cc;l=1187-1238
Niek marked this conversation as resolved.
Show resolved Hide resolved
const _getBrands = () => {
const seed = uaVersion.split('.')[0] // the major version number of Chrome

const order = [
[0, 1, 2],
[0, 2, 1],
[1, 0, 2],
[1, 2, 0],
[2, 0, 1],
[2, 1, 0]
][seed % 6]
const escapedChars = [' ', ' ', ';']

const greaseyBrand = `${escapedChars[order[0]]}Not${
escapedChars[order[1]]
}A${escapedChars[order[2]]}Brand`

const greasedBrandVersionList = []
greasedBrandVersionList[order[0]] = {
brand: greaseyBrand,
version: '99'
}
greasedBrandVersionList[order[1]] = {
brand: 'Chromium',
version: seed
}
greasedBrandVersionList[order[2]] = {
brand: 'Google Chrome',
version: seed
}

return greasedBrandVersionList
}

// Return OS version
const _getPlatformVersion = () => {
if (ua.includes('Mac OS X ')) {
return ua.match(/Mac OS X ([^)]+)/)[1]
} else if (ua.includes('Android ')) {
return ua.match(/Android ([^;]+)/)[1]
} else if (ua.includes('Windows ')) {
return ua.match(/Windows .*?([\d|.]+);/)[1]
} else {
return ''
}
}

// Get architecture, this seems to be empty on mobile and x86 on desktop
const _getPlatformArch = () => (_getMobile() ? '' : 'x86')

// Return the Android model, empty on desktop
const _getPlatformModel = () =>
_getMobile() ? ua.match(/Android.*?;\s([^)]+)/)[1] : ''

const _getMobile = () => ua.includes('Android')

const override = {
userAgent:
this.opts.userAgent ||
(await page.browser().userAgent()).replace(
'HeadlessChrome/',
'Chrome/'
),
acceptLanguage: this.opts.locale || 'en-US,en',
platform: this.opts.platform || 'Win32'
userAgent: ua,
platform: _getPlatform(),
userAgentMetadata: {
brands: _getBrands(),
fullVersion: uaVersion,
platform: _getPlatform(true),
platformVersion: _getPlatformVersion(),
architecture: _getPlatformArch(),
model: _getPlatformModel(),
mobile: _getMobile()
}
}

// In case of headless, override the acceptLanguage in CDP.
// This is not preferred, as it messed up the header order.
// On headful, we set the user preference language setting instead.
if (this._headless) {
override.acceptLanguage = this.opts.locale || 'en-US,en'
}

this.debug('onPageCreated - Will set these user agent options', {
Expand All @@ -73,7 +178,28 @@ class Plugin extends PuppeteerExtraPlugin {
})

page._client.send('Network.setUserAgentOverride', override)
} // onPageCreated
}

async beforeLaunch(options) {
// Check if launched headless
this._headless = options.headless
}

async beforeConnect() {
// Treat browsers using connect() as headless browsers
this._headless = true
}

get data() {
return [
{
name: 'userPreferences',
value: {
intl: { accept_languages: this.opts.locale || 'en-US,en' }
}
}
]
}
}

const defaultExport = opts => new Plugin(opts)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -120,22 +120,136 @@ test('stealth: navigator.languages with custom locale', async t => {
t.deepEqual(lang, 'de-DE')
})

test('stealth: navigator.platform with default platform', async t => {
const puppeteer = addExtra(vanillaPuppeteer).use(Plugin())
test('stealth: navigator.platform with maskLinux true (default)', async t => {
const puppeteer = addExtra(vanillaPuppeteer).use(
Plugin({
userAgent:
'Mozilla/5.0 (X11; Ubuntu; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.9.9999.99 Safari/537.36'
})
)
const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()

const platform = await page.evaluate(() => navigator.platform)
t.true(platform === 'Win32')
})

test('stealth: navigator.platform with custom platform', async t => {
test('stealth: navigator.platform with maskLinux false', async t => {
const puppeteer = addExtra(vanillaPuppeteer).use(
Plugin({ platform: 'MyFunkyPlatform' })
Plugin({
userAgent:
'Mozilla/5.0 (X11; Ubuntu; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.9.9999.99 Safari/537.36',
maskLinux: false
})
)
const browser = await puppeteer.launch({ headless: true })
const page = await browser.newPage()

const platform = await page.evaluate(() => navigator.platform)
t.true(platform === 'MyFunkyPlatform')
t.true(platform === 'Linux')
})

const _testUAHint = async (userAgent, locale) => {
const puppeteer = addExtra(vanillaPuppeteer).use(
Plugin({ userAgent, locale })
)

const browser = await puppeteer.launch({
headless: false, // only works on headful
args: ['--enable-features=UserAgentClientHint']
})

const majorVersion = parseInt(
(await browser.version()).match(/\/([^\.]+)/)[1]
)
if (majorVersion < 88) {
return null // Skip test on browsers that don't support UA hints
}

const page = await browser.newPage()

await page.goto('https://headers.cf/headers/?format=raw')

return page
}

test('stealth: test if UA hints are correctly set - Windows 10', async t => {
const page = await _testUAHint(
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.9999.99 Safari/537.36',
'en-AU'
)
if (!page) {
t.true(true) // skip
return
}
const firstLoad = await page.content()
t.true(
firstLoad.includes(
`sec-ch-ua: "Google Chrome";v="99", " Not;A Brand";v="99", "Chromium";v="99"`
)
)
t.true(firstLoad.includes(`Accept-Language: en-AU`))

await page.reload()
const secondLoad = await page.content()
t.true(secondLoad.includes('sec-ch-ua-mobile: ?0'))
t.true(secondLoad.includes('sec-ch-ua-full-version: "99.0.9999.99"'))
t.true(secondLoad.includes('sec-ch-ua-arch: "x86"'))
t.true(secondLoad.includes('sec-ch-ua-platform: "Windows"'))
t.true(secondLoad.includes('sec-ch-ua-platform-version: "10.0"'))
t.true(secondLoad.includes('sec-ch-ua-model: ""'))
})

test('stealth: test if UA hints are correctly set - macOS 11', async t => {
const page = await _testUAHint(
'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.9999.99 Safari/537.36',
'de-DE'
)
if (!page) {
t.true(true) // skip
return
}
const firstLoad = await page.content()
t.true(
firstLoad.includes(
`sec-ch-ua: "Google Chrome";v="99", " Not;A Brand";v="99", "Chromium";v="99"`
)
)
t.true(firstLoad.includes(`Accept-Language: de-DE`))

await page.reload()
const secondLoad = await page.content()
t.true(secondLoad.includes('sec-ch-ua-mobile: ?0'))
t.true(secondLoad.includes('sec-ch-ua-full-version: "99.0.9999.99"'))
t.true(secondLoad.includes('sec-ch-ua-arch: "x86"'))
t.true(secondLoad.includes('sec-ch-ua-platform: "Mac OS X"'))
t.true(secondLoad.includes('sec-ch-ua-platform-version: "11_1_0"'))
t.true(secondLoad.includes('sec-ch-ua-model: ""'))
})

test('stealth: test if UA hints are correctly set - Android 10', async t => {
const page = await _testUAHint(
'Mozilla/5.0 (Linux; Android 10; SM-P205) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.9999.99 Safari/537.36',
'nl-NL'
)
if (!page) {
t.true(true) // skip
return
}
const firstLoad = await page.content()
t.true(
firstLoad.includes(
`sec-ch-ua: "Google Chrome";v="99", " Not;A Brand";v="99", "Chromium";v="99"`
)
)
t.true(firstLoad.includes(`Accept-Language: nl-NL`))

await page.reload()
const secondLoad = await page.content()
t.true(secondLoad.includes('sec-ch-ua-mobile: ?1'))
t.true(secondLoad.includes('sec-ch-ua-full-version: "99.0.9999.99"'))
t.true(secondLoad.includes('sec-ch-ua-arch: ""'))
t.true(secondLoad.includes('sec-ch-ua-platform: "Android"'))
t.true(secondLoad.includes('sec-ch-ua-platform-version: "10"'))
t.true(secondLoad.includes('sec-ch-ua-model: "SM-P205"'))
})
Original file line number Diff line number Diff line change
Expand Up @@ -6,19 +6,20 @@

- [class: Plugin](#class-plugin)

### class: [Plugin](https://github.com/berstend/puppeteer-extra/blob/e6133619b051febed630ada35241664eba59b9fa/packages/puppeteer-extra-plugin-stealth/evasions/user-agent-override/index.js#L41-L77)
### class: [Plugin](https://github.com/berstend/puppeteer-extra/blob/ab0047d1af7dc38412744abdb61bcfc35c42dc34/packages/puppeteer-extra-plugin-stealth/evasions/user-agent-override/index.js#L42-L203)

- `opts` **[Object](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Object)?** Options (optional, default `{}`)
- `opts.userAgent` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)?** The user agent to use (default: browser.userAgent())
- `opts.locale` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)?** The locale to use in `Accept-Language` header and in `navigator.languages` (default: `en-US,en;q=0.9`)
- `opts.platform` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)?** The platform to use in `navigator.platform` (default: `Win32`)
- `opts.locale` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)?** The locale to use in `Accept-Language` header and in `navigator.languages` (default: `en-US,en`)
- `opts.maskLinux` **[boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)?** Wether to hide Linux as platform in the user agent or not - true by default

**Extends: PuppeteerExtraPlugin**

Fixes the UserAgent info (composed of UA string, Accept-Language, Platform).
Fixes the UserAgent info (composed of UA string, Accept-Language, Platform, and UA hints).

If you don't provide any values this plugin will default to using the regular UserAgent string (while stripping the headless part).
Default language is set to "en-US,en", default platform is "win32".
Default language is set to "en-US,en", the other settings match the UserAgent string.
If you are running on Linux, it will mask the settins to look like Windows. This behavior can be disabled with the `maskLinux` option.

By default puppeteer will not set a `Accept-Language` header in headless:
It's (theoretically) possible to fix that using either `page.setExtraHTTPHeaders` or a `--lang` launch arg.
Expand All @@ -42,11 +43,10 @@ puppeteer.use(stealth)

// Stealth plugins are just regular `puppeteer-extra` plugins and can be added as such
const UserAgentOverride = require('puppeteer-extra-plugin-stealth/evasions/user-agent-override')
// Define custom UA, locale and platform
// Define custom UA and locale
const ua = UserAgentOverride({
userAgent: 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)',
locale: 'de-DE,de;q=0.9',
platform: 'Win32'
locale: 'de-DE,de'
})
puppeteer.use(ua)
```
Expand Down