Include web discovery bundles in the brave extension #9381

Merged
kkuehlz merged 14 commits into master from humanweb_bundled_extension
Sep 22, 2021

Conversation

kkuehlz
Contributor

@kkuehlz kkuehlz commented Jul 8, 2021

We'll vendor the code at sync time until we can port their build system

Resolves brave/brave-browser#18166

Test Plan

See following comment: #9381 (comment)

Submitter Checklist:

  • I confirm that no security/privacy review is needed, or that I have requested one
  • There is a ticket for my issue
  • Used Github auto-closing keywords in the PR description above
  • Wrote a good PR/commit description
  • Added appropriate labels (QA/Yes or QA/No; release-notes/include or release-notes/exclude; OS/...) to the associated issue
  • Checked the PR locally: npm run test -- brave_browser_tests, npm run test -- brave_unit_tests, npm run lint, npm run gn_check, npm run tslint
  • Ran git rebase master (if needed)

Reviewer Checklist:

  • A security review is not needed, or a link to one is included in the PR description
  • New files have MPL-2.0 license header
  • Adequate test coverage exists to prevent regressions
  • Major classes, functions and non-trivial code blocks are well-commented
  • Changes in component dependencies are properly reflected in gn
  • Code follows the style guide
  • Test plan is specified in PR before merging

After-merge Checklist:

Test Plan:

Collaborator

@remusao remusao left a comment

Some of the bundles seem to be missing (they are not injected directly from manifest.json but are needed to initialize instances of Worker dynamically):

I think we might need to figure out the value of this config.baseURL which is needed to find the path to the bundle at runtime (and enable the workers).

Optionally, these bundles come with sourcemaps, but maybe we can ignore that until we do some more build system work and move to the webpack stack.

@kkuehlz kkuehlz force-pushed the humanweb_bundled_extension branch 3 times, most recently from c049e5f to 7e34e60 Compare July 13, 2021 23:17
@kkuehlz kkuehlz force-pushed the humanweb_bundled_extension branch from 7e34e60 to fc4ad3b Compare July 14, 2021 04:27
common/pref_names.h (outdated review thread, resolved)
@bridiver bridiver requested a review from petemill July 14, 2021 20:19
@kkuehlz kkuehlz force-pushed the humanweb_bundled_extension branch from fc4ad3b to 1353245 Compare July 14, 2021 21:12
@petemill
Member

Should we at least put a less obfuscated build / bundle output from the current humanweb source here?
Also, if we really want the source in this repo (or as a DEP) in the short term, before we convert the humanweb code to webpack, it would be pretty easy to add a build step which calls npm run build-humanweb and just runs the same command humanweb does in its current repo.

@remusao
Collaborator

remusao commented Jul 30, 2021

> Should we at least put a less obfuscated build / bundle output from the current humanweb source here?
> Also, if we really want the source in this repo (or as a DEP) in the short term, before we convert the humanweb code to webpack, it would be pretty easy to add a build step which calls npm run build-humanweb and just runs the same command humanweb does in its current repo.

Given the amount of work for v0, and unless there is strong opposition to this idea, I'd prefer to keep the code in a separate repo for simplicity. I think there are also a few advantages in having it separated for now (and open-source, of course):

  1. Single-purpose repository which is independent of the browser (it's "just" an extension and the scope is easier to reason about).
  2. CI and build/test flows already work (they would need to be ported/adapted to brave-core).
  3. Easier for people looking for technical documentation about human-web to find it.

I'm sure all of these can be solved but there are many other blocking items on the list of things to do.

Wdyt?

@diracdeltas
Member

> Given the amount of work for v0, and unless there is strong opposition to this idea, I'd prefer to keep the code in a separate repo for simplicity. I think there are also a few advantages in having it separated for now (and open-source, of course):

That is fine as long as people can verify that the bundled code is the same as the code in the repo, like via a reproducible build process that is documented in the open source repo.

@diracdeltas
Member

Instead of including the bundled code in the git tree, you could have a build time script that pulls the open source code from the separate repo and bundles it, for instance.

@remusao
Collaborator

remusao commented Aug 1, 2021

> Instead of including the bundled code in the git tree, you could have a build time script that pulls the open source code from the separate repo and bundles it, for instance.

I guess we can publish releases from the open-source repository and reference them in package.json/package-lock.json, right? Then people know which tag is currently in use in Brave.
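
As a rough illustration of that idea: when a dependency is declared with a git-style spec ending in `#<ref>`, the tag in use can be read straight from package.json. The sketch below only shows that lookup; the function name `pinnedRef` and the example spec are hypothetical and are not taken from brave-core.

```typescript
// Return the ref (tag, branch or commit) that follows '#' in a git-style
// dependency spec, or null when the spec does not pin one.
// Hypothetical example input:
//   'github:example-org/web-discovery-project#v1.0.0'
function pinnedRef (spec: string): string | null {
  const hashIndex = spec.lastIndexOf('#')
  if (hashIndex === -1 || hashIndex === spec.length - 1) {
    return null
  }
  return spec.slice(hashIndex + 1)
}
```

A spec without a ref returns null, which a build script could treat as an error so that the bundled code always maps to a known release.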

@kkuehlz kkuehlz force-pushed the humanweb_bundled_extension branch 2 times, most recently from 196c901 to ad01a71 Compare August 14, 2021 00:24
@kkuehlz
Contributor Author

kkuehlz commented Aug 14, 2021

> Instead of including the bundled code in the git tree, you could have a build time script that pulls the open source code from the separate repo and bundles it, for instance.

@diracdeltas did this in the newest push

@kkuehlz
Contributor Author

kkuehlz commented Aug 16, 2021

@remusao I added the copywriter's text and the entry to the settings page, all tied up to extension enable/disable.

@kkuehlz kkuehlz requested a review from remusao August 16, 2021 22:59
Comment on lines 44 to 41
const pref = prefs.find(p => p.key === WEB_DISCOVERY_PREF_KEY)
if (pref) {
  toggleWebDiscovery(pref)
}
Collaborator

Nit: since you already check if (pref ...) in the toggleWebDiscovery function, maybe this can be simplified to:

Suggested change
- const pref = prefs.find(p => p.key === WEB_DISCOVERY_PREF_KEY)
- if (pref) {
-   toggleWebDiscovery(pref)
- }
+ toggleWebDiscovery(prefs.find(p => p.key === WEB_DISCOVERY_PREF_KEY))
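
The suggestion only works if toggleWebDiscovery tolerates an undefined argument, since Array.prototype.find returns undefined when nothing matches. A minimal sketch of that shape follows; the `Pref` interface, the key value, the `isRunning` flag and the `applyPrefs` helper are all assumptions made for illustration, not the code in this PR.

```typescript
// Stand-in types and state; these are assumptions, not brave-core's own.
interface Pref {
  key: string
  value: boolean
}

const WEB_DISCOVERY_PREF_KEY = 'example.web_discovery_enabled' // hypothetical key
let isRunning = false

// Accepts undefined so callers can pass the result of find() directly.
// A missing or unrelated pref leaves the current state untouched.
function toggleWebDiscovery (pref: Pref | undefined): boolean {
  if (pref && pref.key === WEB_DISCOVERY_PREF_KEY) {
    isRunning = pref.value
  }
  return isRunning
}

// The simplified call site from the suggested change.
function applyPrefs (prefs: Pref[]): boolean {
  return toggleWebDiscovery(prefs.find(p => p.key === WEB_DISCOVERY_PREF_KEY))
}
```

Keeping the guard in one place means every caller gets the same behaviour for a missing pref.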

build/commands/lib/util.js (outdated review thread, resolved)
Kevin Kuehler added 4 commits September 21, 2021 22:14
This way users that don't have the feature enabled won't incur the
performance cost of loading the script.

Includes webRequest optimizations in the web-discovery codebase.
@kkuehlz kkuehlz force-pushed the humanweb_bundled_extension branch from 8436c59 to 5991191 Compare September 22, 2021 05:14
Kevin Kuehler added 2 commits September 22, 2021 01:12
We inject if the following conditions are met:
  1. WDP pref is enabled
  2. Not loading a chrome:// resource
  3. Page is not loaded from bfcache (avoids double inject)
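
The three conditions can be read as a single predicate. The following is a sketch under stated assumptions: the `InjectionContext` shape and its field names are hypothetical, and the chrome:// pattern is anchored to the start of the URL as a choice of this sketch.

```typescript
// Hypothetical inputs for the injection decision.
interface InjectionContext {
  wdpEnabled: boolean   // 1. WDP pref is enabled
  url: string           // 2. must not be a chrome:// resource
  fromBFCache: boolean  // 3. e.g. the `persisted` flag of a pageshow event
}

function isChromeUrl (url: string): boolean {
  return /^chrome:\/\//.test(url)
}

function shouldInject (ctx: InjectionContext): boolean {
  if (!ctx.wdpEnabled) return false
  if (ctx.url === '' || isChromeUrl(ctx.url)) return false
  // A page restored from bfcache still has the script from its first load.
  return !ctx.fromBFCache
}
```

Checking the pref first keeps the cost close to zero for users who have not opted in.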
@kkuehlz kkuehlz force-pushed the humanweb_bundled_extension branch from 23f8c99 to 9a90817 Compare September 22, 2021 08:37
@atuchin-m
Copy link
Collaborator

I've pushed a patch from @remusao in the slack chat.

@remusao remusao force-pushed the humanweb_bundled_extension branch from 33db218 to 13c89fb Compare September 22, 2021 15:27
Collaborator

@mkarolin mkarolin left a comment

chromium_src ++

  const url = tab.url || tab.pendingUrl || ''
  return url ? !/chrome:\/\//.test(url) : false
}
// We inject on `complete` since multiple `loading` events can fire.
Collaborator

This is actually very late in the process. As discussed on slack I think we want webRequest.onCommitted instead
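
For reference, the Chrome extensions API exposes an onCommitted event on chrome.webNavigation, which fires once a navigation has committed and before the page has finished loading. The sketch below assumes that is the event meant here and that only top-level frames should be considered; `CommittedDetails` declares just the fields used, and `isInjectableCommit` is a hypothetical helper.

```typescript
// Subset of the details object passed to chrome.webNavigation.onCommitted
// listeners. Only the fields used below are declared.
interface CommittedDetails {
  tabId: number
  frameId: number // 0 is the top-level frame
  url: string
}

// Decide whether a committed navigation is a candidate for injection.
// Restricting this to the top-level frame is an assumption of the sketch.
function isInjectableCommit (details: CommittedDetails): boolean {
  const isTopFrame = details.frameId === 0
  const isChromeUrl = /^chrome:\/\//.test(details.url)
  return isTopFrame && !isChromeUrl
}

// In an extension, the wiring would look roughly like:
//   chrome.webNavigation.onCommitted.addListener(details => {
//     if (isInjectableCommit(details)) { /* inject the content script */ }
//   })
```

Keeping the decision in a pure function makes it testable without a browser.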

@kkuehlz kkuehlz merged commit 32d132d into master Sep 22, 2021
@kkuehlz kkuehlz deleted the humanweb_bundled_extension branch September 22, 2021 19:06
@remusao
Collaborator

remusao commented Sep 22, 2021

Testing guidelines for QA:

  1. Launch browser and go to Settings.
  2. Check 1: Filter prefs by Web Discovery and check that there is a Web Discovery Project preference available:

image

  3. Check 2: The preference should be disabled by default.
  4. Open a new tab and visit brave://inspect/#extensions.
  5. Click on inspect below the line with name Brave:

image

  6. Then visit the console tab (there are a bunch of logs which you can ignore, as they come from other features of the Brave extension).
  7. Check 3: Enter the following expression in the console prompt: WDP.isRunning and press enter. You should see false being displayed.

image

  8. Now go back to the "Settings" page and turn Web Discovery Project on.
  9. Check 4: Entering WDP.isRunning again in the console should now display true.
  10. Check 5: Restarting the browser should preserve the setting: it should stay true after a restart, and once Web Discovery Project is disabled it should stay false after a restart.
  11. Back in the console window, select the cog on the right (bottom one), then select "Log XMLHttpRequests". This lets us see network requests directly in the console.

image

  12. Enter the following command in the prompt:
WDP.modules['web-discovery-project'].background.webDiscoveryProject.patternsLoader.resourceWatcher.forceUpdate()
  13. You should see some network requests like this:

image

  14. Then open a new tab in the browser and visit the following Google SERP: https://www.google.com/search?q=joker
  15. Close the tab once the page has fully loaded and go back to the console devtool.
  16. Run the following command:
WDP.modules['web-discovery-project'].background.webDiscoveryProject.strictQueries.map(x=>x.tDiff=0)
  17. Check 6: Wait up to ~20 seconds and you should see network requests being emitted like in the screenshot below.

image

The important bit here is the first call to a URL similar to the one you visited in the tab (double-fetch), followed by one or more calls to https://collector.wdp.brave.com/.

  18. In Settings, opt out, restart the browser, then run WDP.isRunning in the console: it should show false. There should be no network requests to wdp.brave.com in the console.
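
When going through the logged requests by hand gets tedious, the "no requests to wdp.brave.com" check can be approximated by filtering a list of URLs copied from the console. This is only a sketch: `hostOf`, `isWdpHost` and `wdpRequests` are hypothetical helpers, and the host is extracted with a simple pattern rather than a full URL parser.

```typescript
// Extract the host from an absolute http(s) URL; returns null otherwise.
function hostOf (url: string): string | null {
  const match = /^https?:\/\/([^/:?#]+)/i.exec(url)
  const host = match ? match[1] : undefined
  return host ? host.toLowerCase() : null
}

// True for wdp.brave.com itself and any of its subdomains.
function isWdpHost (host: string): boolean {
  return host === 'wdp.brave.com' || host.endsWith('.wdp.brave.com')
}

// Keep only the logged URLs that target a wdp.brave.com host.
function wdpRequests (loggedUrls: string[]): string[] {
  return loggedUrls.filter(url => {
    const host = hostOf(url)
    return host !== null && isWdpHost(host)
  })
}
```

With Web Discovery Project disabled, the filtered list is expected to be empty; with it enabled, the collector calls from Check 6 should show up in it.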

@kjozwiak
Member

Verification PASSED on Win 10 x64 using the following build:

Brave | 1.32.20 Chromium: 94.0.4606.54 (Official Build) nightly (64-bit)
-- | --
Revision | c8191a1d5cccbf64e8fe7269043f8ace8d74dd05-refs/branch-heads/4606@{#1130}
OS | Windows 10 Version 20H2 (Build 19042.1237)

Went through and verified the STR/Cases outlined via #9381 (comment) as per the following:

  • ensured that Web Discovery Project is disabled by default via brave://settings/search
  • ensured that clicking on Learn more opens https://brave.com/privacy/browser/#web-discovery-project without any issues
  • ensured that WDP.isRunning returns false when Web Discovery Project is disabled
  • ensured that WDP.isRunning returns true when Web Discovery Project is enabled
WDP.isRunning = false | WDP.isRunning = true
-- | --
runningFalse | runningTrue
  • ensured that WDP.isRunning stays true if enabled after restarting the browser several times
  • ensured that WDP.isRunning stays false if disabled after restarting the browser several times
  • ensured that an error was displayed in the browser console when running WDP.modules['web-discovery-project'].background.webDiscoveryProject.patternsLoader.resourceWatcher.forceUpdate() while WDP.isRunning is false
  • ensured that WDP.modules['web-discovery-project'].background.webDiscoveryProject.patternsLoader.resourceWatcher.forceUpdate() displayed the correct data when WDP.isRunning is true
Disabled Output | Enabled Output
-- | --
disabledError | enabledFetch

saveData

  • waited ~10min once WDP was disabled after confirming that WDP.isRunning is false and ensured that no calls to wdp.brave.com occurred

Development

Successfully merging this pull request may close these issues.

Include web-discovery-project in Brave