Filter user-tracking parameters from query string #3239

Merged (1 commit) on Sep 24, 2019

@fmarier fmarier commented Aug 22, 2019

Fixes brave/brave-browser#4239.

If a URL's query string includes any of the parameter names known
to track individual users, we remove those parameters.

https://support.google.com/analytics/answer/7519794
https://stackoverflow.com/questions/52847475/what-is-fbclid-the-new-facebook-parameter
https://about.ads.microsoft.com/en-us/blog/post/january-2018/conversion-tracking-update-on-bing-ads
https://developer.mailchimp.com/documentation/mailchimp/guides/getting-started-with-ecommerce/#e-commerce-tracking-and-reports

Submitter Checklist:

Test Plan:

Visit the following URLs:

and verify that the query string is removed and that the URL bar only shows https://brave.com/.

Reviewer Checklist:

  • New files have MPL-2.0 license header.
  • Request a security/privacy review as needed.
  • Adequate test coverage exists to prevent regressions
  • Verify test plan is specified in PR before merging to source

After-merge Checklist:

  • The associated issue milestone is set to the smallest version that the
    changes have landed on.
  • All relevant documentation has been updated.

@fmarier fmarier added this to the 0.71.x - Nightly milestone Aug 22, 2019
@fmarier fmarier self-assigned this Aug 22, 2019
@fmarier fmarier force-pushed the francois-filter-query-string-4239 branch from 4dd3953 to 57a3a42 Compare August 22, 2019 20:32
@fmarier fmarier requested review from iefremov and jumde August 22, 2019 20:32

fmarier commented Aug 22, 2019

Note to reviewers: while the custom query string parsing code is unfortunate, Chromium's url::ExtractQueryKeyValue() (which actually comes from Mozilla) makes a lot of assumptions about how the query string is supposed to be structured and cannot distinguish between these:

  • ?foo
  • ?foo=
  • ?foo&

Given that the format of the query string is not specified in RFC 3986, some frameworks don't agree with the way that the Mozilla parser interprets the query string and instead consider ?foo an empty key with a value of foo.

My parser aims to keep the changes we make to the query string to a minimum, and it handles seemingly invalid query strings just fine. I have tried to optimize for the cases where (1) there is no query string and (2) the query string doesn't include any of the trackers.
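
To illustrate the ambiguity described above, here is a minimal sketch using Python's standard-library `urllib.parse.parse_qsl` as a stand-in for a generic key/value parser. It is not Chromium's `url::ExtractQueryKeyValue()`, and the helper name `parse` is hypothetical; the point is only that a key/value view of the query string can lose the difference between the three spellings.

```python
from urllib.parse import parse_qsl


def parse(query: str) -> list[tuple[str, str]]:
    """Parse a query string into (key, value) pairs, keeping blank values.

    Stand-in for a generic parser; NOT Chromium's url::ExtractQueryKeyValue().
    """
    return parse_qsl(query, keep_blank_values=True)


# The three spellings from the list above all collapse to the same pairs,
# so the original text cannot be rebuilt from the parsed form alone.
for query in ("foo", "foo=", "foo&"):
    print(f"{query!r} -> {parse(query)}")
```

Because all three inputs parse to the same `[('foo', '')]`, re-serializing the parsed pairs would rewrite parts of the query string that have nothing to do with tracking, which is the kind of change the comment above wants to avoid.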

@fmarier fmarier requested a review from bridiver August 22, 2019 21:00
@fmarier fmarier force-pushed the francois-filter-query-string-4239 branch from 57a3a42 to 8045692 Compare August 23, 2019 00:54

@iefremov iefremov left a comment

Besides minor issues and nits, I think we should try to simplify the parsing part.

fmarier commented Aug 23, 2019

After a discussion on Slack with @iefremov and @bridiver, we concluded that we should replace the string manipulation code with regular expressions:

s/&(fbclid|gclid|msclkid|mc_eid)=[^&]+//g
s/^(fbclid|gclid|msclkid|mc_eid)=[^&]+&//g
s/^(fbclid|gclid|msclkid|mc_eid)=[^&]+$//g
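
The three substitutions above can be sketched in Python roughly as follows. This is an illustration of the expressions as written in this comment, not the C++ code that landed in the PR; the names `TRACKERS` and `strip_trackers` are hypothetical, and the input is assumed to be the query string without its leading `?`.

```python
import re

# Parameter names taken from the expressions above.
TRACKERS = "fbclid|gclid|msclkid|mc_eid"

# 1. A tracker that follows another parameter: drop it with its leading "&".
_AFTER_AMPERSAND = re.compile(rf"&({TRACKERS})=[^&]+")
# 2. A tracker at the start that is followed by more parameters:
#    drop it with its trailing "&".
_LEADING_WITH_MORE = re.compile(rf"^({TRACKERS})=[^&]+&")
# 3. A tracker that is the entire query string.
_WHOLE_QUERY = re.compile(rf"^({TRACKERS})=[^&]+$")


def strip_trackers(query: str) -> str:
    """Apply the three substitutions, in order, to a query string (no '?')."""
    query = _AFTER_AMPERSAND.sub("", query)
    query = _LEADING_WITH_MORE.sub("", query)
    query = _WHOLE_QUERY.sub("", query)
    return query


print(strip_trackers("a=1&fbclid=abc&b=2"))   # a=1&b=2
print(strip_trackers("fbclid=abc&a=1"))       # a=1
print(strip_trackers("fbclid=abc&gclid=x"))   # (empty string)
```

Two properties follow from the expressions themselves: `[^&]+` requires at least one character, so a tracker with an empty value such as `fbclid=` is left untouched, and a parameter whose name merely ends in a tracker name (for example `xfbclid=1`) does not match because each pattern is anchored to `&` or to the start of the string.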

@fmarier fmarier force-pushed the francois-filter-query-string-4239 branch 2 times, most recently from 20b255a to 63fd13f Compare August 23, 2019 23:14
@fmarier fmarier requested a review from iefremov August 23, 2019 23:19
@fmarier fmarier dismissed iefremov’s stale review August 23, 2019 23:48

All review comments addressed in latest revision.

@fmarier fmarier force-pushed the francois-filter-query-string-4239 branch from 63fd13f to 86b95c7 Compare August 24, 2019 00:41
@fmarier fmarier force-pushed the francois-filter-query-string-4239 branch from 86b95c7 to 10a9663 Compare September 10, 2019 22:02
@fmarier fmarier removed the request for review from jumde September 10, 2019 22:04
@fmarier fmarier force-pushed the francois-filter-query-string-4239 branch from 10a9663 to 281c794 Compare September 18, 2019 23:26
@iefremov iefremov self-requested a review September 23, 2019 06:30
iefremov previously approved these changes Sep 23, 2019
…browser#4239)

If a URL's query string includes any of the parameter names known
to track individual users, we remove those parameters.

We essentially apply the following to the query string:

    s/&(fbclid|gclid|msclkid|mc_eid)=[^&]+//g
    s/^(fbclid|gclid|msclkid|mc_eid)=[^&]+&//g
    s/^(fbclid|gclid|msclkid|mc_eid)=[^&]+$//g

https://support.google.com/analytics/answer/7519794
https://stackoverflow.com/questions/52847475/what-is-fbclid-the-new-facebook-parameter
https://about.ads.microsoft.com/en-us/blog/post/january-2018/conversion-tracking-update-on-bing-ads
https://developer.mailchimp.com/documentation/mailchimp/guides/getting-started-with-ecommerce/#e-commerce-tracking-and-reports
@fmarier fmarier force-pushed the francois-filter-query-string-4239 branch from 85b00d4 to 609f59d Compare September 23, 2019 22:37
@fmarier fmarier requested a review from iefremov September 24, 2019 00:47

Successfully merging this pull request may close these issues.

Prevent tracking based on link decoration via query string or fragment
4 participants