Upgrade the default rules (2nd attempt) #134

Merged: 25 commits, merged on Dec 25, 2020

ArenaL5 (contributor) commented Aug 4, 2020

This is a rebase of PR #125, pushed to a branch other than master so that it will be easier to maintain. The old pull request had become near-impossible to rebase.

At the time of writing (4th of August, 2020), this series of commits builds on top of tumpio's most recent commit, 6999201.

Commits should be self-explanatory; if they turn out not to be, I can clarify before merging.

ArenaL5 and others added 18 commits August 4, 2020 17:38
Add rules for GMX, DuckDuckGo, Tumblr, YouTube, Amazon and Bing, and replicate the functionality of the Neat URL web extension (Smile4ever/Neat-URL).
Now filters JPG, PNG, GIF and WEBP. It's also based on regular expressions, to filter URLs with different capitalizations (http://bad.site/TRACKER.GIF?id=666) and to register fewer false positives (http://good.site/giftrackingcounter?lang=en).

Co-authored-by: crssi <[email protected]>
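The commit above describes matching image requests by extension with a case-insensitive regular expression. As a minimal Python sketch of that idea (not the rule's actual pattern; the regex below and the choice to test only the URL path are assumptions), anchoring the extension to the end of the path is what keeps `giftrackingcounter` from counting as a GIF:

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: a literal dot, one of the image extensions, then the
# end of the path. "jpe?g" covers both .jpg and .jpeg; re.IGNORECASE lets
# "TRACKER.GIF" match as well as "tracker.gif".
IMAGE_PATH = re.compile(r"\.(?:jpe?g|png|gif|webp)$", re.IGNORECASE)


def looks_like_image(url: str) -> bool:
    """Return True if the URL's path (query string ignored) ends in an image extension."""
    path = urlsplit(url).path
    return IMAGE_PATH.search(path) is not None


for url in (
    "http://bad.site/TRACKER.GIF?id=666",
    "http://good.site/giftrackingcounter?lang=en",
):
    print(url, looks_like_image(url))
```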
Every request for an image is filtered now, regardless of name or file format. Exemptions for common crop and size parameters, and special rules for particular sites (Facebook, Instagram, WhatsApp) have been added.

Co-authored-by: crssi <[email protected]>
Co-authored-by: Geeknik Labs <[email protected]>
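The general image filter keeps a small set of crop and size parameters and strips everything else. The sketch below illustrates that whitelist approach in Python with `urllib.parse`; the names in `KEEP` are illustrative assumptions, not the PR's actual list, and wildcard entries (like the `*style` and `_nc_` prefixes mentioned later in this thread) are matched with `fnmatch`:

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist: exact names plus shell-style wildcards.
# None of these entries are copied from the PR's actual rule.
KEEP = ["w", "h", "width", "height", "crop", "*style", "_nc_*"]


def is_kept(name: str) -> bool:
    """True if a query parameter name matches any whitelist entry (case-sensitive)."""
    return any(fnmatchcase(name, pattern) for pattern in KEEP)


def trim_query(url: str) -> str:
    """Drop every query parameter whose name is not whitelisted."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(name, value) for name, value in pairs if is_kept(name)]
    # urlunsplit omits the "?" entirely when the rebuilt query is empty.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(trim_query("https://img.example/p.jpg?w=640&utm_source=x&_nc_ht=abc&uid=42"))
# prints https://img.example/p.jpg?w=640&_nc_ht=abc
```

Because `urlencode` re-encodes the kept values, the output can differ byte-for-byte from the input (a space written as `%20` comes back as `+`); a real rule might prefer to edit the raw query string instead.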
Also some optional whitelist rules to restore functionality in YouTube.
Add exemptions for:
- embedded interactive Google Maps in 3rd-party sites,
- any kind of image shown in DuckDuckGo search results,
- and map and aerial view tilesets following WMS and WMTS standards (like one would find while editing OpenStreetMap).
by whitelisting two innocuous URL parameters: `title` and `wpCaptchaId`.
exempt common parameters found on websites that select the picture by ID instead of by static path (as on the regional government site www.xunta.gal).
They can be later targeted by a different rule if necessary.
Trimming any of these parameters breaks the CAPTCHA when creating an email account at Microsoft.
Some webpages (mis)use it for size selection, and some pictures in Twitter don't load without this.
- Dafont (font shopping and typesetting tests)
- Fontstruct (idem)
- Fontshop (idem)
- SignBank (transcription of sign languages)
GMX Mailbox makes extensive use of the SID parameter and will not work without it: it either refuses to load important content or enters a redirection loop.

The other rules are for Reddit, maps in Facebook, Google-based embedded maps, and the Ubuntu wiki.
- Add more whitelisted URL parameters to the general image filter, in particular, PHP parameters used by Oracle's Site-Satellite cache.
- Improve support for maps embedded in Facebook.
- Trim unnecessary URL parameters in embedded Google maps.
Most whitelist rules are now separate from the general image filter. They're also logged by default.

The URL exclusion in the general image filter is best used for requests that will be processed by another filter.
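The split between whitelist rules and filters, and the role of the general filter's URL exclusion, can be modelled in a few lines. This is a hypothetical Python sketch, not Request Control's engine: it assumes that whitelist rules are checked first (and logged), and that every matching filter then runs in order, so a URL excluded from the general image filter is left to a dedicated rule instead of losing parameters that rule needs. Hosts, patterns and parameter names are invented for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def keep_params(url: str, keep: Callable[[str], bool]) -> str:
    """Rebuild the URL, keeping only query parameters whose name passes `keep`."""
    parts = urlsplit(url)
    pairs = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if keep(k)]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


@dataclass
class FilterRule:
    name: str
    pattern: re.Pattern
    action: Callable[[str], str]
    exclude: list[re.Pattern] = field(default_factory=list)

    def applies(self, url: str) -> bool:
        """The rule fires if its pattern matches and no exclusion does."""
        return bool(self.pattern.search(url)) and not any(p.search(url) for p in self.exclude)


# Hypothetical rules: hosts and parameter names are illustrative only.
WHITELIST = [re.compile(r"captcha", re.IGNORECASE)]
MAP_HOST = re.compile(r"^https://tiles\.example/")
FILTERS = [
    FilterRule("general image filter", re.compile(r"\.(?:png|jpe?g)(?:\?|$)", re.IGNORECASE),
               lambda u: keep_params(u, lambda k: False), exclude=[MAP_HOST]),
    FilterRule("map tiles filter", MAP_HOST,
               lambda u: keep_params(u, lambda k: k in {"x", "y", "z"})),
]

log: list[str] = []


def process(url: str) -> str:
    # Whitelist rules win outright, and every hit is logged.
    if any(p.search(url) for p in WHITELIST):
        log.append(f"whitelisted: {url}")
        return url
    # Every filter that matches, and does not exclude the URL, runs in order.
    for rule in FILTERS:
        if rule.applies(url):
            url = rule.action(url)
    return url


print(process("https://tiles.example/map.png?x=1&y=2&z=3&ref=abc"))
print(process("https://cdn.example/cat.jpg?w=1&ref=abc"))
print(process("https://site.example/captcha.png?id=9"))
```

In this sketch the tile URL keeps `x`, `y` and `z` only because the general filter excludes its host; without that exclusion, the general filter would strip every parameter before the map rule ran.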
codecov-commenter commented Aug 4, 2020

Codecov Report

Merging #134 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #134   +/-   ##
=======================================
  Coverage   99.52%   99.52%           
=======================================
  Files          10       10           
  Lines         629      629           
=======================================
  Hits          626      626           
  Misses          3        3           

Last update 6999201...c3db43c.

- Whitelist rules completely reworked: they're no longer baked into the
filter rules, and are now more specific, easily disabled, and more
helpful (they are also always logged):
  * GMX web client whitelisted (uses `sid` parameter);
  * DuckDuckGo whitelisted (helpful, and built with privacy in mind);
  * CAPTCHAs are logged as a whitelist rule now, ensuring no parameters
  are removed when creating a new account;
  * user avatars and user karmas are a logged whitelist rule now for
  similar reasons;
  * Reddit external previews are whitelisted by necessity;
  * YouTube seekbar and thumbnail previews are folded into a single
  whitelist filter now.
- New filter to anonymize Reddit's banner and community images.
- New filter to avoid image downsamplers: when enabled, retrieves the
original picture from the original domain. Can be disabled.
- New filter for cdn.embedly.com (an unnecessary wrapper for embedding
videos, seen for instance on the site Know Your Meme).
- Google Street View filter disabled by default as it breaks Street View
in Firefox ESR (works correctly in the most current desktop Firefox).
- Redundant filter deleted (for URL parameter `fbclid`).
- General image filter tweaked:
  * whitelisted parameters for a couple of signed-URL systems, like Amazon's
  (https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-signed-urls.html)
  and Facebook's (several URL parameters beginning with `_nc_`);
  * whitelisted innocuous parameters that don't break webpages, but
  result in a much longer log, like `*style`, `version`, `preview`
  and `i10c` (this last one is used for downsampling user avatars);
  * removed `url` from the whitelist (this may break an image on some
  random website, but keeping it is NOT worth it);
  * removed `wpCaptchaId` and `userid` as they are covered by the
  whitelist rules for user avatars and CAPTCHAs now.
- Some site-specific anti-redirector filters were merged into a generic
one. This new filter is effective against social network VK's
redirector.
- Tags are now more descriptive and unique for every filter.
- Facebook brands:
   * WhatsApp Web filters changed to fit the new avatar URLs.
   * Instagram's redirector is now accounted for.
   * More URL parameters used by Facebook blacklisted. `igshid` blacklisted globally.
- Removed `wprov` parameter by Wikipedia.
- Whitelisted more parameters for a different syntax for Amazon searches
- The optional YouTube filter now also blocks the "watchtime" images (unblocked recently at EasyPrivacy and AdGuard because of problems with logged accounts)
- General image filters:
   * Exceptions to this filter are clearer now.
   * Whitelisted URL parameters `bg` and `fg` (background and foreground colour) and `latex` (used by at least one LaTeX renderer, at WordPress). Also whitelisted `quality`, `sign`, `ssl`, `token-hash` and `token-time` (some of these are necessary at VK).
   * A second filter added for a gallery syntax (a PHP script to select an image based on numeric IDs). `uuid` is removed from the whitelist on the first filter.
- Whitelist rules:
   * Added rules for YouTube icons on profile pages (low impact) and LinkedIn (very high impact).
   * Images from `outlook.office.com` (Outlook's web client) and `www.osapublishing.org` (a site publishing scientific papers) are now allowed under a same-domain policy.
   * Images from GettyImages, iStockPhoto, ImageBank and AltMetric are now allowed globally.
   * Tweaked rule for whitelisting avatars.
   * Rule for MoinMoin-powered wikis is now global.
   * ReCAPTCHA rule merged into general CAPTCHA rules.
   * Whitelisted maps and street view for Google, Bing, and HERE
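The generic anti-redirector, like the downsampler and cdn.embedly.com filters above, comes down to pulling the real destination out of a wrapper URL. A minimal Python sketch, assuming the destination is carried as an absolute http(s) URL in some query parameter; `redirect.example` and the `target` parameter name are made up:

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


def unwrap_redirect(url: str) -> Optional[str]:
    """Return the first query-parameter value that is itself an absolute
    http(s) URL, or None if the URL does not look like a redirector."""
    # parse_qsl splits on "&" before percent-decoding, so an encoded
    # destination with its own query string survives intact.
    for _name, value in parse_qsl(urlsplit(url).query):
        target = urlsplit(value)
        if target.scheme in ("http", "https") and target.netloc:
            return value
    return None


wrapped = "https://redirect.example/out?target=https%3A%2F%2Fexample.org%2Fpage%3Fa%3D1&sig=xyz"
print(unwrap_redirect(wrapped))
```

A heuristic this broad would also fire on legitimate parameters that merely carry a URL (a share button's `url=`, for example), so a real rule would more likely be limited to known redirector hosts and parameter names.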
Also, more whitelist rules, and tweaks to other rules.
LinkedIn's URLs vary widely, change often, and never use optional query parameters. Whitelisting the whole server will not cause the browser to leak extra data to LinkedIn.
This filter should cut tracking parameters from hyperlinks at LinkedIn. Exceptions for special pages like lost password retrieval have been made.

Also, tweaks to other filters.
codecov-io commented Nov 7, 2020

Codecov Report

Merging #134 (124f5a4) into master (6999201) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #134   +/-   ##
=======================================
  Coverage   99.52%   99.52%           
=======================================
  Files          10       10           
  Lines         629      629           
=======================================
  Hits          626      626           
  Misses          3        3           

Last update 6999201...85ce45d.

- Amazon's cart
- LinkedIn's password management and recovery screens
- Google's reCAPTCHA
tumpio merged commit b75d0e0 into tumpio:master on Dec 25, 2020
ArenaL5 deleted the rules branch on June 27, 2021 20:35