
Additional cleaning features #77

Open
Cimbali opened this issue Mar 5, 2020 · 11 comments
Labels
question Further information is requested

Comments


Cimbali commented Mar 5, 2020

Right now CleanLinks does the following:

  • clean URLs by rewriting paths or removing query parameters
  • detect embedded URIs and redirect or drop requests that contain them

I’ve just had to add another feature because Google was loading its URL redirect page in an iframe: we now “promote” the iframe request from www.google.com/url?url=… to the main frame in the tab. For now, this behaviour is hardcoded (see dbd58dc).
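
In WebExtensions terms, that hardcoded behaviour boils down to roughly the following (a minimal sketch only, not the actual code from dbd58dc):

```js
// Minimal sketch (not the code from dbd58dc): when an iframe loads the
// Google redirect page, extract the destination and load it in the tab's
// main frame instead, cancelling the iframe request.
// Requires the webRequest and webRequestBlocking permissions.
browser.webRequest.onBeforeRequest.addListener(details => {
    const target = new URL(details.url).searchParams.get('url');
    if (!target)
        return {};

    // "Promote" the request: navigate the whole tab to the destination.
    browser.tabs.update(details.tabId, { url: target });
    return { cancel: true };
}, { urls: ['*://www.google.com/url*'], types: ['sub_frame'] }, ['blocking']);
```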

This could possibly be

  • detected,
  • generalized to more domains/iframe pages, and
  • maintained in the rules.

Other features that have not been needed until now are not implemented, such as:

  • matching query parameters on their value instead of (or in addition to) their key,
  • rewriting query parameters (these first two points are sketched just after this list),
  • any cleaning in the hash part of the URL (though we already detect embedded URLs in the hash if they start with #!/)
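
For the first two points, a rough idea of what value-based matching and rewriting could look like (the rule shape below is hypothetical, not CleanLinks’ actual rule format):

```js
// Hypothetical rule shape, only to illustrate the idea: rules match on
// the parameter value (and/or key) and either drop the parameter or
// rewrite its value.
const rules = [
    // drop any parameter whose value looks like an opaque hex token
    { match: (key, value) => /^[0-9a-f]{12,}$/i.test(value), action: 'remove' },
    // rewrite a parameter value, e.g. force the interface language
    { match: (key, value) => key === 'hl', action: () => 'en' },
];

function cleanQuery(href) {
    const url = new URL(href);
    // iterate over a snapshot so we can modify searchParams while looping
    for (const [key, value] of [...url.searchParams])
        for (const rule of rules) {
            if (!rule.match(key, value))
                continue;
            if (rule.action === 'remove')
                url.searchParams.delete(key);
            else
                url.searchParams.set(key, rule.action(value));
            break;  // first matching rule wins
        }
    return url.href;
}
```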

So please report in this issue any use cases for these or other link cleaning features and we’ll see about implementing them.

Cimbali pinned this issue Mar 5, 2020
Cimbali added the “question” label Mar 30, 2020

Cimbali commented Apr 1, 2020

While we’re cleaning javascript links: some websites use onmousedown instead of onclick, Google (of course) among them, which causes the iframe load I mentioned above, and also #101.

I believe we should remove onmousedown events from links, as they divert what the links should really do. Maybe some per-site rules to allow javascript actions would be appropriate.
This is now implemented.
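
For reference, the core of the idea is a content-script pass like this (a sketch only; the shipped implementation may differ and would also need to handle links added after the page loads):

```js
// Content-script sketch: drop the onmousedown handlers that sites like
// Google use to swap the real link target just before the click registers.
for (const link of document.querySelectorAll('a[onmousedown]'))
    link.removeAttribute('onmousedown');
```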

Cimbali added a commit that referenced this issue Apr 1, 2020
Modifies the common extract_javascript_link() to return an URL object.
Fix #101, mentioned in #77.
Also reinstate respecting the target or not, point 3 in #103

Cimbali commented Apr 7, 2020

Suggested by @tarihci in #104: generic user-specified redirects (à la Redirector), e.g. reddit -> old.reddit, imgur.com -> imgurp.com, google.com -> google.{my favourite TLD}
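
That could start as a plain user-maintained hostname map applied before the other cleaning steps (illustrative only, not an agreed-upon rule format):

```js
// Illustrative only, not an agreed-upon rule format: a user-maintained
// hostname-to-hostname map, applied before the other cleaning steps.
const hostRedirects = {
    'www.reddit.com': 'old.reddit.com',
    'imgur.com': 'imgurp.com',
};

function applyHostRedirects(href) {
    const url = new URL(href);
    if (url.hostname in hostRedirects)
        url.hostname = hostRedirects[url.hostname];
    return url.href;
}
```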


Cimbali commented Apr 8, 2020

Potential tracking hashes, suggested by @Rtizer-9 in #108 (comment):

A doubt @Cimbali:

Is the #4cff708c1aff in
https://www.forbes.com/sites/lisettevoytko/2020/03/13/twitters-most-liked-tweet-of-the-week-elon-musk-dismisses-coronavirus-fears/#4cff708c1aff

a tracking parameter? Because even if you remove that from the address bar and revisit the page, it presents you with another hash.

The way this is suspected to work is:

  • visit the page, get a unique hash added to the URL
  • share the page or copy the URL, now the hash will inform the site where the link came from
  • proposed solution: add a “remove hash” rule and apply it to forbes.com (sketched below)

This will work on loading the page, and when performing a “copy clean link”.
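
The cleaning step itself is tiny; the per-domain rule wiring is omitted in this sketch:

```js
// Sketch of the cleaning step only; how the rule is matched to a domain
// (here forbes.com) is left out.
function removeHash(href) {
    const url = new URL(href);
    url.hash = '';
    return url.href;
}

// removeHash('https://www.forbes.com/sites/.../#4cff708c1aff')
//   -> 'https://www.forbes.com/sites/.../'
```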


Cimbali commented Apr 14, 2020

I’ve seen another suspicious hash today, on an outgoing link from Facebook. In addition to the fbclid parameter, there was #.XpWKwogtXjo.facebook appended to the URL.


jawz101 commented Jan 8, 2021

maybe recognize canonical url thingies? https://addons.mozilla.org/en-US/firefox/addon/canonical-link


e-t-l commented Oct 29, 2021

@Cimbali apologies if this is a dumb or duplicate question, but I can't figure out whether this is already implemented: Does CL redirect AMP pages to non-AMP HTML?

(If not, the amp2html repo has some very straightforward code to do the job that you might be able to borrow. I believe it uses RegEx but I'm not super familiar with it)


Cimbali commented Oct 29, 2021

@e-t-l There seem to be 2 types of redirections in that add-on:

  1. cache redirection based on URLs (roughly, anything in redirector.js): we cover that by auto-detecting links embedded in URLs. If the embedded link doesn’t start with www. we may miss it; that’s a good use case for manual entries in the redirection preferences.
  2. redirecting from AMP pages that declare a “canonical” link to that link (what’s in amp2html.js): we don’t do that (a sketch of the idea follows). It’s unclear from the manifest whether this code is applied to all pages or only to bing / google / ampproject.
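
The gist of that second kind, as a content-script sketch (not amp2html’s actual code):

```js
// Content-script sketch of the canonical-link approach (not amp2html's
// actual code). AMP pages mark themselves with an `amp` (or ⚡) attribute
// on <html> and declare their non-AMP version via <link rel="canonical">.
const html = document.documentElement;
const canonical = document.querySelector('link[rel="canonical"]');

if ((html.hasAttribute('amp') || html.hasAttribute('⚡'))
        && canonical && canonical.href !== location.href)
    location.replace(canonical.href);
```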


e-t-l commented Nov 10, 2021

Would it be possible to add a sub-option to Link Tracking: "Disabled sites tracking"?

I'm finding that while CL is very helpful, it often breaks sites seemingly at random, and it's not always clear why. It's gotten to the point where if anything doesn't work correctly on a site, the first troubleshooting step I take is disabling CL for that tab (and unfortunately, that does usually seem to fix it). I'd like to go through it in more detail, figure out what exactly CL didn't like, and fix it, but I usually don't have time to do that. I either leave CL disabled in that tab, or I add a quick whitelist rule for the entire domain, which is a more heavy-handed approach than I'd prefer.

If we had a history option to track what the current site was when the user disabled CL for the tab, then users like me could disable CL for the tab when we just need to get stuff done, and later, when we have more time, we could look back over this history and figure out how to handle the redirection properly.

P.S. Another related feature might be an option to submit rules directly from our local Rules page to the central repo. If a user finds that CL breaks a website and is able to construct an appropriate rule to mitigate the breakage, it would be great for the community if they could submit that to the masterlist of whitelist rules. The submission process would have to have some sort of review step so you and/or the community could confirm that it is indeed a good rule to include. But it seems like having this process integrated into the addon would be much simpler for us and you than having people open a new issue every time they want to contribute a rule as the Wiki currently suggests.


Cimbali commented May 6, 2022

I’ve found another redirection scheme, where the destination’s host and path are passed as separate parameters:

https://consents.prismamedia.com/?redirectHost=https%3A%2F%2Fwww.domain.name&redirectUri=%2fpath%2to%2fpage
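
Handling it would mean recombining the two parameters, roughly as follows (parameter names taken from the example above, with the path tidied up for illustration):

```js
// Rough sketch: rebuild the destination from the two query parameters
// seen above (path cleaned up for the example).
const example = 'https://consents.prismamedia.com/'
              + '?redirectHost=https%3A%2F%2Fwww.domain.name'
              + '&redirectUri=%2Fpath%2Fto%2Fpage';

const params = new URL(example).searchParams;
const target = new URL(params.get('redirectUri'), params.get('redirectHost'));
console.log(target.href);  // https://www.domain.name/path/to/page
```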


Cimbali commented May 6, 2022

Thanks a lot for your feedback @e-t-l. I do find browsing with CL quite heavy myself (especially when it’s not the only potential cause of things going wrong).

Let me try to sum up your 3 suggestions to see if I understand them right:

  1. allow tracking cleaned links even without redirecting, i.e. “what would CL do”, but without actually doing it.
  2. maintain history beyond the current tab for later review
  3. built-in method to contribute rules to the central repo.

All of those would definitely increase CL’s usability, though with small caveats: for 1. it could often remain unclear what exactly makes a site break or not, and for 1. and 2. the risk is using up a lot of resources to keep track of (potentially) cleaned links. Not to say it’s not doable, of course.

All 3 of these suggestions are so far deliberately unimplemented in order to minimise the privacy impact of CleanLinks. I agree that this impedes usability, but it needs some consideration to strike the right balance between usability and privacy. Basically I’m wary of building a log that mirrors the browser history, even when it is only used for good reasons afterwards.


Rtizer-9 commented May 6, 2022

I raised this point much earlier in one of the issues: if CL is going to work on a whitelist basis, intercepting every single link, instead of following a blacklist like ClearUrls or adblocker filterlists, then more often than not breakage is going to happen, because of the variety of redirections sites have employed.

CL definitely needs some changes to improve usability.
