
Additional cleaning features #77

Open
Cimbali opened this issue Mar 5, 2020 · 11 comments
Labels
question Further information is requested

Comments


Cimbali commented Mar 5, 2020

Right now CleanLinks does the following:

  • clean URLs by rewriting paths or removing query parameters
  • detect embedded URIs and redirect or drop requests that contain them

I’ve just had to add another feature because Google was loading its URL redirect page in an iframe: we now “promote” the iframe request from www.google.com/url?url=… to the main frame in the tab. For now, this behaviour is hardcoded (see dbd58dc).
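
In WebExtensions terms, that hardcoded behaviour boils down to roughly the following (a minimal sketch only, not the actual code from dbd58dc):

```js
// Minimal sketch (not the code from dbd58dc): when an iframe loads the
// Google redirect page, extract the destination and load it in the tab's
// main frame instead, cancelling the iframe request.
// Requires the webRequest and webRequestBlocking permissions.
browser.webRequest.onBeforeRequest.addListener(details => {
    const target = new URL(details.url).searchParams.get('url');
    if (!target)
        return {};

    // "Promote" the request: navigate the whole tab to the destination.
    browser.tabs.update(details.tabId, { url: target });
    return { cancel: true };
}, { urls: ['*://www.google.com/url*'], types: ['sub_frame'] }, ['blocking']);
```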

This could possibly be

  • detected,
  • generalized to more domains/iframe pages, and
  • maintained in the rules.

Other features that have not been needed until now are not implemented, such as:

  • matching query parameters on their value instead of (or in addition to) their key,
  • rewriting query parameters (these first two points are sketched just after this list),
  • any cleaning in the hash part of the URL (though we already detect embedded URLs in the hash if they start with #!/)
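
For the first two points, a rough idea of what value-based matching and rewriting could look like (the rule shape below is hypothetical, not CleanLinks’ actual rule format):

```js
// Hypothetical rule shape, only to illustrate the idea: rules match on
// the parameter value (and/or key) and either drop the parameter or
// rewrite its value.
const rules = [
    // drop any parameter whose value looks like an opaque hex token
    { match: (key, value) => /^[0-9a-f]{12,}$/i.test(value), action: 'remove' },
    // rewrite a parameter value, e.g. force the interface language
    { match: (key, value) => key === 'hl', action: () => 'en' },
];

function cleanQuery(href) {
    const url = new URL(href);
    // iterate over a snapshot so we can modify searchParams while looping
    for (const [key, value] of [...url.searchParams])
        for (const rule of rules) {
            if (!rule.match(key, value))
                continue;
            if (rule.action === 'remove')
                url.searchParams.delete(key);
            else
                url.searchParams.set(key, rule.action(value));
            break;  // first matching rule wins
        }
    return url.href;
}
```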

So please report in this issue any use cases for these or other link cleaning features and we’ll see about implementing them.

Cimbali pinned this issue Mar 5, 2020
Cimbali added the “question” label Mar 30, 2020

Cimbali commented Apr 1, 2020

While we’re cleaning javascript links: some websites use onmousedown instead of onclick, Google (of course) among them, which causes the iframe load I mentioned above, and also #101.

I believe we should remove onmousedown events from links, as they divert what the links should really do. Maybe some per-site rules to allow javascript actions would be appropriate.
This is now implemented.
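
For reference, the core of the idea is a content-script pass like this (a sketch only; the shipped implementation may differ and would also need to handle links added after the page loads):

```js
// Content-script sketch: drop the onmousedown handlers that sites like
// Google use to swap the real link target just before the click registers.
for (const link of document.querySelectorAll('a[onmousedown]'))
    link.removeAttribute('onmousedown');
```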

Cimbali added a commit that referenced this issue Apr 1, 2020
Modifies the common extract_javascript_link() to return an URL object.
Fix #101, mentioned in #77.
Also reinstate respecting the target or not, point 3 in #103

Cimbali commented Apr 7, 2020

Suggested by @tarihci in #104: generic user-specified redirects (à la Redirector), e.g. reddit -> old.reddit, imgur.com -> imgurp.com, google.com -> google.{my favourite TLD}
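
That could start as a plain user-maintained hostname map applied before the other cleaning steps (illustrative only, not an agreed-upon rule format):

```js
// Illustrative only, not an agreed-upon rule format: a user-maintained
// hostname-to-hostname map, applied before the other cleaning steps.
const hostRedirects = {
    'www.reddit.com': 'old.reddit.com',
    'imgur.com': 'imgurp.com',
};

function applyHostRedirects(href) {
    const url = new URL(href);
    if (url.hostname in hostRedirects)
        url.hostname = hostRedirects[url.hostname];
    return url.href;
}
```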


Cimbali commented Apr 8, 2020

Potential tracking hashes, suggested by @Rtizer-9 in #108 (comment):

A doubt @Cimbali:

Is the #4cff708c1aff in
https://www.forbes.com/sites/lisettevoytko/2020/03/13/twitters-most-liked-tweet-of-the-week-elon-musk-dismisses-coronavirus-fears/#4cff708c1aff

a tracking parameter? Because even if you remove that from the address bar and revisit the page, it presents you with another hash.

The way this is suspected to work is:

  • visit the page, get a unique hash added to the URL
  • share the page or copy the URL, now the hash will inform the site where the link came from
  • proposed solution: add a “remove hash” rule and apply it to forbes.com (sketched below)

This will work on loading the page, and when performing a “copy clean link”.
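
The cleaning step itself is tiny; the per-domain rule wiring is omitted in this sketch:

```js
// Sketch of the cleaning step only; how the rule is matched to a domain
// (here forbes.com) is left out.
function removeHash(href) {
    const url = new URL(href);
    url.hash = '';
    return url.href;
}

// removeHash('https://www.forbes.com/sites/.../#4cff708c1aff')
//   -> 'https://www.forbes.com/sites/.../'
```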


Cimbali commented Apr 14, 2020

I’ve seen another suspicious hash today, on an outgoing link from Facebook. In addition to the fbclid parameter, there was #.XpWKwogtXjo.facebook appended to the URL.


jawz101 commented Jan 8, 2021

maybe recognize canonical url thingies? https://addons.mozilla.org/en-US/firefox/addon/canonical-link


e-t-l commented Oct 29, 2021

@Cimbali apologies if this is a dumb or duplicate question, but I can't figure out whether this is already implemented: Does CL redirect AMP pages to non-AMP HTML?

(If not, the amp2html repo has some very straightforward code to do the job that you might be able to borrow. I believe it uses RegEx but I'm not super familiar with it)


Cimbali commented Oct 29, 2021

@e-t-l There seem to be 2 types of redirections in that add-on:

  1. cache redirection based on URLs (roughly, anything in redirector.js): we cover that by auto-detecting links embedded in URLs. If the embedded link doesn’t start with www. we may miss it; that’s a good use case for manual entries in the redirection preferences.
  2. redirecting from AMP pages that declare a “canonical” link to that link (what’s in amp2html.js): we don’t do that (a sketch of the idea follows). It’s unclear from the manifest whether this code is applied to all pages or only to bing / google / ampproject.
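
The gist of that second kind, as a content-script sketch (not amp2html’s actual code):

```js
// Content-script sketch of the canonical-link approach (not amp2html's
// actual code). AMP pages mark themselves with an `amp` (or ⚡) attribute
// on <html> and declare their non-AMP version via <link rel="canonical">.
const html = document.documentElement;
const canonical = document.querySelector('link[rel="canonical"]');

if ((html.hasAttribute('amp') || html.hasAttribute('⚡'))
        && canonical && canonical.href !== location.href)
    location.replace(canonical.href);
```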


e-t-l commented Nov 10, 2021

Would it be possible to add a sub-option to Link Tracking: "Disabled sites tracking"?

I'm finding that while CL is very helpful, it often breaks sites seemingly at random, and it's not always clear why. It's gotten to the point where if anything doesn't work correctly on a site, the first troubleshooting step I take is disabling CL for that tab (and unfortunately, that does usually seem to fix it). I'd like to go through it in more detail, figure out what exactly CL didn't like, and fix it, but I usually don't have time to do that. I either leave CL disabled in that tab, or I add a quick whitelist rule for the entire domain, which is a more heavy-handed approach than I'd prefer.

If we had a history option to track what the current site was when the user disabled CL for the tab, then users like me could disable CL for the tab when we just need to get stuff done, and later, when we have more time, we could look back over this history and figure out how to handle the redirection properly.

P.S. Another related feature might be an option to submit rules directly from our local Rules page to the central repo. If a user finds that CL breaks a website and is able to construct an appropriate rule to mitigate the breakage, it would be great for the community if they could submit that to the masterlist of whitelist rules. The submission process would have to have some sort of review step so you and/or the community could confirm that it is indeed a good rule to include. But it seems like having this process integrated into the addon would be much simpler for us and you than having people open a new issue every time they want to contribute a rule as the Wiki currently suggests.


Cimbali commented May 6, 2022

I’ve found another redirection scheme, where the destination’s host and path are passed as separate parameters:

https://consents.prismamedia.com/?redirectHost=https%3A%2F%2Fwww.domain.name&redirectUri=%2fpath%2to%2fpage
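
Handling it would mean recombining the two parameters, roughly as follows (parameter names taken from the example above, with the path tidied up for illustration):

```js
// Rough sketch: rebuild the destination from the two query parameters
// seen above (path cleaned up for the example).
const example = 'https://consents.prismamedia.com/'
              + '?redirectHost=https%3A%2F%2Fwww.domain.name'
              + '&redirectUri=%2Fpath%2Fto%2Fpage';

const params = new URL(example).searchParams;
const target = new URL(params.get('redirectUri'), params.get('redirectHost'));
console.log(target.href);  // https://www.domain.name/path/to/page
```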


Cimbali commented May 6, 2022

Thanks a lot for your feedback @e-t-l. I do find browsing with CL quite heavy myself (especially when it’s not the only potential cause of things going wrong).

Let me try to sum up your 3 suggestions to see if I understand them right:

  1. allow tracking cleaned links even without redirecting, i.e. “what would CL do”, but without actually doing it.
  2. maintain history beyond the current tab for later review
  3. built-in method to contribute rules to the central repo.

All of those would definitely increase CL’s usability, though with small caveats: for 1. it could often remain unclear what exactly makes a site break or not, and for 1. and 2. the risk is using up a lot of resources to keep track of (potentially) cleaned links. Not to say it’s not doable, of course.

All 3 of these suggestions are so far deliberately unimplemented in order to minimise the privacy impact of CleanLinks. I agree that this impedes usability, but it needs some consideration to strike the right balance between usability and privacy. Basically I’m wary of building a log that mirrors the browser history, even when it is only used for good reasons afterwards.


Rtizer-9 commented May 6, 2022

I raised this point much earlier in one of the issues: if CL is going to work on a whitelist basis, intercepting every single link, instead of following a blacklist like ClearUrls or adblocker filterlists, then more often than not breakage is going to happen, because of the variety of redirections sites have employed.

CL definitely needs some changes to improve usability.
