Can redirecting/cleaning ad-blocked requests be avoided? #106

Open
Rtizer-9 opened this issue Apr 1, 2020 · 16 comments
Labels
enhancement New feature or request

Comments

@Rtizer-9

Rtizer-9 commented Apr 1, 2020

Hey @Cimbali I was thinking about it for a long time; is it possible to make cleanlinks avoid wasting time in cleaning ads and tracking domains which are already blocked and better dealt with by the already installed dedicated adblockers?

You can consider these as my doubts instead of feature requests if they don't apply (maybe I'm misunderstanding how cleanlinks works).

  1. How does cleanlinks actually deal with requests which are also being blocked by ublock origin, ghostery etc.?

  2. If the blocked requests we see in the log indicate that they are already blocked, wouldn't it be better to leave them out of the log and avoid wasting time intercepting and redirecting them? The log always contains blocked ad requests that ALSO show a redirected result, so what's actually happening here?

  3. Do you think introducing a mechanism in cleanlinks would be good through which it can avoid intercepting such completely unnecessary ads/tracking/cookie... requests? Maybe it can be done by first checking the requests against a blacklist which only has hosts and NOT cosmetic filters. This way we can:

  • keep cleanlinks and adblockers (request blockers) working completely separate.

  • keep the cleanlinks log cleaner, coz it's no longer intercepting these requests and most probably not wasting time redirecting already-blocked or to-be-blocked requests. On github itself, cleanlinks always has numerous collector.githubapp entries even though they are always blocked by ublock and need not be redirected. Since cleanlinks intercepts and redirects every request rather than just cleaning parameters, it is also prone to more log entries. (I saw that parent-child url issue and think it would also benefit a lot if the log had as few entries as possible.)

  • adblockers also tend to use several scriptlets to work around anti-adblocking. I've not found a single case where cleanlinks intercepts such requests and interferes with the adblocker's working (or I may have solved it another way without noticing), so I'm not saying that it's actually happening, but don't you think it would be better just not to intercept those requests? Earlier, I had also asked the developer of clearurls to not block ad domains, and he then introduced an option to avoid blocking the doubleclick domain.

  • I went to several pages, e.g. https://www.theguardian.com/world/live/2020/apr/01/coronavirus-live-news-us-deaths-could-reach-240000-un-secretary-general-crisis-worst-since-second-world-war-us-uk-europe-latest-updates, and tested cleanlinks both with and without adblockers, and it was indeed intercepting as well as blocking various ad requests, so what's actually going on here? How does cleanlinks "know" it's an ad request that needs to be blocked?
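
To make the host-only blacklist idea from point 3 concrete, here is a minimal sketch of such a pre-check. Everything in it is an assumption for illustration: the names `blockedHosts` and `isHostBlocked` are hypothetical, the two entries are examples based on domains mentioned in this thread rather than a real filter list, and nothing here reflects how CleanLinks or any ad-blocker is actually implemented.

```typescript
// Hypothetical host-only blocklist (no cosmetic filters); entries are illustrative.
const blockedHosts: ReadonlySet<string> = new Set([
  "collector.githubapp.com",
  "doubleclick.net",
]);

// Returns true when the URL's host is a listed host or a subdomain of one.
function isHostBlocked(url: string, hosts: ReadonlySet<string>): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return false; // not a parseable URL, so leave it to normal processing
  }
  const labels = hostname.split(".");
  // Check "a.b.c", then "b.c", then "c" against the list.
  for (let i = 0; i < labels.length; i++) {
    if (hosts.has(labels.slice(i).join("."))) {
      return true;
    }
  }
  return false;
}
```

A request handler could call `isHostBlocked` first and return without cleaning or logging when it is true, which would keep such requests out of the log entirely.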

@Cimbali
Owner

Cimbali commented Apr 1, 2020

The only requests CleanLinks blocks fulfill 2 conditions:

  • they are to domains other than your current page, and
  • they contain your current URL

CleanLinks is definitely not meant to be an ad-blocker, but it basically detects “for free” which requests leak your current address. Since it doesn’t make sense to re-load your current page as a script or an iframe inside the same page, it just cancels those requests.
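
As a rough sketch of those two conditions (not the actual CleanLinks code): the function name `leaksCurrentPage` is hypothetical, "other domain" is simplified to a plain hostname comparison, and the page URL is looked for both verbatim and percent-encoded. The real detection may well differ on all three points.

```typescript
// Returns true when a request goes to a different host than the current page
// AND carries the current page's URL somewhere inside its own URL.
function leaksCurrentPage(requestUrl: string, pageUrl: string): boolean {
  const request = new URL(requestUrl);
  const page = new URL(pageUrl);

  // Condition 1: the request targets another host (simplified domain check).
  if (request.hostname === page.hostname) {
    return false;
  }

  // Condition 2: the current address appears in the request, raw or encoded.
  const plain = page.href;
  const encoded = encodeURIComponent(plain);
  return requestUrl.includes(plain) || requestUrl.includes(encoded);
}
```

For example, a tracking pixel on another host with `?ref=` set to the encoded address of the current article would match, while a same-host embed carrying the same parameter would not.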

I think the main issue here is the way in which Firefox applies the add-ons that either cancel or redirect each request. From the onBeforeRequest documentation:

When multiple blocking handlers modify a request, only one set of modifications take effect. Redirects and cancellations have the same precedence. So if you canceled a request, you might see another request with the same requestId again if another blocking handler redirected the request.

What I understand from that is that all requests are passed to all the add-ons. If there was a way to put CleanLinks only after an ad-blocker to reduce the number of requests it gets, I would be more than happy to do that.

There is also potential for clashes as you fear. If we redirect a request while an adblocker blocks it, it might get redirected instead. Not sure how much damage that can actually do, to be honest, probably not a lot?

The only thing I can think of off the top of my head, though I have no idea if it’s feasible, would be to query adblockers when we clean, and if they block that request, then do nothing on the CleanLinks side.

I must say that with uMatrix, a lot of the scripts that would go on to create a lot of noise for CleanLinks are blocked right away, so that’s a strict but rather workable setup.

@Cimbali
Owner

Cimbali commented Apr 1, 2020

Also @Rtizer-9 I’d be glad to have your input on the mock-ups in #104 for (an attempt at) a more readable interface.

Cimbali changed the title from "Can redirecting/cleaning ads and tracking domains be avoided?" to "Can redirecting/cleaning ad-blocked requests be avoided?" Apr 1, 2020
Cimbali added the enhancement (New feature or request) label Apr 1, 2020
@Rtizer-9
Author

Rtizer-9 commented Apr 2, 2020

query adblockers when we clean

I was exactly thinking about this but was not sure if it's even possible or not.

use uMatrix

Even though it's one of the best things that has happened in the privacy community, you know that it's not something everybody uses, and given the huge number of websites I visit, it's really hard to configure uMatrix for each and every case. Many websites are badly developed and you need to turn various extensions off for them to work, and obviously not every cleanlinks user will be using it.

Adblocker querying

I think that with this approach there would still be various cases where domains are not intercepted by adblockers but are still completely unnecessary to redirect (think about ad domains in a filter list which are whitelisted for specific websites coz blocking them breaks the website). Those unnecessary domains will still be intercepted and will clutter the cleanlinks log.

A great example of what I've in mind is already implemented in https://addons.mozilla.org/en-US/firefox/addon/skip-redirect/ and https://addons.mozilla.org/en-US/firefox/addon/remove-redirect

Skip redirect is one of the most prominent redirectors, and as you can see they have a separate blacklist and whitelist which consist of regexes and can also whitelist on a per-url basis (I think I read about this issue here as well).

If you look at their blacklist which it avoids intercepting
/abp /account /adfs /auth /cookie /download /login /logoff /logon /logout /oauth /preferences /profile /register /saml /signin /signoff /signon /signout /signup /sso /subscribe /verification

you can clearly see that if you implement something along these lines in cleanlinks as well, the number of issues created here will come down to 60 or 70% maybe. The majority of users whining about turning cleanlinks off again and again wouldn't have complained in the first place if such an implementation, which avoids breaking many websites, had already been there. If I'm right, it can also reduce the number of rules that need to be written for specific domains for logins/signup etc. to work.

With that being said, I would also like to inform you that, as far as I can recall from my experience of using that extension, I've also needed to turn it off in a few cases, but that can be solved easily by whitelisting or turning it off (the off option is the topmost one).

Now just to make myself more clear, what I'm expecting from cleanlinks is:

  • just simply stop intercepting ANY kind of unwanted request which need not be intercepted, whether ad or not. It's just not a good thing for a redirector to intercept such unwanted links coz it leads to various breakages, as can be seen from the number of issues created here.

  • as a result of the above, it can also help keep the cleanlinks log clean and tidy, and stop wasting resources on redirecting them.

@Cimbali
Owner

Cimbali commented Apr 2, 2020

In theory it’s possible to query adblockers. In practice I didn’t see any onMessageExternal or onConnectExternal calls in uBlock and uMatrix, so I don’t think they provide any API to answer requests right now.

I think we already look at oauth, saml and sso URLs, but we can expand that list. Most of the others you propose seem to make sense to me, though I’m not sure what /abp and /adfs are used for.

I think the CleanLinks way would be to redirect anything that’s not whitelisted though. Temporary breakage is not that bad when you can edit your own whitelist.

So to sum it up (correct me if I’m wrong):

  • short term, improve general whitelist with any-domain patterns such as /signin etc.
  • long term, ask ad-blockers whether we need to bother with some requests or whether they’ll be blocked anyway.
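
For the short-term item, a minimal sketch of what an any-domain pattern check could look like. This is only an illustration: `skipSegments` and `shouldSkipCleaning` are hypothetical names, the list is a subset of the keywords quoted above, and matching whole path segments case-insensitively is an assumption, not how CleanLinks or Skip Redirect necessarily match.

```typescript
// Illustrative subset of the any-domain keywords discussed in this thread.
const skipSegments: ReadonlySet<string> = new Set([
  "login", "logout", "signin", "signout", "signup",
  "oauth", "saml", "sso", "auth", "register",
]);

// Returns true when any path segment of the URL is one of the keywords.
// Only the path is inspected, so a keyword in the query string does not count.
function shouldSkipCleaning(url: string, segments: ReadonlySet<string>): boolean {
  let pathname: string;
  try {
    pathname = new URL(url).pathname;
  } catch {
    return false; // unparseable URL: do not whitelist
  }
  return pathname
    .toLowerCase()
    .split("/")
    .some((segment) => segments.has(segment));
}
```

Matching whole segments keeps `/auth` from also matching `/oauth` or `/author`, at the cost of missing paths such as `/signin.php`; a real list would probably need regexes to cover those.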

@Rtizer-9
Author

Rtizer-9 commented Apr 2, 2020

I'm more in favor of improving the general whitelist with more and more patterns, which will benefit everyone in the long run by decreasing breakages.

Querying ad-blockers, if it can be done easily and won't lead to problems in the long run, then yeah, why not?

I think improving the whitelist to the point where requests just don't need to be checked against ad-blockers is better than going down the road of introducing more and more complex functionality into the extension and making it a complete Christmas tree. Other similar extensions don't deal with requests in such numbers, and that's one of the reasons for their lower breakage. Those whitelists in skip redirect have proven to work, so that's a great approach really. I mean, not redirecting login and signup requests is the most logical thing to do.

Cleanlinks is on the way to becoming the most advanced redirector and one of the must-have extensions, and therefore I think it's more necessary that we focus on the features which benefit most people (less breakage) rather than listening to idiots who think the developer "owes" it to them (that "users just shouldn't need to disable add-on...its your duty...stop developing this add-on" thread was one of the most stupid things I've ever read on GitHub). Adding or removing functionality on a per-issue basis is not feasible, as it hinders the "good" development and only caters to useless requests which are specific to a single user and can easily be taken care of by them for their "special" use cases.

I don't know if querying ad-blockers is really the way to go, coz what if the user has other similar extensions installed which "intercept" the communication between adblockers and cleanlinks? Ad-blockers block requests in very tricky ways, and thus it's simply better not to touch those requests. On the other hand, if there's a way cleanlinks can coordinate with these kinds of extensions which results in less breakage, less interference between addons, and less resource consumption, then yeah, it will be great.

Just do what you think is better for the extension in the long run :)

PS: my google-fu told me that /abp and /adfs are some kind of authentication methods. Just search adfs authentication.

Rtizer-9 closed this as completed Apr 2, 2020
@Rtizer-9
Author

Rtizer-9 commented Apr 2, 2020

I'm unable to come up with any constructive feedback for #104 at the moment, and I think the new hierarchical implementation you suggested looks great. Uh, maybe there could be a very small icon on the side (just like that cross symbol) which allows copying or clicking the requests...idk.

@Cimbali
Owner

Cimbali commented Apr 2, 2020

Let’s leave this open, those are good suggestions. I’ll fix the short-term one ASAP and leave the other one as more of a wishlist thing.

On communicating with other add-ons:

  • either we use a very simple message, e.g. send a message that just contains the requestId and get back a boolean (whether the ad-blocker recommends blocking or not),
  • or we make it a communication channel (onConnectExternal).

In both cases we need to specify the id of the add-on we’re talking to, so I suppose messages can’t be intercepted to fool CleanLinks into not cleaning some links. That’s worth checking.
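
A minimal sketch of the simple-message variant, under loud assumptions: no ad-blocker is known to answer such a query, so the message shape (`BlockQuery`), the function names, and the timeout are all hypothetical. The sender is passed in as a parameter so the logic can be tried outside a browser; in an extension it would presumably be `browser.runtime.sendMessage`, which accepts the target extension id as its first argument.

```typescript
// Hypothetical query: just the requestId, as suggested above.
interface BlockQuery {
  query: "would-block";
  requestId: string;
}

// Anything that can deliver a message to a given extension id.
type SendMessage = (extensionId: string, message: BlockQuery) => Promise<unknown>;

// Only a literal `true` counts as "the ad-blocker will block this".
function interpretReply(answer: unknown): boolean {
  return answer === true;
}

// Asks the ad-blocker, falling back to "not blocked" on error or timeout,
// so that a missing or silent ad-blocker never stops links from being cleaned.
async function adBlockerWouldBlock(
  send: SendMessage,
  adBlockerId: string,
  requestId: string,
  timeoutMs = 50,
): Promise<boolean> {
  const timeout = new Promise<boolean>((resolve) => {
    setTimeout(() => resolve(false), timeoutMs);
  });
  const reply = send(adBlockerId, { query: "would-block", requestId })
    .then(interpretReply)
    .catch(() => false);
  return Promise.race([reply, timeout]);
}
```

Defaulting to `false` is a deliberate choice here: if the other add-on is absent, slow, or replies with something unexpected, CleanLinks would simply behave as it does today.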

@Cimbali
Owner

Cimbali commented Apr 5, 2020

  1. On the short-term whitelist additions, I think /authorize could be added as well. I think all parameters should be whitelisted wholesale.
  2. On the “other requests”, I didn’t mention that users can obviously disable “Clean all outgoing HTTP Requests, instead of only top frame” to only have clicked links and redirects cleaned.

@Rtizer-9
Author

Rtizer-9 commented Apr 5, 2020

I think that instead of giving users just two options, either all requests or just the top frame, the option could be more configurable, for example like that of ClearUrls.

The benefit here is that the user will be able to tailor request handling to their use case. A drawback of top-frame-only is that various redirections happen with other requests, like fonts etc., which really need a redirector extension. That was the main reason the defaults in ClearUrls were changed from only the top frame to other request types as well.
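
To illustrate what a more configurable option might look like, here is a small sketch. The type names in `ResourceType` are a subset of the request types the webRequest API reports (`main_frame` being the top frame); `CleaningOptions`, `shouldClean`, and the two presets are hypothetical and not taken from CleanLinks or ClearUrls.

```typescript
// Subset of webRequest resource types; "main_frame" is the top frame.
type ResourceType =
  | "main_frame" | "sub_frame" | "script" | "image"
  | "font" | "xmlhttprequest" | "other";

// Hypothetical user setting: which request types get cleaned.
interface CleaningOptions {
  cleanedTypes: ReadonlySet<ResourceType>;
}

// Equivalent to today's "only top frame" choice.
const topFrameOnly: CleaningOptions = {
  cleanedTypes: new Set<ResourceType>(["main_frame"]),
};

// A middle ground: frames plus fonts, but not scripts or images.
const framesAndFonts: CleaningOptions = {
  cleanedTypes: new Set<ResourceType>(["main_frame", "sub_frame", "font"]),
};

function shouldClean(type: ResourceType, options: CleaningOptions): boolean {
  return options.cleanedTypes.has(type);
}
```

With something like this, the current checkbox would become two presets among many, and a user bothered by font redirects could enable just that type.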

Although for the time being, I accept that option is really handy.

Let's see what happens after implementing the whitelist additions. I think after that we'll be able to look at things from a clearer perspective, coz breakage will most probably be much lower and various other things will need a change (IMO less effort, coz we won't be needing them in the first place because of fewer errors).

Cimbali added a commit that referenced this issue Apr 5, 2020
Should generalise these patterns a lot and reduce many problems, as
suggested in #106.

Allow fine-tune google/youtube opt-outs, google docs signin, and
guardian tracking parameter.

2 participants