Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent uBO from hiding html or body when matched by a generic cosmetic filter #1692

Closed
8 tasks done
peace2000 opened this issue Aug 22, 2021 · 42 comments
Closed
8 tasks done
Labels
enhancement New feature or request fixed issue has been addressed

Comments

@peace2000
Copy link
Member

peace2000 commented Aug 22, 2021

Prerequisites

I tried to reproduce the issue when...

  • uBO is the only extension
  • uBO with default lists/settings
  • using a new, unmodified browser profile

Description

There have been numerous occasions where Annoyance lists have blanked out websites because either html or body have contained values that have been matched by a generic cosmetic filter. This is a pretty common issue for Annoyance lists, though there have been a few occasions in normal adblocking as well.

Usually these issues have been fixed by adding :not(html) or :not(body) exception to the problematic generic filter.

I was wondering if it would be reasonable to add a safeguard measure to uBO: to prevent it from applying cosmetic filters, that have a match in html or body in websites where an user is visiting. I think that neither html or body should ever be blocked as that will result in a blank website.

One recent issue: easylist/easylist#8431 - https://webshop.elektroskandia.no/ was blanked out because body in that website had a value of .consent-summary-shown. It was matched by this generic GDPR banner filter: ##.consent-summary-shown. (It was later fixed by adding an exception: ##.consent-summary-shown:not(body)).

But that wasn't the only case. In Fanboy's Annoyance list, there are currently:

  • 260 :not(html) exceptions
  • 298 :not(body) exceptions

Adguard Annoyance:

  • 127 :not(html) exceptions
  • 160 :not(body) exceptions

Easylist:

  • 4 :not(html) exceptions
  • 7 :not(body) exceptions

I know these website blanking issues are mainly related to Annoyance lists that are not turned on by default in uBO, but they are still available and people use them. Not all issues get reported to filter list maintainers and there could be many unreported issues relating to these lists. Each :not(html) or :not(body) exception that currently exists, are related to fixing blank websites.

A specific URL where the issue occurs

https://webshop.elektroskandia.no/ (fixed now but this one is a recent case so I'll use it as a sample)

Steps to Reproduce

  1. Disable any possible Annoyance lists (to get rid of later added whitelistings)
  2. Add filter ##.consent-summary-shown to custom filters
  3. Go to https://webshop.elektroskandia.no/

Expected behavior

uBO would ignore this generic filter for this website, because it matches to the body element.

Actual behavior

Website is blanked, due to a match to the body element.

kuva

uBlock Origin version

1.37.3b13

Browser name and version

Firefox 91.0.1

Operating System and version

Windows 10

@gwarser gwarser added the enhancement New feature or request label Aug 22, 2021
@peace2000
Copy link
Member Author

peace2000 commented Aug 23, 2021

@felix-22 thanks for links!

Yeah adding :not(body):not(html) after each generic entry systemically, would pre-emptively help and would completely solve this issue, I even suggested that same thing recently: easylist/easylist#8367 though it would bloat filterlist a bit. I don't know about performance impact. But encountering this same issue over and over again (hundreds of :not(html) and :not(body) exceptions) feels unnecessary as it would be possible to do pre-emptive measures.

@kiboke
Copy link

kiboke commented Aug 23, 2021

The best thing to do is to adjust blockers not to block body and html elements. The question is, could all major blockers work together to achieve that?

@peace2000
Copy link
Member Author

peace2000 commented Aug 23, 2021

The best thing to do is to adjust blockers not to block body and html elements. The question is, could all major blockers work together to achieve that?

I hope so:

https://github.com/AdguardTeam/AdguardBrowserExtension/issues/1845
https://gitlab.com/eyeo/adblockplus/abc/adblockpluscore/-/issues/361

@gorhill
Copy link
Member

gorhill commented Aug 23, 2021

The best thing to do is to adjust blockers not to block body and html elements.

Majority of CSS selector-based cosmetic filters are enforced through user styles, this means internally each of those filters would need to be suffixed with :not(html):not(body), while ensuring they are still properly reported in the logger. It's not trivial.

@mjethani
Copy link

What if this is taken care of at the DOM surveyor level?

-            pendingNodes.add(document.querySelectorAll('[id],[class]'));
+            pendingNodes.add(document.querySelectorAll(':not(html):not(body)[id],:not(html):not(body)[class]'));

I'm guessing that this would cause issues for selectors containing descendant and child combinators.

@peace2000
Copy link
Member Author

peace2000 commented Aug 24, 2021

If this were to be implemented, it still should be possible to be able e.g. to target html or body via different styling rules or via scriplets, the only thing that should be prevented is using display: none !important to them.

@gorhill
Copy link
Member

gorhill commented Aug 24, 2021

it still should be possible to be able e.g. to target html or body via different styling rules or via scriplets

The way I see it the automatic exclusion of html/body should apply only to generic cosmetic filters which are made of a single class or id identifier, so this means you would still be able to target html or body by using these explicitly.

@gorhill
Copy link
Member

gorhill commented Aug 24, 2021

I'm guessing that this would cause issues for selectors containing descendant and child combinators.

This is an interesting idea. The issue of combinators maybe could be solved by having a separate reporting for ids/classes taken directly from html/body elements.

@mjethani
Copy link

I'm guessing that this would cause issues for selectors containing descendant and child combinators.

To clarify, let's say there's a document like this:

<body>
  <article></article>
</body>

And a filter like ##.page-body > article.

The document uses JS to add the page-body class to the body element.

If the DOM surveyor ignores the body element, the filter will never work?

@gorhill
Copy link
Member

gorhill commented Aug 24, 2021

If the DOM surveyor ignores the body element, the filter will never work?

When I say "separate reporting", I mean the ids/classes of html/body elements would be reported in special properties (not through these), and treated differently in the background process -- for these uBO would only look-up the complex set, and thus would be able to find and apply something like .page-body > article.

gorhill added a commit to gorhill/uBlock that referenced this issue Aug 24, 2021
Related issue:
- uBlockOrigin/uBlock-issues#1692

The ids/classes from html/body elements will leave out
looking up lowly generic cosmetic filters made of a single
identifier.

This does not absolutely guarantee that html/body elements
will never be targeted, but it should greatly mitigate the
probability that this erroneously happens.
@gorhill gorhill changed the title Prevent Ubo from hiding html or body when matched by a generic cosmetic filter Prevent uBO from hiding html or body when matched by a generic cosmetic filter Aug 24, 2021
@peace2000
Copy link
Member Author

peace2000 commented Aug 24, 2021

which are made of a single class or id identifier

There are some generics that have multiple values (samples from Fanboy's Annoyances):

##.js-stickyFooter.u-bottom0.u-fixed
##.u-zIndexMetabar.u-fixed
##.article-section > .ui-button-close
###coiOverlay[role="banner"][style*="flex"]

Though I agree that they are not very common.

@peace2000
Copy link
Member Author

peace2000 commented Aug 24, 2021

Installed 1.37.3b15 and tested with the site (webshop.elektroskandia.no) and with the sample rule I gave (##.consent-summary-shown), works fine, thanks!

@gwarser gwarser added the fixed issue has been addressed label Aug 24, 2021
@gwarser gwarser closed this as completed Aug 24, 2021
@gorhill
Copy link
Member

gorhill commented Aug 24, 2021

works fine, thanks

Thanks to @mjethani, this was a good idea to look at the content script angle -- until then I was looking at the background process angle which required a whole lot of changes -- it turns out the changes ended up being relatively trivial.

@uBlock-user
Copy link
Contributor

Fanboy's Annoyances

If I enable that list with the generic filter you posted, the issue can be repro'd again..

@peace2000
Copy link
Member Author

peace2000 commented Aug 24, 2021

Yep, strange... I can reproduce too.

@peace2000
Copy link
Member Author

If both of these:

##.consent-summary-shown
##.consent-summary-shown:not(body)

Are present, the site goes blank.

@gorhill
Copy link
Member

gorhill commented Aug 24, 2021

Investigating.


Ok, I figured why this happens, investigating best fix.

@uBlock-user uBlock-user removed the fixed issue has been addressed label Aug 24, 2021
@uBlock-user uBlock-user reopened this Aug 24, 2021
gorhill added a commit to gorhill/uBlock that referenced this issue Aug 24, 2021
@peace2000
Copy link
Member Author

Tested v.16, works without an issue!

@uBlock-user uBlock-user added the fixed issue has been addressed label Aug 24, 2021
@peace2000
Copy link
Member Author

peace2000 commented Aug 24, 2021

Wait...

This form still blanks the site:

##[class*="consent-summary-shown"]

Imo it would be safer to ignore also rules that match to the html or body and have either of these selectors: *, ^, $, ~, |.

Sample rules from Fanboy's Annoyances:

##div[class*="NewsletterSubscribe_"]
##div[class*="newsletter-signup_"]
##div[class^="BackToTop_"]
##div[class^="backToTop_"]

@gwarser
Copy link

gwarser commented Aug 24, 2021

Only simple #id and .class are ignored.

@mjethani
Copy link

Would there be any perf penalty for not applying to body/html?

Regarding the current solution, there are at least three ways to do it:

  1. Use :not(html):not(body) suffix (current)
  2. Use body prefix
  3. Filter out html and body nodes in JS

@kiboke
Copy link

kiboke commented Aug 26, 2021

Use body prefix

@mjethani Some websites inject stuff between head and body, so #2 would not be a perfect solution.

@mjethani
Copy link

… ABP unconditionally injects all of them.

Aside from higher CPU usage, this also causes a very high amount of memory usage. The last time I checked, using ABP causes your system to use more memory than if you didn't use an ad blocker. :not(html):not(body) might not be practical over there.

@gorhill
Copy link
Member

gorhill commented Aug 26, 2021

Use body prefix

Filter list author would disagree with this, there are cases of sites injecting elements as child of html.

Filter out html and body nodes in JS

Not sure what you mean, the rules are injected as user styles, JS is of no use in such case. I did consider having user styles to unhide body/html, but this does not really work as the blocker can't know which display style to assign the body element, a site could well use flex as value.

As far as uBO is concerned, :not(html):not(body) is the only valid solution for highly generic cosmetic filters, and would be quite trivial if there was no logger or DOM inspector.

I know roughly the path to implement this, but for the second part of dealing with highly generic cosmetic filters, this will have to wait for the next dev cycle. At least for now the most likely to cause false positives are being mitigated.

@gorhill
Copy link
Member

gorhill commented Aug 26, 2021

The last time I checked, using ABP causes your system to use more memory than if you didn't use an ad blocker.

Right, I tweeted about this a few months ago: https://twitter.com/gorhill/status/1345042527171325953.

I do not worry using :not(html):not(body) in uBO since it does not unconditionally inject all generic cosmetic filters. In any case, only benchmarking/measuring once the changes are made will tell whether this is an issue.

@mjethani
Copy link

Filter out html and body nodes in JS

Not sure what you mean, the rules are injected as user styles, JS is of no use in such case.

I was referring to the current solution with querySelectorAll(). It's possible to get all the nodes and then filter the result in JS. But it's unlikely that this would perform better than :not(html):not(body).

In any case, only benchmarking/measuring once the changes are made will tell whether this is an issue.

+1

@gorhill
Copy link
Member

gorhill commented Aug 26, 2021

Using querySelectorAll() with highly generic cosmetic filters has caused performance issue in the past, see #756, this is why the hidden element badge count in the popup panel executes only on-demand.

@mjethani
Copy link

If the DOM surveyor ignores the body element, the filter will never work?

This is what I meant:

<html>
  <body>
    <div class="bar">
      Hello.
    </div>
    <script>
      setTimeout(() => {
        document.body.className = 'foo';
      },
      3000);
    </script>
  </body>
</html>

Now I've added the filter ##.foo .bar and it never gets applied. This is because body is being ignored entirely, even though the target of the filter is the div element.

The filter ##.foo .bar should work, right?

@mjethani
Copy link

I did consider having user styles to unhide body/html, but this does not really work as the blocker can't know which display style to assign the body element, a site could well use flex as value.

I thought about this too. Aside from not knowing what value to use for the display property, it's also that a site might want to hide the body element for whatever reason, and this would force it to be shown at all times. Because a user rule with !important would always override any author rule.

@mjethani
Copy link

@mjethani Some websites inject stuff between head and body, so #2 would not be a perfect solution.

@kiboke if it's really outside body it wouldn't be visible, as far as I can tell.

<html>
  <body>
    Body.
  </body>
  <script>
    setTimeout(() => {
      let e = document.createElement('p');
      e.innerText = 'Hi!';
      document.documentElement.appendChild(e);
    },
    1000);
  </script>
</html>

@kiboke
Copy link

kiboke commented Aug 26, 2021

@kiboke if it's really outside body it wouldn't be visible, as far as I can tell.

@mjethani I just tried the code in Firefox and I do see Hi!

@gorhill
Copy link
Member

gorhill commented Aug 26, 2021

The filter ##.foo .bar should work, right?

Yes, but this is an edge case for uBO because its mutation observer does not listen to attribute changes. This is why :watch-attr() exists, to solve such case:

...##body.foo:watch-attr(class) .bar

@mjethani
Copy link

I looked at revert too.

Suppose this is the user style sheet:

.consent-summary-shown {
  display: none !important;
}

body.consent-summary-shown {
  display: revert !important;
}

I suspect that revert !important still overrides any value in the author style sheet.

It's also not clear how this would work with multiple user style sheets.

@krystian3w
Copy link

krystian3w commented Aug 27, 2021

would always override any author rule.

Then webmasters maybe start promote legacy userContent.css - win with addon API.


still overrides any value in the author style sheet.

IMO this is better implementation that oldie auto / initial. If I try reset cursor: revert I see useage CSS embeded into "browser" after hover on many elements instead photo or webaster idea for cursor.

So can break flexbox/grid pages too.

Default CSS Settings

Most browsers will display the <body> element with the following default values:

body {
  display: block;
  margin: 8px;
}

body:focus {
  outline: none;
}

This is why :watch-attr() exists, to solve such case

this explains part of the situations when on my PC the filter didn't hide the <body> and on gwaser PC it always did and then when on his it never hid and on mine it almost always in 2018-2020.

@krystian3w
Copy link

krystian3w commented Sep 6, 2021

Maybe * aka all URL-s also need similar disable breakage.

Now possible use * to avoid create specific filter with :watch-attr() if someone must check ID/Classes/attribs injected very late into dom tree/main nodes.


E.g.:

https://czyodebrac.pl/co-to-za-numer-dzwonil/2147483647/

on page no longer works these generic filter:

##.cli-modal-open #cookie-law-info-bar

#cookie-law-info-bar is not body/html tag but "<div>".

or have very Race Conditon in uBO 1.37.3rc0, but 1.37.2 injected filter almost immediately.

@gorhill
Copy link
Member

gorhill commented Jan 9, 2022

High profile case involving highly generic cosmetic filter: uBlockOrigin/uAssets#11244.

gorhill added a commit to gorhill/uBlock that referenced this issue Jan 12, 2022
@gorhill
Copy link
Member

gorhill commented Jan 12, 2022

Highly generic cosmetic filters should not longer affect html/body elements. To confirm, I used the original steps-to-reproduce, except with the following filter:

##[class*="consent-summary-shown"]

@uBlock-user uBlock-user added the fixed issue has been addressed label Jan 12, 2022
@mtxadmin
Copy link

I would propose to add article tag to html and body. Should I open a new issue?

@gorhill
Copy link
Member

gorhill commented Jan 12, 2022

Should I open a new issue?

No, doing what you suggest would just allow advertisers to use article to bypass cosmetic filters -- there is no restriction on the number of article tags in a page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request fixed issue has been addressed
Projects
None yet
Development

No branches or pull requests

10 participants