Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MutationObserver with querySelector for elements #77

Closed
zbraniecki opened this issue Sep 18, 2015 · 28 comments
Closed

MutationObserver with querySelector for elements #77

zbraniecki opened this issue Sep 18, 2015 · 28 comments
Labels
needs implementer interest Moving the issue forward requires implementers to express interest

Comments

@zbraniecki
Copy link

MutationObserver is a great tool for writing small libraries that operate on specific types of elements.
Examples:

  • Libraries that extend the behavior of particular element
    ** Date/Time pickers on <input type="datetime">
    ** Additional API on some types of <link> elements
  • Libraries that modify custom elements (from outside extension of web-components)
  • Libraries that modify elements with certain attributes:
    ** Libraries that aid with specific aria-* values
    ** Libraries that handle localization of data-l10n-* elements

Currently, MutationObserver API is designed to aid code that wants to operate on all subtree operations of a given root.
But in all those cases the basic premise is that they want to handle specific set of elements. Their additions/removals and possibly modifications.

The result is that in all those cases that I encountered in my work there's a massive amount of noise generated because the library is fishing for a particular element type, while it has to retrieve all node additions/removals and filter them out manually.

In many cases, the ratio of nodes inserted (dynamically or by the parser) to elements the library wants to operate on is 100/1.

My understanding is that the platform is in much better position to optimize filtering the elements for MutationObserver than anything that the MutationObserver callback can do.
Additionally, even if the code logic would be the same, the sole fact that the mutations argument has to be constructed and then the callback for them has to be called, for many mutations that are not relevant to the MutationObserver is a waste of CPU time and memory/GC allocation.

My initial proposal would look something like this:

const observerConfig = {
  attributes: true,
  attributeFilter: ['data-l10n-id', 'data-l10n-args'],
  childNodes: true,
  querySelector: '*[data-l10n-id], *[data-l10n-args]'
}; 

var mo = new MutationObserver(onMutations);
mo.observe(this.doc, observerConfig);

function onMutations(mutations) {
  const targets = new Set();
  const removedTargets = new Set();

  for (let mutation of mutations) {
    switch (mutation.type) {
      case 'attributes':
        targets.add(mutation.target);
        break;
      case 'childList':
        mutation.addedNodes.forEach(node => targets.add(node));
        mutation.removedNodes.forEach(node => removedTargets.add(node));
        break;
    }
  }

  translateElements(targets);
  cleanupAfterElements(removedTargets);
} 
@smaug----
Copy link
Collaborator

So could you perhaps clarify the proposal a bit. How should adding an attribute work? what about removal? At which point should the selector run in each case?

@smaug----
Copy link
Collaborator

Oh, silly me, you want selector based filtering for child nodes only, not attributes (the example just happens to do something with attributes too). But similar question still: at which point should the selector run, especially in case of child node removal?

@zbraniecki
Copy link
Author

I don't know, and I don't think it matters that much. For all cases I can imagine, the only reason I want to observe removals is to clean up my own objects, so I just need to know that an element with certain data-l10n-id or certain data-l10n-format has been removed.

@smaug---- Does it answer your question?

@zbraniecki
Copy link
Author

And the reason I have both attributes and childNodes is that usually, when the task is to "cover nodes with certain attribute", then I will need to know when they are added/removed and when an already existing node has this attribute added/removed/modified. Which ends up looking like the example, and I think it's fine and allows for fairly good level of customization.

@smaug----
Copy link
Collaborator

It does matter quite a bit when the selector runs. When adding a child node selector can run
after we've added the node to child or it can actually run as late as just before delivering the MutationRecords, That might even be the case you want, since what if the attributes in the element have been modified after the element was added to the parent node but before MutationRecord delivery - the same selector might not select the element anymore. But then, running the selector so late would be rather big difference to how MutationObserver otherwise works... so I guess selector would need to run right after the element has been added to its parent, and the user of the MutationObserver needs to validate that the selector still applies when MutationRecord is delivered.

Removing a node would require, I guess, running selector before any DOM mutation has happened.

How to make this all perform well? selector matching isn't exactly cheap.
The performance of selectors was one criticism the old Shadow DOM spec got and it is now
using just simple slots.
@heycam @rniwa may have ideas about how to make this work fast.

But, do we want to use selectors. I think I'd be more comfortable with "raw DOM" data.
Filter out based on attributes in the element. childNodeAttributeFilter or some such.
That way the same list of filters could be used for two things, for attribute changes and child node changes.

var filter = ['data-l10n-id', 'data-l10n-args'];
var mo = new MutationObserver(yourCallback);
mo.observe(document, {
attributes: true,
attributeFilter: filter,
childNodes: true,
childNodeAttributeFilter: filter
});

That would filter out child nodes which don't have the relevant attribute, even in case innerHTML is used.

@heycam
Copy link
Contributor

heycam commented Oct 12, 2015

It might be simpler to have a separate selector observer mechanism, i.e. something that just watches for changes to the set of elements in a subtree that match the given selectors, and not be a filter on the existing fields on a MutationRecord.

Then we can just require the mutation observer records be generated whenever the element is restyled, and use the restyle process to do the "element now matching" / "element no longer matching" checks. It would mean that a sequence like:

observe e for ".x" change;
e.className.toggle("x");
e.className.toggle("x");

wouldn't generate a record, while:

observe e for ".x" change;
e.className.toggle("x");
e.offsetTop;
e.className.toggle("x");

would generate two (and possibly be coalesced? I think the way MutationObserver works allows for this kind of coalescing).

I don't think selector observers need to be more expensive than the selector matching we do for restyles, especially if we're modelling whether an element matches one of these selectors or not is done by pretending we have a style rule in the document with the relevant selector (and some hack to make it rooted at the root of the observation). I think the performance impact of adding a real style rule to a document being about the same as one for selector observers is reasonable.

The observer probably needs to maintain a set of elements that currently match, so that when an element is removed from the tree we remove it (and its descendants) from the set.

We'd also need to do something different for elements in display:none subtrees, which Gecko doesn't give styles unless script does a getComputedStyle, and we'd probably need to adjust the restyle process to get that to work...

@rniwa
Copy link
Collaborator

rniwa commented Oct 12, 2015

I don't think we want to use CSS selectors for filtering purposes. Evaluating selectors every time a node is added, removed, or modified is going to be unacceptably expensive.

Then we can just require the mutation observer records be generated whenever the element is restyled, and use the restyle process to do the "element now matching" / "element no longer matching" checks.

That would expose the timing of styling recalc. Since that's an area where different browsers optimize things differently, designing an interoperable API that doesn't degrade performance on all browsers is going to very hard.

In addition, all the use cases I've seen for filtering mutation records so far involved filtering nodes based on element names or attribute names (or combinations of the both in some rare cases). Given that I agree with @smaug---- that DOM-based approach is better. Also, it's not necessary for the filtering to be perfect. If we can weed out, say, 90% of all other nodes, then scripts can quite efficiently pick the one they need after that.

@smaug----
Copy link
Collaborator

Ah, indeed we may want to filter also based on nodeNames.

so something like
childNodeAttributeFilter: "class"
childNodeFilter: ["div", "span"]
Though, there is the issue with namespaces, since attribute filters are for null-namespace attributes only and element name based filters would need to be something else.

But perhaps for now we could just add attribute name based filter for child nodes.
childNodeAttributeFilter is a bit long name, any better suggestions?

@zbraniecki
Copy link
Author

If we can weed out, say, 90% of all other nodes, then scripts can quite efficiently pick the one they need after that.

Totally agree. :) All my use cases are about attribute based system. I can imagine HTML shims to use node filter as well, but we can probably do without full query selector for now.

@domenic
Copy link
Member

domenic commented Oct 12, 2015

My understanding is that the platform is in much better position to optimize filtering the elements for MutationObserver than anything that the MutationObserver callback can do.

I question this premise. Do we have any data, e.g. from a custom build of a popular rendering engine, to show this to be the case?

@smaug----
Copy link
Collaborator

Well, we would create a lot less garbage/cycle collectable objects if browser engine would do filtering.

@domenic
Copy link
Member

domenic commented Oct 12, 2015

I have no doubt about the theoretical benefits. (Indeed, for pretty much any API that allows authors to do things in JavaScript, you can say, "if we moved some of this to C++/Rust, it would generate less garbage.") I am questioning whether it actually shows up in real-world scenarios.

@zbraniecki
Copy link
Author

I instrumented Firefox OS localization framework that localizes nodes registered with data-l10n-id to record numbers.

Launching a single app (Settings) gave me:

  • 117 mutations
  • 254 added nodes
  • 5 removed nodes
  • 67 nodes that contain data-l10n-id

Navigating around the app (between 5 panels) gave me:

  • 427 mutations
  • 1264 added nodes
  • 36 removed nodes
  • 231 nodes that contain data-l10n-id

The additional problem we face is that the addedNode may be in fact a DOMFragment and we have to find nodes with data-l10n-id inside it, so we use... querySelector.

I just started working on a counter part to that API that brings Intl capabilities for elements. It will use data-intl-format attribute and will require pretty much the same MutationObserver heuristics.
The only problem is that the signal/noise ratio will be even lower.

And we have a shim for a color picker that only wants to register input[type=color], but has to listen to all mutations.

Does it bring an examples you've been looking for @domenic ?
If not, can you provide more information on what you are looking for? I'll try to get it.

@smaug----
Copy link
Collaborator

I don't know what you mean with DOMFragment, but you would still need to still look for
data-l10n-id nodes under the addedNodes even if childNodeAttributeFilter was added.
MutationObserver doesn't know about the subtrees of the added nodes.

@domenic
Copy link
Member

domenic commented Oct 12, 2015

@zbraniecki thanks for investigating, but that wasn't quite what I was wondering. I was wondering about speed (in seconds) for filtering in C++, as this issue proposes, vs. filtering in JS using suitably-efficient strategies (so e.g. looping and checking el.localName instead of using querySelector).

It's not worth it to add an API to the platform if the only benefit is less "noise" to the mutation observer callbacks; you can write a trivial wrapper library to handle that. The benefit would be if there are real performance advantages, which is why I was hoping to see benchmarks.

@zbraniecki
Copy link
Author

I launched a light-workload marionette test scenario on the app:

  • mutations: 667
  • addednodes: 2464
  • removednodes: 450
  • affectedl10nnodes: 407

that's signal/noise ratio of 0.14. That's a lot of resources going nowhere.

@zbraniecki
Copy link
Author

It's not worth it to add an API to the platform if the only benefit is less "noise" to the mutation observer callbacks; you can write a trivial wrapper library to handle that. The benefit would be if there are real performance advantages, which is why I was hoping to see benchmarks.

Gotcha. Ok. I'd need help from someone with platform experience to write a shim for me and I'll be happy to test that!

@zbraniecki
Copy link
Author

But also, notice that it's not only about speed in seconds. It's also about the amount of allocations and matching GCs.

My naive understanding is that there are four benefits:

  1. clearner code
  2. faster code
  3. less CPU cycles wasted
  4. less allocations/GCs

Wrapper could help with 1), you are asking about 2). Do we agree on 3 and 4?

@domenic
Copy link
Member

domenic commented Oct 12, 2015

Well, 3) is just the same as 2). But yes, 2/3 and 4 are both worth investigating.

I guess my issue is that 4 is a general problem with every web API ever; if you move portions of it into C++, there are less allocations and GCs. In the end why don't we just write all our code in webasm...

@zbraniecki
Copy link
Author

Yeah, I see your point. I believe that the problem with MutationObserver is that most use cases I saw in our platform are the opposite of what MutationObserver API is designed for.

It seems to be designed to help in cases where the mutation observer callback is interested in vast majority of nodes under given root element.

I see MutationObserver as a tool for scenarios where the number of nodes of interest compared to amount of "traffic" is very, very low.
L10n here is an exceptionally high ratio scenario, because one can argue that a lot of elements are localizable in a fully localizable UI.

But MutationObserver is also the only way to properly shim a color picker, file picker, any other widget, mentioned intl API etc. and those will literally affect 4-15 elements in a document that may contain hundreds of nodes.

In order to use MutationObserver efficiently as a shim, one would also want to enable it before parser is done with DOM and get the DOM from the Parser filtered. But that increases the number of nodes that can get mutations unless we can improve the filtering.

So, while I agree with the concern, I believe that MutationObserver is special here - an API with a great potential to enhance the Web stack, that currently enables doing so at an unreasonable cost.

@domenic
Copy link
Member

domenic commented Oct 12, 2015

Yeah, it makes sense to add one-off hacks for general problems if the payoff is worthwhile. I just think it's worth measuring first. So yeah, we want speed and memory usage benchmarks to show this would be a worthwhile platform feature for every browser to implement. That seems like the bare minimum to get cross-vendor interest.

@rniwa
Copy link
Collaborator

rniwa commented Oct 12, 2015

I don't think such an investigation is necessary to justify the API. A high noise-to-signal ratio itself is a problem worth solving, and we've always wanted to add a filtering mechanism for MutationObserver from the beginning. We never got around to it because each of us were distracted by other things.

In addition, without doing any investigation, I can tell you that creating a JS array with thousands of entries in the browser engine is expensive and will cause a GC churn. The fact MutationObserver records are objects is already bad, and we spent a lot of time minimizing the amount of information being collected for performance. This is why collecting old values for CharacterData and attribute value is optional. There is literally nothing controversial / mysterious here about the cost.

@smaug----
Copy link
Collaborator

So I was looking at http://www.w3.org/2008/webapps/wiki/MutationReplacement#NodeWatch_.28A_Microsoft_Proposal.29 again, and discussing with zbraniecki and I think one possible option here is to have a way to observe mutations, without actually creating the MutationRecords.
Then one can know that there is some mutation in DOM tree, and use querySelectorAll and effectively implement NodeWatch in JS. One could for example schedule querySelectorAll to run using rAF if there is a DOM mutation. In this l10n case, all the elements which have been handled could be marked with some attribute, say l10nHandled="true" and the selector would skip such elements.

var pending = false;
var mo = new MutationObserver(
function() {
if (pending) return;
pending = true;
window.requestAnimationFrame(function() {
pending = false;
var qa = document.querySelectorAll("");
translate(qa);
}
}
);
mo.observe(document, {childList: true, subtree: true, attributes: true, noRecordDelivery: true; });
That would effectively just optimize out tons of record objects (and engines wouldn't need to create bunch of js wrappers for the native nodes either). A question is that whether we need to optimize the records out, or can engines deal with that garbage well enough. noRecordDelivery would be rather trivial change so perhaps it is worth to add.

@zbraniecki
Copy link
Author

This is an early POC of how it might look like: https://bug1214026.bmoattachments.org/attachment.cgi?id=8672860 - I'll try to plug it into our OS and measure impact on performance. If it's good, then we can look into noRecordDelivery as a way to cut memory usage.

@ukari
Copy link

ukari commented Jul 23, 2016

zbraniecki, I thing I meet the same problem like yours.
In some case, you could use mutation.target to instead of 'querySelector' to decrease the calling numbers of mutation.
Here is my code

new MutationObserver(
    function (mutations)
    {
      mutations.map(
        function (mutation)
        {
          if (mutation.target != '[object HTMLBodyElement]' &&
              mutation.target != '[object HTMLScriptElement]' &&
              (mutation.removedNodes.length != 0 ||
              mutation.addedNodes.length != 0)) {
            Languages.update(); //Here is what you want to do
          }
        }
      );
    }
  ).observe(
    document.querySelector('body'),
    {attributes: false, childList: true, characterData: false, subtree:true}
  );

@rniwa
Copy link
Collaborator

rniwa commented Jul 23, 2016

I still think we should just add a filter based on element name & attribute name but adding an option to not receive individual records also seems like a good idea as well. I think we should do all those things.

@annevk annevk added the needs implementer interest Moving the issue forward requires implementers to express interest label Aug 23, 2016
@LeaVerou
Copy link

LeaVerou commented Mar 1, 2018

A huge use case for this would be attaching events to elements before DOMContentLoaded.
Currently, most scripts add events to elements on DOMContentLoaded which on slow networks and/or large pages can take quite a while. In the meantime, users see a UI that doesn’t work, and don’t know when it will start working (since DOMContentLoaded is not something a user can train to recognize).

MutationObserver has the potential to fix this very pervasive problem and give developers the power to attach events to elements as they are rendered. However, currently that would require looping over every single element and using .matches() on it, which would be very expensive in exactly the case where it matters most (pages with lots of HTML).

If the platform could do most of this filtering efficiently, then using MO for this would be a feasible solution. It doesn’t have to be the entire selector syntax, but at least tags, classes, ids, attributes (partial too), commas.

I wonder if we could mine jQuery code (since that has a very specific pattern of $(selector).eventName(callback) while in vanilla JS it’s done in many different ways) to see what types of selectors people use. It might be an even more restricted set.

@annevk
Copy link
Member

annevk commented Mar 13, 2018

I'm going to close this on performance grounds as well as it having objections from implementers. #398 is still something we can (and I think should) pursue. Extensions that are similar in scope we could also consider in a new issue, but full selector support is out.

The research @LeaVerou suggests seems like a great first step for any such issue. Lifting some of the burden from libraries and frameworks is definitely something implementers are interested in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs implementer interest Moving the issue forward requires implementers to express interest
Development

No branches or pull requests

8 participants