-
Notifications
You must be signed in to change notification settings - Fork 779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(utils): greatly improve the speed of querySelectorAll #3423
Conversation
) { | ||
return node.actualNode.nodeName !== 'UL'; | ||
|
||
var tests = ['without cache', 'with cache']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first 30 lines are new code to the file, the rest is whitespace difference.
// top-level node. but since the top-level node is the one | ||
// with the cache, this only works when we are testing the full | ||
// tree (i.e. without cache) | ||
if (describeName === 'without cache') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wrapping if
is also new to handle the test for shadowDOM id querying (which only works when not using the cache for this particular test).
@@ -128,6 +128,7 @@ function convertAttributes(atts) { | |||
return { | |||
key: attributeKey, | |||
value: attributeValue, | |||
type: typeof att.value === 'undefined' ? 'attrExist' : 'attrValue', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added to tell the difference between the selectors [href]
and [href=""]
(both returned the same structure of value: ''
even though one is only checking existence of the attribute and the other is checking for an empty string).
lib/core/utils/selector-cache.js
Outdated
* @returns {Boolean} | ||
*/ | ||
function isGlobalSelector(exp) { | ||
return exp.tag === '*' && !exp.attributes && !exp.id && !exp.classes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about psuedo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we don't cache the content of a pseudo selector, they are still considered a global selector such that *:not([foo])
will use the global selector cache and then matchesExpression
to match the pseudo selector.
lib/core/utils/selector-cache.js
Outdated
// use the last part of the expression to find nodes as it's more | ||
// specific. e.g. for `body h1` use `h1` and not `body` | ||
const exp = expression[expression.length - 1]; | ||
let nodes = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to start off with the entire list and then filter down. Rather than start with an empty list and then either add or remove depending on if the list is currently empty. The way you've done this now can create problems if at one point you filter down to 0 nodes half way through the operation.
For example, lets say you have a[name].foo
.
- Start with an empty
nodes
array - Put all
a
elements intonodes
- Filter
nodes
, removing any that doesn't have aname
attribute - Filter
nodes
, removing any that don't have afoo
class
If during step 3, the nodes
array becomes empty, in step 4 you'll get all elements with the foo
class in your results. And because this isn't a complex selector, your matchExpression
fallback won't catch this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
lib/core/utils/selector-cache.js
Outdated
// wanting to make sure the same node isn't added twice (~3 seconds | ||
// total over using array.includes) | ||
let matchedNodes = []; | ||
nodeSet.forEach(node => matchedNodes.push(node)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're going to return an array, why bother with a set? Wouldn't it be easier to just use an array and then filter it for unique values before you return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found it was actually way more performant to use a Set and convert it to an array at the end than to use an array and check if the element already exists in the array before adding.
lib/core/utils/selector-cache.js
Outdated
// use the last part of the expression to find nodes as it's more | ||
// specific. e.g. for `body h1` use `h1` and not `body` | ||
const exp = expression[expression.length - 1]; | ||
let nodes = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this was addressed: #3423 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the function no longer returns the empty array if passed in and only returns nodes that are in both arrays, it fixed the issue at the same time.
At step 3 when nothing is found, the empty array is passed to step 4, but since there are no nodes in the array the script won't add any nodes from the .foo
array.
function getSharedValues(a, b) {
return a.filter(node => b.includes(node));
}
getSharedValues([], ['a', 'b', 'c']); // []
getSharedValues(['a', 'b', 'c'], []); // []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is....? If somewhere along this algorithm you end up with the array getting emptied again, because of the ternary, you're not calling getSharedValues. You're doing nodes = cachedNodes
, effectively.
This doesn't feel like a safe algorithm. It's hard to track. If you want this setup, have nodes
be null
until it is first assigned, and have your ternary be:
nodes = nodes === null ? cachedNodes : getSharedValues(nodes, cachedNodes);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. Added a test to catch this problem now.
@@ -128,6 +128,7 @@ function convertAttributes(atts) { | |||
return { | |||
key: attributeKey, | |||
value: attributeValue, | |||
type: typeof att.value === 'undefined' ? 'attrExist' : 'attrValue', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have a test I think. Should have caught that first time around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the convertSelector
API does not get exported to a public axe function and is only used internally. We could add it to the _thisWillBeDeletedDoNotUse
test api, but I think adding a suite of tests for it should be a separate pr.
lib/core/utils/selector-cache.js
Outdated
// use the last part of the expression to find nodes as it's more | ||
// specific. e.g. for `body h1` use `h1` and not `body` | ||
const exp = expression[expression.length - 1]; | ||
let nodes = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is....? If somewhere along this algorithm you end up with the array getting emptied again, because of the ternary, you're not calling getSharedValues. You're doing nodes = cachedNodes
, effectively.
This doesn't feel like a safe algorithm. It's hard to track. If you want this setup, have nodes
be null
until it is first assigned, and have your ternary be:
nodes = nodes === null ? cachedNodes : getSharedValues(nodes, cachedNodes);
Tested latest changes using my compare script, tensorflow.org passed, and so did the few random sites I tested. |
* fix(utils): greatly improve the speed of our querySelectorAll code * fixes * finalize * tests * fix ie11 * const... * changes * tie map to cache * fix test * remove test * fixes * revert rule descriptions
Following the QSA proposal, this greatly improves the speed of QSA as it no longer has to scan the entire DOM tree for top-level queries. For large sites the speed increase is substantial.
For https://js.tensorflow.org/api/latest/, the top slowest performanceTimer results were as follow:
We can see that the
#gather
step is what causes most rules to be slow (and not so much the#runchecks
step), causing axe-core to take 20 seconds to just run through all the rules.By caching node information ahead of time, the top slowest results are now as follows:
This change shaves off about 7-10 seconds from the time needed to run all of axe rules on the site (
audit_start_to_end
).As the page size increases, so too does the savings. Testing an extremely large site (152,000 nodes), this is the final time difference:
This is about a 20 second savings. On medium size pages, the savings is minimal (almost no difference).
To ensure the results are the same, I ran axe-core against the tensorflow site, recorded the results, then ran the caching verison of axe core and compared the two results for equality. Here's the code I used.
NOTE: This selector cache can still produce slower speeds with certain selectors. For example, the code short circuits the
body *
selector for theregion
rule, but won't do that for a similar selector such asbody p
. We don't currently have any such selector, but a user could write a custom rule that uses that. If we are so inclined, we could try to make make sure any selector is fast, but currently this is optimized for our own selectors for rules.