Optimize ReorderGlobals ordering #6625
Conversation
Interesting problem! (cc @rluble) The number differences here are small; makes me wonder if magic-import, which pushes higher indices, benefits more.
Was there a test for which the exponential sort was optimal?
src/passes/ReorderGlobals.cpp
// each one moves, which is logically a mapping between indices.
using IndexIndexMap = std::vector<Index>;

// We will also track counts of uses for each global.
It might be worth briefly explaining why we use a double here.
Sadly no. It's very hard to make such a test, as it would need very many deep dependency chains (where it pays to take them into account a little, but not too much). (I would not include code for that in the pass, except that it ends up as just another constant value, and we do measure the sizes, so it seems safe.)
Co-authored-by: Thomas Lively <[email protected]>
The old ordering in that pass did a topological sort while sorting by uses
both within topological groups and between them. That could be suboptimal
in some cases, however, and actually on J2CL output this pass made the
binary larger, which is how we noticed this.
The problem is that such a topological sort keeps topological groups in
place, but it can be useful to interleave them sometimes. Imagine that
$e depends on $c and on $d, that $c depends on $a, and that $d depends on $b:
two independent arms that join at $e. The optimal order may interleave the two
arms here, e.g. $a, $b, $d, $c, $e. That is because the dependencies define
a partial order, and so the arms here are actually independent.
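As a quick sanity check of that claim, here is a small standalone sketch (not pass code; it just encodes the dependency structure described above) verifying that the interleavings discussed here respect the dependencies:

```cpp
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Each pair is (global, one of its dependencies): $c needs $a, $d needs $b,
// and $e needs both $c and $d.
using Deps = std::vector<std::pair<std::string, std::string>>;

// An order is valid if every dependency appears before the global that uses it.
bool respectsDeps(const std::vector<std::string>& order, const Deps& deps) {
  std::map<std::string, size_t> pos;
  for (size_t i = 0; i < order.size(); i++) {
    pos[order[i]] = i;
  }
  for (const auto& [global, dep] : deps) {
    if (pos[dep] >= pos[global]) {
      return false;
    }
  }
  return true;
}

int main() {
  Deps deps = {{"$c", "$a"}, {"$d", "$b"}, {"$e", "$c"}, {"$e", "$d"}};
  // Both interleavings of the two arms are valid topological orders.
  std::cout << respectsDeps({"$a", "$b", "$d", "$c", "$e"}, deps) << '\n'; // 1
  std::cout << respectsDeps({"$a", "$c", "$b", "$d", "$e"}, deps) << '\n'; // 1
}
```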
Sorting by topological depth first might help in some cases, but also is not
optimal in general, as we may want to mix topological depths:
$a, $c, $b, $d, $e does so, and it may be the best ordering.

This PR implements a natural greedy algorithm that picks the global with
the highest use count at each step, out of the set of possible globals, which
is the set of globals that have no unresolved dependencies. So we start by
picking the first global with no dependencies and add it at the front; then
that unlocks anything that depended on it and we pick from that set, and
so forth.
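For illustration, here is a minimal standalone sketch of that greedy selection (not the actual pass code; the names, types, and tie-breaking detail are assumptions based on the description above):

```cpp
#include <cstdint>
#include <queue>
#include <vector>

using Index = uint32_t;

// deps[i] lists the globals that global i depends on; counts[i] is a weight
// for global i (e.g. its use count). At each step we emit, among the globals
// whose dependencies are all already emitted, the one with the highest count,
// breaking ties by the original index.
std::vector<Index> greedyOrder(const std::vector<std::vector<Index>>& deps,
                               const std::vector<double>& counts) {
  size_t n = deps.size();
  std::vector<size_t> remaining(n);              // unplaced deps per global
  std::vector<std::vector<Index>> dependents(n); // reverse edges
  for (Index i = 0; i < n; i++) {
    remaining[i] = deps[i].size();
    for (auto d : deps[i]) {
      dependents[d].push_back(i);
    }
  }
  auto worse = [&](Index a, Index b) {
    if (counts[a] != counts[b]) {
      return counts[a] < counts[b]; // prefer the higher count
    }
    return a > b;                   // then prefer the earlier original index
  };
  std::priority_queue<Index, std::vector<Index>, decltype(worse)> avail(worse);
  for (Index i = 0; i < n; i++) {
    if (remaining[i] == 0) {
      avail.push(i);
    }
  }
  std::vector<Index> order;
  while (!avail.empty()) {
    Index next = avail.top();
    avail.pop();
    order.push_back(next);
    // Emitting |next| may unlock globals that were waiting on it.
    for (auto d : dependents[next]) {
      if (--remaining[d] == 0) {
        avail.push(d);
      }
    }
  }
  return order;
}
```

With all counts set to zero, the tie-break keeps this as close to the original order as the dependencies allow, which is essentially the first of the sorts considered next.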
This may also not be optimal, but it is easy to make it more flexible by
customizing the counts, and we consider 4 sorts here:

1. Ignore the use counts entirely, so only the dependencies constrain the order,
and we break ties by the original order, so this is as close to the original
order as we can be.
2. Use the plain use counts: the simple greedy algorithm just described.
3. Add to each global's count the counts of everything it can unlock,
so the count is the total that might be unlocked. This gives more weight
to globals that can unlock more later, so it is less greedy.
4. As 3, but scale the counts of unlocked globals down by an exponential
coefficient, giving them only partial weight, which
makes sense as they may depend on other globals too (see the sketch after this list).
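One possible reading of sorts 3 and 4, as a sketch rather than the pass's actual code (the decay factor 0.5 is a made-up placeholder, not the coefficient the pass uses, and this simple recursion counts a global once per path that reaches it):

```cpp
#include <cstdint>
#include <vector>

// dependents[i] lists the globals whose init expressions read global i, and
// uses[i] is global i's own use count.
using Graph = std::vector<std::vector<uint32_t>>;

// factor == 1.0 corresponds to sort 3 (count everything a global can unlock
// at full weight); a factor below 1.0 corresponds to sort 4, scaling each
// further level of unlocked globals down exponentially.
static double weightedCount(uint32_t global, const Graph& dependents,
                            const std::vector<double>& uses, double factor) {
  double total = uses[global];
  for (auto dep : dependents[global]) {
    total += factor * weightedCount(dep, dependents, uses, factor);
  }
  return total;
}

std::vector<double> customizedCounts(const Graph& dependents,
                                     const std::vector<double>& uses,
                                     double factor = 0.5) {
  std::vector<double> counts(uses.size());
  for (uint32_t i = 0; i < uses.size(); i++) {
    counts[i] = weightedCount(i, dependents, uses, factor);
  }
  return counts;
}
```

These counts can then be fed straight into a greedy selection like the greedyOrder sketch above.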
In practice it is simple to generate cases where 1, 2, or 3 is optimal (see
new tests), but on real-world J2CL I see that 4 (with a particular exponential
coefficient) is best, so the pass computes all 4 and picks the best. As a
result it will never worsen the size and it has a good chance of
improving.
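A rough sketch of that "compute them all and keep the best" step, assuming we can estimate the encoded size a given order would produce (the estimator is passed in here as a parameter; it is not a real Binaryen API) and that the original order is kept as the baseline to compare against, which is consistent with the guarantee that the size never gets worse:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using Order = std::vector<uint32_t>;

// Try each candidate order and keep whichever the estimator says is smallest,
// starting from the original order so the result is never larger than before.
Order pickBestOrder(const std::vector<Order>& candidates, const Order& original,
                    const std::function<size_t(const Order&)>& estimateSize) {
  Order best = original;
  size_t bestSize = estimateSize(original);
  for (const auto& candidate : candidates) {
    size_t size = estimateSize(candidate);
    if (size < bestSize) { // require a strict improvement
      bestSize = size;
      best = candidate;
    }
  }
  return best;
}
```

Because the original order is always among the compared options, the chosen order can only tie or improve on it.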
The differences between these are small, so in theory we could pick any
of them, but given they are all modifications of a single algorithm it is
very easy to compute them all with little code complexity.
Some data on J2CL:
There is a slight runtime cost to this: J2CL goes from 0.9666 to 1.1351
seconds. As this is one of our faster passes the slight slowdown seems
worth it in return for the guarantee to never increase size, and the small
improvement.