Optimize ReorderGlobals ordering with a new algorithm (#6625) · WebAssembly/binaryen@f8086ad

The old ordering in that pass did a topological sort while sorting by use count,
both within topological groups and between them. That could be suboptimal
in some cases, however; in fact, on J2CL output this pass made the
binary larger, which is how we noticed the problem.

The problem is that such a topological sort keeps topological groups in
place, but it can sometimes be useful to interleave them. Imagine this:

     $c - $a
    /
  $e
    \
     $d - $b

Here $e depends on $c and $d, $c depends on $a, and $d depends on $b. The
optimal order may interleave the two arms, e.g. $a, $b, $d, $c, $e. That is
possible because the dependencies define only a partial order, so the two arms
are actually independent of each other.
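To make the partial order concrete, here is a minimal Python sketch (not code from the pass) that encodes the diagram's dependencies as a dict and checks whether a candidate order places every global after the globals it depends on. The names `DEPS` and `respects_deps` are illustrative only.

```python
# Dependencies read off the diagram: each global maps to the globals it uses.
DEPS = {
    "$a": [],
    "$b": [],
    "$c": ["$a"],
    "$d": ["$b"],
    "$e": ["$c", "$d"],
}

def respects_deps(order, deps):
    """True if `order` lists every global once, each after all of its dependencies."""
    if len(order) != len(deps) or set(order) != set(deps):
        return False
    position = {g: i for i, g in enumerate(order)}
    return all(position[d] < position[g] for g, ds in deps.items() for d in ds)

# Keeping each arm together is valid...
print(respects_deps(["$a", "$c", "$b", "$d", "$e"], DEPS))  # True
# ...and so is interleaving the arms, since they are independent.
print(respects_deps(["$a", "$b", "$d", "$c", "$e"], DEPS))  # True
# Emitting $c before $a breaks a dependency.
print(respects_deps(["$c", "$a", "$b", "$d", "$e"], DEPS))  # False
```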

Sorting by topological depth first might help in some cases, but it is also not
optimal in general, as we may want to mix topological depths:
$a, $c, $b, $d, $e does so, and it may be the best ordering.
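The point about mixing depths can be checked the same way. This hypothetical helper computes each global's topological depth (0 for a global with no dependencies, otherwise one more than its deepest dependency; that definition is an assumption of this sketch) and shows that $a, $c, $b, $d, $e is not sorted by depth, so no depth-first sort could produce it.

```python
# Same dependencies as in the diagram: global -> globals it depends on.
DEPS = {"$a": [], "$b": [], "$c": ["$a"], "$d": ["$b"], "$e": ["$c", "$d"]}

def topological_depths(deps):
    """Depth 0 for globals with no dependencies, else 1 + deepest dependency."""
    memo = {}

    def depth(g):
        if g not in memo:
            memo[g] = 1 + max((depth(d) for d in deps[g]), default=-1)
        return memo[g]

    return {g: depth(g) for g in deps}

def is_depth_sorted(order, depths):
    """True if depths never decrease along the order."""
    return all(depths[x] <= depths[y] for x, y in zip(order, order[1:]))

depths = topological_depths(DEPS)
MIXED = ["$a", "$c", "$b", "$d", "$e"]
print([depths[g] for g in MIXED])      # [0, 1, 0, 1, 2]
print(is_depth_sorted(MIXED, depths))  # False
```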

This PR implements a natural greedy algorithm: at each step it picks the global
with the highest use count out of the set of possible globals, which is the
set of globals that have no unresolved dependencies. So we start by picking
the best global that has no dependencies and add it at the front; placing it
may unlock globals that depended on it (once all of their dependencies are
placed), we pick from the updated set, and so forth.
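The greedy step can be sketched with a priority queue. This is an illustrative Python version, not Binaryen's C++ implementation: `greedy_order`, the heap, and breaking ties by original position (taken from the description of the first sort below) are choices made for this sketch, and the use counts in the example are made up.

```python
import heapq

def greedy_order(names, deps, weights):
    """Repeatedly emit the highest-weight global whose dependencies are all placed."""
    index = {g: i for i, g in enumerate(names)}
    # Number of not-yet-placed dependencies for each global.
    pending = {g: len(deps[g]) for g in names}
    # Reverse edges: which globals are waiting on each global.
    dependents = {g: [] for g in names}
    for g in names:
        for d in deps[g]:
            dependents[d].append(g)
    # Max-heap on weight (negated for heapq); ties go to the earlier original position.
    heap = [(-weights[g], index[g], g) for g in names if pending[g] == 0]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, g = heapq.heappop(heap)
        order.append(g)
        # Placing g may make some of its dependents available.
        for user in dependents[g]:
            pending[user] -= 1
            if pending[user] == 0:
                heapq.heappush(heap, (-weights[user], index[user], user))
    if len(order) != len(names):
        raise ValueError("cyclic dependencies between globals")
    return order

NAMES = ["$a", "$b", "$c", "$d", "$e"]
DEPS = {"$a": [], "$b": [], "$c": ["$a"], "$d": ["$b"], "$e": ["$c", "$d"]}
# Hypothetical use counts, chosen so that the two arms end up interleaved.
COUNTS = {"$a": 5, "$b": 4, "$c": 1, "$d": 3, "$e": 0}
print(greedy_order(NAMES, DEPS, COUNTS))  # ['$a', '$b', '$d', '$c', '$e']
```

With these counts, $b (4) beats the newly unlocked $c (1), and then $d (3) beats $c too, which produces the interleaved order from the example.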

This is not guaranteed to be optimal either, but it is easy to make it more
flexible by customizing the counts, and we consider 4 sorts here:

1.  Set all counts to 0. This means we only take into account dependencies,
    and we break ties by the original order, so this is as close to the original
    order as we can be.
2.  Use the actual use counts. This is the simple greedy algorithm.
3.  Set the count of each global to also include the counts of its children
    (the globals that depend on it), so the count is the total that might be
    unlocked. This gives more weight to globals that can unlock more later,
    so it is less greedy.
4.  As the previous sort, but weight children's counts lower in an exponential
    way, which makes sense as they may depend on other globals too.
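The four weightings can be expressed as different count maps fed to the same greedy sort. In this sketch, "children" are read as the globals that depend on a given global, and the weighting is applied recursively, so a global reachable along several paths is counted once per path; both are assumptions. The decay `factor=0.5` is a hypothetical placeholder, since the actual coefficient is not given here.

```python
# Dependencies from the diagram: global -> globals it depends on.
DEPS = {"$a": [], "$b": [], "$c": ["$a"], "$d": ["$b"], "$e": ["$c", "$d"]}

def weighted_counts(deps, counts, factor):
    """Each global's own count plus `factor` times the totals of its dependents."""
    dependents = {g: [] for g in deps}
    for g, ds in deps.items():
        for d in ds:
            dependents[d].append(g)
    memo = {}

    def total(g):
        if g not in memo:
            memo[g] = counts[g] + factor * sum(total(u) for u in dependents[g])
        return memo[g]

    return {g: total(g) for g in deps}

def count_variants(deps, counts, factor=0.5):
    """The four weightings; `factor` is a hypothetical decay coefficient."""
    return {
        "zero": {g: 0 for g in deps},                         # dependencies only
        "actual": dict(counts),                               # plain use counts
        "sum": weighted_counts(deps, counts, 1.0),            # count + all dependents
        "exponential": weighted_counts(deps, counts, factor), # decayed dependents
    }

# Hypothetical counts: $e is hot, but it is only unlocked after $c and $d.
COUNTS = {"$a": 1, "$b": 2, "$c": 0, "$d": 0, "$e": 10}
variants = count_variants(DEPS, COUNTS)
print(variants["sum"])          # {'$a': 11.0, '$b': 12.0, '$c': 10.0, '$d': 10.0, '$e': 10.0}
print(variants["exponential"])  # {'$a': 3.5, '$b': 4.5, '$c': 5.0, '$d': 5.0, '$e': 10.0}
```

Under the plain counts, $c and $d look worthless (0), while the summed and decayed variants credit them for unlocking the hot $e.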

In practice it is simple to generate cases where sort 1, 2, or 3 is optimal
(see the new tests), but on real-world J2CL output I see that sort 4 (with a
particular exponential coefficient) is best, so the pass computes all 4
orderings and picks the best. As a result it will never worsen the size, and
it has a good chance of improving it.
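Picking the best of the four requires some way to score an ordering. The sketch below assumes a simple cost model: each use of a global costs the unsigned LEB128 size of its index, so indices below 128 take one byte and indices up to 16383 take two. The helper names (`estimated_size`, `pick_best`, and so on) and the decay factor are hypothetical. Because the first weighting only follows dependencies and breaks ties by original position, it reproduces the original order whenever that order already respects dependencies, so under this model the chosen result is never larger than the original order.

```python
import heapq

def uleb128_size(n):
    """Bytes needed to encode a non-negative integer as unsigned LEB128."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size

def estimated_size(order, counts):
    """Assumed cost model: every use of a global costs the LEB128 size of its index."""
    return sum(counts[g] * uleb128_size(i) for i, g in enumerate(order))

def dependents_of(names, deps):
    """Reverse edges: for each global, the globals that depend on it."""
    users = {g: [] for g in names}
    for g in names:
        for d in deps[g]:
            users[d].append(g)
    return users

def greedy_order(names, deps, weights):
    """Repeatedly emit the highest-weight global whose dependencies are all placed."""
    index = {g: i for i, g in enumerate(names)}
    pending = {g: len(deps[g]) for g in names}
    users = dependents_of(names, deps)
    heap = [(-weights[g], index[g], g) for g in names if pending[g] == 0]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, g = heapq.heappop(heap)
        order.append(g)
        for u in users[g]:
            pending[u] -= 1
            if pending[u] == 0:
                heapq.heappush(heap, (-weights[u], index[u], u))
    if len(order) != len(names):
        raise ValueError("cyclic dependencies between globals")
    return order

def weighted_counts(names, deps, counts, factor):
    """Own count plus `factor` times the weighted totals of dependents."""
    users = dependents_of(names, deps)
    total = {}
    # A dependency-respecting order, walked backwards, visits dependents first.
    for g in reversed(greedy_order(names, deps, {n: 0 for n in names})):
        total[g] = counts[g] + factor * sum(total[u] for u in users[g])
    return total

def pick_best(names, deps, counts, factor=0.5):
    """Run the greedy sort under all four weightings; keep the smallest result."""
    weightings = [
        {g: 0 for g in names},                         # 1: dependencies only
        counts,                                        # 2: plain use counts
        weighted_counts(names, deps, counts, 1.0),     # 3: counts + all dependents
        weighted_counts(names, deps, counts, factor),  # 4: exponentially decayed
    ]
    orders = [greedy_order(names, deps, w) for w in weightings]
    # Score every candidate with the real use counts; ties keep the earliest.
    return min(orders, key=lambda order: estimated_size(order, counts))

# Hypothetical module: 200 independent globals, later ones used more often.
names = [f"g{i}" for i in range(200)]
deps = {g: [] for g in names}
counts = {f"g{i}": i for i in range(200)}
best = pick_best(names, deps, counts)
print(estimated_size(names, counts), estimated_size(best, counts))  # 31672 22456
```

In the original order of this made-up module, the 72 most-used globals sit at indices 128 and above and pay two bytes per use; every count-based weighting moves them to the front.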

The differences between these are small, so in theory we could pick any
of them, but since they are all variations of a single algorithm, computing
all of them adds very little code complexity.

The benefits are rather small here, but this can save a few hundred
bytes on a multi-MB wasm file compiled from Java. This comes at a tiny compile
time cost, but it seems worth it for the new guarantee of never regressing size.

kripken authored May 31, 2024

1 parent 0c23394 commit f8086ad
