Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precompute into select arms #6212

Merged
merged 66 commits into from
Jan 10, 2024
Merged

Conversation

kripken
Copy link
Member

@kripken kripken commented Jan 8, 2024

E.g.

(i32.add
  (select
    (i32.const 100)
    (i32.const 200)
    (..condition..)
  )
  (i32.const 50)
)

;; =>

(select
  (i32.const 150)
  (i32.const 250)
  (..condition..)
)

We cannot fully precompute the select, but we can "partially precompute" it, by precomputing
its arms using the parent.

This may require looking several steps up the parent chain, which is an awkward operation in
our simple walkers, so to do it we capture stacks of parents and operate directly on them. This
is a little slower than a normal walk, so only do it when we see a promising select, and only in
-O2 and above (this makes the pass 7% or so slower; not a large cost, but best to avoid it in
-O1).

This helps with i32 code as above, and I see small benefits on various real-world wasm files,
but the larger benefit is with GC code: if the select is over two itables in J2Wasm then we can
precompute the reads from the itable and vtables and end up with two function pointers,
which then lead to two direct calls. Doing all that will require followup work.

@kripken kripken requested a review from tlively January 8, 2024 23:35
Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments on code, will look at tests next.

src/passes/Precompute.cpp Show resolved Hide resolved
src/passes/Precompute.cpp Outdated Show resolved Hide resolved
src/passes/Precompute.cpp Outdated Show resolved Hide resolved
// When we consider the first select we can see that the computation result
// is always infinity, so we can modify, and the same thing happens with the
// second select, causing a second modification. In this example the result
// is the same either way, but at least in theory an actual problem can
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to show an example where a problem occurs. It seems that if there isn't any stale information stored anywhere, doing multiple optimizations to the same expression should be fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing the same expression twice from two stack traces seems risky, though. The second time we got there, we'd be reading a stale trace basically before we modify it. (Though, it is possible this happens to work out - I didn't look into it, but it seems complex and best avoided.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, I don't have a good way to break this down and prove it safe.

src/passes/Precompute.cpp Outdated Show resolved Hide resolved
Comment on lines +573 to +576
if (isValidPrecomputation(ifFalse)) {
// Wonderful, we can precompute here! The select can now contain the
// computed values in its arms.
select->ifTrue = ifTrue.getConstExpression(*getModule());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we essentially have to do twice as much work because we separately evaluate isValidPrecomputation and getConstExpression? Could we reduce overhead by combining them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see what you mean. isValidPrecomputation is a quick check that a precomputed result is valid for our purposes here (the actual precomputing happened earlier in precomputeExpression). And getConstExpression generates an Expression from a precomputed result. I'm not sure how we could combine those?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I misunderstood what was happening. Disregard!

src/passes/Precompute.cpp Outdated Show resolved Hide resolved
Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the coverage, it looks like we're missing tests where one arm is precomputable and the other arm isn't and vice versa.

@kripken
Copy link
Member Author

kripken commented Jan 10, 2024

Good catch on the coverage, added. Nice that the coverage tool helped here!

@kripken
Copy link
Member Author

kripken commented Jan 10, 2024

Btw I had an idea for how to simplify the stack handling here, so let me try that before you read the code again.

@kripken
Copy link
Member Author

kripken commented Jan 10, 2024

My idea didn't work out. I think the current code is the best we can get.

(The idea was, rather than saving stacks and processing them later, to instead look at each select during the walk and note the replacement of the parent for later, and when we reach the parent, replace them at that time. So it all happened in a single walk. That mostly works, but selects can play two roles here: they can help optimize a parent, but they can also be replaced as a parent themselves, which leads to the need to carefully do the incremental work or else there can be various incorrectness bugs. Handling those things leads to more overall complexity and code size, sadly, though it might have been a little faster to avoid the stack copies.)

Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kripken kripken merged commit 141f7ca into WebAssembly:main Jan 10, 2024
14 checks passed
@kripken kripken deleted the precompute.partial branch January 10, 2024 23:02
radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
E.g.

(i32.add
  (select
    (i32.const 100)
    (i32.const 200)
    (..condition..)
  )
  (i32.const 50)
)

;; =>

(select
  (i32.const 150)
  (i32.const 250)
  (..condition..)
)

We cannot fully precompute the select, but we can "partially precompute" it, by precomputing
its arms using the parent.

This may require looking several steps up the parent chain, which is an awkward operation in
our simple walkers, so to do it we capture stacks of parents and operate directly on them. This
is a little slower than a normal walk, so only do it when we see a promising select, and only in
-O2 and above (this makes the pass 7% or so slower; not a large cost, but best to avoid it in
-O1).
@gkdn gkdn mentioned this pull request Aug 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants