-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Precompute into select arms #6212
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments on code, will look at tests next.
src/passes/Precompute.cpp
Outdated
// When we consider the first select we can see that the computation result | ||
// is always infinity, so we can modify, and the same thing happens with the | ||
// second select, causing a second modification. In this example the result | ||
// is the same either way, but at least in theory an actual problem can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to show an example where a problem occurs. It seems that if there isn't any stale information stored anywhere, doing multiple optimizations to the same expression should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacing the same expression twice from two stack traces seems risky, though. The second time we got there, we'd be reading a stale trace basically before we modify it. (Though, it is possible this happens to work out - I didn't look into it, but it seems complex and best avoided.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yeah, I don't have a good way to break this down and prove it safe.
if (isValidPrecomputation(ifFalse)) { | ||
// Wonderful, we can precompute here! The select can now contain the | ||
// computed values in its arms. | ||
select->ifTrue = ifTrue.getConstExpression(*getModule()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we essentially have to do twice as much work because we separately evaluate isValidPrecomputation
and getConstExpression
? Could we reduce overhead by combining them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I see what you mean. isValidPrecomputation
is a quick check that a precomputed result is valid for our purposes here (the actual precomputing happened earlier in precomputeExpression
). And getConstExpression
generates an Expression from a precomputed result. I'm not sure how we could combine those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I misunderstood what was happening. Disregard!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the coverage, it looks like we're missing tests where one arm is precomputable and the other arm isn't and vice versa.
Good catch on the coverage, added. Nice that the coverage tool helped here! |
Btw I had an idea for how to simplify the stack handling here, so let me try that before you read the code again. |
My idea didn't work out. I think the current code is the best we can get. (The idea was, rather than saving stacks and processing them later, to instead look at each select during the walk and note the replacement of the parent for later, and when we reach the parent, replace them at that time. So it all happened in a single walk. That mostly works, but selects can play two roles here: they can help optimize a parent, but they can also be replaced as a parent themselves, which leads to the need to carefully do the incremental work or else there can be various incorrectness bugs. Handling those things leads to more overall complexity and code size, sadly, though it might have been a little faster to avoid the stack copies.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
E.g. (i32.add (select (i32.const 100) (i32.const 200) (..condition..) ) (i32.const 50) ) ;; => (select (i32.const 150) (i32.const 250) (..condition..) ) We cannot fully precompute the select, but we can "partially precompute" it, by precomputing its arms using the parent. This may require looking several steps up the parent chain, which is an awkward operation in our simple walkers, so to do it we capture stacks of parents and operate directly on them. This is a little slower than a normal walk, so only do it when we see a promising select, and only in -O2 and above (this makes the pass 7% or so slower; not a large cost, but best to avoid it in -O1).
E.g.
We cannot fully precompute the select, but we can "partially precompute" it, by precomputing
its arms using the parent.
This may require looking several steps up the parent chain, which is an awkward operation in
our simple walkers, so to do it we capture stacks of parents and operate directly on them. This
is a little slower than a normal walk, so only do it when we see a promising select, and only in
-O2
and above (this makes the pass 7% or so slower; not a large cost, but best to avoid it in-O1
).This helps with i32 code as above, and I see small benefits on various real-world wasm files,
but the larger benefit is with GC code: if the select is over two itables in J2Wasm then we can
precompute the reads from the itable and vtables and end up with two function pointers,
which then lead to two direct calls. Doing all that will require followup work.