-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cranelift: Use a fixpoint loop to compute the best value for each eclass #7859
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
|
||
for (value, def) in self.func.dfg.values_and_defs() { | ||
// If the cost of this value is finite, then we've already found | ||
// its final cost. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the drive-by from the sidelines, just a possible clarification request here though after thinking about cost updates during a long drive today:
It's not immediately obvious to me why this (once finite, then final) property is the case; I'm curious what reasoning y'all have gone through on this and/or what you've observed? I think a node's cost can continue to decrease as we discover more finite costs (consider a union node: min(20, infinity) == 20
in first pass, min(20, 10) == 10
in second pass; then another node that uses that as an arg). Or is there an argument we can make why this shouldn't happen in practice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great point Chris! When @fitzgen and I were discussing the fixpoint change yesterday, we reasoned that it was okay to skip finite values because we were assuming two things:
- We would remove the behavior where
Cost
addition would saturate toMAX_COST
, notinfinity()
- As we can't produce cycles, a fixpoint would cause everything to eventually settle out to finite cost
As you pointed out, the flaw with this reasoning is that the handling of Union
values will not behave this way, instead preferring finite values to infinite.
Since addition now saturates to infinity which will ensure that Result
nodes don't appear finite until all their dependencies have been processed, what do you think about only computing the min if both arguments to a Union
are finite? I think that change would make more concrete our use of the infinity()
cost: it's a marker for where all the arguments have not yet been processed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order for a union(a, b)
to be finite but not in its final form would require one of a
or b
to finite and the other infinite, but the only way we can still have an infinite cost for an operand value when computing the cost of the current value is if the operand value's index is larger than the current value's index. That cannot happen for union values, since they are only added to the DFG after their operands.
This is, however, a pretty subtle argument, so I'd be fine skipping this early-continue
optimization. I'll land this PR without it, because that is pretty obviously correct, and if we want to experiment with different approaches to optimizing the loop from there, we can open follow up PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point Nick, sorry for muddying the waters there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay actually I was wrong, thanks Trevor for asking very pointed questions in private chat :-p
The union's operand values are always defined before the union, but if one of those operand values is a funky one where its operands are out of order, then the operand could still be infinite by the time we get to the union, and then the union's min
would drop the infinite. That would be a finite cost that is potentially not in its final form, depending on the cost we still need to compute for the still-infinite operand.
So this "optimization" of early-continuing was not correct! Bullet dodged.
This ae-graphs code is all very subtle, and we should spend some time thinking about what we can do to make things more obviously correct, even if it is just adding additional debug asserts and comments. It shouldn't take 3.5 engineers who are all intimately familiar with Cranelift a full day to diagnose and fix this kind of bug and still introduce subtle flaws in the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for clarity: since we are doing the "full" fixpoint now, even if we "drop" an operand's infinite cost via min
in one iteration of the loop, we will consider that operand's value again on the next iteration of the fix point, and eventually, as the fixpoint is reached, we will have the correct costs for everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fitzgen and @elliottt (and @alexcrichton) for taking this on and sorry for not realizing this subtle case originally!
A further optimization (which I can take on when I'm back) that occurred to me today: we could track whether we see any "forward references" (perhaps integrate this into the fixpoint loop itself, though it won't change between iterations), and exit the loop after one iteration if none exist. This is the common case, and it would avoid doing a second (no-changes) pass. This extra cost is totally fine for now IMHO (correctness first!).
I agree the code is pretty subtle; to some degree I think that's inherent to the problem, and it's already pretty comment-dense in many (not all!) areas, but I can also try to add some more top-level documentation on invariants and the like when I'm back. I'd like to try to do some more semi-formal proofs too, similar to MachBuffer's comments, to convince us that we don't have any more issues lurking (and to help understanding).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, and not trying to point fingers or anything, just trying to improve the situation for everyone. I think something like #7856 would help a lot too.
(Re-adding to merge queue after misunderstanding regarding #7859 (comment)) |
6eced83
to
370fb43
Compare
Given that the same riscv64 failure happened twice in a row my guess is that it's probably a deterministic failure rather than a spurious failure. That may mean that a preexisting riscv64 lowering rule is buggy and this is starting to expose that. I'll note though that I haven't attempted to reproduce locally yet. |
Ah yes I can reproduce locally:
No output on CI due to rayon-rs/rayon#1066 I think, not that it's actually a bug in rayon but an unfortunate consequence. |
I ran the
I'm still going to try to find a smaller one before trying to figure out which rule is causing issues |
Thanks Afonso! |
Here's another case that it found, this one for AArch64. Testcase
This one is interesting to me because almost all of this is dead code, but if we minimize it, it no longer crashes 👀 . The trace log states the following:
So it does optimize away the deadcode internally, but then still tries to elaborate some of the previously eliminated instructions. Which doesn't make sense to me, but I haven't kept up with the inner workings of the egraphs stuff. I'm not familiar enough with egraphs to be able to debug this, but if you need any help reworking one of the lowering rules let me know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Just one optional suggestion.
…ass (bytecodealliance#7859) * Cranelift: Use a fixpoint loop to compute the best value for each eclass Fixes bytecodealliance#7857 * Remove fixpoint loop early-continue optimization * Add document describing optimization rule invariants * Make select optimizations use subsume * Remove invalid debug assert * Remove now-unused methods * Add commutative adds to cost tests
…ass (#7859) (#7878) * Cranelift: Use a fixpoint loop to compute the best value for each eclass Fixes #7857 * Remove fixpoint loop early-continue optimization * Add document describing optimization rule invariants * Make select optimizations use subsume * Remove invalid debug assert * Remove now-unused methods * Add commutative adds to cost tests
…ass (bytecodealliance#7859) * Cranelift: Use a fixpoint loop to compute the best value for each eclass Fixes bytecodealliance#7857 * Remove fixpoint loop early-continue optimization * Add document describing optimization rule invariants * Make select optimizations use subsume * Remove invalid debug assert * Remove now-unused methods * Add commutative adds to cost tests
* Guard recursion in `will_simplify_with_ireduce` (#7882) Add a test to expose issues with unbounded recursion through `iadd` during egraph rewrites, and bound the recursion of `will_simplify_with_ireduce`. Fixes #7874 Co-authored-by: Nick Fitzgerald <[email protected]> * Cranelift: Use a fixpoint loop to compute the best value for each eclass (#7859) * Cranelift: Use a fixpoint loop to compute the best value for each eclass Fixes #7857 * Remove fixpoint loop early-continue optimization * Add document describing optimization rule invariants * Make select optimizations use subsume * Remove invalid debug assert * Remove now-unused methods * Add commutative adds to cost tests * Add missing subsume uses in egraph rules (#7879) * Fix a few egraph rules that needed `subsume` There were a few rules that dropped value references from the LHS without using subsume. I think they were probably benign as they produced constant results, but this change is in the spirit of our revised guidelines for egraph rules. * Augment egraph rule guideline 2 to talk about constants * Update release notes --------- Co-authored-by: Nick Fitzgerald <[email protected]>
This commit is born out of a fuzz bug on x64 that was discovered recently. Today, on `main`, and in the 17.0.1 release Wasmtime will panic when compiling this wasm module for x64: (module (func (result v128) i32.const 0 i32x4.splat f64x2.convert_low_i32x4_u)) panicking with: thread '<unnamed>' panicked at /home/alex/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cranelift-codegen-0.104.1/src/machinst/lower.rs:766:21: should be implemented in ISLE: inst = `v6 = fcvt_from_uint.f64x2 v13 ; v13 = const0`, type = `Some(types::F64X2)` note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace Bisections points to the "cause" of this regression as bytecodealliance#7859 which more-or-less means that this has always been an issue and that PR just happened to expose the issue. What's happening here is that egraph optimizations are turning the IR into a form that the x64 backend can't codegen. Namely there's no general purpose lowering of i64x2 being converted to f64x2. The Wasm frontend never produces this but the optimizations internally end up producing this. Notably here the result of this function is constant and what's happening is that a convert-of-a-splat is happening. In lieu of adding the full general lowering to x64 (which is perhaps overdue since this is the second or third time this panic has been triggered) I've opted to add constant propagation optimizations for int-to-float conversions. These are all based on the Rust `as` operator which has the same semantics as Cranelift. This is enough to fix the issue here for the time being.
This commit is born out of a fuzz bug on x64 that was discovered recently. Today, on `main`, and in the 17.0.1 release Wasmtime will panic when compiling this wasm module for x64: (module (func (result v128) i32.const 0 i32x4.splat f64x2.convert_low_i32x4_u)) panicking with: thread '<unnamed>' panicked at /home/alex/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cranelift-codegen-0.104.1/src/machinst/lower.rs:766:21: should be implemented in ISLE: inst = `v6 = fcvt_from_uint.f64x2 v13 ; v13 = const0`, type = `Some(types::F64X2)` note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace Bisections points to the "cause" of this regression as #7859 which more-or-less means that this has always been an issue and that PR just happened to expose the issue. What's happening here is that egraph optimizations are turning the IR into a form that the x64 backend can't codegen. Namely there's no general purpose lowering of i64x2 being converted to f64x2. The Wasm frontend never produces this but the optimizations internally end up producing this. Notably here the result of this function is constant and what's happening is that a convert-of-a-splat is happening. In lieu of adding the full general lowering to x64 (which is perhaps overdue since this is the second or third time this panic has been triggered) I've opted to add constant propagation optimizations for int-to-float conversions. These are all based on the Rust `as` operator which has the same semantics as Cranelift. This is enough to fix the issue here for the time being.
Fixes #7857