Optimization: Opcode fusion of branch and comparison instructions #712
Comments
Today I thought a bit about this issue and since this is "just" an optimization based on the
What if we just provide the
enum Instruction {
    ...
    BrI32Eq { .. },
    BrI32EqImm { .. },
    BrI64Eq { .. },
    BrI64EqImm { .. },
    BrF32Eq { .. },
    BrF32EqImm { .. },
    BrF64Eq { .. },
    BrF64EqImm { .. },
    BrI32Ne { .. },
    BrI32NeImm { .. },
    BrI64Ne { .. },
    BrI64NeImm { .. },
    BrF32Ne { .. },
    BrF32NeImm { .. },
    BrF64Ne { .. },
    BrF64NeImm { .. },
    BrI32LtS { .. },
    BrI32LtSImm { .. },
    BrI32GtS { .. },
    BrI32GtSImm { .. },
    BrI32LeS { .. },
    BrI32LeSImm { .. },
    BrI32GeS { .. },
    BrI32GeSImm { .. },
    BrI64LtS { .. },
    BrI64LtSImm { .. },
    BrI64GtS { .. },
    BrI64GtSImm { .. },
    BrI64LeS { .. },
    BrI64LeSImm { .. },
    BrI64GeS { .. },
    BrI64GeSImm { .. },
    BrF32Lt { .. },
    BrF32Gt { .. },
    BrF32Le { .. },
    BrF32Ge { .. },
    BrF64Lt { .. },
    BrF64Gt { .. },
    BrF64Le { .. },
    BrF64Ge { .. },
    ...
}
With this internal structure, as exemplified by
BrI32Eq {
    lhs: Register,
    rhs: Register,
    offset: BranchOffset16,
},
BrI32EqImm {
    lhs: Register,
    rhs: Const16,
    offset: BranchOffset16,
},
This implies that we only apply this optimization if
For example this holds true for the common cases of Wasm's
I{32,64}EqImm16 { result: Register, lhs: Register, imm: Const16::from(0) }
And if followed by a
BrI{32,64}EqImm { lhs: Register, imm: Const16::from(0) }
For most common and practical use cases I assume that a
Since
The register-machine
We can clearly see that conditional branches (
The
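For the `BrI32Eq` and `BrI32EqImm` layouts shown above, a minimal sketch of how an interpreter step might evaluate them could look like the following. The `Register`, `Const16` and `BranchOffset16` definitions here are simplified stand-ins, and the `step` function, the flat `i32` register slice and the relative-offset convention are assumptions made for illustration; they are not wasmi's actual executor.

```rust
/// Hypothetical stand-ins for the types named in the comment above.
/// The real wasmi definitions may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Register(u16);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Const16(i16);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BranchOffset16(i16);

/// Only the two variants spelled out in the comment are modelled here.
#[derive(Clone, Copy, Debug)]
enum Instruction {
    BrI32Eq { lhs: Register, rhs: Register, offset: BranchOffset16 },
    BrI32EqImm { lhs: Register, rhs: Const16, offset: BranchOffset16 },
}

/// Executes one fused compare+branch instruction located at `ip` and
/// returns the next instruction pointer.
/// Assumption: offsets are relative to the branch instruction itself.
fn step(instr: Instruction, regs: &[i32], ip: usize) -> usize {
    let (taken, offset) = match instr {
        // Both operands are read from registers.
        Instruction::BrI32Eq { lhs, rhs, offset } => {
            (regs[lhs.0 as usize] == regs[rhs.0 as usize], offset)
        }
        // The right-hand side is a 16-bit immediate, sign-extended to i32.
        Instruction::BrI32EqImm { lhs, rhs, offset } => {
            (regs[lhs.0 as usize] == i32::from(rhs.0), offset)
        }
    };
    if taken {
        (ip as isize + offset.0 as isize) as usize
    } else {
        ip + 1
    }
}
```

Packing both operands and a 16-bit offset into one variant is what limits the fused form to branches whose target fits into 16 bits.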
Blocked by: #729
In the current wasmi bytecode, comparison instructions such as I32LtS, I64GtU etc. and conditional branch instructions such as BrIfNez and BrIfEqz could be fused since they are most often used together. Executing fewer instructions usually results in improved performance for interpreters like wasmi.
For example the following Wasm bytecode:
Which represents something similar to:
Could be reduced to the following new wasmi instruction:
Of course this means that we need to add plenty of new wasmi bytecode instructions, which is a big drawback of this potential optimization. Due to the increased number of instructions it is also unclear whether this optimization actually improves performance. We are not able to replace the old unfused wasmi instructions with the new wasmi instructions since they might still occur in isolation.
Bytecode Changes
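The fusion idea can be sketched as a small peephole rule over a pair of adjacent instructions. Everything in this sketch is hypothetical: the `Instr` enum, its field names, the `try_fuse` function and the `result_is_temporary` flag are illustrative and not part of wasmi. The flag encodes the assumption that the comparison result is not read anywhere else; otherwise the unfused comparison has to stay.

```rust
/// A deliberately tiny instruction set: one comparison, one conditional
/// branch, and the fused form of the two. Registers are plain `u16` indices.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Instr {
    /// Unfused comparison: `result = (lhs < rhs)` using signed i32 ordering.
    I32LtS { result: u16, lhs: u16, rhs: u16 },
    /// Unfused conditional branch: jump by `offset` if `condition != 0`.
    BrIfNez { condition: u16, offset: i16 },
    /// Fused form: jump by `offset` if `lhs < rhs` (signed).
    BrI32LtS { lhs: u16, rhs: u16, offset: i16 },
}

/// Tries to fuse a comparison with the conditional branch that follows it.
///
/// Fusion only applies when the branch tests exactly the register written
/// by the comparison, and when that register is a temporary that nothing
/// else reads afterwards.
fn try_fuse(cmp: Instr, branch: Instr, result_is_temporary: bool) -> Option<Instr> {
    match (cmp, branch) {
        (Instr::I32LtS { result, lhs, rhs }, Instr::BrIfNez { condition, offset })
            if result == condition && result_is_temporary =>
        {
            // The branch offset is carried over unchanged; a real translator
            // would resolve labels after fusion has shrunk the code.
            Some(Instr::BrI32LtS { lhs, rhs, offset })
        }
        _ => None,
    }
}
```

Returning `None` leaves both original instructions in place, which is why the unfused variants cannot be removed from the instruction set.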
Compare + Branch
New wasmi bytecode branch instructions include:
- br.{i32, i64, f32, f64}.{eq, ne}: 8 new instructions.
- br.{i32, i64}.{lt, le, gt, ge}{_s, _u}: 16 new instructions.
- br.{f32, f64}.{lt, le, gt, ge}: 8 new instructions.
In total: 32 new branch+compare instructions. Furthermore, for immediate versions we are going to need yet another 32 branch+compare with immediate instructions, totalling 64 instructions.
Note: We only need immediate versions for the second operand since bytecode where the first operand is an immediate value can always be rewritten as follows: br.i32.lt_s 42 reg0 -> br.i32.gt_s reg0 42.
Note: A br_eqz can be represented by br.{i32, i64}.eq_imm reg 0, where the _imm suffix denotes a bytecode instruction that takes an immediate value as its second operand.
Compare + Return
Beyond branch instructions we could, and probably should, also introduce new return instruction variants for all the comparators, since a branch to the outermost level is simply a return instruction and wasmi already has conditional return instructions in return.nez and return.eqz, which act as living examples.
- return.{i32, i64, f32, f64}.{eq, ne}: 8 new instructions.
- return.{i32, i64}.{lt, le, gt, ge}{_s, _u}: 16 new instructions.
- return.{f32, f64}.{lt, le, gt, ge}: 8 new instructions.
Which is yet another 32 new instructions.
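The operand-swapping note and the instruction counts for the branch and return families can be checked with a short sketch. The `Cmp` enum, `swap_operands`, `eval` and `count_fused_variants` are hypothetical helpers written for this illustration only. Signedness is left out of `Cmp` because swapping operands does not change it.

```rust
/// The six comparators used by the proposed fused instructions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Cmp {
    Eq,
    Ne,
    Lt,
    Le,
    Gt,
    Ge,
}

impl Cmp {
    /// Returns the comparator that yields the same result once the two
    /// operands are swapped, e.g. `42 < x` is the same test as `x > 42`.
    fn swap_operands(self) -> Cmp {
        match self {
            Cmp::Eq => Cmp::Eq,
            Cmp::Ne => Cmp::Ne,
            Cmp::Lt => Cmp::Gt,
            Cmp::Le => Cmp::Ge,
            Cmp::Gt => Cmp::Lt,
            Cmp::Ge => Cmp::Le,
        }
    }

    /// Evaluates the comparator on two signed 32-bit integers.
    fn eval(self, a: i32, b: i32) -> bool {
        match self {
            Cmp::Eq => a == b,
            Cmp::Ne => a != b,
            Cmp::Lt => a < b,
            Cmp::Le => a <= b,
            Cmp::Gt => a > b,
            Cmp::Ge => a >= b,
        }
    }
}

/// Counts the register-register variants of one fused family
/// (either `br.*` or `return.*`) as listed above.
fn count_fused_variants() -> usize {
    let eq_ne = 4 * 2; // {i32, i64, f32, f64} x {eq, ne}
    let int_ordered = 2 * 4 * 2; // {i32, i64} x {lt, le, gt, ge} x {_s, _u}
    let float_ordered = 2 * 4; // {f32, f64} x {lt, le, gt, ge}
    eq_ne + int_ordered + float_ordered
}
```

Because every comparator has a swapped counterpart within the same set, a translator that sees an immediate on the left can always move it to the right, so only second-operand immediate variants are needed.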
Blocked By
This optimization is blocked by the implementation of the register machine bytecode.
Conclusion
The biggest advantage of this potential optimization is that those fused instruction sequences usually occur within loops where they would have the biggest positive impact on the performance of the executed Wasm program.