
Change SIMD Loop from Fast to only reassoc/contract #49405

Merged: 8 commits into master on Jun 28, 2023

Conversation

@vchuravy (Member):

Addresses #49387

@giordano (Contributor) left a comment:

Probably needs some benchmarks?

Comment on lines +126 to +127
(*K)->setHasAllowReassoc(true);
(*K)->setHasAllowContract(true);
Contributor:

Can use setFastMathFlags to set multiple flags in one go?

Contributor:

Can probably use setHasNoSignedZeros() as well. It can help when you initialize loops at zero.

Member:

That might be going a bit too far, though I kind of want to just expose all of the fastmath flags separately. @fastmath implies too much and most of it is unsafe and useless.

Member:

IMO no signed zeros is reasonable: signed zeros don't do much, and they prevent a lot of useful transformations (e.g. rewriting 0 - x as -x).
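
A quick REPL illustration (not from the thread, just to show why that rewrite needs the flag):

julia> 0.0 - 0.0
0.0

julia> -(0.0)
-0.0

so folding 0 - x into -x flips the sign of the result when x == 0.0, which is only allowed if the compiler may assume signed zeros don't matter.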

@mikmoore (Contributor) commented Apr 18, 2023:

"reassoc" is clearly indicated by the help?> @simd documentation, but "contract" is not. If "contract" is included, the documentation should be updated to indicate so.

That said, I would generally favor just "reassoc" and someone can manually add @fastmath or muladd for additional optimizations where they want.

@giordano (Contributor):

> Probably needs some benchmarks?

I did a very quick benchmark on A64FX with the function

function sumsimd(v)
    sum = zero(eltype(v))
    @simd for x in v
        sum += x
    end
    return sum
end

and got pretty much the same results. On 5da8d5f (the latest build I have easy access to):

julia> @btime sumsimd(x) setup=(x=randn(Float16, 1_000_000));
  42.070 μs (0 allocations: 0 bytes)

julia> @btime sumsimd(x) setup=(x=randn(Float32, 1_000_000));
  79.340 μs (0 allocations: 0 bytes)

julia> @btime sumsimd(x) setup=(x=randn(Float64, 1_000_000));
  189.532 μs (0 allocations: 0 bytes)

This PR:

julia> @btime sumsimd(x) setup=(x=randn(Float16, 1_000_000));
  42.020 μs (0 allocations: 0 bytes)

julia> @btime sumsimd(x) setup=(x=randn(Float32, 1_000_000));
  79.041 μs (0 allocations: 0 bytes)

julia> @btime sumsimd(x) setup=(x=randn(Float64, 1_000_000));
  185.452 μs (0 allocations: 0 bytes)

The difference in code_llvm is

  %26 = fadd fast <vscale x 2 x double> %vec.phi, %wide.load
  %27 = fadd fast <vscale x 2 x double> %vec.phi14, %wide.load17
  %28 = fadd fast <vscale x 2 x double> %vec.phi15, %wide.load18
  %29 = fadd fast <vscale x 2 x double> %vec.phi16, %wide.load19
  %index.next = add nuw i64 %index, %10
  %30 = icmp eq i64 %index.next, %n.vec
  br i1 %30, label %middle.block, label %vector.body

middle.block:                                     ; preds = %vector.body
  %bin.rdx = fadd fast <vscale x 2 x double> %27, %26
  %bin.rdx20 = fadd fast <vscale x 2 x double> %28, %bin.rdx
  %bin.rdx21 = fadd fast <vscale x 2 x double> %29, %bin.rdx20
  %31 = call fast double @llvm.vector.reduce.fadd.nxv2f64(double -0.000000e+00, <vscale x 2 x double> %bin.rdx21)

vs

  %26 = fadd reassoc contract <vscale x 2 x double> %vec.phi, %wide.load
  %27 = fadd reassoc contract <vscale x 2 x double> %vec.phi14, %wide.load17
  %28 = fadd reassoc contract <vscale x 2 x double> %vec.phi15, %wide.load18
  %29 = fadd reassoc contract <vscale x 2 x double> %vec.phi16, %wide.load19
  %index.next = add nuw i64 %index, %10
  %30 = icmp eq i64 %index.next, %n.vec
  br i1 %30, label %middle.block, label %vector.body

middle.block:                                     ; preds = %vector.body
  %bin.rdx = fadd reassoc contract <vscale x 2 x double> %27, %26
  %bin.rdx20 = fadd reassoc contract <vscale x 2 x double> %28, %bin.rdx
  %bin.rdx21 = fadd reassoc contract <vscale x 2 x double> %29, %bin.rdx20
  %31 = call reassoc contract double @llvm.vector.reduce.fadd.nxv2f64(double -0.000000e+00, <vscale x 2 x double> %bin.rdx21)                                                                                    

as expected. At least in this specific case it doesn't look like the full set of fast flags makes a difference compared to only reassoc + contract, which is good news, I guess.

@giordano (Contributor):

With only reassoc:

julia> @btime sumsimd(x) setup=(x=randn(Float16, 1_000_000));
  41.820 μs (0 allocations: 0 bytes)

julia> @btime sumsimd(x) setup=(x=randn(Float32, 1_000_000));
  79.131 μs (0 allocations: 0 bytes)

julia> @btime sumsimd(x) setup=(x=randn(Float64, 1_000_000));
  184.232 μs (0 allocations: 0 bytes)

For reference, code_llvm:

  %23 = fadd reassoc <vscale x 2 x double> %vec.phi, %wide.load
  %24 = fadd reassoc <vscale x 2 x double> %vec.phi9, %wide.load12
  %25 = fadd reassoc <vscale x 2 x double> %vec.phi10, %wide.load13
  %26 = fadd reassoc <vscale x 2 x double> %vec.phi11, %wide.load14
  %index.next = add nuw i64 %index, %7
  %27 = icmp eq i64 %index.next, %n.vec
  br i1 %27, label %middle.block, label %vector.body

middle.block:                                     ; preds = %vector.body
  %bin.rdx = fadd reassoc <vscale x 2 x double> %24, %23
  %bin.rdx15 = fadd reassoc <vscale x 2 x double> %25, %bin.rdx
  %bin.rdx16 = fadd reassoc <vscale x 2 x double> %26, %bin.rdx15
  %28 = call reassoc double @llvm.vector.reduce.fadd.nxv2f64(double -0.000000e+00, <vscale x 2 x double> %bin.rdx16)                                                                                             

Sounds like reassoc alone would be enough to retain the performance speedup of @simd at least in this simple case.

@vchuravy (Member, Author):

Contract would only matter for a loop that implements the equivalent of prod

@simonbyrne (Contributor):

@vchuravy I'm not sure about prod, but contract would make a difference for a vector dot product.

@simonbyrne (Contributor):

@giordano if you only use reassoc, what does sumsimd([-0.0]) give?

@giordano (Contributor):

julia> sumsimd([-0.0])
0.0

which is the same as what I get now on master.
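
(For reference, a plausible explanation: the accumulator starts at zero(eltype(v)) == +0.0, and under the default rounding mode

julia> 0.0 + (-0.0)
0.0

so with only reassoc the reduction still returns +0.0 here; no-signed-zeros isn't needed for this case.)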

@giordano (Contributor) commented May 28, 2023:

> but contract would make a difference for a vector dot product.

julia> using BenchmarkTools

julia> function dotsimd(x::AbstractVector{T}, y::AbstractVector{T}) where {T}
           sum = zero(T)
           @simd for idx in eachindex(x, y)
               sum += x[idx] * y[idx]
           end
           return sum
       end
dotsimd (generic function with 1 method)

5da8d5f

julia> @btime dotsimd(x, y) setup=(T=Float16; N=1_000_000; x=randn(T, N); y=randn(T, N));
  60.581 μs (0 allocations: 0 bytes)

julia> @btime dotsimd(x, y) setup=(T=Float32; N=1_000_000; x=randn(T, N); y=randn(T, N));
  132.881 μs (0 allocations: 0 bytes)

julia> @btime dotsimd(x, y) setup=(T=Float64; N=1_000_000; x=randn(T, N); y=randn(T, N));
  299.743 μs (0 allocations: 0 bytes)

This PR (with reassoc and contract):

julia> @btime dotsimd(x, y) setup=(T=Float16; N=1_000_000; x=randn(T, N); y=randn(T, N));
  100.391 μs (0 allocations: 0 bytes)

julia> @btime dotsimd(x, y) setup=(T=Float32; N=1_000_000; x=randn(T, N); y=randn(T, N));
  140.891 μs (0 allocations: 0 bytes)

julia> @btime dotsimd(x, y) setup=(T=Float64; N=1_000_000; x=randn(T, N); y=randn(T, N));
  317.133 μs (0 allocations: 0 bytes)

Only reassoc:

julia> @btime dotsimd(x, y) setup=(T=Float16; N=1_000_000; x=randn(T, N); y=randn(T, N));
  99.511 μs (0 allocations: 0 bytes)

julia> @btime dotsimd(x, y) setup=(T=Float32; N=1_000_000; x=randn(T, N); y=randn(T, N));
  137.811 μs (0 allocations: 0 bytes)

julia> @btime dotsimd(x, y) setup=(T=Float64; N=1_000_000; x=randn(T, N); y=randn(T, N));
  304.733 μs (0 allocations: 0 bytes)

> Contract would only matter for a loop that implements the equivalent of prod

julia> function prodsimd(v)
           prod = one(eltype(v))
           @simd for x in v
               prod *= x
           end
           return prod
       end
prodsimd (generic function with 1 method)

5da8d5f

julia> @btime prodsimd(x) setup=(x=randn(Float16, 1_000_000));
  162.721 μs (0 allocations: 0 bytes)

julia> @btime prodsimd(x) setup=(x=randn(Float32, 1_000_000));
  323.323 μs (0 allocations: 0 bytes)

julia> @btime prodsimd(x) setup=(x=randn(Float64, 1_000_000));
  658.906 μs (0 allocations: 0 bytes)

PR:

julia> @btime prodsimd(x) setup=(x=randn(Float16, 1_000_000));
  162.621 μs (0 allocations: 0 bytes)

julia> @btime prodsimd(x) setup=(x=randn(Float32, 1_000_000));
  322.863 μs (0 allocations: 0 bytes)

julia> @btime prodsimd(x) setup=(x=randn(Float64, 1_000_000));
  659.456 μs (0 allocations: 0 bytes)

Only reassoc:

julia> @btime prodsimd(x) setup=(x=randn(Float16, 1_000_000));
  162.572 μs (0 allocations: 0 bytes)

julia> @btime prodsimd(x) setup=(x=randn(Float32, 1_000_000));
  323.513 μs (0 allocations: 0 bytes)

julia> @btime prodsimd(x) setup=(x=randn(Float64, 1_000_000));
  658.036 μs (0 allocations: 0 bytes)

(unless I'm missing something or misunderstood?)

@chriselrod (Contributor) commented May 29, 2023:

  1. Dot products are generally going to be memory bound when they're too big to fit in cache, probably even on the A64FX.
  2. Sticking with x86-64 machines: on Intel we generally get 2 loads/cycle and 2 FMAs/cycle. Given that a dot product needs 2 loads per FMA, 2 loads + 1 mul + 1 add still gives the same theoretical throughput.
  3. This is even more true on AMD Ryzen (at least Zen 3 and Zen 4; I don't know about older), where we get the same throughput for each of (a) loads, (b) FMAs/muls, and (c) adds. So on these, even if we had double the number of loads, mul + add would yield the same throughput as needing only FMAs.

However, "in the real world" outside of microbenchmarks, fmas will be at least slightly better in theory.
Fewer instructions to decode, less code to occupy the instruction cache, fewer uops to occupy the uop cache, less overall use of execution units and thus better sharing with another thread running on the same core...

Review thread on test/llvmpasses/fmf.jl (outdated, resolved)
@giordano (Contributor):

As shown by the fact that I had to change the fmf.jl test to make it pass, the main drawback of this PR (even with reassoc+contract) is that multiplication and addition don't fuse automatically. LLVM IR of dotf on x86_64 master:

  %25 = fmul contract <4 x double> %wide.load, %wide.load15
  %26 = fmul contract <4 x double> %wide.load12, %wide.load16
  %27 = fmul contract <4 x double> %wide.load13, %wide.load17
  %28 = fmul contract <4 x double> %wide.load14, %wide.load18
  %29 = fadd fast <4 x double> %vec.phi, %25
  %30 = fadd fast <4 x double> %vec.phi9, %26
  %31 = fadd fast <4 x double> %vec.phi10, %27
  %32 = fadd fast <4 x double> %vec.phi11, %28
  %index.next = add nuw i64 %index, 16
  %33 = icmp eq i64 %index.next, %n.vec
  br i1 %33, label %middle.block, label %vector.body

middle.block:                                     ; preds = %vector.body
  %bin.rdx = fadd fast <4 x double> %30, %29
  %bin.rdx19 = fadd fast <4 x double> %31, %bin.rdx
  %bin.rdx20 = fadd fast <4 x double> %32, %bin.rdx19
  %34 = call fast double @llvm.vector.reduce.fadd.v4f64(double -0.000000e+00, <4 x double> %bin.rdx20)

with this PR

  %25 = fmul <4 x double> %wide.load, %wide.load15
  %26 = fmul <4 x double> %wide.load12, %wide.load16
  %27 = fmul <4 x double> %wide.load13, %wide.load17
  %28 = fmul <4 x double> %wide.load14, %wide.load18
  %29 = fadd reassoc <4 x double> %vec.phi, %25
  %30 = fadd reassoc <4 x double> %vec.phi9, %26
  %31 = fadd reassoc <4 x double> %vec.phi10, %27
  %32 = fadd reassoc <4 x double> %vec.phi11, %28
  %index.next = add nuw i64 %index, 16
  %33 = icmp eq i64 %index.next, %n.vec
  br i1 %33, label %middle.block, label %vector.body

middle.block:                                     ; preds = %vector.body
  %bin.rdx = fadd reassoc <4 x double> %30, %29
  %bin.rdx19 = fadd reassoc <4 x double> %31, %bin.rdx
  %bin.rdx20 = fadd reassoc <4 x double> %32, %bin.rdx19
  %34 = call reassoc double @llvm.vector.reduce.fadd.v4f64(double -0.000000e+00, <4 x double> %bin.rdx20)

The good news is that fused muladd in native code can be recovered with an explicit muladd call, which results in the following LLVM IR:

  %25 = fmul contract <4 x double> %wide.load, %wide.load15
  %26 = fmul contract <4 x double> %wide.load12, %wide.load16
  %27 = fmul contract <4 x double> %wide.load13, %wide.load17
  %28 = fmul contract <4 x double> %wide.load14, %wide.load18
  %29 = fadd reassoc contract <4 x double> %vec.phi, %25
  %30 = fadd reassoc contract <4 x double> %vec.phi9, %26
  %31 = fadd reassoc contract <4 x double> %vec.phi10, %27
  %32 = fadd reassoc contract <4 x double> %vec.phi11, %28
  %index.next = add nuw i64 %index, 16
  %33 = icmp eq i64 %index.next, %n.vec
  br i1 %33, label %middle.block, label %vector.body

middle.block:                                     ; preds = %vector.body
  %bin.rdx = fadd reassoc contract <4 x double> %30, %29
  %bin.rdx19 = fadd reassoc contract <4 x double> %31, %bin.rdx
  %bin.rdx20 = fadd reassoc contract <4 x double> %32, %bin.rdx19
  %34 = call reassoc contract double @llvm.vector.reduce.fadd.v4f64(double -0.000000e+00, <4 x double> %bin.rdx20)

I presume because muladd internally enables contract.
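
For reference, such a muladd-based variant could look like this (a sketch only, not necessarily the exact code behind the IR above; dotsimd_muladd is an illustrative name):

function dotsimd_muladd(x::AbstractVector{T}, y::AbstractVector{T}) where {T}
    sum = zero(T)
    @simd for idx in eachindex(x, y)
        # muladd permits (but does not require) fusing the multiply into the add,
        # which is what reintroduces the `contract` flag on this part of the chain
        sum = muladd(x[idx], y[idx], sum)
    end
    return sum
end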

@vchuravy (Member, Author):

This is likely due to the mulAdd pass checking for fast and not contract:

if (!I.isFast())

Require `contract` only instead of full `fast` flag.
@vchuravy (Member, Author):

Okay, let's update the docs for @simd to state that we perform contract + reassoc, and then I think this should be ready?

@giordano (Contributor):

> Okay, let's update the docs for @simd to state that we perform contract + reassoc

👍

> and then I think this should be ready?

LinearAlgebra tests on aarch64 (both Linux and Darwin) are failing at

@test collect(returned_mat) ≈ α * Ac * Bc + β * Cc rtol=rtol

It's a bit concerning they're relying so much on a performance optimisation.

@gbaraldi (Member):

Maybe the latest change fixes that? But yeah it's kind of annoying that they are so brittle.

@giordano (Contributor) commented May 29, 2023:

Ok, yes, relaxing the fused-muladd requirement seems to have fixed the LinearAlgebra tests, but now I have this problem: the llvmpasses tests fail for me on aarch64 unless I apply this patch:

diff --git a/test/llvmpasses/loopinfo.jl b/test/llvmpasses/loopinfo.jl
index 20a450f6f1..ca2b307b3d 100644
--- a/test/llvmpasses/loopinfo.jl
+++ b/test/llvmpasses/loopinfo.jl
@@ -29,10 +29,10 @@ function simdf(X)
         acc += x
 # CHECK: call void @julia.loopinfo_marker(), {{.*}}, !julia.loopinfo [[LOOPINFO:![0-9]+]]
 # LOWER-NOT: llvm.mem.parallel_loop_access
-# LOWER: fadd reassoc contract double
+# LOWER: fadd reassoc {{(contract )?}}double
 # LOWER-NOT: call void @julia.loopinfo_marker()
 # LOWER: br {{.*}}, !llvm.loop [[LOOPID:![0-9]+]]
-# FINAL: fadd reassoc contract <{{(vscale x )?}}{{[0-9]+}} x double>
+# FINAL: fadd reassoc {{(contract )?}}<{{(vscale x )?}}{{[0-9]+}} x double>
     end
     acc
 end
@@ -46,7 +46,7 @@ function simdf2(X)
 # CHECK: call void @julia.loopinfo_marker(), {{.*}}, !julia.loopinfo [[LOOPINFO2:![0-9]+]]
 # LOWER: llvm.mem.parallel_loop_access
 # LOWER-NOT: call void @julia.loopinfo_marker()
-# LOWER: fadd reassoc contract double
+# LOWER: fadd reassoc {{(contract )?}}double
 # LOWER: br {{.*}}, !llvm.loop [[LOOPID2:![0-9]+]]
     end
     acc
diff --git a/test/llvmpasses/simdloop.ll b/test/llvmpasses/simdloop.ll
index e4f46f8f2b..7b4f538fc9 100644
--- a/test/llvmpasses/simdloop.ll
+++ b/test/llvmpasses/simdloop.ll
@@ -37,7 +37,7 @@ loop:
 ; CHECK: llvm.mem.parallel_loop_access
   %aval = load double, double *%aptr
   %nextv = fsub double %v, %aval
-; CHECK: fsub reassoc contract double %v, %aval
+; CHECK: fsub reassoc {{(contract )?}}double %v, %aval
   %nexti = add i64 %i, 1
   call void @julia.loopinfo_marker(), !julia.loopinfo !3
   %done = icmp sgt i64 %nexti, 500
@@ -56,7 +56,7 @@ loop:
   %aptr = getelementptr double, double *%a, i64 %i
   %aval = load double, double *%aptr
   %nextv = fsub double %v, %aval
-; CHECK: fsub reassoc contract double %v, %aval
+; CHECK: fsub reassoc {{(contract )?}}double %v, %aval
   %nexti = add i64 %i, 1
   call void @julia.loopinfo_marker(), !julia.loopinfo !2
   %done = icmp sgt i64 %nexti, 500

It isn't quite clear to me why contract is gone, and also why only on aarch64.

Edit: never mind, I had fetched the latest version of this PR into my local branch, but somehow I still had the patch which removed contract applied, which I thought I had stashed.

@giordano added the compiler:simd (instruction-level vectorization) and compiler:llvm (issues that relate to LLVM) labels on May 29, 2023.
@mikmoore (Contributor) commented May 29, 2023:

Personally, I still don't think the decision to contract should be coupled to @simd. I'll reiterate my pitch and then shut up about it, as this PR is still a huge improvement over the current behavior.

contract is probably the most boring fastmath flag as the best it can ever do is turn 2 instructions into 1. It can never enable SIMD that was otherwise impossible (except maybe in an edge case where it dodges a register spill?) or eliminate a code block. The fact that contract couldn't accelerate a dot product in the above benchmarks (which was a surprise to me and could change in a different benchmark) further diminishes the performance arguments for it.

Meanwhile, contract is the only flag that can currently be enabled independently: muladd is easy to add manually, @fastmath is even easier (if one can suffer the extra flags, or if we get #49890), and both can be used in combination with @simd. But there is still no way (or plan for a way) to disable flags.

There will absolutely be cases (although not many cases) where someone requires that operations do not contract. Complex multiplication comes to mind, where implementing it with contract can cause imag(x*x') != 0.
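
A minimal sketch of that failure mode, using an explicit muladd to stand in for what contraction may do (the values are chosen only for illustration):

a, b = 1/3, 1/7             # real and imaginary parts of x
b * a - a * b               # == 0.0: both products round identically, so imag(x*x') cancels exactly
muladd(b, a, -(a * b))      # may be a tiny nonzero value: if the multiply fuses, its product is not
                            # rounded before the subtraction, so the cancellation is no longer exact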

P.S.
Maybe this should get a news item? Some people may observe performance regressions. It would be nice (and may avoid some bug reports) to note this change and suggest that the previous behavior and performance can be recovered via @fastmath @simd.
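
For instance, a news entry could point at something like this (a sketch; sumfast is just an illustrative name):

function sumfast(v)
    acc = zero(eltype(v))
    # @fastmath rewrites the loop body to use the fast-math intrinsics,
    # restoring the full set of flags that @simd alone no longer applies
    @fastmath @simd for x in v
        acc += x
    end
    return acc
end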

@giordano (Contributor):

> The fact that contract couldn't accelerate a dot product in the above benchmarks (which was a surprise to me and could change in a different benchmark) further diminishes the performance arguments for it.

Note that the benchmarks in #49405 (comment) were taken before relaxing the requirement for fusing muladd; now I get

julia> @btime dotsimd(x, y) setup=(T=Float16; N=1_000_000; x=randn(T, N); y=randn(T, N));
  60.271 μs (0 allocations: 0 bytes)

julia> @btime dotsimd(x, y) setup=(T=Float32; N=1_000_000; x=randn(T, N); y=randn(T, N));
  135.201 μs (0 allocations: 0 bytes)

julia> @btime dotsimd(x, y) setup=(T=Float64; N=1_000_000; x=randn(T, N); y=randn(T, N));
  297.843 μs (0 allocations: 0 bytes)

which recovers the same performance as on master for Float16.

@giordano marked this pull request as ready for review on May 29, 2023.
@@ -100,7 +100,7 @@ The object iterated over in a `@simd for` loop should be a one-dimensional range
 By using `@simd`, you are asserting several properties of the loop:

 * It is safe to execute iterations in arbitrary or overlapping order, with special consideration for reduction variables.
-* Floating-point operations on reduction variables can be reordered, possibly causing different results than without `@simd`.
+* Floating-point operations on reduction variables can be reordered or contracted, possibly causing different results than without `@simd`.
@vchuravy (Member, Author):

It's actually slightly broader, I think: it's the entire reduction chain, not just the reduction operations themselves.

Contributor:

Any suggestions for the wording?

@vchuravy added the "merge me" label (PR is reviewed; merge when all tests are passing) on Jun 27, 2023.
@vtjnash merged commit 663c58d into master on Jun 28, 2023.
@vtjnash deleted the vc/simd_fastmath branch on Jun 28, 2023.
@pchintalapudi removed the "merge me" label on Jun 28, 2023.