feat: implement a separate TracedRNumber #161
TracedRScalar
TracedRNumber
246cd00 to 40af781
113c2b5 to 10495fc
Reactant.jl Benchmarks
Benchmark suite | Current: 8a9f06c | Previous: f2c0e8a | Ratio |
---|---|---|---|
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1318556698 ns | 1315729546 ns | 1.00 |
ViT base (256 x 256 x 3 x 32)/forward/CUDA/Lux | 213965204 ns | 212083499 ns | 1.01 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant | 6804934419 ns | 5286469750 ns | 1.29 |
ViT base (256 x 256 x 3 x 32)/forward/CPU/Lux | 18511487331 ns | 23583347555 ns | 0.78 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1255068856 ns | 1254858296 ns | 1.00 |
ViT small (256 x 256 x 3 x 4)/forward/CUDA/Lux | 8751128 ns | 8478570 ns | 1.03 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1643973338 ns | 1636237670 ns | 1.00 |
ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux | 2060290806 ns | 2376437823 ns | 0.87 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1304838533 ns | 1266018905 ns | 1.03 |
ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Lux | 93074730.5 ns | 84820407 ns | 1.10 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2253189844 ns | 2170879105 ns | 1.04 |
ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux | 6064963189 ns | 4675094299 ns | 1.30 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1297977475 ns | 1263496480 ns | 1.03 |
ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Lux | 7525710.5 ns | 7782824 ns | 0.97 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1466869310 ns | 1467043032.5 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Lux | 1361212717 ns | 1685775445 ns | 0.81 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1314689632 ns | 1306815930 ns | 1.01 |
ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Lux | 11418734.5 ns | 11611908 ns | 0.98 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant | 1752787751.5 ns | 1752808523 ns | 1.00 |
ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Lux | 2629684857 ns | 2463987825.5 ns | 1.07 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1276847603.5 ns | 1325877558.5 ns | 0.96 |
ViT small (256 x 256 x 3 x 16)/forward/CUDA/Lux | 86382264 ns | 90330187 ns | 0.96 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant | 2216207055 ns | 2213119086 ns | 1.00 |
ViT small (256 x 256 x 3 x 16)/forward/CPU/Lux | 3548437058 ns | 4023816395 ns | 0.88 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1308980482.5 ns | 1270812264 ns | 1.03 |
ViT small (256 x 256 x 3 x 32)/forward/CUDA/Lux | 116489754.5 ns | 113097539 ns | 1.03 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant | 3037687031 ns | 3042643080 ns | 1.00 |
ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux | 9576755622 ns | 8210106924.5 ns | 1.17 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1332370009 ns | 1324054039 ns | 1.01 |
ViT base (256 x 256 x 3 x 16)/forward/CUDA/Lux | 126824571 ns | 127669686.5 ns | 0.99 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant | 3191878044 ns | 3203794253 ns | 1.00 |
ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux | 7124639562 ns | 11004907984 ns | 0.65 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1280201506 ns | 1299288245 ns | 0.99 |
ViT base (256 x 256 x 3 x 4)/forward/CUDA/Lux | 83942324 ns | 96277750 ns | 0.87 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1900998509 ns | 2155333265.5 ns | 0.88 |
ViT base (256 x 256 x 3 x 4)/forward/CPU/Lux | 2374950424 ns | 2863535293.5 ns | 0.83 |
This comment was automatically generated by workflow using github-action-benchmark.
Benchmark Results
Benchmark Plots: A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
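As a quick check of how the Ratio column reads, here is a small Julia sketch. It assumes Ratio is Current divided by Previous (which matches the rows it checks) and flags rows above a 1.2 threshold that is our own arbitrary choice, not something the benchmark workflow defines.

```julia
# Ratio column = Current / Previous. Values are times in ns,
# so a ratio above 1.00 means the current commit was slower.
ratio(current, previous) = round(current / previous; digits = 2)

# A few rows copied from the benchmark table: (name, current ns, previous ns).
rows = [
    ("ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant", 6804934419, 5286469750),
    ("ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux", 6064963189, 4675094299),
    ("ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux", 7124639562, 11004907984),
]

# Flag rows above an arbitrary 1.2 threshold (our choice, not the workflow's).
slower = [name for (name, cur, prev) in rows if ratio(cur, prev) > 1.2]
println(slower)
```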
Let me just merge #163 after the CI checks pass and rebase this PR on top of it, because it adds some tests that should be passed by …
LGTM, only a minor comment: would defining TracedValue = Union{TracedRArray, TracedRNumber} simplify some things?
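A minimal Julia sketch of the suggested alias, using hypothetical stand-in types: FakeTracedRArray, FakeTracedRNumber, FakeTracedValue and is_traced are illustrative names, not Reactant definitions, and the real traced types carry MLIR values and more parameters. The point is only that one method on the Union can replace a pair of near-identical methods.

```julia
# Hypothetical stand-ins for TracedRArray and TracedRNumber.
struct FakeTracedRArray{T,N}
    shape::NTuple{N,Int}
end
struct FakeTracedRNumber{T} <: Number end

# The suggested alias: one name covering both traced kinds.
const FakeTracedValue = Union{FakeTracedRArray,FakeTracedRNumber}

# A single method handles both kinds instead of two near-identical methods.
is_traced(::FakeTracedValue) = true
is_traced(::Any) = false

arr = FakeTracedRArray{Float32,2}((3, 4))
num = FakeTracedRNumber{Float32}()
```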
Do we also need a ConcreteRScalar? If we want to extend …
For now we can leave the concrete types as they are.
I don't think we need a concrete scalar type, because indexing on a concrete array just gives a normal number.
I meant more from a tracing perspective. What should …
So I think these are two different questions.
For now I'm fine leaving them as is, i.e. compiled as constants.
For example, we may want to have a function with a user-defined index offset, e.g.
Without a concrete scalar, we need to recompile for each index. I do think we need such a scalar, even if the default conversion behavior only converts arrays.
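The recompilation concern can be illustrated with a toy cache model in plain Julia. This is a hypothetical sketch (count_compilations is not a Reactant function, and real compile caches key on much more than this): if a scalar is baked in as a constant, every distinct offset value produces a new program, whereas a traced scalar needs only one program per element type.

```julia
# Toy model (hypothetical, not Reactant's API): a compile cache keyed either on
# the scalar's value (scalar baked in as a constant) or on its type (scalar traced).
function count_compilations(offsets; trace_scalars::Bool)
    cache = Set{Any}()
    for off in offsets
        key = trace_scalars ? typeof(off) : off
        push!(cache, key)  # each new key stands for one fresh compilation
    end
    return length(cache)
end

offsets = 1:5
n_const = count_compilations(offsets; trace_scalars = false)
n_traced = count_compilations(offsets; trace_scalars = true)
println((n_const, n_traced))
```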
Even more simply, though, we may need it as the return type of a function with array inputs and a scalar output, i.e.
Looking at the above example, I also realized the mapreduce semantics are currently incorrect. For example
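For reference, and assuming the intended semantics follow Base Julia, this is how mapreduce behaves on plain arrays: without dims it returns a scalar (presumably the case a TracedRNumber would cover when tracing), and with dims it returns an array that keeps a singleton dimension. This uses only Base, no Reactant.

```julia
# Base Julia reference semantics for mapreduce (plain arrays, no Reactant).
A = [1 2; 3 4]

s = mapreduce(abs2, +, A)            # no dims: reduces to a plain scalar
d = mapreduce(abs2, +, A; dims = 1)  # with dims: keeps a singleton dimension

println(s)
println(d)
```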
We could store it like an array in the ConcreteRArray but expose it to the end user as a scalar. JuliaGPU/GPUArrays.jl#550 is something similar in the GPUArrays world. But let's keep this PR simple and avoid this for now.
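A hypothetical sketch of that idea in plain Julia: BoxedScalar (an illustrative name, not a Reactant or GPUArrays type) keeps its value in a 0-dimensional array, standing in for a device buffer, while subtyping Number so that users treat it as a scalar. Only + and [] are defined here; a real design would also need promotion, conversion and the rest of the arithmetic.

```julia
# Hypothetical sketch: the value lives in a 0-dimensional array (standing in for
# a device buffer), while the wrapper presents itself to users as a Number.
struct BoxedScalar{T<:Number} <: Number
    data::Array{T,0}
end
BoxedScalar(x::T) where {T<:Number} = BoxedScalar{T}(fill(x))

# Scalar-style access, like x[] on a 0-dim array.
Base.getindex(x::BoxedScalar) = x.data[]

# Arithmetic unwraps, computes on the plain values, and re-wraps the result.
Base.:+(a::BoxedScalar, b::BoxedScalar) = BoxedScalar(a[] + b[])

c = BoxedScalar(1) + BoxedScalar(2)
```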
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##             main     #161      +/-   ##
==========================================
- Coverage   33.09%   30.62%   -2.47%
==========================================
  Files          37       38       +1
  Lines        5107     5175      +68
==========================================
- Hits         1690     1585     -105
- Misses       3417     3590     +173
A small comment, but otherwise LGTM.
There's a bug in the concatenation of TracedRNumber. In particular, if we run this kernel:

using Reactant

x = fill(true)
x_concrete = Reactant.to_rarray(x)

function traced_vcat(x)
    a = x[]
    [a; a; a]
end

f = @compile traced_vcat(x_concrete)
f(x_concrete)

It fails with the following error:

ERROR: BoundsError: attempt to access 3-element Vector{ConcreteRArray{Bool}} at index [1]
Stacktrace:
 [1] traced_getfield
   @ ~/Developer/Reactant.jl/src/Compiler.jl:18 [inlined]
 [2] macro expansion
   @ ~/Developer/Reactant.jl/src/Compiler.jl:649 [inlined]
 [3] (::Reactant.Compiler.Thunk{Symbol("##test_vcat_reactant#229")})(args::ConcreteRArray{Bool, 0})
   @ Reactant.Compiler ~/Developer/Reactant.jl/src/Compiler.jl:665
 [4] top-level scope
   @ REPL[10]:1

The problem seems to be that it doesn't make any XLA call and sets result to three empty buffers. Check out the generated Julia code of f:

quote
    #= /Users/mofeing/Developer/Reactant.jl/src/Compiler.jl:639 =#
    #= /Users/mofeing/Developer/Reactant.jl/src/Compiler.jl:640 =#
    nothing
    #= /Users/mofeing/Developer/Reactant.jl/src/Compiler.jl:647 =#
    usbuf_1 = (getindex(args, 1)).data
    sbuf_1 = XLA.synced_buffer(usbuf_1)
    #= /Users/mofeing/Developer/Reactant.jl/src/Compiler.jl:648 =#
    ()
    #= /Users/mofeing/Developer/Reactant.jl/src/Compiler.jl:649 =#
    result = (ConcreteRArray{Bool})[ConcreteRArray{Bool, 0}(Reactant.XLA.AsyncBuffer(Reactant.XLA.Buffer(Ptr{Nothing} @0x0000000000000000), nothing), ()), ConcreteRArray{Bool, 0}(Reactant.XLA.AsyncBuffer(Reactant.XLA.Buffer(Ptr{Nothing} @0x0000000000000000), nothing), ()), ConcreteRArray{Bool, 0}(Reactant.XLA.AsyncBuffer(Reactant.XLA.Buffer(Ptr{Nothing} @0x0000000000000000), nothing), ())]
    (traced_getfield(result, $(Expr(:quote, 1)))).data = (args[1]).data
    (traced_getfield(result, $(Expr(:quote, 2)))).data = (args[1]).data
    (traced_getfield(result, $(Expr(:quote, 3)))).data = (args[1]).data
    #= /Users/mofeing/Developer/Reactant.jl/src/Compiler.jl:650 =#
    return result
end
In reply to Sergio Sánchez Ramírez's comment (Sat, Oct 5, 2024, 9:11 PM):
That actually makes sense here, since it's making an array of values that are all available as args. What's the issue?
I don't get it. Its behavior should match the 0-dim array test, since we are passing a 0-dim array and should get a vector back. From the MLIR point of view, it should be calling …
Ah, I see. I think this is perhaps a bug in cat with the new number type?
I think we won't make any XLA call here; see the Julia fallbacks for Number:

vcat(X::Number...) = hvcat_fill!(Vector{promote_typeof(X...)}(undef, length(X)), X)
hcat(X::Number...) = hvcat_fill!(Matrix{promote_typeof(X...)}(undef, 1, length(X)), X)

It is going to fill a regular array.
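A small plain-Julia illustration of that fallback, assuming a hypothetical FakeTracedNumber type that subtypes Number but defines no concatenation methods of its own (it is not Reactant's TracedRNumber). Concatenating it yields an ordinary host-side Vector rather than any traced array, which is consistent with the generated code above building an array on the host and never calling XLA.

```julia
# Hypothetical stand-in for a traced scalar; NOT Reactant's TracedRNumber.
# It subtypes Number but defines no concatenation methods of its own.
struct FakeTracedNumber <: Number
    id::Int
end

a = FakeTracedNumber(1)

# [a; a; a] lowers to vcat(a, a, a). With no method specialized for this type,
# Base's Number fallbacks apply and fill an ordinary Julia Vector, so nothing
# here would ever be traced into a concatenate op.
v = [a; a; a]
```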
Should work now:

julia> function traced_vcat(x)
a = x[];
Float64[a; a; a]
end
traced_vcat (generic function with 1 method)
julia> @code_hlo optimize=false traced_vcat(x_concrete)
Module:
module {
func.func @main(%arg0: tensor<i1>) -> (tensor<3xf64>, tensor<i1>) {
%0 = stablehlo.transpose %arg0, dims = [] : (tensor<i1>) -> tensor<i1>
%1 = stablehlo.broadcast_in_dim %0, dims = [] : (tensor<i1>) -> tensor<1xi1>
%2 = stablehlo.broadcast_in_dim %0, dims = [] : (tensor<i1>) -> tensor<1xi1>
%3 = stablehlo.broadcast_in_dim %0, dims = [] : (tensor<i1>) -> tensor<1xi1>
%4 = stablehlo.convert %1 : (tensor<1xi1>) -> tensor<1xf64>
%5 = stablehlo.convert %2 : (tensor<1xi1>) -> tensor<1xf64>
%6 = stablehlo.convert %3 : (tensor<1xi1>) -> tensor<1xf64>
%7 = stablehlo.concatenate %4, %5, %6, dim = 0 : (tensor<1xf64>, tensor<1xf64>, tensor<1xf64>) -> tensor<3xf64>
%8 = stablehlo.transpose %7, dims = [0] : (tensor<3xf64>) -> tensor<3xf64>
%9 = stablehlo.transpose %0, dims = [] : (tensor<i1>) -> tensor<i1>
return %8, %9 : tensor<3xf64>, tensor<i1>
}
}
julia> @code_hlo traced_vcat(x_concrete)
Module:
module attributes {transform.with_named_sequence} {
func.func @main(%arg0: tensor<i1>) -> tensor<3xf64> {
%0 = stablehlo.convert %arg0 : (tensor<i1>) -> tensor<f64>
%1 = stablehlo.broadcast_in_dim %0, dims = [] : (tensor<f64>) -> tensor<3xf64>
return %1 : tensor<3xf64>
}
}
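The optimized module replaces three broadcast/convert pairs plus a concatenate with a single convert followed by one broadcast_in_dim. As a plain-Julia analogy only (not Reactant code, and not how the optimizer is implemented), both shapes compute the same values:

```julia
# Plain-Julia analogy (not Reactant code) for the rewrite seen in the optimized HLO.
a = true

# Unoptimized shape: convert each of the three copies, then concatenate.
unoptimized = vcat(Float64(a), Float64(a), Float64(a))

# Optimized shape: convert once, then broadcast the scalar to length 3.
optimized = fill(Float64(a), 3)
```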
currently very WIP