
Matrix multiplication by static matrix #432

Open
jonniedie opened this issue Sep 19, 2022 · 3 comments

Comments

@jonniedie

It looks like this is falling back to the standard generic_matmatmul!, which errors on scalar indexing:

julia> using LinearAlgebra, Metal, StaticArrays

julia> Metal.GPUArrays.allowscalar(false)

julia> A = one(SMatrix{3,3}); B = MtlArray(ones(Float32, 3, 1000)); C = copy(B);

julia> C .= A * B
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] assertscalar(op::String)
   @ GPUArraysCore ~/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:87
 [3] getindex
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/indexing.jl:9 [inlined]
 [4] _generic_matmatmul!(C::MtlArray{Float64, 2}, tA::Char, tB::Char, A::SMatrix{3, 3, Float64, 9}, B::MtlArray{Float32, 2}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:876
 [5] generic_matmatmul!(C::MtlArray{Float64, 2}, tA::Char, tB::Char, A::SMatrix{3, 3, Float64, 9}, B::MtlArray{Float32, 2}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:844
 [6] mul!
   @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:303 [inlined]
 [7] mul!
   @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:276 [inlined]
 [8] *(A::SMatrix{3, 3, Float64, 9}, B::MtlArray{Float32, 2})
   @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:141
 [9] top-level scope
   @ REPL[59]:1

It also happens if I skip straight to mul!

julia> mul!(C, A, B, true, false)
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] assertscalar(op::String)
   @ GPUArraysCore ~/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:87
 [3] getindex
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/indexing.jl:9 [inlined]
 [4] _generic_matmatmul!(C::MtlArray{Float32, 2}, tA::Char, tB::Char, A::SMatrix{3, 3, Float64, 9}, B::MtlArray{Float32, 2}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:876
 [5] generic_matmatmul!(C::MtlArray{Float32, 2}, tA::Char, tB::Char, A::SMatrix{3, 3, Float64, 9}, B::MtlArray{Float32, 2}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:844
 [6] mul!(C::MtlArray{Float32, 2}, A::SMatrix{3, 3, Float64, 9}, B::MtlArray{Float32, 2}, alpha::Bool, beta::Bool)
   @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:303
 [7] top-level scope
   @ REPL[60]:1

Here are the versions:

julia> versioninfo()
Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 8 on 4 virtual cores
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code

(jl_eeJwW1) pkg> st
Status `/private/var/folders/jk/zbhsb6bd25b9swwhb_3d6v340000gn/T/jl_eeJwW1/Project.toml`
  [dde4c033] Metal v0.1.1
  [90137ffa] StaticArrays v1.5.7
  [37e2e46d] LinearAlgebra
@maleadt
Member

maleadt commented Sep 22, 2022

I'm not sure if we should support this. None of the other GPU back-ends do, and it's not clear where to draw the line in supporting multiplication (or other operations) with non-GPU array types. Currently we only support these operations between (possibly wrapped) GPU arrays.
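
For reference, a workaround today is to materialize the static matrix as a GPU array first, so the multiplication stays on the supported GPU × GPU path (a sketch; the Float32 conversion is there because Apple GPUs have no Float64 support):

julia> A32 = MtlArray(Float32.(Matrix(A)));   # copy the small static matrix to the device

julia> C .= A32 * B;                          # GPU × GPU multiplication goes through the supported path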

@jonniedie
Author

I'd argue that StaticArrays should be given special consideration here (not just in Metal, but in all GPU back-ends). The other array packages are generally thin wrappers that can be applied to any array and don't need special handling, since you can just wrap whatever array you have on hand. StaticArrays, by contrast, are especially useful in GPU computations because they can be instantiated on the fly without new allocations, which saves a lot of trouble preallocating containers for intermediate calculations.
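
For illustration, here's a minimal sketch of that pattern (a hypothetical example, assuming MtlArray broadcasting handles isbits StaticArrays values the way the other back-ends do):

using Metal, StaticArrays

# Hypothetical illustration: the static matrix and vector are constructed inside
# the broadcast kernel, live in registers, and never touch the allocator.
f(x) = sum(SMatrix{2,2,Float32}(x, 0, 0, x) * SVector{2,Float32}(1, 1))

xs = MtlArray(rand(Float32, 1024))
ys = f.(xs)   # one fused GPU kernel; no per-element GPU allocations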

@maleadt maleadt transferred this issue from JuliaGPU/Metal.jl Oct 3, 2022
@maleadt maleadt changed the title Matrix multiplication by static matrix gives scalar indexing error Matrix multiplication by static matrix Oct 3, 2022
@maleadt
Member

maleadt commented Oct 3, 2022

Yeah, sure. I wouldn't be against a PR adding the necessary functionality, but it's not very critical, so I won't get to work on it myself. Realistically, you'd also want to improve the GEMM implementation here, because the fallback is awfully slow (ideally by making use of GemmKernels.jl).
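
For anyone who does pick this up, a rough sketch of the kind of method such a PR could add (hypothetical, not an actual implementation; a real version would presumably live in GPUArrays.jl, cover adjoint/transpose wrappers, and eventually use an optimized GEMM):

using LinearAlgebra, Metal, StaticArrays

# Hypothetical sketch: convert the small static matrix to a device array and
# reuse the existing GPU-to-GPU mul! path. Not optimized; it re-uploads A on
# every call.
function LinearAlgebra.mul!(C::MtlArray{T,2}, A::StaticMatrix, B::MtlArray{<:Any,2},
                            alpha::Number, beta::Number) where {T}
    A_d = MtlArray(T.(Matrix(A)))   # eltype conversion avoids unsupported Float64 storage
    return LinearAlgebra.mul!(C, A_d, B, alpha, beta)   # dispatches to the GPU path
end

With something like that in place, mul!(C, A, B, true, false) from the example above would run on the device instead of hitting scalar indexing.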
