fix reset_search_direction! failure when training with GPU #1034

Merged · 2 commits merged into JuliaNLSolvers:master on Aug 7, 2023
Conversation

@wei3li (Contributor) commented Mar 24, 2023

When training with a GPU, the following error sometimes occurs, causing the optimize function to fail.

ERROR: GPU compilation of kernel #broadcast_kernel#28(CUDA.CuKernelContext, CuDeviceMatrix{Float64, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(identity), Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(identity), Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, which is not isbits:
  .args is of type Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}} which is not isbits.
    .1 is of type Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}} which is not isbits.
      .x is of type Matrix{Float64} which is not isbits.


Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/validation.jl:88
  [2] macro expansion
    @ ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:154 [inlined]
  [3] macro expansion
    @ ~/code/Programs/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:152 [inlined]
  [5] emit_julia(job::GPUCompiler.CompilerJob; validate::Bool)
    @ GPUCompiler ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/utils.jl:83
  [6] emit_julia
    @ ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/utils.jl:77 [inlined]
  [7] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:359
  [8] #221
    @ ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:354 [inlined]
  [9] JuliaContext(f::CUDA.var"#221#222"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#28", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float64, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(identity), Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}}})
    @ GPUCompiler ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:76
 [10] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:353
 [11] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:90
 [12] cufunction(f::GPUArrays.var"#broadcast_kernel#28", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float64, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(identity), Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}; name::Nothing, always_inline::Bool, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:306
 [13] cufunction
    @ ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:300 [inlined]
 [14] macro expansion
    @ ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:102 [inlined]
 [15] #launch_heuristic#245
    @ ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/gpuarrays.jl:17 [inlined]
 [16] _copyto!
    @ ~/code/Programs/.julia/packages/GPUArrays/XR4WO/src/host/broadcast.jl:65 [inlined]
 [17] materialize!
    @ ~/code/Programs/.julia/packages/GPUArrays/XR4WO/src/host/broadcast.jl:41 [inlined]
 [18] materialize!
    @ ./broadcast.jl:868 [inlined]
 [19] reset_search_direction!(state::Optim.BFGSState{CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, d::Optim.ManifoldObjective{OnceDifferentiable{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat})
    @ Optim ~/code/Programs/.julia/packages/Optim/tP8PJ/src/utilities/perform_linesearch.jl:17
 [20] perform_linesearch!(state::Optim.BFGSState{CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, d::Optim.ManifoldObjective{OnceDifferentiable{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}})
    @ Optim ~/code/Programs/.julia/packages/Optim/tP8PJ/src/utilities/perform_linesearch.jl:45
 [21] update_state!(d::OnceDifferentiable{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, state::Optim.BFGSState{CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat})
    @ Optim ~/code/Programs/.julia/packages/Optim/tP8PJ/src/multivariate/solvers/first_order/bfgs.jl:139
 [22] optimize(d::OnceDifferentiable{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, options::Optim.Options{Float64, Nothing}, state::Optim.BFGSState{CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}})
    @ Optim ~/code/Programs/.julia/packages/Optim/tP8PJ/src/multivariate/optimize/optimize.jl:54
 [23] optimize
    @ ~/code/Programs/.julia/packages/Optim/tP8PJ/src/multivariate/optimize/optimize.jl:36 [inlined]
 [24] optimize(f::NLSolversBase.InplaceObjective{Nothing, var"#fg!#8"{typeof(loss)}, Nothing, Nothing, Nothing}, initial_x::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, options::Optim.Options{Float64, Nothing}; inplace::Bool, autodiff::Symbol)
    @ Optim ~/code/Programs/.julia/packages/Optim/tP8PJ/src/multivariate/optimize/interface.jl:142
 [25] optimize(f::NLSolversBase.InplaceObjective{Nothing, var"#fg!#8"{typeof(loss)}, Nothing, Nothing, Nothing}, initial_x::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, options::Optim.Options{Float64, Nothing})
    @ Optim ~/code/Programs/.julia/packages/Optim/tP8PJ/src/multivariate/optimize/interface.jl:141

This happens because the code broadcasts between host memory and GPU memory: the Matrix constructor builds a matrix in host (CPU) memory, but when training on a GPU, state.invH is a CuArray that lives on the device.
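As a minimal illustration (hypothetical variable names; invH stands in for state.invH, and the commented-out line only mirrors the failing broadcast pattern, it is not a quote of Optim's source):

using CUDA, LinearAlgebra

n = 4
invH = CUDA.zeros(Float64, n, n)        # device array, like state.invH when training on the GPU

# Broadcasting a host-side Matrix into a device array is what triggers the
# "non-bitstype argument" KernelError shown above:
# invH .= Matrix{Float64}(I, n, n)      # Matrix lives in CPU memory

# A reset that never leaves the device avoids the problem:
fill!(invH, zero(eltype(invH)))
invH[diagind(invH)] .= one(eltype(invH))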

@wei3li (Contributor, Author) commented Mar 26, 2023

Hi @pkofod, could you please take a look at this pull request at your convenience?

@pkofod (Member) commented Apr 15, 2023

Thank you. Yes, it seems strange to require a Matrix at that point when it was not required when initializing the types. I would request that you add a test of this bugfix. Thanks!

@codecov bot commented Apr 15, 2023

Codecov Report

Merging #1034 (8ad4fab) into master (1f1258c) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

❗ Current head 8ad4fab differs from pull request most recent head 1bfc4d8. Consider uploading reports for the commit 1bfc4d8 to get more accurate results

@@            Coverage Diff             @@
##           master    #1034      +/-   ##
==========================================
- Coverage   85.40%   85.36%   -0.04%     
==========================================
  Files          43       43              
  Lines        3199     3198       -1     
==========================================
- Hits         2732     2730       -2     
- Misses        467      468       +1     
Impacted Files                           Coverage Δ
src/utilities/perform_linesearch.jl      88.57% <100.00%> (-0.32%) ⬇️

... and 1 file with indirect coverage changes


@wei3li (Contributor, Author) commented Apr 17, 2023

I would request that you add a test of this bugfix.

Hi @pkofod, this bug only occurs when optimizing with the BFGS method on a GPU. After reviewing the current test cases, I noticed that none of them run on a GPU. I am uncertain whether it is wise to introduce GPU tests solely for this bug fix. Considering that the fix passes all current CPU test cases, would it be better to keep it as is?
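For reference, a rough sketch of what such a GPU smoke test could look like (hypothetical code: it needs a CUDA-capable runner, and a simple quadratic will not necessarily exercise the reset_search_direction! path that triggered the bug):

using Test, CUDA, Optim

if CUDA.functional()
    f(x) = sum(abs2, x)               # simple quadratic objective
    g!(G, x) = (G .= 2 .* x)          # in-place gradient, stays on the device
    x0 = CUDA.fill(1.0, 10)
    res = optimize(f, g!, x0, BFGS())
    @test Optim.converged(res)
end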

@pkofod merged commit 934cee0 into JuliaNLSolvers:master on Aug 7, 2023
@pkofod (Member) commented Aug 7, 2023

Thanks

@ChrisRackauckas (Contributor) commented:

This isn't fully generic, so it breaks a lot downstream. Can it be reverted or fixed to allow generic arrays?

@pkofod (Member) commented Oct 7, 2023

I suppose it’s irrelevant given SciML/NeuralPDE.jl#751 (comment)?

@pkofod (Member) commented Oct 7, 2023

I think the original “GPU compatibility” support was contributed by a SciML contributor, but apparently a test covering the SciML-relevant code was never added.

@ChrisRackauckas (Contributor) commented:

It's not irrelevant; the downstream fix there was done by upper-bounding Optim in Optimization.jl.

@pkofod (Member) commented Oct 7, 2023

What works for ComponentArrays? Scale*I + 0*state.invH, I suppose?

@pkofod (Member) commented Oct 7, 2023

@wei3li, could you try master / 1.7.8, once it's tagged, on your GPU problem? It should now use the internal function we already use at initialization to set the invH matrices :) (_init_identity_matrix)
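For context, a sketch of the idea behind such an array-generic initializer (hypothetical name and code; it mirrors the concept of _init_identity_matrix and of the Scale*I + 0*state.invH suggestion above, not Optim's exact implementation). Deriving the matrix from x itself keeps CuArrays on the GPU and avoids the host-only Matrix constructor:

using LinearAlgebra

function init_identity_like(x::AbstractVector{T}, scale::T = one(T)) where {T}
    M = x .* x' .* false          # zero matrix with the same array/storage type as x
    M[diagind(M)] .= scale        # broadcasted diagonal fill, no scalar indexing
    return M
end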

@wei3li (Contributor, Author) commented Oct 10, 2023

@wei3li, could you try master / 1.7.8, once it's tagged, on your GPU problem? It should now use the internal function we already use at initialization to set the invH matrices :) (_init_identity_matrix)

Hi @pkofod, the GPU problem I was experiencing has been resolved in version v1.7.8. Thank you for the update!
