fix reset_search_direction! failure when training with GPU #1034

Merged · 2 commits merged into JuliaNLSolvers:master on Aug 7, 2023
Conversation

@wei3li (Contributor) commented Mar 24, 2023

When training with a GPU, the following error sometimes occurs, causing the optimize function to fail.

ERROR: GPU compilation of kernel #broadcast_kernel#28(CUDA.CuKernelContext, CuDeviceMatrix{Float64, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(identity), Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(identity), Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, which is not isbits:
  .args is of type Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}} which is not isbits.
    .1 is of type Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}} which is not isbits.
      .x is of type Matrix{Float64} which is not isbits.


Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/validation.jl:88
  [2] macro expansion
    @ ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:154 [inlined]
  [3] macro expansion
    @ ~/code/Programs/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:152 [inlined]
  [5] emit_julia(job::GPUCompiler.CompilerJob; validate::Bool)
    @ GPUCompiler ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/utils.jl:83
  [6] emit_julia
    @ ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/utils.jl:77 [inlined]
  [7] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:359
  [8] #221
    @ ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:354 [inlined]
  [9] JuliaContext(f::CUDA.var"#221#222"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#28", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float64, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(identity), Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}}})
    @ GPUCompiler ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:76
 [10] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:353
 [11] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/code/Programs/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:90
 [12] cufunction(f::GPUArrays.var"#broadcast_kernel#28", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{Float64, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(identity), Tuple{Base.Broadcast.Extruded{Matrix{Float64}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}; name::Nothing, always_inline::Bool, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:306
 [13] cufunction
    @ ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:300 [inlined]
 [14] macro expansion
    @ ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/compiler/execution.jl:102 [inlined]
 [15] #launch_heuristic#245
    @ ~/code/Programs/.julia/packages/CUDA/ZdCxS/src/gpuarrays.jl:17 [inlined]
 [16] _copyto!
    @ ~/code/Programs/.julia/packages/GPUArrays/XR4WO/src/host/broadcast.jl:65 [inlined]
 [17] materialize!
    @ ~/code/Programs/.julia/packages/GPUArrays/XR4WO/src/host/broadcast.jl:41 [inlined]
 [18] materialize!
    @ ./broadcast.jl:868 [inlined]
 [19] reset_search_direction!(state::Optim.BFGSState{CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, d::Optim.ManifoldObjective{OnceDifferentiable{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat})
    @ Optim ~/code/Programs/.julia/packages/Optim/tP8PJ/src/utilities/perform_linesearch.jl:17
 [20] perform_linesearch!(state::Optim.BFGSState{CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, d::Optim.ManifoldObjective{OnceDifferentiable{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}})
    @ Optim ~/code/Programs/.julia/packages/Optim/tP8PJ/src/utilities/perform_linesearch.jl:45
 [21] update_state!(d::OnceDifferentiable{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, state::Optim.BFGSState{CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat})
    @ Optim ~/code/Programs/.julia/packages/Optim/tP8PJ/src/multivariate/solvers/first_order/bfgs.jl:139
 [22] optimize(d::OnceDifferentiable{Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, options::Optim.Options{Float64, Nothing}, state::Optim.BFGSState{CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, Float64, CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}})
    @ Optim ~/code/Programs/.julia/packages/Optim/tP8PJ/src/multivariate/optimize/optimize.jl:54
 [23] optimize
    @ ~/code/Programs/.julia/packages/Optim/tP8PJ/src/multivariate/optimize/optimize.jl:36 [inlined]
 [24] optimize(f::NLSolversBase.InplaceObjective{Nothing, var"#fg!#8"{typeof(loss)}, Nothing, Nothing, Nothing}, initial_x::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, options::Optim.Options{Float64, Nothing}; inplace::Bool, autodiff::Symbol)
    @ Optim ~/code/Programs/.julia/packages/Optim/tP8PJ/src/multivariate/optimize/interface.jl:142
 [25] optimize(f::NLSolversBase.InplaceObjective{Nothing, var"#fg!#8"{typeof(loss)}, Nothing, Nothing, Nothing}, initial_x::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, options::Optim.Options{Float64, Nothing})
    @ Optim ~/code/Programs/.julia/packages/Optim/tP8PJ/src/multivariate/optimize/interface.jl:141

This happens because the code broadcasts between host memory and GPU memory: the Matrix constructor builds a matrix in host (CPU) memory, but when training on a GPU, state.invH is a CuArray that lives on the device.
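As a minimal illustration (hypothetical variable names; invH stands in for state.invH, and the commented-out line only mirrors the failing broadcast pattern, it is not a quote of Optim's source):

using CUDA, LinearAlgebra

n = 4
invH = CUDA.zeros(Float64, n, n)        # device array, like state.invH when training on the GPU

# Broadcasting a host-side Matrix into a device array is what triggers the
# "non-bitstype argument" KernelError shown above:
# invH .= Matrix{Float64}(I, n, n)      # Matrix lives in CPU memory

# A reset that never leaves the device avoids the problem:
fill!(invH, zero(eltype(invH)))
invH[diagind(invH)] .= one(eltype(invH))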

@wei3li (Contributor, Author) commented Mar 26, 2023

Hi @pkofod, could you please take a look at this pull request at your convenience?

@pkofod (Member) commented Apr 15, 2023

Thank you. Yes, it seems strange to require a Matrix at that point when it was not required when initializing the types. I would request that you add a test of this bugfix. Thanks!

@codecov bot commented Apr 15, 2023

Codecov Report

Merging #1034 (8ad4fab) into master (1f1258c) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

❗ Current head 8ad4fab differs from pull request most recent head 1bfc4d8. Consider uploading reports for the commit 1bfc4d8 to get more accurate results

@@            Coverage Diff             @@
##           master    #1034      +/-   ##
==========================================
- Coverage   85.40%   85.36%   -0.04%     
==========================================
  Files          43       43              
  Lines        3199     3198       -1     
==========================================
- Hits         2732     2730       -2     
- Misses        467      468       +1     
Impacted Files                           Coverage Δ
src/utilities/perform_linesearch.jl      88.57% <100.00%> (-0.32%) ⬇️

... and 1 file with indirect coverage changes


@wei3li (Contributor, Author) commented Apr 17, 2023

I would request that you add a test of this bugfix.

Hi @pkofod, this bug only occurs when optimizing with the BFGS method on a GPU. After reviewing the current test cases, I noticed that none of them run on a GPU. I am uncertain whether it is wise to introduce GPU tests solely for this bug fix. Considering that the fix passes all current CPU test cases, would it be better to keep it as is?
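For reference, a rough sketch of what such a GPU smoke test could look like (hypothetical code: it needs a CUDA-capable runner, and a simple quadratic will not necessarily exercise the reset_search_direction! path that triggered the bug):

using Test, CUDA, Optim

if CUDA.functional()
    f(x) = sum(abs2, x)               # simple quadratic objective
    g!(G, x) = (G .= 2 .* x)          # in-place gradient, stays on the device
    x0 = CUDA.fill(1.0, 10)
    res = optimize(f, g!, x0, BFGS())
    @test Optim.converged(res)
end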

@pkofod merged commit 934cee0 into JuliaNLSolvers:master on Aug 7, 2023
@pkofod (Member) commented Aug 7, 2023

Thanks

@ChrisRackauckas (Contributor) commented:

This isn't fully generic, so it breaks a lot downstream. Can it be reverted or fixed to allow generic arrays?

@pkofod (Member) commented Oct 7, 2023

I suppose it’s irrelevant given SciML/NeuralPDE.jl#751 (comment)?

@pkofod (Member) commented Oct 7, 2023

I think the original “GPU compatibility” support was contributed by a SciML contributor, but apparently a test covering the SciML-relevant code was never added.

@ChrisRackauckas (Contributor) commented:

It's not irrelevant; the downstream fix there was done by upper-bounding Optim in Optimization.jl.

@pkofod (Member) commented Oct 7, 2023

What works for ComponentArrays? Scale*I + 0*state.invH, I suppose?

@pkofod (Member) commented Oct 7, 2023

@wei3li, could you try master / 1.7.8, once it's tagged, on your GPU problem? It should now use the internal function we already use at initialization to set the invH matrices :) (_init_identity_matrix)
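For context, a sketch of the idea behind such an array-generic initializer (hypothetical name and code; it mirrors the concept of _init_identity_matrix and of the Scale*I + 0*state.invH suggestion above, not Optim's exact implementation). Deriving the matrix from x itself keeps CuArrays on the GPU and avoids the host-only Matrix constructor:

using LinearAlgebra

function init_identity_like(x::AbstractVector{T}, scale::T = one(T)) where {T}
    M = x .* x' .* false          # zero matrix with the same array/storage type as x
    M[diagind(M)] .= scale        # broadcasted diagonal fill, no scalar indexing
    return M
end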

@wei3li (Contributor, Author) commented Oct 10, 2023

@wei3li, could you try master / 1.7.8, once it's tagged, on your GPU problem? It should now use the internal function we already use at initialization to set the invH matrices :) (_init_identity_matrix)

Hi @pkofod, the GPU problem I was experiencing has been resolved in version v1.7.8. Thank you for the update!
