
BatchNorm not defined (FluxMPI/Lux) #28

Closed
lwbhahahaha opened this issue Aug 7, 2023 · 1 comment · Fixed by #29
Labels
bug Something isn't working

Comments


lwbhahahaha commented Aug 7, 2023

We are trying to reproduce the example here, but we are getting an unexpected error saying BatchNorm is not defined.

Here is the code combining Lux and FluxMPI:

using Pkg
Pkg.activate("libs/")
using CUDA, Optimisers, FluxMPI, LuxCUDA, Lux, Random, Zygote
using NNlib, NNlibCUDA

FluxMPI.Init()
CUDA.allowscalar(false)

# Seeding
rng = Random.default_rng()
Random.seed!(rng, 0)

device = gpu_device()

model = Chain(Dense(2 => 4), BatchNorm(4, leakyrelu), Dense(4 => 2))
rng = Random.default_rng()
x = randn(rng, Float32, 2, 4) |> device
 
ps, st = Lux.setup(rng, model) .|> device

model(x, ps, st)

Here is the error:

UndefVarError: `batchnorm` not defined

Stacktrace:
  [1] getproperty
    @ ./Base.jl:31 [inlined]
  [2] _batchnorm_cudnn!(running_mean::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, running_var::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, scale::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, bias::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, momentum::Float32, eps::Float32, #unused#::Val{true})
    @ LuxLibLuxCUDAExt ~/.julia/packages/LuxLib/2CJpI/ext/LuxLibLuxCUDAExt.jl:42
  [3] #batchnorm#1
    @ ~/.julia/packages/LuxLib/2CJpI/ext/LuxLibLuxCUDAExt.jl:30 [inlined]
  [4] batchnorm
    @ ~/.julia/packages/LuxLib/2CJpI/ext/LuxLibLuxCUDAExt.jl:20 [inlined]
  [5] (::BatchNorm{true, true, Float32, typeof(relu), typeof(zeros32), typeof(ones32)})(x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, ps::NamedTuple{(:scale, :bias), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, st::NamedTuple{(:running_mean, :running_var, :training), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Val{true}}})
    @ Lux ~/.julia/packages/Lux/5YzHA/src/layers/normalize.jl:135
  [6] apply
    @ ~/.julia/packages/LuxCore/yC3wg/src/LuxCore.jl:100 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/Lux/5YzHA/src/layers/containers.jl:0 [inlined]
  [8] applychain(layers::NamedTuple{(:layer_1, :layer_2, :layer_3), Tuple{Conv{2, true, 4, typeof(identity), typeof(glorot_uniform), typeof(zeros32)}, BatchNorm{true, true, Float32, typeof(relu), typeof(zeros32), typeof(ones32)}, Conv{2, true, 4, typeof(identity), typeof(glorot_uniform), typeof(zeros32)}}}, x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, ps::NamedTuple{(:layer_1, :layer_2, :layer_3), Tuple{NamedTuple{(:weight, :bias), Tuple{CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:scale, :bias), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}}}}, st::NamedTuple{(:layer_1, :layer_2, :layer_3), Tuple{NamedTuple{(), Tuple{}}, NamedTuple{(:running_mean, :running_var, :training), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Val{true}}}, NamedTuple{(), Tuple{}}}})
    @ Lux ~/.julia/packages/Lux/5YzHA/src/layers/containers.jl:493
  [9] (::Chain{NamedTuple{(:layer_1, :layer_2, :layer_3), Tuple{Conv{2, true, 4, typeof(identity), typeof(glorot_uniform), typeof(zeros32)}, BatchNorm{true, true, Float32, typeof(relu), typeof(zeros32), typeof(ones32)}, Conv{2, true, 4, typeof(identity), typeof(glorot_uniform), typeof(zeros32)}}}, Nothing})(x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, ps::NamedTuple{(:layer_1, :layer_2, :layer_3), Tuple{NamedTuple{(:weight, :bias), Tuple{CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:scale, :bias), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, NamedTuple{(:weight, :bias), Tuple{CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}}}}, st::NamedTuple{(:layer_1, :layer_2, :layer_3), Tuple{NamedTuple{(), Tuple{}}, NamedTuple{(:running_mean, :running_var, :training), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Val{true}}}, NamedTuple{(), Tuple{}}}})
    @ Lux ~/.julia/packages/Lux/5YzHA/src/layers/containers.jl:491
 [10] top-level scope
    @ ~/Desktop/Project BAC/BAC project/4_train_with_fluxMPI.ipynb:1
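
To narrow down where the binding goes missing, here is a minimal diagnostic sketch. The module names (LuxLib, LuxLibLuxCUDAExt, NNlibCUDA) are assumptions taken from the stack trace above and from our environment, so treat this as a rough check rather than a definitive recipe; it also assumes Julia 1.9+ for Base.get_extension:

import LuxLib, NNlibCUDA

# Returns the extension module if it loaded, `nothing` otherwise.
ext = Base.get_extension(LuxLib, :LuxLibLuxCUDAExt)
@show ext === nothing

# `false` here would mean the backend version that Pkg resolved simply does not
# define `batchnorm` (e.g. an old NNlibCUDA held back by compat bounds).
@show isdefined(NNlibCUDA, :batchnorm)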

And the Project.toml:

[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
Conda = "8f4d0f93-b110-5947-807f-2305c1781a2d"
CondaPkg = "992eb4ea-22a4-4c89-a5bb-47a3300528ab"
DICOM = "a26e6606-dd52-5f6a-a97f-4f611373d757"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
FastVision = "7bf02486-ff4c-4e73-b158-40c00866b54f"
Glob = "c27321d9-0574-5035-807b-f59d2c89b15c"
ImageDraw = "4381153b-2b60-58ae-a1ba-fd683676385f"
ImageView = "86fae568-95e7-573e-a6b2-d8a6b900c9ef"
Images = "916415d5-f1e6-5110-898d-aaa5f9f070e0"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
Lux = "b2108857-7c20-44ae-9111-449ecde12c47"
LuxCUDA = "d0bbae9a-e099-4d5b-a835-1c6931763bda"
MLDataPattern = "9920b226-0b2a-5f5f-9153-9aa70a013f8b"
MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
Metalhead = "dbeba491-748d-5e0e-a39e-b530a07fa0cc"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
NNlibCUDA = "a00861dc-f156-4864-bf3c-e6376f28a68d"
Optimisers = "3bd65402-5787-11e9-1adc-39752487f4e2"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"
PythonCall = "6099a3de-0909-46bc-b1f4-468b9a2dfc0d"
Setfield = "efcf1570-3423-57d1-acb7-fd33fddbac46"
StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
XLSX = "fdbf4ff8-1666-58a4-91e7-1b58723a45e0"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

We cleaned up the environment, and below is the new Project.toml (basically, we removed Flux, FastAI, FastVision, etc.):

[deps]
Lux = "b2108857-7c20-44ae-9111-449ecde12c47"
LuxCUDA = "d0bbae9a-e099-4d5b-a835-1c6931763bda"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

Now the model is able to produce outputs. As @avik-pal suggested on Zulip: "FastAI.jl is holding Flux and NNlib back. Can you remove that? This case should still be handled correctly in luxlib and is probably a bug. Can you open an issue so that I can patch this?", I'm creating an issue here.
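
In case it helps others: a minimal sketch of how the holdback could have been spotted without rebuilding the whole environment. The package names are just the ones from this report, and the outdated keyword assumes a reasonably recent Julia/Pkg:

using Pkg

# Show which versions of the packages involved actually resolved.
Pkg.status(["NNlib", "NNlibCUDA", "Lux", "LuxCUDA"])

# List dependencies held below their latest release; a compat holdback like the
# FastAI.jl/Flux one mentioned above shows up here.
Pkg.status(; outdated=true)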

@avik-pal avik-pal added the bug Something isn't working label Aug 8, 2023
avik-pal (Member) commented Aug 8, 2023

I realized this is something that can't be fixed without user intervention. So I added a more meaningful error message telling the user what to do instead of an obscure getproperty failure.
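
Purely as an illustration (this is not the code merged in #29): a guard along these lines is one way to turn the obscure getproperty failure into an actionable message. The backend argument is a hypothetical stand-in for whatever module the extension dispatches to:

function _assert_batchnorm_available(backend::Module)
    # Hypothetical guard, not the actual patch in #29: fail early with a message
    # that tells the user how to fix their environment instead of surfacing a
    # bare `UndefVarError` from `getproperty`.
    if !isdefined(backend, :batchnorm)
        error("$(backend).batchnorm is not defined. The resolved version of the CUDA " *
              "backend is likely too old, e.g. held back by another package's compat " *
              "bounds. Update it, or remove the package holding it back.")
    end
    return nothing
end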
