
Calculation of the gradient of a loss function that requires the calculation of the gradient of a Flux model. #24

Closed
emmanuellujan opened this issue Sep 30, 2022 · 1 comment · Fixed by #25

Comments

@emmanuellujan (Member) commented Sep 30, 2022

This issue is about calculating the gradient of a loss function that itself requires computing the gradient of the energy, which is currently defined by a Flux neural network model. I did not find a clean and performant way to do this. For the moment I am computing the gradient of the neural network model "analytically": specifically, the gradient of a feed-forward neural network with relu activations (see here). In addition, I had to use Flux.destructure to extract the parameters of the model; a minimal sketch of this workaround is included below.
Links related to this issue:

Solving this issue would enable working with many types of neural network architectures, not only FFNNs. For example, it would allow experimenting with different models in this script, which helps to find the optimal hyperparameters of the model.
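
A minimal sketch of the current workaround, assuming a two-layer feed-forward network with relu activations. The parameter layout recovered from Flux.destructure and the helper names (unpack, grad_mlp) are illustrative assumptions, not the actual implementation:

using Flux

# Two-layer network and its flat parameter vector (plus re-builder)
mlp = Chain(Dense(2, 4, relu), Dense(4, 1))
θ, re = Flux.destructure(mlp)

# Assumed parameter layout in θ: W1 (4×2), b1 (4), W2 (1×4), b2 (1)
function unpack(θ)
    W1 = reshape(θ[1:8], 4, 2);   b1 = θ[9:12]
    W2 = reshape(θ[13:16], 1, 4); b2 = θ[17:17]
    return W1, b1, W2, b2
end

# Hand-written gradient of E(x) = sum(mlp(x)) with respect to the input x:
# dE/dx = W1' * (relu'(W1*x + b1) .* W2'), with relu'(z) = (z > 0)
function grad_mlp(θ, x)
    W1, b1, W2, _ = unpack(θ)
    z1 = W1 * x .+ b1
    return W1' * ((z1 .> 0) .* vec(W2))
end

grad_mlp(θ, Float32[0.5, 1.0])   # 2-element gradient w.r.t. x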

@emmanuellujan (Member, Author) commented with a minimal working example:

using Flux

# Input range: a grid of 2D points on [0, π] × [0, π]
xrange = 0:π/99:π
xs = [Float32.([x1, x2]) for x1 in xrange for x2 in xrange]

# Target function: E
E_analytic(x) = sin(x[1]) * cos(x[2])

# Analytical gradient of E
dE_analytic(x) = [cos(x[1]) * cos(x[2]), -sin(x[1]) * sin(x[2])]

# NN model approximating E
mlp = Chain(Dense(2, 4, Flux.σ), Dense(4, 1))
ps_mlp = Flux.params(mlp)
E(x) = sum(mlp(x))
# Gradient of the model output w.r.t. the input x;
# `first` extracts the gradient from the 1-tuple returned by `gradient`
dE(mlp, x) = first(gradient(x -> sum(mlp(x)), x))

# Loss: match the model's input gradient to the analytical gradient
loss(x, y) = Flux.Losses.mse(x, y)

# Training: differentiating the loss requires differentiating through dE,
# i.e. a nested (second-order) gradient of the Flux model
epochs = 10; opt = Flux.Adam(0.1)
for _ in 1:epochs
    g = gradient(() -> loss(reduce(vcat, dE.([mlp], xs)),
                            reduce(vcat, dE_analytic.(xs))), ps_mlp)
    Flux.Optimise.update!(opt, ps_mlp, g)
    l = loss(reduce(vcat, dE.([mlp], xs)),
             reduce(vcat, dE_analytic.(xs)))
    println("loss: ", l)
    # GC.gc()
end
  • Note: NeuralPDE, a project with similarities to this one (training MLPs using Julia ML abstractions), moved from Flux to Lux.
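
For context, a short sketch of what the same model looks like in Lux, where parameters are explicit rather than stored inside the layers. This is only an illustration of the explicit-parameter style (API names as in current Lux), not a tested fix for this issue:

using Lux, Random, Zygote

rng = Random.default_rng()
model = Chain(Dense(2 => 4, sigmoid), Dense(4 => 1))
ps, st = Lux.setup(rng, model)          # explicit parameters and state

# Energy and its gradient w.r.t. the input, with parameters passed explicitly
E(x, ps) = sum(first(model(x, ps, st)))
dE(x, ps) = only(Zygote.gradient(x -> E(x, ps), x))

dE(Float32[0.5, 1.0], ps)               # 2-element gradient w.r.t. x
# The outer gradient of the loss w.r.t. ps would still require
# differentiating through dE, which is the crux of this issue.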
