Active learning refactor #25
Conversation
Large updates to the structure of PotentialLearning.jl. This includes structs for containing various types of data (Energy, Force, Descriptors...) and scales of data (Configuration, Dataset...). Includes active learning components including features, distances, kernel methods. Includes initial fitting for linear and abstract potentials.
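For orientation, a rough sketch of how such types might nest (field names and exact definitions here are illustrative, not the ones in this PR):
abstract type AtomicData end
struct Energy{T<:Real} <: AtomicData
    d::T                          # configuration energy, e.g. in eV
end
struct Force{T<:Real} <: AtomicData
    f::Vector{T}                  # force components on a single atom
end
struct LocalDescriptors{T<:Real} <: AtomicData
    d::Vector{Vector{T}}          # one descriptor vector per atom
end
# A Configuration bundles whatever data is available for one atomic configuration,
# and a DataSet is simply a collection of Configurations that most methods dispatch on.
struct Configuration
    data::Dict{DataType,AtomicData}
end
struct DataSet
    configs::Vector{Configuration}
end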
Adding learning modules, modifying subset selection methods, fine-tuning data and configuration types, adding dimension reduction. Working on tests.
Fixing bugs and errors
Thank you very much, I will review it carefully early next week.
examples/Sodium/fit_sodium_dft.jl
Outdated
include("./src/PotentialLearning.jl") | ||
using .PotentialLearning |
Nitpick: we should be doing just using PotentialLearning
maybe needs a push!(Base.LOAD_PATH, dirname(@__DIR__))
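i.e. something like the following at the top of the example script (the relative path is a guess at the repo layout):
# Assumes the script sits in examples/Sodium/, two levels below the package root.
push!(Base.LOAD_PATH, dirname(dirname(@__DIR__)))
using PotentialLearning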
Good call
examples/Sodium/fit_sodium_dft.jl
Outdated
# Import Raw Data
energies, forces = JLD.load("examples/Sodium/data/liquify_sodium_dftk_calculations.jld", "energies", "forces");
e = [ ustrip(uconvert(u"eV", sum(collect(values(di))) * u"hartree")) for di in energies];
Suggested change:
e = [ ustrip(uconvert(u"eV", sum(collect(values(di))) * u"hartree")) for di in energies];
e = [ Energy(ustrip(uconvert(u"eV", sum(collect(values(di))) * u"hartree"))) for di in energies];
Co-authored-by: Valentin Churavy <[email protected]>
…mit/PotentialLearning.jl into active_learning_refactor
Busy week. I will look at it next week.
Hi! :-) Some observations below
using Flux
# Range
xrange = 0:π/99:π
xs = [ Float32.([x1, x2]) for x1 in xrange for x2 in xrange]
# NN model: multi-layer perceptron (MLP)
mlp = Chain(Dense(2,4, Flux.σ),Dense(4,1))
ps_mlp = Flux.params(mlp)
E_mlp(x) = sum(mlp(x))
dE_mlp(E_mlp, x) = sum(gradient(E_mlp, x))
# Computing "nested" gradient
gradient(()->sum(sum.(dE_mlp.([E_mlp], xs))), ps_mlp) > Closes #22 and #21 - Have added DPP subset selector (as well as random subset selector). Unit testing is sufficient.
Thank you for your patience
I would strongly recommend against using BFGS for neural ACE training, let alone having it be a built-in feature of PotentialLearning.jl. We can provide further scaffolding to allow users to define their own optimizer, particularly in learn.jl.
The data is assumed to be in the form of a DataSet struct, and logprob is assumed to be a function that takes params::Vector{T} as its argument; the learning routine then optimizes over those parameters. I think this is a fairly flexible approach that allows users to transform their problem as needed.
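For illustration, a minimal sketch of what such a logprob could look like (the helper make_logprob and the Gaussian-noise assumption are mine, not the PR's API):
# Build a logprob closure over predicted and reference energies; e_pred(params)
# is assumed to return predicted energies for a flat parameter vector.
function make_logprob(e_pred, e_ref; σ = 0.1)
    logprob(params::Vector{<:Real}) = -sum(abs2, e_pred(params) .- e_ref) / (2σ^2)
    return logprob
end
# E.g. for a linear model with descriptor matrix A: logprob = make_logprob(β -> A * β, e_ref)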
Yes, will be fixed in the next push.
I suppose something like this works:
using Flux
# Range
xrange = 0:π/99:π
xs = [ Float32.([x1, x2]) for x1 in xrange for x2 in xrange]
# NN model: multi-layer perceptron (MLP)
mlp = Chain(Dense(2,4, Flux.σ),Dense(4,1))
ps_mlp = Flux.params(mlp)
E(x) = sum(mlp(x))
dE(m) = x -> gradient(x -> sum(m(x)), x)[1]
gradient(()->sum(dE(mlp)(xs[1])), ps_mlp)
The linked code uses HDBSCAN, which itself has some limitations vis-a-vis DPP subsampling. In any case, an implementation of HDBSCAN is already present in PotentialLearning.jl. Perhaps this can be made more robust, but that could be another issue.
The load_datasets function should be deprecated in favor of the new methods for dataset input and construction. The structures in this PR are better structured and more robust.
Not directly; I am adding all of the workflows to the /example folder, but in such a way that they are adapted to the current PR.
Fixed an error in learn.jl, added workflow files and updated Project.toml
I am open to better options that can exceed the accuracy and performance of linear ACE. I think PotentialLearning.jl should provide built-in features that allow users to leverage the advantages of neural ACE (or non-linear ACE) over linear ACE.
Why BFGS here? Although (at least in my experience) this solver could not handle large NN models (several hundred parameters) and large datasets, these limitations can be addressed by using small NNs, subsampling of the original dataset, and minibatches during training. BFGS showed very fast convergence for small MLP ACE models (e.g. 225 parameters: a couple of dense layers with 8 nodes per layer) trained using medium-sized datasets (e.g. 57800 data points: energies and force components), something I could not get using other Flux solvers, like ADAM, which showed a slow convergence rate (and did not achieve the accuracy I need in the time I am willing to wait). Furthermore, larger NN models can be addressed by using parallel ensembles. It is this rapid convergence that allows for better accuracy (in different training and testing metrics) compared to linear ACE. BFGS is also used in NeuralPDE 1, 2, alone or combined with ADAM (btw, they are using Lux now) to train relatively small MLPs.
You can find the codes I have been working on here and here. I can provide more details if you need them. Suggestions are welcome.
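For concreteness, here is a minimal, self-contained sketch of the kind of BFGS training described above, using Optim.jl to drive a small Flux MLP on a toy regression problem (not an actual ACE model; all names below are illustrative):
using Flux, Optim
# Toy data: fit y = sin(x) with a small MLP (stand-in for a small neural ACE model).
xs = collect(range(0.0, π; length = 200))
ys = sin.(xs)
mlp = Chain(Dense(1, 8, Flux.σ), Dense(8, 8, Flux.σ), Dense(8, 1))
θ32, re = Flux.destructure(mlp)       # flatten parameters into a single vector
θ0 = Float64.(θ32)                    # optimize in Float64 for smoother gradients
# Mean-squared-error loss as a function of the flat parameter vector, which is
# the form Optim expects; re(θ) rebuilds the network from θ.
loss(θ) = sum(abs2, vec(re(θ)(reshape(xs, 1, :))) .- ys) / length(xs)
# BFGS over the flat parameters (finite-difference gradients by default;
# forward-mode autodiff can also be requested).
res = Optim.optimize(loss, θ0, BFGS(), Optim.Options(iterations = 200))
mlp_bfgs = re(Optim.minimizer(res))   # rebuilt network with the optimized parameters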
What I mean is whether mathematical transformations based on the Bayesian approach need to be performed on the data or loss functions before training. In other words, can any loss function commonly used to train NN potentials be considered a logprob?
👍
Good, based on these lines I put together an example closer to actual neural potential training. This code in particular consumes a substantial amount of memory, so I would like to give it another shot. Months ago something similar happened with the reverse-over-forward mode approach: it worked, but very slowly. Suggestions are welcome.
using Flux
# Range
xrange = 0:π/99:π
xs = [ Float32.([x1, x2]) for x1 in xrange for x2 in xrange]
# Target function: E
E_analytic(x) = sin(x[1]) * cos(x[2])
# Analytical gradient of E
dE_analytic(x) = [cos(x[1]) * cos(x[2]), -sin(x[1]) * sin(x[2])]
# NN model
mlp = Chain(Dense(2,4, Flux.σ),Dense(4,1))
ps_mlp = Flux.params(mlp)
E(x) = sum(mlp(x))
dE(mlp, x) = sum(gradient(x -> sum(mlp(x)), x))
# Loss
loss(x, y) = Flux.Losses.mse(x, y)
# Training
epochs = 10; opt = Flux.Adam(0.1)
for _ in 1:epochs
g = gradient(()->loss(reduce(vcat, dE.([mlp],xs)),
reduce(vcat, dE_analytic.(xs))), ps_mlp)
Flux.Optimise.update!(opt, ps_mlp, g)
l = loss(reduce(vcat, dE.([mlp],xs)),
reduce(vcat, dE_analytic.(xs)))
println("loss:", l)
# GC.gc()
end
I think the information about the progress related to HDBSCAN should be reported in #21. Then that issue could be closed if another issue with a follow-up is opened. The new issue could have a more specific title that better represents the original description. A comparison between the two methods could be a nice publication. First, the parallel Julia version of the method based on HDBSCAN (and/or other clustering algorithms) has to be implemented.
I think for the moment it is ok to deprecate load_datasets since it is not integrated into the new interfaces.
I agree. Just to make sure we are not duplicating efforts, I am currently working on integrating the NACE, ACE, and SNAP folders of the AtomisticComposableWorkflows repo into the new interfaces. New observations:
To what extent? Right now PotentialLearning.jl implements a very simple training routine that is based on Flux (we can completely divest from Flux but I think that would be a separate issue). If the user wants a more bespoke routine, like I imagine they would most of the time, then they can implement the training themselves (especially if there is batch training and so forth). I believe that PotentialLearning.jl should mostly help with providing structure in the data and model. The debate about BFGS is moot and PotentialLearning.jl already provides enough flexibility for the user to tailor their own training method to their own needs.
This is somewhat philosophical, but essentially yes. I don't believe the rest of the discussion needs to hold up this pull request. Any changes to HDBSCAN can be made in future PRs and edits to the structure of NNBasis functions can occur in InteratomicBasisPotentials.jl.
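To make the "essentially yes" concrete: under i.i.d. Gaussian observation noise, the log-probability is a negatively scaled sum of squared residuals plus a constant, so the usual MSE loss is a logprob up to sign and scale:
\log p(y \mid \theta) = \sum_{i=1}^{N} \log \mathcal{N}\big(y_i \mid f_\theta(x_i), \sigma^2\big)
  = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} \big(y_i - f_\theta(x_i)\big)^2 - \frac{N}{2}\log\big(2\pi\sigma^2\big)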
Yes, this error is fixed in the upcoming push.
If you look at the definition of the function, you will find that it wraps the relevant code:
function compute_features(ds::DataSet, f::Feature; dt = LocalDescriptors)
    compute_feature.(ds, (f,); dt = dt)
end
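The (f,) in that definition is the usual trick of wrapping an argument in a one-element tuple so the broadcast reuses the same feature for every element of ds instead of trying to iterate over f itself. A self-contained analogue:
square_plus(x, c) = x^2 + c
square_plus.([1, 2, 3], (10,))    # returns [11, 14, 19]; the wrapped 10 is reused for each element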
Fix to YAML
I agree that structuring the data and model is a fundamental part of this package. I also agree with the approach of providing basic training and leaving the door open to extending this functionality. At some point we will have to decide what extensions will be added to the package. I believe that new methods and Julia implementations born in CESMIX, such as new subsampling algorithms or new ML potentials, should live in the packages developed in this project even if it requires adding some extra code. In other words, I would like the Julia packages developed for CESMIX to shine for their interfaces and the workflows they enable but also for the novel methods implemented in them. However, I think the most practical thing to do now is to merge this PR and address this point in the future.
Addressed above. Please understand that I started writing about the use of this optimizer to address the comment about its integration into this package.
Okay. Actually my concern was practical rather than philosophical: whether it was necessary to transform the training data or loss function before using the new interfaces to comply with the Bayesian approach.
Good.
Good.
I will check this again after the merge.
@dallasfoster if it's ok with you, I can merge the pull request.
You have to finish your review first.
…com/cesmix-mit/PotentialLearning.jl into active_learning_refactor
Looks to solve many of the current open issues:
Major package refactor for #19, adds structs for readability of data. Sets up Energy, Force, LocalDescriptor, and ForceDescriptor structs so that we can make use of multiple dispatch on methods appropriate for each type of data. Introduces a Configuration-level data struct (to hold information for a particular configuration of atoms) and a DataSet type (which is a vector of Configurations). Most of the methods in PotentialLearning operate on the DataSet type. This allows users to load all of their data into a DataSet, separate training and testing datasets, and either do dimension reduction, DPP active sampling, or fitting based on the DataSet struct (a rough usage sketch follows this list). Ultimately this makes the code more reusable and general.
Adds documentation for #8 - each of the new methods has appropriate documentation. Unit testing is also in progress for each of the new methods.
Closes #23 - we have moved NNBasisPotential to InteratomicBasisPotentials.jl. There is some basic support for fitting neural networks (and general loss functions) within PotentialLearning. More testing is needed.
Closes #24 - With the movement of NNBasisPotential to InteratomicBasisPotentials.jl, I have fixed the issue with gradients of gradients.
Closes #22 and #21 - Have added DPP subset selector (as well as random subset selector). Unit testing is sufficient.
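As referenced above, a rough end-to-end sketch of the workflow this enables; the names follow the description in this PR, but the exact constructors and signatures (kDPP, get_random_subset, indexing a DataSet) are assumptions rather than a verbatim copy of the API:
using Random
# `configurations` is assumed to be a Vector{Configuration} already built from raw data.
ds = DataSet(configurations)
perm = randperm(length(ds))                      # assumes length/getindex work on DataSet
ntrain = round(Int, 0.8 * length(ds))
train, test = ds[perm[1:ntrain]], ds[perm[ntrain+1:end]]
# Subsample the training set with the DPP selector added in this PR
# (constructor arguments here are placeholders).
selector = kDPP(train; batch_size = 200)
train_subset = train[get_random_subset(selector)]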