
Active learning refactor #25

Merged: 13 commits merged into main from active_learning_refactor on Feb 1, 2023
Conversation

dallasfoster (Collaborator)

Looks to solve many of the current open issues:

Major package refactor for #19; adds structs for readability of data. Sets up Energy, Force, LocalDescriptor, and ForceDescriptor structs so that we can make use of multiple dispatch on methods appropriate for each type of data. Introduces a Configuration-level data struct (to hold the information for a particular configuration of atoms) and a DataSet type (a vector of Configurations). Most of the methods in PotentialLearning operate on the DataSet type. This allows users to load all of their data into a DataSet, separate training and testing datasets, and either do dimension reduction, DPP active sampling, or fitting based on the DataSet struct. Ultimately this makes the code more reusable and general.
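
A minimal stand-in sketch of this design (the structs and constructors below are illustrative placeholders, not the package's actual definitions):

# Stand-in sketch only: illustrates the typed-wrapper + multiple-dispatch design,
# not the actual PotentialLearning.jl structs or constructors.
abstract type AtomsData end

struct Energy <: AtomsData
    d::Float64
end

struct Force <: AtomsData
    d::Vector{Float64}
end

# A Configuration holds the data belonging to one atomic configuration;
# a DataSet is simply a collection of Configurations.
struct Configuration
    data::Vector{AtomsData}
end
const DataSet = Vector{Configuration}

# Methods dispatch on the kind of data rather than on raw arrays.
get_values(e::Energy) = e.d
get_values(f::Force)  = f.d

ds = DataSet([Configuration([Energy(-1.23), Force([0.1, 0.0, -0.1])])])
get_values.(filter(d -> d isa Energy, ds[1].data))   # -> [-1.23]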

Adds documentation for #8 - each of the new methods has appropriate documentation. Unit testing is also in progress for each of the new methods.

Closes #23 - we have moved NNBasisPotential to InteratomicBasisPotentials.jl. There is some basic support for fitting neural networks (and general loss functions) within PotentialLearning. More testing is needed.

Closes #24 - With the movement of NNBasisPotential to InteratomicBasisPotentials.jl, I have fixed the issue with gradients of gradients.

Closes #22 and #21 - Have added DPP subset selector (as well as random subset selector). Unit testing is sufficient.
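
For intuition only, here is a self-contained stand-in for the kind of diversity-driven subsampling a DPP selector performs, using greedy determinant maximization over a kernel matrix; it is not the implementation added in this PR:

# Stand-in sketch: greedily grow a subset whose kernel submatrix has maximal
# determinant, i.e. prefer configurations that are mutually dissimilar.
using LinearAlgebra

features = [rand(3) for _ in 1:100]                             # one feature vector per configuration
K = [exp(-sum(abs2, x .- y)) for x in features, y in features]  # RBF kernel matrix

function greedy_diverse_subset(K, k)
    selected = Int[]
    for _ in 1:k
        best, bestval = 0, -Inf
        for i in axes(K, 1)
            i in selected && continue
            sub = vcat(selected, i)
            val = det(K[sub, sub])
            val > bestval && ((best, bestval) = (i, val))
        end
        push!(selected, best)
    end
    return selected
end

idx = greedy_diverse_subset(K, 10)   # indices of a diverse subset of 10 configurations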

Large updates to the structure of PotentialLearning.jl. This includes structs for containing various types of data (Energy, Force, Descriptors...) and scales of data (Configuration, Dataset...). Includes active learning components including features, distances, kernel methods. Includes initial fitting for linear and abstract potentials.
Adding learning modules, modifying subset selection methods, fine-tuning data and configuration types, adding dimension reduction. Working on tests.
Fixing bugs and errors
@emmanuellujan (Member)

Thank you very much, I will review it carefully early next week.

Comment on lines 7 to 8
include("./src/PotentialLearning.jl")
using .PotentialLearning
Member

Nitpick: we should be doing just using PotentialLearning; maybe it needs a push!(Base.LOAD_PATH, dirname(@__DIR__)).

Collaborator Author

Good call

# Import Raw Data

energies, forces = JLD.load("examples/Sodium/data/liquify_sodium_dftk_calculations.jld", "energies", "forces");
e = [ ustrip(uconvert(u"eV", sum(collect(values(di))) * u"hartree")) for di in energies];
Member

Suggested change:
- e = [ ustrip(uconvert(u"eV", sum(collect(values(di))) * u"hartree")) for di in energies];
+ e = [ Energy(ustrip(uconvert(u"eV", sum(collect(values(di))) * u"hartree"))) for di in energies];

src/Data/datatypes.jl: review comments resolved
@emmanuellujan (Member)

Busy week. I will look at it next week.

@emmanuellujan (Member) commented Dec 19, 2022

Hi! :-) Some observations below

> Closes #23 - we have moved NNBasisPotential to InteratomicBasisPotentials.jl. There is some basic support for fitting neural networks (and general loss functions) within PotentialLearning. More testing is needed.

  • Adding these functions (https://github.com/cesmix-mit/InteratomicBasisPotentials.jl/blob/c307c94fe6bbfae7f7c8062a03fc5d8de8a0b179/src/BasisPotentials/basis_potentials.jl) probably solves this issue (Should this function be added to InteratomicBasisPotentials.jl? AtomisticComposableWorkflows#15). I am not sure about the units, though; I can check it after the merge.

  • A key aspect of neural ace training is to be able to use BFGS. Flux does not provide this optimizer. To combine the NN model provided by Flux with BFGS provided by Optimization.jl, I used Flux.destructure. This function extracts the parameters of the NN model so they can interact nicely with the Optimization.jl interface. I am not sure how to train a Flux model with BFGS with this new code. In case it is not possible, we would need to re-incorporate this feature; I can address this after the merge. Perhaps another approach is to use FluxOptTools.jl. (A rough sketch of the destructure-based setup appears after this list.)

  • Regarding the LearningProblem, should the datasets and/or loss functions (e.g. a sum of the squared differences of non-linear functions, its derivatives, etc) commonly used in NN potentials be transformed in any way to define the LearningProblem under the BI approach? E.g. logprob = something that depends on the loss

  • In the LearningProblem function, the gradient is computed based on f. Should this function be logprob? 
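
A hedged sketch of the Flux.destructure + Optimization.jl route described in the second bullet above; the model, data, and settings are placeholders, and only the general pattern (flatten the parameters, then optimize them with BFGS) reflects the approach in the comment:

# Placeholder model and data; only the destructure-then-BFGS pattern is the point here.
using Flux, Zygote, Optimization, OptimizationOptimJL

model  = Chain(Dense(2, 8, σ), Dense(8, 1))
θ0, re = Flux.destructure(model)            # flat parameter vector + rebuilder

xs = [rand(Float32, 2) for _ in 1:200]
ys = [sum(sin, x) for x in xs]

# Loss written over the flat parameter vector, as the Optimization.jl interface expects.
loss(θ, p) = sum(abs2, [sum(re(θ)(xs[i])) - ys[i] for i in eachindex(xs)])

optf = OptimizationFunction(loss, Optimization.AutoZygote())
prob = OptimizationProblem(optf, θ0)
sol  = solve(prob, BFGS(); maxiters = 100)  # re(sol.u) rebuilds the trained model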

> Closes #24 - With the movement of NNBasisPotential to InteratomicBasisPotentials.jl, I have fixed the issue with gradients of gradients.

  • What would you modify in the following snippet to make it work using reverse mode over reverse mode and the latest version of Flux?
using Flux
# Range
xrange = 0:π/99:π
xs = [ Float32.([x1, x2]) for x1 in xrange for x2 in xrange]
# NN model: multi-layer perceptron (MLP)
mlp = Chain(Dense(2,4, Flux.σ),Dense(4,1))
ps_mlp = Flux.params(mlp)
E_mlp(x) = sum(mlp(x))
dE_mlp(E_mlp, x) = sum(gradient(E_mlp, x))
# Computing "nested" gradient
gradient(()->sum(sum.(dE_mlp.([E_mlp], xs))), ps_mlp)

> Closes #22 and #21 - Have added DPP subset selector (as well as random subset selector). Unit testing is sufficient.

Thank you for your patience

@dallasfoster (Collaborator Author)

  • A key aspect of neural ace training is to be able to use BFGS. Flux does not provide this optimizer. To combine the NN model provided by Flux with BFGS provided by Optimization.jl, I used Flux.destructure. This function extracts the parameters of the NN model so they can interact nicely with the Optimization.jl interface. I am not sure how to train a Flux model with BFGS with this new code. In case it is not possible, we would need to re-incorporate this feature, I can address this after the merge. Perhaps another approach is to use FluxOptTools.jl.

I would strongly recommend against using BFGS for neural ace training, let alone having it be a built-in feature of PotentialLearning.jl. We can provide further scaffolding to allow users to define their own optimizer, particularly in learn.jl.

  • Regarding the LearningProblem, should the datasets and/or loss functions (e.g. a sum of the squared differences of non-linear functions, its derivatives, etc) commonly used in NN potentials be transformed in any way to define the LearningProblem under the BI approach? E.g. logprob = something that depends on the loss

The data is assumed to be in the form of a DataSet struct, and logprob is assumed to be a function that takes params::Vector{T} as its input argument and is optimized over those parameters. I think this is a fairly flexible approach that allows users to transform their problem as needed.
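
For illustration, a made-up, self-contained example of a logprob in that form, a plain function of a flat parameter vector (here a toy linear least-squares model), together with its gradient:

# Toy example only: the data, model, and gradient below are invented for illustration.
observations = [1.0, 2.0, 3.0]
design       = [1.0 0.0; 1.0 1.0; 1.0 2.0]   # toy feature matrix

logprob(θ::Vector{Float64})  = -sum(abs2, design * θ .- observations)
∇logprob(θ::Vector{Float64}) = -2 .* (design' * (design * θ .- observations))

logprob([0.5, 1.0])   # evaluate at a candidate parameter vector -> -0.75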

  • In the LearningProblem function, the gradient is computed based on f. Should this function be logprob?

Yes, will be fixed in the next push.

Closes #24 - With the movement of NNBasisPotential to InteratomicBasisPotentials.jl, I have fixed the issue with gradients of gradients.

  • What would you modify in the following snippet to make it work using reverse mode over reverse mode and the latest version of Flux?
using Flux
# Range
xrange = 0:π/99:π
xs = [ Float32.([x1, x2]) for x1 in xrange for x2 in xrange]
# NN model: multi-layer perceptron (MLP)
mlp = Chain(Dense(2,4, Flux.σ),Dense(4,1))
ps_mlp = Flux.params(mlp)
E_mlp(x) = sum(mlp(x))
dE_mlp(E_mlp, x) = sum(gradient(E_mlp, x))
# Computing "nested" gradient
gradient(()->sum(sum.(dE_mlp.([E_mlp], xs))), ps_mlp)

I suppose something like this works:

using Flux
# Range
xrange = 0:π/99:π
xs = [ Float32.([x1, x2]) for x1 in xrange for x2 in xrange]
# NN model: multi-layer perceptron (MLP)
mlp = Chain(Dense(2,4, Flux.σ),Dense(4,1))
ps_mlp = Flux.params(mlp)
E(x) = sum(mlp(x))
dE(m) = x -> gradient(x -> sum(m(x)), x)[1] 
gradient(()->sum(dE(mlp)(xs[1])), ps_mlp)

The linked code uses HDBSCAN, which itself has some limitations vis-à-vis DPP subsampling. In any case, an implementation of HDBSCAN is already present in PotentialLearning.jl. Perhaps this can be made more robust, but that could be another issue.

  • This PR merged the code of Aug 8 version of main, there are new commits that are not currently merged in this version, in particular related to load_datasets function and metrics.

The load_datasets function should be deprecated in favor of the new methods for dataset input and construction. The structs in this PR are better organized and more robust.

Not directly, I am adding all of the workflows to the /example folder, but in such a way that it is adapted to the current PR.

Fixed an error in learn.jl, added workflow files and updated Project.toml
@emmanuellujan (Member)

A key aspect of neural ace training is to be able to use BFGS. Flux does not provide this optimizer. To combine the NN model provided by Flux with BFGS provided by Optimization.jl, I used Flux.destructure. This function extracts the parameters of the NN model so they can interact nicely with the Optimization.jl interface. I am not sure how to train a Flux model with BFGS with this new code. In case it is not possible, we would need to re-incorporate this feature, I can address this after the merge. Perhaps another approach is to use FluxOptTools.jl.

I would strongly recommend against using BFGS for neural ace training, let alone having it be a built-in feature of PotentialLearning.jl. We can provide further scaffolding to allow users to define their own optimizer, particularly in learn.jl.

I am open to better options that can exceed the accuracy and performance of linear ACE. I think PotentialLearning.jl should provide built-in features that allow users to leverage the advantages of neural ACE (or non-linear ACE) over linear ACE.

Why BFGS here? Although (at least in my experience) this solver could not handle large NN models (several hundred parameters) and large datasets, these limitations can be addressed by using small NNs, subsampling of the original dataset, and minibatches during training. BFGS showed very fast convergence for small MLP ACE models (e.g. 225 parameters: a couple of dense layers with 8 nodes per layer) trained on medium-size datasets (e.g. 57800 data points: energies and force components), something I could not get with other Flux solvers, such as ADAM, which showed a slow convergence rate (and did not achieve the accuracy I need in the time I am willing to wait). Furthermore, larger NN models can be addressed by using parallel ensembles. It is this rapid convergence that allows for better accuracy (in different training and testing metrics) compared to linear ACE. BFGS is also used in NeuralPDE [1, 2], alone or combined with ADAM (btw, they are using Lux now), to train relatively small MLPs. You can find the codes I have been working on here and here. I can provide more details if you need them. Suggestions are welcome.

Regarding the LearningProblem, should the datasets and/or loss functions (e.g. a sum of the squared differences of non-linear functions, its derivatives, etc) commonly used in NN potentials be transformed in any way to define the LearningProblem under the BI approach? E.g. logprob = something that depends on the loss

The data is assumed to be in the form of a DataSet struct, and logprob is assumed to be a function that takes params::Vector{T} as its input argument and is optimized over those parameters. I think this is a fairly flexible approach that allows users to transform their problem as needed.

What I mean is whether mathematical transformations based on the Bayesian approach need to be performed on the data or loss functions before training. In other words, can any loss function commonly used to train NN potentials be considered a logprob?

In the LearningProblem function, the gradient is computed based on f. Should this function be logprob?

Yes, will be fixed in the next push.

👍

I suppose something like this works:
...dE(m) = x -> gradient(x -> sum(m(x)), x)[1]
gradient(()->sum(dE(mlp)(xs[1])), ps_mlp)

Good, based on these lines I put together an example closer to actual neural potential training. This code in particular consumes a substantial amount of memory, so I would like to give it another shot. Months ago something similar happened with the reverse-mode-over-forward-mode approach: it worked, but very slowly. Suggestions are welcome.

using Flux

# Range
xrange = 0:π/99:π
xs = [ Float32.([x1, x2]) for x1 in xrange for x2 in xrange]

# Target function: E
E_analytic(x) = sin(x[1]) * cos(x[2])

# Analytical gradient of E
dE_analytic(x) = [cos(x[1]) * cos(x[2]), -sin(x[1]) * sin(x[2])]

# NN model
mlp = Chain(Dense(2,4, Flux.σ),Dense(4,1))
ps_mlp = Flux.params(mlp)
E(x) = sum(mlp(x))
dE(mlp, x) = sum(gradient(x -> sum(mlp(x)), x))

# Loss
loss(x, y) = Flux.Losses.mse(x, y)

# Training
epochs = 10; opt = Flux.Adam(0.1)
for _ in 1:epochs
    g = gradient(()->loss(reduce(vcat, dE.([mlp],xs)),
                          reduce(vcat, dE_analytic.(xs))), ps_mlp)
    Flux.Optimise.update!(opt, ps_mlp, g)
    l = loss(reduce(vcat, dE.([mlp],xs)),
             reduce(vcat, dE_analytic.(xs)))
    println("loss:", l)
    # GC.gc()
end

I would not close #21 (Accelerate learning by automatically reducing the size of the training dataset); the idea there is to show how much training can be accelerated for different potentials and atomic systems using subsampling. The code will be based on an improved version of this approach: https://github.com/argonne-lcf/active-learning-md/blob/master/workflow/activesample.py. There is already a student assigned to this project.

The linked code uses HDBSCAN, which itself has some limitations vis-à-vis DPP subsampling. In any case, an implementation of HDBSCAN is already present in PotentialLearning.jl. Perhaps this can be made more robust, but that could be another issue.

I think the information about the progress related to HDBSCAN should be reported in #21. Then the issue could be closed if another issue with a follow-up is opened. The new issue could have a more specific title that better represents the original description. A comparison between the two methods could be a nice publication. First, the parallel Julia version of the method based on HDBSCAN (and/or other clustering algorithms) has to be implemented.

This PR merged the code of Aug 8 version of main, there are new commits that are not currently merged in this version, in particular related to load_datasets function and metrics.

The load_datasets function should be deprecated in favor of the new methods for dataset input and construction. The structs in this PR are better organized and more robust.

I think for the moment it is ok to deprecate load_datasets since it is not integrated with the new interfaces.

Should the workflows you uploaded here (https://github.com/cesmix-mit/AtomisticComposableWorkflows/tree/main/workflows) be moved to the PotentialLearning.jl examples folder? (After this merge I want to move the stuff about nace/ace/snap, hyper-par opt, AL, and docs to this repo. Before that I need to adapt the code to the new interfaces.)

Not directly, I am adding all of the workflows to the /example folder, but in such a way that it is adapted to the current PR.

I agree. Just to make sure we are not duplicating efforts, I am currently working on integrating the NACE, ACE, and SNAP folders of the AtomisticComposableWorkflows repo with the new interfaces.

New observations:

  • In fit_sodium_dft.jl,
      - "examples/Sodium/data/liquify_sodium_dftk_calculations.jld" seems to be missing
      - Should YAML(u"eV", u"Å") be something like YAML(:Na, u"eV", u"Å")?
  • In the LinearProblem,
      - Should this line: true, compute_features(ds, GlobalSum()), get_values.(get_energy.(ds))
        be replaced by something like this: true, compute_feature.(get_local_descriptors.(ds_train_e), [GlobalSum()]), get_values.

@dallasfoster (Collaborator Author)

I am open to better options that can exceed the accuracy and performance of linear ACE. I think PotentialLearning.jl should provide built in features that allow users to leverage neural ACE (or non-linear ACE) advantages over linear ACE.

To what extent? Right now PotentialLearning.jl implements a very simple training routine that is based on Flux (we can completely divest from Flux but I think that would be a separate issue). If the user wants a more bespoke routine, like I imagine they would most of the time, then they can implement the training themselves (especially if there is batch training and so forth). I believe that PotentialLearning.jl should mostly help with providing structure in the data and model.
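
As a purely illustrative sketch of such a bespoke routine (a plain Flux minibatch loop in the same implicit-parameter style as the snippets above; the data and hyperparameters are placeholders, and none of this is PotentialLearning.jl API):

# Placeholder data and model; shows a user-written minibatch training loop in plain Flux.
using Flux, Random

xs = [rand(Float32, 2) for _ in 1:1000]
ys = [sum(sin, x) for x in xs]

model = Chain(Dense(2, 8, Flux.σ), Dense(8, 1))
ps    = Flux.params(model)
opt   = Flux.Adam(1e-2)

for epoch in 1:5
    for batch in Iterators.partition(shuffle(1:length(xs)), 64)   # minibatches of 64
        g = gradient(() -> sum([abs2(sum(model(xs[i])) - ys[i]) for i in batch]), ps)
        Flux.Optimise.update!(opt, ps, g)
    end
end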

The debate about BFGS is moot and PotentialLearning.jl already provides enough flexibility for the user to tailor their own training method to their own needs.

What I mean is whether mathematical transformations based on the Bayesian approach need to be performed on the data or loss functions before training. In other words, can any loss function commonly used to train NN potentials be considered a logprob?

This is somewhat philosophical, but essentially yes.
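
A minimal illustration of the usual correspondence (assuming i.i.d. Gaussian noise, a squared-error loss is a log-probability up to sign, scale, and an additive constant):

# Under an i.i.d. Gaussian noise assumption with variance σ2, maximizing this logprob
# is equivalent to minimizing the usual squared-error loss (additive constants dropped).
loss(preds, obs)              = sum(abs2, preds .- obs)
logprob(preds, obs; σ2 = 1.0) = -loss(preds, obs) / (2σ2)

logprob([1.1, 1.9], [1.0, 2.0])   # ≈ -0.01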

I don't believe the rest of the discussion needs to hold up this pull request. Any changes to HDBSCAN can be made in future PRs, and edits to the structure of NNBasis functions can occur in InteratomicBasisPotentials.jl.

New observations:

  • In fit_sodium_dft.jl,
         - "examples/Sodium/data/liquify_sodium_dftk_calculations.jld" seems to be missing
         - Should YAML(u"eV", u"Å") be something like YAML(:Na, u"eV", u"Å")

Yes, this error is fixed in the upcoming push.

  • In the LinearProblem
        -   Should this line: true,  compute_features(ds, GlobalSum()), get_values.(get_energy.(ds))
            be replaced by something like this: true, compute_feature.(get_local_descriptors.(ds_train_e), [GlobalSum()]), get_values.

If you look at the definition of the function, you will find that it wraps the relevant code:

function compute_features(ds::DataSet, f::Feature; dt = LocalDescriptors)
    compute_feature.(ds, (f,); dt = dt)
end

Fix to YAML
@emmanuellujan (Member)

I am open to better options that can exceed the accuracy and performance of linear ACE. I think PotentialLearning.jl should provide built in features that allow users to leverage neural ACE (or non-linear ACE) advantages over linear ACE.

To what extent? Right now PotentialLearning.jl implements a very simple training routine that is based on Flux (we can completely divest from Flux but I think that would be a separate issue). If the user wants a more bespoke routine, like I imagine they would most of the time, then they can implement the training themselves (especially if there is batch training and so forth). I believe that PotentialLearning.jl should mostly help with providing structure in the data and model.

I agree that structuring the data and model is a fundamental part of this package. I also agree with the approach of providing basic training and leaving the door open to extending this functionality. At some point we will have to decide what extensions will be added to the package. I believe that new methods and Julia implementations born in CESMIX, such as new subsampling algorithms or new ML potentials, should live in the packages developed in this project even if it requires adding some extra code. In other words, I would like the Julia packages developed for CESMIX to shine for their interfaces and the workflows they enable but also for the novel methods implemented in them. However, I think the most practical thing to do now is to merge this PR and address this point in the future.

The debate about BFGS is moot and PotentialLearning.jl already provides enough flexibility for the user to tailor their own training method to their own needs.

Addressed above. Please understand that I started writing about the use of this optimizer to address the comment about its integration into this package.

What I mean is whether mathematical transformations based on the Bayesian approach need to be performed on the data or loss functions before training. In other words, can any loss function commonly used to train NN potentials be considered a logprob?

This is somewhat philosophical, but essentially yes.

Okay. Actually my concern was practical rather than philosophical: whether it was necessary to transform the training data or loss function before using the new interfaces to comply with the Bayesian approach.

I don't believe the rest of the discussion needs to hold up this pull request. Any changes to HDBSCAN can be made in future PRs, and edits to the structure of NNBasis functions can occur in InteratomicBasisPotentials.jl.

Good.

New observations:
  • In fit_sodium_dft.jl,
      - "examples/Sodium/data/liquify_sodium_dftk_calculations.jld" seems to be missing
      - Should YAML(u"eV", u"Å") be something like YAML(:Na, u"eV", u"Å")?

Yes, this error is fixed in the upcoming push.

Good.

In the LinearProblem

  • Should this line: true, compute_features(ds, GlobalSum()), get_values.(get_energy.(ds)) be replaced by something like this: true, compute_feature.(get_local_descriptors.(ds_train_e), [GlobalSum()]), get_values.

If you look at the definition of the function, you will find that it wraps the relevant code:
function compute_features(ds::DataSet, f::Feature; dt = LocalDescriptors)
    compute_feature.(ds, (f,); dt = dt)
end

I will check this again after the merge.

@emmanuellujan (Member)

@dallasfoster if it's ok with you, I can merge the pull request.

@dallasfoster (Collaborator Author)

You have to finish your review first.

@emmanuellujan (Member) left a review comment.

@dallasfoster merged commit d79d53e into main on Feb 1, 2023
@swyant deleted the active_learning_refactor branch on April 13, 2023 at 07:16