
Active learning refactor #25

Merged: 13 commits merged into main from active_learning_refactor on Feb 1, 2023
Conversation

dallasfoster (Collaborator)

Looks to solve many of the current open issues:

Major package refactor for #19; adds structs for readability of data. Sets up Energy, Force, LocalDescriptor, and ForceDescriptor structs so that we can make use of multiple dispatch on methods appropriate for each type of data. Introduces a Configuration-level data struct (to hold the information for a particular configuration of atoms) and a DataSet type (a vector of Configurations). Most of the methods in PotentialLearning operate on the DataSet type. This allows users to load all of their data into a DataSet, separate training and testing datasets, and either do dimension reduction, DPP active sampling, or fitting based on the DataSet struct. Ultimately this makes the code more reusable and general.
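
A minimal stand-in sketch of this design (the structs and constructors below are illustrative placeholders, not the package's actual definitions):

# Stand-in sketch only: illustrates the typed-wrapper + multiple-dispatch design,
# not the actual PotentialLearning.jl structs or constructors.
abstract type AtomsData end

struct Energy <: AtomsData
    d::Float64
end

struct Force <: AtomsData
    d::Vector{Float64}
end

# A Configuration holds the data belonging to one atomic configuration;
# a DataSet is simply a collection of Configurations.
struct Configuration
    data::Vector{AtomsData}
end
const DataSet = Vector{Configuration}

# Methods dispatch on the kind of data rather than on raw arrays.
get_values(e::Energy) = e.d
get_values(f::Force)  = f.d

ds = DataSet([Configuration([Energy(-1.23), Force([0.1, 0.0, -0.1])])])
get_values.(filter(d -> d isa Energy, ds[1].data))   # -> [-1.23]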

Adds documentation for #8 - each of the new methods has appropriate documentation. Unit testing is also in progress for each of the new methods.

Closes #23 - we have moved NNBasisPotential to InteratomicBasisPotentials.jl. There is some basic support for fitting neural networks (and general loss functions) within PotentialLearning. More testing is needed.

Closes #24 - With the movement of NNBasisPotential to InteratomicBasisPotentials.jl, I have fixed the issue with gradients of gradients.

Closes #22 and #21 - Have added DPP subset selector (as well as random subset selector). Unit testing is sufficient.
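
For intuition only, here is a self-contained stand-in for the kind of diversity-driven subsampling a DPP selector performs, using greedy determinant maximization over a kernel matrix; it is not the implementation added in this PR:

# Stand-in sketch: greedily grow a subset whose kernel submatrix has maximal
# determinant, i.e. prefer configurations that are mutually dissimilar.
using LinearAlgebra

features = [rand(3) for _ in 1:100]                             # one feature vector per configuration
K = [exp(-sum(abs2, x .- y)) for x in features, y in features]  # RBF kernel matrix

function greedy_diverse_subset(K, k)
    selected = Int[]
    for _ in 1:k
        best, bestval = 0, -Inf
        for i in axes(K, 1)
            i in selected && continue
            sub = vcat(selected, i)
            val = det(K[sub, sub])
            val > bestval && ((best, bestval) = (i, val))
        end
        push!(selected, best)
    end
    return selected
end

idx = greedy_diverse_subset(K, 10)   # indices of a diverse subset of 10 configurations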

Large updates to the structure of PotentialLearning.jl. This includes structs for containing various types of data (Energy, Force, Descriptors...) and scales of data (Configuration, Dataset...). Includes active learning components including features, distances, kernel methods. Includes initial fitting for linear and abstract potentials.
Adding learning modules, modifying subset selection methods, fine-tuning data and configuration types, adding dimension reduction. Working on tests.
Fixing bugs and errors
@emmanuellujan (Member)

Thank you very much, I will review it carefully early next week.

Comment on lines 7 to 8
include("./src/PotentialLearning.jl")
using .PotentialLearning
Member

Nitpick: we should be doing just using PotentialLearning; maybe it needs a push!(Base.LOAD_PATH, dirname(@__DIR__)).

Collaborator Author

Good call

# Import Raw Data

energies, forces = JLD.load("examples/Sodium/data/liquify_sodium_dftk_calculations.jld", "energies", "forces");
e = [ ustrip(uconvert(u"eV", sum(collect(values(di))) * u"hartree")) for di in energies];
Member

Suggested change:
- e = [ ustrip(uconvert(u"eV", sum(collect(values(di))) * u"hartree")) for di in energies];
+ e = [ Energy(ustrip(uconvert(u"eV", sum(collect(values(di))) * u"hartree"))) for di in energies];

src/Data/datatypes.jl: review comments resolved
@emmanuellujan (Member)

Busy week. I will look at it next week.

@emmanuellujan (Member) commented Dec 19, 2022

Hi! :-) Some observations below

> Closes #23 - we have moved NNBasisPotential to InteratomicBasisPotentials.jl. There is some basic support for fitting neural networks (and general loss functions) within PotentialLearning. More testing is needed.

  • Adding these functions (https://github.com/cesmix-mit/InteratomicBasisPotentials.jl/blob/c307c94fe6bbfae7f7c8062a03fc5d8de8a0b179/src/BasisPotentials/basis_potentials.jl) probably solves this issue (Should this function be added to InteratomicBasisPotentials.jl? AtomisticComposableWorkflows#15). I am not sure about the units, though; I can check it after the merge.

  • A key aspect of neural ace training is to be able to use BFGS. Flux does not provide this optimizer. To combine the NN model provided by Flux with BFGS provided by Optimization.jl, I used Flux.destructure. This function extracts the parameters of the NN model so they can interact nicely with the Optimization.jl interface. I am not sure how to train a Flux model with BFGS with this new code. In case it is not possible, we would need to re-incorporate this feature; I can address this after the merge. Perhaps another approach is to use FluxOptTools.jl. (A rough sketch of the destructure-based setup appears after this list.)

  • Regarding the LearningProblem, should the datasets and/or loss functions (e.g. a sum of the squared differences of non-linear functions, its derivatives, etc) commonly used in NN potentials be transformed in any way to define the LearningProblem under the BI approach? E.g. logprob = something that depends on the loss

  • In the LearningProblem function, the gradient is computed based on f. Should this function be logprob? 
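
A hedged sketch of the Flux.destructure + Optimization.jl route described in the second bullet above; the model, data, and settings are placeholders, and only the general pattern (flatten the parameters, then optimize them with BFGS) reflects the approach in the comment:

# Placeholder model and data; only the destructure-then-BFGS pattern is the point here.
using Flux, Zygote, Optimization, OptimizationOptimJL

model  = Chain(Dense(2, 8, σ), Dense(8, 1))
θ0, re = Flux.destructure(model)            # flat parameter vector + rebuilder

xs = [rand(Float32, 2) for _ in 1:200]
ys = [sum(sin, x) for x in xs]

# Loss written over the flat parameter vector, as the Optimization.jl interface expects.
loss(θ, p) = sum(abs2, [sum(re(θ)(xs[i])) - ys[i] for i in eachindex(xs)])

optf = OptimizationFunction(loss, Optimization.AutoZygote())
prob = OptimizationProblem(optf, θ0)
sol  = solve(prob, BFGS(); maxiters = 100)  # re(sol.u) rebuilds the trained model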

> Closes #24 - With the movement of NNBasisPotential to InteratomicBasisPotentials.jl, I have fixed the issue with gradients of gradients.

  • What would you modify in the following snippet to make it work using reverse mode over reverse mode and the latest version of Flux?
using Flux
# Range
xrange = 0:π/99:π
xs = [ Float32.([x1, x2]) for x1 in xrange for x2 in xrange]
# NN model: multi-layer perceptron (MLP)
mlp = Chain(Dense(2,4, Flux.σ),Dense(4,1))
ps_mlp = Flux.params(mlp)
E_mlp(x) = sum(mlp(x))
dE_mlp(E_mlp, x) = sum(gradient(E_mlp, x))
# Computing "nested" gradient
gradient(()->sum(sum.(dE_mlp.([E_mlp], xs))), ps_mlp)

> Closes #22 and #21 - Have added DPP subset selector (as well as random subset selector). Unit testing is sufficient.

Thank you for your patience

@dallasfoster (Collaborator Author)

  • A key aspect of neural ace training is to be able to use BFGS. Flux does not provide this optimizer. To combine the NN model provided by Flux with BFGS provided by Optimization.jl, I used Flux.destructure. This function extracts the parameters of the NN model so they can interact nicely with the Optimization.jl interface. I am not sure how to train a Flux model with BFGS with this new code. In case it is not possible, we would need to re-incorporate this feature, I can address this after the merge. Perhaps another approach is to use FluxOptTools.jl.

I would strongly recommend against using BFGS for neural ace training, let alone having it be a built-in feature of PotentialLearning.jl. We can provide further scaffolding to allow users to define their own optimizer, particularly in learn.jl.

  • Regarding the LearningProblem, should the datasets and/or loss functions (e.g. a sum of the squared differences of non-linear functions, its derivatives, etc) commonly used in NN potentials be transformed in any way to define the LearningProblem under the BI approach? E.g. logprob = something that depends on the loss

The data is assumed to be in the form of a DataSet struct, and logprob is assumed to be a function that takes params::Vector{T} as its input argument and is optimized over those parameters. I think this is a fairly flexible approach that allows users to transform their problem as needed.
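
For illustration, a made-up, self-contained example of a logprob in that form, a plain function of a flat parameter vector (here a toy linear least-squares model), together with its gradient:

# Toy example only: the data, model, and gradient below are invented for illustration.
observations = [1.0, 2.0, 3.0]
design       = [1.0 0.0; 1.0 1.0; 1.0 2.0]   # toy feature matrix

logprob(θ::Vector{Float64})  = -sum(abs2, design * θ .- observations)
∇logprob(θ::Vector{Float64}) = -2 .* (design' * (design * θ .- observations))

logprob([0.5, 1.0])   # evaluate at a candidate parameter vector -> -0.75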

  • In the LearningProblem function, the gradient is computed based on f. Should this function be logprob?

Yes, will be fixed in the next push.

Closes #24 - With the movement of NNBasisPotential to InteratomicBasisPotentials.jl, I have fixed the issue with gradients of gradients.

  • What would you modify in the following snippet to make it work using reverse mode over reverse mode and the latest version of Flux?
using Flux
# Range
xrange = 0:π/99:π
xs = [ Float32.([x1, x2]) for x1 in xrange for x2 in xrange]
# NN model: multi-layer perceptron (MLP)
mlp = Chain(Dense(2,4, Flux.σ),Dense(4,1))
ps_mlp = Flux.params(mlp)
E_mlp(x) = sum(mlp(x))
dE_mlp(E_mlp, x) = sum(gradient(E_mlp, x))
# Computing "nested" gradient
gradient(()->sum(sum.(dE_mlp.([E_mlp], xs))), ps_mlp)

I suppose something like this works:

using Flux
# Range
xrange = 0:π/99:π
xs = [ Float32.([x1, x2]) for x1 in xrange for x2 in xrange]
# NN model: multi-layer perceptron (MLP)
mlp = Chain(Dense(2,4, Flux.σ),Dense(4,1))
ps_mlp = Flux.params(mlp)
E(x) = sum(mlp(x))
dE(m) = x -> gradient(x -> sum(m(x)), x)[1] 
gradient(()->sum(dE(mlp)(xs[1])), ps_mlp)

The linked code uses HDBSCAN, which itself has some limitations vis-à-vis DPP subsampling. In any case, an implementation of HDBSCAN is already present in PotentialLearning.jl. Perhaps this can be made more robust, but that could be another issue.

  • This PR merged the code of Aug 8 version of main, there are new commits that are not currently merged in this version, in particular related to load_datasets function and metrics.

The load_datasets function should be deprecated in favor of the new methods for dataset input and construction. The structs in this PR are better organized and more robust.

Not directly, I am adding all of the workflows to the /example folder, but in such a way that it is adapted to the current PR.

Fixed an error in learn.jl, added workflow files and updated Project.toml
@emmanuellujan (Member)

A key aspect of neural ace training is to be able to use BFGS. Flux does not provide this optimizer. To combine the NN model provided by Flux with BFGS provided by Optimization.jl, I used Flux.destructure. This function extracts the parameters of the NN model so they can interact nicely with the Optimization.jl interface. I am not sure how to train a Flux model with BFGS with this new code. In case it is not possible, we would need to re-incorporate this feature, I can address this after the merge. Perhaps another approach is to use FluxOptTools.jl.

I would strongly recommend against using BFGS for neural ace training, let alone having it be a built-in feature of PotentialLearning.jl. We can provide further scaffolding to allow users to define their own optimizer, particularly in learn.jl.

I am open to better options that can exceed the accuracy and performance of linear ACE. I think PotentialLearning.jl should provide built-in features that allow users to leverage the advantages of neural ACE (or non-linear ACE) over linear ACE.

Why BFGS here? Although (at least in my experience) this solver could not handle large NN models (several hundred parameters) and large datasets, these limitations can be addressed by using small NNs, subsampling of the original dataset, and minibatches during training. BFGS showed very fast convergence for small MLP ACE models (e.g. 225 parameters: a couple of dense layers with 8 nodes per layer) trained on medium-size datasets (e.g. 57800 data points: energies and force components), something I could not get with other Flux solvers, such as ADAM, which showed a slow convergence rate (and did not achieve the accuracy I need in the time I am willing to wait). Furthermore, larger NN models can be addressed by using parallel ensembles. It is this rapid convergence that allows for better accuracy (in different training and testing metrics) compared to linear ACE. BFGS is also used in NeuralPDE [1, 2], alone or combined with ADAM (btw, they are using Lux now), to train relatively small MLPs. You can find the codes I have been working on here and here. I can provide more details if you need them. Suggestions are welcome.

Regarding the LearningProblem, should the datasets and/or loss functions (e.g. a sum of the squared differences of non-linear functions, its derivatives, etc) commonly used in NN potentials be transformed in any way to define the LearningProblem under the BI approach? E.g. logprob = something that depends on the loss

The data is assumed to be in the form of a DataSet struct, and logprob is assumed to be a function that takes params::Vector{T} as its input argument and is optimized over those parameters. I think this is a fairly flexible approach that allows users to transform their problem as needed.

What I mean is whether mathematical transformations based on the Bayesian approach need to be performed on the data or loss functions before training. In other words, can any loss function commonly used to train NN potentials be considered a logprob?

In the LearningProblem function, the gradient is computed based on f. Should this function be logprob?

Yes, will be fixed in the next push.

👍

I suppose something like this works:
...dE(m) = x -> gradient(x -> sum(m(x)), x)[1]
gradient(()->sum(dE(mlp)(xs[1])), ps_mlp)

Good, based on these lines I put together an example closer to actual neural potential training. This code in particular consumes a substantial amount of memory, so I would like to give it another shot. Months ago something similar happened with the reverse-mode-over-forward-mode approach: it worked, but very slowly. Suggestions are welcome.

using Flux

# Range
xrange = 0:π/99:π
xs = [ Float32.([x1, x2]) for x1 in xrange for x2 in xrange]

# Target function: E
E_analytic(x) = sin(x[1]) * cos(x[2])

# Analytical gradient of E
dE_analytic(x) = [cos(x[1]) * cos(x[2]), -sin(x[1]) * sin(x[2])]

# NN model
mlp = Chain(Dense(2,4, Flux.σ),Dense(4,1))
ps_mlp = Flux.params(mlp)
E(x) = sum(mlp(x))
dE(mlp, x) = sum(gradient(x -> sum(mlp(x)), x))

# Loss
loss(x, y) = Flux.Losses.mse(x, y)

# Training
epochs = 10; opt = Flux.Adam(0.1)
for _ in 1:epochs
    g = gradient(()->loss(reduce(vcat, dE.([mlp],xs)),
                          reduce(vcat, dE_analytic.(xs))), ps_mlp)
    Flux.Optimise.update!(opt, ps_mlp, g)
    l = loss(reduce(vcat, dE.([mlp],xs)),
             reduce(vcat, dE_analytic.(xs)))
    println("loss:", l)
    # GC.gc()
end

I would not close #21 (Accelerate learning by automatically reducing the size of the training dataset); the idea there is to show how much training can be accelerated for different potentials and atomic systems using subsampling. The code will be based on an improved version of this approach: https://github.com/argonne-lcf/active-learning-md/blob/master/workflow/activesample.py. There is already a student assigned to this project.

The linked code uses HDBSCAN, which itself has some limitations vis-à-vis DPP subsampling. In any case, an implementation of HDBSCAN is already present in PotentialLearning.jl. Perhaps this can be made more robust, but that could be another issue.

I think the information about the progress related to HDBSCAN should be reported in #21. Then the issue could be closed if another issue with a follow-up is opened. The new issue could have a more specific title that better represents the original description. A comparison between the two methods could be a nice publication. First, the parallel Julia version of the method based on HDBSCAN (and/or other clustering algorithms) has to be implemented.

This PR merged the code of Aug 8 version of main, there are new commits that are not currently merged in this version, in particular related to load_datasets function and metrics.

The load_datasets function should be deprecated in favor of the new methods for dataset input and construction. The structs in this PR are better organized and more robust.

I think for the moment it is ok to deprecate load_datasets since it is not integrated with the new interfaces.

Should the workflows you uploaded here (https://github.com/cesmix-mit/AtomisticComposableWorkflows/tree/main/workflows) be moved to the PotentialLearning.jl examples folder? (After this merge I want to move the stuff about nace/ace/snap, hyper-par opt, AL, and docs to this repo. Before that I need to adapt the code to the new interfaces.)

Not directly, I am adding all of the workflows to the /example folder, but in such a way that it is adapted to the current PR.

I agree. Just to make sure we are not duplicating efforts, I am currently working on integrating the NACE, ACE, and SNAP folders of the AtomisticComposableWorkflows repo with the new interfaces.

New observations:

  • In fit_sodium_dft.jl,
      - "examples/Sodium/data/liquify_sodium_dftk_calculations.jld" seems to be missing
      - Should YAML(u"eV", u"Å") be something like YAML(:Na, u"eV", u"Å")?
  • In the LinearProblem,
      - Should this line: true, compute_features(ds, GlobalSum()), get_values.(get_energy.(ds))
        be replaced by something like this: true, compute_feature.(get_local_descriptors.(ds_train_e), [GlobalSum()]), get_values.

@dallasfoster (Collaborator Author)

I am open to better options that can exceed the accuracy and performance of linear ACE. I think PotentialLearning.jl should provide built in features that allow users to leverage neural ACE (or non-linear ACE) advantages over linear ACE.

To what extent? Right now PotentialLearning.jl implements a very simple training routine that is based on Flux (we can completely divest from Flux but I think that would be a separate issue). If the user wants a more bespoke routine, like I imagine they would most of the time, then they can implement the training themselves (especially if there is batch training and so forth). I believe that PotentialLearning.jl should mostly help with providing structure in the data and model.
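
As a purely illustrative sketch of such a bespoke routine (a plain Flux minibatch loop in the same implicit-parameter style as the snippets above; the data and hyperparameters are placeholders, and none of this is PotentialLearning.jl API):

# Placeholder data and model; shows a user-written minibatch training loop in plain Flux.
using Flux, Random

xs = [rand(Float32, 2) for _ in 1:1000]
ys = [sum(sin, x) for x in xs]

model = Chain(Dense(2, 8, Flux.σ), Dense(8, 1))
ps    = Flux.params(model)
opt   = Flux.Adam(1e-2)

for epoch in 1:5
    for batch in Iterators.partition(shuffle(1:length(xs)), 64)   # minibatches of 64
        g = gradient(() -> sum([abs2(sum(model(xs[i])) - ys[i]) for i in batch]), ps)
        Flux.Optimise.update!(opt, ps, g)
    end
end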

The debate about BFGS is moot and PotentialLearning.jl already provides enough flexibility for the user to tailor their own training method to their own needs.

What I mean is whether mathematical transformations based on the Bayesian approach need to be performed on the data or loss functions before training. In other words, can any loss function commonly used to train NN potentials be considered a logprob?

This is somewhat philosophical, but essentially yes.
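
A minimal illustration of the usual correspondence (assuming i.i.d. Gaussian noise, a squared-error loss is a log-probability up to sign, scale, and an additive constant):

# Under an i.i.d. Gaussian noise assumption with variance σ2, maximizing this logprob
# is equivalent to minimizing the usual squared-error loss (additive constants dropped).
loss(preds, obs)              = sum(abs2, preds .- obs)
logprob(preds, obs; σ2 = 1.0) = -loss(preds, obs) / (2σ2)

logprob([1.1, 1.9], [1.0, 2.0])   # ≈ -0.01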

I don't believe the rest of the discussion needs to hold up this pull request. Any changes to HDBSCAN can be made in future PRs, and edits to the structure of NNBasis functions can occur in InteratomicBasisPotentials.jl.

New observations:

  • In fit_sodium_dft.jl,
         - "examples/Sodium/data/liquify_sodium_dftk_calculations.jld" seems to be missing
         - Should YAML(u"eV", u"Å") be something like YAML(:Na, u"eV", u"Å")

Yes, this error is fixed in the upcoming push.

  • In the LinearProblem
        -   Should this line: true,  compute_features(ds, GlobalSum()), get_values.(get_energy.(ds))
            be replaced by something like this: true, compute_feature.(get_local_descriptors.(ds_train_e), [GlobalSum()]), get_values.

If you look at the definition of the function, you will find that it wraps the relevant code:

function compute_features(ds::DataSet, f::Feature; dt = LocalDescriptors)
    compute_feature.(ds, (f,); dt = dt)
end

Fix to YAML
@emmanuellujan (Member)

I am open to better options that can exceed the accuracy and performance of linear ACE. I think PotentialLearning.jl should provide built in features that allow users to leverage neural ACE (or non-linear ACE) advantages over linear ACE.

To what extent? Right now PotentialLearning.jl implements a very simple training routine that is based on Flux (we can completely divest from Flux but I think that would be a separate issue). If the user wants a more bespoke routine, like I imagine they would most of the time, then they can implement the training themselves (especially if there is batch training and so forth). I believe that PotentialLearning.jl should mostly help with providing structure in the data and model.

I agree that structuring the data and model is a fundamental part of this package. I also agree with the approach of providing basic training and leaving the door open to extending this functionality. At some point we will have to decide what extensions will be added to the package. I believe that new methods and Julia implementations born in CESMIX, such as new subsampling algorithms or new ML potentials, should live in the packages developed in this project even if it requires adding some extra code. In other words, I would like the Julia packages developed for CESMIX to shine for their interfaces and the workflows they enable but also for the novel methods implemented in them. However, I think the most practical thing to do now is to merge this PR and address this point in the future.

The debate about BFGS is moot and PotentialLearning.jl already provides enough flexibility for the user to tailor their own training method to their own needs.

Addressed above. Please understand that I started writing about the use of this optimizer to address the comment about its integration into this package.

What I mean is whether mathematical transformations based on the Bayesian approach need to be performed on the data or loss functions before training. In other words, can any loss function commonly used to train NN potentials be considered a logprob?

This is somewhat philosophical, but essentially yes.

Okay. Actually my concern was practical rather than philosophical: whether it was necessary to transform the training data or loss function before using the new interfaces to comply with the Bayesian approach.

I don't believe the rest of the discussion needs to hold up this pull request. Any changes to HDBSCAN can be made in future PRs, and edits to the structure of NNBasis functions can occur in InteratomicBasisPotentials.jl.

Good.

New observations:
  • In fit_sodium_dft.jl,
      - "examples/Sodium/data/liquify_sodium_dftk_calculations.jld" seems to be missing
      - Should YAML(u"eV", u"Å") be something like YAML(:Na, u"eV", u"Å")?

Yes, this error is fixed in the upcoming push.

Good.

In the LinearProblem

  • Should this line: true, compute_features(ds, GlobalSum()), get_values.(get_energy.(ds)) be replaced by something like this: true, compute_feature.(get_local_descriptors.(ds_train_e), [GlobalSum()]), get_values.

If you look at the definition of the function, you will find that it wraps the relevant code:
function compute_features(ds::DataSet, f::Feature; dt = LocalDescriptors)
    compute_feature.(ds, (f,); dt = dt)
end

I will check this again after the merge.

@emmanuellujan (Member)

@dallasfoster if it's ok with you, I can merge the pull request.

@dallasfoster (Collaborator Author)

You have to finish your review first.

@emmanuellujan (Member) left a review comment.

@dallasfoster merged commit d79d53e into main on Feb 1, 2023
@swyant deleted the active_learning_refactor branch on April 13, 2023 at 07:16