Merge pull request #236 from prisms-center/0.2a2
0.2a2
bpuchala authored Aug 14, 2016
2 parents ebd0e00 + 98e283c commit d0c341e
Showing 27 changed files with 859 additions and 299 deletions.
161 changes: 97 additions & 64 deletions INSTALL.md


13 changes: 4 additions & 9 deletions README.md
@@ -9,6 +9,7 @@ This version of CASM supports:
- Occupational degrees of freedom.
- High-throughput calculations using:
- VASP: [https://www.vasp.at](https://www.vasp.at)
- Semi-grand canonical Monte Carlo calculations

CASM is updated frequently with support for new effective Hamiltonians, new interfaces for first-principles electronic structure codes, and new Monte Carlo methods. Collaboration is welcome and new features can be incorporated by forking the repository on GitHub, creating a new feature, and submitting pull requests. If you are interested in developing features that involve a significant time investment we encourage you to first contact the CASM development team at <[email protected]>.

@@ -57,7 +58,7 @@ CASM is developed by the Van der Ven group, originally at the University of Mich

**Developers**: John Goiri and Anirudh Natarajan.

**Other contributors**: Min-Hua Chen, Jonathon Bechtel, Max Radin, Elizabeth Decolvenaere and Anna Belak
**Other contributors**: Min-Hua Chen, Jonathon Bechtel, Max Radin, Elizabeth Decolvenaere, Anna Belak, Liang Tian, and Naga Sri Harsha Gunda

#### Acknowledgements ####

@@ -89,7 +90,7 @@ See INSTALL.md

The ``casm`` executable includes extensive help documentation describing the various commands and options. Simply executing ``casm`` will display a list of possible commands, and executing ``casm <cmd> -h`` will display help documentation particular to the chosen command.

For a beginner, the best place to start is to follow the suggestions printed when calling ``casm status -n``. This provides step-by-step instructions for creating a CASM project, generating symmetry information, setting composition axes, enumerating configurations, calculating energies with VASP, setting reference states, and fitting an effective Hamiltonian. The subcommand ``casm format`` provides information on the directory structure of the CASM project and the format of all the CASM files.
For a beginner, the best place to start is to follow the suggestions printed when calling ``casm status -n``. This provides step-by-step instructions for creating a CASM project, generating symmetry information, setting composition axes, enumerating configurations, calculating energies with VASP, setting reference states, and fitting an effective Hamiltonian using the program ``casm-learn``. The subcommand ``casm format`` provides information on the directory structure of the CASM project and the format of all the CASM files.

All that is needed to start a new project is a ``prim.json`` file describing the crystal structure of the material being studied. See ``casm format --prim`` for a description and examples. Typically one will create a new project directory containing the ``prim.json`` file and then initialize the casm project. For example:

@@ -108,15 +109,9 @@ All that is needed to start a new project is a ``prim.json`` file describing the

After initializing a casm project:

- ``casm`` generates code that is compiled and linked at runtime in order to evaluate effective Hamiltonians in a highly optimized manner. If you installed the CASM header files and libraries in a location that is not in your default search path you must specify where to find them. Often the default compilation options work well, but there are some cases when the c++ compiler, compiler flags, or shared object construction flags might need to be customized. You can inspect the current settings via ``casm settings -l`` and see the options for changing them via ``casm settings --desc``.


An HTML tutorial describing the creation of an example CASM project and typical steps is coming soon.


6 changes: 3 additions & 3 deletions SConstruct
@@ -6,7 +6,8 @@ import sys, os, glob, copy, shutil, subprocess, imp, re
from os.path import join

Help("""
Type: 'scons configure' to run configuration checks,
'scons' to build all binaries,
'scons install' to install all libraries, binaries, scripts and python packages,
'scons test' to run all tests,
'scons unit' to run all unit tests,
@@ -43,10 +44,9 @@ Help("""
Sets to compile with debugging symbols. In this case, the optimization level gets
set to -O0, and NDEBUG does not get set.
$LD_LIBRARY_PATH (Linux) or $DYLD_FALLBACK_LIBRARY_PATH (Mac):
Search path for dynamic libraries, may need $CASM_BOOST_PREFIX/lib
and $CASM_PREFIX/lib added to it.
This should be added to your ~/.bash_profile (Linux) or ~/.profile (Mac).
$CASM_BOOST_NO_CXX11_SCOPED_ENUMS:
38 changes: 19 additions & 19 deletions casmenv.sh
@@ -15,6 +15,8 @@
#
#export CASM_BOOST_PREFIX=""

#

# Recognized by install scripts. Use this if linking to boost libraries compiled without c++11. If defined, (i.e. CASM_BOOST_NO_CXX11_SCOPED_ENUMS=1) will compile with -DBOOST_NO_CXX11_SCOPED_ENUMS option.
# Order of precedence:
# 1) if $CASM_BOOST_NO_CXX11_SCOPED_ENUMS defined
@@ -105,6 +107,17 @@ if [ ! -z ${CASM_PREFIX} ]; then

fi

# If CASM_BOOST_PREFIX is set, update library search path
if [ ! -z ${CASM_BOOST_PREFIX} ]; then

# For Linux, set LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$CASM_BOOST_PREFIX/lib:$LD_LIBRARY_PATH

# For Mac, set DYLD_FALLBACK_LIBRARY_PATH
export DYLD_FALLBACK_LIBRARY_PATH=$CASM_BOOST_PREFIX/lib:$DYLD_FALLBACK_LIBRARY_PATH

fi

# If testing:
if [ ! -z ${CASM_REPO} ]; then

@@ -114,25 +127,12 @@ if [ ! -z ${CASM_REPO} ]; then
export PATH=$CASM_REPO/bin:$CASM_REPO/python/casm/scripts:$PATH
export PYTHONPATH=$CASM_REPO/python/casm:$PYTHONPATH

# For testing on Linux, use LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$CASM_REPO/lib:$LD_LIBRARY_PATH

# For testing on Mac, use DYLD_FALLBACK_LIBRARY_PATH:
export DYLD_FALLBACK_LIBRARY_PATH=$CASM_REPO/lib:$DYLD_FALLBACK_LIBRARY_PATH

fi


3 changes: 2 additions & 1 deletion python/casm/casm/learn/__init__.py
@@ -70,7 +70,7 @@ def create_halloffame(maxsize, rel_tol=1e-6):
from fit import example_input_Lasso, example_input_LassoCV, example_input_RFE, \
example_input_GeneticAlgorithm, example_input_IndividualBestFirst, \
example_input_PopulationBestFirst, example_input_DirectSelection, \
open_input, set_input_defaults, \
FittingData, TrainingData, \
print_input_help, print_individual, print_population, print_halloffame, print_eci, \
to_json, open_halloffame, save_halloffame, \
@@ -90,6 +90,7 @@ def create_halloffame(maxsize, rel_tol=1e-6):
'example_input_IndividualBestFirst',
'example_input_PopulationBestFirst',
'example_input_DirectSelection',
'open_input',
'set_input_defaults',
'FittingData',
'TrainingData',
25 changes: 25 additions & 0 deletions python/casm/casm/learn/fit.py
@@ -1101,6 +1101,31 @@ def set_input_defaults(input, input_filename=None):
return input


def open_input(input_filename):
"""
Read casm-learn input file into a dict.

Arguments
---------
input_filename: str
  The path to the input file

Returns
-------
input: dict
  The result of reading the input file and running it through
  casm.learn.set_input_defaults
"""
# open input and always set input defaults before doing anything else
with open(input_filename, 'r') as f:
try:
input = set_input_defaults(json.load(f), input_filename)
except Exception as e:
print "Error parsing JSON in", input_filename
raise e
return input

class FittingData(object):
"""
FittingData holds feature values, target values, sample weights, etc. used
188 changes: 168 additions & 20 deletions python/casm/scripts/casm-learn
@@ -9,6 +9,7 @@ import deap.tools
if __name__ == "__main__":

parser = argparse.ArgumentParser(description = 'Fit cluster expansion coefficients (ECI)')
parser.add_argument('--desc', help='Print extended usage description', action="store_true")
parser.add_argument('-s', '--settings', nargs=1, help='Settings input filename', type=str)
parser.add_argument('--format', help='Hall of fame print format. Options are "details", "json", or "csv".', type=str, default=None)
#parser.add_argument('--path', help='Path to CASM project. Default assumes the current directory is in the CASM project.', type=str, default=os.getcwd())
@@ -61,13 +62,7 @@ if __name__ == "__main__":
if args.verbose:
print "Loading", args.settings[0]

input = casm.learn.open_input(args.settings[0])

if args.hall:

@@ -132,28 +127,181 @@
# pickle hall of fame
casm.learn.save_halloffame(hall, halloffame_filename, args.verbose)

elif args.desc:

print \
"""
Learning is performed in four steps:
1) Specify the problem:
'casm-learn' helps solve the problem:
X*b = y,
where:
X: 2d matrix of shape (n_samples, n_features)
The correlation matrix, holding the evaluated basis functions. The
entry X[config, bfunc] holds the average value of the 'bfunc' cluster
basis function for configuration 'config'. The number of configurations
is 'n_samples' and the number of cluster basis functions is 'n_features'.
y: 1d matrix of shape (n_samples, 1)
The calculated properties being fit to. The most common case is that
y[config] holds the formation energy calculated for configuration
'config'.
b: 1d matrix of shape (n_features, 1)
The effective cluster interactions (ECI) being solved for.
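As a sketch, the X*b = y problem above can be set up and solved with ordinary least squares on synthetic data (NumPy, with hypothetical sizes and values; in a real project X holds correlations queried from CASM and y the calculated formation energies):

```python
import numpy as np

# Synthetic stand-ins for the CASM quantities (hypothetical sizes):
# in practice X holds evaluated cluster basis functions and y the
# calculated formation energies.
rng = np.random.RandomState(0)
n_samples, n_features = 20, 5

X = rng.rand(n_samples, n_features)            # correlation matrix
b_true = np.array([1.0, -0.5, 0.0, 0.2, 0.0])  # "true" ECI
y = X.dot(b_true)                              # noiseless targets

# Ordinary least-squares solution for the ECI
b_fit = np.linalg.lstsq(X, y, rcond=None)[0]
```

With noiseless targets and more samples than features, least squares recovers b_true exactly; real fits must instead balance noise, weighting, and cross-validation error.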
To specify this problem, the 'casm-learn' input file specifies which
configurations to fit to (the training data), how to weight the
configurations, and how to compare solutions via cross-validation.
Training data may be input via a 'casm select' output file. The default
name expected is 'train'. So to use all calculated configurations, you
could create a directory in your CASM project where you will perform
fitting and generate a 'train' file:
cd /my/casm/project
mkdir fit_1 && cd fit_1
casm select --set is_calculated -o train
Example 'casm-learn' JSON input files can be output by the
'casm-learn --exMethodName' options:
casm-learn --exGeneticAlgorithm > fit_1_ga.json
casm-learn --exRFE > fit_1_rfe.json
...etc..
By default, these settings files are prepared for fitting formation_energy,
using the 'train' configuration selection. Edit the file as needed, and
see 'casm-learn --settings-format' for help.
When weighting configurations, the problem is transformed:
X*b = y -> L*X*b = L*y,
where W = L*L.transpose():
W: 2d matrix of shape (n_samples, n_samples)
The weight matrix is specified in the casm-learn input file. If the
weighting method provides 1-dimensional input (this is typical, i.e.
a weight for each configuration), in an array called 'w', then:
W = diag(w)*n_samples/sum(w),
diag(w) being the diagonal matrix with 'w' along the diagonal.
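A minimal NumPy sketch of this weighting transform (the weight vector 'w' is a made-up example, not CASM output):

```python
import numpy as np

rng = np.random.RandomState(1)
n_samples, n_features = 6, 3
X = rng.rand(n_samples, n_features)
y = rng.rand(n_samples)

# Hypothetical per-configuration weights: emphasize the first configuration
w = np.array([4.0, 1.0, 1.0, 1.0, 1.0, 1.0])

# W = diag(w) * n_samples / sum(w)
W = np.diag(w) * n_samples / w.sum()

# W = L * L.transpose(); for a diagonal W, L is simply sqrt(W)
L = np.sqrt(W)

# Transformed problem: (L*X) * b = (L*y)
b = np.linalg.lstsq(L.dot(X), L.dot(y), rcond=None)[0]
```

Note the normalization keeps the total weight equal to n_samples, so weighted and unweighted cv scores remain on a comparable scale.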
A cross-validation score is used for comparing generated ECI. The cv score
reported is:
cv = sqrt(mean(scores)) + N_nonzero_eci*penalty,
where:
scores: 1d array of shape (number of train/test sets)
The mean squared error calculated for each training/testing set
N_nonzero_eci: int
The number of basis functions with non-zero ECI
penalty: number, optional, default=0.0
Is the user-input penalty per basis function that can be used to
favor solutions with a small number of non-zero ECI
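The cv score above can be written out directly (a sketch; the 'scores' and 'eci' values are invented for illustration):

```python
import numpy as np

def cv_score(scores, eci, penalty=0.0):
    """cv = sqrt(mean(scores)) + N_nonzero_eci * penalty, where
    'scores' are mean squared errors, one per train/test set."""
    n_nonzero = np.count_nonzero(eci)
    return np.sqrt(np.mean(scores)) + n_nonzero * penalty

scores = [0.04, 0.01, 0.04]         # MSE for three train/test splits
eci = [1.2, 0.0, -0.3, 0.0, 0.5]    # three non-zero ECI

base = cv_score(scores, eci)                     # sqrt(0.03), about 0.173
penalized = cv_score(scores, eci, penalty=0.01)  # adds 3 * 0.01
```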
See 'casm-learn --settings-format' for help specifying the cross-validation
training and test sets using options from scikit-learn. It is usually
important to use the 'shuffle'=true option so that configurations are
randomly added to train/test sets and not ordered by supercell size.
When you run 'casm-learn' with a new problem specification the first time,
it generates a "problem specs" file that stores the training data, weights,
and cross-validation train/test sets. Then, when running subsequent times,
the data can be loaded more quickly, and the cross-validation can be
performed using the same train/test sets. 'casm-learn' will attempt to
prevent you from re-running with a different problem specification so that
solutions can be compared via their cv score in an "apples-to-apples"
manner. The default name for the "specs" file is determined from the input
filename. For example, 'my_input_specs.pkl' is used if the input file is
named 'my_input.json'. See 'casm-learn --settings-format' for more help.
The '--checkspecs' option can be used to write output files with the
generated problem specs data. Among other things, this can be used to
adjust weights manually or save and re-use train/test sets. See
'casm-learn --settings-format' for more help.
2) Select estimator and feature selection methods
The "estimator" option specifies a linear model estimator that determines
how to solve the linear problem L*X*b = L*y, for b.
The "feature_selection" option specifies a feature selection method that
determines which features (ECI) should be considered for the solution. The
remaining are effectively set to 0.0 when calculating the cluster
expansion. Generally there is a tradeoff: By limiting the number of
features included in the cluster expansion Monte Carlo calculations can be
more efficient, but at a possible loss of accuracy. Be careful to avoid
overfitting however. If your cross validation scheme does not provide
enough testing data, you may fit your training data very well, but not
have an accurate extrapolation to other configurations.
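For instance, an l-1 penalized estimator such as scikit-learn's Lasso performs estimation and implicit feature selection at once. This is only an illustration on synthetic data, not the casm-learn input format:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(2)
X = rng.rand(50, 10)
b_true = np.zeros(10)
b_true[0], b_true[3] = 2.0, -1.0   # only two basis functions matter
y = X.dot(b_true)

# The l-1 penalty drives irrelevant coefficients toward exactly zero,
# effectively excluding those basis functions from the expansion.
est = Lasso(alpha=0.01)
est.fit(X, y)

n_nonzero = np.count_nonzero(np.abs(est.coef_) > 1e-6)
```

Larger alpha prunes more features (cheaper Monte Carlo, possibly less accurate); cross-validation guards against overfitting in either direction.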
See 'casm-learn --settings-format' for help specifying the estimator and
feature selection methods. Assuming you are using the GeneticAlgorithm and
have named your input file 'fit_1_ga.json', run:
casm-learn -s fit_1_ga.json
'casm-learn' will run and eventually store its results. For a single
problem specification (step 1, the settings in "problem_specs"
in the 'casm-learn' input file), you may try many different estimation
and feature selection methods and use the cv score to compare results. All
the results for a single problem specification can be stored in a 'Hall Of
Fame' that collects the N individual solutions with the best cv scores. To
view these results use:
casm-learn -s fit_1_ga.json --hall
For more details, or to output the results for further analysis in JSON or
CSV format, there is a '--format' option. To view only particular
individuals in the hall of fame, there is a '--indiv' option.
3) Analyze results
The above steps (1) and (2) may be repeated many times as you attempt to
optimize your ECI. Solutions for different problems (i.e. different
weighting schemes, re-calculating with more training data) may be compared
based on scientific knowledge, for instance, which predicts the 0K ground
state configurations correctly, or from analysis of Monte Carlo results.
The '--checkhull' option provides a simple way to check the 0K ground
states and can create 'casm select' style output files with enumerated but
uncalculated configurations that are predicted to be low energy. These can
then be used to generate more training data and re-fit the ECI.
When you have generated ECI that you wish to use in Monte Carlo
calculations, use the '--select' option to write an 'eci.json' file into
your CASM project for the currently selected cluster expansion (as listed
by 'casm settings -l').
4) Use results
Once an 'eci.json' file has been written, you can run Monte Carlo
calculations. See 'casm monte -h' and 'casm format --monte' for help.
"""

else:

parser.print_help()

