🔥 refactor outputs and database saves #271

Merged · 19 commits · Sep 2, 2022
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -26,14 +26,20 @@ There is one key exception to the rules above -- and that is with `MAJOR`=0 rele
- REST API fields can now be specified directly with the `api_filters` attribute of any `DatabaseTable` class & fields from mix-ins are automatically added
- add `archive_fields` attribute that sets the "raw data" for the database table & fields from mix-ins are automatically added
- accept `TOML` input files in addition to `YAML`
- convergence plots and extras are now written for many workflow types (such as relaxations)
- when `use_database=True`, output files are automatically written and the workup method is directly paired with the database table.
- NEB workflow now accepts parameters to tune how distinct pathways are determined, including the max pathway length and cutoffs at 1D percolation.

**Refactors**
- the `website.core_components.filters` module has been absorbed into the `DatabaseTable` class/module
- yaml input for custom workflows now matches the python input format
- workup methods are largely deprecated; database entries are now returned directly when a workflow has `use_database=True` (see the sketch after this list)
- several NEB input parameters have been renamed to accurately depict their meaning.
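
A hedged sketch of the new pattern (the workflow name is just an example; exact return handling may differ):

``` python
from simmate.workflows.utilities import get_workflow

workflow = get_workflow("static-energy.vasp.mit")

# with use_database=True (the default), no separate workup call is needed --
# the run's result is the saved database entry itself
state = workflow.run(structure="NaCl.cif")
entry = state.result()  # a row from the workflow's database table
```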

**Fixes**
- fix bug in windows dev env where `simmate run-server` fails to find python path
- fix bug in `workflows explore` command where 'vasp' is the assumed calculator name
- fix broken example code in custom workflow docs


# v0.10.0 (2022.08.29)
13 changes: 9 additions & 4 deletions docs/contributing/first_time_setup.md
@@ -25,16 +25,21 @@ conda activate simmate_dev
pip install -e .
```

6. Make sure everything works properly by running our tests
6. When resetting your database, make sure you do **NOT** use the prebuilt database. Pre-builts are only made for new releases and the dev database may differ from the most recent release.
``` bash
simmate database reset --confirm-delete --use-prebuilt false
```

7. Make sure everything works properly by running our tests
``` shell
# you can optionally run tests in parallel
# with a command such as "pytest -n 4"
pytest
```

7. In GitKraken, make sure you have the `main` branch of your repo (`yourname/simmate`) checked out.
8. In GitKraken, make sure you have the `main` branch of your repo (`yourname/simmate`) checked out.

8. In Spyder, go `Projects` > `New Project...`. Check `existing directory`, select your `~/Documents/github/simmate` directory, and then `create` your Project!
9. In Spyder, go `Projects` > `New Project...`. Check `existing directory`, select your `~/Documents/github/simmate` directory, and then `create` your Project!

9. You can now explore the source code and add/edit files! Move to the next section on how to format, test, and submit these changes to our team.
10. You can now explore the source code and add/edit files! Move to the next section on how to format, test, and submit these changes to our team.

22 changes: 22 additions & 0 deletions docs/full_guides/database/notes.md
@@ -0,0 +1,22 @@

```
<< _register_calc >>
    from_run_context
    from_toolkit

<< _update_database_with_results >> from_run_context --> grabs from _register_calc
    update_database_from_results
        update_from_results
            update_from_toolkit
                from_toolkit(as_dict=True)
            update_from_directory
                from_directory(as_dict=True)
                    from_vasp_directory(as_dict=True) ---> unexpected as_dict
                        from_vasp_run(as_dict=True)
                            update_from_toolkit()
                                from_toolkit(as_dict=True)

<< load_completed_calc >>
    from_toolkit
    from_directory
    from_vasp_directory
    from_vasp_run
```
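
A hedged Python sketch of the call chain these notes describe (the class and method bodies are illustrative stubs, not simmate's actual implementation):

``` python
# Illustrative stubs only -- names mirror the notes above; bodies are guesses.
class ExampleCalculationTable:

    @classmethod
    def from_toolkit(cls, as_dict: bool = False, **kwargs):
        # build column data from a toolkit object (e.g. a structure)
        data = {"formula": "NaCl"}  # placeholder column data
        return data if as_dict else cls()

    @classmethod
    def from_run_context(cls, run_id: str = None, **kwargs):
        # << _register_calc >> : create the table entry when a run starts
        return cls.from_toolkit(**kwargs)

    def update_from_toolkit(self, **kwargs):
        # apply from_toolkit(as_dict=True) output to this entry's columns
        for column, value in self.from_toolkit(as_dict=True, **kwargs).items():
            setattr(self, column, value)

    def update_from_directory(self, directory: str):
        # << _update_database_with_results >> : parse outputs, then update
        self.update_from_toolkit()


entry = ExampleCalculationTable.from_run_context(run_id="example-id")
entry.update_from_directory("path/to/calc")
```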
8 changes: 3 additions & 5 deletions docs/full_guides/workflows/creating_new_workflows.md
@@ -48,7 +48,7 @@ class Example__Python__MyFavoriteSettings(Workflow):
    @staticmethod
    def run_config(**kwargs):
        print("This workflow doesn't do much")
        return 42
        return 12345
```

!!! note
@@ -602,13 +602,11 @@ class Example__Python__MyFavoriteSettings(Workflow):
        # just running the workflow 10 times in row on different
        # perturbations or "rattling" of the original structure
        for n in range(10):
            structure.perturb(0.05) # modifies in-place
            another_workflow.run(
                structure=structure.perturb(0.05)
                structure=structure,
                directory=directory / f"perturb_number_{n}",
                # **kwargs, <-- you may want to pass kwargs too.
            )

        return 42
```

!!! warning
124 changes: 108 additions & 16 deletions docs/parameters.md
@@ -1,6 +1,5 @@
# Parameters


## Overview

Knowing which parameters are available and how to use them is essential. We therefore outline **all** unique parameters for **all** workflows here.
@@ -98,6 +97,33 @@ The command that will be called during execution of a program. There is typicall
command = "mpirun -n 8 vasp_std > vasp.out"
```

<!-- NOTES ON SUBCOMMANDS (feature is currently disabled)

command list expects three subcommands:
command_bulk, command_supercell, and command_neb

I separate these out because each calculation runs at a very different scale.
For example, you may want to run the bulk relaxation on 10 cores, the
supercell on 50, and the NEB on 200. Even though more cores are available,
running a smaller calculation on more cores could slow it down.
["command_bulk", "command_supercell", "command_neb"]


If you are running this workflow via the command-line, you can run this
with...

``` bash
simmate workflows run diffusion/all-paths -s example.cif -c "cmd1; cmd2; cmd3"
```
Note, the `-c` here is very important! Here we are passing three commands
separated by semicolons. Each command is passed to a specific workflow call:
- cmd1 --> used for bulk crystal relaxation and static energy
- cmd2 --> used for endpoint supercell relaxations
- cmd3 --> used for NEB
Thus, you can scale your resources for each step. Here's a full -c option:
-c "vasp_std > vasp.out; mpirun -n 12 vasp_std > vasp.out; mpirun -n 70 vasp_std > vasp.out"
-->

--------------------------

## composition
@@ -322,19 +348,19 @@ For evolutionary searches, fixed compositions will be stopped when the best indi
--------------------------

## max_atoms
For workflows that involve generating a supercell or random structure, this will be the maximum number of sites to allow in the generate structure(s). For example, NEB workflows would set this value to something like 100 atoms to limit their supercell image sizes. Alternatively, a evolutionary search may set this to 10 atoms to limit the compositions & stoichiometries that are explored.
For workflows that involve generating a supercell or random structure, this will be the maximum number of sites to allow in the generated structure(s). For example, an evolutionary search may set this to 10 atoms to limit the compositions & stoichiometries that are explored.

=== "yaml"
``` yaml
max_atoms: 100
max_atoms: 10
```
=== "toml"
``` toml
max_atoms = 100
max_atoms = 10
```
=== "python"
``` python
max_atoms = 100
max_atoms = 10
```

--------------------------
@@ -357,6 +383,24 @@ For workflows that generate new structures (and potentially run calculations on

--------------------------

## max_supercell_atoms
For workflows that involve generating a supercell, this will be the maximum number of sites to allow in the generated structure(s). For example, NEB workflows would set this value to something like 100 atoms to limit their supercell image sizes.

=== "yaml"
``` yaml
max_supercell_atoms: 100
```
=== "toml"
``` toml
max_supercell_atoms = 100
```
=== "python"
``` python
max_supercell_atoms = 100
```

--------------------------

## migrating_specie
This is the atomic species/element that will be moving in the analysis (typically NEB or MD diffusion calculations). Note, oxidation states (e.g. "Ca2+") can be used, but this requires your input structure to be oxidation-state decorated as well.

@@ -386,11 +430,6 @@ The atomic path that should be analyzed. Inputs are anything compatible with the

--------------------------

## migration_hop_id
(advanced users only) The entry id from the `MigrationHop` table to link the results to. This is set automatically by higher-level workflows and rarely (if ever) set by the user. If used, you'll likely need to set `diffusion_analysis_id` as well.

--------------------------

## migration_images
The full set of images (including endpoint images) that should be analyzed. Inputs are anything compatible with the `MigrationImages` class of the `simmate.toolkit.diffusion` module, which is effectively a list of `structure` inputs. This includes:

@@ -433,24 +472,38 @@ The full set of images (including endpoint images) that should be analyzed. Inpu
--------------------------

## min_atoms
This is the opposite of `max_atoms` as this will be the minimum number of sites to allow in the generate structure(s). See `max_atoms` for details.
This is the opposite of `max_atoms` as this will be the minimum number of sites allowed in the generated structure(s). See `max_atoms` for details.

--------------------------

## min_structures_exact

(experimental) The minimum number of structures that must be calculated with an `nsites` value exactly matching the fixed composition.
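
For example (the value here is purely illustrative):

=== "yaml"
    ``` yaml
    min_structures_exact: 100
    ```
=== "toml"
    ``` toml
    min_structures_exact = 100
    ```
=== "python"
    ``` python
    min_structures_exact = 100
    ```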

--------------------------

## min_supercell_atoms

This is the opposite of `max_supercell_atoms` as this will be the minimum number of sites allowed in the generated supercell structure.
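
For example (the value here is purely illustrative):

=== "yaml"
    ``` yaml
    min_supercell_atoms: 50
    ```
=== "toml"
    ``` toml
    min_supercell_atoms = 50
    ```
=== "python"
    ``` python
    min_supercell_atoms = 50
    ```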

--------------------------

## min_length
When generating a supercell, this is the minimum length for each lattice vector of the generate cell (in Angstroms).
## min_supercell_vector_lengths

When generating a supercell, this is the minimum length for each lattice vector of the generated cell (in Angstroms). For workflows such as NEB, larger is better but more computationally expensive.

=== "yaml"
``` yaml
min_length: 7.5
min_supercell_vector_lengths: 7.5
```
=== "toml"
``` toml
min_length = 7.5
min_supercell_vector_lengths = 7.5
```
=== "python"
``` python
min_length = 7.5
min_supercell_vector_lengths = 7.5
```

--------------------------
@@ -531,6 +584,26 @@ The total number of steps to run the calculation on. For example, in molecular d

--------------------------

## percolation_mode
The percolation type to detect. The default is ">1d", which searches for percolating paths up to the `max_path_length`. Alternatively, this can be set to "1d" to stop unique-pathway finding once 1D percolation is achieved.

=== "yaml"
``` yaml
percolation_mode: 1d
```
=== "toml"
``` toml
percolation_mode = "1d"
```
=== "python"
``` python
percolation_mode = "1d"
```

--------------------------

## run_id
The id assigned to a specific workflow run / calculation. If not provided, this will be randomly generated, and we highly recommend leaving this at the default value. Note, this is based on unique ids (UUID), so every id should be 100% unique and in string format.

@@ -986,6 +1059,25 @@ Unique to `customized.vasp.user-config`. This is a list of parameters to update

--------------------------

## vacancy_mode
For NEB and diffusion workflows, this determines whether vacancy or interstitial diffusion is analyzed. The default of `True` corresponds to vacancy-based diffusion.

=== "yaml"
``` yaml
vacancy_mode: false
```
=== "toml"
``` toml
vacancy_mode = false
```
=== "python"
``` python
vacancy_mode = False
```

--------------------------

## validator_kwargs
(advanced users only)
Extra conditions to use when initializing the validator class. `MyValidator(**validator_kwargs)`. The input should be given as a dictionary. Note, for evolutionary searches, the composition kwarg is added automatically. This is closely tied with the `validator_name` parameter so be sure to read that section as well.
70 changes: 68 additions & 2 deletions src/simmate/calculators/bader/outputs/acf.py
@@ -3,12 +3,18 @@
from pathlib import Path

import pandas
from pymatgen.io.vasp import Potcar
from pymatgen.io.vasp.outputs import Chgcar


def ACF(filename="ACF.dat"):
def ACF(directory: Path = None, filename="ACF.dat"):

# grab working directory if one wasn't provided
if not directory:
directory = Path.cwd()

# convert to path obj
filename = Path(filename)
filename = directory / filename

# open the file, grab the lines, and then close it
with filename.open() as file:
@@ -46,4 +52,64 @@ def ACF(filename="ACF.dat"):
"nelectrons": float(lines[-1].split()[-1]),
}

    # The remaining code is to analyze the results and calculate extra
    # information such as the final oxidation states. This requires extra
    # files to be present, such as from a vasp calculation

    potcar_filename = directory / "POTCAR"
    chgcar_filename = directory / "CHGCAR"
    chgcar_empty_filename = directory / "CHGCAR_empty"  # SPECIAL CASE

    # check if the required vasp files are present before doing the workup
    if potcar_filename.exists() and (
        chgcar_filename.exists() or chgcar_empty_filename.exists()
    ):

        # load the electron counts used by VASP from the POTCAR files
        # OPTIMIZE: this can be much faster if I have a reference file
        potcars = Potcar.from_file(potcar_filename)
        nelectron_data = {}
        # the result is a list because there can be multiple element potcars
        # in the file (e.g. for NaCl, POTCAR = POTCAR_Na + POTCAR_Cl)
        for potcar in potcars:
            nelectron_data.update({potcar.element: potcar.nelectrons})

        # SPECIAL CASE: in scenarios where empty atoms are added to the structure,
        # we should grab that modified structure instead of the one from the POSCAR.
        # the empty file will always take preference
        if chgcar_empty_filename.exists():
            chgcar = Chgcar.from_file(chgcar_empty_filename)
            structure = chgcar.structure
            # We typically use hydrogen ("H") as the empty atom, so we will
            # need to add this to our element list for oxidation analysis.
            # We use 0 for electron count because this is an 'empty' atom, and
            # not actually Hydrogen
            nelectron_data.update({"H": 0})

        # otherwise, grab the structure from the CHGCAR
        # OPTIMIZE: consider grabbing from the POSCAR or CONTCAR for speed
        else:
            chgcar = Chgcar.from_file(chgcar_filename)
            structure = chgcar.structure

        # Calculate the oxidation state of each site, which is simply the
        # change in its electron count (from the vasp potcar) vs the bader
        # charge. We also add the element strings for filtering functionality
        elements = []
        oxi_state_data = []
        for site, site_charge in zip(structure, dataframe.charge.values):
            element_str = site.specie.name
            elements.append(element_str)
            oxi_state = nelectron_data[element_str] - site_charge
            oxi_state_data.append(oxi_state)

        # add the new column to the dataframe
        dataframe = dataframe.assign(
            oxidation_state=oxi_state_data,
            element=elements,
        )
        # !!! There are multiple ways to do this, but I don't know which is best
        # dataframe["oxidation_state"] = pandas.Series(
        #     oxi_state_data, index=dataframe.index)

    return dataframe, extra_data
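
A quick usage sketch (the import path is inferred from the file location above, and the directory name is illustrative):

``` python
from pathlib import Path

from simmate.calculators.bader.outputs.acf import ACF

# parse ACF.dat from a finished bader run; oxidation_state/element columns
# are only added when POTCAR and CHGCAR (or CHGCAR_empty) are also present
dataframe, extra_data = ACF(directory=Path("my_bader_calc"))

print(extra_data["nelectrons"])
print(dataframe.head())
```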