🔥 refactor outputs and database saves #271

Merged · 19 commits · Sep 2, 2022
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -26,14 +26,20 @@ There is one key exception to the rules above -- and that is with `MAJOR`=0 rele
- REST API fields can now be specified directly with the `api_filters` attribute of any `DatabaseTable` class & fields from mix-ins are automatically added
- add `archive_fields` attribute that sets the "raw data" for the database table & fields from mix-ins are automatically added
- accept `TOML` input files in addition to `YAML`
- convergence plots and extras are now written for many workflow types (such as relaxations)
- when `use_database=True`, output files are automatically written and the workup method is directly paired with the database table.
- NEB workflow now accepts parameters to tune how distinct pathways are determined, including the max pathway length and cutoffs at 1D percolation.

**Refactors**
- the `website.core_components.filters` module has been absorbed into the `DatabaseTable` class/module
- yaml input for custom workflows now matches the python input format
- workup methods are largely deprecated; database entries are now returned directly when a workflow has `use_database=True` (see the sketch after this list)
- several NEB input parameters have been renamed to accurately depict their meaning.
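
A hedged sketch of the new pattern (the workflow name is just an example; exact return handling may differ):

``` python
from simmate.workflows.utilities import get_workflow

workflow = get_workflow("static-energy.vasp.mit")

# with use_database=True (the default), no separate workup call is needed --
# the run's result is the saved database entry itself
state = workflow.run(structure="NaCl.cif")
entry = state.result()  # a row from the workflow's database table
```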

**Fixes**
- fix bug in windows dev env where `simmate run-server` fails to find python path
- fix bug in `workflows explore` command where 'vasp' is the assumed calculator name
- fix broken example code in custom workflow docs


# v0.10.0 (2022.08.29)
13 changes: 9 additions & 4 deletions docs/contributing/first_time_setup.md
@@ -25,16 +25,21 @@ conda activate simmate_dev
pip install -e .
```

6. Make sure everything works properly by running our tests
6. When resetting your database, make sure you do **NOT** use the prebuilt database. Pre-builts are only made for new releases and the dev database may differ from the most recent release.
``` bash
simmate database reset --confirm-delete --use-prebuilt false
```

7. Make sure everything works properly by running our tests
``` shell
# you can optionally run tests in parallel
# with a command such as "pytest -n 4"
pytest
```

7. In GitKraken, make sure you have the `main` branch of your repo (`yourname/simmate`) checked out.
8. In GitKraken, make sure you have the `main` branch of your repo (`yourname/simmate`) checked out.

8. In Spyder, go `Projects` > `New Project...`. Check `existing directory`, select your `~/Documents/github/simmate` directory, and then `create` your Project!
9. In Spyder, go `Projects` > `New Project...`. Check `existing directory`, select your `~/Documents/github/simmate` directory, and then `create` your Project!

9. You can now explore the source code and add/edit files! Move to the next section on how to format, test, and submit these changes to our team.
10. You can now explore the source code and add/edit files! Move to the next section on how to format, test, and submit these changes to our team.

22 changes: 22 additions & 0 deletions docs/full_guides/database/notes.md
@@ -0,0 +1,22 @@

```
<< _register_calc >>
    from_run_context
    from_toolkit

<< _update_database_with_results >> from_run_context --> grabs from _register_calc
    update_database_from_results
        update_from_results
            update_from_toolkit
                from_toolkit(as_dict=True)
            update_from_directory
                from_directory(as_dict=True)
                    from_vasp_directory(as_dict=True) ---> unexpected as_dict
                        from_vasp_run(as_dict=True)
                            update_from_toolkit()
                                from_toolkit(as_dict=True)

<< load_completed_calc >>
    from_toolkit
    from_directory
    from_vasp_directory
    from_vasp_run
```
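
A hedged Python sketch of the call chain these notes describe (the class and method bodies are illustrative stubs, not simmate's actual implementation):

``` python
# Illustrative stubs only -- names mirror the notes above; bodies are guesses.
class ExampleCalculationTable:

    @classmethod
    def from_toolkit(cls, as_dict: bool = False, **kwargs):
        # build column data from a toolkit object (e.g. a structure)
        data = {"formula": "NaCl"}  # placeholder column data
        return data if as_dict else cls()

    @classmethod
    def from_run_context(cls, run_id: str = None, **kwargs):
        # << _register_calc >> : create the table entry when a run starts
        return cls.from_toolkit(**kwargs)

    def update_from_toolkit(self, **kwargs):
        # apply from_toolkit(as_dict=True) output to this entry's columns
        for column, value in self.from_toolkit(as_dict=True, **kwargs).items():
            setattr(self, column, value)

    def update_from_directory(self, directory: str):
        # << _update_database_with_results >> : parse outputs, then update
        self.update_from_toolkit()


entry = ExampleCalculationTable.from_run_context(run_id="example-id")
entry.update_from_directory("path/to/calc")
```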
8 changes: 3 additions & 5 deletions docs/full_guides/workflows/creating_new_workflows.md
@@ -48,7 +48,7 @@ class Example__Python__MyFavoriteSettings(Workflow):
    @staticmethod
    def run_config(**kwargs):
        print("This workflow doesn't do much")
        return 42
        return 12345
```

!!! note
@@ -602,13 +602,11 @@ class Example__Python__MyFavoriteSettings(Workflow):
        # just running the workflow 10 times in row on different
        # perturbations or "rattling" of the original structure
        for n in range(10):
            structure.perturb(0.05) # modifies in-place
            another_workflow.run(
                structure=structure.perturb(0.05)
                structure=structure,
                directory=directory / f"perturb_number_{n}",
                # **kwargs, <-- you may want to pass kwargs too.
            )

        return 42
```

!!! warning
124 changes: 108 additions & 16 deletions docs/parameters.md
@@ -1,6 +1,5 @@
# Parameters


## Overview

Knowing which parameters are available and how to use them is essential. We therefore outline **all** unique parameters for **all** workflows here.
@@ -98,6 +97,33 @@ The command that will be called during execution of a program. There is typicall
command = "mpirun -n 8 vasp_std > vasp.out"
```

<!-- NOTES ON SUBCOMMANDS (feature is currently disabled)

command list expects three subcommands:
command_bulk, command_supercell, and command_neb

I separate these out because each calculation runs at a very different scale.
For example, you may want to run the bulk relaxation on 10 cores, the
supercell on 50, and the NEB on 200. Even though more cores are available,
running a smaller calculation on more cores could slow it down.
["command_bulk", "command_supercell", "command_neb"]


If you are running this workflow via the command-line, you can run this
with...

``` bash
simmate workflows run diffusion/all-paths -s example.cif -c "cmd1; cmd2; cmd3"
```
Note, the `-c` here is very important! Here we are passing three commands
separated by semicolons. Each command is passed to a specific workflow call:
- cmd1 --> used for bulk crystal relaxation and static energy
- cmd2 --> used for endpoint supercell relaxations
- cmd3 --> used for NEB
Thus, you can scale your resources for each step. Here's a full -c option:
-c "vasp_std > vasp.out; mpirun -n 12 vasp_std > vasp.out; mpirun -n 70 vasp_std > vasp.out"
-->

--------------------------

## composition
@@ -322,19 +348,19 @@ For evolutionary searches, fixed compositions will be stopped when the best indi
--------------------------

## max_atoms
For workflows that involve generating a supercell or random structure, this will be the maximum number of sites to allow in the generate structure(s). For example, NEB workflows would set this value to something like 100 atoms to limit their supercell image sizes. Alternatively, a evolutionary search may set this to 10 atoms to limit the compositions & stoichiometries that are explored.
For workflows that involve generating a supercell or random structure, this will be the maximum number of sites to allow in the generated structure(s). For example, an evolutionary search may set this to 10 atoms to limit the compositions & stoichiometries that are explored.

=== "yaml"
``` yaml
max_atoms: 100
max_atoms: 10
```
=== "toml"
``` toml
max_atoms = 100
max_atoms = 10
```
=== "python"
``` python
max_atoms = 100
max_atoms = 10
```

--------------------------
@@ -357,6 +383,24 @@ For workflows that generate new structures (and potentially run calculations on

--------------------------

## max_supercell_atoms
For workflows that involve generating a supercell, this will be the maximum number of sites to allow in the generated structure(s). For example, NEB workflows would set this value to something like 100 atoms to limit their supercell image sizes.

=== "yaml"
``` yaml
max_supercell_atoms: 100
```
=== "toml"
``` toml
max_supercell_atoms = 100
```
=== "python"
``` python
max_supercell_atoms = 100
```

--------------------------

## migrating_specie
This is the atomic species/element that will be moving in the analysis (typically NEB or MD diffusion calculations). Note, oxidation states (e.g. "Ca2+") can be used, but this requires your input structure to be oxidation-state decorated as well.

@@ -386,11 +430,6 @@ The atomic path that should be analyzed. Inputs are anything compatible with the

--------------------------

## migration_hop_id
(advanced users only) The entry id from the `MigrationHop` table to link the results to. This is set automatically by higher-level workflows and rarely (if ever) set by the user. If used, you'll likely need to set `diffusion_analysis_id` as well.

--------------------------

## migration_images
The full set of images (including endpoint images) that should be analyzed. Inputs are anything compatible with the `MigrationImages` class of the `simmate.toolkit.diffusion` module, which is effectively a list of `structure` inputs. This includes:

@@ -433,24 +472,38 @@ The full set of images (including endpoint images) that should be analyzed. Inpu
--------------------------

## min_atoms
This is the opposite of `max_atoms` as this will be the minimum number of sites to allow in the generate structure(s). See `max_atoms` for details.
This is the opposite of `max_atoms` as this will be the minimum number of sites allowed in the generated structure(s). See `max_atoms` for details.

--------------------------

## min_structures_exact

(experimental) The minimum number of structures that must be calculated with an `nsites` value exactly matching the fixed composition.
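
For example (the value here is purely illustrative):

=== "yaml"
    ``` yaml
    min_structures_exact: 100
    ```
=== "toml"
    ``` toml
    min_structures_exact = 100
    ```
=== "python"
    ``` python
    min_structures_exact = 100
    ```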

--------------------------

## min_supercell_atoms

This is the opposite of `max_supercell_atoms` as this will be the minimum number of sites allowed in the generated supercell structure.
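
For example (the value here is purely illustrative):

=== "yaml"
    ``` yaml
    min_supercell_atoms: 50
    ```
=== "toml"
    ``` toml
    min_supercell_atoms = 50
    ```
=== "python"
    ``` python
    min_supercell_atoms = 50
    ```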

--------------------------

## min_length
When generating a supercell, this is the minimum length for each lattice vector of the generate cell (in Angstroms).
## min_supercell_vector_lengths

When generating a supercell, this is the minimum length for each lattice vector of the generated cell (in Angstroms). For workflows such as NEB, larger is better but more computationally expensive.

=== "yaml"
``` yaml
min_length: 7.5
min_supercell_vector_lengths: 7.5
```
=== "toml"
``` toml
min_length = 7.5
min_supercell_vector_lengths = 7.5
```
=== "python"
``` python
min_length = 7.5
min_supercell_vector_lengths = 7.5
```

--------------------------
@@ -531,6 +584,26 @@ The total number of steps to run the calculation on. For example, in molecular d

--------------------------

## percolation_mode
The percolation type to detect. The default is ">1d", which searches for percolating paths up to the `max_path_length`. Alternatively, this can be set to "1d" to stop unique-pathway finding once 1D percolation is achieved.

=== "yaml"
``` yaml
percolation_mode: 1d
```
=== "toml"
``` toml
percolation_mode = "1d"
```
=== "python"
``` python
percolation_mode = "1d"
```

--------------------------

## run_id
The id assigned to a specific workflow run / calculation. If not provided, this will be randomly generated, and we highly recommend leaving this at the default value. Note, this is based on unique ids (UUID), so every id should be 100% unique and in string format.

@@ -986,6 +1059,25 @@ Unique to `customized.vasp.user-config`. This is a list of parameters to update

--------------------------

## vacancy_mode
For NEB and diffusion workflows, this determines whether vacancy or interstitial diffusion is analyzed. The default of `True` corresponds to vacancy-based diffusion.

=== "yaml"
``` yaml
vacancy_mode: false
```
=== "toml"
``` toml
vacancy_mode = false
```
=== "python"
``` python
vacancy_mode = False
```

--------------------------

## validator_kwargs
(advanced users only)
Extra conditions to use when initializing the validator class. `MyValidator(**validator_kwargs)`. The input should be given as a dictionary. Note, for evolutionary searches, the composition kwarg is added automatically. This is closely tied with the `validator_name` parameter so be sure to read that section as well.
70 changes: 68 additions & 2 deletions src/simmate/calculators/bader/outputs/acf.py
@@ -3,12 +3,18 @@
from pathlib import Path

import pandas
from pymatgen.io.vasp import Potcar
from pymatgen.io.vasp.outputs import Chgcar


def ACF(filename="ACF.dat"):
def ACF(directory: Path = None, filename="ACF.dat"):

# grab working directory if one wasn't provided
if not directory:
directory = Path.cwd()

# convert to path obj
filename = Path(filename)
filename = directory / filename

# open the file, grab the lines, and then close it
with filename.open() as file:
@@ -46,4 +52,64 @@ def ACF(filename="ACF.dat"):
"nelectrons": float(lines[-1].split()[-1]),
}

    # The remaining code is to analyze the results and calculate extra
    # information such as the final oxidation states. This requires extra
    # files to be present, such as from a vasp calculation

    potcar_filename = directory / "POTCAR"
    chgcar_filename = directory / "CHGCAR"
    chgcar_empty_filename = directory / "CHGCAR_empty"  # SPECIAL CASE

    # check if the required vasp files are present before doing the workup
    if potcar_filename.exists() and (
        chgcar_filename.exists() or chgcar_empty_filename.exists()
    ):

        # load the electron counts used by VASP from the POTCAR files
        # OPTIMIZE: this can be much faster if I have a reference file
        potcars = Potcar.from_file(potcar_filename)
        nelectron_data = {}
        # the result is a list because there can be multiple element potcars
        # in the file (e.g. for NaCl, POTCAR = POTCAR_Na + POTCAR_Cl)
        for potcar in potcars:
            nelectron_data.update({potcar.element: potcar.nelectrons})

        # SPECIAL CASE: in scenarios where empty atoms are added to the structure,
        # we should grab that modified structure instead of the one from the POSCAR.
        # the empty file will always take preference
        if chgcar_empty_filename.exists():
            chgcar = Chgcar.from_file(chgcar_empty_filename)
            structure = chgcar.structure
            # We typically use hydrogen ("H") as the empty atom, so we will
            # need to add this to our element list for oxidation analysis.
            # We use 0 for electron count because this is an 'empty' atom, and
            # not actually Hydrogen
            nelectron_data.update({"H": 0})

        # otherwise, grab the structure from the CHGCAR
        # OPTIMIZE: consider grabbing from the POSCAR or CONTCAR for speed
        else:
            chgcar = Chgcar.from_file(chgcar_filename)
            structure = chgcar.structure

        # Calculate the oxidation state of each site, which is simply the
        # change in its electron count (from the vasp potcar) vs the bader
        # charge. We also add the element strings for filtering functionality
        elements = []
        oxi_state_data = []
        for site, site_charge in zip(structure, dataframe.charge.values):
            element_str = site.specie.name
            elements.append(element_str)
            oxi_state = nelectron_data[element_str] - site_charge
            oxi_state_data.append(oxi_state)

        # add the new column to the dataframe
        dataframe = dataframe.assign(
            oxidation_state=oxi_state_data,
            element=elements,
        )
        # !!! There are multiple ways to do this, but I don't know which is best
        # dataframe["oxidation_state"] = pandas.Series(
        #     oxi_state_data, index=dataframe.index)

    return dataframe, extra_data
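
A quick usage sketch (the import path is inferred from the file location above, and the directory name is illustrative):

``` python
from pathlib import Path

from simmate.calculators.bader.outputs.acf import ACF

# parse ACF.dat from a finished bader run; oxidation_state/element columns
# are only added when POTCAR and CHGCAR (or CHGCAR_empty) are also present
dataframe, extra_data = ACF(directory=Path("my_bader_calc"))

print(extra_data["nelectrons"])
print(dataframe.head())
```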