Merge branch 'main' of github.com:jeremyleung521/lpath into 3.12
jeremyleung521 committed Jun 27, 2024
2 parents 5e26bc7 + 6a301fa commit fd2585c
Showing 16 changed files with 251 additions and 50 deletions.
6 changes: 4 additions & 2 deletions .github/workflows/CI.yaml
@@ -61,11 +61,13 @@ jobs:
pytest -v --cov=lpath --cov-report=xml --color=yes lpath/tests/
- name: CodeCov
uses: codecov/codecov-action@v3
uses: codecov/codecov-action@v4
with:
# token: ${{ secrets.CODECOV_TOKEN }} # Not needed for Public Repos
file: ./coverage.xml
flags: unittests
name: codecov-${{ matrix.os }}-py${{ matrix.python-version }}
fail_ci_if_error: false
verbose: false
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }} # Not needed for Public Repos

2 changes: 1 addition & 1 deletion .github/workflows/build.yaml
@@ -101,7 +101,7 @@ jobs:
with:
# unpacks default artifact into dist/
# if `name: artifact` is omitted, the action will create extra parent dir
name: artifact-*
pattern: artifact-*
path: dist
merge-multiple: true

2 changes: 1 addition & 1 deletion devtools/conda-envs/environment-rtd.yml
@@ -3,7 +3,7 @@ channels:
- conda-forge
- defaults
dependencies:
- python
- python<3.12
- westpa>=2022.03
- scikit-learn
- tqdm
2 changes: 1 addition & 1 deletion devtools/conda-envs/test_env.yaml
@@ -4,7 +4,7 @@ channels:
- defaults
dependencies:
# Base depends
- python
- python<3.12
- pip
- scikit-learn
- tqdm
105 changes: 99 additions & 6 deletions docs/usage.rst
@@ -92,17 +92,31 @@ In this step, we will pattern match any successful transitions we've identified

1. From the command line, run the following::

lpath match --input-pickle succ_traj/pathways.pickle --cluster-labels-output succ_traj/cluster_labels.npy
lpath match --input-pickle succ_traj/pathways.pickle --output-pickle succ_traj/match-output.pickle \
--cluster-labels-output succ_traj/cluster_labels.npy

2. After the comparison process completes, a dendrogram is displayed. Closing the figure triggers prompts that will guide you further.

3. Input ``y`` if you think the threshold (horizontal line which dictates how many clusters there are) should be at a different value. Otherwise, input ``n`` and tell the program how many clusters you want at the end.

Plot
____
This step plots some of the most common graphs, such as dendrograms and histograms, directly from the pickle object generated by ``match``. Users may also use the plotting scripts in the ``examples`` folder, which includes a script for generating ``NetworkX`` graphs.

[UNDER CONSTRUCTION]
More specifically, the following graphs will be made in the ``plots`` folder:

* Dendrogram showing separation between clusters
* Weights/Cluster bar graph
* Target iteration histograms (per cluster)
* Event duration histograms (per cluster)


From the command line, run the following and it should generate a separate file for each of the above graphs::

lpath plot --plot-input succ_traj/match-output.pickle

More options for customizing the graphs can be found by running ``lpath plot --help``.
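The outputs named above can also be inspected by hand. The sketch below, using made-up toy data in a temporary directory (the real ``match-output.pickle`` structure is more involved; only the file names follow the docs), shows how the ``.npy`` cluster labels and a pickle could be written and read back::

```python
import os
import pickle
import tempfile

import numpy

# Toy stand-ins for the files produced by ``lpath match``.
workdir = tempfile.mkdtemp()
labels_path = os.path.join(workdir, "cluster_labels.npy")
pickle_path = os.path.join(workdir, "match-output.pickle")

cluster_labels = numpy.array([0, 0, 1, 2, 1, 0])  # one label per pathway
numpy.save(labels_path, cluster_labels)
with open(pickle_path, "wb") as f:
    pickle.dump({"n_pathways": len(cluster_labels)}, f)

# Reload and summarize cluster membership.
loaded = numpy.load(labels_path)
with open(pickle_path, "rb") as f:
    meta = pickle.load(f)
counts = {int(c): int((loaded == c).sum()) for c in numpy.unique(loaded)}
print(counts)  # {0: 3, 1: 2, 2: 1}
```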

Weighted Ensemble Simulations
-----------------------------
@@ -144,7 +158,7 @@ This will do the pattern matching and output individual h5 files for each cluster

1. From the command line, run the following::

lpath match -we --input-pickle succ_traj/output.pickle --cluster-labels-output succ_traj/cluster_labels.npy \
lpath match -we --input-pickle succ_traj/output.pickle --output-pickle succ_traj/match-output.pickle --cluster-labels-output succ_traj/cluster_labels.npy \
--export-h5 --file-pattern "west_succ_c{}.h5"

2. After the comparison process completes, a dendrogram is displayed. Closing the figure triggers prompts that will guide you further.
@@ -154,11 +168,90 @@ This will do the pattern matching and output individual h5 files for each cluster

For cases where you want to run the pattern matching comparison between segment IDs, you will have to use the longest common substring ``--substring`` option. By default, the longest common subsequence algorithm is used::

lpath match -we --input-pickle succ_traj/output.pickle --cluster-labels-output succ_traj/cluster_labels.npy \
--export-h5 --file-pattern "west_succ_c{}.h5" --reassign-function "reassign_segid" --substring
lpath match -we --input-pickle succ_traj/output.pickle --output-pickle succ_traj/match-output.pickle --cluster-labels-output succ_traj/cluster_labels.npy \
--export-h5 --file-pattern "west_succ_c{}.h5" --reassign-method "reassign_segid" --substring


Plot
____
This step plots some of the most common graphs, such as dendrograms and histograms, directly from the pickle object generated by ``match``. Users may also use the plotting scripts in the ``examples`` folder, which includes a script for generating ``NetworkX`` graphs.

More specifically, the following graphs will be made in the ``plots`` folder:

* Dendrogram showing separation between clusters
* Weights/Cluster bar graph
* Target iteration histograms (per cluster)
* Event duration histograms (per cluster)


From the command line, run the following and it should generate a separate file for each of the above graphs::

lpath plot --plot-input succ_traj/match-output.pickle

More options for customizing the graphs can be found by running ``lpath plot --help``.


Example Reassign file
---------------------

The following is an example reassign function for when you decide to reclassify your states::

[UNDER CONSTRUCTION]
import numpy

def reassign_custom(data, pathways, dictionary, assign_file=None):
"""
Reclassify/assign frames into different states. This is highly
specific to the system. If w_assign's definition is sufficient,
you can proceed with what's made in the previous step
using ``reassign_identity``.

In this example, the dictionary maps state idx to its corresponding ``state_string``.
We suggest using alphabets as states.

Parameters
----------
data : list
An array with the data necessary to reassign, as extracted from ``output.pickle``.

pathways : numpy.ndarray
An empty array with shapes for iter_id/seg_id/state_id/pcoord_or_auxdata/frame#/weight.

dictionary : dict
An empty dictionary obj for mapping ``state_id`` with ``state string``. The last entry in
the dictionary should be the "unknown" state.

assign_file : str, default : None
A string pointing to the ``assign.h5`` file. Needed as a parameter for all functions,
but is ignored if it's an MD trajectory.

Returns
-------
dictionary : dict
A dictionary mapping each ``state_id`` (float/int) with a ``state string`` (character).
The last entry in the dictionary should be the "unknown" state.

"""
# Another example: grouping multiple states into one.
for idx, pathway in enumerate(data):
# The following shows how you can "merge" multiple states into
# a single one.
pathway = numpy.asarray(pathway)
# Further downsizing: find the first frame where pcoord drops below 5
first_contact = numpy.where(pathway[:, 3] < 5)[0][0]
for jdx, frame in enumerate(pathway):
# First copy all columns over
pathways[idx, jdx] = frame
# ortho is assigned to state 0
if frame[2] in [1, 3, 4, 6, 7, 9]:
frame[2] = 0
# para is assigned to state 1
elif frame[2] in [2, 5, 8]:
frame[2] = 1
# Unknown state is assigned 2
if jdx < first_contact:
frame[2] = 2
pathways[idx, jdx] = frame

# Generating a dictionary mapping each state
dictionary = {0: 'A', 1: 'B', 2: '!'}

return dictionary
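A stripped-down, runnable version of the state-merging logic above, applied to a single made-up pathway (columns iter_id/seg_id/state_id/pcoord/frame#/weight, mirroring the docstring; the state numbers and pcoord cutoff are taken from the example, the data values are invented)::

```python
import numpy

# One toy pathway of three frames; only the state_id (col 2) and
# pcoord (col 3) columns matter for the reassignment below.
data = [
    [[1, 0, 1, 9.0, 0, 0.5],   # state 1 -> ortho, but pre-contact -> unknown
     [2, 0, 2, 4.0, 1, 0.5],   # first frame with pcoord < 5; state 2 -> para
     [3, 0, 5, 3.0, 2, 0.5]],  # state 5 -> para
]
pathways = numpy.zeros((1, 3, 6))

for idx, pathway in enumerate(data):
    pathway = numpy.asarray(pathway, dtype=float)
    # First frame where pcoord drops below 5
    first_contact = numpy.where(pathway[:, 3] < 5)[0][0]
    for jdx, frame in enumerate(pathway):
        if frame[2] in [1, 3, 4, 6, 7, 9]:
            frame[2] = 0          # ortho states merged into state 0
        elif frame[2] in [2, 5, 8]:
            frame[2] = 1          # para states merged into state 1
        if jdx < first_contact:
            frame[2] = 2          # unknown before first contact
        pathways[idx, jdx] = frame

print(pathways[0, :, 2])  # [2. 1. 1.]
```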
4 changes: 2 additions & 2 deletions environment.yml
@@ -3,10 +3,10 @@ channels:
- conda-forge
- defaults
dependencies:
- python<3.11
- python<3.12
- westpa>=2022.03
- scikit-learn
- matplotlib
- matplotlib>=3.6.0
- tqdm
- networkx
- pip
38 changes: 35 additions & 3 deletions examples/WE/reassign_custom.py
@@ -1,11 +1,43 @@
import numpy

def reassign_custom(data, pathways, dictionary, assign_file=None):
"""
Reclassify/assign frames into different states. This is highly
specific to the system. If w_assign's definition is sufficient,
you can proceed with what's made in the previous step
using ``reassign_identity``.
for idx, val in enumerate(data):
In this example, the dictionary maps state idx to its corresponding ``state_string``.
We suggest using alphabets as states.
Parameters
----------
data : list
An array with the data necessary to reassign, as extracted from ``output.pickle``.
pathways : numpy.ndarray
An empty array with shapes for iter_id/seg_id/state_id/pcoord_or_auxdata/frame#/weight.
dictionary : dict
An empty dictionary obj for mapping ``state_id`` with ``state string``. The last entry in
the dictionary should be the "unknown" state.
assign_file : str, default : None
A string pointing to the ``assign.h5`` file. Needed as a parameter for all functions,
but is ignored if it's an MD trajectory.
Returns
-------
dictionary : dict
A dictionary mapping each ``state_id`` (float/int) with a ``state string`` (character).
The last entry in the dictionary should be the "unknown" state.
"""
# reassign states to be the cluster IDs
for idx, val in enumerate(data): # Loop through each set of successful pathways
val_arr = numpy.asarray(val)
for idx2, val2 in enumerate(val_arr):
val2[2] = int(val2[5])
for idx2, val2 in enumerate(val_arr): # Loop through each frame of the pathway
val2[2] = int(val2[-3]) # Renumber the state_id with the aux dataset
pathways[idx, idx2] = val2

# Generating a dictionary mapping each state
37 changes: 34 additions & 3 deletions examples/cMD/reassign_custom.py
@@ -1,12 +1,43 @@
import numpy

def reassign_custom(data, pathways, dictionary, assign_file=None):
"""
Reclassify/assign frames into different states. This is highly
specific to the system. If w_assign's definition is sufficient,
you can proceed with what's made in the previous step
using ``reassign_identity``.
In this example, the dictionary maps state idx to its corresponding ``state_string``.
We suggest using alphabets as states.
Parameters
----------
data : list
An array with the data necessary to reassign, as extracted from ``output.pickle``.
pathways : numpy.ndarray
An empty array with shapes for iter_id/seg_id/state_id/pcoord_or_auxdata/frame#/weight.
dictionary : dict
An empty dictionary obj for mapping ``state_id`` with ``state string``. The last entry in
the dictionary should be the "unknown" state.
assign_file : str, default : None
A string pointing to the ``assign.h5`` file. Needed as a parameter for all functions,
but is ignored if it's an MD trajectory.
Returns
-------
dictionary : dict
A dictionary mapping each ``state_id`` (float/int) with a ``state string`` (character).
The last entry in the dictionary should be the "unknown" state.
"""
# reassign states to be the cluster IDs
for idx, val in enumerate(data):
for idx, val in enumerate(data): # Loop through each set of successful pathways
val_arr = numpy.asarray(val)
for idx2, val2 in enumerate(val_arr):
val2[2] = int(val2[3])
for idx2, val2 in enumerate(val_arr): # Loop through each frame of the pathway
val2[2] = int(val2[-3]) # Renumber the state_id with the aux dataset
pathways[idx, idx2] = val2

# Generating a dictionary mapping each state
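The renumbering pattern shared by both ``reassign_custom`` examples can be sketched in isolation: with the frame layout described in the docstrings, column ``-3`` holds the auxiliary value (e.g. a cluster ID) that is copied into the state_id column. The data values here are invented for illustration::

```python
import numpy

# Made-up pathway frames: columns are iter_id, seg_id, state_id, pcoord,
# aux, frame#, weight -- so column -3 is the aux value used for renumbering.
data = [
    [[1, 0, 0, 2.5, 4.0, 0, 0.1],
     [2, 0, 1, 1.5, 7.0, 1, 0.1]],
]
pathways = numpy.zeros((1, 2, 7))

for idx, val in enumerate(data):           # each successful pathway
    val_arr = numpy.asarray(val, dtype=float)
    for idx2, val2 in enumerate(val_arr):  # each frame of the pathway
        val2[2] = int(val2[-3])            # overwrite state_id with aux value
        pathways[idx, idx2] = val2

print(pathways[0, :, 2])  # [4. 7.]
```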
38 changes: 29 additions & 9 deletions lpath/argparser.py
@@ -2,9 +2,10 @@
All argument parsing from commandline is dealt here.
"""
import argparse
from argparse import ArgumentTypeError
from lpath._logger import Logger
from argparse import ArgumentTypeError, Namespace
from ast import literal_eval

from lpath._logger import Logger
from lpath.io import default_dendrogram_colors

log = Logger().get_logger(__name__)
@@ -344,8 +345,9 @@ def add_extract_args(parser=None):
help='Use Ray work manager. On by default.')
raygroup.add_argument('--no-ray', '-NR', dest='use_ray', action='store_false',
help='Do not use Ray. This overrides ``--use-ray``.')
raygroup.add_argument('--threads', '-t', type=check_non_neg, default=0, help='Number of threads to use '
'with Ray. The default of ``0`` uses all available resources detected.')
raygroup.add_argument('--threads', '-t', type=check_non_neg, default=0,
help='Number of threads to use with Ray. The default of ``0`` uses '
'all available resources detected.')

extract_we = parser.add_argument_group('WE-specific Extract Parameters')

@@ -406,18 +408,18 @@ def add_match_args(parser=None):

match_io.add_argument('--input-pickle', '-ip', '--IP', '--pickle', dest='extract_output',
default='succ_traj/output.pickle', type=str, help='Path to pickle object from the `extract` '
'step.')
'step.')
match_io.add_argument('--output-pickle', '-op', '--OP', dest='output_pickle',
default='succ_traj/pathways.pickle', type=str, help='Path to reassigned object to be '
'outputted from the `match` step.')
match_io.add_argument('--cl-output', '-co', '--cluster-label-output', dest='cl_output',
'outputted from the `match` step.')
match_io.add_argument('--cl-output', '-co', '--cluster-label-output', '--cluster-labels-output', dest='cl_output',
default='succ_traj/cluster_labels.npy', type=str,
help='Output file location for cluster labels.')
match_io.add_argument('--match-exclude-min-length', '-me', '--match-exclude-length', '--match-exclude-short',
dest='exclude_short', type=check_non_neg, default=0,
help='Exclude trajectories shorter than provided value during '
'matching. Default is 0, which will include trajectories of all lengths.')
match_io.add_argument('--reassign', '-ra', '--reassign-method', dest='reassign_method',
match_io.add_argument('--reassign', '-ra', '--reassign-method', '--reassign-function', dest='reassign_method',
default='reassign_identity', type=str,
help='Reassign method to use. Could be one of the defaults or a module to load. Defaults are '
'``reassign_identity``, ``reassign_statelabel``, ``reassign_segid``, '
@@ -461,7 +463,7 @@
help='Do not remake distance matrix.')
match_io.add_argument('--remake-file', '--remade-file', '-dF', dest='dmatrix_save', type=str,
default='succ_traj/distmat.npy', help='Path to pre-calculated distance matrix. Make sure '
'the ``--no-remake`` flag is specified.')
'the ``--no-remake`` flag is specified.')
match_io.add_argument('--remake-parallel', '-dP', dest='dmatrix_parallel', type=int,
help='Number of jobs to run with the pairwise distance calculations. The default=None issues '
'one job. A value of -1 uses all available resources. This is directly passed to the '
@@ -538,6 +540,8 @@ def add_plot_args(parser=None):
plot_io.add_argument('--n-clusters', '-nc', '--num-clusters', dest='num_clusters', type=check_positive,
help='For cases where you know in advance how many clusters you want for '
'the hierarchical clustering.')
plot_io.add_argument('--timeout', '-pto', '--plot-timeout', dest='plot_timeout', type=check_non_neg,
default=None, help='Timeout (in seconds) for asking input.')

# plot_io.add_argument('--plot-regen-cl', '-rcl', '--plot-regenerate-cluster-labels', dest='regen_cl',
# action='store_true',
@@ -764,3 +768,19 @@ def check_argv():

if 1 < len(sys.argv) < 3 and sys.argv[1] in all_options:
log.warning(f'Running {sys.argv[1]} with all default values. Make sure you\'re sure of this!')


class DefaultArgs:
"""
Convenience class that could be used to call all the default arguments for each subparser.
"""
def __init__(self):
self.parser = create_parser()
self.subparsers = []
self.parser, self.subparsers = create_subparsers(self.parser, self.subparsers)

self.discretize = self.subparsers[0].parse_args('')
self.extract = self.subparsers[1].parse_args('')
self.match = self.subparsers[2].parse_args('')
self.plot = self.subparsers[3].parse_args('')
self.all = self.subparsers[4].parse_args('')
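The trick behind ``DefaultArgs`` is generic argparse behavior: parsing an empty argument list yields a ``Namespace`` populated purely with each option's default. A self-contained sketch (reusing two flag names and defaults that appear in the parser above)::

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--input-pickle", dest="extract_output",
                    default="succ_traj/output.pickle")
parser.add_argument("--reassign", dest="reassign_method",
                    default="reassign_identity")

# No CLI arguments supplied -> every attribute takes its default value.
defaults = parser.parse_args([])
print(defaults.extract_output)   # succ_traj/output.pickle
print(defaults.reassign_method)  # reassign_identity
```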
6 changes: 4 additions & 2 deletions lpath/extract.py
@@ -571,7 +571,8 @@ def trace_seg_to_last_state(
for frame_index in frame_loop:
indv_trace.append([iteration_num, segment_num, corr_assign[frame_index],
*ad_arr[frame_index], frame_index, weight])
break
if trace_basis is False:
break
else:
# Just a normal iteration where we reached target state. Output everything in stride.
frame_loop = frame_range(-1, term_frame_num, total_frames, stride_step)
@@ -713,7 +714,8 @@ def trace_seg_to_last_state(
for frame_index in frame_loop:
indv_trace.append([iteration_num, segment_num, corr_assign[frame_index],
*ad_arr[frame_index], frame_index, weight])
break
if trace_basis is False:
break
else:
# Just a normal iteration where we reached target state. Output everything in stride.
frame_loop = frame_range(-1, term_frame_num, total_frames, stride_step)
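The ``extract.py`` change above guards a previously unconditional ``break`` behind ``trace_basis``: when tracing basis states is enabled, the walk continues past a source-state hit instead of stopping. A toy, invented trace function illustrating that control flow (states and history are hypothetical, not lpath's actual data)::

```python
def trace_back(states, trace_basis):
    """Walk a newest-first state history; stop at the first
    source-state ('A') hit unless trace_basis is set."""
    visited = []
    for state in states:
        visited.append(state)
        if state == 'A':              # reached a source-state frame
            if trace_basis is False:  # old code: unconditional break
                break
    return visited

print(trace_back(['B', 'A', 'B', 'A'], trace_basis=False))  # ['B', 'A']
print(trace_back(['B', 'A', 'B', 'A'], trace_basis=True))   # ['B', 'A', 'B', 'A']
```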
