Merge branch 'main' of github.com:jeremyleung521/lpath into 3.12
jeremyleung521 committed Jun 27, 2024
2 parents 5e26bc7 + 6a301fa commit fd2585c
Showing 16 changed files with 251 additions and 50 deletions.
6 changes: 4 additions & 2 deletions .github/workflows/CI.yaml
@@ -61,11 +61,13 @@ jobs:
pytest -v --cov=lpath --cov-report=xml --color=yes lpath/tests/
- name: CodeCov
uses: codecov/codecov-action@v3
uses: codecov/codecov-action@v4
with:
# token: ${{ secrets.CODECOV_TOKEN }} # Not needed for Public Repos
file: ./coverage.xml
flags: unittests
name: codecov-${{ matrix.os }}-py${{ matrix.python-version }}
fail_ci_if_error: false
verbose: false
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }} # Not needed for Public Repos

2 changes: 1 addition & 1 deletion .github/workflows/build.yaml
@@ -101,7 +101,7 @@ jobs:
with:
# unpacks default artifact into dist/
# if `name: artifact` is omitted, the action will create extra parent dir
name: artifact-*
pattern: artifact-*
path: dist
merge-multiple: true

2 changes: 1 addition & 1 deletion devtools/conda-envs/environment-rtd.yml
@@ -3,7 +3,7 @@ channels:
- conda-forge
- defaults
dependencies:
- python
- python<3.12
- westpa>=2022.03
- scikit-learn
- tqdm
2 changes: 1 addition & 1 deletion devtools/conda-envs/test_env.yaml
@@ -4,7 +4,7 @@ channels:
- defaults
dependencies:
# Base depends
- python
- python<3.12
- pip
- scikit-learn
- tqdm
105 changes: 99 additions & 6 deletions docs/usage.rst
@@ -92,17 +92,31 @@ In this step, we will pattern match any successful transitions we've identified

1. From the command line, run the following::

lpath match --input-pickle succ_traj/pathways.pickle --cluster-labels-output succ_traj/cluster_labels.npy
lpath match --input-pickle succ_traj/pathways.pickle --output-pickle succ_traj/match-output.pickle \
--cluster-labels-output succ_traj/cluster_labels.npy

2. After the comparison process completes, a dendrogram is displayed. Closing the figure triggers prompts that will guide you further.

3. Input ``y`` if you think the threshold (horizontal line which dictates how many clusters there are) should be at a different value. Otherwise, input ``n`` and tell the program how many clusters you want at the end.

Plot
____
This step plots some of the most common graphs, such as dendrograms and histograms, directly from the pickle object generated by ``match``. Users may also use the plotting scripts in the ``examples`` folder, which includes a script for generating ``NetworkX`` graphs.

[UNDER CONSTRUCTION]
More specifically, the following graphs will be made in the ``plots`` folder:

* Dendrogram showing separation between clusters
* Weights/Cluster bar graph
* Target iteration histograms (per cluster)
* Event duration histograms (per cluster)


From the command line, run the following and it should generate a separate file for each of the above graphs::

lpath plot --plot-input succ_traj/match-output.pickle

More options for customizing the graphs can be found by running ``lpath plot --help``.
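The outputs named above can also be inspected by hand. The sketch below, using made-up toy data in a temporary directory (the real ``match-output.pickle`` structure is more involved; only the file names follow the docs), shows how the ``.npy`` cluster labels and a pickle could be written and read back::

```python
import os
import pickle
import tempfile

import numpy

# Toy stand-ins for the files produced by ``lpath match``.
workdir = tempfile.mkdtemp()
labels_path = os.path.join(workdir, "cluster_labels.npy")
pickle_path = os.path.join(workdir, "match-output.pickle")

cluster_labels = numpy.array([0, 0, 1, 2, 1, 0])  # one label per pathway
numpy.save(labels_path, cluster_labels)
with open(pickle_path, "wb") as f:
    pickle.dump({"n_pathways": len(cluster_labels)}, f)

# Reload and summarize cluster membership.
loaded = numpy.load(labels_path)
with open(pickle_path, "rb") as f:
    meta = pickle.load(f)
counts = {int(c): int((loaded == c).sum()) for c in numpy.unique(loaded)}
print(counts)  # {0: 3, 1: 2, 2: 1}
```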

Weighted Ensemble Simulations
-----------------------------
@@ -144,7 +158,7 @@ This will do the pattern matching and output individual h5 files for each cluster

1. From the command line, run the following::

lpath match -we --input-pickle succ_traj/output.pickle --cluster-labels-output succ_traj/cluster_labels.npy \
lpath match -we --input-pickle succ_traj/output.pickle --output-pickle succ_traj/match-output.pickle --cluster-labels-output succ_traj/cluster_labels.npy \
--export-h5 --file-pattern "west_succ_c{}.h5"

2. After the comparison process completes, a dendrogram is displayed. Closing the figure triggers prompts that will guide you further.
@@ -154,11 +168,90 @@ This will do the pattern matching and output individual h5 files for each cluster

For cases where you want to run the pattern matching comparison between segment IDs, you will have to use the longest common substring ``--substring`` option. By default, the longest common subsequence algorithm is used::

lpath match -we --input-pickle succ_traj/output.pickle --cluster-labels-output succ_traj/cluster_labels.npy \
--export-h5 --file-pattern "west_succ_c{}.h5" --reassign-function "reassign_segid" --substring
lpath match -we --input-pickle succ_traj/output.pickle --output-pickle succ_traj/match-output.pickle --cluster-labels-output succ_traj/cluster_labels.npy \
--export-h5 --file-pattern "west_succ_c{}.h5" --reassign-method "reassign_segid" --substring


Plot
____
This step plots some of the most common graphs, such as dendrograms and histograms, directly from the pickle object generated by ``match``. Users may also use the plotting scripts in the ``examples`` folder, which includes a script for generating ``NetworkX`` graphs.

More specifically, the following graphs will be made in the ``plots`` folder:

* Dendrogram showing separation between clusters
* Weights/Cluster bar graph
* Target iteration histograms (per cluster)
* Event duration histograms (per cluster)


From the command line, run the following and it should generate a separate file for each of the above graphs::

lpath plot --plot-input succ_traj/match-output.pickle

More options for customizing the graphs can be found by running ``lpath plot --help``.


Example Reassign file
---------------------

The following is an example reassign function for when you decide to reclassify your states::

[UNDER CONSTRUCTION]
import numpy

def reassign_custom(data, pathways, dictionary, assign_file=None):
"""
Reclassify/assign frames into different states. This is highly
specific to the system. If w_assign's definition is sufficient,
you can proceed with what's made in the previous step
using ``reassign_identity``.

In this example, the dictionary maps state idx to its corresponding ``state_string``.
We suggest using alphabets as states.

Parameters
----------
data : list
An array with the data necessary to reassign, as extracted from ``output.pickle``.

pathways : numpy.ndarray
An empty array with shapes for iter_id/seg_id/state_id/pcoord_or_auxdata/frame#/weight.

dictionary : dict
An empty dictionary obj for mapping ``state_id`` with ``state string``. The last entry in
the dictionary should be the "unknown" state.

assign_file : str, default : None
A string pointing to the ``assign.h5`` file. Needed as a parameter for all functions,
but is ignored if it's an MD trajectory.

Returns
-------
dictionary : dict
A dictionary mapping each ``state_id`` (float/int) with a ``state string`` (character).
The last entry in the dictionary should be the "unknown" state.

"""
# Another example: grouping multiple states into one.
for idx, pathway in enumerate(data):
# The following shows how you can "merge" multiple states into
# a single one.
pathway = numpy.asarray(pathway)
# Further downsizing: find the first frame where pcoord drops below 5
first_contact = numpy.where(pathway[:, 3] < 5)[0][0]
for jdx, frame in enumerate(pathway):
# First copy all columns over
pathways[idx, jdx] = frame
# ortho is assigned to state 0
if frame[2] in [1, 3, 4, 6, 7, 9]:
frame[2] = 0
# para is assigned to state 1
elif frame[2] in [2, 5, 8]:
frame[2] = 1
# Unknown state is assigned 2
if jdx < first_contact:
frame[2] = 2
pathways[idx, jdx] = frame

# Generating a dictionary mapping each state
dictionary = {0: 'A', 1: 'B', 2: '!'}

return dictionary
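A stripped-down, runnable version of the state-merging logic above, applied to a single made-up pathway (columns iter_id/seg_id/state_id/pcoord/frame#/weight, mirroring the docstring; the state numbers and pcoord cutoff are taken from the example, the data values are invented)::

```python
import numpy

# One toy pathway of three frames; only the state_id (col 2) and
# pcoord (col 3) columns matter for the reassignment below.
data = [
    [[1, 0, 1, 9.0, 0, 0.5],   # state 1 -> ortho, but pre-contact -> unknown
     [2, 0, 2, 4.0, 1, 0.5],   # first frame with pcoord < 5; state 2 -> para
     [3, 0, 5, 3.0, 2, 0.5]],  # state 5 -> para
]
pathways = numpy.zeros((1, 3, 6))

for idx, pathway in enumerate(data):
    pathway = numpy.asarray(pathway, dtype=float)
    # First frame where pcoord drops below 5
    first_contact = numpy.where(pathway[:, 3] < 5)[0][0]
    for jdx, frame in enumerate(pathway):
        if frame[2] in [1, 3, 4, 6, 7, 9]:
            frame[2] = 0          # ortho states merged into state 0
        elif frame[2] in [2, 5, 8]:
            frame[2] = 1          # para states merged into state 1
        if jdx < first_contact:
            frame[2] = 2          # unknown before first contact
        pathways[idx, jdx] = frame

print(pathways[0, :, 2])  # [2. 1. 1.]
```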
4 changes: 2 additions & 2 deletions environment.yml
@@ -3,10 +3,10 @@ channels:
- conda-forge
- defaults
dependencies:
- python<3.11
- python<3.12
- westpa>=2022.03
- scikit-learn
- matplotlib
- matplotlib>=3.6.0
- tqdm
- networkx
- pip
38 changes: 35 additions & 3 deletions examples/WE/reassign_custom.py
@@ -1,11 +1,43 @@
import numpy

def reassign_custom(data, pathways, dictionary, assign_file=None):
"""
Reclassify/assign frames into different states. This is highly
specific to the system. If w_assign's definition is sufficient,
you can proceed with what's made in the previous step
using ``reassign_identity``.
for idx, val in enumerate(data):
In this example, the dictionary maps state idx to its corresponding ``state_string``.
We suggest using alphabets as states.
Parameters
----------
data : list
An array with the data necessary to reassign, as extracted from ``output.pickle``.
pathways : numpy.ndarray
An empty array with shapes for iter_id/seg_id/state_id/pcoord_or_auxdata/frame#/weight.
dictionary : dict
An empty dictionary obj for mapping ``state_id`` with ``state string``. The last entry in
the dictionary should be the "unknown" state.
assign_file : str, default : None
A string pointing to the ``assign.h5`` file. Needed as a parameter for all functions,
but is ignored if it's an MD trajectory.
Returns
-------
dictionary : dict
A dictionary mapping each ``state_id`` (float/int) with a ``state string`` (character).
The last entry in the dictionary should be the "unknown" state.
"""
# reassign states to be the cluster IDs
for idx, val in enumerate(data): # Loop through each set of successful pathways
val_arr = numpy.asarray(val)
for idx2, val2 in enumerate(val_arr):
val2[2] = int(val2[5])
for idx2, val2 in enumerate(val_arr): # Loop through each frame of the pathway
val2[2] = int(val2[-3]) # Renumber the state_id with the aux dataset
pathways[idx, idx2] = val2

# Generating a dictionary mapping each state
37 changes: 34 additions & 3 deletions examples/cMD/reassign_custom.py
@@ -1,12 +1,43 @@
import numpy

def reassign_custom(data, pathways, dictionary, assign_file=None):
"""
Reclassify/assign frames into different states. This is highly
specific to the system. If w_assign's definition is sufficient,
you can proceed with what's made in the previous step
using ``reassign_identity``.
In this example, the dictionary maps state idx to its corresponding ``state_string``.
We suggest using alphabets as states.
Parameters
----------
data : list
An array with the data necessary to reassign, as extracted from ``output.pickle``.
pathways : numpy.ndarray
An empty array with shapes for iter_id/seg_id/state_id/pcoord_or_auxdata/frame#/weight.
dictionary : dict
An empty dictionary obj for mapping ``state_id`` with ``state string``. The last entry in
the dictionary should be the "unknown" state.
assign_file : str, default : None
A string pointing to the ``assign.h5`` file. Needed as a parameter for all functions,
but is ignored if it's an MD trajectory.
Returns
-------
dictionary : dict
A dictionary mapping each ``state_id`` (float/int) with a ``state string`` (character).
The last entry in the dictionary should be the "unknown" state.
"""
# reassign states to be the cluster IDs
for idx, val in enumerate(data):
for idx, val in enumerate(data): # Loop through each set of successful pathways
val_arr = numpy.asarray(val)
for idx2, val2 in enumerate(val_arr):
val2[2] = int(val2[3])
for idx2, val2 in enumerate(val_arr): # Loop through each frame of the pathway
val2[2] = int(val2[-3]) # Renumber the state_id with the aux dataset
pathways[idx, idx2] = val2

# Generating a dictionary mapping each state
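The renumbering pattern shared by both ``reassign_custom`` examples can be sketched in isolation: with the frame layout described in the docstrings, column ``-3`` holds the auxiliary value (e.g. a cluster ID) that is copied into the state_id column. The data values here are invented for illustration::

```python
import numpy

# Made-up pathway frames: columns are iter_id, seg_id, state_id, pcoord,
# aux, frame#, weight -- so column -3 is the aux value used for renumbering.
data = [
    [[1, 0, 0, 2.5, 4.0, 0, 0.1],
     [2, 0, 1, 1.5, 7.0, 1, 0.1]],
]
pathways = numpy.zeros((1, 2, 7))

for idx, val in enumerate(data):           # each successful pathway
    val_arr = numpy.asarray(val, dtype=float)
    for idx2, val2 in enumerate(val_arr):  # each frame of the pathway
        val2[2] = int(val2[-3])            # overwrite state_id with aux value
        pathways[idx, idx2] = val2

print(pathways[0, :, 2])  # [4. 7.]
```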
38 changes: 29 additions & 9 deletions lpath/argparser.py
@@ -2,9 +2,10 @@
All argument parsing from commandline is dealt here.
"""
import argparse
from argparse import ArgumentTypeError
from lpath._logger import Logger
from argparse import ArgumentTypeError, Namespace
from ast import literal_eval

from lpath._logger import Logger
from lpath.io import default_dendrogram_colors

log = Logger().get_logger(__name__)
@@ -344,8 +345,9 @@ def add_extract_args(parser=None):
help='Use Ray work manager. On by default.')
raygroup.add_argument('--no-ray', '-NR', dest='use_ray', action='store_false',
help='Do not use Ray. This overrides ``--use-ray``.')
raygroup.add_argument('--threads', '-t', type=check_non_neg, default=0, help='Number of threads to use '
'with Ray. The default of ``0`` uses all available resources detected.')
raygroup.add_argument('--threads', '-t', type=check_non_neg, default=0,
help='Number of threads to use with Ray. The default of ``0`` uses '
'all available resources detected.')

extract_we = parser.add_argument_group('WE-specific Extract Parameters')

@@ -406,18 +408,18 @@ def add_match_args(parser=None):

match_io.add_argument('--input-pickle', '-ip', '--IP', '--pickle', dest='extract_output',
default='succ_traj/output.pickle', type=str, help='Path to pickle object from the `extract` '
'step.')
'step.')
match_io.add_argument('--output-pickle', '-op', '--OP', dest='output_pickle',
default='succ_traj/pathways.pickle', type=str, help='Path to reassigned object to be '
'outputted from the `match` step.')
match_io.add_argument('--cl-output', '-co', '--cluster-label-output', dest='cl_output',
'outputted from the `match` step.')
match_io.add_argument('--cl-output', '-co', '--cluster-label-output', '--cluster-labels-output', dest='cl_output',
default='succ_traj/cluster_labels.npy', type=str,
help='Output file location for cluster labels.')
match_io.add_argument('--match-exclude-min-length', '-me', '--match-exclude-length', '--match-exclude-short',
dest='exclude_short', type=check_non_neg, default=0,
help='Exclude trajectories shorter than provided value during '
'matching. Default is 0, which will include trajectories of all lengths.')
match_io.add_argument('--reassign', '-ra', '--reassign-method', dest='reassign_method',
match_io.add_argument('--reassign', '-ra', '--reassign-method', '--reassign-function', dest='reassign_method',
default='reassign_identity', type=str,
help='Reassign method to use. Could be one of the defaults or a module to load. Defaults are '
'``reassign_identity``, ``reassign_statelabel``, ``reassign_segid``, '
@@ -461,7 +463,7 @@
help='Do not remake distance matrix.')
match_io.add_argument('--remake-file', '--remade-file', '-dF', dest='dmatrix_save', type=str,
default='succ_traj/distmat.npy', help='Path to pre-calculated distance matrix. Make sure '
'the ``--no-remake`` flag is specified.')
'the ``--no-remake`` flag is specified.')
match_io.add_argument('--remake-parallel', '-dP', dest='dmatrix_parallel', type=int,
help='Number of jobs to run with the pairwise distance calculations. The default=None issues '
'one job. A value of -1 uses all available resources. This is directly passed to the '
@@ -538,6 +540,8 @@ def add_plot_args(parser=None):
plot_io.add_argument('--n-clusters', '-nc', '--num-clusters', dest='num_clusters', type=check_positive,
help='For cases where you know in advance how many clusters you want for '
'the hierarchical clustering.')
plot_io.add_argument('--timeout', '-pto', '--plot-timeout', dest='plot_timeout', type=check_non_neg,
default=None, help='Timeout (in seconds) for asking input.')

# plot_io.add_argument('--plot-regen-cl', '-rcl', '--plot-regenerate-cluster-labels', dest='regen_cl',
# action='store_true',
@@ -764,3 +768,19 @@ def check_argv():

if 1 < len(sys.argv) < 3 and sys.argv[1] in all_options:
log.warning(f'Running {sys.argv[1]} with all default values. Make sure you\'re sure of this!')


class DefaultArgs:
"""
Convenience class that could be used to call all the default arguments for each subparser.
"""
def __init__(self):
self.parser = create_parser()
self.subparsers = []
self.parser, self.subparsers = create_subparsers(self.parser, self.subparsers)

self.discretize = self.subparsers[0].parse_args('')
self.extract = self.subparsers[1].parse_args('')
self.match = self.subparsers[2].parse_args('')
self.plot = self.subparsers[3].parse_args('')
self.all = self.subparsers[4].parse_args('')
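The trick behind ``DefaultArgs`` is generic argparse behavior: parsing an empty argument list yields a ``Namespace`` populated purely with each option's default. A self-contained sketch (reusing two flag names and defaults that appear in the parser above)::

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--input-pickle", dest="extract_output",
                    default="succ_traj/output.pickle")
parser.add_argument("--reassign", dest="reassign_method",
                    default="reassign_identity")

# No CLI arguments supplied -> every attribute takes its default value.
defaults = parser.parse_args([])
print(defaults.extract_output)   # succ_traj/output.pickle
print(defaults.reassign_method)  # reassign_identity
```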
6 changes: 4 additions & 2 deletions lpath/extract.py
@@ -571,7 +571,8 @@ def trace_seg_to_last_state(
for frame_index in frame_loop:
indv_trace.append([iteration_num, segment_num, corr_assign[frame_index],
*ad_arr[frame_index], frame_index, weight])
break
if trace_basis is False:
break
else:
# Just a normal iteration where we reached target state. Output everything in stride.
frame_loop = frame_range(-1, term_frame_num, total_frames, stride_step)
@@ -713,7 +714,8 @@ def trace_seg_to_last_state(
for frame_index in frame_loop:
indv_trace.append([iteration_num, segment_num, corr_assign[frame_index],
*ad_arr[frame_index], frame_index, weight])
break
if trace_basis is False:
break
else:
# Just a normal iteration where we reached target state. Output everything in stride.
frame_loop = frame_range(-1, term_frame_num, total_frames, stride_step)
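The ``extract.py`` change above guards a previously unconditional ``break`` behind ``trace_basis``: when tracing basis states is enabled, the walk continues past a source-state hit instead of stopping. A toy, invented trace function illustrating that control flow (states and history are hypothetical, not lpath's actual data)::

```python
def trace_back(states, trace_basis):
    """Walk a newest-first state history; stop at the first
    source-state ('A') hit unless trace_basis is set."""
    visited = []
    for state in states:
        visited.append(state)
        if state == 'A':              # reached a source-state frame
            if trace_basis is False:  # old code: unconditional break
                break
    return visited

print(trace_back(['B', 'A', 'B', 'A'], trace_basis=False))  # ['B', 'A']
print(trace_back(['B', 'A', 'B', 'A'], trace_basis=True))   # ['B', 'A', 'B', 'A']
```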
