West tools kinetics rewrite #35

astatide · 2017-03-01T19:15:31Z

It's time for a code review! This branch includes all the previously discussed changes (see here: westpa/west_tools#17) to the kinetics code. Let's start with a brief summary of what's been changed, what's been added, and what's been marked for future deprecation, followed by a TODO (either pre or post merge, depending on what we would like).

SUMMARY:
Introduces changes to mclib code and both types of kinetics analysis (w_kinavg,w_kinetics,w_postanalysis_matrix/reweight), including changes and new versions of the estimator functions. Both types of analysis now produce the same datasets, and all calculate error through monte carlo block bootstrapping.

NEW TOOLS:
w_direct: w_kinetics, w_stateprobs, and w_kinavg have all been combined into this tool.
w_reweight: w_postanalysis_matrix and w_postanalysis_reweight are combined here.

Both tools use subcommands to perform the analysis, and include some 'helper' commands to easily run everything.

w_direct/reweight: all, average, stateprobs
w_direct: kinavg, kinetics
w_reweight: matrix, rate

The flags you would use for each of the old tools should still work with the new ones and their respective subcommands. By default, each tool writes each analysis dataset to one file: direct.h5, and reweight.h5. You can choose to write the output of each dataset to a different file, as well, to be compatible with how things were run before.

The old w_kinavg error calling routines have been abstracted away into an 'AverageCommands' class, which exists in westpa/kintool (place in with other tools?). Each tool subclasses this, overrides the go function, and has a unique function that performs most or all of the actual analysis (typically named for its old tool). This makes generating the futures for the work manager, printing responsible output, etc, all trivial and uniform across tools. Each tool still needs to build its own block to submit (ala how it was done before) to the mclib, depending on how complex the estimator is (this could probably be simplified).

The code that once was w_postanalysis_matrix and w_kinetics now exists largely in westpa/reweight and westpa/kinetics, and is largely unchanged.

All observable calculations are done through their cython estimators; there is no non-cython code for the reweighting procedure or the rate calculation procedure. The reweighting estimator, reweight_for_c, is a cython implementation of the exact algorithm that was once in w_postanalysis_matrix.

RATE AND ERROR CALCULATIONS:
The rate estimator function for w_direct is different from w_kinavg, as it now uses the ratio of sums vs. the sum of the ratios (see the presentation file from the WESTPA meeting from January 27th, 2017). This substantially increases the complexity of the estimator function used in the bootstrap, as numpy.mean can no longer be used (the estimator is now the rate constant equation itself). In addition, reweight_for_c, the reweighting estimator, is very complex and requires a large number of datasets.

To accommodate the necessities of calculating error through reweighting and the new direct trace code, the way that programs interface with the mclib functions is different. It now accepts two dictionaries: a dictionary of datasets that are able to be decimated for the blocking procedure (based on a correlation analysis), and a dictionary of kwargs for the estimator function. To account for correlation, the estimator is run without bootstrapping within mclib to calculate the observable over the time period indicated, and the correlation analysis/bootstrapping proceeds from there. In the event that a non dictionary is sent in as a dataset, the code should detect this and handle this appropriately. The block length determined from correlation is typically sent on to the estimator, as the data from correlated blocks may need to be handled in a very complex manner.

As an example, the dataset dictionary for reweight_for_c is only the iterations submitted, whereas the estimator kwargs are the stride length, all the flux events, the number of observations, the type of observable required (as reweighting can generate all of them), etc. For 3 bins, this is slightly slower than the direct trace.

FOR FUTURE DEPRECATION:
w_stateprobs, w_kinetics, and w_kinavg can be called as they once were, but send in a deprecation warning. They accept the same flags, and work mostly in the same way (with some minor differences).

TODO:
Add flag to disable bootstrapping if error is not required.
w_postanalysis_matrix and w_postanalysis_reweight need to be rewritten to just call the new code and generate a deprecation warning.

The routines have been tested numerous times on numerous datasets, and return the correct answers. Please test the code and try and break it. In addition, please comment on any areas where what's going on isn't clear or could be better documented, as this rewrite has resulted in a substantial amount of code changes.

averaging together multiple weighted ensemble simulations.

… tool built by Lewis Baker.

… (for the grouping function).

… from the west.cfg file.

…weighting code.

Relies on new WESTTool type, 'WESTMultiTool', as a base which handles the file loading and argument parsing necessary.

Bringing this up to date.

Forgot to pull in up to date master.

w_postanalysis_reweight.py

…ulti_reweight.py

…ing soon.

…or some preliminary tests.

… This is needed to work with w_multi_reweight.py.

…obably broken right now; I will update it again soon.

…well as ability to automatically convert from state indices to bin indices (for denoting bins subject to recycling).

…ns every timepoint or every iteration. Also commented out check for target states in w_postanalysis_matrix, as I think this tool produces valid results for simulations with target states; this check could perhaps be moved to w_postanalysis_reweight in the future.

astatide · 2017-03-10T19:17:02Z

Fluxanl is fixed, multitool stuff has been purged, all references to w_ipython are gone, /lib/west_tools/westpa/_rc.py has been reverted to the original version (changes committed to another branch, now). We're getting close, here.

The bootstrap is now optional on all dataset evolutions (flag --disable-bootstrap or -db), which significantly speeds up calculation of the mean if that's all you're interested in. This is explicitly supported in mclib, which might seem odd except that it's already set up to run estimators and return datasets using the block options (and the datasets are properly shaped, at that).

@mczwier, I know one of your original concerns was whether the mclib library retained the functionality to 'just work' on an arbitrary dataset. If you have any pre-existing code, could you check whether it works as expected, currently?

TO DO, STILL:

Continue cleaning, commenting, adding/updating docstrings.
Consider moving kintool into westtools, and potentially merging it in with the other tool types (it's not specifically a tool type, really, but...)
Consider modifying w_direct kinetics and w_reweight matrix such that they aren't subclassed on the kintool averaging class (AverageCommands), as that simply adds extra flags that they don't need. Probably not difficult.
Possible bug with unknown states in kinetics tracing? - fixed in @ajd98's revisions in the multitool branch, which I attempted to port over, but may not have been successful, as it seems to potentially bork out on the ODLD system. EDIT: Seems this affects the work in the wipython branch, not here. Seems to work just fine.

…onger have bootstrapping option flags.

…kinetics.

…or (finally).

astatide · 2017-03-10T21:34:53Z

FIXED/FINISHED:

Progress indicator now works regardless of what analysis you're running
Docstrings on w_direct basically done.
Base classes make more sense, and now properly handle the argparser in the way you'd expect (given how it works for every other tool). Notably, everything stems off a basic class which handles assignment loading and output options. Every subcommand which uses the mclib has the same bootstrapping options, and w_direct kinetics and w_reweight matrix properly have their own options. THAT IS, argparsing works as you would expect and is relatively clean. 'all' inherits all the options it requires, as well.

TODO:
Docstrings on w_reweight, slight cleanup on code indentation leftover from progress indicator handling, review whether I missed anything on moving classes around.

…netics_tool.py

…reweight.

astatide · 2017-03-10T22:33:33Z

COMPLETED:

Docstrings on both w_{reweight/direct} have been updated and are consistent.
Fixed indentation hack on matrix/events.

CONSISTENT COMMANDS: It didn't make much sense to name the subcommands after their old tool names (this is, after all, why we're leaving in placeholders to the old commands). Therefore, each tool now has the following commands

init
kinetics
probs
averages
all

INIT:
Runs the routines that were once w_postanalysis_matrix or w_kinetics trace, depending on the tool.

KINETICS:
Calculates and averages all kinetic observables (fluxes and rates).

PROBS:
Calculates and averages all types of probabilities ('color populations', or steady state ensemble populations, state populations, etc.)

AVERAGES:
Skips the init step, and runs kinetics/probs.

ALL:
Runs init/kinetics/probs, in the appropriate order.

In addition, "kintool" has been moved and renamed. It's now in the westtools folder. It's not a real 'general purpose' subclass of the various WESTTools classes, but it certainly seems to belong there.

At this time, it's probably appropriate to do a full code review.

… from old toolkit model to new one.

… current; changed references in w_direct and westpa/__init__ accordingly.

…both w_direct and w_reweight.

astatide · 2017-03-13T17:19:00Z

Ploterr has been updated to work with both tools, and to also plot steady-state populations (or 'color populations' or ensemble populations, or whatever).

The subcommands for ploterr are similar to w_direct and w_reweight: {d/rw}.probs for state and ensemble probabilities, and {d/rw}.kinetics for plotting fluxes and rates.

The help text on w_direct has been fixed to accurately inform the end user of what the subcommands do (there were just copy/pasted defaults in there, before).

w_fluxanl has been updated to also support optional bootstrap/correlation analysis on evolution datasets. While w_fluxanl has always been pretty fast, it's still the case that we don't always want to spend the time generating calculating error intervals; sometimes, a very rough estimate is all we require. As with w_direct and w_reweight, the flags -db (--disable-bootstrap) and -dc (--disable-correlation) are available.

There are now stubs such that w_postanalysis_reweight and w_postanalysis_matrix can be called, like before, so scripts/workflows that relied on these tools should not break.

Critically, I cannot think of anything else that needs to be completed before merging.

mczwier

This looks great, Adam. Thank you for your hard work. I'm not too concerned about mclib breaking; I use it, but not often enough to be annoyed at having to adapt my code somewhat for the more flexible backend. The only thing I can think of (though it is super annoying) is to update the copyright date on the files you've modified or written. (i.e. (c) 2013 – 2017).

astatide · 2017-03-14T16:34:20Z

The copyright date has been updated on files I've modified (and the appropriate headers added to new files).

The only one I haven't touched is w_assign, as the changes there aren't real (there's no functional difference), but it WILL be modified in a separate branch/pull request coming up soon that I'd prefer to not have to git merge (again), so that'll happen soon enough.

West tools kinetics rewrite merged. Former-commit-id: b07237e

kenjita and others added 30 commits June 2, 2015 19:23

Initial commit of multitool, a suite of tools designed for

9b05cfa

averaging together multiple weighted ensemble simulations.

Beginning of function to average generic datasets.

611cecd

Added in functions which should generalise the current routine in the…

0151c45

… tool built by Lewis Baker.

First commit of w_multi_kin.py

22510ea

Staging in my edits to w_multi_kin.py.

f965921

Temp hack to get stuff working on the direct trace.

87cd351

Added in state_labels.

40fb641

Slight modification to functions.

640c71c

Deleted accidentally added file.

f56958d

Modified _rc.py in westpa such that more driver options are available…

7d9c533

… (for the grouping function).

Added in color functionality and the ability to directly pull from it…

b87a4f5

… from the west.cfg file.

Minor modifications.

a327b2a

Initial commit for porting monte carlo bootstrap function into the re…

3f36298

…weighting code.

Removed random print statement from w_kinavg.py

939326e

Making changes!

700d3d1

Significant, albeit incomplete, rewrite.

cd0b3c5

Relies on new WESTTool type, 'WESTMultiTool', as a base which handles the file loading and argument parsing necessary.

Merge branch 'master' into multitool

638a18c

Bringing this up to date.

Merge branch 'master' into multitool

0710ff2

Forgot to pull in up to date master.

Added in the initial basis of a new tool, which is heavily based on

daa6df6

w_postanalysis_reweight.py

Added functionality for averaging together transition matrices to w_m…

ebed58f

…ulti_reweight.py

Nearly done with w_multi_reweight.py; it should be ready for bug test…

40f88f5

…ing soon.

Added a bunch of updates to w_multi_reweight.py. It should be ready f…

e553669

…or some preliminary tests.

Updated westtools/core.py with a small yaml parsing wrapper function.…

ad69a6d

… This is needed to work with w_multi_reweight.py.

Fixed typos and updated TODO notes in w_multi_reweight.py

5aaf856

Started adding option for saving transition matrices. This tool is pr…

6f308f9

…obably broken right now; I will update it again soon.

Added ability to save transition matrices in w_multi_reweight.py, as …

d93dc71

…well as ability to automatically convert from state indices to bin indices (for denoting bins subject to recycling).

Updated saving of transition matrices and added more help information.

8de863e

Fixed some quick syntax errors.

50ac899

Fixed a bug with specifying recycling states.

b10280b

Adam Pratt added 3 commits March 7, 2017 21:36

Purged of references to multitools and w_ipython stuff.

1577803

Removed changes related to split/merge driver changes.

c34f3bc

Bootstrap for evolution datasets is now optional, but on by default.

e21bff9

Adam Pratt added 4 commits March 10, 2017 15:06

Fixed argument parsing in reweight code.

44d11c3

Moved arguments to their respective classes; matrix and kinetics no l…

2f372fb

…onger have bootstrapping option flags.

Renamed WESTKinAvg to WESTKineticsBase, cleaned up imports for old w_…

69984a3

…kinetics.

Fixed a lot of docstrings in w_direct, and fixed the progress indicat…

ab587d3

…or (finally).

Adam Pratt added 6 commits March 10, 2017 16:41

Fixed more imports and headers.

8ba4674

Moved lib/west_tools/westpa/kintool.py to lib/west_tools/westtools/ki…

a79d0aa

…netics_tool.py

More updates on docstrings and help.

956cda6

Removed indentation from pi.

720f183

Changed subcommand values and seriously altered docstring text for w_…

f6b4d2b

…reweight.

Changed the order of subcommands.

131f738

Adam Pratt added 6 commits March 10, 2017 17:45

Added in stubs to w_postanalysis_{reweight/matrix} to ease transition…

3b572a0

… from old toolkit model to new one.

Finished removing old rate estimator from Cython and replaced it with…

70842ea

… current; changed references in w_direct and westpa/__init__ accordingly.

Updated ploterr to plot color/ensemble probabilities, and to support …

9f8110b

…both w_direct and w_reweight.

Removed errant print statement.

f8521e5

Made correlation/bootstrap optional on w_fluxanl.

50ed8d6

Updated help text on w_direct.

958db35

mczwier requested changes Mar 14, 2017

View reviewed changes

astatide removed the request for review from synapticarbors March 14, 2017 16:22

Updated/added copyright date/information to modified files.

eb28e07

astatide merged commit b07237e into master Mar 14, 2017

astatide deleted the west_tools_KineticsRewrite branch March 14, 2017 17:04

synapticarbors pushed a commit to synapticarbors/westpa that referenced this pull request Jun 24, 2020

Merge pull request westpa#35 from westpa/west_tools_KineticsRewrite

f01e041

West tools kinetics rewrite merged. Former-commit-id: b07237e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

West tools kinetics rewrite #35

West tools kinetics rewrite #35

astatide commented Mar 1, 2017 •

edited

Loading

astatide commented Mar 10, 2017 •

edited

Loading

astatide commented Mar 10, 2017

astatide commented Mar 10, 2017

astatide commented Mar 13, 2017

mczwier left a comment

astatide commented Mar 14, 2017

West tools kinetics rewrite #35

West tools kinetics rewrite #35

Conversation

astatide commented Mar 1, 2017 • edited Loading

astatide commented Mar 10, 2017 • edited Loading

astatide commented Mar 10, 2017

astatide commented Mar 10, 2017

astatide commented Mar 13, 2017

mczwier left a comment

Choose a reason for hiding this comment

astatide commented Mar 14, 2017

astatide commented Mar 1, 2017 •

edited

Loading

astatide commented Mar 10, 2017 •

edited

Loading