Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

West tools kinetics rewrite #35

Merged
merged 161 commits into from
Mar 14, 2017
Merged

West tools kinetics rewrite #35

merged 161 commits into from
Mar 14, 2017

Conversation

astatide
Copy link

@astatide astatide commented Mar 1, 2017

It's time for a code review! This branch includes all the previously discussed changes (see here: westpa/west_tools#17) to the kinetics code. Let's start with a brief summary of what's been changed, what's been added, and what's been marked for future deprecation, followed by a TODO (either pre or post merge, depending on what we would like).

SUMMARY:
Introduces changes to mclib code and both types of kinetics analysis (w_kinavg,w_kinetics,w_postanalysis_matrix/reweight), including changes and new versions of the estimator functions. Both types of analysis now produce the same datasets, and all calculate error through monte carlo block bootstrapping.

NEW TOOLS:
w_direct: w_kinetics, w_stateprobs, and w_kinavg have all been combined into this tool.
w_reweight: w_postanalysis_matrix and w_postanalysis_reweight are combined here.

Both tools use subcommands to perform the analysis, and include some 'helper' commands to easily run everything.

w_direct/reweight: all, average, stateprobs
w_direct: kinavg, kinetics
w_reweight: matrix, rate

The flags you would use for each of the old tools should still work with the new ones and their respective subcommands. By default, each tool writes each analysis dataset to one file: direct.h5, and reweight.h5. You can choose to write the output of each dataset to a different file, as well, to be compatible with how things were run before.

The old w_kinavg error calling routines have been abstracted away into an 'AverageCommands' class, which exists in westpa/kintool (place in with other tools?). Each tool subclasses this, overrides the go function, and has a unique function that performs most or all of the actual analysis (typically named for its old tool). This makes generating the futures for the work manager, printing responsible output, etc, all trivial and uniform across tools. Each tool still needs to build its own block to submit (ala how it was done before) to the mclib, depending on how complex the estimator is (this could probably be simplified).

The code that once was w_postanalysis_matrix and w_kinetics now exists largely in westpa/reweight and westpa/kinetics, and is largely unchanged.

All observable calculations are done through their cython estimators; there is no non-cython code for the reweighting procedure or the rate calculation procedure. The reweighting estimator, reweight_for_c, is a cython implementation of the exact algorithm that was once in w_postanalysis_matrix.

RATE AND ERROR CALCULATIONS:
The rate estimator function for w_direct is different from w_kinavg, as it now uses the ratio of sums vs. the sum of the ratios (see the presentation file from the WESTPA meeting from January 27th, 2017). This substantially increases the complexity of the estimator function used in the bootstrap, as numpy.mean can no longer be used (the estimator is now the rate constant equation itself). In addition, reweight_for_c, the reweighting estimator, is very complex and requires a large number of datasets.

To accommodate the necessities of calculating error through reweighting and the new direct trace code, the way that programs interface with the mclib functions is different. It now accepts two dictionaries: a dictionary of datasets that are able to be decimated for the blocking procedure (based on a correlation analysis), and a dictionary of kwargs for the estimator function. To account for correlation, the estimator is run without bootstrapping within mclib to calculate the observable over the time period indicated, and the correlation analysis/bootstrapping proceeds from there. In the event that a non dictionary is sent in as a dataset, the code should detect this and handle this appropriately. The block length determined from correlation is typically sent on to the estimator, as the data from correlated blocks may need to be handled in a very complex manner.

As an example, the dataset dictionary for reweight_for_c is only the iterations submitted, whereas the estimator kwargs are the stride length, all the flux events, the number of observations, the type of observable required (as reweighting can generate all of them), etc. For 3 bins, this is slightly slower than the direct trace.

FOR FUTURE DEPRECATION:
w_stateprobs, w_kinetics, and w_kinavg can be called as they once were, but send in a deprecation warning. They accept the same flags, and work mostly in the same way (with some minor differences).

TODO:
Add flag to disable bootstrapping if error is not required.
w_postanalysis_matrix and w_postanalysis_reweight need to be rewritten to just call the new code and generate a deprecation warning.

The routines have been tested numerous times on numerous datasets, and return the correct answers. Please test the code and try and break it. In addition, please comment on any areas where what's going on isn't clear or could be better documented, as this rewrite has resulted in a substantial amount of code changes.

kenjita and others added 30 commits June 2, 2015 19:23
averaging together multiple weighted ensemble simulations.
Relies on new WESTTool type, 'WESTMultiTool', as a base
which handles the file loading and argument parsing necessary.
Bringing this up to date.
Forgot to pull in up to date master.
… This is needed to work with w_multi_reweight.py.
…obably broken right now; I will update it again soon.
…well as ability to automatically convert from state indices to bin indices (for denoting bins subject to recycling).
…ns every timepoint or every iteration. Also commented out check for target states in w_postanalysis_matrix, as I think this tool produces valid results for simulations with target states; this check could perhaps be moved to w_postanalysis_reweight in the future.
@astatide
Copy link
Author

astatide commented Mar 10, 2017

Fluxanl is fixed, multitool stuff has been purged, all references to w_ipython are gone, /lib/west_tools/westpa/_rc.py has been reverted to the original version (changes committed to another branch, now). We're getting close, here.

The bootstrap is now optional on all dataset evolutions (flag --disable-bootstrap or -db), which significantly speeds up calculation of the mean if that's all you're interested in. This is explicitly supported in mclib, which might seem odd except that it's already set up to run estimators and return datasets using the block options (and the datasets are properly shaped, at that).

@mczwier, I know one of your original concerns was whether the mclib library retained the functionality to 'just work' on an arbitrary dataset. If you have any pre-existing code, could you check whether it works as expected, currently?

TO DO, STILL:

  • Continue cleaning, commenting, adding/updating docstrings.
  • Consider moving kintool into westtools, and potentially merging it in with the other tool types (it's not specifically a tool type, really, but...)
  • Consider modifying w_direct kinetics and w_reweight matrix such that they aren't subclassed on the kintool averaging class (AverageCommands), as that simply adds extra flags that they don't need. Probably not difficult.
  • Possible bug with unknown states in kinetics tracing? - fixed in @ajd98's revisions in the multitool branch, which I attempted to port over, but may not have been successful, as it seems to potentially bork out on the ODLD system. EDIT: Seems this affects the work in the wipython branch, not here. Seems to work just fine.

@astatide
Copy link
Author

FIXED/FINISHED:

  • Progress indicator now works regardless of what analysis you're running
  • Docstrings on w_direct basically done.
  • Base classes make more sense, and now properly handle the argparser in the way you'd expect (given how it works for every other tool). Notably, everything stems off a basic class which handles assignment loading and output options. Every subcommand which uses the mclib has the same bootstrapping options, and w_direct kinetics and w_reweight matrix properly have their own options. THAT IS, argparsing works as you would expect and is relatively clean. 'all' inherits all the options it requires, as well.

TODO:
Docstrings on w_reweight, slight cleanup on code indentation leftover from progress indicator handling, review whether I missed anything on moving classes around.

@astatide
Copy link
Author

COMPLETED:

  • Docstrings on both w_{reweight/direct} have been updated and are consistent.
  • Fixed indentation hack on matrix/events.

CONSISTENT COMMANDS: It didn't make much sense to name the subcommands after their old tool names (this is, after all, why we're leaving in placeholders to the old commands). Therefore, each tool now has the following commands

  • init
  • kinetics
  • probs
  • averages
  • all

INIT:
Runs the routines that were once w_postanalysis_matrix or w_kinetics trace, depending on the tool.

KINETICS:
Calculates and averages all kinetic observables (fluxes and rates).

PROBS:
Calculates and averages all types of probabilities ('color populations', or steady state ensemble populations, state populations, etc.)

AVERAGES:
Skips the init step, and runs kinetics/probs.

ALL:
Runs init/kinetics/probs, in the appropriate order.

In addition, "kintool" has been moved and renamed. It's now in the westtools folder. It's not a real 'general purpose' subclass of the various WESTTools classes, but it certainly seems to belong there.

At this time, it's probably appropriate to do a full code review.

@astatide
Copy link
Author

Ploterr has been updated to work with both tools, and to also plot steady-state populations (or 'color populations' or ensemble populations, or whatever).

The subcommands for ploterr are similar to w_direct and w_reweight: {d/rw}.probs for state and ensemble probabilities, and {d/rw}.kinetics for plotting fluxes and rates.

The help text on w_direct has been fixed to accurately inform the end user of what the subcommands do (there were just copy/pasted defaults in there, before).

w_fluxanl has been updated to also support optional bootstrap/correlation analysis on evolution datasets. While w_fluxanl has always been pretty fast, it's still the case that we don't always want to spend the time generating calculating error intervals; sometimes, a very rough estimate is all we require. As with w_direct and w_reweight, the flags -db (--disable-bootstrap) and -dc (--disable-correlation) are available.

There are now stubs such that w_postanalysis_reweight and w_postanalysis_matrix can be called, like before, so scripts/workflows that relied on these tools should not break.

Critically, I cannot think of anything else that needs to be completed before merging.

Copy link
Member

@mczwier mczwier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, Adam. Thank you for your hard work. I'm not too concerned about mclib breaking; I use it, but not often enough to be annoyed at having to adapt my code somewhat for the more flexible backend. The only thing I can think of (though it is super annoying) is to update the copyright date on the files you've modified or written. (i.e. (c) 2013 – 2017).

@astatide astatide removed the request for review from synapticarbors March 14, 2017 16:22
@astatide
Copy link
Author

The copyright date has been updated on files I've modified (and the appropriate headers added to new files).

The only one I haven't touched is w_assign, as the changes there aren't real (there's no functional difference), but it WILL be modified in a separate branch/pull request coming up soon that I'd prefer to not have to git merge (again), so that'll happen soon enough.

@astatide astatide merged commit b07237e into master Mar 14, 2017
@astatide astatide deleted the west_tools_KineticsRewrite branch March 14, 2017 17:04
synapticarbors pushed a commit to synapticarbors/westpa that referenced this pull request Jun 24, 2020
West tools kinetics rewrite merged.

Former-commit-id: b07237e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants