Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Min SSE #3

Merged
merged 15 commits into from
Jan 22, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .github/workflows/CI.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -58,3 +58,9 @@ jobs:
uses: codecov/codecov-action@v3
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}

- name: Run codacy-coverage-reporter
uses: codacy/codacy-coverage-reporter-action@v1
with:
project-token: ${{ secrets.CODACY_PROJECT_TOKEN }}
coverage-reports: coverage.xml
32 changes: 19 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,12 +8,13 @@ deea

Detection of Equilibration by Ensemble Analysis

A lightweight Python package for detecting equilibration in timeseries data where an initial transient is followed by a stationary distribution. Currently, two equilibration detection methods are implemented:
A Python package for detecting equilibration in timeseries data where an initial transient is followed by a stationary distribution. Two main methods are implemented, which differ in the way they account for autocorrelation:

- Maximum effective sample size based on the lugsail replicated batch means method. See [here](https://projecteuclid.org/journals/statistical-science/volume-36/issue-4/Revisiting-the-GelmanRubin-Diagnostic/10.1214/20-STS812.full) and [here](https://academic.oup.com/biomet/article/109/3/735/6395353).

- Paired t-test. This checks for significant differences between the first 10 %, and last 50 % of the data. The test is repeated while sequentially removing more initial data. The first time point at which no significant difference is found is taken as the equilibration time.
- `detect_equilibration_init_seq`: This uses the initial sequence methods of Geyer ([Geyer, 1992](https://www.jstor.org/stable/2246094)) to determine the truncation point of the sum of autocovariances.
- `detect_equilibration_window`: This uses window methods (see [Geyer](https://www.jstor.org/stable/2246094) again) when calculating the
autocorrelation.

For both, the equilibration point can be determined either according to the minimum of the squared standard error (the default), or the maximum effective sample size, by specifying `method="min_sse"` or `method="max_ess"`.

### Installation

Expand All @@ -29,18 +30,23 @@ pip install -e .
import deea

# Load your timeseries of interest.
# This should be a 2D numpy array with shape (n_runs, n_samples)
# This should be a 2D numpy array with shape (n_runs, n_samples),
# or a 1D numpy array with shape (n_samples).
my_timeseries = ...

# Detect equilibration based on the maximum effective sample size
# based on the lugsail replicated batch means method.
# idx is the index of the first sample after equilibration, g is the
# statistical inefficiency, and ess is the effective sample size.
idx, g, ess = deea.detect_equilibration_max_ess(my_timeseries, method="lugsail", plot=True)
# Detect equilibration based on the minimum squared standard error
# using Geyer's initial convex sequence method to account for
# autocorrelation. idx is the index of the first sample after
# equilibration, g is the statistical inefficiency of the equilibrated
# sample, and ess is the effective sample size of the equilibrated sample.
idx, g, ess = deea.detect_equilibration_init_seq(my_timeseries, method="min_sse", plot=True)

# Alternatively, use the window method to account for autocorrelation.
# By default, this uses a Bartlett kernel and a window size of round(n_samples**0.5).
idx, g, ess = deea.detect_equilibration_window(my_timeseries, method="min_sse", plot=True)

# Detect equilibration using a paired t-test.
# idx is the index of the first sample after equilibration.
idx = deea.detect_equilibration_ttest(my_timeseries, plot=True)
# We can also determine equilibration in the same way as in pymbar.timeseries.detect_equilibration.
idx, g, ess = deea.detect_equilibration_init_seq(my_timeseries, method="max_ess", sequence_estimator="positive")
```

### Copyright
Expand Down
1 change: 1 addition & 0 deletions deea/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,4 +7,5 @@
from .ess import *
from .gelman_rubin import *
from .plot import *
from .sse import *
from .variance import *
7 changes: 5 additions & 2 deletions deea/_validation.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
"""Functions for data validation."""

from warnings import warn as _warn

import numpy as np

from ._exceptions import InvalidInputError
Expand Down Expand Up @@ -45,8 +47,9 @@ def check_data(data: np.ndarray, one_dim_allowed: bool = False) -> np.ndarray:
if n_dims == 2:
n_chains, n_samples = data.shape
if n_chains > n_samples:
raise InvalidInputError(
"Data must have shape (n_chains, n_samples) where n_chains >= n_samples."
_warn(
"Data has more chains than samples. This is allowed but may not be what you intended.",
RuntimeWarning,
)

# If the array is one dimensional, reshape it to (1, n_samples).
Expand Down
2 changes: 1 addition & 1 deletion deea/_version.py
Original file line number Diff line number Diff line change
@@ -1 +1 @@
__version__ = "0.1.0"
__version__ = "0.1.0+12.gb6118c6.dirty"
Loading