
Semi-automatic summary statistics and sample weighting #429

Merged
merged 98 commits into develop from feature_learn on Jul 30, 2021

Conversation


@yannikschaelte yannikschaelte commented Feb 23, 2021

Breaking changes:

  • The API of the (Adaptive)PNormDistance was altered substantially to allow custom definition of update indices.
  • Internal weighting of samples changed (should not affect users).

Semi-automatic summary statistics:

  • Implement (Adaptive)PNormDistance with the ability to learn summary statistics from simulations.
  • Add a sumstat submodule for generic mappings (identity, transformations), in particular a PredictorSumstat summary statistic that can make use of Predictor objects.
  • Add subsetting routines that allow restricting predictor model training samples.
  • Add predictor submodule with generic Predictor class and concrete implementations including linear regression, Lasso, Gaussian Process, Neural Network.
  • Add an InfoWeightedPNormDistance that allows using predictor models to weight data not only by scale, but also by information content. A usage sketch follows after this list.
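
A minimal sketch of how the new pieces could be wired together. The toy model, prior, and the LinearPredictor name are assumptions for illustration; exact class and argument names should be checked against the new pyabc.predictor, pyabc.sumstat, and pyabc.distance modules added in this PR:

```python
import numpy as np
import pyabc
from pyabc.distance import AdaptivePNormDistance
from pyabc.predictor import LinearPredictor  # concrete Predictor; name assumed
from pyabc.sumstat import PredictorSumstat


# Toy model and prior, purely for illustration.
def model(p):
    return {"y": p["theta"] + 0.1 * np.random.randn(5)}


prior = pyabc.Distribution(theta=pyabc.RV("uniform", 0, 10))

# Learn summary statistics from simulations: a Predictor-based Sumstat is
# plugged into the adaptively scale-weighted p-norm distance.
distance = AdaptivePNormDistance(
    p=2,
    sumstat=PredictorSumstat(LinearPredictor()),
)

abc = pyabc.ABCSMC(model, prior, distance, population_size=100)
```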

Changes to internal sample weighting:

  • Do not normalize the weights of in-memory particles by model; this makes it easier to use the sampling weights and the list of particles in adaptive components (e.g. distance functions). A toy illustration of the two normalization schemes follows after this list.
  • Normalization of the population weights to 1 is applied at the sample level in the sampler wrapper function.
  • In the database, normalization is still by sample so as not to break support for old databases; it would be nicer to normalize only by the total sum there as well, but that requires a database update.
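
A toy illustration of the two normalization schemes (plain NumPy, not pyABC's actual code):

```python
import numpy as np

# Raw importance weights of in-memory particles, grouped by model.
weights_m0 = np.array([0.2, 0.5])
weights_m1 = np.array([0.1, 0.4])

# Old scheme: normalize per model, so each model's weights sum to 1.
per_model = [w / w.sum() for w in (weights_m0, weights_m1)]

# New scheme: keep raw per-particle weights; normalize once on the sample
# level, over all particles, so the whole population sums to 1.
all_weights = np.concatenate([weights_m0, weights_m1])
sample_level = all_weights / all_weights.sum()
```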

Changes to internal object construction from samples:

  • Pass the sample instead of weighted_sum_stats to the distance function. This way, the distance can choose on its own what it wants: all or only accepted particles, their distances, weights, parameters, or summary statistics. A sketch follows after this list.
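
A sketch of what this enables for a custom distance. Both the update signature and the attribute names on the sample and particles are assumptions here and should be checked against the post-PR Distance base class:

```python
from typing import Callable

from pyabc.distance import PNormDistance


class AcceptedOnlyDistance(PNormDistance):
    """Toy distance that inspects the full sample when updating."""

    def update(self, t: int, get_sample: Callable, total_sims: int) -> bool:
        # The distance now receives the sample (here via a callable, assumed)
        # and can decide itself what to use: all or only accepted particles,
        # their distances, weights, parameters, or summary statistics.
        sample = get_sample()
        accepted = sample.accepted_particles  # attribute name assumed
        sumstats = [p.sum_stat for p in accepted]  # attribute name assumed
        # ... e.g. recompute per-statistic scale weights from `sumstats` ...
        return True  # report that the distance changed
```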

Visualization:

  • Function to plot adaptive distance weights from a log file (usage sketch below).
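
A possible usage of the new plotting helper; the log file path is a placeholder, and the function name is taken from this PR but should be verified against pyabc.visualization:

```python
import matplotlib.pyplot as plt

from pyabc.visualization import plot_distance_weights

# Plot the per-statistic weights recorded by the adaptive distance over the
# generations; "distance_weights.json" stands in for the configured log file.
plot_distance_weights("distance_weights.json")
plt.savefig("distance_weights.png")
```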

@yannikschaelte yannikschaelte self-assigned this Feb 23, 2021

codecov-io commented Feb 23, 2021

Codecov Report

Merging #429 (95b3365) into develop (23c08bf) will decrease coverage by 37.57%.
The diff coverage is 36.61%.


@@             Coverage Diff              @@
##           develop     #429       +/-   ##
============================================
- Coverage    87.34%   49.76%   -37.58%     
============================================
  Files          103      107        +4     
  Lines         6098     6561      +463     
============================================
- Hits          5326     3265     -2061     
- Misses         772     3296     +2524     
Impacted Files Coverage Δ
pyabc/distance/__init__.py 100.00% <ø> (ø)
pyabc/epsilon/base.py 78.57% <ø> (-17.86%) ⬇️
pyabc/epsilon/epsilon.py 72.50% <ø> (-18.75%) ⬇️
pyabc/distance/distance.py 27.34% <18.22%> (-58.28%) ⬇️
pyabc/sumstat/util.py 18.30% <18.30%> (ø)
pyabc/predictor.py 25.75% <25.75%> (ø)
pyabc/sampler/redis_eps/sampler.py 61.32% <34.78%> (-35.19%) ⬇️
pyabc/sumstat/learn.py 35.06% <35.06%> (ø)
pyabc/distance/util.py 40.00% <40.00%> (ø)
pyabc/sumstat/base.py 40.00% <40.00%> (ø)
... and 87 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23c08bf...95b3365.

@yannikschaelte yannikschaelte merged commit dbb9b82 into develop Jul 30, 2021
@yannikschaelte yannikschaelte deleted the feature_learn branch July 30, 2021 14:03
@yannikschaelte yannikschaelte mentioned this pull request Jul 30, 2021
EmadAlamoudi pushed a commit that referenced this pull request Jun 30, 2022
* init

* limit look-ahead sample number in delayed mode

* update releasenotes -> 0.10.15

* refactor: weight normalization applied to all particles; pass sample to distance function

* fix typo in changelog

* fix tmp changes

* fixup

* pycharm annoys me

* fixit

* fix population test

* fix zero division error

* fix docs

* whatever

* remove file

* init

* tmp

* tmp

* refactor adaptive distances: sumstat + vectorize

* add working version of sumstat and predictor modules

* whatever

* add nbs

* handle trivial statistics better

* normalize info weighting correctly

* refactor anew info weighting + normalization + gp and layer handles

* fix flake8

* add lasso sumstat

* set indices to keep correctly

* add option to not normalize per parameter in info weight

* cont

* implement late model use

* remove slad

* tidy up

* update nbs; fix various things

* add predictor test

* add model selection test

* additional tests

* update readme; add raise tests

* add sumstat test

* add test for dict2arr

* test info weighting

* test sample construction

* test fit index construction

* test inf norm; test scales errors

* fixup

* implement subsetting

* fix imports

* test augmentation

* add missing base class dependency

* move worker signup up

* add logger

* always normalize linreg inputs; postpone default fit indices

* do not clear up redis server

* fix typo

* reset default scale function from rmsd to std for stability in most cases

* cont

* cont

* update

* add tests

* whatever

* Allow fitting at simulation-based events (#462)

* Allow fitting at simulation-based events

* update nb

* cont

* fix test

* fix test

* Add distance weight plot (#463)

* fix wrong deviation threshold 0.5 -> 0.33

* Small fixes (#466)

* Add distance weight plot

* add colors

* enable passing keys

* integer coordinates

* implement option to use only accepted particles for scale calculation in adaptive distances (#467)

* implement only accepted particles for scale calculation

* add test

* fix indent

* add max mlp method

* log fitting time

* add train-test-split model selection method

* better info weight calculation

* add pre_before_fit and from_events

* fix

* change default to weights

* normalize in subsetter

* add n_sample option to data plot

* fix stuff

* allow kwargs in distance weights plot

* add pcmad convenience

* apply la normalization to all particles

* fix defaults

* final edits

Co-authored-by: Yannik Schälte <[email protected]>
Co-authored-by: Yannik Schälte <[email protected]>