
Semi-automatic summary statistics and sample weighting #429

Merged
merged 98 commits into develop from feature_learn on Jul 30, 2021

Conversation


@yannikschaelte yannikschaelte commented Feb 23, 2021

Breaking changes:

  • The API of the (Adaptive)PNormDistance was altered substantially to allow custom definition of update indices.
  • Internal weighting of samples changed (should not affect users).

Semi-automatic summary statistics:

  • Implement (Adaptive)PNormDistance with the ability to learn summary statistics from simulations.
  • Add a sumstat submodule for generic mappings (identity, transformations), in particular a PredictorSumstat summary statistic that can make use of Predictor objects.
  • Add subsetting routines that allow restricting predictor model training samples.
  • Add predictor submodule with generic Predictor class and concrete implementations including linear regression, Lasso, Gaussian Process, Neural Network.
  • Add an InfoWeightedPNormDistance that allows using predictor models to weight data not only by scale, but also by information content. A usage sketch follows after this list.
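
A minimal sketch of how the new pieces could be wired together. The toy model, prior, and the LinearPredictor name are assumptions for illustration; exact class and argument names should be checked against the new pyabc.predictor, pyabc.sumstat, and pyabc.distance modules added in this PR:

```python
import numpy as np
import pyabc
from pyabc.distance import AdaptivePNormDistance
from pyabc.predictor import LinearPredictor  # concrete Predictor; name assumed
from pyabc.sumstat import PredictorSumstat


# Toy model and prior, purely for illustration.
def model(p):
    return {"y": p["theta"] + 0.1 * np.random.randn(5)}


prior = pyabc.Distribution(theta=pyabc.RV("uniform", 0, 10))

# Learn summary statistics from simulations: a Predictor-based Sumstat is
# plugged into the adaptively scale-weighted p-norm distance.
distance = AdaptivePNormDistance(
    p=2,
    sumstat=PredictorSumstat(LinearPredictor()),
)

abc = pyabc.ABCSMC(model, prior, distance, population_size=100)
```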

Changes to internal sample weighting:

  • Do not normalize the weights of in-memory particles by model; this makes it easier to use the sampling weights and the list of particles in adaptive components (e.g. distance functions). A toy illustration of the two normalization schemes follows after this list.
  • Normalization of the population weights to 1 is applied at the sample level in the sampler wrapper function.
  • In the database, normalization is still by sample so as not to break support for old databases; it would be nicer to normalize only by the total sum there as well, but that requires a database update.
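
A toy illustration of the two normalization schemes (plain NumPy, not pyABC's actual code):

```python
import numpy as np

# Raw importance weights of in-memory particles, grouped by model.
weights_m0 = np.array([0.2, 0.5])
weights_m1 = np.array([0.1, 0.4])

# Old scheme: normalize per model, so each model's weights sum to 1.
per_model = [w / w.sum() for w in (weights_m0, weights_m1)]

# New scheme: keep raw per-particle weights; normalize once on the sample
# level, over all particles, so the whole population sums to 1.
all_weights = np.concatenate([weights_m0, weights_m1])
sample_level = all_weights / all_weights.sum()
```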

Changes to internal object construction from samples:

  • Pass the sample instead of weighted_sum_stats to the distance function. This way, the distance can choose on its own what it wants: all or only accepted particles, their distances, weights, parameters, or summary statistics. A sketch follows after this list.
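
A sketch of what this enables for a custom distance. Both the update signature and the attribute names on the sample and particles are assumptions here and should be checked against the post-PR Distance base class:

```python
from typing import Callable

from pyabc.distance import PNormDistance


class AcceptedOnlyDistance(PNormDistance):
    """Toy distance that inspects the full sample when updating."""

    def update(self, t: int, get_sample: Callable, total_sims: int) -> bool:
        # The distance now receives the sample (here via a callable, assumed)
        # and can decide itself what to use: all or only accepted particles,
        # their distances, weights, parameters, or summary statistics.
        sample = get_sample()
        accepted = sample.accepted_particles  # attribute name assumed
        sumstats = [p.sum_stat for p in accepted]  # attribute name assumed
        # ... e.g. recompute per-statistic scale weights from `sumstats` ...
        return True  # report that the distance changed
```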

Visualization:

  • Function to plot adaptive distance weights from a log file (usage sketch below).
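
A possible usage of the new plotting helper; the log file path is a placeholder, and the function name is taken from this PR but should be verified against pyabc.visualization:

```python
import matplotlib.pyplot as plt

from pyabc.visualization import plot_distance_weights

# Plot the per-statistic weights recorded by the adaptive distance over the
# generations; "distance_weights.json" stands in for the configured log file.
plot_distance_weights("distance_weights.json")
plt.savefig("distance_weights.png")
```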

@yannikschaelte yannikschaelte self-assigned this Feb 23, 2021

codecov-io commented Feb 23, 2021

Codecov Report

Merging #429 (95b3365) into develop (23c08bf) will decrease coverage by 37.57%.
The diff coverage is 36.61%.


@@             Coverage Diff              @@
##           develop     #429       +/-   ##
============================================
- Coverage    87.34%   49.76%   -37.58%     
============================================
  Files          103      107        +4     
  Lines         6098     6561      +463     
============================================
- Hits          5326     3265     -2061     
- Misses         772     3296     +2524     
Impacted Files Coverage Δ
pyabc/distance/__init__.py 100.00% <ø> (ø)
pyabc/epsilon/base.py 78.57% <ø> (-17.86%) ⬇️
pyabc/epsilon/epsilon.py 72.50% <ø> (-18.75%) ⬇️
pyabc/distance/distance.py 27.34% <18.22%> (-58.28%) ⬇️
pyabc/sumstat/util.py 18.30% <18.30%> (ø)
pyabc/predictor.py 25.75% <25.75%> (ø)
pyabc/sampler/redis_eps/sampler.py 61.32% <34.78%> (-35.19%) ⬇️
pyabc/sumstat/learn.py 35.06% <35.06%> (ø)
pyabc/distance/util.py 40.00% <40.00%> (ø)
pyabc/sumstat/base.py 40.00% <40.00%> (ø)
... and 87 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23c08bf...95b3365.

@yannikschaelte yannikschaelte merged commit dbb9b82 into develop Jul 30, 2021
@yannikschaelte yannikschaelte deleted the feature_learn branch July 30, 2021 14:03
@yannikschaelte yannikschaelte mentioned this pull request Jul 30, 2021
EmadAlamoudi pushed a commit that referenced this pull request Jun 30, 2022
* init

* limit look-ahead sample number in delayed mode

* update releasenotes -> 0.10.15

* refactor: weight normalization applied to all particles; pass sample to distance function

* fix typo in changelog

* fix tmp changes

* fixup

* pycharm annoys me

* fixit

* fix population test

* fix zero division error

* fix docs

* whatever

* remove file

* init

* tmp

* tmp

* refactor adaptive distances: sumstat + vectorize

* add working version of sumstat and predictor modules

* whatever

* add nbs

* handle trivial statistics better

* normalize info weighting correctly

* refactor anew info weighting + normalization + gp and layer handles

* fix flake8

* add lasso sumstat

* set indices to keep correctly

* add option to not normalize per parameter in info weight

* cont

* implement late model use

* remove slad

* tidy up

* update nbs; fix various things

* add predictor test

* add model selection test

* additional tests

* update readme; add raise tests

* add sumstat test

* add test for dict2arr

* test info weighting

* test sample construction

* test fit index construction

* test inf norm; test scales errors

* fixup

* implement subsetting

* fix imports

* test augmentation

* add missing base class dependency

* move worker signup up

* add logger

* always normalize linreg inputs; postpone default fit indices

* do not clear up redis server

* fix typo

* reset default scale function from rmsd to std for stability in most cases

* cont

* cont

* update

* add tests

* whatever

* Allow fitting at simulation-based events (#462)

* Allow fitting at simulation-based events

* update nb

* cont

* fix test

* fix test

* Add distance weight plot (#463)

* fix wrong deviation threshold 0.5 -> 0.33

* Small fixes (#466)

* Add distance weight plot

* add colors

* enable passing keys

* integer coordinates

* implement option to use only accepted particles for scale calculation in adaptive distances (#467)

* implement only accepted particles for scale calculation

* add test

* fix indent

* add max mlp method

* log fitting time

* add train-test-split model selection method

* better info weight calculation

* add pre_before_fit and from_events

* fix

* change default to weights

* normalize in subsetter

* add n_sample option to data plot

* fix stuff

* allow kwargs in distance weights plot

* add pcmad convenience

* apply la normalization to all particles

* fix defaults

* final edits

Co-authored-by: Yannik Schälte <[email protected]>
Co-authored-by: Yannik Schälte <[email protected]>