
Implement KDE bootstrapped error estimates and more #689

Merged: 1 commit, Feb 22, 2022

Conversation

@atrettin (Contributor) commented Jan 16, 2022

This PR adds the option to estimate errors on KDE'd histograms using the bootstrap_kde class from the KDE module that we already use. Bootstrapping is of course very slow and not recommended inside a fit, but it is very useful for fitting hypersurfaces with statistically sound errors (a rough sketch of the idea follows the change list below). The other changes are:

  • In the aeff.weight stage:
    • Scale errors along with the weights. This makes it possible to do the weighting after histograms have been made in a pipeline: KDE histograms then only have to be calculated once before a fit starts and are simply scaled with the efficiency.
  • In the utils.kde stage:
    • Add an option to stash histograms. This ties into the point above: the (expensive) KDEs only need to be computed once before a fit.
    • Calculate KDEs in log-space for variables that are binned logarithmically (i.e. energy). This significantly reduces distortions in the energy dimension.
  • In the utils.hist stage:
    • Add an option to calculate unweighted histograms. This makes it possible to look at raw MC counts, which is useful for diagnostics, and can be used to exclude points with very few MC events from hypersurface fits, where the Gaussian assumption would be badly broken.
  • In the hypersurface code:
    • Add an option to exclude data points with too small an MC or weighted count from the fits.
    • Support pipelines that use KDEs to produce histograms, as long as they use bootstrapping.
    • Plot bin-wise fits on a fixed range. This should probably be an option rather than being hard-coded...
  • Ran the KDE code through black to make it nicer.
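For illustration, here is a minimal toy sketch of what the bootstrapping does, using numpy and scipy's gaussian_kde with the energy axis handled in log-space. This is only a sketch of the idea; the actual implementation uses the existing bootstrap_kde class from the KDE module, whose API differs from this.

```python
# Toy sketch of bootstrapped KDE errors on a log-binned variable.
# Illustration only: the PR uses the existing bootstrap_kde class,
# not scipy, and its API differs from this.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
energies = rng.lognormal(mean=1.0, sigma=0.8, size=2000)  # toy MC sample
bin_edges = np.geomspace(0.5, 50.0, 11)                   # logarithmic binning

def kde_hist(sample, edges):
    """Expected counts per bin from a KDE evaluated in log-space."""
    kde = gaussian_kde(np.log(sample))
    log_edges = np.log(edges)
    return len(sample) * np.array([
        kde.integrate_box_1d(lo, hi)
        for lo, hi in zip(log_edges[:-1], log_edges[1:])
    ])

nominal = kde_hist(energies, bin_edges)

# Bootstrap: resample the MC events with replacement, redo the KDE each
# time, and take the per-bin spread as the statistical error estimate.
n_boot = 50
boot = np.array([
    kde_hist(rng.choice(energies, size=energies.size, replace=True), bin_edges)
    for _ in range(n_boot)
])
errors = boot.std(axis=0)
print(nominal)
print(errors)
```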

@philippeller (Collaborator)

I think it looks mostly good!
The only thing I am a little hesitant about is the stashing mechanism... just wondering if this will bite us at some point. Do the parameters in earlier stages actually not change, or do they still change and the changes are silently ignored?

Back in the day we made, for example, ParamSets hashable, so one can quite easily detect whether any changes were made. Maybe we could build a more global stashing mechanism on top of that?
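Roughly what I have in mind (just a toy sketch with made-up attribute names, not the actual ParamSet API): take a hashable snapshot of all upstream parameter values and only recompute when it changes.

```python
# Toy sketch of change detection via a hashable parameter snapshot.
# Hypothetical attributes (stage.name, stage.params); not PISA's real API.
def params_snapshot(stages):
    """Return a hashable snapshot of every parameter value in the given stages."""
    return tuple(
        (stage.name, pname, pvalue)
        for stage in stages
        for pname, pvalue in sorted(stage.params.items())
    )

# Usage idea: compare snapshots of the stages upstream of the KDE stage
# between pipeline evaluations, and only redo the KDEs when they differ.
# before = params_snapshot(pipeline.stages[:kde_index])
# ...
# if params_snapshot(pipeline.stages[:kde_index]) != before:
#     recompute_kdes()
```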

@atrettin (Contributor, Author) commented Feb 3, 2022

@philippeller Yes, the stashing is a bit of a use-at-your-own-risk thing. All changes in previous stages after the first call to apply are ignored. I'm not aware of a way for a stage to know that the weights have changed and that the KDE needs to be re-run... It's tricky because it would need to know whether any parameter in an earlier stage has changed. If there is a way to do it I would gladly implement it; otherwise we can emit a warning about this behavior when the setting is enabled.

@atrettin (Contributor, Author) commented Feb 3, 2022

Or, better: we run a check on the pipeline after instantiation to look for configuration errors, and raise an error if a stage before the KDE stage has a free parameter while stashing is enabled. Errors are better than warnings; we still have tons of warnings in PISA that just get ignored... (too many warnings is actually an issue of its own).
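Something along these lines (a toy sketch; the stage and parameter attribute names here are hypothetical, not the real PISA API):

```python
# Toy sketch of a post-instantiation configuration check: fail loudly if a
# stage upstream of a stashing KDE stage has free parameters that would be
# silently ignored. Attribute names are hypothetical.
def check_stash_config(stages):
    for i, stage in enumerate(stages):
        if not getattr(stage, "stash_hists", False):
            continue
        free_upstream = [
            (upstream.name, pname)
            for upstream in stages[:i]
            for pname, param in upstream.params.items()
            if not param.is_fixed
        ]
        if free_upstream:
            raise ValueError(
                f"Stashing is enabled in stage '{stage.name}', but upstream "
                f"stages have free parameters whose changes would be "
                f"ignored: {free_upstream}"
            )
```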

@philippeller (Collaborator)

I think the cleanest solution would be to implement stashing generically at the pipeline level.

So you would say: stash at stage X, and the pipeline would not re-run anything up to stage X except when a parameter has changed. What do you think?

It should probably keep an internal copy of the weights and apply those, just to avoid side effects, but that is low cost.

@atrettin (Contributor, Author) commented Feb 3, 2022

That sounds like a good idea! Though I'm wondering how exactly we'd go about making it safe in general. The weights are often manipulated in place in a pipeline, so, as you said, it would have to store an internal copy of the weights and re-apply them. But what if there is more than just "weights" that needs to be applied? I guess we could say that we store everything that's in apply_mode... I need to think about this.
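To sketch what I mean (toy code with a dict-like container and made-up attribute names, not the actual Pipeline or container API): the pipeline would run the stashed stages once, keep deep copies of the configured keys, and restore them on every later call so in-place edits downstream don't leak back in.

```python
# Toy sketch of pipeline-level stashing with an internal copy of the cached
# keys. The container is assumed dict-like; names are hypothetical.
import copy

class StashedPipeline:
    def __init__(self, stages, stash_after, stash_keys=("weights",)):
        self.stages = stages
        self.stash_after = stash_after    # index of the last stashed stage
        self.stash_keys = stash_keys      # e.g. everything in apply_mode
        self._stash = None

    def run(self, container):
        if self._stash is None:
            # First call: run the expensive upstream stages (e.g. the KDEs)
            # and keep private copies of the keys we want to cache.
            for stage in self.stages[: self.stash_after + 1]:
                stage.apply(container)
            self._stash = {k: copy.deepcopy(container[k]) for k in self.stash_keys}
        else:
            # Later calls: restore the stashed arrays so that in-place
            # modifications from previous runs do not leak into this one.
            for k in self.stash_keys:
                container[k] = copy.deepcopy(self._stash[k])
        for stage in self.stages[self.stash_after + 1:]:
            stage.apply(container)
        return container
```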

@philippeller (Collaborator)

We could make the user specify which keys should be cached! That way it would be explicit.

E.g. specify in the cfg: at stage X, cache keys x and y.

@philippeller (Collaborator)

The most natural place for that would be the [pipeline] section of the cfg.
Add an optional entry like stash = {5: ['weights', 'blah', ...]} or so.

@atrettin (Contributor, Author)

Hey, @philippeller!

Sorry, I was too busy with other stuff; now coming back to this. Could we find a compromise here? I understand that it would be nice to implement caching globally, but right now it's only used in the KDE stage and for a very specific reason. I see the point that it could lead to unexpected behavior if used inadvertently, and I'm prepared to make changes that make it harder to misuse, such as calling the caching flag ignore_free_parameters_in_earlier_stages and also issuing a warning. But I just don't have time right now to implement caching consistently throughout PISA, in a way that is generally applicable, only to end up using it in this one stage.

Could we please just pass the salt this one time?

@philippeller (Collaborator)

Sure, I do not want to be an unreasonable pain. Can you create an issue about our discussion, noting that we should fix this soon? Then we will merge the PR as is for now.

@philippeller mentioned this pull request on Feb 22, 2022
@philippeller (Collaborator)

Ok, just did that. Will merge the PR now

@atrettin (Contributor, Author)

Thank you!

@atrettin deleted the kde_bootstrap branch on June 20, 2022