histogram normalization #118

Open
nilsbecker opened this issue Nov 22, 2015 · 4 comments

Comments

@nilsbecker

for float-bin histograms, one could normalize in three ways:

1) not at all; then int is the natural bin count type
2) so that the bin values sum to 1
3) so that the histogram integrates to 1, i.e. the sum over all bins of bin width * hist_value is 1

(for 2, the counts would be rationals, but that seems silly; for 3, only float counts make sense)
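
To make the three options concrete, here is one way to write them down. The symbols are introduced only for illustration and are not from the library: bins $i = 1..n$ with integer counts $c_i$, bin widths $w_i$, and total count $N$.

```latex
\begin{align*}
N &= \sum_{i=1}^{n} c_i \\
\text{1) raw counts:}\quad h_i &= c_i \in \mathbb{Z}_{\ge 0} \\
\text{2) sum to 1:}\quad p_i &= \frac{c_i}{N}, & \sum_{i=1}^{n} p_i &= 1 \\
\text{3) integrate to 1:}\quad d_i &= \frac{c_i}{N\,w_i}, & \sum_{i=1}^{n} d_i\,w_i &= 1
\end{align*}
```

With equal bin widths $w$, options 2 and 3 differ only by the constant factor $1/w$.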

is it worthwhile to include the choice of normalization as an option? i guess 1) with int counts would need an extra function for the different return type; alternatively one could think of a variant return type...
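
A rough sketch of what "normalization as an option" could look like, written in Python only to illustrate the shape of the interface. The names `Normalization` and `normalize` are hypothetical and not part of any existing library. Note how the return type depends on the option, which is the typing problem mentioned above.

```python
from enum import Enum
from typing import List, Sequence, Union


class Normalization(Enum):
    """Hypothetical option type for the three choices discussed above."""
    NONE = "none"          # 1) raw integer counts
    SUM = "sum"            # 2) values sum to 1
    INTEGRAL = "integral"  # 3) width * value sums to 1


def normalize(
    counts: Sequence[int],
    edges: Sequence[float],
    mode: Normalization,
) -> Union[List[int], List[float]]:
    """Return bin values for the requested normalization.

    `edges` holds the bin boundaries, so it has one more entry than `counts`.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have exactly one more entry than counts")
    if mode is Normalization.NONE:
        return list(counts)  # ints stay ints
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an empty histogram")
    if mode is Normalization.SUM:
        return [c / total for c in counts]
    # Integral normalization: divide additionally by each bin's width.
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    return [c / (total * w) for c, w in zip(counts, widths)]
```

In a statically typed language the `Union` return would become either a variant type or two separate functions, one per return type.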

@rleonid
Owner

rleonid commented Nov 23, 2015

Oh, now that I've also read this issue, yes! Those are all good ideas. Perhaps a new module in the style of Accu would be useful; at that point, the "groupby" or "reducing" function in Accu could perform this normalization?

@nilsbecker
Author

do you mean the 'increment function in the type? i don't see yet how that would work; when updating one bin and maintaining the normalization at every update, all other bins would have to be updated at every step -- sounds inefficient and incompatible with the type.

i was thinking it may be better to maintain an extra running sum in the data structure and only normalize lazily when the actual array of counts/weights is requested. i.e. one would have to keep an int array for the counts, a running int sum, and i guess also the bin array in the case of integral normalization.
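
A minimal sketch of that lazy scheme, in Python for illustration only. The class `LazyHistogram` and its methods are hypothetical names. It assumes half-open bins `[edges[i], edges[i+1])` and silently drops out-of-range samples, which is just one possible convention.

```python
from bisect import bisect_right
from typing import List, Sequence


class LazyHistogram:
    """Integer counts plus a running total; normalization happens on request."""

    def __init__(self, edges: Sequence[float]) -> None:
        if len(edges) < 2 or any(a >= b for a, b in zip(edges, edges[1:])):
            raise ValueError("edges must be strictly increasing, with >= 2 entries")
        self.edges = list(edges)               # bin array, needed for integral normalization
        self.counts = [0] * (len(edges) - 1)   # int array of raw counts
        self.total = 0                         # running int sum

    def add(self, x: float) -> None:
        """O(log n) update: only one bin and the running sum change."""
        i = bisect_right(self.edges, x) - 1
        if 0 <= i < len(self.counts):
            self.counts[i] += 1
            self.total += 1
        # Samples outside [edges[0], edges[-1]) are ignored (an assumption).

    def normalized(self, integral: bool = False) -> List[float]:
        """Compute normalized values only when they are actually requested."""
        if self.total == 0:
            raise ValueError("cannot normalize an empty histogram")
        if not integral:
            return [c / self.total for c in self.counts]
        widths = [hi - lo for lo, hi in zip(self.edges, self.edges[1:])]
        return [c / (self.total * w) for c, w in zip(self.counts, widths)]


h = LazyHistogram([0.0, 1.0, 3.0])
for sample in [0.5, 1.5, 2.5, 2.0]:
    h.add(sample)
print(h.counts, h.normalized(), h.normalized(integral=True))
```

Because the raw counts are kept, accumulation can be interrupted and resumed without losing precision, and each update touches a single bin.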

anyway, would a putative new Accu-like module then also be the foundation for the float histogram? this would be the design in biocaml. would that also work for 2d float histograms?

@nilsbecker
Author

on second thought maybe a special histogram data structure is too complicated. i guess that would only be needed if one wants to interrupt and later restart histogram accumulation.

@rleonid
Owner

rleonid commented Nov 23, 2015

Yes, you're right, it would need another parameter/transformation at the end. This is part of why I hesitate to treat these operations as general capabilities of a histogram; I see them more as belonging to a selecting/grouping/aggregating table-like data structure.

2 participants