histogram normalization #118
Comments
Oh, now that I've also read this issue, yes! Those are all good ideas. Perhaps a new module in the style of Accu would be useful; at that point the "groupby" or "reducing" function in
do you mean the i was thinking it may be better to maintain an extra running sum in the data structure and only normalize lazily when the actual array of counts/weights is requested. i.e. one would have to keep an anyway. would a putative new accu-like module then be the foundation also for the float histogram? this would be the design in biocaml. would that also work for 2d float histograms?
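A minimal sketch of the running-sum idea described above, assuming a fixed number of bins and float weights. The names `lazy_hist`, `create`, `add` and `normalized` are hypothetical and are not part of any existing module; the point is only that the raw weights are never overwritten, so accumulation can continue after a normalized view has been requested.

```ocaml
(* Hypothetical accumulator: raw weights per bin plus a running total. *)
type lazy_hist = {
  weights : float array;  (* un-normalized weight per bin *)
  mutable total : float;  (* running sum of all weights added so far *)
}

let create n_bins = { weights = Array.make n_bins 0.0; total = 0.0 }

(* Add weight [w] to bin [i] and update the running sum in the same step. *)
let add h i w =
  h.weights.(i) <- h.weights.(i) +. w;
  h.total <- h.total +. w

(* Normalize only when the array is requested. The raw weights are left
   untouched, so more data can be added afterwards. An empty histogram
   is returned as all zeros rather than dividing by zero. *)
let normalized h =
  if h.total = 0.0 then Array.copy h.weights
  else Array.map (fun w -> w /. h.total) h.weights
```

For example, after `add h 0 1.0; add h 2 3.0` on a three-bin histogram, `normalized h` gives the fractions 1/4, 0 and 3/4, and further calls to `add` remain valid.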
on second thought maybe a special histogram data structure is too complicated. i guess that would only be needed if one wants to interrupt and later restart histogram accumulation.
Yes, you're right, it would need another parameter/transformation at the end. This is part of why I am hesitant to think of these operations under the general capabilities of a histogram, but as more of a selecting/grouping/aggregating table-like data structure.
for float-bin histograms, one could normalize:
1) not at all; then int is a natural bin count type
2) so that the bin values sum to 1
3) so that the histogram integrates to 1, i.e. the sum over all bins of bin width * hist_value is 1

(for 2 the counts would be rationals (but that seems silly) and for 3 obviously only float counts make sense)
is it worthwhile to include the choice of normalization as an option? i guess 1) with int counts would need an extra function for the different return type; alternatively one could think of a variant return type...
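To make the three options and the variant return type concrete, here is a rough sketch. Everything in it is an assumption for illustration: the type names `normalization` and `hist_values`, the function `normalize`, and the convention that `edges` holds one more element than `counts`, with bin `i` spanning `edges.(i)` to `edges.(i + 1)`.

```ocaml
(* Hypothetical option type for the three choices discussed above. *)
type normalization = Raw | Sum_to_one | Integrate_to_one

(* Hypothetical variant return type: int counts for the raw case,
   float values for both normalized cases. *)
type hist_values =
  | Counts of int array
  | Weights of float array

(* [edges] is assumed to have one more element than [counts]. *)
let normalize norm ~edges counts =
  let total = float_of_int (Array.fold_left ( + ) 0 counts) in
  match norm with
  | Raw -> Counts (Array.copy counts)
  | _ when total = 0.0 -> invalid_arg "normalize: empty histogram"
  | Sum_to_one ->
    (* Each value is the fraction of all counts that fell in the bin. *)
    Weights (Array.map (fun c -> float_of_int c /. total) counts)
  | Integrate_to_one ->
    (* Divide additionally by the bin width, so that the sum of
       width * value over all bins is 1. *)
    Weights
      (Array.mapi
         (fun i c ->
            let width = edges.(i + 1) -. edges.(i) in
            float_of_int c /. (total *. width))
         counts)
```

With edges `[|0.0; 1.0; 3.0|]` and counts `[|1; 3|]`, option 2 yields the values 0.25 and 0.75, while option 3 yields 0.25 and 0.375, since the second bin is twice as wide. The caller has to pattern match on the result, which is the cost of the variant approach compared with a separate function for int counts.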