histogram normalization #118

nilsbecker · 2015-11-22T20:44:55Z

for float-bin histograms, one could normalize 1) not at all, in which case int is the natural bin count type; 2) so that the values sum to 1; or 3) so that the histogram integrates to 1, i.e. the sum over bins of bin width * hist_value is 1.

(for 2 the values could be exact rationals (but that seems silly), and for 3 obviously only float values make sense)

is it worthwhile to include the choice of normalization as an option? i guess 1) with int counts would need an extra function for the different return type; alternatively one could think of a variant return type...
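To make the three options and the variant return type concrete, here is a minimal OCaml sketch. Everything in it is hypothetical and for illustration only: the types `normalization` and `hist_values`, the function `normalize`, and the convention that `edges` holds one more element than `counts` are assumptions, not an existing API of this library.

```ocaml
(* Hypothetical names throughout; not an existing library API. *)
type normalization =
  | Raw         (* 1) plain integer counts *)
  | Sum_to_one  (* 2) values sum to 1 *)
  | Density     (* 3) bin width * value sums to 1 *)

(* A variant return type lets [Raw] keep integer counts. *)
type hist_values =
  | Counts of int array
  | Weights of float array

(* Assumption: [edges] has one more element than [counts], and bin i
   spans edges.(i) .. edges.(i + 1).
   An empty histogram (total = 0) yields nan weights in this sketch. *)
let normalize norm ~edges ~counts =
  let n = Array.length counts in
  if Array.length edges <> n + 1 then
    invalid_arg "normalize: edges must have length (bins + 1)";
  let total = float_of_int (Array.fold_left ( + ) 0 counts) in
  match norm with
  | Raw -> Counts (Array.copy counts)
  | Sum_to_one ->
    Weights (Array.map (fun c -> float_of_int c /. total) counts)
  | Density ->
    Weights
      (Array.mapi
         (fun i c ->
            float_of_int c /. (total *. (edges.(i + 1) -. edges.(i))))
         counts)
```

With a single function the caller has to pattern match on the result; the alternative mentioned above, a separate function per return type, avoids that match at the cost of a larger interface.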

rleonid · 2015-11-23T00:28:05Z

Oh, now that I've also read this issue, yes! Those are all good ideas. Perhaps a new module in the style of Accu would be useful; at that point, the "groupby" or "reducing" function in Accu could perform this normalization?

nilsbecker · 2015-11-23T10:20:36Z

do you mean the 'increment' function in the type? i don't see yet how that would work; when updating one bin and maintaining the normalization at every update, all other bins would have to be updated at every step -- sounds inefficient and incompatible with the type.

i was thinking it may be better to maintain an extra running sum in the data structure and only normalize lazily when the actual array of counts/weights is requested. i.e. one would have to keep an int array for the counts, a running int sum, and i guess also the bin array in the case of integral normalization.
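The lazy scheme described above might look roughly like this in OCaml. This is only a sketch under stated assumptions: the record type `t` and the functions `create`, `add`, `sum_to_one` and `density` are hypothetical names, bins are half-open intervals given by a sorted edge array, and out-of-range samples are silently dropped.

```ocaml
(* Hypothetical sketch; names are illustrative, not a library API. *)
type t = {
  edges : float array;  (* sorted bin edges, length = bins + 1 *)
  counts : int array;   (* raw integer counts, one per bin *)
  mutable total : int;  (* running sum of all counts *)
}

let create edges =
  if Array.length edges < 2 then invalid_arg "create: need at least one bin";
  { edges; counts = Array.make (Array.length edges - 1) 0; total = 0 }

(* Index of the bin containing x, found by a linear scan for simplicity.
   Bin i covers edges.(i) <= x < edges.(i + 1); None if x is out of range. *)
let bin_index h x =
  let n = Array.length h.counts in
  let rec go i =
    if i >= n then None
    else if x >= h.edges.(i) && x < h.edges.(i + 1) then Some i
    else go (i + 1)
  in
  go 0

(* Each update touches one bin and the running total only. *)
let add h x =
  match bin_index h x with
  | None -> ()  (* assumption: out-of-range samples are ignored *)
  | Some i ->
    h.counts.(i) <- h.counts.(i) + 1;
    h.total <- h.total + 1

(* Normalization is computed only when the values are requested. *)
let sum_to_one h =
  let t = float_of_int h.total in
  Array.map (fun c -> float_of_int c /. t) h.counts

let density h =
  let t = float_of_int h.total in
  Array.mapi
    (fun i c -> float_of_int c /. (t *. (h.edges.(i + 1) -. h.edges.(i))))
    h.counts
```

Because the integer counts and the running total are kept as they are, accumulation can be interrupted and resumed, and either normalization can be requested at any point without touching the stored state.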

anyway. would a putative new accu-like module then be the foundation also for the float histogram? this would be the design in biocaml. would that also work for 2d float histograms?

nilsbecker · 2015-11-23T10:38:53Z

on second thought maybe a special histogram data structure is too complicated. i guess that would only be needed if one wants to interrupt and later restart histogram accumulation.

rleonid · 2015-11-23T18:55:08Z

Yes, you're right, it would need another parameter/transformation at the end. This is part of why I hesitate to treat these operations as general capabilities of a histogram; I see them more as belonging to a selecting/grouping/aggregating, table-like data structure.
