-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: histogram weights not handled correctly in hist / boost conversion #774
Conversation
I refactored Let me know if I should remove it @jpivarski or if maybe I should have made a separate PR for this. Perhaps I should've implemented it in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! All of my requested changes are inline.
Co-authored-by: Jim Pivarski <[email protected]> Co-authored-by: Angus Hollands <[email protected]>
for more information, see https://pre-commit.ci
Looking at the code of I don't think the complexity of the code increases enough to discourage this and it would reduce code duplication (triplication!), but I would love to hear some thoughts @jpivarski @agoose77. |
Generally, I didn't worry about code duplication up to three, particularly if it was right next to each other and formed a clear pattern. (The problem with duplicated code is that a future maintainer might fix one copy without applying the same update to the other copies, and then they get out of date until another future maintainer wonders if they were supposed to be the same or not.) In the case of histograms, I knew that the code duplication/triplication would never go beyond three, since ROOT only defines 1, 2, and 3-D histograms. I'll let you decide whether you want to squash more duplication. As it is, I would accept this PR now. So if you consider yourself done, go ahead and do the honors by pressing the "squash and merge" button. (Make sure it's set to "squash and merge" mode; we don't want all of the commits in this PR to be individual items in the global history.) |
* fix: initialize empty `TObject` members on `to_TObjString` * add test for serialization of `TObjString` * remove unused dependency on test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add `tojson` method to `TObjString` * add additional check to `TObjString` write test * fix bad field in `TList` tojson conversion * add inexpensive `assert` to `TList` serialization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add `to_THashList` method * add aux method to create `THashList` from categorical axis * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update src/uproot/writing/identify.py Co-authored-by: Angus Hollands <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix bad serialization of non-empty TList due to options (#763 (comment)) * add tests for TList serialization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed bad `__repr__` for `TObject` * add serialization of `fUniqueID` to `TObject` * add `empty` method to `TObject` * remove redundant `TObject` member initialization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add serialization for `THashList` * made `THashList` writable * add test to check serialization of `TList` vs `THashList` (which should be mostly equal) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * initialize boost histogram axis as `IntCategory` if labels can be converted to integers * Update src/uproot/writing/identify.py Co-authored-by: Jim Pivarski <[email protected]> * moved `TList` serialization list to `serialize` method * add helper serialization method `bytestring` as suggested in #763 (comment) by @agoose77 * keep `TList` `_options` as python `bytes` and update serialization to use the new `bytestring` helper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add bin index as `TObject.fUniqueID` when setting hist labels as done by root * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * reset `serialization.py` to `main` branch status * Revert "keep `TList` `_options` as python `bytes` and update serialization to use the new `bytestring` helper" This reverts commit 8e6ad2e. * Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit 897972f. # Conflicts: # src/uproot/serialization.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "reset `serialization.py` to `main` branch status" This reverts commit 9a500c1. * Revert "Revert "keep `TList` `_options` as python `bytes` and update serialization to use the new `bytestring` helper"" This reverts commit 884ec4d. * Apply suggestions from code review Co-authored-by: Jim Pivarski <[email protected]> * initialize `opt` with different list initialization for consistent style * add back two spaces before imports (this is not handled by the pre-commit formatter!) * Update src/uproot/models/TObject.py Co-authored-by: Angus Hollands <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix bad label placement * axis `fName` will always be one of the default values (`xaxis`, `yaxis`, `zaxis`) * add test for categorical histogram * hist categorical test: check for correctly overwritten axis name * add missing assert to test * updated tests for new axis naming behaviour * remove unused dependency from test * do not used mutable default arguments * fix `to_hist` not using options from function arguments (was using default arguments all the time instead) * fix not checking for all categories * add test for TH1 -> hist/boost conversion * TH1 to boost/hist do not use weights (temporary solution) * build action: move checkout right before install * add debug step to pipeline * fix test not being updated to new functionality * reverted build CI to original state * updated test to check for working `to_hist` conversion * simplified test * add check for boost histogram conversion * check also for intcategory to resize values * add test for TH3 histogram * remove redundant test * add link to issue on test * add temporary fix to bad `to_boost` conversion for categorical histograms * add tmp fix * reduced repeated code for slicing values when categorical axis via helper function * implemented `to_boost` in `Histogram` parent class to reduce code duplication * remove now unecessary helper function `_slice_values_if_categorical_axis` * remove "temporary" `and False` * remove unused dependency * add missing asserts * add test for TH1 weights from root * add test for issue * alternative way to detect weighted hist * made `fSumw2` None if storage is not weights * remove comments from test * add test for hist with weights * fix bad check for storage type * remove comment * add test for hist with(out) weights and labels * updated TH1 `to_boost` to handle weights/labels better * placed histogram `to_boost` in parent `Histogram` class to reduce code duplication * updated `weighted` property * implemented histogram `weighted` property in parent `Histogram` class * using weighted property instead of copying check * do not use mutable default arguments * fix calling property * add missing asserts to test * add test for issue #722 * add weight test for 2D and 3D histograms * add temporary skip to test until file is uploaded * update issue test file * add back temporary skip until file is available * Apply suggestions from code review Co-authored-by: Jim Pivarski <[email protected]> Co-authored-by: Angus Hollands <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add suggestion from #774 (comment) * add check for length of `fSumw2` greater than 0 so empty histograms are not weighted * remove unnecessary subclass method implementation * addressed #764 (comment) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Angus Hollands <[email protected]> Co-authored-by: Jim Pivarski <[email protected]>
Currently the
fSumw2
(weights) are not being correctly read/written when performing conversion to/fromhist
/boost
histograms.This is causing issues in #672 and solving this will also close #722.
hist
storage_type
property to check wether it's a weighted histogram or not, instead of thevariances
which is always defined and was cuasing false positives.weighted
property andto_boost
method to use inheritance.