-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Speed-up readxml by caching key lookup instead of using try/except #1691
Conversation
@gollumben - this has the fix. |
Codecov Report
@@ Coverage Diff @@
## master #1691 +/- ##
==========================================
- Coverage 98.05% 98.05% -0.01%
==========================================
Files 64 64
Lines 4213 4212 -1
Branches 585 587 +2
==========================================
- Hits 4131 4130 -1
Misses 49 49
Partials 33 33
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. 👍 Thanks @kratsg!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the coverage boost. LGTM now.
@gollumben If you are willing to install a development version of |
Hi @kratsg, @matthewfeickert and @lukasheinrich, thank you very much for your help and the very fast turn around time! It takes 3 minutes to convert the workspace now instead of the previous 30 h (~600x speed increase), which is amazing! |
Related to scikit-hep/uproot5#595 as well. |
Pull Request Description
This resolves #1690 and was brought up by the discussion in #1687.
Fundamentally, when accessing a path that does not exist in
uproot
viaf.__getitem__
, it would then trigger a user-friendly levenshtein distance calculation to recommend similarly-named keys. However, in the case ofpyhf
, it was assumed thatf[name]
would be fast to return or at least fast to raise an exception, but this is a bad assumption.Instead, the set of keys in the file (without cycle numbers) is cached along with the file itself, and we will rely on a key lookup in the set instead.
This only occurs in HistFactory when the XML has
<Sample/>
withHistoPath
attributes and a suboptimal computation path inpyhf
is then used, because we need to buildfullname
in order to figure out the right path for the histogram.Checklist Before Requesting Reviewer
Before Merging
For the PR Assignees: