-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add 'interp_options' mechanism and ak_add_doc. #784
Conversation
If you don't get a chance to look at this, @kkothari2001 (because I know you're busy), that's okay! I just wanted to give you a chance because the |
@jpivarski why don't we always want to do this? My first thought was "let's not add any options, and just make this the behaviour" before reading your remark |
…ordArray __doc__.
I had the same thought until I saw it: import uproot
tree = uproot.open("nano_dy.root:Events")
array = tree.arrays(filter_name="Muon*", ak_add_doc=True)
array.show(type=True)
versus array = tree.arrays(filter_name="Muon*", ak_add_doc=True)
array.show(type=True)
|
I agree with Jim it should be opt-in, we use it for some very particular user-facing features that tie into notebook use. That gives me the idea it should be sensitive to if you're in ipython/jupyter or not and change defaultness depending on that? |
I'm generally not in favour of environment-specific behaviour; it's hard to know that it's happening without discovering it (usually accidentally), and harder still to google what's happening! What kind of features do you use @jpivarski that's maybe suggesting to me that we should elide long parameters rather than we should not set them unless opt-in? Could a solution be to make parameters > N characters collapse to ellipsis, and make |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a solution goes, I refer to my question about changing the repr vs actually not storing the __doc__
parameter in the first place. However, this PR is good to go if you decide against that course of action!
entry_start=start, | ||
entry_stop=stop, | ||
library="np", | ||
ak_add_doc=self.interp_options["ak_add_doc"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ak_add_doc=self.interp_options["ak_add_doc"], | |
**self.interp_options, |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to be explicit, to control the list of options. Then adding a new one would be a matter of searching for all instances of ak_add_doc
and adding the new one next to that.
**self.interp_options
passes everything in the interp_options
dict through, which might be right or it might silently overshadow arguments of TTree.arrays
that aren't interp_options
. We shouldn't create different types of arguments with the same names, but it would just be easier to catch such mistakes with an explicit pass-through.
@@ -339,7 +355,9 @@ def __call__(self, file_path_object_path): | |||
self.allow_missing, | |||
self.real_options, | |||
) | |||
return ttree[self.key].array(library="np") | |||
return ttree[self.key].array( | |||
library="np", ak_add_doc=self.interp_options["ak_add_doc"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above
self.branches, | ||
entry_start=start, | ||
entry_stop=stop, | ||
ak_add_doc=self.interp_options["ak_add_doc"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know where this is going ;)
@@ -554,11 +583,17 @@ def __call__(self, file_path_object_path): | |||
self.allow_missing, | |||
self.real_options, | |||
) | |||
return ttree.arrays(self.common_keys) | |||
return ttree.arrays( | |||
self.common_keys, ak_add_doc=self.interp_options["ak_add_doc"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more
@@ -658,6 +666,12 @@ def to_global(self, global_offset): | |||
) | |||
|
|||
|
|||
def _ak_add_doc(array, hasbranches, ak_add_doc): | |||
if ak_add_doc and type(array).__module__ == "awkward.highlevel": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use an isinstance
here? It slightly reduces the strictness of the coupling if we can promise to provide ak.Array
vs ak.highlevel.Array
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use isinstance
here because we don't know if awkward
is installable.
Regarding looseness/strictness: we'll never be able to move Array
out of awkward.highlevel
, anyway. That much of the public API is fixed by widespread use.
Also, while we know that array
is either an ak.Array
or a dict
, list
, or tuple
, it's nice to narrow in on the three classes in the awkward.highlevel
submodule, rather than accepting anything that might be defined elsewhere in the Awkward library.
Right now it's really this very user facing documentation of what branches in TTrees do (if the designer of the TTree cares to fill it). The point is largely to have the capability there so that it can be exploited and so the data further serves as its own documentation. I could imagine people filling fairly rich descriptions of TTrees or branches or using doc strings to contain example analysis patterns for the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at _dask.py
, everything looks great to me. All callable classes and code paths have been covered.
Thanks, @agoose77 and @kkothari2001! |
…Pandas Dataframes (#734) * Token change to get PR number * Revert "Token change to get PR number" This reverts commit 5a631b3. * Complete basic Awkward Pandas port, and start changing tests * make some of the suggested changes * Solve some tests * Finalize tests * Add awkward-pandas to dev dependencies * awkward-pandas only supports Python 3.8+. * Declare awkward-pandas requirement in affected tests. * Spell it right. * Get this PR up to date with #784. Co-authored-by: Jim Pivarski <[email protected]>
This is taking over a function from Coffea—adding
TBranch.title
to the__doc__
parameter of Awkward Arrays—in a centralized way. The only file that is affected to add that feature is library.py (theAwkward
singleton). Let me know, @lgray, if it works as needed.All the rest of the changes are to propagate the option down. We don't want to always do this; Coffea just needs a hook to be able to enable it. The
interp_options
mechanism enables us to pass more of these options in in the future.