feat: add 'interp_options' mechanism and ak_add_doc. #784

jpivarski · 2022-11-18T20:59:42Z

This takes over a function from Coffea (adding TBranch.title to the __doc__ parameter of Awkward Arrays) and centralizes it. The only file changed to add that feature is library.py (the Awkward singleton). Let me know, @lgray, if it works as needed.

All the rest of the changes propagate the option down. We don't want to always do this; Coffea just needs a hook to be able to enable it. The interp_options mechanism lets us pass more such options in the future.

jpivarski · 2022-11-18T21:01:03Z

If you don't get a chance to look at this, @kkothari2001 (because I know you're busy), that's okay! I just wanted to give you a chance because the interp_options mechanism threads through your uproot.dask work.

agoose77 · 2022-11-18T21:39:25Z

@jpivarski why don't we always want to do this? My first thought was "let's not add any options, and just make this the behaviour" before reading your remark

jpivarski · 2022-11-18T21:51:29Z

I had the same thought until I saw it:

import uproot
tree = uproot.open("nano_dy.root:Events")
array = tree.arrays(filter_name="Muon*", ak_add_doc=True)
array.show(type=True)

type: 40 * struct[{
    Muon_dxy: [var * float32, parameters={"__doc__": "dxy (with sign) wrt first PV, in cm"}],
    Muon_dxyErr: [var * float32, parameters={"__doc__": "dxy uncertainty, in cm"}],
    Muon_dz: [var * float32, parameters={"__doc__": "dz (with sign) wrt first PV, in cm"}],
    Muon_dzErr: [var * float32, parameters={"__doc__": "dz uncertainty, in cm"}],
    Muon_eta: [var * float32, parameters={"__doc__": "eta"}],
    Muon_ip3d: [var * float32, parameters={"__doc__": "3D impact parameter wrt first PV, in cm"}],
    Muon_jetPtRelv2: [var * float32, parameters={"__doc__": "Relative momentum of the lepton with respect to the closest jet after subtracting the lepton"}],
    Muon_jetRelIso: [var * float32, parameters={"__doc__": "Relative isolation in matched jet (1/ptRatio-1, pfRelIso04_all if no matched jet)"}],
    Muon_mass: [var * float32, parameters={"__doc__": "mass"}],
    Muon_miniPFRelIso_all: [var * float32, parameters={"__doc__": "mini PF relative isolation, total (with scaled rho*EA PU corrections)"}],
    Muon_miniPFRelIso_chg: [var * float32, parameters={"__doc__": "mini PF relative isolation, charged component"}],
    Muon_pfRelIso03_all: [var * float32, parameters={"__doc__": "PF relative isolation dR=0.3, total (deltaBeta corrections)"}],
    Muon_pfRelIso03_chg: [var * float32, parameters={"__doc__": "PF relative isolation dR=0.3, charged component"}],
    Muon_pfRelIso04_all: [var * float32, parameters={"__doc__": "PF relative isolation dR=0.4, total (deltaBeta corrections)"}],
    Muon_phi: [var * float32, parameters={"__doc__": "phi"}],
    Muon_pt: [var * float32, parameters={"__doc__": "pt"}],
    Muon_ptErr: [var * float32, parameters={"__doc__": "ptError of the muon track"}],
    Muon_segmentComp: [var * float32, parameters={"__doc__": "muon segment compatibility"}],
    Muon_sip3d: [var * float32, parameters={"__doc__": "3D impact parameter significance wrt first PV"}],
    Muon_softMva: [var * float32, parameters={"__doc__": "soft MVA ID score"}],
    Muon_tkRelIso: [var * float32, parameters={"__doc__": "Tracker-based relative isolation dR=0.3 for highPt, trkIso/tunePpt"}],
    Muon_tunepRelPt: [var * float32, parameters={"__doc__": "TuneP relative pt, tunePpt/pt"}],
    Muon_mvaLowPt: [var * float32, parameters={"__doc__": "Low pt muon ID score"}],
    Muon_mvaTTH: [var * float32, parameters={"__doc__": "TTH MVA lepton ID score"}],
    Muon_charge: [var * int32, parameters={"__doc__": "electric charge"}],
    Muon_jetIdx: [var * int32, parameters={"__doc__": "index of the associated jet (-1 if none)"}],
    Muon_nStations: [var * int32, parameters={"__doc__": "number of matched stations with default arbitration (segment & track)"}],
    Muon_nTrackerLayers: [var * int32, parameters={"__doc__": "number of layers in the tracker"}],
    Muon_pdgId: [var * int32, parameters={"__doc__": "PDG code assigned by the event reconstruction (not by MC truth)"}],
    Muon_tightCharge: [var * int32, parameters={"__doc__": "Tight charge criterion using pterr/pt of muonBestTrack (0:fail, 2:pass)"}],
    Muon_fsrPhotonIdx: [var * int32, parameters={"__doc__": "Index of the associated FSR photon"}],
    Muon_highPtId: [var * uint8, parameters={"__doc__": "high-pT cut-based ID (1 = tracker high pT, 2 = global high pT, which includes tracker high pT)"}],
    Muon_inTimeMuon: [var * bool, parameters={"__doc__": "inTimeMuon ID"}],
    Muon_isGlobal: [var * bool, parameters={"__doc__": "muon is global muon"}],
    Muon_isPFcand: [var * bool, parameters={"__doc__": "muon is PF candidate"}],
    Muon_isTracker: [var * bool, parameters={"__doc__": "muon is tracker muon"}],
    Muon_looseId: [var * bool, parameters={"__doc__": "muon is loose muon"}],
    Muon_mediumId: [var * bool, parameters={"__doc__": "cut-based ID, medium WP"}],
    Muon_mediumPromptId: [var * bool, parameters={"__doc__": "cut-based ID, medium prompt WP"}],
    Muon_miniIsoId: [var * uint8, parameters={"__doc__": "MiniIso ID from miniAOD selector (1=MiniIsoLoose, 2=MiniIsoMedium, 3=MiniIsoTight, 4=MiniIsoVeryTight)"}],
    Muon_multiIsoId: [var * uint8, parameters={"__doc__": "MultiIsoId from miniAOD selector (1=MultiIsoLoose, 2=MultiIsoMedium)"}],
    Muon_mvaId: [var * uint8, parameters={"__doc__": "Mva ID from miniAOD selector (1=MvaLoose, 2=MvaMedium, 3=MvaTight)"}],
    Muon_pfIsoId: [var * uint8, parameters={"__doc__": "PFIso ID from miniAOD selector (1=PFIsoVeryLoose, 2=PFIsoLoose, 3=PFIsoMedium, 4=PFIsoTight, 5=PFIsoVeryTight, 6=PFIsoVeryVeryTight)"}],
    Muon_softId: [var * bool, parameters={"__doc__": "soft cut-based ID"}],
    Muon_softMvaId: [var * bool, parameters={"__doc__": "soft MVA ID"}],
    Muon_tightId: [var * bool, parameters={"__doc__": "cut-based ID, tight WP"}],
    Muon_tkIsoId: [var * uint8, parameters={"__doc__": "TkIso ID (1=TkIsoLoose, 2=TkIsoTight)"}],
    Muon_triggerIdLoose: [var * bool, parameters={"__doc__": "TriggerIdLoose ID"}],
    Muon_genPartIdx: [var * int32, parameters={"__doc__": "Index into genParticle list for MC matching to status==1 muons"}],
    Muon_genPartFlav: [var * uint8, parameters={"__doc__": "Flavour of genParticle for MC matching to status==1 muons: 1 = prompt muon (including gamma*->mu mu), 15 = muon from prompt tau, 5 = muon from b, 4 = muon from c, 3 = muon from light or unknown, 0 = unmatched"}],
    Muon_cleanmask: [var * uint8, parameters={"__doc__": "simple cleaning mask with priority to leptons"}]
}, parameters={"__doc__": "Events"}]
[{Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [-0.000319, -0.00682], Muon_dxyErr: [...], Muon_dz: [...], ...},
 {Muon_dxy: [-0.00011], Muon_dxyErr: [0.00162], Muon_dz: [0.0026], ...},
 {Muon_dxy: [0.00324, -0.00244], Muon_dxyErr: [0.00229, ...], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 ...,
 {Muon_dxy: [0.000774], Muon_dxyErr: [0.00229], Muon_dz: [-0.000873], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [-0.000587], Muon_dxyErr: [0.00162], Muon_dz: [0.000254], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...}]

versus

array = tree.arrays(filter_name="Muon*", ak_add_doc=False)
array.show(type=True)

type: 40 * {
    Muon_dxy: var * float32,
    Muon_dxyErr: var * float32,
    Muon_dz: var * float32,
    Muon_dzErr: var * float32,
    Muon_eta: var * float32,
    Muon_ip3d: var * float32,
    Muon_jetPtRelv2: var * float32,
    Muon_jetRelIso: var * float32,
    Muon_mass: var * float32,
    Muon_miniPFRelIso_all: var * float32,
    Muon_miniPFRelIso_chg: var * float32,
    Muon_pfRelIso03_all: var * float32,
    Muon_pfRelIso03_chg: var * float32,
    Muon_pfRelIso04_all: var * float32,
    Muon_phi: var * float32,
    Muon_pt: var * float32,
    Muon_ptErr: var * float32,
    Muon_segmentComp: var * float32,
    Muon_sip3d: var * float32,
    Muon_softMva: var * float32,
    Muon_tkRelIso: var * float32,
    Muon_tunepRelPt: var * float32,
    Muon_mvaLowPt: var * float32,
    Muon_mvaTTH: var * float32,
    Muon_charge: var * int32,
    Muon_jetIdx: var * int32,
    Muon_nStations: var * int32,
    Muon_nTrackerLayers: var * int32,
    Muon_pdgId: var * int32,
    Muon_tightCharge: var * int32,
    Muon_fsrPhotonIdx: var * int32,
    Muon_highPtId: var * uint8,
    Muon_inTimeMuon: var * bool,
    Muon_isGlobal: var * bool,
    Muon_isPFcand: var * bool,
    Muon_isTracker: var * bool,
    Muon_looseId: var * bool,
    Muon_mediumId: var * bool,
    Muon_mediumPromptId: var * bool,
    Muon_miniIsoId: var * uint8,
    Muon_multiIsoId: var * uint8,
    Muon_mvaId: var * uint8,
    Muon_pfIsoId: var * uint8,
    Muon_softId: var * bool,
    Muon_softMvaId: var * bool,
    Muon_tightId: var * bool,
    Muon_tkIsoId: var * uint8,
    Muon_triggerIdLoose: var * bool,
    Muon_genPartIdx: var * int32,
    Muon_genPartFlav: var * uint8,
    Muon_cleanmask: var * uint8
}
[{Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [-0.000319, -0.00682], Muon_dxyErr: [...], Muon_dz: [...], ...},
 {Muon_dxy: [-0.00011], Muon_dxyErr: [0.00162], Muon_dz: [0.0026], ...},
 {Muon_dxy: [0.00324, -0.00244], Muon_dxyErr: [0.00229, ...], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 ...,
 {Muon_dxy: [0.000774], Muon_dxyErr: [0.00229], Muon_dz: [-0.000873], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [-0.000587], Muon_dxyErr: [0.00162], Muon_dz: [0.000254], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...},
 {Muon_dxy: [], Muon_dxyErr: [], Muon_dz: [], Muon_dzErr: [], ...}]

lgray · 2022-11-18T21:56:12Z

I agree with Jim it should be opt-in, we use it for some very particular user-facing features that tie into notebook use.

That gives me the idea that it could be sensitive to whether you're in IPython/Jupyter, and change the default depending on that?
That makes some sense, since it gives useful features in situations where you can take advantage of them.
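
A minimal sketch of what such an environment-sensitive default could look like, assuming detection through IPython's `get_ipython()`. The helper names `running_in_ipython` and `default_ak_add_doc` are hypothetical and not part of Uproot; this only illustrates the idea being floated, not anything in the PR:

```python
import sys


def running_in_ipython() -> bool:
    """Return True if an interactive IPython/Jupyter shell is active."""
    # Only consult IPython if it has already been imported; a plain
    # Python process that never imported it cannot be inside a shell.
    ipython = sys.modules.get("IPython")
    if ipython is None:
        return False
    # get_ipython() returns None when no interactive shell is running.
    return ipython.get_ipython() is not None


def default_ak_add_doc() -> bool:
    # Hypothetical policy: attach docs by default only in interactive sessions.
    return running_in_ipython()
```

Note that the same script would then produce arrays with different parameters depending on where it runs, which is the kind of hidden behaviour an explicit opt-in avoids.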

agoose77 · 2022-11-18T22:06:24Z

I'm generally not in favour of environment-specific behaviour; it's hard to know that it's happening without discovering it (usually accidentally), and harder still to google what's happening!

What kind of features do you use __doc__ for @lgray? You've piqued my interest!

@jpivarski maybe that suggests we should elide long parameters in the repr, rather than not setting them unless the user opts in? Could a solution be to make parameters longer than N characters collapse to an ellipsis, and make __doc__ non-optional?
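
To make the elision idea concrete, here is a rough sketch, assuming a plain dict of string parameters. `elide` and `format_parameters` are hypothetical helpers, not Awkward Array functions, and the threshold of 20 characters is arbitrary:

```python
def elide(text: str, max_chars: int = 20) -> str:
    """Collapse text longer than max_chars to a prefix plus an ellipsis."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "..."


def format_parameters(parameters: dict, max_chars: int = 20) -> str:
    # Only the displayed string is shortened; the stored value is untouched.
    items = ", ".join(
        f'"{key}": "{elide(str(value), max_chars)}"'
        for key, value in parameters.items()
    )
    return "{" + items + "}"
```

With something like this, the full __doc__ string would still be stored on the array, and only the printed type would be shortened.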

agoose77

As far as solutions go, I refer to my question about changing the repr versus not storing the __doc__ parameter in the first place. However, this PR is good to go if you decide against that course of action!

agoose77 · 2022-11-18T22:07:40Z

src/uproot/_dask.py

+            entry_start=start,
+            entry_stop=stop,
+            library="np",
+            ak_add_doc=self.interp_options["ak_add_doc"],


Suggested change

-            ak_add_doc=self.interp_options["ak_add_doc"],
+            **self.interp_options,

?

jpivarski · 2022-11-18

I think it's better to be explicit, to control the list of options. Then adding a new one would be a matter of searching for all instances of ak_add_doc and adding the new one next to that.

**self.interp_options passes everything in the interp_options dict through, which might be right or it might silently overshadow arguments of TTree.arrays that aren't interp_options. We shouldn't create different types of arguments with the same names, but it would just be easier to catch such mistakes with an explicit pass-through.
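
The difference can be sketched with a toy function. The `arrays` function below is a hypothetical stand-in that only reports what it received; it is not the real TTree.arrays signature, and the stray "library" key is an assumed mistake for illustration:

```python
def arrays(expressions, entry_start=None, entry_stop=None, library="ak", ak_add_doc=False):
    # Hypothetical stand-in for a reader function; it just reports what it received.
    return {"library": library, "entry_start": entry_start, "ak_add_doc": ak_add_doc}


# Suppose a key that is not an interpretation option slipped into the dict.
interp_options = {"ak_add_doc": True, "library": "pd"}

# Explicit pass-through: only the named option is forwarded.
explicit = arrays(["Muon_pt"], entry_start=0, ak_add_doc=interp_options["ak_add_doc"])

# Unpacking: every key is forwarded, so "library" changes without any error.
unpacked = arrays(["Muon_pt"], entry_start=0, **interp_options)
```

If the call site also passed `library` explicitly, Python would raise a TypeError for the duplicate keyword. The silent case is the one where the colliding argument is not named at the call site.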

agoose77 · 2022-11-18T22:07:46Z

src/uproot/_dask.py

@@ -339,7 +355,9 @@ def __call__(self, file_path_object_path):
            self.allow_missing,
            self.real_options,
        )
-        return ttree[self.key].array(library="np")
+        return ttree[self.key].array(
+            library="np", ak_add_doc=self.interp_options["ak_add_doc"]


agoose77 · 2022-11-18T22:07:56Z

src/uproot/_dask.py

+            self.branches,
+            entry_start=start,
+            entry_stop=stop,
+            ak_add_doc=self.interp_options["ak_add_doc"],


You know where this is going ;)

agoose77 · 2022-11-18T22:08:04Z

src/uproot/_dask.py

@@ -554,11 +583,17 @@ def __call__(self, file_path_object_path):
            self.allow_missing,
            self.real_options,
        )
-        return ttree.arrays(self.common_keys)
+        return ttree.arrays(
+            self.common_keys, ak_add_doc=self.interp_options["ak_add_doc"]


agoose77 · 2022-11-18T22:09:22Z

src/uproot/behaviors/TBranch.py

@@ -658,6 +666,12 @@ def to_global(self, global_offset):
        )


+def _ak_add_doc(array, hasbranches, ak_add_doc):
+    if ak_add_doc and type(array).__module__ == "awkward.highlevel":


Could we use an isinstance here? It slightly reduces the strictness of the coupling if we can promise to provide ak.Array vs ak.highlevel.Array.

jpivarski · 2022-11-18

We can't use isinstance here because we don't know if awkward is installed.

Regarding looseness/strictness: we'll never be able to move Array out of awkward.highlevel, anyway. That much of the public API is fixed by widespread use.

Also, while we know that array is either an ak.Array or a dict, list, or tuple, it's nice to narrow in on the three classes in the awkward.highlevel submodule, rather than accepting anything that might be defined elsewhere in the Awkward library.
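
As a small illustration of why the module-name comparison works without importing awkward, here is a sketch. `is_awkward_highlevel` and `FakeArray` are hypothetical names used only for this example; `FakeArray` merely pretends to live in awkward.highlevel so the check can be exercised without the library:

```python
def is_awkward_highlevel(obj) -> bool:
    # Compare the defining module's name as a string, so awkward is never imported.
    return type(obj).__module__ == "awkward.highlevel"


class FakeArray:
    """Hypothetical stand-in that pretends to be defined in awkward.highlevel."""


# Classes allow __module__ to be reassigned, which is enough to mimic the real thing.
FakeArray.__module__ = "awkward.highlevel"
```

A dict, list, or tuple reports "builtins" as its module, so the check is False for those, and an isinstance test would have needed `import awkward` to succeed first.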

lgray · 2022-11-18T22:25:00Z

What kind of features do you use __doc__ for @lgray? You've piqued my interest!

Right now it's really this very user facing documentation of what branches in TTrees do (if the designer of the TTree cares to fill it). The point is largely to have the capability there so that it can be exploited and so the data further serves as its own documentation. I could imagine people filling fairly rich descriptions of TTrees or branches or using doc strings to contain example analysis patterns for the data.

kkothari2001

Looking at _dask.py, everything looks great to me. All callable classes and code paths have been covered.

jpivarski · 2022-11-20T03:52:39Z

Thanks, @agoose77 and @kkothari2001!

feat: add 'interp_options' mechanism and ak_add_doc.

1a37e67

jpivarski requested review from kkothari2001 and agoose77 November 18, 2022 20:59

jpivarski mentioned this pull request Nov 18, 2022

uproot.dask(<some tree specifier>) does not know about behaviors? dask-contrib/dask-awkward#100

Closed

The same ak_add_doc=True argument attaches the TTree title to the RecordArray __doc__.

fe77914

agoose77 approved these changes Nov 18, 2022

kkothari2001 approved these changes Nov 19, 2022

jpivarski merged commit b36a022 into main Nov 20, 2022

jpivarski deleted the jpivarski/add-interp_options-and-ak_add_doc branch November 20, 2022 03:52

jpivarski mentioned this pull request Nov 28, 2022

feat: Use awkward pandas, instead of the existing code that explodes Pandas Dataframes #734

Merged

jpivarski added a commit that referenced this pull request Nov 28, 2022

Get this PR up to date with #784.

cb969fa

jpivarski mentioned this pull request Feb 17, 2023

fix: ak_add_doc should add docs to both the lazy and the materialized array. #832

Merged
