feat: Drop uproot3 for uproot4 for writing ROOT files #1567

matthewfeickert · 2021-08-27T22:20:27Z

Description

Resolves #1566

Drop all usages of uproot3 and only use uproot v4.1.1+ for all reading and writing of ROOT files.

Checklist Before Requesting Reviewer

Tests are passing
"WIP" removed from the title of the pull request
Selected an Assignee for the PR to be responsible for the log summary

Before Merging

For the PR Assignees:

Summarize commit messages into a comprehensive review of the PR

* Remove all uproot3 dependency and use uproot4 for all reading and writing of ROOT files
* Require uproot>=4.1.1 in 'xmlio' extra
* Require uproot==4.1.1 in lower-bound-requirements.txt
* Remove uproot3 HEAD of dependencies testing workflow
* Add test to ensure pyhf.writexml writes histograms to ROOT files as expected

matthewfeickert · 2021-08-27T23:12:36Z

I'm a bit stuck as I'm having some trouble understanding why the fllowing traceback isn't finding the directories

=================================== FAILURES ===================================
______________________ test_import_roundtrip[example-one] ______________________

self = <[AttributeError("'MemmapSource' object has no attribute '_fallback'") raised in repr()] MemmapSource object at 0x7fe0b4a33590>

    def _open(self):
        try:
>           self._file = numpy.memmap(self._file_path, dtype=self._dtype, mode="r")

/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/uproot/source/file.py:113: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

subtype = <class 'numpy.memmap'>
filename = '/tmp/pytest-of-runner/pytest-0/test_import_roundtrip_example_0/()'
dtype = dtype('uint8'), mode = 'r', offset = 0, shape = None, order = 'C'

    def __new__(subtype, filename, dtype=uint8, mode='r+', offset=0,
                shape=None, order='C'):
        # Import here to minimize 'import numpy' overhead
        import mmap
        import os.path
        try:
            mode = mode_equivalents[mode]
        except KeyError:
            if mode not in valid_filemodes:
                raise ValueError("mode must be one of %s" %
                                 (valid_filemodes + list(mode_equivalents.keys())))
    
        if mode == 'w+' and shape is None:
            raise ValueError("shape must be given")
    
        if hasattr(filename, 'read'):
            f_ctx = contextlib_nullcontext(filename)
        else:
>           f_ctx = open(os_fspath(filename), ('r' if mode == 'c' else mode)+'b')
E           FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-runner/pytest-0/test_import_roundtrip_example_0/()'

/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/numpy/core/memmap.py:225: FileNotFoundError

During handling of the above exception, another exception occurred:

self = <uproot.source.file.FileResource object at 0x7fe0b4a33650>
file_path = '/tmp/pytest-of-runner/pytest-0/test_import_roundtrip_example_0/()'

    def __init__(self, file_path):
        self._file_path = file_path
        try:
>           self._file = open(self._file_path, "rb")
E           FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-runner/pytest-0/test_import_roundtrip_example_0/()'

/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/uproot/source/file.py:37: FileNotFoundError

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-runner/pytest-0/test_import_roundtrip_example_0')
toplvl = 'validation/xmlimport_input/config/example.xml'
basedir = 'validation/xmlimport_input/'

    @pytest.mark.parametrize(
        'toplvl, basedir',
        [
            (
                'validation/xmlimport_input/config/example.xml',
                'validation/xmlimport_input/',
            ),
            (
                'validation/xmlimport_input2/config/example.xml',
                'validation/xmlimport_input2',
            ),
            (
                'validation/xmlimport_input3/config/examples/example_ShapeSys.xml',
                'validation/xmlimport_input3',
            ),
        ],
        ids=['example-one', 'example-two', 'example-three'],
    )
    def test_import_roundtrip(tmpdir, toplvl, basedir):
        parsed_xml_before = pyhf.readxml.parse(toplvl, basedir)
        spec = {
            'channels': parsed_xml_before['channels'],
            'parameters': parsed_xml_before['measurements'][0]['config']['parameters'],
        }
        pdf_before = pyhf.Model(spec, poi_name='SigXsecOverSM')
    
        tmpconfig = tmpdir.mkdir('config')
        tmpdata = tmpdir.mkdir('data')
        tmpxml = tmpdir.join('FitConfig.xml')
        tmpxml.write(
            pyhf.writexml.writexml(
                parsed_xml_before,
                tmpconfig.strpath,
                tmpdata.strpath,
                Path(tmpdir.strpath).joinpath('FitConfig'),
            ).decode('utf-8')
        )
>       parsed_xml_after = pyhf.readxml.parse(tmpxml.strpath, tmpdir.strpath)

tests/test_validation.py:846: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/pyhf/readxml.py:347: in parse
    ET.parse(str(Path(rootdir).joinpath(inp))), rootdir, track_progress
src/pyhf/readxml.py:228: in process_channel
    parsed_data = process_data(data[0], rootdir, inputfile, histopath)
src/pyhf/readxml.py:212: in process_data
    data, _ = import_root_histogram(rootdir, inputfile, histopath, histoname)
src/pyhf/readxml.py:61: in import_root_histogram
    f = uproot.open(fullpath)
/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/uproot/reading.py:156: in open
    **options  # NOTE: a comma after **options breaks Python 2
/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/uproot/reading.py:589: in __init__
    file_path, **self._options  # NOTE: a comma after **options breaks Python 2
/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/uproot/source/file.py:109: in __init__
    self._open()
/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/uproot/source/file.py:120: in _open
    self._file_path, **opts  # NOTE: a comma after **opts breaks Python 2
/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/uproot/source/file.py:257: in __init__
    self._open()
/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/uproot/source/file.py:263: in _open
    for x in uproot._util.range(self._num_workers)
/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/uproot/source/file.py:263: in <listcomp>
    for x in uproot._util.range(self._num_workers)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <uproot.source.file.FileResource object at 0x7fe0b4a33650>
file_path = '/tmp/pytest-of-runner/pytest-0/test_import_roundtrip_example_0/()'

    def __init__(self, file_path):
        self._file_path = file_path
        try:
            self._file = open(self._file_path, "rb")
        except uproot._util._FileNotFoundError:
>           raise uproot._util._file_not_found(file_path)
E           FileNotFoundError: file not found
E           
E               '/tmp/pytest-of-runner/pytest-0/test_import_roundtrip_example_0/()'
E           
E           Files may be specified as:
E              * str/bytes: relative or absolute filesystem path or URL, without any colons
E                    other than Windows drive letter or URL schema.
E                    Examples: "rel/file.root", "C:\abs\file.root", "http://where/what.root"
E              * str/bytes: same with an object-within-ROOT path, separated by a colon.
E                    Example: "rel/file.root:tdirectory/ttree"
E              * pathlib.Path: always interpreted as a filesystem path or URL only (no
E                    object-within-ROOT path), regardless of whether there are any colons.
E                    Examples: Path("rel:/file.root"), Path("/abs/path:stuff.root")
E           
E           Functions that accept many files (uproot.iterate, etc.) also allow:
E              * glob syntax in str/bytes and pathlib.Path.
E                    Examples: Path("rel/*.root"), "/abs/*.root:tdirectory/ttree"
E              * dict: keys are filesystem paths, values are objects-within-ROOT paths.
E                    Example: {"/data_v1/*.root": "ttree_v1", "/data_v2/*.root": "ttree_v2"}
E              * already-open TTree objects.
E              * iterables of the above.

/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/site-packages/uproot/source/file.py:39: FileNotFoundError

If I run this locally

$ python -m pytest -sx tests/test_validation.py -k test_import_roundtrip[example-one]
...
self = <uproot.source.file.FileResource object at 0x7f7057103820>, file_path = '/tmp/pytest-of-feickert/pytest-42/test_import_roundtrip_example_0/()'

    def __init__(self, file_path):
        self._file_path = file_path
        try:
            self._file = open(self._file_path, "rb")
        except uproot._util._FileNotFoundError:
>           raise uproot._util._file_not_found(file_path)
E           FileNotFoundError: file not found
E           
E               '/tmp/pytest-of-feickert/pytest-42/test_import_roundtrip_example_0/()'
E           
E           Files may be specified as:
E              * str/bytes: relative or absolute filesystem path or URL, without any colons
E                    other than Windows drive letter or URL schema.
E                    Examples: "rel/file.root", "C:\abs\file.root", "http://where/what.root"
E              * str/bytes: same with an object-within-ROOT path, separated by a colon.
E                    Example: "rel/file.root:tdirectory/ttree"
E              * pathlib.Path: always interpreted as a filesystem path or URL only (no
E                    object-within-ROOT path), regardless of whether there are any colons.
E                    Examples: Path("rel:/file.root"), Path("/abs/path:stuff.root")
E           
E           Functions that accept many files (uproot.iterate, etc.) also allow:
E              * glob syntax in str/bytes and pathlib.Path.
E                    Examples: Path("rel/*.root"), "/abs/*.root:tdirectory/ttree"
E              * dict: keys are filesystem paths, values are objects-within-ROOT paths.
E                    Example: {"/data_v1/*.root": "ttree_v1", "/data_v2/*.root": "ttree_v2"}
E              * already-open TTree objects.
E              * iterables of the above.

../../../.pyenv/versions/pyhf-CPU/lib/python3.8/site-packages/uproot/source/file.py:39: FileNotFoundError

The directory exists

$ file /tmp/pytest-of-feickert/pytest-42/test_import_roundtrip_example_0
/tmp/pytest-of-feickert/pytest-42/test_import_roundtrip_example_0: directory
$ ls /tmp/pytest-of-feickert/pytest-42/test_import_roundtrip_example_0
config  data  FitConfig_channel1.xml  FitConfig.xml  HistFactorySchema.dtd

Nothing should have changed in src/pyhf/readxml.py so I'm not really sure what's going on from a first pass.

I'll tag @jpivarski in case he can think of something that is obvious that I'm missing, but I'm not expecting you to work on this PR Jim as my guess is that something subtle has changed that I need to fix in pyhf.

jpivarski · 2021-08-28T00:57:03Z

The first thing I'm noticing is that the filename has parentheses in it. That might be some string formatting (maybe in Uproot) gone awry. This is the error message for "can't find a file at that file path" and I'd be surprised if your filename really did have parentheses in it.

filename = '/tmp/pytest-of-runner/pytest-0/test_import_roundtrip_example_0/()'

matthewfeickert · 2021-08-28T02:16:37Z

The first thing I'm noticing is that the filename has parentheses in it. ...
This is the error message for "can't find a file at that file path" and I'd be surprised if your filename really did have parentheses in it.
filename = '/tmp/pytest-of-runner/pytest-0/test_import_roundtrip_example_0/()'

@jpivarski Yeah, all the filenames are just pathlib.Path() wrapped in str (though the str bit probably isn't needed), so they should all be properly formed paths.

That might be some string formatting (maybe in Uproot) gone awry.

It seems maybe so(?). If I revert the following diff

diff --git a/src/pyhf/writexml.py b/src/pyhf/writexml.py
index f1595c57..650c313a 100644
--- a/src/pyhf/writexml.py
+++ b/src/pyhf/writexml.py
@@ -6,9 +6,7 @@
 import xml.etree.ElementTree as ET
 import numpy as np
 
-# TODO: Move to uproot4 when ROOT file writing is supported
-import uproot3 as uproot
-from uproot3_methods.classes import TH1
+import uproot
 
 from pyhf.mixins import _ChannelSummaryMixin
 
@@ -48,11 +46,11 @@ def _make_hist_name(channel, sample, modifier='', prefix='hist', suffix=''):
 
 
 def _export_root_histogram(histname, data):
-    hist = TH1.from_numpy((np.asarray(data), np.arange(len(data) + 1)))
-    hist._fName = histname
     if histname in _ROOT_DATA_FILE:
         raise KeyError(f"Duplicate key {histname} being written.")
-    _ROOT_DATA_FILE[histname] = hist
+    _ROOT_DATA_FILE[histname] = np.histogram(
+        np.asarray(data), bins=np.arange(len(data) + 1)
+    )
 
 
 # https://stackoverflow.com/a/4590052

so that I'm using uproot3 again, then

python -m pytest -sx tests/test_validation.py -k test_import_roundtrip

passes.

I don't think(?) the assignment to _ROOT_DATA_FILE[histname] should matter a huge amount here, so it seems that comes down to the uproot.recreate block in writexml

pyhf/src/pyhf/writexml.py

Lines 288 to 303 in ce2ffab

    
           with uproot.recreate( 
        
               str(Path(data_rootdir).joinpath('data.root')) 
        
           ) as _ROOT_DATA_FILE: 
        
               for channelspec in spec['channels']: 
        
                   channelfilename = str( 
        
                       Path(specdir).joinpath(f'{resultprefix}_{channelspec["name"]}.xml') 
        
                   ) 
        
                   with open(channelfilename, 'w') as channelfile: 
        
                       channel = build_channel(spec, channelspec, spec.get('observations')) 
        
                       indent(channel) 
        
                       channelfile.write( 
        
                           "<!DOCTYPE Channel SYSTEM '../HistFactorySchema.dtd'>\n\n" 
        
                       ) 
        
                       channelfile.write( 
        
                           ET.tostring(channel, encoding='utf-8').decode('utf-8') 
        
                       )

matthewfeickert · 2021-08-28T02:19:07Z

I wonder if InputFile=_ROOT_DATA_FILE._path in

pyhf/src/pyhf/writexml.py

Lines 265 to 268 in ce2ffab

    
           def build_channel(spec, channelspec, obsspec): 
        
               channel = ET.Element( 
        
                   'Channel', Name=channelspec['name'], InputFile=_ROOT_DATA_FILE._path 
        
               )

is causing problems.

matthewfeickert · 2021-08-28T02:27:07Z

Ah seems so. If I add in the following addition

$ git diff
diff --git a/src/pyhf/writexml.py b/src/pyhf/writexml.py
index 650c313a..e9391ae8 100644
--- a/src/pyhf/writexml.py
+++ b/src/pyhf/writexml.py
@@ -261,6 +261,7 @@ def build_data(obsspec, channelname):
 
 
 def build_channel(spec, channelspec, obsspec):
+    print(f"\n{type(_ROOT_DATA_FILE)}: {_ROOT_DATA_FILE._path=}\n")
     channel = ET.Element(
         'Channel', Name=channelspec['name'], InputFile=_ROOT_DATA_FILE._path
     )

then with the uproot4 in this PR

$ python -m pytest -sx tests/test_validation.py -k test_import_roundtrip[example-one]
...
<class 'uproot.writing.writable.WritableDirectory'>: _ROOT_DATA_FILE._path=()
...

If I revert back to uproot3

$ python -m pytest -sx tests/test_validation.py -k test_import_roundtrip[example-one]
... 
<class 'uproot3.write.TFile.TFileRecreate'>: _ROOT_DATA_FILE._path='/tmp/pytest-of-feickert/pytest-54/test_import_roundtrip_example_0/data/data.root'
...

where for both _ROOT_DATA_FILE is set by

pyhf/src/pyhf/writexml.py

Lines 288 to 290 in ce2ffab

    
           with uproot.recreate( 
        
               str(Path(data_rootdir).joinpath('data.root')) 
        
           ) as _ROOT_DATA_FILE:

matthewfeickert · 2021-08-28T16:53:47Z

Debugging note to self: If I print out the spec from

pyhf/tests/test_validation.py

Lines 846 to 850 in ce2ffab

    
           parsed_xml_after = pyhf.readxml.parse(tmpxml.strpath, tmpdir.strpath) 
        
           spec = { 
        
               'channels': parsed_xml_after['channels'], 
        
               'parameters': parsed_xml_after['measurements'][0]['config']['parameters'], 
        
           }

and then compare them for uproot3 and uproot4

$ diff -y /tmp/uproot3_spec.py /tmp/uproot4_spec.py 
{								{
    "channels": [						    "channels": [
        {							        {
            "name": "channel1",					            "name": "channel1",
            "samples": [					            "samples": [
                {						                {
                    "name": "signal",				                    "name": "signal",
                    "data": [20.0, 10.0],		      |	                    "data": [0.0, 0.0],
                    "modifiers": [				                    "modifiers": [
                        {					                        {
                            "name": "syst1",			                            "name": "syst1",
                            "type": "normsys",			                            "type": "normsys",
                            "data": {"lo": 0.95, "hi": 1.05},	                            "data": {"lo": 0.95, "hi": 1.05},
                        },					                        },
                        {"name": "SigXsecOverSM", "type": "no	                        {"name": "SigXsecOverSM", "type": "no
                    ],						                    ],
                },						                },
                {						                {
                    "name": "background1",			                    "name": "background1",
                    "data": [100.0, 0.0],		      |	                    "data": [1.0, 0.0],
                    "modifiers": [				                    "modifiers": [
                        {"name": "lumi", "type": "lumi", "dat	                        {"name": "lumi", "type": "lumi", "dat
                        {					                        {
                            "name": "staterror_channel1",	                            "name": "staterror_channel1",
                            "type": "staterror",		                            "type": "staterror",
                            "data": [5.000000074505806, 0.0], |	                            "data": [2.0, 0.0],
                        },					                        },
                        {					                        {
                            "name": "syst2",			                            "name": "syst2",
                            "type": "normsys",			                            "type": "normsys",
                            "data": {"lo": 0.95, "hi": 1.05},	                            "data": {"lo": 0.95, "hi": 1.05},
                        },					                        },
                    ],						                    ],
                },						                },
                {						                {
                    "name": "background2",			                    "name": "background2",
                    "data": [0.0, 100.0],		      |	                    "data": [1.0, 0.0],
                    "modifiers": [				                    "modifiers": [
                        {"name": "lumi", "type": "lumi", "dat	                        {"name": "lumi", "type": "lumi", "dat
                        {					                        {
                            "name": "staterror_channel1",	                            "name": "staterror_channel1",
                            "type": "staterror",		                            "type": "staterror",
                            "data": [0.0, 10.0],	      |	                            "data": [2.0, 0.0],
                        },					                        },
                        {					                        {
                            "name": "syst3",			                            "name": "syst3",
                            "type": "normsys",			                            "type": "normsys",
                            "data": {"lo": 0.95, "hi": 1.05},	                            "data": {"lo": 0.95, "hi": 1.05},
                        },					                        },
                    ],						                    ],
                },						                },
            ],							            ],
        }							        }
    ],								    ],
    "parameters": [						    "parameters": [
        {							        {
            "name": "lumi",					            "name": "lumi",
            "auxdata": [1.0],					            "auxdata": [1.0],
            "bounds": [[0.5, 1.5]],				            "bounds": [[0.5, 1.5]],
            "inits": [1.0],					            "inits": [1.0],
            "sigmas": [0.1],					            "sigmas": [0.1],
            "fixed": True,					            "fixed": True,
        },							        },
        {"name": "SigXsecOverSM", "bounds": [[0.0, 3.0]], "in	        {"name": "SigXsecOverSM", "bounds": [[0.0, 3.0]], "in
        {"name": "syst1", "fixed": True},			        {"name": "syst1", "fixed": True},
    ],								    ],
}								}

the only parts that differ are the data fields

$ diff /tmp/uproot3_spec.py /tmp/uproot4_spec.py 
8c8
<                     "data": [20.0, 10.0],
---
>                     "data": [0.0, 0.0],
20c20
<                     "data": [100.0, 0.0],
---
>                     "data": [1.0, 0.0],
26c26
<                             "data": [5.000000074505806, 0.0],
---
>                             "data": [2.0, 0.0],
37c37
<                     "data": [0.0, 100.0],
---
>                     "data": [1.0, 0.0],
43c43
<                             "data": [0.0, 10.0],
---
>                             "data": [2.0, 0.0],

so I need to look at how those are getting written. From

pyhf/src/pyhf/writexml.py

Lines 234 to 253 in ce2ffab

    
           def build_sample(spec, samplespec, channelname): 
        
               histname = _make_hist_name(channelname, samplespec['name']) 
        
               attrs = { 
        
                   'Name': samplespec['name'], 
        
                   'HistoName': histname, 
        
                   'InputFile': _ROOT_DATA_FILE._path, 
        
                   'NormalizeByTheory': 'False', 
        
               } 
        
               sample = ET.Element('Sample', **attrs) 
        
               for modspec in samplespec['modifiers']: 
        
                   # if lumi modifier added for this sample, need to set NormalizeByTheory 
        
                   if modspec['type'] == 'lumi': 
        
                       sample.attrib.update({'NormalizeByTheory': 'True'}) 
        
                   modifier = build_modifier( 
        
                       spec, modspec, channelname, samplespec['name'], samplespec['data'] 
        
                   ) 
        
                   if modifier is not None: 
        
                       sample.append(modifier) 
        
               _export_root_histogram(histname, samplespec['data']) 
        
               return sample

it is clear to me then that how I've written _export_root_histogram in this PR is wrong.

matthewfeickert · 2021-08-28T17:15:37Z

it is clear to me then that how I've written _export_root_histogram in this PR is wrong.

Yeah, I wrote it 100% wrong. Don't write code when very tired is the obvious lesson here.

codecov · 2021-08-28T17:23:40Z

Codecov Report

Merging #1567 (72fa349) into master (c8a0e64) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1567      +/-   ##
==========================================
- Coverage   97.70%   97.70%   -0.01%     
==========================================
  Files          63       63              
  Lines        4050     4047       -3     
  Branches      576      576              
==========================================
- Hits         3957     3954       -3     
  Misses         54       54              
  Partials       39       39

Flag	Coverage Δ
contrib	`25.42% <0.00%> (+0.01%)`	⬆️
unittests	`97.47% <100.00%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/pyhf/writexml.py	`95.83% <100.00%> (-0.09%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c8a0e64...72fa349. Read the comment docs.

jpivarski

Looks great! I just have a few suggestions that could streamline things, nothing critical.

lower-bound-requirements.txt

jpivarski · 2021-08-28T21:17:47Z

tests/test_export.py

+
+    with uproot.recreate(tmp_path.joinpath("test_export_root_histogram.root")) as file:
+        file["hist"] = pyhf.writexml._ROOT_DATA_FILE["example"]
+        values, edges = file["hist"].to_numpy()


This literally writes and reads the data back. If you want to keep an Uproot Model of the data for later processing, you could use uproot.to_writable to turn it into a Model, which is implicitly the first step in writing anyway.

This was more just meant as a very simple sanity check that pyhf.writexml._export_root_histogram is writing what we expect it to into the ROOT file. I see though that

>>> import uproot >>> import numpy as np >>> data = np.arange(10) >>> edges = np.arange(len(data) + 1) >>> uproot.to_writable((data, edges)) <TH1D (version 3) at 0x7f421bf86df0>

verifies the structure. So in your mind should I add the suggestion that I've placed in pyhf.writexml._export_root_histogram?

If it's a sanity check, then writing and reading back is a good idea. It would be even more of a sanity check if you did so with different file objects:

with uproot.recreate(tmp_path.joinpath("test_export_root_histogram.root")) as file: file["hist"] = pyhf.writexml._ROOT_DATA_FILE["example"] with uproot.open(tmp_path.joinpath("test_export_root_histogram.root")) as file: values, edges = file["hist"].to_numpy()

That would test more—the whole file structure, and the certainty that these really are bytes on disk. Using the same file does invoke the system operation to read from the file it's just written to, but with the same file handle in "r+b" mode and without going back to the beginning of the file and seeking through all the pointers to get to the object again. to_writable does still less, just turning the 2-tuple of arrays Python object into an Uproot Model Python object, which is what you'd want to do if performance were your goal, rather than sanity-checking.

So it's up to you, depending on what your goal is here.

Good point. I've added this in the latest commit.

matthewfeickert

Thanks for the super fast response, @jpivarski! Just one follow up question for you though.

matthewfeickert · 2021-08-28T21:46:26Z

tests/test_export.py

+
+    with uproot.recreate(tmp_path.joinpath("test_export_root_histogram.root")) as file:
+        file["hist"] = pyhf.writexml._ROOT_DATA_FILE["example"]
+        values, edges = file["hist"].to_numpy()


This was more just meant as a very simple sanity check that pyhf.writexml._export_root_histogram is writing what we expect it to into the ROOT file. I see though that

>>> import uproot >>> import numpy as np >>> data = np.arange(10) >>> edges = np.arange(len(data) + 1) >>> uproot.to_writable((data, edges)) <TH1D (version 3) at 0x7f421bf86df0>

verifies the structure. So in your mind should I add the suggestion that I've placed in pyhf.writexml._export_root_histogram?

matthewfeickert · 2021-08-28T21:49:02Z

src/pyhf/writexml.py

    if histname in _ROOT_DATA_FILE:
        raise KeyError(f"Duplicate key {histname} being written.")
-    _ROOT_DATA_FILE[histname] = hist
+    _ROOT_DATA_FILE[histname] = (np.asarray(data), np.arange(len(data) + 1))


Suggested change

_ROOT_DATA_FILE[histname] = (np.asarray(data), np.arange(len(data) + 1))

_ROOT_DATA_FILE[histname] = uproot.to_writable(

(np.asarray(data), np.arange(len(data) + 1))

)

@jpivarski given your other comment, this seems better then directly writing, correct? As if this was ever messed up then we'd get a TypeError raised.

Just

writable = uproot.to_writable((np.asarray(data), np.arange(len(data) + 1)))

by itself ensures that the data passed to to_writable will be writable. It will raise a TypeError if it is not recognized as a writable Python type.

The next step of assigning to the file,

_ROOT_DATA_FILE[histname] = writable

actually writes it to the file. The disk I/O operations are only invoked when this assignment happens, not when the to_writable function is called.

The disk I/O operations are only invoked when this assignment happens, not when the to_writable function is called.

Right. Okay, then it seems what I have here (now) is doing as expected. 👍

src/pyhf/writexml.py

when _export_root_histogram does ``` _ROOT_DATA_FILE[histname] = hist_uproot_model ``` this is doing the assignment of `hsitname` as the name

matthewfeickert · 2021-08-30T23:18:57Z

I bumped the lower bound to uproot v4.1.1 as there's no real downside to requiring it given that @jpivarski just applied some bug fixes and v4.1.0 is way too new for anyone to have developed any hard dependency on it in the last 4 days.

matthewfeickert added feat/enhancement New feature or request tests pytest CI CI systems, GitHub Actions labels Aug 27, 2021

matthewfeickert self-assigned this Aug 27, 2021

matthewfeickert marked this pull request as draft August 27, 2021 22:21

matthewfeickert marked this pull request as ready for review August 28, 2021 17:29

matthewfeickert requested review from jpivarski, lukasheinrich and kratsg August 28, 2021 17:31

matthewfeickert changed the title ~~feat: Use uproot4 exclusively for writing and reading~~ feat: Use uproot4 exclusively for reading and writing ROOT files Aug 28, 2021

matthewfeickert mentioned this pull request Aug 28, 2021

chore: Remove unnecessary str conversions of pathlib.Path objects #1568

Merged

4 tasks

jpivarski approved these changes Aug 28, 2021

View reviewed changes

matthewfeickert commented Aug 28, 2021

View reviewed changes

src/pyhf/writexml.py Show resolved Hide resolved

matthewfeickert added 7 commits August 30, 2021 16:06

Use uproot4 only in xmlio

5de2404

Update lowerbound on uproot

2491cb2

Remove uproot3 from head of dependencies

eee9cf2

Use uproot4 writing

eeba377

Remove patching of TH1 as no longer used

918e917

Add test for export_root_histogram

34282d4

Use tmp_path for pathlib

805cc68

matthewfeickert added 9 commits August 30, 2021 16:06

Use 'file_path' in uproot4 not '_path'

c423f6f

write data correclty

bc75ca5

Write test that is actually correct

55d1ecb

Use uproot.to_writeable and ensure name

36ad516

Name memeber assignment is done through dict key

a4e8d58

when _export_root_histogram does ``` _ROOT_DATA_FILE[histname] = hist_uproot_model ``` this is doing the assignment of `hsitname` as the name

Don't create intermediate object as no name setting needed

aea65a4

Use axis().edges()

45735a7

drop a number for readability

119e36c

take Jim's advice and test on serialized file

404b57d

matthewfeickert changed the title ~~feat: Use uproot4 exclusively for reading and writing ROOT files~~ feat: Drop uproot3 for uproot4 for writing ROOT files Aug 30, 2021

Bump louwer bound to v4.1.1 to pickup bug fixes Jim found in release

72fa349

matthewfeickert force-pushed the feat/use-uproot4-only branch from 9ba2c7e to 72fa349 Compare August 30, 2021 23:16

lukasheinrich approved these changes Aug 31, 2021

View reviewed changes

matthewfeickert merged commit 210391f into master Aug 31, 2021

matthewfeickert deleted the feat/use-uproot4-only branch August 31, 2021 14:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Drop uproot3 for uproot4 for writing ROOT files #1567

feat: Drop uproot3 for uproot4 for writing ROOT files #1567

matthewfeickert commented Aug 27, 2021 •

edited

Loading

matthewfeickert commented Aug 27, 2021

jpivarski commented Aug 28, 2021

matthewfeickert commented Aug 28, 2021

matthewfeickert commented Aug 28, 2021

matthewfeickert commented Aug 28, 2021 •

edited

Loading

matthewfeickert commented Aug 28, 2021 •

edited

Loading

matthewfeickert commented Aug 28, 2021

codecov bot commented Aug 28, 2021 •

edited

Loading

jpivarski left a comment

jpivarski Aug 28, 2021

matthewfeickert Aug 28, 2021

jpivarski Aug 30, 2021

matthewfeickert Aug 30, 2021

matthewfeickert left a comment

matthewfeickert Aug 28, 2021

matthewfeickert Aug 28, 2021

jpivarski Aug 30, 2021

matthewfeickert Aug 30, 2021

matthewfeickert commented Aug 30, 2021 •

edited

Loading

feat: Drop uproot3 for uproot4 for writing ROOT files #1567

feat: Drop uproot3 for uproot4 for writing ROOT files #1567

Conversation

matthewfeickert commented Aug 27, 2021 • edited Loading

Description

Checklist Before Requesting Reviewer

Before Merging

matthewfeickert commented Aug 27, 2021

jpivarski commented Aug 28, 2021

matthewfeickert commented Aug 28, 2021

matthewfeickert commented Aug 28, 2021

matthewfeickert commented Aug 28, 2021 • edited Loading

matthewfeickert commented Aug 28, 2021 • edited Loading

matthewfeickert commented Aug 28, 2021

codecov bot commented Aug 28, 2021 • edited Loading

Codecov Report

jpivarski left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

matthewfeickert left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

matthewfeickert commented Aug 30, 2021 • edited Loading

matthewfeickert commented Aug 27, 2021 •

edited

Loading

matthewfeickert commented Aug 28, 2021 •

edited

Loading

matthewfeickert commented Aug 28, 2021 •

edited

Loading

codecov bot commented Aug 28, 2021 •

edited

Loading

matthewfeickert commented Aug 30, 2021 •

edited

Loading