Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Edm4hep files contaning vertex collections incompatible with root RDataFrame.Describe() #262

Open
Zehvogel opened this issue Feb 6, 2024 · 10 comments

Comments

@Zehvogel
Copy link
Contributor

Zehvogel commented Feb 6, 2024

root's RDataFrame .Describe() feature fails with edm4hep files containing vertex collections with the following error message:

cppyy.gbl.std.runtime_error: ROOT::RDF::RDFDescription ROOT::RDF::RInterfaceBase::Describe() =>
    runtime_error: TTree leaf BuildUpVertices.covMatrix[6] has both a leaf count and a static length. This is not supported.

It is maybe not our fault but I wanted to submit the issue here first

  • OS version: alma9
  • Package version: source /cvmfs/sw-nightlies.hsf.org/key4hep/releases/2024-02-05/x86_64-almalinux9-gcc11.3.1-opt/key4hep-stack/2024-02-05-osrelo/setup.sh
  • Reproduced by:
    downloading https://cernbox.cern.ch/s/71P17DwMNplpruv
import ROOT
df = ROOT.RDataFrame("events", "cld_enuW_1evt_REC_edm4hep.root")
df.Describe()
@tmadlener
Copy link
Contributor

tmadlener commented Feb 6, 2024

This looks like a root issue, where the minimal reproducer should be to effectively write a branch with an std::array. It could also be that it fails only if the std::array is part of a struct as that is what it is in our case. (And then the branch is actually a vector and not just single elments)

@Zehvogel
Copy link
Contributor Author

Zehvogel commented Feb 6, 2024

Looks like a reproducer is not so easy to produce...

I made a struct

struct test {
    std::array<float, 6> arr;
    ROOT::VecOps::RVec<float> vec;
    ROOT::VecOps::RVec<ROOT::VecOps::RVec<float>> vecvec;
    std::array<std::array<float, 6>,1> arrarr;
    ROOT::VecOps::RVec<std::array<float,6>> vecarr;
}

and use it like this:

import ROOT

df = ROOT.RDataFrame(10)

code = """
struct test {
    std::array<float, 6> arr;
    ROOT::VecOps::RVec<float> vec;
    ROOT::VecOps::RVec<ROOT::VecOps::RVec<float>> vecvec;
    std::array<std::array<float, 6>,1> arrarr;
    //ROOT::VecOps::RVec<std::array<float,6>> vecarr;
};
"""
ROOT.gInterpreter.Declare(code)

df = df.Define("test", "test()")
print(df.Describe())
df.Snapshot("test_tree", "test_file.root")
df2 = ROOT.RDataFrame("test_tree", "test_file.root")
print(df2.Describe())

returning:

[---]
Column  Type    Origin
------  ----    ------
test    test    Define
[---]
Column                  Type                                            Origin
------                  ----                                            ------
arr[6]                  array<float,6>                                  Dataset
arrarr[1][6]            array<array<float,6>,1>                         Dataset
test                    test                                            Dataset
test.arr[6]             ROOT::VecOps::RVec<Float_t>                     Dataset
test.arrarr[1][6]       ROOT::VecOps::RVec<Float_t>                     Dataset
test.vec                ROOT::VecOps::RVec<float>                       Dataset
test.vecvec             ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> >  Dataset
vec                     ROOT::VecOps::RVec<float>                       Dataset
vecvec                  ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> >  Dataset
[---]

uncommenting the vecarr member fails when writing to disk complaining about not having a dictionary:

Error in <TStreamerInfo::Build>: The class "test" is interpreted and for its data member "vecarr" we do not have a dictionary for the collection "ROOT::VecOps::RVec<array<float,6> >". Because of this, we will not be able to read or write this data member.

@tmadlener
Copy link
Contributor

tmadlener commented Feb 6, 2024

Ah, sorry. That was a bit ambiguous from my side. What we write is more like this

struct test {
  std::array<float, 6> values;
};

std::vector<test> data;

So probably something along the lines of

import ROOT
ROOT.gInterpreter.Declare("""
#include <array>
#include <vector>

struct test {
  std::array<float, 6> values{};
};

std::vector<test> makeTestData() {
  return std::vector<test>(10);
}
"""

# all the rest

If you want to make it slightly more complicated, this would also be allowed in a podio generated EDM

struct ArrayStruct {
  std::array<int, 42> arr;
};

struct SomeData {
  ArrayStruct a{};
};

@Zehvogel
Copy link
Contributor Author

Zehvogel commented Feb 6, 2024

Ok, thx.
With your construct (the simpler one) it still complains about not having a dictionary but in another way.

Error in <TTree::Branch>: The class requested (vector<test>) for the branch "test_col" is an instance of an stl collection and does not have a compiled CollectionProxy. Please generate the dictionary for this collection (vector<test>) to avoid to write corrupted data.

@tmadlener
Copy link
Contributor

Ah right, you will probably have to create a dictionary then for at least test then. I am not sure if there is an easy way to get that done via the python bindings, or whether it is necessary to bring in some heavier (CMake) machinery.

@Zehvogel
Copy link
Contributor Author

Zehvogel commented Feb 6, 2024

I have no idea either, guess I will have to postpone this for now as there is more important stuff to do...

@wdconinc
Copy link
Contributor

@Zehvogel I'll file an issue at root-project/root and link here.

@wdconinc
Copy link
Contributor

Back reference: root-project/root#14790

@jmcarcell
Copy link
Contributor

I've had a look at this and it seems it's only an issue with .Describe(). Working with std::array works fine, accessing different indexes, creating new variables, filtering... So a storage type change is not needed.

@wdconinc
Copy link
Contributor

This affects more than just .Describe(). It prevents also for example df.AsNumpy(['BuildUpVertices.covMatrix']). In fact, I think it prevents any access to values stored in a std::array from python.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants