-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an RDataSource for podio files and collections #593
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kjvbrt. I wasn't aware this was possible in this generality. Very nice. I am probably still missing quite a few things, but I tried to summarize my current understanding below and added and a few comments / questions inline
For my general understanding, what this does is basically:
- Create a reader for each thread / slot
- Keep a vector of Frames, again one per thread / slot
- Populate a vector of collections per thread / slot, where
- each collection always is at the same position, so that we can simply index into it
I am wondering whether we could do with just one reader instead of multiple, but I suppose then we would probably have to do our own locking around that, because the RDataSource
doesn't do that for us?
If I understand correctly, we could in the end even replace the current ROOTReader
with the Reader
that is introduced in #522 and effectively support all backends + RDataFrame?
The sanitizer workflows are most likely failing because the tests that produce the input files are not running. You can disable them by adding the new tests also to the list in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to avoid the hardcoded strings :)
src/ROOTDataSource.cc
Outdated
// Get collections stored in the files | ||
std::vector<std::string> collNames = frame.getAvailableCollections(); | ||
// std::cout << "podio::ROOTDataSource: Found following collections:\n"; | ||
for (auto& collName : collNames) { | ||
const podio::CollectionBase* coll = frame.get(collName); | ||
if (coll->isValid()) { | ||
m_columnNames.emplace_back(collName); | ||
m_columnTypes.emplace_back(coll->getValueTypeName()); | ||
// std::cout << " - " << collName << "\n"; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also make the parameters of the frame available as columns, e.g. to access the cross section? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do this in a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few new minor suggestions for using the reader interface, now that it's available. I think I got all the places, but I might have missed some.
I continue with testing the changes made here in FCCAnalyses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be an issue with the installation now, because the EDM4hep tests are failing. As far as I can tell it has to do with header files that are no longer installed. Otherwise it's mainly minor comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell this is ready, right? From which ROOT version onwards should this work? Then we could add a simple check in CMake (similar to RNTuple).
I will push the minor code changes immediately. Once we have this, we can the add support for parameters in a follow-up PR, I would suggest.
list(APPEND root_components_needed ROOTDataFrame) | ||
endif() | ||
find_package(ROOT REQUIRED COMPONENTS ${root_components_needed}) | ||
if((ENABLE_RNTUPLE) AND (${ROOT_VERSION} VERSION_LESS 6.28.02)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a minimal version for RDataSource that we could / should check here for the RDataSource as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RDataSource is in ROOT for a long time (not experimental since ROOT 6.14), I tested this in nightlies stack (6.32)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, then I think we can also skip the check here. Thanks for checking.
EDM4hep CI workflows are fixed with #658 |
I tested this in FCCAnalysis, seems to work... |
BEGINRELEASENOTES
podio::DataSource
as anRDataSource
for working with podio (generated EDM) files and collections. ThisRDataSource
exposes the podio collections directly as columns forRDataFrames
. We consider this fairly stable, but still in early stages, there might still be some (breaking) changes to thisENDRELEASENOTES