Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a reader and writer interface #522

Merged
merged 37 commits into from
May 22, 2024
Merged

Conversation

jmcarcell
Copy link
Member

@jmcarcell jmcarcell commented Dec 3, 2023

BEGINRELEASENOTES

  • Add a reader and writer interface and tests using those
  • Add a new library podioIO with the interface readers and writers

ENDRELEASENOTES

@jmcarcell jmcarcell force-pushed the interface branch 2 times, most recently from 74dd6d6 to cf6cc78 Compare December 5, 2023 10:20
Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't yet looked into all the details, but one general comment that I already have now is: How hard would it be to make an I/O independent reader interfaces? The hardest part is most likely the different return types of readNextEntry/readEntry.

I would then make the user facing interface look something like this:

struct Reader {
   podio::Frame readNextFrame(const std::string& category);
};

which then internally just calls the readNextEntry from the specific reader and constructs the Frame from the returned FrameData. Potentially the return type should be std::optional<podio::Frame> to make it easier to check whether we have run out of entries, because readNextEntry will return a nullptr in that case.

@jmcarcell
Copy link
Member Author

jmcarcell commented Jan 18, 2024

I pushed although it's unfinished (all test pass when building with rntuple and without SIO) but the details can be done later. So now the readers return an std::optional<podio::Frame> and the user Reader looks like:

class Reader {
public:
  Reader(const std::string& filename, const std::string& fileType="TTree");

  podio::Frame readNextFrame(const std::string& name);
  podio::Frame readFrame(const std::string& name, size_t index);


  size_t getEntries(const std::string& name);

So to use

    auto reader = Reader("file.root");
    for (auto i = 0; i < reader.getEntries(); i++)
      auto frame = reader.readNextFrame("events");

which should cover most cases together with the other method to read a specific Frame and then one can always use an specific reader for more flexibility.

@jmcarcell jmcarcell force-pushed the interface branch 2 times, most recently from 8e36225 to 5642b3a Compare January 21, 2024 18:12
Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move the RNTuple related renaming commit to a separate PR? It would make this a bit more "atomic" and would yield a slightly cleaner diff for the review of the interface related changes.

I have a few comments for the interface already below as well. I think it looks good in general, just a few details that could be improved.

include/podio/Reader.h Outdated Show resolved Hide resolved
include/podio/Reader.h Outdated Show resolved Hide resolved
include/podio/Reader.h Show resolved Hide resolved
include/podio/Reader.h Outdated Show resolved Hide resolved
include/podio/Writer.h Outdated Show resolved Hide resolved
include/podio/Reader.h Outdated Show resolved Hide resolved
@jmcarcell jmcarcell changed the title Add a reader and writer interface, rename the RNTuple reader and writer Add a reader and writer interface Jan 23, 2024
@jmcarcell jmcarcell force-pushed the interface branch 2 times, most recently from 4d5032c to f172dc9 Compare January 25, 2024 10:36
Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we want to replicate the reader / writer interfaces on the python side exactly. In the end the whole Mixin business is there to abstract that away and we have duck typing in python, so we probably should just aim at providing (renaming?) the makeReader or makeWriter functionality in python to do the right thing.

Another advantage of that would be that the FrameCategoryIterator will always get some FrameDataT and will not have to differentiate between dealing with the reader interface or with a concrete reader.

If we replicate the reader / writer interfaces they have to go into some generic module that is independent of the backend (i.e. it would have to be moved out of root_io where it currently is).

@jmcarcell
Copy link
Member Author

jmcarcell commented Apr 22, 2024

I am not sure if we want to replicate the reader / writer interfaces on the python side exactly. In the end the whole Mixin business is there to abstract that away and we have duck typing in python, so we probably should just aim at providing (renaming?) the makeReader or makeWriter functionality in python to do the right thing.

Another advantage of that would be that the FrameCategoryIterator will always get some FrameDataT and will not have to differentiate between dealing with the reader interface or with a concrete reader.

If we replicate the reader / writer interfaces they have to go into some generic module that is independent of the backend (i.e. it would have to be moved out of root_io where it currently is).

Python changes have been removed, it's not necessary for example for the readers since they detect the type automatically but I guess it should be nice to have something more similar in C++ and python.

Needs #568; Clang 12 doesn't like to return a std::unique_ptr without std::move (see the last commits) but if they are added then GCC 12 complains. Locally both GCC 13 and Clang 17 like both so I guess we stick to the most natural option.

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only have some test organization and nitpicks here.

include/podio/Reader.h Outdated Show resolved Hide resolved
include/podio/Reader.h Outdated Show resolved Hide resolved
include/podio/Reader.h Show resolved Hide resolved
include/podio/Reader.h Outdated Show resolved Hide resolved
include/podio/Writer.h Outdated Show resolved Hide resolved
src/Writer.cc Outdated Show resolved Hide resolved
Comment on lines 102 to 107
#if PODIO_ENABLE_SIO
auto readerSIO = podio::makeReader("example_frame_sio_interface.sio");
if (read_frames(readerSIO)) {
return 1;
}
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go into the sio_io tests directory, also for the writing. That might require shuffling some of the code in the .cpp files into headers and putting them into the top level tests directory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more general includes in the top tests directory and then the actual tests in tests/root_io and tests/sio_io. I've added a new library podioIO since I don't think there is a good place for the reader and writer (but then one will need to link to this one to use the interfaces). I think this should eventually replace the current instances since the interface should cover most of the use cases and you can also change backend very easily.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the new library / target. I think this should be the new default target to link against. It brings along all the correct dependencies in any case (it links publicly against podioRootIO and potentially also podioSioIO). So users should just use podioIO in their setups. We will have to document that somewhere. However, I don't think we have a fitting section yet to put it into. I only ever put the cmake macros on to slides so far, it seems.

@tmadlener
Copy link
Collaborator

Some test names changes so sanitizer workflows started to fall over them because of ROOT internals. I have split them into purely TTree based and purely RNTuple based tests so that at least the latter should be doable in sanitizer builds.

@tmadlener tmadlener force-pushed the interface branch 2 times, most recently from 522c636 to ba6722b Compare May 2, 2024 14:57
src/Writer.cc Outdated
throw std::runtime_error("ROOT RNTuple writer not available. Please recompile with ROOT RNTuple support.");
#endif
}
if ((type == "default" && endsWith(filename, ".sio")) || lower(type) == "sio") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure whether we should check the type here as well. currently it would be possible to write example.root as an sio file, which would then break when trying to read it back, probably with some rather incomprehensible error.

I think it would be less error prone, if we only check the type for the .root file ending and ignore it entirely for sio. Otherwise we would at least also have to foresee the possibility to customize the type in the reader interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@tmadlener
Copy link
Collaborator

I don't understand why the read_interface_sio test is flagged by the AddressSanitizer, claiming there is a leak of effectively the complete Frame. Since reading with sio works without leaks this would have to be introduced somewhere in the interface, but I don't see a place where that would be possible or where it would make sense. Especially since the AddressSanitizer effectively points to the Frame construction in readEvent.

@jmcarcell
Copy link
Member Author

jmcarcell commented May 6, 2024

Reading a frame with SIO is also not free from leaks and this is why it's commented out in the sanitizer tests, right?
https://github.com/AIDASoft/podio/blob/master/tests/CTestCustom.cmake#L80-L81
Simply reading frames with SIO:

SUMMARY: AddressSanitizer: 7328 byte(s) leaked in 342 allocation(s).

The read_interface_sio.cpp test:

SUMMARY: AddressSanitizer: 7328 byte(s) leaked in 342 allocation(s).

so it doesn't have anything to do with the interface. I'll just add it to the list of excluded

@tmadlener
Copy link
Collaborator

Ah you are right, I missed that they are also excluded from the sanitizer tests. I thought they were running. So this is indeed not something new with the interface, but a general leak.

I have opened an issue to keep track of that.

@tmadlener tmadlener merged commit 6eefdc4 into AIDASoft:master May 22, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants