Support for reading multiple files with SIO #647
Comments
Hello @tmadlener! I'd love to work on this. Could you please provide more details on the expected implementation? From my understanding, the new functionality will involve defining methods in |
Hi @SaxenaAnushka102, this will involve quite a bit of C++, and after having had another look into it, it might not be as straightforward as I had originally anticipated. Nevertheless, let me lay out the basic pieces of work that would need to be done and then you can still decide if you want to embark on this :) The main work will have to happen in the The main building blocks of the SIOReader are:
There are a few more members, but we don't care too much about them here.
The small issue
The Lines 23 to 24 in 8651fdd
There really should be a check there and if it returns false, then we need to throw a std::runtime_error telling us that we cannot use this file.
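A minimal sketch of such a check could look like the following. The function `tryOpenFile` is a hypothetical stand-in for whichever call returns the boolean mentioned above, and `openFileChecked` is likewise an illustrative name, not part of podio.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the call whose boolean result is currently ignored.
// Assumption for this sketch only: an empty file name models an unusable file.
bool tryOpenFile(const std::string& filename) {
  return !filename.empty();
}

// Check the result and refuse to continue with a file that cannot be used.
void openFileChecked(const std::string& filename) {
  if (!tryOpenFile(filename)) {
    throw std::runtime_error("Cannot use file '" + filename + "'");
  }
}
```

Throwing right where the failure is detected keeps the error message close to the file name that caused it.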
The actual work
The main part of the work will be to make sure that we switch files whenever necessary. The main challenge in this case will be that when reading these files, there can be different categories and it is possible that we read some categories much faster than we do others. Hence, it's possible that we run out of entries for one category in a file, while we still have entries for other categories in this file. Obviously, we still need to be able to go back to those entries. This means that we will have to keep track of which entries of each category are in which file and then potentially open another file when the current entry is not in the file that is currently open. We could in principle also keep multiple files open at the same time, but then we have to keep track of all of them and we will potentially have very many open file handles, which is something we would like to avoid.
The steps that we will have to take are:
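The bookkeeping described above (which entry of which category lives in which file) can be sketched independently of SIO. The example below assumes that the only information needed per file is the number of entries of each category; `EntriesPerCategory` and `locateEntry` are hypothetical names, and the real TOC records hold more than a count.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-file summary: number of entries of each category in one file.
using EntriesPerCategory = std::map<std::string, unsigned>;

// Translate a global entry index of a category into
// (index of the file, index of the entry within that file).
std::pair<std::size_t, unsigned> locateEntry(const std::vector<EntriesPerCategory>& files,
                                             const std::string& category, unsigned iEntry) {
  unsigned remaining = iEntry;
  for (std::size_t iFile = 0; iFile < files.size(); ++iFile) {
    const auto it = files[iFile].find(category);
    // A file without this category simply contributes zero entries.
    const unsigned nEntries = (it != files[iFile].end()) ? it->second : 0;
    if (remaining < nEntries) {
      return {iFile, remaining};
    }
    remaining -= nEntries;
  }
  throw std::out_of_range("No entry " + std::to_string(iEntry) + " for category '" + category + "'");
}
```

Because each category is counted separately, a fast category can already be in the second file while a slow one still points into the first, which is exactly the situation that forces the reader to switch back.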
I think most of the functionality can be put into the track keeping structure, such that the
The structure that keeps track of which entry is in which file would look something like this (from an interface point of view):
struct EntryFileMap {
  EntryFileMap(const std::vector<std::string>& filenames, const std::vector<podio::SIOFileTocRecord>& tocRecords);
  std::tuple<sio::ifstream&, unsigned> getFileAndPosition(const std::string& category, unsigned iEntry) const;
};
For the implementation one would probably go with something like a
std::vector<std::string> m_filenames; ///< All the file names
sio::ifstream m_currentStream; ///< The currently open stream / file
unsigned m_currentStreamIdx; ///< The index in the filenames vector of the currently open stream
std::vector<SIOFileTocRecord> m_fileTOCRecords; ///< The toc records of each file
Then
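To illustrate the idea of keeping only one file open at a time, here is a rough sketch of the switching logic. It uses `std::ifstream` as a stand-in for `sio::ifstream`, and the class name `LazyFileSwitcher` and its `nOpens` counter are hypothetical, added only to make the behaviour observable. Note that switching files changes the object's state, so the accessor in this sketch is non-const; a const interface as proposed above would presumably need `mutable` members or a similar refinement.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Sketch only: std::ifstream stands in for sio::ifstream.
class LazyFileSwitcher {
public:
  explicit LazyFileSwitcher(std::vector<std::string> filenames) : m_filenames(std::move(filenames)) {}

  // Return the stream for file iFile, reopening only if another file is requested.
  std::ifstream& getStream(std::size_t iFile) {
    if (iFile >= m_filenames.size()) {
      throw std::out_of_range("File index out of range");
    }
    if (!m_currentStream.is_open() || iFile != m_currentStreamIdx) {
      if (m_currentStream.is_open()) {
        m_currentStream.close(); // at most one open file handle at any time
      }
      m_currentStream.clear();
      m_currentStream.open(m_filenames[iFile], std::ios::binary);
      if (!m_currentStream.is_open()) {
        throw std::runtime_error("Cannot open file '" + m_filenames[iFile] + "'");
      }
      m_currentStreamIdx = iFile;
      ++m_nOpens;
    }
    return m_currentStream;
  }

  unsigned nOpens() const { return m_nOpens; } // how often a file was (re)opened

private:
  std::vector<std::string> m_filenames; // all the file names
  std::ifstream m_currentStream;        // the currently open stream / file
  std::size_t m_currentStreamIdx{0};    // index of the currently open file
  unsigned m_nOpens{0};
};
```

Repeated requests for the same file reuse the open stream, so the cost of switching is only paid when categories actually diverge across files.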
Hi @tmadlener, |
Glad to hear it :) And obviously very happy to help along the way. To answer your specific questions:
Entries will always be fully within one file, unless there is a problem with the file. However, in that case there is not too much we can do in any case.
For the first version we simply go the easiest way: if we find an issue in any of the files during TOC reading, we will just throw an exception, as there is again not too much we can do with faulty input files.
I would start with something along the lines of my original comment. I am fairly certain it will need some refinement that we will discover throughout the development.
No, not really at this point. |
Currently the SIOReader does not support automatic file switching.