Multiple files as input of EventSource #1821

Closed
aleberti opened this issue Jan 12, 2022 · 3 comments · Fixed by #1998
@aleberti

Currently, passing more than one file as input_url to EventSource (specifically, I am using the MAGICEventSource from https://github.com/cta-observatory/ctapipe_io_magic), e.g. as a string with wildcards, raises an error (a TraitError, to be precise).

Is there a way, if the logic allows it, to pass several files at once? The issue I have is that in MAGIC data files (at calibrated level; one file per subrun, and each run has multiple subruns) the pointing information from the drive reports is stored in the same file as the events, and it sometimes happens that a given subrun has only one drive report available (this can happen for different reasons, e.g. the subrun is too short; usually they are ~1 min long and have ~7-8 reports available). This makes it impossible to interpolate the pointing position of the events from the drive reports.

Therefore, if I could pass all the subruns belonging to the same run to the EventSource, this problem would disappear. This would also improve the interpolation, since interpolating subrun by subrun results in extrapolation at the edges of each subrun.
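To make the interpolation problem concrete, here is a minimal sketch in plain Python. It is not the code used in ctapipe_io_magic; the function name interpolate_pointing and all numbers are made up for illustration, and report times are assumed to be strictly increasing.

```python
from bisect import bisect_right


def interpolate_pointing(event_time, report_times, report_values):
    """Linearly interpolate one pointing coordinate at event_time.

    Hypothetical helper: raises ValueError if fewer than two reports are
    available or if event_time lies outside the reports (extrapolation).
    """
    if len(report_times) < 2:
        raise ValueError("need at least two drive reports to interpolate")
    if not report_times[0] <= event_time <= report_times[-1]:
        raise ValueError("event time outside the drive reports")
    # Index of the first report after event_time, clamped to the last report
    hi = min(bisect_right(report_times, event_time), len(report_times) - 1)
    lo = hi - 1
    t0, t1 = report_times[lo], report_times[hi]
    v0, v1 = report_values[lo], report_values[hi]
    return v0 + (v1 - v0) * (event_time - t0) / (t1 - t0)


# Made-up numbers: times in seconds, zenith in degrees
subrun_1 = ([0.0, 10.0, 20.0], [30.0, 30.5, 31.0])
subrun_2 = ([30.0], [31.5])  # short subrun with a single drive report

# Pooling the reports of the whole run makes the event at t=25 s usable
run_times = subrun_1[0] + subrun_2[0]
run_values = subrun_1[1] + subrun_2[1]
zenith = interpolate_pointing(25.0, run_times, run_values)
```

With subrun_2 alone, the call fails because there is only one report; with the reports of the whole run, the event between the two subruns falls inside the interpolation range.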

Do you have any suggestions on how to do this without changing the EventSource class? Or do you suggest adopting a solution similar to ctapipe_io_lst, i.e. allowing a file containing the pointing information from the drive reports to be given as input? In the latter case, we would need to pre-process the files before the stage1 process (MAGIC calibrated-->DL1) to extract the drive reports and create this file to give as input to the MAGICEventSource.

Thanks!

@maxnoe
Member

maxnoe commented Jan 12, 2022

@aleberti Actually, for ctapipe_io_lst we do need to read multiple files as well: the four parallel streams of LST.

Given LST-1.1.Runxxxx.yyyyy.fits.fz as input_url, the source looks for LST-1.{2,3,4}.Runxxxx.yyyyy.fits.fz at the same path to read all files (here in parallel).

You could do the same for the MAGIC event source: give the path to the first subrun as input_url and let the event source find the other subruns automatically.
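A rough sketch of that lookup, using only the standard library. This is not the actual ctapipe_io_lst implementation; the regular expression and the name sibling_stream_paths are assumptions based on the file name pattern above.

```python
import re
from pathlib import Path

# Assumed name layout: LST-1.<stream>.Runxxxx.yyyyy.fits.fz
STREAM_RE = re.compile(r"^(?P<prefix>LST-\d+)\.(?P<stream>\d+)\.(?P<rest>Run.+)$")


def sibling_stream_paths(input_url, n_streams=4):
    """Return candidate paths for all parallel streams of one file.

    Hypothetical helper: the paths are only constructed here, the caller
    still has to check which of them exist.
    """
    input_url = Path(input_url)
    match = STREAM_RE.match(input_url.name)
    if match is None:
        raise ValueError(f"unexpected file name: {input_url.name}")
    return [
        input_url.with_name(f"{match['prefix']}.{stream}.{match['rest']}")
        for stream in range(1, n_streams + 1)
    ]
```

A MAGIC source could follow the same idea with a pattern matching its own subrun naming scheme, which would have to be filled in by someone who knows the MAGIC file names.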

@aleberti
Author

OK, I did not notice that! Thanks, I will try that then.

@cta-observatory cta-observatory deleted a comment from aleberti Jan 12, 2022
@maxnoe
Member

maxnoe commented Jan 12, 2022

But I think the general issue is valid. The base class currently assumes that there is always exactly one input_url and that this input_url refers to a single existing file (or a file that can be downloaded from a URL or that is in the dataset path).

That is very much ingrained in the EventSource plugin system, which looks for an EventSource subclass for which is_compatible(input_url) is true.

I don't think changing that is possible; the whole design is built around the fact that you give an input_url from which the event source to be used can be determined.
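As a toy illustration of that lookup (not ctapipe's actual code; ToySource, its subclasses and pick_source are hypothetical names), the selection depends on a single input_url being handed to each candidate class:

```python
class ToySource:
    """Hypothetical stand-in for an event source base class."""

    def __init__(self, input_url):
        self.input_url = input_url

    @classmethod
    def is_compatible(cls, input_url):
        # The base class itself accepts nothing
        return False


class ToyFitsSource(ToySource):
    @classmethod
    def is_compatible(cls, input_url):
        return str(input_url).endswith(".fits.fz")


class ToyRootSource(ToySource):
    @classmethod
    def is_compatible(cls, input_url):
        return str(input_url).endswith(".root")


def pick_source(input_url):
    """Return an instance of the first direct subclass accepting input_url."""
    for subclass in ToySource.__subclasses__():
        if subclass.is_compatible(input_url):
            return subclass(input_url)
    raise ValueError(f"no source found for {input_url}")
```

With a list of files or a wildcard string there is no single value to test, which is why finding the remaining files inside the source itself fits this design better.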

Maybe we should relax the requirements that the EventSource base class currently checks:

input_url = Path(
    directory_ok=False,
    exists=True,
    help="Path to the input file containing events.",
).tag(config=True)

Requiring an existing file in the base class seems very restrictive with respect to what event sources can do.
