-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added additional MC header information #921
Added additional MC header information #921
Conversation
- added missing information to the MCHeaderContainer - filled new entries in SimTelEventSource - added some of the new entries to the unit test - MCHeaderContainer should now contain every information provided in the MCRunHeader of the simtel files
One question I have here is whether or not we need a Container for this mc header info at all, or if we can just put it into the metadata of a container (e.g. event.mc.meta), which is a dictionary of headers. We could just copy the entire dict there in one line, and do no mapping of name to item. Certainly since it doesn't change per event, we will eventually move it outside of the event data structure anyhow. However, i may be missing a use case. The way the output files work when using a |
Codecov Report
@@ Coverage Diff @@
## master #921 +/- ##
==========================================
- Coverage 81.83% 78.42% -3.41%
==========================================
Files 184 191 +7
Lines 10526 10792 +266
==========================================
- Hits 8614 8464 -150
- Misses 1912 2328 +416
Continue to review full report at Codecov.
|
I approved these changes just so that we can perhaps discuss simplifying it as header info later (it doesn't hurt to keep the current implementation for now) |
Yes, writing multiple input files to the same output file |
see #922 |
Yes that could be useful. Ok, let's keep it as a Container, and it will perhaps move elsewhere. That means that we should also not blindly use the same field names as simtelarray, they should somehow be "general" to any simulation code, so we can consider renaming any that are not clear, or expanding some abbreviations at the same time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add the run id (or obsid), however we want to call it, to this container.
ctapipe/io/containers.py
Outdated
corsika_low_E_model = Field(0.0, "Detector MC information") | ||
corsika_high_E_model = Field(0.0, "Detector MC information") | ||
corsika_bunchsize = Field(0.0, "Detector MC information") | ||
corsika_wlen_min = Field(0.0, "Detector MC information") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minimum wavelength of cherenkov light
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't a unit be there as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add the run id (or obsid), however we want to call it, to this container.
That can be in a separate container that is written at the same time as this one - otherwise we'd have to reproduce it for every container we want to have a run_id, etc in. However, having the corsika run id might be useful, since so far the "obs_id" is filled as the simtelarray run id, but we lose the original corsika id (which can be different).
ctapipe/io/containers.py
Outdated
corsika_iact_options = Field(0.0, "Detector MC information") | ||
corsika_low_E_model = Field(0.0, "Detector MC information") | ||
corsika_high_E_model = Field(0.0, "Detector MC information") | ||
corsika_bunchsize = Field(0.0, "Detector MC information") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
number of photons per bunch (photons are not simulated individually but as bunches of n photons)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is almost ready to merge, other than a small comment:
In some of the fields like num_showers
, corsika_low_E_model
, etc, the default value is a float (0.0), but those are integer quantities. Right now the way the output works is that it takes the type from the default value, or the currently filled value (which ever is present when the data are written), so it's best to use the correct type (e.g. 0 for integers).
Alternately, we could set the default for all of these fields in the container to np.NaN
, which would ensure that if an EventSource
doesn't fill in that info, it's clear that it is unknown. Otherwise the default will be used, which may not be correct (e.g. the low-energy model 0 is perhaps not what was used). In the future, it would of course be good to map some of these arcane integer numbers to strings so it's more clear, but that can happen later.
Agreed, using |
- switched to np.nan as default values for the MCHeaderContainer (instead of 0.0) - (kept "None" strings)
Am i missing something or is the problem with the Travis Build not connected to this PR? I dont see how my changes would affect ctapipe/io/tests/test_hdf5.py |
The master is currently broken. Not your fault at all. We are working on it. |
Ah, i thought that was fixed already. Thanks |
We broke it again last friday |
Oh |
Apart from the master, which is currently broken, so the tests fail. This PR needs to be reviewed and approved by two ctapipe developers. I think this has not been done yet? |
I think this is ready right? @maxnoe do you still have a problem? |
No |
This is basically a continuation of #895 and #655.
MCRunHeader information from the simtel files gets stored in the MCHeaderContainer.
The Container should now contain every information of the MCRunHeader stored in the simtel files.
Item description is mainly taken from the hessio documentation at:
https://www.mpi-hd.mpg.de/hfm/~bernlohr/sim_telarray/Documentation/hessio_refman.pdf (page 80-81)
Range arrays are stored as max and min values (e.g. shower azimuth range).
Unfortunately I do not really know how corsika works and the hessio documentation does not mention
anything about the meaning of the corsika_* parameters. Those Descriptions should probably be updated
with something more meaningful.