Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved configuration of reference metadata #1889

Merged
merged 5 commits into from
Jun 22, 2022

Conversation

kosack
Copy link
Contributor

@kosack kosack commented Apr 22, 2022

use case:

The user expects that the metadata in the output file's header is useful, for example to build a file catalog necessary to retrieve the data later.

what this PR adds:

Part of the reference metadata was already configurable (the Contact info), this adds two additional user-configurable parts:

  • the Instrument metadata (not to be confused with the InstrumentDescription, this is just id information about the instrument, in this case the subarray used). This lets the user set the subarray name, version, and other instrument-related attributes. If they are not specified, they are automatically filled as before (or left unspecified)
  • CONTEXT metadata, which is fully user-defined and allows an additional set of context-specific headers to be added as key-value pairs.

changes

  • Added user-defined CONTEXT metadata, which can be added to the config
    file
  • Made metadata.Instrument a configurable class similar to
    metadata.Contact so values can be overridden in config files

The result is one can create a config file for a subarray definition like:

EventSource:
  allowed_tels: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 19, 35]

DataWriter:
  Instrument:  # this is the new configurable info
    site: CTA-North
    class_: Subarray
    id_: alpha
    version: Prod5b

And that information will be written to the file's reference headers:

ctapipe-fileinfo output.h5
...
        INSTRUMENT:
            CLASS: Subarray
            ID: alpha
            SITE: CTA-North
            SUBTYPE: unspecified
            TYPE: unspecified
            VERSION: Prod5b
...

Similarly the arbitrary CONTEXT metadata can be set as follows:

DataWriter:
  context_metadata:
    "SIMULATION PRODUCTION": prod5b
    "SIMULATION DATASET":  Prod5b_Paranal_AdvancedBaseline_NSB1x_South_40deg
    "DESCRIPTION": "This is just a test"

Using spaces between keys allows a hierarchy (we could also support dots or similar)

ctapipe-fileinfo output.h5
...
    CONTEXT:
        DESCRIPTION: This is just a test
        SIMULATION:
            DATASET: Prod5b_Paranal_AdvancedBaseline_NSB1x_South_40deg
            PRODUCTION: prod5b
...

This is useful for example for storing parameters that are not automatically read from the MC files into the SimulationConfigContainer , or for storing things like calibration versions, etc.

Still todo:

  • make tests

issues

  • due to how the config system works, you cannot "chain" context metadata, meaning if you specify it in two included config files, the latter one fully replaces the former's dictionary, rather than adding to it.
  • the id_ and class_ attributes have underscores to avoid linter/IDE complaints about python name conflicts (even if they do not conflict). This is a bit ugly, could consider removing them and just deal with complaints

- Added user-defined CONTEXT metadata, which can be added to the config
file
- Made `metadata.Instrument` a configurable class similar to
`metadata.Contact` so values can be overridden in config files

The result is one can create a config file containing:

```yaml
DataWriter:
  Instrument:
    site: CTA-North
    class_: Subarray
    id_: alpha
    version: Prod5b
```
And that information will be written to the file's reference headers:

```sh
ctapipe-fileinfo output.h5
...
        INSTRUMENT:
            CLASS: Subarray
            ID: alpha
            SITE: CTA-North
            SUBTYPE: unspecified
            TYPE: unspecified
            VERSION: Prod5b
...
```
@codecov
Copy link

codecov bot commented Apr 22, 2022

Codecov Report

Merging #1889 (c8d3618) into master (0612a33) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1889      +/-   ##
==========================================
+ Coverage   92.04%   92.06%   +0.02%     
==========================================
  Files         189      189              
  Lines       14842    14881      +39     
==========================================
+ Hits        13661    13700      +39     
  Misses       1181     1181              
Impacted Files Coverage Δ
ctapipe/io/datawriter.py 90.16% <100.00%> (+0.43%) ⬆️
ctapipe/io/metadata.py 96.46% <100.00%> (+0.06%) ⬆️
ctapipe/io/tests/test_datawriter.py 100.00% <100.00%> (ø)
ctapipe/tools/process.py 92.13% <100.00%> (+0.08%) ⬆️
ctapipe/core/tests/test_component.py 97.83% <0.00%> (-0.23%) ⬇️
ctapipe/core/tool.py 93.39% <0.00%> (+0.03%) ⬆️
ctapipe/core/component.py 97.64% <0.00%> (+1.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0612a33...c8d3618. Read the comment docs.

@maxnoe
Copy link
Member

maxnoe commented Apr 22, 2022

the id_ and class_ attributes have underscores to avoid linter/IDE complaints about python name conflicts (even if they do not conflict).

This is true for id which is a builtin function that can be shadowed, but not for class which is a keyword.

@kosack
Copy link
Contributor Author

kosack commented Apr 22, 2022

This is true for id which is a builtin function that can be shadowed, but not for class which is a keyword

True, so maybe I'll just stick with the underscores (since they are at least needed for the id column). It's not too bad. Or I could rename it inst_id, inst_class.

@kosack kosack marked this pull request as ready for review April 22, 2022 11:32
kosack added 3 commits April 22, 2022 13:38
@kosack kosack added this to the v0.14.1 milestone Apr 22, 2022
@HealthyPear HealthyPear modified the milestones: v0.15.1, v0.16.0 May 23, 2022
@kosack kosack requested a review from maxnoe June 9, 2022 08:44
@kosack kosack mentioned this pull request Jun 9, 2022
@maxnoe maxnoe merged commit cbb01b5 into cta-observatory:master Jun 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants