Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow YAML and TOML config files and multiple configs #1856

Merged
merged 42 commits into from
Apr 14, 2022

Conversation

kosack
Copy link
Contributor

@kosack kosack commented Mar 22, 2022

fixes #1047
fixes #1732

This adds a function to load a YAML or TOML config file just like a JSON or Python one. There is more to do to make this fully useful, but it works as is.

It also adds the ability to specify multiple config files for a tool (by using repeated --config options, or a list in a config file itself).

For example the config file can look like:

DataWriter:
  Contact:
    # please fill in your contact information here. It will be stored in the
    # output files as provenance information
    name: YOUR-NAME-HERE
    email: [email protected]
    organization: YOUR-ORGANIZATION

  # options that control what is stored in the output file
  overwrite: false # do not overwrite existing files
  write_images: true # store DL1 images
  write_parameters: true # store DL1 parameters
  write_stereo_shower: false # store DL2 stereo geometry

CameraCalibrator:
  image_extractor_type: NeighborPeakWindowSum

ImageProcessor:
  image_cleaner_type: TailcutsImageCleaner

  # make sure you include a configuration for the image cleaner you selected
  # above here. The section named by the image_cleaner_type will be used to
  # configure it.
  TailcutsImageCleaner:
    # the thresholds for this image cleaner must be optimized for the data set
    # you are analyzing. The defaults may not be correct.
    picture_threshold_pe:
      - [type, "*", 10.0]
      - [type, LST_LST_LSTCam, 5.0]
      - [type, MST_MST_NectarCam, 5.0]
      - [type, SST_ASTRI_CHEC, 3.0]
    boundary_threshold_pe:
      - [type, "*", 5.0]
      - [type, LST_LST_LSTCam, 2.5]
      - [type, MST_MST_NectarCam, 2.5]
      - [type, SST_ASTRI_CHEC, 1.5]
    min_picture_neighbors:
      - [type, "*", 2]

  # These specify which images should be parameterized:
  ImageQualityQuery:
    quality_criteria:
      - [enough_pixels, "lambda im: np.count_nonzero(im) > 2"]
      - [enough_charge, "lambda im: im.sum() > 50"]

ShowerProcessor:
  # These specify criteria for telescopes that should be included in stereo
  # reconstruction:
  ShowerQualityQuery:
    quality_criteria:
      - [enough intensity, "lambda p: p.hillas.intensity > 50"]
      - [Positive width, "lambda p: p.hillas.width.value > 0"]
      - [enough pixels, "lambda p: p.morphology.num_pixels > 3"]

@maxnoe
Copy link
Member

maxnoe commented Mar 22, 2022

I am not such a huge fan of yml anymore... way too complex. I'd go for toml to have a simple config file format does not have the short comings of json / the unlimited power of py files

@kosack kosack changed the title Feature allow yaml configs Feature: allow yaml configs Mar 22, 2022
@codecov
Copy link

codecov bot commented Mar 22, 2022

Codecov Report

Merging #1856 (ca66dba) into master (d254451) will increase coverage by 0.03%.
The diff coverage is 94.64%.

@@            Coverage Diff             @@
##           master    #1856      +/-   ##
==========================================
+ Coverage   92.00%   92.04%   +0.03%     
==========================================
  Files         189      189              
  Lines       14805    14842      +37     
==========================================
+ Hits        13622    13661      +39     
+ Misses       1183     1181       -2     
Impacted Files Coverage Δ
ctapipe/core/tool.py 93.36% <90.62%> (+0.13%) ⬆️
ctapipe/tools/quickstart.py 77.58% <100.00%> (ø)
ctapipe/tools/tests/test_merge.py 97.18% <100.00%> (ø)
ctapipe/tools/tests/test_process.py 95.27% <100.00%> (+0.78%) ⬆️
ctapipe/io/datawriter.py 89.72% <0.00%> (+1.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d254451...ca66dba. Read the comment docs.

@kosack
Copy link
Contributor Author

kosack commented Mar 22, 2022

There is also an old function tool.export_tool_config_to_commented_yaml that could probably be cleaned up and resurrected once this PR is merged.

ctapipe/core/tool.py Outdated Show resolved Hide resolved
@kosack
Copy link
Contributor Author

kosack commented Mar 22, 2022

I am not such a huge fan of yml anymore... way too complex. I'd go for toml to have a simple config file format does not have the short comings of json / the unlimited power of py files

Could add TOML to this as well pretty easily. However, for nested configs, I really don't like TOML as it is harder to quickly see how it is nested. Simple YAML is pretty easy to read, I think, even if it does support some really crazy stuff. Anything is better than JSON, which is not really a configuration language, but rather just a serialization format (no comments, etc).

@kosack
Copy link
Contributor Author

kosack commented Mar 22, 2022

Now also supports TOML:

[DataWriter]
overwrite = false
write_images = true
write_parameters = true
write_stereo_shower = false
write_mono_shower = false
transform_image = true
transform_peak_time = true

[CameraCalibrator]
image_extractor_type = "NeighborPeakWindowSum"

[ImageProcessor]
image_cleaner_type = "TailcutsImageCleaner"

[DataWriter.Contact]
name = "YOUR-NAME-HERE"
email = "[email protected]"
organization = "YOUR-ORGANIZATION"

[ImageProcessor.TailcutsImageCleaner]
picture_threshold_pe = [ [ "type", "*", 10.0,], [ "type", "LST_LST_LSTCam", 5.0,], [ "type", "MST_MST_NectarCam", 5.0,], [ "type", "SST_ASTRI_CHEC", 3.0,],]
boundary_threshold_pe = [ [ "type", "*", 5.0,], [ "type", "LST_LST_LSTCam", 2.5,], [ "type", "MST_MST_NectarCam", 2.5,], [ "type", "SST_ASTRI_CHEC", 1.5,],]
min_picture_neighbors = [ [ "type", "*", 2,],]

[ImageProcessor.ImageQualityQuery]
quality_criteria = [ [ "enough_pixels", "lambda im: np.count_nonzero(im) > 2",], [ "enough_charge", "lambda im: im.sum() > 50",],]

[ShowerProcessor.ShowerQualityQuery]
quality_criteria = [ [ "enough intensity", "lambda p: p.hillas.intensity > 50",], [ "Positive width", "lambda p: p.hillas.width.value > 0",], [ "enough pixels", "lambda p: p.morphology.num_pixels > 3",],]

ctapipe/core/tool.py Outdated Show resolved Hide resolved
@kosack
Copy link
Contributor Author

kosack commented Mar 22, 2022

Probably there is a much nicer way to format the TOML version... I need to look at the spec. That was an auto-generated conversion from the JSON. I guess there are ways to wrap the list of lists we use for TelescopeConfigs and QualityCriteria

@kosack kosack changed the title Feature: allow yaml configs Feature: allow YAML and TOML config files Mar 22, 2022
ctapipe/core/tool.py Outdated Show resolved Hide resolved
@kosack
Copy link
Contributor Author

kosack commented Mar 22, 2022

Any idea how to write nested lists on multiple lines correctly in TOML? I tried what I thought the manual said, and it doesn't work: This Fails to read:

[ImageProcessor.TailcutsImageCleaner]

picture_threshold_pe = [
    [ "type", "*", 10.0,],
    [ "type", "LST_LST_LSTCam", 5.0,],
    [ "type", "MST_MST_NectarCam", 5.0,],
    [ "type", "SST_ASTRI_CHEC", 3.0,],
]

Ah... TOML seems to be super picky about spaces after sections. Even Github won't render it

 toml.decoder.TomlDecodeError: Not a homogeneous array (line 28 column 1 char 634)

@kosack
Copy link
Contributor Author

kosack commented Mar 22, 2022

So TOML so far doesn't work with the TelescopeParameters...

  1. It seems to require that all elements of an array are of the same format, so the nested arrays all have to be strings (then it stops complaining about not being able to parse the file
  2. TelescopeParameter then of course fails to validate the value since it expects a float not a string.

Point 1 seems overly picky. Not sure if this is a problem with the implementation (I'm using toml not tomli), or the specification of TOML itself

@maxnoe
Copy link
Member

maxnoe commented Mar 22, 2022

TOML works fine, the toml library does not fully support toml 1.0. Using tomli works fine:

In [1]: import tomli

In [2]: !cat test.toml
[cuts]
foo = [
	["type", "*", 10.0],
	["type", "LST", 4.0],
	["id", 1, 4.0],
]

In [3]: tomli.load(open('./test.toml', "rb"))
Out[3]: {'cuts': {'foo': [['type', '*', 10.0], ['type', 'LST', 4.0], ['id', 1, 4.0]]}}

So I'd just try tomli and not toml

@kosack
Copy link
Contributor Author

kosack commented Mar 22, 2022

Yes tomli seems to work.

@kosack
Copy link
Contributor Author

kosack commented Apr 11, 2022

This reminds me one of the future PRs has to be to make these example files be generated automatically, including help text from the classes and trait options (similar to what traitlets.config does when it writes out sample python config files). For that, I will need to also add a way to specify which sections/classes to include in the example. Right now there are both the internal class and trait descriptions, and comments in the sample config files. That is unnecessary repetition, and things need to be updated in both places. I have to check if YAML or TOML lets you also include comments in the serialized output, or if it has to be done manually... I think not currently (except for some other implementations of yaml).

@kosack
Copy link
Contributor Author

kosack commented Apr 12, 2022

Will need to update the config files when #1875 is merged

@maxnoe maxnoe modified the milestones: v0.14.0, v0.15.0 Apr 13, 2022
Copy link
Member

@maxnoe maxnoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move the configs from ctapipe/tools/tests/resources to ctapipe/resources? They are not really only test resources anymore

@maxnoe maxnoe modified the milestones: v0.15.0, v0.14.0 Apr 13, 2022
Copy link
Member

@maxnoe maxnoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kosack kosack merged commit 0612a33 into cta-observatory:master Apr 14, 2022
@kosack kosack deleted the feature/allow_yaml_configs branch April 14, 2022 09:23
nbiederbeck pushed a commit that referenced this pull request Apr 14, 2022
* support loading config files from a YAML file
* a sample YAML version of the config
* add yaml example to quickstart
* fixed typo
* add TOML support
* allow multiple config files, fixes #1732
* add config files to provenance tracker
* updated help string for config_file
* a better config file with lots of explanation
* change Tool.config_file -> config_files since list
* fix incorrect f-string (now a log statement)
* update to reflect the new ShowerProcessor
* move ctapipe.tools.test.resources to ctapipe.resources
* require pyyaml and remove HAS_YAML
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to use multiple config files add support for YAML config files in core.Tool
3 participants