Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DL1 parameters names and units #37

Closed
vuillaut opened this issue Nov 22, 2018 · 6 comments
Closed

DL1 parameters names and units #37

vuillaut opened this issue Nov 22, 2018 · 6 comments
Labels
enhancement New feature or request

Comments

@vuillaut
Copy link
Member

One of the advantages of using the DL1 container is to fix the names of the parameters that will be dumped and therefore used for later stages:

In #30 I didn't want to change conventions chosen by @misabelber to have the exact same chain.

However, there are some things I'd like to change/discuss:

  • intensity is now by default given as log, therefore setting a min=0. Sure intensity should be > 0 after tailcut but we can imagine a different set of calibration/cleaning that could change that in the future. At minima, the variable should be renamed log_intensity to keep track of this info somewhere.
  • energy is now computed as log(E/GeV) while the container use the default unit used in ctapipe: TeV. The container makes the conversion to TeV based on astropy.units whatever the unit given as an entry, ensuring consistency.
  • impact and mc_core_distance are actually the same. Is there a name carrying more meaning that you'd prefer to keep?

note: removing log will need to update the DL2 stage as RF without scaler will probably not perform as well. However applying log could be at this stage, or another scaler could be used.

@misabelber
Copy link
Collaborator

I'm not sure what to do about the two first points. Keeping the log scaling in the container makes it easier to use the RF without having to re-scale these quantities during the analysis chain. I'm okay with maybe changing the names to "log_whatever" and to keep the energy in TeV instead of GeV. I preferred to use GeV just to avoid negative values in the log.
About impact and mc_core_distance, we can just remove "impact", I think this is a residual from old tests that has remained there.

@maxnoe
Copy link
Member

maxnoe commented Mar 7, 2019

@misabelber Please have a look at https://github.com/fact-project/aict-tools, especially the scripts train_energy_regressor,
https://github.com/fact-project/aict-tools/blob/master/aict_tools/scripts/train_energy_regressor.py

and apply_energy_regressor:
https://github.com/fact-project/aict-tools/blob/master/aict_tools/scripts/apply_energy_regressor.py

especially stuff like log_target, feature selection, feature preparation, model serialization etc.
These scripts contain a lot of stuff that fixes several pitfalls we experienced in the last couple of years, e.g.
scikit-learns trees only being 32 bit floats, random seeds, pickling models correctly together with the features they used, making sure features are always sorted alphabetically, so they are correctly used when loading a pickled model etc. etc.

@misabelber
Copy link
Collaborator

Thanks @maxnoe, I'll take a look! Seems to be very useful information

@rlopezcoto
Copy link
Contributor

any further action was taken specifically on these points or was it merged with some DL1 enhancements #141?

@rlopezcoto rlopezcoto added the enhancement New feature or request label Oct 17, 2019
@vuillaut
Copy link
Member Author

any further action was taken specifically on these points or was it merged with some DL1 enhancements #141?

Not quite in my opinion.

  1. There is no information on units.
  2. Naming is not explicit for variables in log.

Taking care of 2. should be quick (I can do it before release of 0.1 if you want)
Point 1. would need some working...

@vuillaut
Copy link
Member Author

See #194 and issue #132 for units.
Closing this one now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants