Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stats calc tool #2628
base: main
Are you sure you want to change the base?
Stats calc tool #2628
Changes from all commits
4b82a4f
09e49c8
1eefbff
18b98a0
82a4641
cb2e3fe
ba7bb0e
cb94b54
82afdae
d62982f
cca5197
3d91028
092c1e1
3548c09
b0c197c
d3d3304
b6587ec
9f8268f
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd not add this as
input_column_name
but as prefix for the columns. Using prefixes you can also aggregate multiple columns easily.See e.g. how we write tables in
ctapipe-apply-models
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maxnoe , we discussed this with @TjarkMiener offline and we don't see much use cases for this. Also, having the metadata as a prefix for the column name can make queries awkward and schemas inflexible (i.e. set of aggregators will be fixed for each input source). Also, in this case, something like pandas.MultiIndex will be more appropriate in my opinion, but again, I don't see much use. I'd leave this as is for the moment, and modify if a corresponding use case appears.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usecase is consistency with what we doe elsewhere and loading such data with e.g.
TableLoader
.I.e. my use case would be to load these data and look add aggregated statistics for multiple columns.
This has to match waht e.g. is planned for quality pipe, right? Aggregating high-level information for many of the data columns and other things and loading them at the same time, producing control plots etc.
But fine to do in a follow up PR.
I really don't like the limitation to a single column though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me what you want sounds like a pandas.MultiIndex, which is not supported in the current way of astropy, and I'm not sure whether it will ever be supported like that... Also, aggregating (flattening) and joining can be done elsewhere (e.g. when QualPipe reads it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that independent of the columns names / data organization I don't like that this tool can only aggregate a single column, and that using it for multiple requires separate runs of the tool with separate output files, performing the expensive data loading again and again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tool is not only aggregating stats, but also detecting outliers and treating regions of trouble properly. Each column requires a distinct set of configuration parameters. Therefore, I think it is inevitable to perform separate runs of the tool, which are independent on each other.