-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chunk interpolation to select calibration data #2634
base: main
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
2 similar comments
This comment has been minimized.
This comment has been minimized.
Analysis Details1 IssueCoverage and DuplicationsProject ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs |
I am a bit stuck here with one of the tests. test_hdf5 is failing in pytest begause the data from the file is not loaded correctly. When i look in the test what x and y values the interpolators have i get some wrong values. However if i try to do the same outside of pytest it works. I used this code to test by myself:
Has anyone an idea what is wrong here? |
Can also be done in a follow-up PR of course. |
I'd consider having a factory then. I think, just "CalibrationInterpolator" won't make much sense, but a specific ones like "FFInterpolator" may. I'd address it in another PR. |
Yes, factory makes sense for this |
super().__init__(**kwargs) | ||
self._interpolators = {} | ||
self.required_columns = ["start_time", "end_time"] | ||
self.expected_units = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these instance variables? And why are they empty?
This is not in line with how the other class works.
It seems this class variable is unused even.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here required_columns
shall become class attributes, but the interpolators and expected units shall remain instance variable, to allow creation of multiple instances. We have to see in the future PR, whether we want further specialization (e.g. a factory of VarNameInterpolator
s), that may lead to change this (they will basically become singletons).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expected units are unused because the quantity is dimensionless, we may want to actually enforce this through u.dimensioneless_unscaled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had implemented that for collumns that are supposed to not have a unit we set the expected unit to None and check if the actual unit is equivalent. I put it like that.
def __init__(self, h5file: None | tables.File = None, **kwargs: Any) -> None: | ||
super().__init__(**kwargs) | ||
self._interpolators = {} | ||
self.required_columns = ["start_time", "end_time"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
required columns as start and stop time shall be the class attributes and shall be frozen, they are mandatory. You can copy them to an instance and extend with a value column(s) when you engage a __call__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as the functions that use required_columns are in the parent class and always look to that name it makes more sense to have it as a modifiable instance variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. The original design idea was to have the required_columns frozen per "final" class. I.e. MonitoringInterpolator
requires altitude
and azimuth
columns.
This is due to the configuration system in ctapipe, which works on class name basis, not instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the most sensible way to keep the required units and columns as class variables is to have pedestal and flatfield interpolators that inherit from the ChunkInterpolator. The ChunkInterpolator now has no required columns or units, but rather the subclasses have those variables. If we use the FlatFieldInterpolator we know we will always look for a column with relative gain factors with no unit, similarly we know what data a PedestalInterpolator will need.
I made those classes, and if we want to use the Chunk interpolation later for some other data we can add another subclass.
self.required_columns.update(columns) | ||
self.required_columns = set(self.required_columns) | ||
for col in columns: | ||
self.expected_units[col] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here i set the unit of the new columns to None, which is then in the next line enforced by _check_tables. This way we ensure the values have no unit.
|
||
for column in self.columns: | ||
self.values[tel_id][column] = input_table[column] | ||
self.start_time[tel_id][column] = input_table["start_time"].to_value("mjd") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why store start_time
and end_time
per column?
raise ValueError( | ||
f"Column '{column}' not found in interpolators for tel_id {tel_id}" | ||
) | ||
result[column] = self._interpolators[tel_id][column](mjd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use self._interpolators
, why keep that around at all?
Why not just call self._interpolate_chunk(tel_id, column, mjd)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read_table
checks if _interpolators
has already been set up for the given tel_id. _check_interpolators
in MonitoringInterpolator
also does that check and adds data from hdf5file if the interpolator is not set. I can move column to be an argument of _interpolate_chunk
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because you didn't move _check_interpolators
to the LinearInterpolator
but kept it in MonitoringInterpolator
. The MonitoringInterpolator
still contains more than just the abstract interface.
@maxnoe can you please take a look? I believe, all your comments have been addressed. |
self.values[tel_id] = {} | ||
self.start_time[tel_id] = input_table["start_time"].to_value("mjd") | ||
self.end_time[tel_id] = input_table["end_time"].to_value("mjd") | ||
self._interpolators[tel_id] = partial(self._interpolate_chunk, tel_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we still storing a lambda / partial here in _interpolators
? Why not just call self._interpolate_chunk
directly in __call__
?
I will need a method to select calibration data for the strar tracker. I made some slides to decribe how it is supposed to work:
https://docs.google.com/presentation/d/1oxIcYSQvGnU7IQYy3fGdcv0qXiLpvaXR9YtmnDesj4Y/edit?usp=sharing