-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace WaveformCleaner and ChargeExtractor with WaveformExtractor #1033
Conversation
- Replaces ChargeExtractor - Renamed charge_extractor.py to waveform_reducer.py - WaveformReducer uses `__call__` in place of `extract_charge` - Created extract_pulse_time_weighted_average - WaveformReducer returns charge and pulse_time - Updated names for classes to be more descriptive
- This is done to avoid confusion with the DataVolumeReducer - Rename waveform_reducer to waveform_extractor - Create subtract_baseline and BaselineSubtractedNeighborWindowSum - Replace “neighbours” with “neighbors”
* charge_extractor: Fixes for tools Create abstract classes for easier traitlet configuration Replace masked array usage in extract_charge_from_peakpos_array # Conflicts: # ctapipe/image/tests/test_charge_extraction.py # ctapipe/image/waveform_extractor.py
* charge_extractor: Adopted suggestion for configuration discussed in cta-observatory#1029 # Conflicts: # ctapipe/image/waveform_extractor.py
* numba_neighbors: Add additional possible argument types Remove get_sum_array ctypes-wrapped function Create numba function neighbor_average_waveform Create test of lwt for NeighbourPeakIntegrator # Conflicts: # ctapipe/image/tests/test_charge_extraction.py # ctapipe/image/waveform_extractor.py
Perhaps there is a better name... right now "WaveformExtractor" seems like it extracts waveforms, and you would then get back a waveform. At the end this algorithm returns estimated charges, right? Is that intended to be before or after calibration? I guess before (even if we did it backwards before), but if before it's actually more still a ChargeExtractor (even if there is some internal smoothing), or if there is calibration, it could be an "IntensityExtractor", since the final output is a calibrated light intensity. |
The |
Before flat-fielding / calibration into p.e. Yeh, the simtelarray waveforms are somewhat calibrated into p.e. in their R1Calibrator. But not fully (as the full calibration depends on the charge extraction scheme used, and the correct calib scale). There needs to be an additional calibration step added after the charge extraction, as discussed in the calibration call a few months back. I intend to add it. |
Some ideas for the name:
|
|
* master: Corrections to address comments Add tests to Tools to check that help message works (cta-observatory#1034) make codacy happy Fix neighbors (cta-observatory#1015) Fix component docs (cta-observatory#1016) related to cta-observatory#1013 allow enums in containers and support in tableio # Conflicts: # ctapipe/image/tests/test_charge_extraction.py # ctapipe/image/waveform_extractor.py # ctapipe/tools/extract_charge_resolution.py
- Rename waveform_extractor.py to extractor.py
Codecov Report
@@ Coverage Diff @@
## master #1033 +/- ##
==========================================
+ Coverage 83.36% 83.45% +0.09%
==========================================
Files 188 186 -2
Lines 10663 10531 -132
==========================================
- Hits 8889 8789 -100
+ Misses 1774 1742 -32
Continue to review full report at Codecov.
|
@kosack @dneise @maxnoe @FrancaCassol It would be really appreciated to have this reviewed so I may continue :) |
* master: Make ctapipe-extra truly optional (cta-observatory#1002) Only detect plugins when they need to exist in global Fix for TargetIOR1Calibrator
Sorry for reviewing late and then asking stupid questions. Why was this API change needed/wanted again? I mean this part:
|
The aim is to greatly simplify the charge extraction process. The first discussion on this can be found at #907. To summarise, this seperation of waveform cleaning and charge extraction is unnecessary, as the two are often very closely coupled. An image extractor should define the waveform operations it wants to use in order to extract the charge (and time). One example is a cross correlation charge extraction technique. Applying the cross correlation to a waveform with a reference pulse results in a trace which defines how correlated the waveform is with the pulse for each position in the waveform. This could be considered a waveform cleaning. However, the value of each sample of this result is already an integration of charge, therefore one only needs to select the correct sample. It does not make sense to allow the user to select a window for this charge extraction technique. See my ASWG presentation in Barcelona for more details on the cross correlation technique if you are interested. I don't think the freedom to flexibly chose a waveform cleaner and charge extractor (and mix and match at will) is necessary at run time. Instead, I think it is better to explicitly define individual charge extractors that are deemed to be useful implementations, which the user can select from. |
And a further point, the |
Wow ..long text ... but thanks a lot |
|
||
""" | ||
samples_i = np.indices(waveforms.shape)[2] | ||
pulse_time = np.average(samples_i, weights=waveforms, axis=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance note: line 127 can be pulled out of this function and be done only once per telescope.
I found it interesting to see, that for waveform shapes like (2, 1000, 300) lines 127 and 128 both took equally long, ~1.7ms or so on my machine. So moving line 127 out of this, should reduce runtime by ~50%.
I found an ugly hack to make sample_i
in a faster way using something called stride_tricks
... I would not recommend using it ... but it might be interesting to know about.
In [65]: %%timeit
...: i = np.indices(waveforms.shape)[2]
...: result = np.average(i, weights=waveforms, axis=2)
3.55 ms ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [63]: %%timeit
...: S = waveforms.shape
...: I = np.arange(S[2])
...: i = np.lib.stride_tricks.as_strided(I, (S[0], S[1], I.size), (0, 0, I.itemsize)) # a 3D view, no copy of the data
...: result = np.average(i, weights=waveforms, axis=2)
1.53 ms ± 5.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this is a good point. I also noticed that np.indices is surprisingly slow in my investigations for #1038. It would be my intention to replace this function with a numba for-loop version in a future PR, which should result in a huge improvement in performance.
I did not know about stride_tricks
, thank-you for bringing it up.
ctapipe/image/extractor.py
Outdated
samples_i = np.indices(waveforms.shape)[2] | ||
pulse_time = np.average(samples_i, weights=waveforms, axis=2) | ||
pulse_time[pulse_time < 0] = 0 | ||
pulse_time[pulse_time > waveforms.shape[2]] = waveforms.shape[2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure ... is it possible to have so funny weights, that the pulse_time can be outside of the range [0...waveforms.shape[2]]
? ... I think not... unless ... of course .. when the weights are negative, then the average can be pulled out of range, but I think this should be avoided.
So I played a little bit around with this kind of average ... I used this kind of signal:
As one can see the actual pulse start time and the result of this np.average(sample_i, weights=waveforms)
is clearly correlated as expected.
But as soon as the baseline moves around problems occur:
So I think there needs some care to be taken with respect to these weights. And one should not simply say ... ah ... when the pulse_time is negative .. then we set it to zero ... I think when the pulse_time is out of range, this simply is a strong hint, that something was not good about the range of these weights.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry .. I just see, I labelled my axis wrongly: What I called average_time
is actually called pulse_time
.
So dneise_average_time = pulse_time = np.average(samples_i, weights=waveform)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you are absolutely right. This implementation for obtaining pulse time is very simple, but not very robust. The limit I added for the pulse time to be within the waveform readout was done to make it easier to plot this. Thinking about it now, it could be better to set such unrealistic pulse times to nan (what do you think?).
Or do you suggest only use samples with positive weights in the calculation?
I am inclined to suggest that contributions are made for better pulse time algorithms in future pull requests, which can replace the current function used in the ImageExtractors. Unless you are adamantly against accepting this implementation temporarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not cut at all then. The result of this implementation is (even in the best case) only correlated with and not identical to the arrival time (50% of rising edge) .. so it is difficult to say what the allowed range is ... I think .. so maybe avoiding a cut and leave the cut to the user of this implementation is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No no .. simple implementations are good .. just this cut was puzzling me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that seems somewhat better, thanks. However, with this implementation we would have to handle the case where there are no positive samples, which would involve setting the pulse_time to some value such as zero.
Could we reserve consideration of better pulse_time algorithms for a different PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added it as a task in #1026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed if you set out of range results to nan that might be better, since the user knows there was indeed a cut.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we reserve consideration of better pulse_time algorithms for a different PR?
totally. You asked me do produce these plots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am now through all commits. Nice.
Setting the out-of-bounds pulse-times to nan can cause problems down the chain (some algorithms are not happy with nans existing in an array). This is currently causing an error in travis. I suggest that the pulse time is set to -1 for the time being, and we create a better pulse_time calculation in a different PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This PR replaces the
WaveformCleaner
andChargeExtractor
withImageExtractor
.The
ImageExtractor's
purpose is to extract thecharge
andpulse_time
from a waveform. It can also apply any sort of waveform cleaning, upsampling, interpolation etc. to extract these parameters. Essentially, anImageExtractor
is some combination of the functions defined in extractor.py to obtain thecharge
andpulse_time
.The
extract_charge
method of the oldChargeExtractor
has been replaced to use__call__
instead, as per #1024.An initial algorithm for extracting the
pulse_time
(via a weighted average of the waveform) is also included in this PR. This approach may be replaced with a differentpulse_time
calculation if it is identified as providing a better time resolution.The containers, calibration, and tools have also been updated to handle the new DL1 contents.