Replace WaveformCleaner and ChargeExtractor with WaveformExtractor #1033

watsonjj · 2019-03-26T18:37:52Z

This PR replaces the WaveformCleaner and ChargeExtractor with ImageExtractor.

The ImageExtractor's purpose is to extract the charge and pulse_time from a waveform. It can also apply any sort of waveform cleaning, upsampling, interpolation etc. to extract these parameters. Essentially, an ImageExtractor is some combination of the functions defined in extractor.py to obtain the charge and pulse_time.

The extract_charge method of the old ChargeExtractor has been replaced to use __call__ instead, as per #1024.

An initial algorithm for extracting the pulse_time (via a weighted average of the waveform) is also included in this PR. This approach may be replaced with a different pulse_time calculation if it is identified as providing a better time resolution.

The containers, calibration, and tools have also been updated to handle the new DL1 contents.

- Replaces ChargeExtractor - Renamed charge_extractor.py to waveform_reducer.py - WaveformReducer uses `__call__` in place of `extract_charge` - Created extract_pulse_time_weighted_average - WaveformReducer returns charge and pulse_time - Updated names for classes to be more descriptive

- This is done to avoid confusion with the DataVolumeReducer - Rename waveform_reducer to waveform_extractor - Create subtract_baseline and BaselineSubtractedNeighborWindowSum - Replace “neighbours” with “neighbors”

* charge_extractor: Fixes for tools Create abstract classes for easier traitlet configuration Replace masked array usage in extract_charge_from_peakpos_array # Conflicts: # ctapipe/image/tests/test_charge_extraction.py # ctapipe/image/waveform_extractor.py

* charge_extractor: Adopted suggestion for configuration discussed in cta-observatory#1029 # Conflicts: # ctapipe/image/waveform_extractor.py

* numba_neighbors: Add additional possible argument types Remove get_sum_array ctypes-wrapped function Create numba function neighbor_average_waveform Create test of lwt for NeighbourPeakIntegrator # Conflicts: # ctapipe/image/tests/test_charge_extraction.py # ctapipe/image/waveform_extractor.py

kosack · 2019-03-28T15:52:10Z

Perhaps there is a better name... right now "WaveformExtractor" seems like it extracts waveforms, and you would then get back a waveform.

At the end this algorithm returns estimated charges, right? Is that intended to be before or after calibration? I guess before (even if we did it backwards before), but if before it's actually more still a ChargeExtractor (even if there is some internal smoothing), or if there is calibration, it could be an "IntensityExtractor", since the final output is a calibrated light intensity.

watsonjj · 2019-03-28T15:55:07Z

The WaveformExtractor returns both charge and pulse_time.... I am open to a better name. But IntensityExtractor doesn't fully capture it either. I did start with WaveformReducer, as we are reducing the waveform down to two parameters, but I felt this was too similar to DataVolumeReducer.

watsonjj · 2019-03-28T16:00:33Z

Is that intended to be before or after calibration?

Before flat-fielding / calibration into p.e.

Yeh, the simtelarray waveforms are somewhat calibrated into p.e. in their R1Calibrator. But not fully (as the full calibration depends on the charge extraction scheme used, and the correct calib scale). There needs to be an additional calibration step added after the charge extraction, as discussed in the calibration call a few months back. I intend to add it.

watsonjj · 2019-03-28T21:43:44Z

Some ideas for the name:

SignalExtractor
ImageExtractor
CherenkovImageExtractor
WaveformEvaluator
WaveformReducer
ChargeExtractor (and we just accept that it also returns pulse_time) (also less API change...)

kosack · 2019-03-29T11:04:32Z

ImageExtractor isn't too bad, but it may be too general. It does extract "images" of charge and time, but that's not so clear from the name. I guess there is no perfect one!

* master: Corrections to address comments Add tests to Tools to check that help message works (cta-observatory#1034) make codacy happy Fix neighbors (cta-observatory#1015) Fix component docs (cta-observatory#1016) related to cta-observatory#1013 allow enums in containers and support in tableio # Conflicts: # ctapipe/image/tests/test_charge_extraction.py # ctapipe/image/waveform_extractor.py # ctapipe/tools/extract_charge_resolution.py

- Rename waveform_extractor.py to extractor.py

codecov · 2019-03-29T15:25:27Z

Codecov Report

Merging #1033 into master will increase coverage by 0.09%.
The diff coverage is 97.24%.

@@            Coverage Diff             @@
##           master    #1033      +/-   ##
==========================================
+ Coverage   83.36%   83.45%   +0.09%     
==========================================
  Files         188      186       -2     
  Lines       10663    10531     -132     
==========================================
- Hits         8889     8789     -100     
+ Misses       1774     1742      -32

Impacted Files	Coverage Δ
ctapipe/plotting/bokeh_event_viewer.py	`90.98% <ø> (+0.23%)`	⬆️
ctapipe/image/reducers.py	`40% <ø> (ø)`	⬆️
ctapipe/calib/camera/dl1.py	`93.15% <100%> (-0.83%)`	⬇️
ctapipe/image/__init__.py	`100% <100%> (ø)`	⬆️
ctapipe/utils/tests/test_tools.py	`100% <100%> (ø)`	⬆️
ctapipe/tools/display_dl1.py	`90% <100%> (ø)`	⬆️
ctapipe/calib/camera/calibrator.py	`100% <100%> (ø)`	⬆️
ctapipe/calib/camera/tests/test_calibrator.py	`100% <100%> (ø)`	⬆️
ctapipe/tools/display_integrator.py	`96.52% <100%> (-0.27%)`	⬇️
ctapipe/tools/extract_charge_resolution.py	`74.64% <100%> (ø)`	⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ebe9168...0678346. Read the comment docs.

watsonjj · 2019-04-03T09:22:05Z

@kosack @dneise @maxnoe @FrancaCassol It would be really appreciated to have this reviewed so I may continue :)

* master: Make ctapipe-extra truly optional (cta-observatory#1002) Only detect plugins when they need to exist in global Fix for TargetIOR1Calibrator

dneise · 2019-04-04T07:56:27Z

Sorry for reviewing late and then asking stupid questions. Why was this API change needed/wanted again?

I mean this part:

This PR replaces the WaveformCleaner and ChargeExtractor with ImageExtractor.

watsonjj · 2019-04-04T08:09:01Z

The aim is to greatly simplify the charge extraction process. The first discussion on this can be found at #907.

To summarise, this seperation of waveform cleaning and charge extraction is unnecessary, as the two are often very closely coupled. An image extractor should define the waveform operations it wants to use in order to extract the charge (and time).

One example is a cross correlation charge extraction technique. Applying the cross correlation to a waveform with a reference pulse results in a trace which defines how correlated the waveform is with the pulse for each position in the waveform. This could be considered a waveform cleaning. However, the value of each sample of this result is already an integration of charge, therefore one only needs to select the correct sample. It does not make sense to allow the user to select a window for this charge extraction technique. See my ASWG presentation in Barcelona for more details on the cross correlation technique if you are interested.

I don't think the freedom to flexibly chose a waveform cleaner and charge extractor (and mix and match at will) is necessary at run time. Instead, I think it is better to explicitly define individual charge extractors that are deemed to be useful implementations, which the user can select from.

watsonjj · 2019-04-04T08:11:17Z

And a further point, the ChargeExtractor was replaced with ImageExtractor, as its purpose is to extract images of both charge and time from the waveform.

dneise · 2019-04-04T08:27:44Z

Wow ..long text ... but thanks a lot

dneise · 2019-04-04T09:05:22Z

ctapipe/image/extractor.py

+
+    """
+    samples_i = np.indices(waveforms.shape)[2]
+    pulse_time = np.average(samples_i, weights=waveforms, axis=2)


Performance note: line 127 can be pulled out of this function and be done only once per telescope.
I found it interesting to see, that for waveform shapes like (2, 1000, 300) lines 127 and 128 both took equally long, ~1.7ms or so on my machine. So moving line 127 out of this, should reduce runtime by ~50%.

I found an ugly hack to make sample_i in a faster way using something called stride_tricks ... I would not recommend using it ... but it might be interesting to know about.

In [65]: %%timeit ...: i = np.indices(waveforms.shape)[2] ...: result = np.average(i, weights=waveforms, axis=2) 3.55 ms ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [63]: %%timeit ...: S = waveforms.shape ...: I = np.arange(S[2]) ...: i = np.lib.stride_tricks.as_strided(I, (S[0], S[1], I.size), (0, 0, I.itemsize)) # a 3D view, no copy of the data ...: result = np.average(i, weights=waveforms, axis=2) 1.53 ms ± 5.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Yep this is a good point. I also noticed that np.indices is surprisingly slow in my investigations for #1038. It would be my intention to replace this function with a numba for-loop version in a future PR, which should result in a huge improvement in performance.

I did not know about stride_tricks, thank-you for bringing it up.

dneise · 2019-04-04T11:57:32Z

ctapipe/image/extractor.py

+    samples_i = np.indices(waveforms.shape)[2]
+    pulse_time = np.average(samples_i, weights=waveforms, axis=2)
+    pulse_time[pulse_time < 0] = 0
+    pulse_time[pulse_time > waveforms.shape[2]] = waveforms.shape[2]


I'm not sure ... is it possible to have so funny weights, that the pulse_time can be outside of the range [0...waveforms.shape[2]]? ... I think not... unless ... of course .. when the weights are negative, then the average can be pulled out of range, but I think this should be avoided.

So I played a little bit around with this kind of average ... I used this kind of signal:

As one can see the actual pulse start time and the result of this np.average(sample_i, weights=waveforms) is clearly correlated as expected.

But as soon as the baseline moves around problems occur:

So I think there needs some care to be taken with respect to these weights. And one should not simply say ... ah ... when the pulse_time is negative .. then we set it to zero ... I think when the pulse_time is out of range, this simply is a strong hint, that something was not good about the range of these weights.

Ah sorry .. I just see, I labelled my axis wrongly: What I called average_time is actually called pulse_time.

So dneise_average_time = pulse_time = np.average(samples_i, weights=waveform)

Yeah you are absolutely right. This implementation for obtaining pulse time is very simple, but not very robust. The limit I added for the pulse time to be within the waveform readout was done to make it easier to plot this. Thinking about it now, it could be better to set such unrealistic pulse times to nan (what do you think?).

Or do you suggest only use samples with positive weights in the calculation?

I am inclined to suggest that contributions are made for better pulse time algorithms in future pull requests, which can replace the current function used in the ImageExtractors. Unless you are adamantly against accepting this implementation temporarily.

Maybe not cut at all then. The result of this implementation is (even in the best case) only correlated with and not identical to the arrival time (50% of rising edge) .. so it is difficult to say what the allowed range is ... I think .. so maybe avoiding a cut and leave the cut to the user of this implementation is ok.

No no .. simple implementations are good .. just this cut was puzzling me.

Okay, that seems somewhat better, thanks. However, with this implementation we would have to handle the case where there are no positive samples, which would involve setting the pulse_time to some value such as zero.

Could we reserve consideration of better pulse_time algorithms for a different PR?

I have added it as a task in #1026

Indeed if you set out of range results to nan that might be better, since the user knows there was indeed a cut.

Could we reserve consideration of better pulse_time algorithms for a different PR?

totally. You asked me do produce these plots.

ctapipe/image/waveform_reducer.py

dneise

I think I am now through all commits. Nice.

watsonjj · 2019-04-04T15:57:30Z

I have renamed the UserWindowSum and set out-of-range pulse_times to nan.

@kosack @dneise Could you please restore your approvals?

watsonjj · 2019-04-05T08:04:29Z

Setting the out-of-bounds pulse-times to nan can cause problems down the chain (some algorithms are not happy with nans existing in an array). This is currently causing an error in travis. I suggest that the pulse time is set to -1 for the time being, and we create a better pulse_time calculation in a different PR.

dneise

👍

watsonjj added 18 commits March 25, 2019 12:40

Replace WaveformReducer with WaveformExtractor

d07a800

- This is done to avoid confusion with the DataVolumeReducer - Rename waveform_reducer to waveform_extractor - Create subtract_baseline and BaselineSubtractedNeighborWindowSum - Replace “neighbours” with “neighbors”

Created tests for waveform_extractor.py

199e0f8

Removed waveform_cleaning.py

f5ec3b2

Updated containers

e835769

Updated __init__.py

c6c2d2b

Updated calibration to use new WaveformExtractors

7f59bef

Merge branch 'charge_extractor' into waveform_reducer

87dfab1

* charge_extractor: Adopted suggestion for configuration discussed in cta-observatory#1029 # Conflicts: # ctapipe/image/waveform_extractor.py

Fix name mixup after merge

91a10c8

Limit range for pulse_time calculation

55324f0

Update bokeh event plotter to handle new WaveformExtractor

431f23f

Update tools

1981a30

Correct references to ChargeExtractor

54a9f8a

Remove further references

a3419b3

Correct tools

d70c0dc

Improved names of WaveformExtractors

23bc110

watsonjj added API change refactoring labels Mar 26, 2019

watsonjj mentioned this pull request Mar 26, 2019

Major refactoring of calibration chain #1026

Closed

24 tasks

watsonjj added 4 commits March 29, 2019 13:15

Deleted empty file

cabe80e

Rename WaveformExtractor to ImageExtractor

79b9276

- Rename waveform_extractor.py to extractor.py

Update doc

000dbc7

watsonjj added the ready for review label Mar 29, 2019

Merge branch 'master' into waveform_reducer

80be3dc

* master: Make ctapipe-extra truly optional (cta-observatory#1002) Only detect plugins when they need to exist in global Fix for TargetIOR1Calibrator

watsonjj mentioned this pull request Apr 3, 2019

Create numba ufunc for sum of samples within charge extraction window #1038

Merged

kosack previously approved these changes Apr 3, 2019

View reviewed changes

dneise reviewed Apr 4, 2019

View reviewed changes

ctapipe/image/waveform_reducer.py Outdated Show resolved Hide resolved

dneise previously approved these changes Apr 4, 2019

View reviewed changes

watsonjj added 2 commits April 4, 2019 17:55

Rename UserWindowSum to FixedWindowSum

e510fd9

Set to nan if out of range

ca6c2a3

watsonjj dismissed stale reviews from dneise and kosack via ca6c2a3 April 4, 2019 15:56

Set out-of-bounds pulse time to -1

0678346

watsonjj requested review from maxnoe, dneise and kosack April 5, 2019 08:54

kosack approved these changes Apr 8, 2019

View reviewed changes

dneise approved these changes Apr 8, 2019

View reviewed changes

watsonjj merged commit 6a243b2 into cta-observatory:master Apr 8, 2019

watsonjj deleted the waveform_reducer branch April 29, 2019 14:00

This was referenced Oct 24, 2019

How to add peak arrival times? #1021

Closed

Peakpos calculation #907

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace WaveformCleaner and ChargeExtractor with WaveformExtractor #1033

Replace WaveformCleaner and ChargeExtractor with WaveformExtractor #1033

watsonjj commented Mar 26, 2019 •

edited

Loading

kosack commented Mar 28, 2019

watsonjj commented Mar 28, 2019

watsonjj commented Mar 28, 2019

watsonjj commented Mar 28, 2019 •

edited

Loading

kosack commented Mar 29, 2019

codecov bot commented Mar 29, 2019 •

edited

Loading

watsonjj commented Apr 3, 2019

dneise commented Apr 4, 2019

watsonjj commented Apr 4, 2019

watsonjj commented Apr 4, 2019

dneise commented Apr 4, 2019

dneise Apr 4, 2019

watsonjj Apr 4, 2019

dneise Apr 4, 2019 •

edited

Loading

dneise Apr 4, 2019

watsonjj Apr 4, 2019

dneise Apr 4, 2019

dneise Apr 4, 2019

dneise Apr 4, 2019

watsonjj Apr 4, 2019

watsonjj Apr 4, 2019

dneise Apr 4, 2019

dneise Apr 4, 2019

dneise left a comment

watsonjj commented Apr 4, 2019 •

edited

Loading

watsonjj commented Apr 5, 2019

dneise left a comment

Replace WaveformCleaner and ChargeExtractor with WaveformExtractor #1033

Replace WaveformCleaner and ChargeExtractor with WaveformExtractor #1033

Conversation

watsonjj commented Mar 26, 2019 • edited Loading

kosack commented Mar 28, 2019

watsonjj commented Mar 28, 2019

watsonjj commented Mar 28, 2019

watsonjj commented Mar 28, 2019 • edited Loading

kosack commented Mar 29, 2019

codecov bot commented Mar 29, 2019 • edited Loading

Codecov Report

watsonjj commented Apr 3, 2019

dneise commented Apr 4, 2019

watsonjj commented Apr 4, 2019

watsonjj commented Apr 4, 2019

dneise commented Apr 4, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dneise Apr 4, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dneise left a comment

Choose a reason for hiding this comment

watsonjj commented Apr 4, 2019 • edited Loading

watsonjj commented Apr 5, 2019

dneise left a comment

Choose a reason for hiding this comment

watsonjj commented Mar 26, 2019 •

edited

Loading

watsonjj commented Mar 28, 2019 •

edited

Loading

codecov bot commented Mar 29, 2019 •

edited

Loading

dneise Apr 4, 2019 •

edited

Loading

watsonjj commented Apr 4, 2019 •

edited

Loading