Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace WaveformCleaner and ChargeExtractor with WaveformExtractor #1033

Merged
merged 30 commits into from
Apr 8, 2019

Conversation

watsonjj
Copy link
Contributor

@watsonjj watsonjj commented Mar 26, 2019

This PR replaces the WaveformCleaner and ChargeExtractor with ImageExtractor.

The ImageExtractor's purpose is to extract the charge and pulse_time from a waveform. It can also apply any sort of waveform cleaning, upsampling, interpolation etc. to extract these parameters. Essentially, an ImageExtractor is some combination of the functions defined in extractor.py to obtain the charge and pulse_time.

The extract_charge method of the old ChargeExtractor has been replaced to use __call__ instead, as per #1024.

An initial algorithm for extracting the pulse_time (via a weighted average of the waveform) is also included in this PR. This approach may be replaced with a different pulse_time calculation if it is identified as providing a better time resolution.

The containers, calibration, and tools have also been updated to handle the new DL1 contents.

watsonjj added 18 commits March 25, 2019 12:40
- Replaces ChargeExtractor
- Renamed charge_extractor.py to waveform_reducer.py
- WaveformReducer uses `__call__` in place of `extract_charge`
- Created extract_pulse_time_weighted_average
- WaveformReducer returns charge and pulse_time
- Updated names for classes to be more descriptive
- This is done to avoid confusion with the DataVolumeReducer
- Rename waveform_reducer to waveform_extractor
- Create subtract_baseline and BaselineSubtractedNeighborWindowSum
- Replace “neighbours” with “neighbors”
* charge_extractor:
  Fixes for tools
  Create abstract classes for easier traitlet configuration
  Replace masked array usage in extract_charge_from_peakpos_array

# Conflicts:
#	ctapipe/image/tests/test_charge_extraction.py
#	ctapipe/image/waveform_extractor.py
* charge_extractor:
  Adopted suggestion for configuration discussed in cta-observatory#1029

# Conflicts:
#	ctapipe/image/waveform_extractor.py
* numba_neighbors:
  Add additional possible argument types
  Remove get_sum_array ctypes-wrapped function
  Create numba function neighbor_average_waveform
  Create test of lwt for NeighbourPeakIntegrator

# Conflicts:
#	ctapipe/image/tests/test_charge_extraction.py
#	ctapipe/image/waveform_extractor.py
@kosack
Copy link
Contributor

kosack commented Mar 28, 2019

Perhaps there is a better name... right now "WaveformExtractor" seems like it extracts waveforms, and you would then get back a waveform.

At the end this algorithm returns estimated charges, right? Is that intended to be before or after calibration? I guess before (even if we did it backwards before), but if before it's actually more still a ChargeExtractor (even if there is some internal smoothing), or if there is calibration, it could be an "IntensityExtractor", since the final output is a calibrated light intensity.

@watsonjj
Copy link
Contributor Author

The WaveformExtractor returns both charge and pulse_time.... I am open to a better name. But IntensityExtractor doesn't fully capture it either. I did start with WaveformReducer, as we are reducing the waveform down to two parameters, but I felt this was too similar to DataVolumeReducer.

@watsonjj
Copy link
Contributor Author

Is that intended to be before or after calibration?

Before flat-fielding / calibration into p.e.

Yeh, the simtelarray waveforms are somewhat calibrated into p.e. in their R1Calibrator. But not fully (as the full calibration depends on the charge extraction scheme used, and the correct calib scale). There needs to be an additional calibration step added after the charge extraction, as discussed in the calibration call a few months back. I intend to add it.

@watsonjj
Copy link
Contributor Author

watsonjj commented Mar 28, 2019

Some ideas for the name:

  1. SignalExtractor
  2. ImageExtractor
  3. CherenkovImageExtractor
  4. WaveformEvaluator
  5. WaveformReducer
  6. ChargeExtractor (and we just accept that it also returns pulse_time) (also less API change...)

@kosack
Copy link
Contributor

kosack commented Mar 29, 2019

ImageExtractor isn't too bad, but it may be too general. It does extract "images" of charge and time, but that's not so clear from the name. I guess there is no perfect one!

* master:
  Corrections to address comments
  Add tests to Tools to check that help message works (cta-observatory#1034)
  make codacy happy
  Fix neighbors (cta-observatory#1015)
  Fix component docs (cta-observatory#1016) related to cta-observatory#1013
  allow enums in containers and support in tableio

# Conflicts:
#	ctapipe/image/tests/test_charge_extraction.py
#	ctapipe/image/waveform_extractor.py
#	ctapipe/tools/extract_charge_resolution.py
- Rename waveform_extractor.py to extractor.py
@codecov
Copy link

codecov bot commented Mar 29, 2019

Codecov Report

Merging #1033 into master will increase coverage by 0.09%.
The diff coverage is 97.24%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1033      +/-   ##
==========================================
+ Coverage   83.36%   83.45%   +0.09%     
==========================================
  Files         188      186       -2     
  Lines       10663    10531     -132     
==========================================
- Hits         8889     8789     -100     
+ Misses       1774     1742      -32
Impacted Files Coverage Δ
ctapipe/plotting/bokeh_event_viewer.py 90.98% <ø> (+0.23%) ⬆️
ctapipe/image/reducers.py 40% <ø> (ø) ⬆️
ctapipe/calib/camera/dl1.py 93.15% <100%> (-0.83%) ⬇️
ctapipe/image/__init__.py 100% <100%> (ø) ⬆️
ctapipe/utils/tests/test_tools.py 100% <100%> (ø) ⬆️
ctapipe/tools/display_dl1.py 90% <100%> (ø) ⬆️
ctapipe/calib/camera/calibrator.py 100% <100%> (ø) ⬆️
ctapipe/calib/camera/tests/test_calibrator.py 100% <100%> (ø) ⬆️
ctapipe/tools/display_integrator.py 96.52% <100%> (-0.27%) ⬇️
ctapipe/tools/extract_charge_resolution.py 74.64% <100%> (ø) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ebe9168...0678346. Read the comment docs.

@watsonjj
Copy link
Contributor Author

watsonjj commented Apr 3, 2019

@kosack @dneise @maxnoe @FrancaCassol It would be really appreciated to have this reviewed so I may continue :)

* master:
  Make ctapipe-extra truly optional (cta-observatory#1002)
  Only detect plugins when they need to exist in global
  Fix for TargetIOR1Calibrator
kosack
kosack previously approved these changes Apr 3, 2019
@dneise
Copy link
Member

dneise commented Apr 4, 2019

Sorry for reviewing late and then asking stupid questions. Why was this API change needed/wanted again?

I mean this part:

This PR replaces the WaveformCleaner and ChargeExtractor with ImageExtractor.

@watsonjj
Copy link
Contributor Author

watsonjj commented Apr 4, 2019

The aim is to greatly simplify the charge extraction process. The first discussion on this can be found at #907.

To summarise, this seperation of waveform cleaning and charge extraction is unnecessary, as the two are often very closely coupled. An image extractor should define the waveform operations it wants to use in order to extract the charge (and time).

One example is a cross correlation charge extraction technique. Applying the cross correlation to a waveform with a reference pulse results in a trace which defines how correlated the waveform is with the pulse for each position in the waveform. This could be considered a waveform cleaning. However, the value of each sample of this result is already an integration of charge, therefore one only needs to select the correct sample. It does not make sense to allow the user to select a window for this charge extraction technique. See my ASWG presentation in Barcelona for more details on the cross correlation technique if you are interested.

I don't think the freedom to flexibly chose a waveform cleaner and charge extractor (and mix and match at will) is necessary at run time. Instead, I think it is better to explicitly define individual charge extractors that are deemed to be useful implementations, which the user can select from.

@watsonjj
Copy link
Contributor Author

watsonjj commented Apr 4, 2019

And a further point, the ChargeExtractor was replaced with ImageExtractor, as its purpose is to extract images of both charge and time from the waveform.

@dneise
Copy link
Member

dneise commented Apr 4, 2019

Wow ..long text ... but thanks a lot


"""
samples_i = np.indices(waveforms.shape)[2]
pulse_time = np.average(samples_i, weights=waveforms, axis=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance note: line 127 can be pulled out of this function and be done only once per telescope.
I found it interesting to see, that for waveform shapes like (2, 1000, 300) lines 127 and 128 both took equally long, ~1.7ms or so on my machine. So moving line 127 out of this, should reduce runtime by ~50%.

I found an ugly hack to make sample_i in a faster way using something called stride_tricks ... I would not recommend using it ... but it might be interesting to know about.

In [65]: %%timeit
    ...: i = np.indices(waveforms.shape)[2]
    ...: result = np.average(i, weights=waveforms, axis=2)
3.55 ms ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [63]: %%timeit 
    ...: S = waveforms.shape
    ...: I = np.arange(S[2])
    ...: i = np.lib.stride_tricks.as_strided(I, (S[0], S[1], I.size), (0, 0, I.itemsize))  # a 3D view, no copy of the data
    ...: result = np.average(i, weights=waveforms, axis=2)
1.53 ms ± 5.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this is a good point. I also noticed that np.indices is surprisingly slow in my investigations for #1038. It would be my intention to replace this function with a numba for-loop version in a future PR, which should result in a huge improvement in performance.

I did not know about stride_tricks, thank-you for bringing it up.

samples_i = np.indices(waveforms.shape)[2]
pulse_time = np.average(samples_i, weights=waveforms, axis=2)
pulse_time[pulse_time < 0] = 0
pulse_time[pulse_time > waveforms.shape[2]] = waveforms.shape[2]
Copy link
Member

@dneise dneise Apr 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure ... is it possible to have so funny weights, that the pulse_time can be outside of the range [0...waveforms.shape[2]]? ... I think not... unless ... of course .. when the weights are negative, then the average can be pulled out of range, but I think this should be avoided.

So I played a little bit around with this kind of average ... I used this kind of signal:
image

As one can see the actual pulse start time and the result of this np.average(sample_i, weights=waveforms) is clearly correlated as expected.

image

But as soon as the baseline moves around problems occur:
image

image

image

image

So I think there needs some care to be taken with respect to these weights. And one should not simply say ... ah ... when the pulse_time is negative .. then we set it to zero ... I think when the pulse_time is out of range, this simply is a strong hint, that something was not good about the range of these weights.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry .. I just see, I labelled my axis wrongly: What I called average_time is actually called pulse_time.

So dneise_average_time = pulse_time = np.average(samples_i, weights=waveform)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you are absolutely right. This implementation for obtaining pulse time is very simple, but not very robust. The limit I added for the pulse time to be within the waveform readout was done to make it easier to plot this. Thinking about it now, it could be better to set such unrealistic pulse times to nan (what do you think?).

Or do you suggest only use samples with positive weights in the calculation?

I am inclined to suggest that contributions are made for better pulse time algorithms in future pull requests, which can replace the current function used in the ImageExtractors. Unless you are adamantly against accepting this implementation temporarily.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not cut at all then. The result of this implementation is (even in the best case) only correlated with and not identical to the arrival time (50% of rising edge) .. so it is difficult to say what the allowed range is ... I think .. so maybe avoiding a cut and leave the cut to the user of this implementation is ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No no .. simple implementations are good .. just this cut was puzzling me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that seems somewhat better, thanks. However, with this implementation we would have to handle the case where there are no positive samples, which would involve setting the pulse_time to some value such as zero.

Could we reserve consideration of better pulse_time algorithms for a different PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added it as a task in #1026

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed if you set out of range results to nan that might be better, since the user knows there was indeed a cut.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we reserve consideration of better pulse_time algorithms for a different PR?

totally. You asked me do produce these plots.

dneise
dneise previously approved these changes Apr 4, 2019
Copy link
Member

@dneise dneise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I am now through all commits. Nice.

@watsonjj watsonjj dismissed stale reviews from dneise and kosack via ca6c2a3 April 4, 2019 15:56
@watsonjj
Copy link
Contributor Author

watsonjj commented Apr 4, 2019

I have renamed the UserWindowSum and set out-of-range pulse_times to nan.

@kosack @dneise Could you please restore your approvals?

@watsonjj
Copy link
Contributor Author

watsonjj commented Apr 5, 2019

Setting the out-of-bounds pulse-times to nan can cause problems down the chain (some algorithms are not happy with nans existing in an array). This is currently causing an error in travis. I suggest that the pulse time is set to -1 for the time being, and we create a better pulse_time calculation in a different PR.

@watsonjj watsonjj requested review from maxnoe, dneise and kosack April 5, 2019 08:54
Copy link
Member

@dneise dneise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@watsonjj watsonjj merged commit 6a243b2 into cta-observatory:master Apr 8, 2019
@watsonjj watsonjj deleted the waveform_reducer branch April 29, 2019 14:00
This was referenced Oct 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants