signal extractor SlidingWindowMaxSum #1568

Merged
merged 15 commits into from
Feb 11, 2021

Conversation

jsitarek
Contributor

a new simple signal extractor, slightly slower, but with better accuracy (in particular for weak pulses): SlidingWindowMaxSum

It maximizes the sum on "width" consecutive slices
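As an illustration of the idea above, here is a minimal NumPy sketch of maximizing the sum over `width` consecutive slices of a single waveform. It is not the numba implementation from this PR; the function name `sliding_window_max_sum` and the sample waveform are assumptions made for the example.

```python
import numpy as np


def sliding_window_max_sum(waveform, width):
    """Return (max_sum, start_index) of the best window of `width` samples.

    Hypothetical helper for illustration, not the ctapipe function.
    """
    waveform = np.asarray(waveform, dtype=float)
    if not 1 <= width <= waveform.size:
        raise ValueError("width must be between 1 and the waveform length")
    # Prefix sums with a leading zero: csum[k] is the sum of the first k samples.
    csum = np.concatenate(([0.0], np.cumsum(waveform)))
    # sums[i] is the sum of waveform[i : i + width]
    sums = csum[width:] - csum[:-width]
    start = int(np.argmax(sums))
    return float(sums[start]), start


# Made-up waveform with a pulse around samples 3 to 5
waveform = [0.0, 1.0, 0.5, 4.0, 6.0, 3.0, 0.5, 0.0]
max_sum, start = sliding_window_max_sum(waveform, width=3)
```

The cumulative sum lets every window be evaluated with one subtraction, so the cost stays linear in the number of samples regardless of the window width.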

some speed test using 3 trials of 1000 events of LST1 data:
only r0==> r1 calibration: 14.575s, 14.465s, 14.550s
LocalPeakWindowSum (current extractor) 19.853s, 20.188s, 20.698s
MaxWindowSum (new code): 21.731s, 20.813s, 21.132s

One feature can be improved, namely the correction for the signal outside of the integration window. The current code reuses the LocalPeakWindowSum approach, assuming that the shift is half of the total window, which is correct only if the pulse is symmetric (which is not really the case).

@codecov

codecov bot commented Dec 30, 2020

Codecov Report

Merging #1568 (fd9e917) into master (09931e2) will decrease coverage by 0.04%.
The diff coverage is 84.88%.

@@            Coverage Diff             @@
##           master    #1568      +/-   ##
==========================================
- Coverage   90.80%   90.76%   -0.05%     
==========================================
  Files         192      191       -1     
  Lines       14006    14060      +54     
==========================================
+ Hits        12718    12761      +43     
- Misses       1288     1299      +11     

| Impacted Files | Coverage Δ | |
|---|---|---|
| ctapipe/image/extractor.py | 82.75% <63.88%> (-3.02%) | ⬇️ |
| ctapipe/image/tests/test_extractor.py | 100.00% <100.00%> (ø) | |
| ...pipe/image/tests/test_sliding_window_correction.py | 100.00% <100.00%> (ø) | |
| ctapipe/reco/__init__.py | 100.00% <100.00%> (ø) | |
| ctapipe/instrument/atmosphere.py | 90.90% <0.00%> (-9.10%) | ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09931e2...7798779. Read the comment docs.

@jsitarek
Contributor Author

I made a few iterations to address the issues pointed out by the codacy and coverage checks.
codacy reports a missing argument in one of the functions, but I think this is just an artifact of the parallelization, since the same type of code is used as in extract_around_peak.
coverage check claims that most of the extract_sliding_window function is not tested, however this function is explicitly tested in test_extractor.py.

I think the code is ready for review.

kosack
kosack previously approved these changes Jan 5, 2021
Contributor

@kosack kosack left a comment

Looks good to me.

In fact, I think a similar implementation could be used to speed up the TwoPass "MARS-like" method (which also uses a sliding window in the first pass).

@maxnoe
Member

maxnoe commented Jan 5, 2021

> coverage check claims that most of the extract_sliding_window function is not tested, however this function is explicitly tested in test_extractor.py.

This is because the code that actually runs is the compiled numba code, not the python function. Unfortunately that means that numba functions do not report coverage correctly.

@jsitarek
Contributor Author

jsitarek commented Jan 5, 2021

thank you @kosack for the approval and @maxnoe for the explanation about numba

@kosack
Contributor

kosack commented Jan 6, 2021

By the way, for the Numba code coverage issue, see #1400

This method is decorated with @lru_cache to ensure it is only
calculated once per telescope.

WARNING: TO BE DONE properly, the current code reuses the function of
Member

Could you implement this directly here? It does not sound too complicated.

Contributor Author

The main reason I did not do so is that this feature does not seem to be used (at least in LST), so I did not have a proper setup to test it. But I can look into making a dummy pulse shape and testing on it.

Member

that would be great

Contributor

You have the reference pulse shape in the CameraDescription. I guess it's a fairly small effect though, and the correction doesn't really matter much except to get the cleaning thresholds in the same units for all cameras.

Contributor Author

@jsitarek jsitarek Feb 1, 2021

Sorry for the delay, I restarted working on this.
CameraDescription.readout is where the code is taking the pulse shape from. However this is not really reliable.
If I execute the following code

import numpy as np
import astropy.units as u
import matplotlib.pyplot as plt

from ctapipe.instrument import SubarrayDescription, TelescopeDescription

plt.ion()

# Build a one-telescope subarray with the default LSTCam description
subarray = SubarrayDescription(
    "LST1",
    tel_positions={1: np.zeros(3) * u.m},
    tel_descriptions={
        1: TelescopeDescription.from_name(optics_name="LST", camera_name="LSTCam"),
    },
)
readout = subarray.tel[1].camera.readout
# Assumption: the index lost from the original line is taken to be channel 0
pulse_shape = readout.reference_pulse_shape[0]
dt = readout.reference_pulse_sample_width
xs = np.arange(len(pulse_shape)) * dt
plt.plot(xs, pulse_shape)

I get the following figure:
[figure: pulse_shape_LST]
which shows a much broader pulse than it should be.

The calculation of the correction factor would be much simpler if the pulse shape in this class had the same binning as the actual readout. That is the case in the example above, and one would expect to be able to take it for granted, since the shape comes from the "readout" object, which has the binning embedded. However, in the first tests I did in lstchain, where the array was read from the data, the pulse shapes were actually a delta function with an SSC-like sampling, so it obviously cannot be taken for granted.

I will change the code to use a simple conversion and rounding of sampling to make it work also in this more general case, but the whole issue of the LST pulse shape deserves a separate "issue"

EDIT: I forgot to mention that there seems to be only one reference pulse shape in the CameraDescription, while in reality we should have HG and LG
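To make the idea of "a simple conversion and rounding of sampling" concrete, here is a rough sketch of a correction factor for a reference pulse whose binning differs from the readout. It is an illustration under stated assumptions, not the code committed in this PR: the function name, the parameter names, and the toy pulse are all hypothetical. The window width is converted into reference-pulse samples, the best window on the pulse is found with the same cumulative-sum trick as the extractor, and the correction is the ratio of the full integral to that windowed integral, so no symmetry of the pulse is assumed.

```python
import numpy as np


def integration_correction(pulse_shape, pulse_step_ns, window_width, readout_step_ns):
    """Ratio of the full pulse integral to the best windowed integral.

    Hypothetical helper: `window_width` is given in readout samples,
    the two step arguments are sample widths in nanoseconds.
    """
    pulse_shape = np.asarray(pulse_shape, dtype=float)
    # Convert the window from readout samples to reference-pulse samples,
    # rounding to the nearest whole sample.
    width = int(round(window_width * readout_step_ns / pulse_step_ns))
    # Keep the window inside the available pulse samples.
    width = min(max(width, 1), pulse_shape.size)
    csum = np.concatenate(([0.0], np.cumsum(pulse_shape)))
    sums = csum[width:] - csum[:-width]
    return float(pulse_shape.sum() / sums.max())


# Toy asymmetric pulse sampled at 0.5 ns, readout sampled at 1 ns
pulse = [0.0, 4.0, 3.0, 2.0, 1.0, 0.0]
narrow = integration_correction(pulse, pulse_step_ns=0.5, window_width=1, readout_step_ns=1.0)
wide = integration_correction(pulse, pulse_step_ns=0.5, window_width=2, readout_step_ns=1.0)
```

With the narrow window only part of the toy pulse is integrated, so the factor is larger than one; the wide window contains the whole pulse and the factor is exactly one.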

Contributor

@kosack kosack Feb 2, 2021

The from_name() methods load a file from ctapipe-extra (which is now a directory on a server rather than a package), and are just meant for unit-testing purposes. Currently everything in ctapipe-extra is from PROD3 or even PROD2 simulations, so it is quite out of date for real analysis. In the future I want to clean that up and have an option to select which "prod" to use, but there has not been manpower for that (see e.g. #738).

If you load real data from a SimTel file or something else supported, the correct waveform should be loaded into the instrument model that you get from source.subarray.

E.g. if you do:

from ctapipe.io import EventSource

with EventSource("some_prod5_sim.simtel.gz") as source:
    readout = source.subarray.tel[2].camera.readout
    # plot the first-channel pulse shape against the sample times
    plt.plot(readout.reference_pulse_sample_time, readout.reference_pulse_shape[0])

You will get the "latest" pulse that is defined in Prod5

Contributor

@kosack kosack Feb 2, 2021

Note that if you don't want to always load up a file, you can take any reference prod5 file, for example, and run

ctapipe-dump-instrument --input=some_reference_file.simtel.gz

It will dump a set of FITS files, including the camera geometry and readout definitions, to the local directory. You can then setenv CTAPIPE_SVC_PATH=[directory where those files are], and ctapipe will use those files when you run the from_name() functions instead of the defaults. By default it first searches all paths listed in the ":"-separated CTAPIPE_SVC_PATH; if it does not find the file there, it downloads the default file from the dataserver, which, as I said, is a bit out of date.
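The lookup order described above can be sketched with the standard library alone. This is not ctapipe's own lookup code: `find_in_search_path` and the file name used below are hypothetical, and the download fallback is only indicated by returning `None`.

```python
import os
import tempfile
from pathlib import Path


def find_in_search_path(filename, search_path):
    """Return the first existing `filename` in a separated directory list, else None.

    Hypothetical helper. `os.pathsep` is ":" on POSIX systems, which matches
    the ":"-separated list described in the comment above.
    """
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    # A caller would fall back to downloading the default file here.
    return None


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    # Placeholder file name, only present in the second directory
    (Path(second) / "example.camreadout.fits").write_bytes(b"")
    search_path = os.pathsep.join([first, second])
    found = find_in_search_path("example.camreadout.fits", search_path)
    found_in_second = found is not None and found.parent == Path(second)
    missing = find_in_search_path("no_such_file.fits", search_path)
```

Directories are tried in the order they are listed, so a locally dumped file placed earlier in the list takes precedence over anything later.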

Contributor

@kosack kosack Feb 2, 2021

Update: I found the actual problem... In fact, the camera readout definition for LSTCam does not even exist in the ctapipe-extra directory on the dataserver. It seems the default behavior is to just return some dummy pulse shape if the file is not found, which has no real meaning (you should see a logger warning message if logging is set up)... Clearly this is not good behavior (I think it was there to prevent tests from failing until we updated the testing files, which never happened).

here is a comparison with prod3b:
[image]

Contributor

This is also why you get only 1 reference pulse shape if you use from_name()...
I had already opened an issue about this, but obviously forgot: See #1450

@kosack
Contributor

kosack commented Feb 2, 2021

I opened a PR with at least a temporary fix to the pulse shape problem.
Until that is accepted, you can also do the following as a hack while testing:

from ctapipe.utils import datasets
datasets.DEFAULT_URL = "http://cccta-dataserver.in2p3.fr/data/ctapipe-extra/v0.3.2/"

@jsitarek
Contributor Author

jsitarek commented Feb 2, 2021

Thanks a lot, this is really helpful.

- improved the calculation of the correction for not full integration of a pulse in SlidingWindowMaxSum extractor
- added another test (temporarily in a separate file because of PR cta-observatory#1588) with testing this correction for LST pulse shape
@jsitarek
Contributor Author

jsitarek commented Feb 2, 2021

I made this correction properly, tested it using the test_extractor.py, and also made an extra test (using the pulse shapes from @kosack suggestion), so I made the commit, which as you can see however fails:

ctapipe/image/tests/test_concentration.py::test_concentration FAILED [ 27%]
I ran the same test on my machine, where it passes.

I ran
git pull upstream master
to check whether my changes conflict with other changes made in the meantime (I doubt it, because I only changed the code of the new extractor, which is not yet used anywhere),

and test_concentration.py still passes on my PC.
I checked the other tests, and
tests/test_reducer.py::test_tailcuts_data_volume_reducer FAILED
now also fails on my PC. However, I do not see how this has anything to do with my commit, since a different extractor is used in that test.

I suspect that some other commit made in the meantime broke those tests.
How should we proceed?

@kosack
Contributor

kosack commented Feb 2, 2021

> I made this correction properly, tested it using the test_extractor.py, and also made an extra test (using the pulse shapes from @kosack suggestion), so I made the commit, which as you can see however fails:

See #1588, I have exactly the same problem. I am not sure what the solution is. So far I don't see why it is happening, except for a small change in pixel area (which I still don't understand, as the pixel distances are the same, and I compared the computation to past versions of ctapipe, and it is identical).


from ctapipe.utils import datasets

datasets.DEFAULT_URL = "http://cccta-dataserver.in2p3.fr/data/ctapipe-extra/v0.3.2/"
Contributor

@kosack kosack Feb 2, 2021

Shouldn't really set this globally, since it affects all other tests (and definitely should not be committed when we merge the PR). Perhaps for now, you might want to use the monkeypatch test fixture instead, something like:

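from ctapipe.utils import datasets

def test_xxx(monkeypatch):
    with monkeypatch.context() as m:
        # override the URL only inside this block; it is restored on exit
        m.setattr(datasets, "DEFAULT_URL", "http://cccta-dataserver.in2p3.fr/data/ctapipe-extra/v0.3.2/")
        # the rest of the test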

see https://docs.pytest.org/en/stable/monkeypatch.html
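The point of scoping the override can be shown without pytest. The sketch below swaps in the standard library's `unittest.mock.patch.object` for the monkeypatch fixture, and uses a hypothetical stand-in object instead of the real `ctapipe.utils.datasets` module; the URLs are placeholders.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for a module that exposes a global setting
datasets = SimpleNamespace(DEFAULT_URL="http://example.org/default/")


def current_url():
    """Read the setting the way other code (and other tests) would see it."""
    return datasets.DEFAULT_URL


# The override is only visible inside the with-block
with mock.patch.object(datasets, "DEFAULT_URL", "http://example.org/override/"):
    inside = current_url()

# On exit the original value is restored, so later tests are unaffected
after = current_url()
```

A plain module-level assignment, in contrast, stays in effect for everything that runs afterwards in the same interpreter session.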

Contributor Author

Hmm, I put this test explicitly into a separate file to avoid affecting other files with a global variable, because I thought that each file is run independently.
Either way, it is now modified as you suggested.

@jsitarek jsitarek requested review from maxnoe and kosack February 3, 2021 10:49
@jsitarek
Contributor Author

jsitarek commented Feb 3, 2021

The failing tests were fixed in the other PR. I made some small updates to resolve the codacy issues; three are left:

Instance of 'int' has no 'tel' member
self.window_width.tel[telid]

No value for argument 'sum_' in function call
charge, peak_time = extract_sliding_window(

Instance of 'int' has no 'tel' member
waveforms, self.window_width.tel[telid], self.sampling_rate[telid]

The 1st and 3rd are somewhat strange, because window_width is in fact not an int but an IntTelescopeParameter; while investigating this I corrected how the width of the integration window was changed in the test file.
The 2nd one is, I guess, also not a problem but a feature of guvectorize.

so, can we merge this PR?

@maxnoe maxnoe modified the milestones: v0.10.2, v0.11.0 Feb 4, 2021
@jsitarek
Contributor Author

Hi @maxnoe @kosack
Please let me know if you want any additional modifications in this PR, or if you can already give it the green light for merging.

kosack
kosack previously approved these changes Feb 11, 2021
maxnoe
maxnoe previously approved these changes Feb 11, 2021
def test_sw_pulse_lst():
"""
Test function of sliding window extractor for LST camera pulse shape with
the correction for the integration window completeness
Member

wrong indentation here

Contributor Author

corrected, I'm not sure why precommit and flake8 did not catch this.
thx @maxnoe for reapproval
@kosack I also need it from you
and then can one of you merge the PR? I do not have permissions for doing it myself.

Member

Yes, will merge as soon as both approvals are there

Member

It is valid code and since it's a test it is probably also not checked by the documentation build

@jsitarek jsitarek dismissed stale reviews from maxnoe and kosack via 7798779 February 11, 2021 15:19
@kosack kosack merged commit 8f5c793 into cta-observatory:master Feb 11, 2021
@jsitarek jsitarek deleted the implement_SlidingWindowMaxSum branch February 11, 2021 17:13