
bump version of ctapipe-extra directory to v0.3.2 #1588

Merged
merged 5 commits into master on Feb 3, 2021

Conversation

kosack
Contributor

@kosack kosack commented Feb 2, 2021

This is the same as v0.3.1, but with camera readout definitions from prod3b

(the data have been updated on the dataserver)

@kosack
Contributor Author

kosack commented Feb 2, 2021

Partially solves #1450, and will also help #1568

@maxnoe
Member

maxnoe commented Feb 2, 2021

@kosack something seems to have changed for the LSTCam that makes tests fail.

@kosack
Contributor Author

kosack commented Feb 2, 2021

Strangely, the concentration test passes on my machine, but I do get a failure for test_data_volume_reducer.

@kosack
Contributor Author

kosack commented Feb 2, 2021

The only changes in the new files are:

  • the camreadout FITS files now exist (before they did not, so a dummy signal pulse was used)
  • the camgeom FITS files seem to be identical, except that:
    • the pixel area has changed (not sure why - do we have a bug?): it went from 0.00216 m² to 0.00207 m². That change will affect the toymodel, since it scales the intensity by pixel area (so the intensities will decrease a bit)
    • the PIX_TYPE header changed from "hexagonal" to "hexagon" (need to check if this causes any problems anywhere) [edit: I don't see how it could, it's now an ENUM]

The reference pulse is used during image extraction if the integration correction is enabled (normally yes). So for tests that did not load a real event, perhaps this causes some difference?
I also note that some tests only ever run against the LSTCam, which could mean they are hard-coded to work only there and should be expanded to test other cameras (e.g. test_data_volume_reducer).
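
For anyone who wants to check locally, the new files can be inspected through the ctapipe API; a minimal sketch, assuming the from_name helpers fetch the LSTCam files from the dataset server:

import numpy as np
from ctapipe.instrument import CameraGeometry, CameraReadout

# downloads the camgeom/camreadout FITS files on first use (cached afterwards)
geom = CameraGeometry.from_name("LSTCam")
readout = CameraReadout.from_name("LSTCam")

print(geom.pix_type, geom.pix_area[0])        # "hexagon", ~0.00208 m² in v0.3.2
print(readout.reference_pulse_shape.shape)    # a real pulse shape instead of the old dummy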

jsitarek added a commit to jsitarek/ctapipe that referenced this pull request Feb 2, 2021
- improved the calculation of the correction for incomplete integration of a pulse in the SlidingWindowMaxSum extractor
- added another test (temporarily in a separate file because of PR cta-observatory#1588) testing this correction for the LST pulse shape
@kosack
Contributor Author

kosack commented Feb 2, 2021

The problem with test_data_volume_reducer is that the test assumes the signal will be a strict sum of the input waveform. Now that a reference pulse is available, the integration correction is applied, so the value is no longer exactly that sum and the test fails. Even with apply_integration_correction turned off, however, I get a different answer than expected.

The following config seems to fix it (the test assumes a really simple integration, so I use FixedWindowSum):

{
    "TailCutsDataVolumeReducer": {
        "TailcutsImageCleaner": {
            "picture_threshold_pe": 700.0,
            "boundary_threshold_pe": 350.0,
            "min_picture_neighbors": 0,
            "keep_isolated_pixels": True,
        },
        "image_extractor_type": "FixedWindowSum",
        "FixedWindowSum": {
            "apply_integration_correction": False,
        },
        "n_end_dilates": 1,
        "do_boundary_dilation": True,
    }
}

@Hckjs is this ok?

Now with integration corrections, it's not clear this test was doing what was expected.
@jsitarek
Contributor

jsitarek commented Feb 2, 2021

Hi @kosack
I was looking into this in parallel and also suspected the integration correction.
In the fix above you changed the NeighborPeakWindowSum to the FixedWindowSum extractor. The reason it was not working with the first extractor is that the waveforms in the test are flat, [100, 100, ..., 100], so the 0th sample is selected as the peak position. The integration uses 7 slices (hence the code expects 700), but the default window_shift is 3, so it would integrate from sample -3 up to sample 3; the negative samples are skipped, and you end up with a sum of 400.
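
That arithmetic, sketched in a few lines of plain NumPy (illustrative numbers only, not the extractor code itself):

import numpy as np

waveform = np.full(30, 100)        # flat test waveform: every sample is 100
peak_index = 0                     # the peak of a flat trace is found at sample 0
width, shift = 7, 3                # window width and the default window_shift

start = max(peak_index - shift, 0)       # negative sample indices are clipped
stop = peak_index - shift + width        # window end (exclusive)
print(waveform[start:stop].sum())        # -> 400 instead of the expected 700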

An alternative solution, if NeighborPeakWindowSum is to be used, is to set:

"NeighborPeakWindowSum": {
    "apply_integration_correction": False,
    "window_shift": 0,
}

This worked for me.

@kosack
Contributor Author

kosack commented Feb 2, 2021

Thanks, yes, we both noticed the same thing - in fact I like your fix better, since it still uses the default extractor.

@jsitarek
Contributor

jsitarek commented Feb 2, 2021

Should I commit it (to the other branch where we had the same problem)?
This still leaves the problem with the concentration test.

@jsitarek
Contributor

jsitarek commented Feb 2, 2021

I also have a suspicion about the problem with concentration. When I run the test on my PC it gives a value of 0.052 with limits 0.05-0.2, while in the automatic test it gave 0.0477, just below the limit.
The test uses create_sample_image, which uses generate_image from toymodel.py, and there we have:
signal = np.random.poisson(expected_signal)
noise = np.random.poisson(nsb_level_pe, size=signal.shape)

I'm not sure if the default random generator is set to any specific seed for all the tests, but if not, this could explain the problem and why it is not reproducible. The cleanest solution, I guess, would be to define a random number generator inside the toymodel class and pass the seed to it before the test. However, there might be more tests that depend on the same image...

@maxnoe
Member

maxnoe commented Feb 2, 2021

The cleanest solution I guess would be to define a random number generator inside the toymodel class and pass the seed to it before the test

Just add np.random.seed(<your favourite seed>) at the top of the test function.
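
I.e. something like this (a sketch; the seed value and test body are placeholders):

import numpy as np

def test_concentration():
    np.random.seed(1234)  # pin the global RNG so the Poisson draws above are reproducible
    # ... rest of the test unchanged ...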

@jsitarek
Contributor

jsitarek commented Feb 2, 2021

Sorry, I was actually mistaken: the seed is set inside create_sample_image, so it must be something different, unless something funny happens with the random number generator (e.g. the same one being used in parallel by different tests).

@maxnoe
Member

maxnoe commented Feb 2, 2021

@jsitarek The tests do not run in parallel.

I think the change in pixel area is the culprit. That explains the slightly lower value.

@jsitarek
Contributor

jsitarek commented Feb 2, 2021

Hi @maxnoe
if this is the area, why is a different value obtained when testing locally and when running the automatic checks?

@maxnoe
Member

maxnoe commented Feb 2, 2021

@jsitarek I guess because the old file is still in the cache?

rm -rf $HOME/.cache/ctapipe

The cache only works by filename.

@kosack
Contributor Author

kosack commented Feb 2, 2021

I'll commit it here - it's best to keep it separate.

@kosack
Contributor Author

kosack commented Feb 2, 2021

The cache only works by filename.

For safety, maybe we should check more than filename, like a checksum.

@kosack
Contributor Author

kosack commented Feb 2, 2021

I think the change in pixel area is the culprit. That explains the slightly lower value.

Yes, this seems to be the issue - the question is why it changed, when the pixel positions are the same. The algorithm implementation to compute it was refactored, but I checked the version used to make the original, and the formulas are identical, just moved around a bit.
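
For reference, that computation derives the area from the pixel spacing alone; roughly something like this (a sketch of the idea, not the exact ctapipe code):

import numpy as np

def guess_hexagon_pixel_area(pix_x, pix_y):
    # pix_x, pix_y: plain arrays of pixel centre positions in metres.
    # Area of a regular hexagon whose flat-to-flat width equals the
    # minimum distance between neighbouring pixel centres.
    dx = pix_x[:, None] - pix_x[None, :]
    dy = pix_y[:, None] - pix_y[None, :]
    dist = np.hypot(dx, dy)
    dist[dist == 0] = np.inf                 # ignore self-distances
    spacing = dist.min()
    return np.sqrt(3) / 2 * spacing ** 2     # ~0.00216 m² for a ~5 cm pixel pitch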

@jsitarek
Contributor

jsitarek commented Feb 2, 2021

Sorry @maxnoe, but I do not understand the possible problem with the pixel sizes.
I cleaned the cache on my PC, but when I rerun the test I still get the same value, 0.052, which passes the test. When the test is run automatically, I guess the cache is cleaned automatically.

EDIT: also, the concentration that was triggering the difference is just defined as the ratio of the brightest pixel to the total light intensity, and this does not depend on the area. What am I missing?

@maxnoe
Member

maxnoe commented Feb 2, 2021

For safety, maybe we should check more than filename, like a checksum.

That would require always downloading at least the catalog of checksums. Currently the download is only implemented as a fallback.

Simpler would be to store the cache by URL and check that the URL is the same.
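
Something along these lines, for illustration (a sketch only, not the actual ctapipe.utils.datasets code):

import hashlib
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "ctapipe"

def cached_path(url: str) -> Path:
    # Key the cache entry by the full URL, so a file with the same name
    # but from a different dataset version cannot be picked up by mistake.
    key = hashlib.sha256(url.encode()).hexdigest()[:16]
    return CACHE_DIR / key / url.rsplit("/", 1)[-1]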

@kosack
Contributor Author

kosack commented Feb 2, 2021

Pix area is the main difference:

curl http://cccta-dataserver.in2p3.fr/data/ctapipe-extra/v0.3.1/LSTCam.camgeom.fits.gz -o LSTCam031.camgeom.fits.gz
curl http://cccta-dataserver.in2p3.fr/data/ctapipe-extra/v0.3.2/LSTCam.camgeom.fits.gz -o LSTCam032.camgeom.fits.gz
fitsdiff LSTCam031.camgeom.fits.gz LSTCam032.camgeom.fits.gz
fitsdiff LSTCam031.camgeom.fits.gz LSTCam032.camgeom.fits.gz -a 0.00001

 fitsdiff: 4.2
 a: LSTCam031.camgeom.fits.gz
 b: LSTCam032.camgeom.fits.gz
 Maximum number of different data values to be reported: 10
 Relative tolerance: 0.0, Absolute tolerance: 1e-05

Extension HDU 1:

   Headers contain differences:
     Keyword PIX_TYPE has different values:
        a> hexagonal
         ?        --
        b> hexagon
     Keyword SOURCE   has different values:
        a> gamma_20deg_180deg_run1000___cta-prod3-demo_desert-2150m-Paranal-demo2rad_cone10.simtel.gz
        b> /Users/kkosack/Data/CTA/Prod3b/proton/proton_20deg_0deg_run10___cta-prod3_desert-2150m-Paranal-merged.simtel.gz
     Keyword TAB_VER  has different values:
        a> 1.0
        b> 1.1

   Data contains differences:
     Column pix_area data differs in row 0:
        a> 0.0021649681944033382
        b> 0.002079326892271638
     Column pix_area data differs in row 1:
        a> 0.0021649681944033382
        b> 0.002079326892271638
     Column pix_area data differs in row 2:
        a> 0.0021649681944033382
        b> 0.002079326892271638
     Column pix_area data differs in row 3:
        a> 0.0021649681944033382
        b> 0.002079326892271638
     Column pix_area data differs in row 4:
        a> 0.0021649681944033382
        b> 0.002079326892271638
     Column pix_area data differs in row 5:
        a> 0.0021649681944033382
        b> 0.002079326892271638
     Column pix_area data differs in row 6:
        a> 0.0021649681944033382
        b> 0.002079326892271638
     Column pix_area data differs in row 7:
        a> 0.0021649681944033382
        b> 0.002079326892271638
     Column pix_area data differs in row 8:
        a> 0.0021649681944033382
        b> 0.002079326892271638
     Column pix_area data differs in row 9:
        a> 0.0021649681944033382
        b> 0.002079326892271638
     ...1845 additional difference(s) found.
     ...
     1855 different table data element(s) found (25.00% different).

@kosack
Contributor Author

kosack commented Feb 2, 2021

So if we think the new pixel areas are correct, we should just update the test values of the concentration test - I don't think the randomness matters too much, it should be within tolerance. But I would like to understand why the area is not the same. A 4% change is quite strange.

@maxnoe
Member

maxnoe commented Feb 2, 2021

@kosack Could it be that this simply changed between prod2 and prod3?

The older data comes from gamma_test.simtel.gz and I think that was prod2.

@kosack
Contributor Author

kosack commented Feb 2, 2021

The area is not stored in the files, it's computed from the pixel positions (and those are the same between the two files)

@maxnoe
Member

maxnoe commented Feb 2, 2021

@kosack that is not true, the area is read from the files.

def build_camera(cam_settings, pixel_settings, telescope, frame):
    pixel_shape = cam_settings["pixel_shape"][0]
    try:
        pix_type, pix_rotation = CameraGeometry.simtel_shape_to_type(pixel_shape)
    except ValueError:
        warnings.warn(
            f"Unkown pixel_shape {pixel_shape} for camera_type {telescope.camera_name}",
            UnknownPixelShapeWarning,
        )
        pix_type = "hexagon"
        pix_rotation = "0d"

    geometry = CameraGeometry(
        telescope.camera_name,
        pix_id=np.arange(cam_settings["n_pixels"]),
        pix_x=u.Quantity(cam_settings["pixel_x"], u.m),
        pix_y=u.Quantity(cam_settings["pixel_y"], u.m),
        pix_area=u.Quantity(cam_settings["pixel_area"], u.m ** 2),
        pix_type=pix_type,
        pix_rotation=pix_rotation,
        cam_rotation=-Angle(cam_settings["cam_rot"], u.rad),
        apply_derotation=True,
        frame=frame,
    )

@kosack
Contributor Author

kosack commented Feb 2, 2021

@kosack that is not true, the area is read from the files.

Aha, then that must be the difference: before, it was guessed from the pixel distances, but of course in reality the area is not exactly that, since there is some width to the Winston cone. That answers the question: I hadn't realized we now read it (though it makes sense now that we use pyeventio). So the new areas are more correct. Therefore, let's just update the test values. In fact, it's not a very good test; it would be better to use an artificial camera image (not a random toy model).

@kosack
Contributor Author

kosack commented Feb 2, 2021

I just changed the test values. I think we should open an issue to create a better test for that, using an artificial input where the concentration is known a priori, but that can come later.

@kosack
Contributor Author

kosack commented Feb 2, 2021

Now there is a problem with toymodel.test_intensity, which seems quite strange:

 assert np.average(geom.pix_x.to_value(u.m), weights=signal) == approx(0.2, rel=0.15)

 0.23126951261659126 == 0.2 ± 3.0e-02

This test checks that a Gaussian image centered at 0.2 m along the x-axis has the intensity values one would expect. The only thing that has changed here is the pixel area (about 4% lower than before), but that should only affect the overall scale of the image, not the first-order moments of the distribution.

However, I think the issue is a very low image intensity. The "signal" value here is:

>>> print(signal[signal>0])
array([1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 4, 2, 1, 2, 1, 1,1, 1, 1, 2, 1, 1, 1, 1, 1]) 

So clearly not very intense! Increasing the total intensity of the model to 200 PE makes it work fine, so I'll do that; I think the test was only passing by chance before. @maxnoe any objection?
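
The statistics behind that can be illustrated with plain NumPy (a toy sketch, not the actual toymodel test):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 500)                     # stand-in for pixel x positions
expected = np.exp(-0.5 * ((x - 0.2) / 0.1) ** 2)    # Gaussian model centred at x = 0.2

for total in (50, 200):
    signal = rng.poisson(expected / expected.sum() * total)
    centroid = np.average(x, weights=signal)
    print(total, centroid)   # the scatter of the weighted mean roughly halves from 50 to 200 counts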

50 PE was too low to get a reasonable image. With 200, the test passes consistently.
@maxnoe
Member

maxnoe commented Feb 2, 2021

@kosack no, looks good

@codecov

codecov bot commented Feb 2, 2021

Codecov Report

Merging #1588 (9e6ae3e) into master (b5c90a1) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1588      +/-   ##
==========================================
- Coverage   90.81%   90.78%   -0.03%     
==========================================
  Files         192      192              
  Lines       13966    13966              
==========================================
- Hits        12683    12679       -4     
- Misses       1283     1287       +4     
Impacted Files Coverage Δ
ctapipe/image/tests/test_reducer.py 100.00% <ø> (ø)
ctapipe/image/tests/test_concentration.py 95.23% <100.00%> (ø)
ctapipe/image/tests/test_toy.py 100.00% <100.00%> (ø)
ctapipe/utils/datasets.py 89.10% <100.00%> (-1.00%) ⬇️
ctapipe/instrument/camera/readout.py 85.50% <0.00%> (-4.35%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b5c90a1...ce1be58. Read the comment docs.

@maxnoe
Member

maxnoe commented Feb 3, 2021

@kosack Just to confirm:

In [9]: cam.pix_area
Out[9]: 
<Quantity [0.00207933, 0.00207933, 0.00207933, ..., 0.00207933, 0.00207933,
           0.00207933] m2>

In [10]: cam.guess_pixel_area(cam.pix_x, cam.pix_y, cam.pix_type)
Out[10]: 
<Quantity [0.00216497, 0.00216497, 0.00216497, ..., 0.00216497, 0.00216497,
           0.00216497] m2>

@kosack
Contributor Author

kosack commented Feb 3, 2021

Yes, the guess method doesn't perfectly reproduce the real areas, since in reality some intensity is lost to the gaps/edges of the Winston cone. We could add a fudge factor to make it more realistic, for example by comparing the real values to the guessed values for all cameras and averaging. In that case we'd just multiply the guessed area by e.g. 98%. But that should be a separate issue/PR.
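
A sketch of how such a factor could be derived (illustrative only; the camera list is hypothetical, and the guess_pixel_area call follows the comparison shown above):

import numpy as np
from ctapipe.instrument import CameraGeometry

ratios = []
for name in ("LSTCam", "NectarCam", "FlashCam"):   # hypothetical selection of cameras
    geom = CameraGeometry.from_name(name)
    guessed = geom.guess_pixel_area(geom.pix_x, geom.pix_y, geom.pix_type)
    ratios.append(float(np.mean(geom.pix_area / guessed)))

fudge = np.mean(ratios)   # ~0.96 for LSTCam alone, per the numbers above
print(fudge)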

The areas are mostly only used for plotting, so the effect can also be ignored. The only place so far they are used for computation is in the toy model, but that doesn't need to be perfect.

@maxnoe
Member

maxnoe commented Feb 3, 2021

The areas are mostly only used for plotting, so the effect can also be ignored. The only place so far they are used for computation is in the toy model, but that doesn't need to be perfect.

Muon intensity fitter.

@maxnoe maxnoe merged commit 0d4b976 into cta-observatory:master Feb 3, 2021
LukasNickel added a commit to LukasNickel/ctapipe that referenced this pull request Feb 8, 2021
- Value changed slightly due to differences in the camera readout
definitions (cta-observatory#1588)
kosack pushed a commit that referenced this pull request Feb 11, 2021
* a new signal extractor, slightly slower, but with better accuracy (in particular for weak pulses): SlidingWindowMaxSum
It maximizes the sum on "width" consecutive slices

* added missing apply_integration_correction variable to SlidingWindowMaxSum class

* adding a standard test of the newly added signal extractor SlidingWindowMaxSum

* added the test for the extract_sliding_window function that is used by SlidingWindowMaxSum extractor

* fixing a few small issues from codacy scan

* fixing two trailing whitespaces

* follow up commit on PR #1568

- improved the calculation of the correction for incomplete integration of a pulse in the SlidingWindowMaxSum extractor
- added another test (temporarily in a separate file because of PR #1588) testing this correction for the LST pulse shape

* moved the dirty fix of the LST pulse shape to monkeypatch

* solving codacy warnings in PR 1588

* resolving codacy issues

- swapped imports

* follow up of PR 1568

removed a monkeypatch that is unnecessary after PR 1451

* fixed an error left in the previous commit that would fail the test

* corrected indentation
kosack pushed a commit that referenced this pull request Aug 4, 2021
* Cache files by full url/path

* Fix part file not deleted on ctrl c

* Add container for nominal reconstruction of hillas parameters

- x,y replaced by lon and lat
- units changed to deg from m

* Add method to calculate hillas parameters in the nominal frame

- Requires pixel positions in the nominal frame instead of a geom object
- Most of the code is a duplicate of the reconstruction in the camera
frame for now

* Remove second hillas parameters function

- Decide which container to return based on pixel position unit

* Add containers for nominal frame and fix hillas-related parameters

* Use image parameters in nominal frame, bump data level version

- Save image parameters calculated in the nominal frame
- This affects the hillas parameters and the timing slope
- The coordinate transformations happen in the image processor

* Fix import

* Clean up container structure, use telescope frame for image parameters

- Stage 1 and the connected tools now calculate the hillas parameters in
the telescope frame
- Hillas and TimingParametersContainer now store values as deg. Old
containers persist as CameraHillas/TimingParametersContainer
- In order to keep a single ImageParametersContainer, base containers
have been added for hillas and timing parameters
- Failing tests adapted: x/y -> lon/lat, reconstruction algorithms use
the CameraHillasParametersContainer

* Precompute camera geometries

* Fix unit test

- Value changed slightly due to differences in the camera readout
definitions (#1588)

* Rename lon/lat to fov_lon/fov_lat

* Fix use of ImageParametersContainer

- Avoid base containers and thus lacking columns in the output file

* Address codacy complaints

* Fix default ImageParametersContainer

- Sets the Hillas/Timing parameters container in the telescope frame
as default values as opposed to the base containers

* Change datamodel version

- Replace v1.0.4 with v1.1.0 since column names have changed on top of
the units

* Use fov_lon/lat instead of x/y

* Distinguish between CameraHillasParametersContainer and HillasParametersContainer

* Adapt hillas intersection to work with hillas parameters in the
telescope and camera frame

* Remove duplicated tests of the results

- Comparing values between both reconstructions should handle all cases
- Comparing results in the telescope frame to fixed values was error prone, because the image is generated in the camera frame requiring explicit conversion every time

* Remove unused declarations and imports

* Fix datamodel version

* Fix unit test

- The test file gets generated by calling stage1 on a simtel file, so
  this has to take the current datamodel into account

* Fix comparison to nan in unit tests

* Fix reading of dl1 files

- Add v1.3 to the list of supported datamodel versions
- The renaming of the parameter containers changed the default container
  prefixes (HillasParametersContainer -> CameraHillasParametersContainer
  leads to the camera_hillas prefix), breaking backwards compatibility
- This is fixed by specifying explicit prefixes when reading <v1.3 files

* Fix tests for the dl1eventsource

- Improper comparison to np.nan led to tests effectively being skipped
- The assertion would have failed actually, because the test file
  contains nan events (which is expected)
- The new test checks if all entries are nan instead. Due to the event
  loop in the event source, this requires collecting the values in a
  list first

* Write out correct datamodel version

* Fix unit test

* Try to fix the docs by making the base containers private

* Fix codacy complaints

* Add base containers to __all__

- This seems to be necessary for the sphinx docs

* Add option to choose camera frame instead of telescope frame

* Remove unused import

* Add missing underscore

* Fix selection of camera frame geometry

* Fix test converting to wrong unit

* Add missing variable name

* Use traits for the selection of the frame

* Fix unit test

Co-authored-by: Maximilian Nöthe <[email protected]>