Fix thinned modis reading in 'hdfeos_l1b' reader #492

mraspaud · 2018-11-08T20:53:53Z

Thinned modis files, eg as received via eumetcast, were not working with the current implementation of the hdfeos reader, since it wasn't capable of reading both navigation and band data from the same file. This PR addresses the issue.

Tests passed
Passes git diff origin/master -- "*py" | flake8 --diff

satpy/etc/readers/hdfeos_l1b.yaml

coveralls · 2018-11-08T21:36:36Z

Coverage increased (+0.2%) to 74.256% when pulling ab7294a on mraspaud:bugfix-thin-modis into a633ba2 on pytroll:master.

codecov · 2018-11-08T21:49:42Z

Codecov Report

Merging #492 into master will increase coverage by 0.2%.
The diff coverage is 74.28%.

@@            Coverage Diff            @@
##           master     #492     +/-   ##
=========================================
+ Coverage   74.05%   74.25%   +0.2%     
=========================================
  Files         137      137             
  Lines       18190    18218     +28     
=========================================
+ Hits        13470    13528     +58     
+ Misses       4720     4690     -30

Impacted Files	Coverage Δ
satpy/composites/__init__.py	`41.15% <100%> (+0.67%)`	⬆️
satpy/readers/hdfeos_l1b.py	`15.04% <27.27%> (-0.84%)`	⬇️
satpy/tests/compositor_tests/__init__.py	`97.77% <94.44%> (+0.25%)`	⬆️
satpy/scene.py	`84.03% <0%> (-0.43%)`	⬇️
satpy/tests/test_scene.py	`99.32% <0%> (-0.01%)`	⬇️
satpy/composites/viirs.py	`84.09% <0%> (ø)`	⬆️
satpy/readers/hrit_msg.py	`44.63% <0%> (ø)`	⬆️
satpy/writers/geotiff.py	`43.51% <0%> (ø)`	⬆️
satpy/readers/iasi_l2.py	`98.8% <0%> (+0.02%)`	⬆️
satpy/tests/test_writers.py	`97.31% <0%> (+0.11%)`	⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a633ba2...ab7294a. Read the comment docs.

djhoese

We talked about this on slack but maybe we should discuss it here. I am against MODIS data having saturated pixels being set to NaN. The average user with default behavior will end up with images (ignoring composites all together) that have huge black holes in the tops of clouds. The saturated fill value represents a saturated sensor, that is a pixel that's reflectance was higher than the instrument could reliably record. At least that's how I understand it. Having the reader set it to the maximum valid reflectance (1.2 iirc, or whatever the equivalent uint8 is) will produce valid data that represents what the fill value represents.

If users want to do something that is not the typical/usual/common case and have a separate mask for the saturated pixels then we can add that to the reader, but 99% of the time I don't think it would be needed.

mraspaud · 2018-11-09T08:15:09Z

Thanks for a constructive critique :) I have a different opinion, which I will explain here.
We agree on the meaning of the saturation. And I agree that large holes in the data is not what the lambda user wants to see.
However, the saturated data is marked in the level 1 data as invalid, and reading the documentation here on page 81, the pixels being marked as saturated may also have up to six more reasons to be invalid:

If more than one condition applies to a datum, the data value reflects only the condition which was encountered first

As far as I understand, there is no individual error flag variable in this format that we could know for sure that the pixel is only saturated.

Another reason I have for leaving saturated pixels as masked in the reader is that the specific channels used in the true_color_thin composite (channels 10 and 12) are not meant to be used for measuring the reflectance of clouds, but rather for darker pixels like land or sea, and are thus much more sensitive than the broader channels we usually use for the true color composite, namely 3 and 4. Someone wanting to use these channels as they are meant to be used for scientific work would not want saturated values filled as it would be difficult to then trust that the data values are actually sensed values.

Last reason, but not least, is that I think that satpy shouldn't be adding intelligence on the reader part and I want it to keep it simple and stupid. I believe that satpy could be used as a data reading interface to other software in the future, so I would like to keep this part as transparent as possible.

This being said, I totally agree that having holes in the data for imagery purposes is highly unpleasant, and I propose to treat this at a later stage.

One solution is the one I implemented here, which just fills green and blue clouds with the red clouds for the true color composite.
Another solution would be to have a modifier that takes as parameter the channels and a saturation mask (that we could have as a dataset, or even a dataset that actually returns the fill values), and fills the channels with the fixed value you propose. This way we could load individual channels too like scn.load(DatasetID(name='10', modifiers=('fill_saturated_reflectances')))

I'm of course open to further discussion and propositions :)

djhoese · 2018-11-09T14:17:51Z

it would be difficult to then trust that the data values are actually sensed values.

I'm not sure I understand. If the pixels are marked as saturated then they are at least saturated. You're saying because they are marked as saturated, given what you said about first found fill value being assigned, that a scientist can't be sure that these values are good enough to work with? Also making the entire hdfeos_l1b reader behave a certain way because of what is available in the thin files sounds like a bad idea.

I had not thought about the modifier approach. That does make it easier, but it still requires extra work by the average user (and arguably more processing depending on how dask/xarray hands .where) to get a good image so I'm not 100% convinced.

I think there are two things to consider: what is the most common use case and do any solutions stop a user from doing the not-common use case. As a possible third concern, does the chosen solution make the MODIS reader behave differently or have to handled differently than other readers. So if we say there are 2 use cases: scientist who wants to know where saturated pixels are and image-user who wants a pretty image/quicklook. The two most plausible solutions are:

Fill all "quality flag" pixels with NaNs.
Fill pixels marked as saturated with max reflectance, all other quality flags as NaNs.

These two solutions could be controlled by a environment variable (SATPY_READERS__MODIS__FILL_SAT) and in the near future a donfig config setting (which could also use that environment variable. Based on this discussion it sounds like we need a DQF dataset or equivalent to fully support either of the above solutions. We had talked about something like this for the VIIRS SDRs in the past too. As for my 2 to 3 concerns, I think making images with satpy is the most common use case right now but I agree that stopping people from doing science is a bad idea. However, your solution as it is now does not distinguish between the fill values so your composite could be filling invalid pixels (not just saturated) with valid data. So it may not be the reader being smart, but it is the composite being "dumb" 😄.

Reading the documentation that you linked to now, I have a feeling that the fill values are checked in the order specified for a specific reason and I would hope that someone thought of this exact situation. That is, a saturated pixel can't also be invalid. I could ask some of the MODIS experts in my building (Liam Gumley and Kathy Strabala). As maybe a weaker argument, with your solution all fill values (bad and saturated) are marked as NaN so they can't be used without a separate mask. If saturated pixels were filled with max reflectance then at least they can sort of be detected. In how many scientific calculations is a saturated pixel not useful as a max reflectance?

For what it's worth Liam Gumley was the one who told me to fill saturated pixels with max reflectance. Admittedly this was for generating images with Polar2Grid and not for doing science. Looking at the P2G code now it looks like I only check the saturation fill value specifically for band 2 visible data and no others. Additionally, and harder to argue for, I set any "can't aggregate" fill values (65528) to max reflectance for band 2 as well. This is a known issue from what I understand where the aggregation software doesn't distinguish between saturated values and other missing values so you can end up with "can't aggregate" fills where they were all "saturated" fills before.

mraspaud · 2018-11-09T16:03:34Z

it would be difficult to then trust that the data values are actually sensed values.

I'm not sure I understand. If the pixels are marked as saturated then they are at least saturated. You're saying because they are marked as saturated, given what you said about first found fill value being assigned, that a scientist can't be sure that these values are good enough to work with?

I meant that filling invalid pixels (like saturated ones) can break the trust a scientist has in the data.

Also making the entire hdfeos_l1b reader behave a certain way because of what is available in the thin files sounds like a bad idea.

That is not my intention at all, and I don't see how my handling of invalid values would lead to that.

I had not thought about the modifier approach. That does make it easier, but it still requires extra work by the average user (and arguably more processing depending on how dask/xarray hands .where) to get a good image so I'm not 100% convinced.

As I stated before, I would really like the readers in general to modify the data as little as possible, and let the modifier to the rest, as we do with sun-normalisation or rayleigh scattering correction. Yes, this will make the data a bit harder to read for the satpy user that wants to load filled data, but I have a feeling that people reading individual channels are mostly interested in processing the data arrays themselves, and thus we could leave it to them on how to fill the holes.

I think there are two things to consider: what is the most common use case and do any solutions stop a user from doing the not-common use case. As a possible third concern, does the chosen solution make the MODIS reader behave differently or have to handled differently than other readers. So if we say there are 2 use cases: scientist who wants to know where saturated pixels are and image-user who wants a pretty image/quicklook. The two most plausible solutions are:

Fill all "quality flag" pixels with NaNs.

Fill pixels marked as saturated with max reflectance, all other quality flags as NaNs.

These two solutions could be controlled by a environment variable (SATPY_READERS__MODIS__FILL_SAT) and in the near future a donfig config setting (which could also use that environment variable. Based on this discussion it sounds like we need a DQF dataset or equivalent to fully support either of the above solutions. We had talked about something like this for the VIIRS SDRs in the past too. As for my 2 to 3 concerns, I think making images with satpy is the most common use case right now but I agree that stopping people from doing science is a bad idea. However, your solution as it is now does not distinguish between the fill values so your composite could be filling invalid pixels (not just saturated) with valid data. So it may not be the reader being smart, but it is the composite being "dumb" .

Agreed, my true_color_thin composite is just a quick fix and needs to be improved further when the quality flags will be made available from the hdfeos reader. I was just trying to address a concern raised by the user with the datasets at hand. I can be improved, but the idea behind it isn't so dumb I think.

When it comes to the pretty picture users, I don't think leaving the filling out the reader would harm them as long as be provide ready made recipes they can reuse/copy.

Regarding making the hdfeos reader different from the other readers, I don't understand how my handling of the values outside the valid_range would lead to believe that. I actually think that applying implicit corrections to the data (as opposed to explicit with a modifier) would actually be the thing making the hdfeos reader stand out, which I don't like.

As for switching the behaviour of a reader with an environment variable, please let's not do that. IMHO, a reader should just read the data, and make it available for further processing in a consistent manner.

Reading the documentation that you linked to now, I have a feeling that the fill values are checked in the order specified for a specific reason and I would hope that someone thought of this exact situation. That is, a saturated pixel can't also be invalid. I could ask some of the MODIS experts in my building (Liam Gumley and Kathy Strabala). As maybe a weaker argument, with your solution all fill values (bad and saturated) are marked as NaN so they can't be used without a separate mask. If saturated pixels were filled with max reflectance then at least they can sort of be detected. In how many scientific calculations is a saturated pixel not useful as a max reflectance?

I could be that the saturation is indeed the only thing that can happen to these pixels, indeed. I'm no expert on the format, so if you have the possibility, please as the experts.
About loosing the value that lead to a NaN, maybe we should revert to np.ma arrays then ? 😄 Joke aside, we indeed need to expose a DQF dataset so that scientific users can handle the data as fits their algorithm best. And having bright clouds masked out is actually pretty useful in the few ocean color applications I'm aware of (thresholding to get algal bloom states)

For what it's worth Liam Gumley was the one who told me to fill saturated pixels with max reflectance. Admittedly this was for generating images with Polar2Grid and not for doing science. Looking at the P2G code now it looks like I only check the saturation fill value specifically for band 2 visible data and no others. Additionally, and harder to argue for, I set any "can't aggregate" fill values (65528) to max reflectance for band 2 as well. This is a known issue from what I understand where the aggregation software doesn't distinguish between saturated values and other missing values so you can end up with "can't aggregate" fills where they were all "saturated" fills before.

As you say, this is for generating images, and I'm perfectly fine having this corrections applied at the modifying or compositing stage.

So to sum up, we have 3 use cases:

I think we agree that pretty pictures need filling the holes, and that there are different ways this can be done, my true_color_thin probably not being the best at the moment. We could organise a competition for the best gap filled true color 😄
For scientific use, we seem to diverge on the utility of gap-filling. I'm proposing channels with gaps + a DQF dataset to leave the data as untouched as possible.
Finally, there is the issue of the lambda user who want to play with the data, that would get mad with holes in the data. Providing a datasetid would be the fix, and I think we can live with that.

djhoese · 2018-11-09T16:39:59Z

I meant that filling invalid pixels (like saturated ones) can break the trust a scientist has in the data.

Understood. However, I still feel like assigning max refl isn't breaking this trust and is modifying the data the way it is most useful and is what the saturated fill value represents (assuming pixels can be saturated and bad at the same time - waiting on the experts).

That is not my intention at all, and I don't see how my handling of invalid values would lead to that.

I meant that one of your justifications for why setting the saturated pixels to max refl was bad was that your thin composite uses special bands to make a true color because the traditional ones aren't available. That's how I understood that paragraph about the composite.

Yes, this will make the data a bit harder to read for the satpy user that wants to load filled data, but I have a feeling that people reading individual channels are mostly interested in processing the data arrays themselves, and thus we could leave it to them on how to fill the holes.

I understand the desire to limit how much the readers modify the data and in most other cases I would agree. That's why I think (assuming saturated doesn't mean bad) it isn't really modifying the data in my opinion. Not only does handling the data as you have it currently make it harder for users doing .load, but it makes it so every composite that could be made with MODIS data has to be redefined to add this modifier. You say that you think most people want to analyze the data when they are loading individual channels. I completely disagree with this. The majority of Polar2Grid's users only load individual channels (true color and natural color RGBs weren't even possible for the first couple versions and are still the only ones builtin). They want to see the data that their DB station recorded and show it on their website. Additionally, I think there are a lot of forecasters in the US that are used to looking at straight reflectance or BT data (they look at RGBs too but I'm not certain what they prefer). That's why I feel so strongly about the default enhancements for these data types as well.

Regarding making the hdfeos reader different from the other readers, I don't understand how my handling of the values outside the valid_range would lead to believe that. I actually think that applying implicit corrections to the data (as opposed to explicit with a modifier) would actually be the thing making the hdfeos reader stand out, which I don't like.

As for switching the behaviour of a reader with an environment variable, please let's not do that. IMHO, a reader should just read the data, and make it available for further processing in a consistent manner.

Understood. I can see what you're saying and I would agree...in most other cases. It seems the main difference between our arguments is that I don't consider saturation to be a fill value. Setting it to max reflectance is "fixing" something that I consider a deficiency of the file format and L1B design. It makes MODIS data looks like every other polar-orbiting instrument's data.

For the environment variable, I was kind of thinking it like a low-level behavior change in satpy that could apply to multiple readers. Kind of like xarray's configurable "what to do with attrs when combining DataArrays". It would be like "correct extreme values in readers" or "clip valid range data versus invalidate out of range data". Just an idea, don't feel super strongly about this.

And having bright clouds masked out is actually pretty useful in the few ocean color applications I'm aware of (thresholding to get algal bloom states)

If you have to mask out bright clouds for an algorithm then why would you not have to mask out high reflectances that weren't marked as saturated? Like a reflectance of 119% is valid, but the sensor (or L1B processing) marks reflectances of 120% as saturated. Both need to be masked out for your algorithm. This sounds like an argument for filling the saturated pixels with max reflectance. That way you don't need to load the DQF information and can just mask out anything >110% or whatever is needed.

adybbroe · 2018-11-09T17:50:28Z

Uhu, haven't read your novels above, sorry!
Just want to say that the MODIS thinned product service is a rather special (unique) data stream. Not very useful (bad timeliness) for operations. It was a demo introduced by EUMETSAT to make the transition to VIIRS more smooth - so it s also quite an old one. We included only part of the MODIS bands we thought was useful and was comparable to VIIRS.
So, for the sake of this service we should not over do it!
For the principle discussion I prefer as little intelligence as possible on the reader side.

adybbroe · 2018-11-09T17:51:50Z

I will read your novels eventually @mraspaud and @djhoese
Just now it is Friday evening, and I will close down :-)

mraspaud · 2018-11-10T00:30:54Z

Ok, so a few things. Saturation. For me filling saturated pixels with a fixed value isn't fixing the data, it's just making it unreliable. By masking it, the sensor admits it can't measure above that value, so instead of giving you a wrong value (e.g. 120% instead of the actual 130 or 140), it's just saying you it can't give you a reliable physical measurement for it. And I don't think it's the reader's role to make a guess at what it should be, because it depends on the application. Now, don't get me wrong, I understand and agree with your point that for e.g. band 2, the saturation is probably a defect of the detectors, but this can't be generalised to all bands (See the paragraph on special band further down). Still, imho fixing the data isn't the role of the reader. E.g. would you do denoising in the reader ? Single channel images. We do use them too, and of course we want them to look nice. However, just as we have separated composites from enhancements to keep image operations from altering the physical meaning of the data, I believe the reader shouldn't implement anything composite/image related. What users want is single channel *composites*, ie a *visualisation* of the sensor data, which means that enhancements and other effects might need to be added to the actual sensor data, like eg gap filling. Special bands in thinned products. They are not special, they are also available in the regular modis l1b. It's just that we don't have 3 and 4 in thinned l1b, so we use these instead. But because they are meant for other things than cloud observation, they saturate faster (looks lower than 100%). So to come back to your point that you want to make modis look like other polar orbiting imager, I'm sorry to say that modis is different. It is mostly made for sea and land observation, so the bands are made differently than let's say viirs, which is mostly made for meteorology. We work of course with modis in meteorology, but we need to keep in mind the limitations of some of the bands in the sensor. Adapting the higher sensitivity bands for cloud observation is fine to make pretty pictures, but I don't want this adaptation to happen at reading time, because we don't know what application the data will be used for. I hope what I wrote makes sense, it's kind of late now 😄. Tell me if I missed something.

djhoese · 2018-11-10T02:20:50Z

Preliminary answer: I am convinced that your way is more flexible and opens the doors to other similar situations that satpy may have to handle. I do have to be convinced on the easiest ways for users to do what you've talked about. Especially what I'll have to do in Polar2Grid to get this to work without the user knowing and of course I'll have to make it possible to turn it off probably too because...users.

As for the "would you do denoising in the reader", not in satpy, but in polar2grid I did limb correction for ATMS data in the reader. This requires all of the input channels to be able to do it so speed-wise it might be best to do it right away in the reader. But thinking about it more P2G's reader also did the things like DNB normalization algorithms in the reading step (reader + non-RGB compositors).

To be clear, P2G does this only for band 2 and does not look at these fill values for any other band. My understanding is that in normal L1B files band 2 is the only band that this fill value shows up in.

So how do we do handle this? We can move this conversation to slack probably.

mraspaud · 2018-11-12T12:37:23Z

Ok, since this PR doesn't actually touch the handling of invalid values, are you (@djhoese) ok with merging this and opening another issue and PR for the saturated values ?

djhoese

Could you add some docstrings and maybe a test for the filling compositor?

mraspaud · 2018-11-12T14:07:55Z

Sure

satpy/tests/compositor_tests/__init__.py

mraspaud added 2 commits November 8, 2018 21:49

Add a true_color for thinned modis datasets

a1a8449

Add support for reading both band and geolocation from same modis file

b53409b

mraspaud added bug component:readers labels Nov 8, 2018

mraspaud added this to the v0.10 milestone Nov 8, 2018

mraspaud self-assigned this Nov 8, 2018

djhoese reviewed Nov 8, 2018

View reviewed changes

satpy/etc/readers/hdfeos_l1b.yaml Show resolved Hide resolved

mraspaud added 2 commits November 8, 2018 22:53

Add a FillingCompositor for filling bands with another band

a023796

Add info on meaning of hdfeos fill values

948e2ce

djhoese requested changes Nov 9, 2018

View reviewed changes

djhoese changed the title ~~Fix thinned modis reading~~ Fix thinned modis reading in 'hdfeos_l1b' reader Nov 9, 2018

djhoese approved these changes Nov 12, 2018

View reviewed changes

djhoese requested changes Nov 12, 2018

View reviewed changes

mraspaud mentioned this pull request Nov 12, 2018

Graceful handling of saturated values in MODIS data #499

Closed

mraspaud added 3 commits November 13, 2018 09:44

Test and document FillingCompositor

25a7e4c

Cleanup

be19995

Document hdfeos_l1b classes

0dae385

stickler-ci reviewed Nov 13, 2018

View reviewed changes

satpy/tests/compositor_tests/__init__.py Show resolved Hide resolved

Style fixing

111f632

djhoese approved these changes Nov 13, 2018

View reviewed changes

Fix style

ab7294a

mraspaud merged commit 6fc2669 into pytroll:master Nov 13, 2018

mraspaud deleted the bugfix-thin-modis branch November 13, 2018 22:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix thinned modis reading in 'hdfeos_l1b' reader #492

Fix thinned modis reading in 'hdfeos_l1b' reader #492

mraspaud commented Nov 8, 2018 •

edited

Loading

coveralls commented Nov 8, 2018 •

edited

Loading

codecov bot commented Nov 8, 2018 •

edited

Loading

djhoese left a comment

mraspaud commented Nov 9, 2018

djhoese commented Nov 9, 2018

mraspaud commented Nov 9, 2018

djhoese commented Nov 9, 2018

adybbroe commented Nov 9, 2018

adybbroe commented Nov 9, 2018

mraspaud commented Nov 10, 2018 via email •

edited

Loading

djhoese commented Nov 10, 2018

mraspaud commented Nov 12, 2018

djhoese left a comment

mraspaud commented Nov 12, 2018

Fix thinned modis reading in 'hdfeos_l1b' reader #492

Fix thinned modis reading in 'hdfeos_l1b' reader #492

Conversation

mraspaud commented Nov 8, 2018 • edited Loading

coveralls commented Nov 8, 2018 • edited Loading

codecov bot commented Nov 8, 2018 • edited Loading

Codecov Report

djhoese left a comment

Choose a reason for hiding this comment

mraspaud commented Nov 9, 2018

djhoese commented Nov 9, 2018

mraspaud commented Nov 9, 2018

djhoese commented Nov 9, 2018

adybbroe commented Nov 9, 2018

adybbroe commented Nov 9, 2018

mraspaud commented Nov 10, 2018 via email • edited Loading

djhoese commented Nov 10, 2018

mraspaud commented Nov 12, 2018

djhoese left a comment

Choose a reason for hiding this comment

mraspaud commented Nov 12, 2018

mraspaud commented Nov 8, 2018 •

edited

Loading

coveralls commented Nov 8, 2018 •

edited

Loading

codecov bot commented Nov 8, 2018 •

edited

Loading

mraspaud commented Nov 10, 2018 via email •

edited

Loading