Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix thinned modis reading in 'hdfeos_l1b' reader #492

Merged
merged 9 commits into from
Nov 13, 2018

Conversation

mraspaud
Copy link
Member

@mraspaud mraspaud commented Nov 8, 2018

Thinned modis files, eg as received via eumetcast, were not working with the current implementation of the hdfeos reader, since it wasn't capable of reading both navigation and band data from the same file. This PR addresses the issue.

  • Tests passed
  • Passes git diff origin/master -- "*py" | flake8 --diff

@mraspaud mraspaud added this to the v0.10 milestone Nov 8, 2018
@mraspaud mraspaud self-assigned this Nov 8, 2018
@coveralls
Copy link

coveralls commented Nov 8, 2018

Coverage Status

Coverage increased (+0.2%) to 74.256% when pulling ab7294a on mraspaud:bugfix-thin-modis into a633ba2 on pytroll:master.

@codecov
Copy link

codecov bot commented Nov 8, 2018

Codecov Report

Merging #492 into master will increase coverage by 0.2%.
The diff coverage is 74.28%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #492     +/-   ##
=========================================
+ Coverage   74.05%   74.25%   +0.2%     
=========================================
  Files         137      137             
  Lines       18190    18218     +28     
=========================================
+ Hits        13470    13528     +58     
+ Misses       4720     4690     -30
Impacted Files Coverage Δ
satpy/composites/__init__.py 41.15% <100%> (+0.67%) ⬆️
satpy/readers/hdfeos_l1b.py 15.04% <27.27%> (-0.84%) ⬇️
satpy/tests/compositor_tests/__init__.py 97.77% <94.44%> (+0.25%) ⬆️
satpy/scene.py 84.03% <0%> (-0.43%) ⬇️
satpy/tests/test_scene.py 99.32% <0%> (-0.01%) ⬇️
satpy/composites/viirs.py 84.09% <0%> (ø) ⬆️
satpy/readers/hrit_msg.py 44.63% <0%> (ø) ⬆️
satpy/writers/geotiff.py 43.51% <0%> (ø) ⬆️
satpy/readers/iasi_l2.py 98.8% <0%> (+0.02%) ⬆️
satpy/tests/test_writers.py 97.31% <0%> (+0.11%) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a633ba2...ab7294a. Read the comment docs.

Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this on slack but maybe we should discuss it here. I am against MODIS data having saturated pixels being set to NaN. The average user with default behavior will end up with images (ignoring composites all together) that have huge black holes in the tops of clouds. The saturated fill value represents a saturated sensor, that is a pixel that's reflectance was higher than the instrument could reliably record. At least that's how I understand it. Having the reader set it to the maximum valid reflectance (1.2 iirc, or whatever the equivalent uint8 is) will produce valid data that represents what the fill value represents.

If users want to do something that is not the typical/usual/common case and have a separate mask for the saturated pixels then we can add that to the reader, but 99% of the time I don't think it would be needed.

@mraspaud
Copy link
Member Author

mraspaud commented Nov 9, 2018

Thanks for a constructive critique :) I have a different opinion, which I will explain here.
We agree on the meaning of the saturation. And I agree that large holes in the data is not what the lambda user wants to see.
However, the saturated data is marked in the level 1 data as invalid, and reading the documentation here on page 81, the pixels being marked as saturated may also have up to six more reasons to be invalid:

If more than one condition applies to a datum, the data value reflects only the condition which was encountered first

As far as I understand, there is no individual error flag variable in this format that we could know for sure that the pixel is only saturated.

Another reason I have for leaving saturated pixels as masked in the reader is that the specific channels used in the true_color_thin composite (channels 10 and 12) are not meant to be used for measuring the reflectance of clouds, but rather for darker pixels like land or sea, and are thus much more sensitive than the broader channels we usually use for the true color composite, namely 3 and 4. Someone wanting to use these channels as they are meant to be used for scientific work would not want saturated values filled as it would be difficult to then trust that the data values are actually sensed values.

Last reason, but not least, is that I think that satpy shouldn't be adding intelligence on the reader part and I want it to keep it simple and stupid. I believe that satpy could be used as a data reading interface to other software in the future, so I would like to keep this part as transparent as possible.

This being said, I totally agree that having holes in the data for imagery purposes is highly unpleasant, and I propose to treat this at a later stage.

One solution is the one I implemented here, which just fills green and blue clouds with the red clouds for the true color composite.
Another solution would be to have a modifier that takes as parameter the channels and a saturation mask (that we could have as a dataset, or even a dataset that actually returns the fill values), and fills the channels with the fixed value you propose. This way we could load individual channels too like scn.load(DatasetID(name='10', modifiers=('fill_saturated_reflectances')))

I'm of course open to further discussion and propositions :)

@djhoese
Copy link
Member

djhoese commented Nov 9, 2018

it would be difficult to then trust that the data values are actually sensed values.

I'm not sure I understand. If the pixels are marked as saturated then they are at least saturated. You're saying because they are marked as saturated, given what you said about first found fill value being assigned, that a scientist can't be sure that these values are good enough to work with? Also making the entire hdfeos_l1b reader behave a certain way because of what is available in the thin files sounds like a bad idea.

I had not thought about the modifier approach. That does make it easier, but it still requires extra work by the average user (and arguably more processing depending on how dask/xarray hands .where) to get a good image so I'm not 100% convinced.

I think there are two things to consider: what is the most common use case and do any solutions stop a user from doing the not-common use case. As a possible third concern, does the chosen solution make the MODIS reader behave differently or have to handled differently than other readers. So if we say there are 2 use cases: scientist who wants to know where saturated pixels are and image-user who wants a pretty image/quicklook. The two most plausible solutions are:

  1. Fill all "quality flag" pixels with NaNs.
  2. Fill pixels marked as saturated with max reflectance, all other quality flags as NaNs.

These two solutions could be controlled by a environment variable (SATPY_READERS__MODIS__FILL_SAT) and in the near future a donfig config setting (which could also use that environment variable. Based on this discussion it sounds like we need a DQF dataset or equivalent to fully support either of the above solutions. We had talked about something like this for the VIIRS SDRs in the past too. As for my 2 to 3 concerns, I think making images with satpy is the most common use case right now but I agree that stopping people from doing science is a bad idea. However, your solution as it is now does not distinguish between the fill values so your composite could be filling invalid pixels (not just saturated) with valid data. So it may not be the reader being smart, but it is the composite being "dumb" 😄.

Reading the documentation that you linked to now, I have a feeling that the fill values are checked in the order specified for a specific reason and I would hope that someone thought of this exact situation. That is, a saturated pixel can't also be invalid. I could ask some of the MODIS experts in my building (Liam Gumley and Kathy Strabala). As maybe a weaker argument, with your solution all fill values (bad and saturated) are marked as NaN so they can't be used without a separate mask. If saturated pixels were filled with max reflectance then at least they can sort of be detected. In how many scientific calculations is a saturated pixel not useful as a max reflectance?

For what it's worth Liam Gumley was the one who told me to fill saturated pixels with max reflectance. Admittedly this was for generating images with Polar2Grid and not for doing science. Looking at the P2G code now it looks like I only check the saturation fill value specifically for band 2 visible data and no others. Additionally, and harder to argue for, I set any "can't aggregate" fill values (65528) to max reflectance for band 2 as well. This is a known issue from what I understand where the aggregation software doesn't distinguish between saturated values and other missing values so you can end up with "can't aggregate" fills where they were all "saturated" fills before.

@djhoese djhoese changed the title Fix thinned modis reading Fix thinned modis reading in 'hdfeos_l1b' reader Nov 9, 2018
@mraspaud
Copy link
Member Author

mraspaud commented Nov 9, 2018

it would be difficult to then trust that the data values are actually sensed values.

I'm not sure I understand. If the pixels are marked as saturated then they are at least saturated. You're saying because they are marked as saturated, given what you said about first found fill value being assigned, that a scientist can't be sure that these values are good enough to work with?

I meant that filling invalid pixels (like saturated ones) can break the trust a scientist has in the data.

Also making the entire hdfeos_l1b reader behave a certain way because of what is available in the thin files sounds like a bad idea.

That is not my intention at all, and I don't see how my handling of invalid values would lead to that.

I had not thought about the modifier approach. That does make it easier, but it still requires extra work by the average user (and arguably more processing depending on how dask/xarray hands .where) to get a good image so I'm not 100% convinced.

As I stated before, I would really like the readers in general to modify the data as little as possible, and let the modifier to the rest, as we do with sun-normalisation or rayleigh scattering correction. Yes, this will make the data a bit harder to read for the satpy user that wants to load filled data, but I have a feeling that people reading individual channels are mostly interested in processing the data arrays themselves, and thus we could leave it to them on how to fill the holes.

I think there are two things to consider: what is the most common use case and do any solutions stop a user from doing the not-common use case. As a possible third concern, does the chosen solution make the MODIS reader behave differently or have to handled differently than other readers. So if we say there are 2 use cases: scientist who wants to know where saturated pixels are and image-user who wants a pretty image/quicklook. The two most plausible solutions are:

  1. Fill all "quality flag" pixels with NaNs.
  2. Fill pixels marked as saturated with max reflectance, all other quality flags as NaNs.

These two solutions could be controlled by a environment variable (SATPY_READERS__MODIS__FILL_SAT) and in the near future a donfig config setting (which could also use that environment variable. Based on this discussion it sounds like we need a DQF dataset or equivalent to fully support either of the above solutions. We had talked about something like this for the VIIRS SDRs in the past too. As for my 2 to 3 concerns, I think making images with satpy is the most common use case right now but I agree that stopping people from doing science is a bad idea. However, your solution as it is now does not distinguish between the fill values so your composite could be filling invalid pixels (not just saturated) with valid data. So it may not be the reader being smart, but it is the composite being "dumb" .

Agreed, my true_color_thin composite is just a quick fix and needs to be improved further when the quality flags will be made available from the hdfeos reader. I was just trying to address a concern raised by the user with the datasets at hand. I can be improved, but the idea behind it isn't so dumb I think.

When it comes to the pretty picture users, I don't think leaving the filling out the reader would harm them as long as be provide ready made recipes they can reuse/copy.

Regarding making the hdfeos reader different from the other readers, I don't understand how my handling of the values outside the valid_range would lead to believe that. I actually think that applying implicit corrections to the data (as opposed to explicit with a modifier) would actually be the thing making the hdfeos reader stand out, which I don't like.

As for switching the behaviour of a reader with an environment variable, please let's not do that. IMHO, a reader should just read the data, and make it available for further processing in a consistent manner.

Reading the documentation that you linked to now, I have a feeling that the fill values are checked in the order specified for a specific reason and I would hope that someone thought of this exact situation. That is, a saturated pixel can't also be invalid. I could ask some of the MODIS experts in my building (Liam Gumley and Kathy Strabala). As maybe a weaker argument, with your solution all fill values (bad and saturated) are marked as NaN so they can't be used without a separate mask. If saturated pixels were filled with max reflectance then at least they can sort of be detected. In how many scientific calculations is a saturated pixel not useful as a max reflectance?

I could be that the saturation is indeed the only thing that can happen to these pixels, indeed. I'm no expert on the format, so if you have the possibility, please as the experts.
About loosing the value that lead to a NaN, maybe we should revert to np.ma arrays then ? 😄 Joke aside, we indeed need to expose a DQF dataset so that scientific users can handle the data as fits their algorithm best. And having bright clouds masked out is actually pretty useful in the few ocean color applications I'm aware of (thresholding to get algal bloom states)

For what it's worth Liam Gumley was the one who told me to fill saturated pixels with max reflectance. Admittedly this was for generating images with Polar2Grid and not for doing science. Looking at the P2G code now it looks like I only check the saturation fill value specifically for band 2 visible data and no others. Additionally, and harder to argue for, I set any "can't aggregate" fill values (65528) to max reflectance for band 2 as well. This is a known issue from what I understand where the aggregation software doesn't distinguish between saturated values and other missing values so you can end up with "can't aggregate" fills where they were all "saturated" fills before.

As you say, this is for generating images, and I'm perfectly fine having this corrections applied at the modifying or compositing stage.

So to sum up, we have 3 use cases:

  • I think we agree that pretty pictures need filling the holes, and that there are different ways this can be done, my true_color_thin probably not being the best at the moment. We could organise a competition for the best gap filled true color 😄
  • For scientific use, we seem to diverge on the utility of gap-filling. I'm proposing channels with gaps + a DQF dataset to leave the data as untouched as possible.
  • Finally, there is the issue of the lambda user who want to play with the data, that would get mad with holes in the data. Providing a datasetid would be the fix, and I think we can live with that.

@djhoese
Copy link
Member

djhoese commented Nov 9, 2018

I meant that filling invalid pixels (like saturated ones) can break the trust a scientist has in the data.

Understood. However, I still feel like assigning max refl isn't breaking this trust and is modifying the data the way it is most useful and is what the saturated fill value represents (assuming pixels can be saturated and bad at the same time - waiting on the experts).

That is not my intention at all, and I don't see how my handling of invalid values would lead to that.

I meant that one of your justifications for why setting the saturated pixels to max refl was bad was that your thin composite uses special bands to make a true color because the traditional ones aren't available. That's how I understood that paragraph about the composite.

Yes, this will make the data a bit harder to read for the satpy user that wants to load filled data, but I have a feeling that people reading individual channels are mostly interested in processing the data arrays themselves, and thus we could leave it to them on how to fill the holes.

I understand the desire to limit how much the readers modify the data and in most other cases I would agree. That's why I think (assuming saturated doesn't mean bad) it isn't really modifying the data in my opinion. Not only does handling the data as you have it currently make it harder for users doing .load, but it makes it so every composite that could be made with MODIS data has to be redefined to add this modifier. You say that you think most people want to analyze the data when they are loading individual channels. I completely disagree with this. The majority of Polar2Grid's users only load individual channels (true color and natural color RGBs weren't even possible for the first couple versions and are still the only ones builtin). They want to see the data that their DB station recorded and show it on their website. Additionally, I think there are a lot of forecasters in the US that are used to looking at straight reflectance or BT data (they look at RGBs too but I'm not certain what they prefer). That's why I feel so strongly about the default enhancements for these data types as well.

Regarding making the hdfeos reader different from the other readers, I don't understand how my handling of the values outside the valid_range would lead to believe that. I actually think that applying implicit corrections to the data (as opposed to explicit with a modifier) would actually be the thing making the hdfeos reader stand out, which I don't like.

As for switching the behaviour of a reader with an environment variable, please let's not do that. IMHO, a reader should just read the data, and make it available for further processing in a consistent manner.

Understood. I can see what you're saying and I would agree...in most other cases. It seems the main difference between our arguments is that I don't consider saturation to be a fill value. Setting it to max reflectance is "fixing" something that I consider a deficiency of the file format and L1B design. It makes MODIS data looks like every other polar-orbiting instrument's data.

For the environment variable, I was kind of thinking it like a low-level behavior change in satpy that could apply to multiple readers. Kind of like xarray's configurable "what to do with attrs when combining DataArrays". It would be like "correct extreme values in readers" or "clip valid range data versus invalidate out of range data". Just an idea, don't feel super strongly about this.

And having bright clouds masked out is actually pretty useful in the few ocean color applications I'm aware of (thresholding to get algal bloom states)

If you have to mask out bright clouds for an algorithm then why would you not have to mask out high reflectances that weren't marked as saturated? Like a reflectance of 119% is valid, but the sensor (or L1B processing) marks reflectances of 120% as saturated. Both need to be masked out for your algorithm. This sounds like an argument for filling the saturated pixels with max reflectance. That way you don't need to load the DQF information and can just mask out anything >110% or whatever is needed.

@adybbroe
Copy link
Contributor

adybbroe commented Nov 9, 2018

Uhu, haven't read your novels above, sorry!
Just want to say that the MODIS thinned product service is a rather special (unique) data stream. Not very useful (bad timeliness) for operations. It was a demo introduced by EUMETSAT to make the transition to VIIRS more smooth - so it s also quite an old one. We included only part of the MODIS bands we thought was useful and was comparable to VIIRS.
So, for the sake of this service we should not over do it!
For the principle discussion I prefer as little intelligence as possible on the reader side.

@adybbroe
Copy link
Contributor

adybbroe commented Nov 9, 2018

I will read your novels eventually @mraspaud and @djhoese
Just now it is Friday evening, and I will close down :-)

@mraspaud
Copy link
Member Author

mraspaud commented Nov 10, 2018 via email

@djhoese
Copy link
Member

djhoese commented Nov 10, 2018

Preliminary answer: I am convinced that your way is more flexible and opens the doors to other similar situations that satpy may have to handle. I do have to be convinced on the easiest ways for users to do what you've talked about. Especially what I'll have to do in Polar2Grid to get this to work without the user knowing and of course I'll have to make it possible to turn it off probably too because...users.

As for the "would you do denoising in the reader", not in satpy, but in polar2grid I did limb correction for ATMS data in the reader. This requires all of the input channels to be able to do it so speed-wise it might be best to do it right away in the reader. But thinking about it more P2G's reader also did the things like DNB normalization algorithms in the reading step (reader + non-RGB compositors).

To be clear, P2G does this only for band 2 and does not look at these fill values for any other band. My understanding is that in normal L1B files band 2 is the only band that this fill value shows up in.

So how do we do handle this? We can move this conversation to slack probably.

@mraspaud
Copy link
Member Author

Ok, since this PR doesn't actually touch the handling of invalid values, are you (@djhoese) ok with merging this and opening another issue and PR for the saturated values ?

Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some docstrings and maybe a test for the filling compositor?

@mraspaud
Copy link
Member Author

Sure

@mraspaud mraspaud merged commit 6fc2669 into pytroll:master Nov 13, 2018
@mraspaud mraspaud deleted the bugfix-thin-modis branch November 13, 2018 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants