Fix thinned modis reading in 'hdfeos_l1b' reader #492
Codecov Report
@@ Coverage Diff @@
## master #492 +/- ##
=========================================
+ Coverage 74.05% 74.25% +0.2%
=========================================
Files 137 137
Lines 18190 18218 +28
=========================================
+ Hits 13470 13528 +58
+ Misses 4720 4690 -30
|
We talked about this on Slack but maybe we should discuss it here. I am against saturated pixels in MODIS data being set to NaN. The average user with default behavior will end up with images (ignoring composites altogether) that have huge black holes in the tops of clouds. The saturated fill value represents a saturated sensor, that is, a pixel whose reflectance was higher than the instrument could reliably record. At least that's how I understand it. Having the reader set it to the maximum valid reflectance (1.2 iirc, or whatever the equivalent uint8 is) will produce valid data that represents what the fill value represents.
If users want to do something that is not the typical/usual/common case and have a separate mask for the saturated pixels then we can add that to the reader, but 99% of the time I don't think it would be needed.
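For readers following along, here is a minimal sketch of the two reader policies under discussion, written with plain NumPy rather than the actual satpy reader code. The fill code `65533` for "saturated" and the valid maximum `32767` are assumptions made for illustration and should be checked against the files at hand; the function names are hypothetical.

```python
import numpy as np

# Assumed values for illustration only; verify them against your own files.
SATURATED_FILL = 65533  # assumed "detector saturated" fill code
VALID_MAX = 32767       # assumed upper bound of the valid scaled-integer range


def mask_invalid(counts):
    """Policy A: every out-of-range value, saturated or not, becomes NaN."""
    counts = np.asarray(counts)
    return np.where(counts > VALID_MAX, np.nan, counts.astype(float))


def clip_saturated(counts):
    """Policy B: saturated pixels get the maximum valid value; other fills stay NaN."""
    counts = np.asarray(counts)
    out = mask_invalid(counts)
    out[counts == SATURATED_FILL] = VALID_MAX
    return out


counts = np.array([1200, 32767, 65533, 65535], dtype=np.uint16)
masked = mask_invalid(counts)     # saturated and missing pixels are both NaN
clipped = clip_saturated(counts)  # only the missing pixel is NaN
```

Under policy A the saturated pixel is indistinguishable from any other invalid pixel; under policy B it carries the largest valid value, which is the behavior argued for above.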
Thanks for a constructive critique :) I have a different opinion, which I will explain here.
As far as I understand, there is no individual error flag variable in this format from which we could know for sure that the pixel is only saturated. Another reason I have for leaving saturated pixels as masked in the reader is that the specific channels used in the true_color_thin composite (channels 10 and 12) are not meant to be used for measuring the reflectance of clouds, but rather for darker pixels like land or sea, and are thus much more sensitive than the broader channels we usually use for the true color composite, namely 3 and 4. Someone wanting to use these channels as they are meant to be used for scientific work would not want saturated values filled as it would be difficult to then trust that the data values are actually sensed values. Last reason, but not least, is that I think that satpy shouldn't be adding intelligence on the reader part and I want to keep it simple and stupid. I believe that satpy could be used as a data reading interface to other software in the future, so I would like to keep this part as transparent as possible. This being said, I totally agree that having holes in the data for imagery purposes is highly unpleasant, and I propose to treat this at a later stage. One solution is the one I implemented here, which just fills green and blue clouds with the red clouds for the true color composite. I'm of course open to further discussion and propositions :) |
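The "fill green and blue clouds with the red clouds" idea can be sketched roughly as follows. This is not the compositor from this PR, only a NumPy illustration of the technique; `fill_from` and the sample reflectances are hypothetical.

```python
import numpy as np


def fill_from(primary, fallback):
    """Return primary with its NaN holes replaced by the fallback band's values."""
    primary = np.asarray(primary, dtype=float)
    fallback = np.asarray(fallback, dtype=float)
    return np.where(np.isnan(primary), fallback, primary)


# Made-up reflectances: the sensitive green/blue bands are masked over bright cloud.
red = np.array([0.10, 0.95, 1.05])
green = np.array([0.12, np.nan, np.nan])
blue = np.array([0.15, 0.90, np.nan])

# The reader output stays masked; the holes are filled only when building the image.
rgb = np.stack([red, fill_from(green, red), fill_from(blue, red)])
```

Filled cloud pixels come out grey-white because all three channels share the red value there, which is usually acceptable for a quicklook while the band data itself stays untouched.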
I'm not sure I understand. If the pixels are marked as saturated then they are at least saturated. You're saying because they are marked as saturated, given what you said about the first found fill value being assigned, that a scientist can't be sure that these values are good enough to work with? Also making the entire hdfeos_l1b reader behave a certain way because of what is available in the thin files sounds like a bad idea. I had not thought about the modifier approach. That does make it easier, but it still requires extra work by the average user (and arguably more processing depending on how dask/xarray hands I think there are two things to consider: what is the most common use case and do any solutions stop a user from doing the not-common use case. As a possible third concern, does the chosen solution make the MODIS reader behave differently or have to be handled differently than other readers? So if we say there are 2 use cases: scientist who wants to know where saturated pixels are and image-user who wants a pretty image/quicklook. The two most plausible solutions are:
These two solutions could be controlled by a environment variable ( Reading the documentation that you linked to now, I have a feeling that the fill values are checked in the order specified for a specific reason and I would hope that someone thought of this exact situation. That is, a saturated pixel can't also be invalid. I could ask some of the MODIS experts in my building (Liam Gumley and Kathy Strabala). As maybe a weaker argument, with your solution all fill values (bad and saturated) are marked as NaN so they can't be used without a separate mask. If saturated pixels were filled with max reflectance then at least they can sort of be detected. In how many scientific calculations is a saturated pixel not useful as a max reflectance? For what it's worth Liam Gumley was the one who told me to fill saturated pixels with max reflectance. Admittedly this was for generating images with Polar2Grid and not for doing science. Looking at the P2G code now it looks like I only check the saturation fill value specifically for band 2 visible data and no others. Additionally, and harder to argue for, I set any "can't aggregate" fill values (65528) to max reflectance for band 2 as well. This is a known issue from what I understand where the aggregation software doesn't distinguish between saturated values and other missing values so you can end up with "can't aggregate" fills where they were all "saturated" fills before. |
I meant that filling invalid pixels (like saturated ones) can break the trust a scientist has in the data.
That is not my intention at all, and I don't see how my handling of invalid values would lead to that.
As I stated before, I would really like the readers in general to modify the data as little as possible, and let the modifiers do the rest, as we do with sun-normalisation or rayleigh scattering correction. Yes, this will make the data a bit harder to read for the satpy user that wants to load filled data, but I have a feeling that people reading individual channels are mostly interested in processing the data arrays themselves, and thus we could leave it to them to decide how to fill the holes.
Agreed, my When it comes to the pretty picture users, I don't think leaving the filling out of the reader would harm them as long as we provide ready-made recipes they can reuse/copy. Regarding making the hdfeos reader different from the other readers, I don't understand how my handling of the values outside the valid_range would lead one to believe that. I actually think that applying implicit corrections to the data (as opposed to explicit with a modifier) would be the thing making the hdfeos reader stand out, which I don't like. As for switching the behaviour of a reader with an environment variable, please let's not do that. IMHO, a reader should just read the data, and make it available for further processing in a consistent manner.
It could indeed be that saturation is the only thing that can happen to these pixels. I'm no expert on the format, so if you have the possibility, please ask the experts.
As you say, this is for generating images, and I'm perfectly fine having these corrections applied at the modifying or compositing stage. So to sum up, we have 3 use cases:
|
Understood. However, I still feel like assigning max refl isn't breaking this trust and is modifying the data the way it is most useful and is what the saturated fill value represents (assuming pixels can be saturated and bad at the same time - waiting on the experts).
I meant that one of your justifications for why setting the saturated pixels to max refl was bad was that your thin composite uses special bands to make a true color because the traditional ones aren't available. That's how I understood that paragraph about the composite.
I understand the desire to limit how much the readers modify the data and in most other cases I would agree. That's why I think (assuming saturated doesn't mean bad) it isn't really modifying the data in my opinion. Not only does handling the data as you have it currently make it harder for users doing
Understood. I can see what you're saying and I would agree...in most other cases. It seems the main difference between our arguments is that I don't consider saturation to be a fill value. Setting it to max reflectance is "fixing" something that I consider a deficiency of the file format and L1B design. It makes MODIS data look like every other polar-orbiting instrument's data. For the environment variable, I was kind of thinking of it as a low-level behavior change in satpy that could apply to multiple readers. Kind of like xarray's configurable "what to do with attrs when combining DataArrays". It would be like "correct extreme values in readers" or "clip valid range data versus invalidate out of range data". Just an idea, don't feel super strongly about this.
If you have to mask out bright clouds for an algorithm then why would you not have to mask out high reflectances that weren't marked as saturated? Like a reflectance of 119% is valid, but the sensor (or L1B processing) marks reflectances of 120% as saturated. Both need to be masked out for your algorithm. This sounds like an argument for filling the saturated pixels with max reflectance. That way you don't need to load the DQF information and can just mask out anything |
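To make the bright-cloud argument concrete, here is a small sketch comparing a single threshold test under both policies. The maximum reflectance of 1.2 is the approximate figure quoted earlier in the thread, the threshold of 1.0 is arbitrary, and `bright_mask` is a hypothetical helper.

```python
import numpy as np

MAX_REFL = 1.2  # approximate maximum reflectance quoted in the thread


def bright_mask(refl, threshold=1.0):
    """Flag pixels at or above the threshold; NaN never compares as True."""
    return np.asarray(refl, dtype=float) >= threshold


# Same three pixels: dark surface, bright cloud at 119 %, saturated cloud.
masked_policy = np.array([0.35, 1.19, np.nan])    # saturated pixel set to NaN
filled_policy = np.array([0.35, 1.19, MAX_REFL])  # saturated pixel set to max reflectance

mask_a = bright_mask(masked_policy)
mask_b = bright_mask(filled_policy)
```

With NaN masking the saturated pixel drops out of the threshold test and needs a separate `np.isnan` check (which also catches every other kind of invalid pixel); with max-reflectance filling one comparison covers both bright pixels.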
Uhu, haven't read your novels above, sorry! |
Ok, so a few things.
Saturation. For me filling saturated pixels with a fixed value isn't fixing
the data, it's just making it unreliable. By masking it, the sensor admits
it can't measure above that value, so instead of giving you a wrong value
(e.g. 120% instead of the actual 130 or 140), it's just telling you it can't
give you a reliable physical measurement for it. And I don't think it's the
reader's role to make a guess at what it should be, because it depends on
the application. Now, don't get me wrong, I understand and agree with your
point that for e.g. band 2, the saturation is probably a defect of the
detectors, but this can't be generalised to all bands (see the paragraph on
special bands further down). Still, imho fixing the data isn't the role of
the reader. E.g. would you do denoising in the reader?
Single channel images. We do use them too, and of course we want them to
look nice. However, just as we have separated composites from enhancements
to keep image operations from altering the physical meaning of the data, I
believe the reader shouldn't implement anything composite/image related.
What users want is single channel *composites*, ie a *visualisation* of the
sensor data, which means that enhancements and other effects might need to
be added to the actual sensor data, like eg gap filling.
Special bands in thinned products. They are not special, they are also
available in the regular modis l1b. It's just that we don't have 3 and 4 in
thinned l1b, so we use these instead. But because they are meant for other
things than cloud observation, they saturate faster (looks lower than
100%). So to come back to your point that you want to make modis look like
other polar-orbiting imagers, I'm sorry to say that modis is different. It
is mostly made for sea and land observation, so the bands are made
differently than let's say viirs, which is mostly made for meteorology. We
work of course with modis in meteorology, but we need to keep in mind the
limitations of some of the bands in the sensor. Adapting the higher
sensitivity bands for cloud observation is fine to make pretty pictures,
but I don't want this adaptation to happen at reading time, because we
don't know what application the data will be used for.
I hope what I wrote makes sense, it's kind of late now 😄. Tell me if
I missed something.
|
Preliminary answer: I am convinced that your way is more flexible and opens the doors to other similar situations that satpy may have to handle. I still have to be convinced about the easiest ways for users to do what you've talked about. Especially what I'll have to do in Polar2Grid to get this to work without the user knowing and of course I'll have to make it possible to turn it off probably too because...users. As for the "would you do denoising in the reader", not in satpy, but in polar2grid I did limb correction for ATMS data in the reader. This requires all of the input channels to be able to do it so speed-wise it might be best to do it right away in the reader. But thinking about it more P2G's reader also did things like DNB normalization algorithms in the reading step (reader + non-RGB compositors). To be clear, P2G does this only for band 2 and does not look at these fill values for any other band. My understanding is that in normal L1B files band 2 is the only band that this fill value shows up in. So how do we handle this? We can move this conversation to slack probably. |
Ok, since this PR doesn't actually touch the handling of invalid values, are you (@djhoese) ok with merging this and opening another issue and PR for the saturated values ? |
Could you add some docstrings and maybe a test for the filling compositor?
Sure |
Thinned MODIS files, e.g. as received via EUMETCast, were not working with the current implementation of the hdfeos reader, since it wasn't capable of reading both navigation and band data from the same file. This PR addresses the issue.
git diff origin/master -- "*py" | flake8 --diff