Change start_time and end_time handling in combine_metadata #2737

Merged
merged 21 commits into pytroll:main from min-max-dataset-times
Feb 15, 2024

pnuu
Member

@pnuu pnuu commented Feb 1, 2024

Previously, the times of the datasets used in a composite were averaged to get the final time values for the composite. With this PR, the start_time and end_time attributes instead use the earliest and latest values, respectively. In addition, for StaticImageCompositor the default start_time and end_time values are set to None if they are not available in the filename.
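
To make the difference concrete, here is a minimal sketch in plain Python rather than the actual Satpy code. The helper names `average_time_sketch` and `min_max_times_sketch` are hypothetical, and skipping `None` values is an assumption of the sketch rather than something stated in this PR.

```python
import datetime as dt


def average_time_sketch(times):
    """Old behaviour as described above: the mean of the given datetimes."""
    epoch = dt.datetime(1970, 1, 1)
    total = sum((t - epoch for t in times), dt.timedelta())
    return epoch + total / len(times)


def min_max_times_sketch(infos):
    """New behaviour: earliest start_time and latest end_time.

    Skipping missing (None) values is an assumption of this sketch.
    """
    starts = [i["start_time"] for i in infos if i.get("start_time") is not None]
    ends = [i["end_time"] for i in infos if i.get("end_time") is not None]
    return {
        "start_time": min(starts) if starts else None,
        "end_time": max(ends) if ends else None,
    }


# Two made-up datasets with overlapping time ranges.
infos = [
    {"start_time": dt.datetime(2024, 2, 1, 12, 0), "end_time": dt.datetime(2024, 2, 1, 12, 10)},
    {"start_time": dt.datetime(2024, 2, 1, 12, 5), "end_time": dt.datetime(2024, 2, 1, 12, 15)},
]
averaged_start = average_time_sketch([i["start_time"] for i in infos])
combined = min_max_times_sketch(infos)
```

With averaging, the composite's start time falls between the two inputs, a time at which neither dataset started. With min/max, the composite's time range covers both inputs.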

@pnuu pnuu added bug, enhancement (code enhancements, features, improvements), component:compositors labels Feb 1, 2024
@pnuu pnuu requested a review from gerritholl February 1, 2024 13:17
@pnuu pnuu self-assigned this Feb 1, 2024
@pnuu pnuu requested a review from zxdawn February 1, 2024 13:30
@pnuu
Member Author

pnuu commented Feb 1, 2024

Going through the failing tests. The others are easy to fix, but I'm not sure what the combine_times=False behaviour should be in satpy.multiscene.stack(). It looks like the average is used in that case, but would it be OK to just drop that option and always use the min/max of the start/end times?

@pnuu
Member Author

pnuu commented Feb 1, 2024

The satpy.multiscene.stack() option combine_times is not documented, which suggests that removing it would be safe.

codecov bot commented Feb 1, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (eb4ac0b) 95.40% compared to head (b8a47a9) 95.89%.
Report is 12 commits behind head on main.

Files                       Patch %   Lines
satpy/dataset/metadata.py   97.14%    1 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2737      +/-   ##
==========================================
+ Coverage   95.40%   95.89%   +0.48%     
==========================================
  Files         371      371              
  Lines       52825    52826       +1     
==========================================
+ Hits        50399    50656     +257     
+ Misses       2426     2170     -256     
Flag             Coverage Δ
behaviourtests   4.16% <9.75%> (+<0.01%) ⬆️
unittests        95.99% <98.78%> (-0.04%) ⬇️

@coveralls

coveralls commented Feb 1, 2024

Pull Request Test Coverage Report for Build 7897556037

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 81 of 82 (98.78%) changed or added relevant lines in 6 files are covered.
  • 5 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.003%) to 95.971%

Changes Missing Coverage    Covered Lines   Changed/Added Lines   %
satpy/dataset/metadata.py   34              35                    97.14%

Files with Coverage Reduction   New Missed Lines   %
satpy/tests/test_readers.py     1                  99.36%
satpy/readers/__init__.py       4                  98.65%

Totals
Change from base Build 7726219656: 0.003%
Covered Lines: 50528
Relevant Lines: 52649

Member

@djhoese djhoese left a comment

Awesome job @pnuu! This looks good to me. It is obviously backwards incompatible, but I think this is the right path forward. I had a couple inline questions. The biggest one is what to do with time_parameters (see comment). Otherwise, I think the MultiScene removal of the combine_times makes sense. The only reason it existed was because it wasn't done in combine_metadata so if it happens in combine_metadata then great. It was only done in MultiScene because I wasn't sure if we wanted that behavior everywhere.

Some other concerns: What happens in Scene.save_datasets if start_time is None? I believe the default filename pattern includes {start_time:%Y%m%d_%H%M%S}. How do we want to handle that? Let it fail?
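
As a rough illustration of the concern, the sketch below uses Python's built-in `str.format` as a stand-in for Satpy's filename composition, which is an assumption. The `pattern` string and the `try_format` helper are hypothetical; only the `{start_time:%Y%m%d_%H%M%S}` field comes from the comment above.

```python
import datetime as dt

# Hypothetical pattern; only the start_time field is taken from the discussion.
pattern = "{name}_{start_time:%Y%m%d_%H%M%S}.tif"


def try_format(start_time):
    """Format the pattern, returning the error text instead of raising."""
    try:
        return pattern.format(name="overview", start_time=start_time)
    except TypeError as err:
        # None does not accept a strftime-style format specification.
        return f"TypeError: {err}"


ok = try_format(dt.datetime(2024, 2, 1, 13, 17, 0))
failed = try_format(None)
```

With plain `str.format`, a `None` start time raises a `TypeError` as soon as the format specification is applied, so "let it fail" would at least fail loudly in this stand-in.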

@pnuu
Member Author

pnuu commented Feb 5, 2024

What happens in Scene.save_datasets if start_time is None?
How do we want to handle that? Let it fail?

The only case where this should happen is if the user is saving the plain data from the generic_image reader and the time was not available in the filename. I guess we could add some kind of error handling for this case, but I'm not sure it's worth the effort 🤔

@djhoese
Member

djhoese commented Feb 5, 2024

The only case where this should happen is if the user is saving the plain data from the generic_image reader and the time was not available in the filename. I guess we could add some kind of error handling for this case, but I'm not sure it's worth the effort 🤔

Good point. I guess my only other fear would be odd situations with the MultiScene where you need to resample and have a static image, but the MultiScene wants to do something with ordering by start time...nah this shouldn't be a problem. Ok sounds good to not worry about it.

Member

@djhoese djhoese left a comment

I think this looks good. I think we should get @mraspaud, @sfinkens, @gerritholl, and @ameraner or @strandgren's opinions since this will affect all granule and segment based readers. I assume this group of developers have the widest experience with potential time-based edge cases.

@djhoese
Member

djhoese commented Feb 9, 2024

Oh @pnuu I think this min/max code (including the time_parameters method it calls) could be removed:

new_dict = self._combine(all_infos, min, "start_time", "start_orbit")
new_dict.update(self._combine(all_infos, max, "end_time", "end_orbit"))
new_dict.update(self._combine_orbital_parameters(all_infos))
new_dict.update(self._combine_time_parameters(all_infos))

Member

@sfinkens sfinkens left a comment

Nice work, thanks!

@pnuu
Member Author

pnuu commented Feb 12, 2024

Oh @pnuu I think this min/max code (including the time_parameters method it calls) could be removed:

new_dict = self._combine(all_infos, min, "start_time", "start_orbit")
new_dict.update(self._combine(all_infos, max, "end_time", "end_orbit"))
new_dict.update(self._combine_orbital_parameters(all_infos))
new_dict.update(self._combine_time_parameters(all_infos))

Removed the duplicate handling of times and adjusted the file handler test to actually use datetimes.

Member

@ameraner ameraner left a comment

LGTM, thanks for sorting this out!
I don't see any problem regarding the segmented readers with this, since it seems to me that the previous behaviour in terms of min/max calculations for start, nominal and observation times is preserved. The segment sorting etc. is anyway not impacted by this, as it's based on chunk numbering from the single filehandlers.

Note: As discussed above, what still worries me a little bit is indeed the generic_image reader possibly returning datasets without a valid start_time... I think there are users that use satpy for simple operations like opening a geotiff, resample it and save it again, which could fail. Or maybe also applications like SIFT that may rely on a dataset having a start_time. But this goes outside the scope of this PR, which at least correctly fixed the composites misbehaviours.

Collaborator

@gerritholl gerritholl left a comment

Thanks for this work! There is a small risk that users will notice this backwards incompatibility, so I have made a suggestion on explicitly mentioning in the documentation that the behaviour has changed, and (optionally) on raising a DeprecationWarning or similar if a user still passes combine_times.

@@ -27,33 +27,37 @@
from satpy.writers.utils import flatten_dict


def combine_metadata(*metadata_objects, average_times=True):
Collaborator

Do we need a deprecation path, where a warning is raised if code passes average_times?

Member

I'm not sure anyone is actually using this, but the fact that it exists means we thought someone might want to control it, so I agree that it should be documented at the very least. A specific deprecation warning would be nice to have.

The changes in the multiscene code are also backwards incompatible, but very very unlikely to be used by anyone except maybe Adam and Ernst. If I remember correctly the default behavior is preserved and was changed when the related kwarg was added to the multiscene stacking function. So my vote is no deprecation warning on the multiscene stuff, but warning on the metadata.py average_times would be nice to have.

Member Author

I'll see about the deprecation warning, hopefully tomorrow.

Member Author

I added a UserWarning if someone tries to use the average_times kwarg.
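
A minimal sketch of what such a warning could look like, using only the standard `warnings` module. The function name `combine_metadata_sketch`, its `None` default, its return value, and the message text are all hypothetical and not taken from the merged code.

```python
import warnings


def combine_metadata_sketch(*metadata_objects, average_times=None):
    """Hypothetical stand-in that only demonstrates the warning path."""
    if average_times is not None:
        # The keyword is accepted so that old calls keep working,
        # but its value is ignored.
        warnings.warn(
            "'average_times' is no longer used; start_time and end_time "
            "are combined using the earliest and latest values.",
            UserWarning,
            stacklevel=2,
        )
    # The real combination logic is out of scope for this sketch.
    return list(metadata_objects)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    combine_metadata_sketch({"name": "a"}, average_times=False)  # warns
    combine_metadata_sketch({"name": "a"})  # silent
```

Using `None` as the default lets the function tell "not passed" apart from an explicit `True` or `False`, so only callers who still pass the keyword see the warning.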

@djhoese
Member

djhoese commented Feb 13, 2024

Note: As discussed above, what still worries me a little bit is indeed the generic_image reader possibly returning datasets without a valid start_time... I think there are users that use satpy for simple operations like opening a geotiff, resample it and save it again, which could fail.

@ameraner good point, but skimming the changes in this PR again, I don't think the generic_image reader's behavior has changed at all. It was already returning a start_time of None and it was up to the user to override that...right?

@ameraner
Member

@ameraner good point, but skimming the changes in this PR again, I don't think the generic_image reader's behavior has changed at all. It was already returning a start_time of None and it was up to the user to override that...right?

Yes, indeed. Changing that would be outside the scope of this PR, and I'm not sure what the best solution would be anyway (since giving a "dummy" start_time can mess up other calculations, as we see here).

@gerritholl
Collaborator

what still worries me a little bit is indeed the generic_image reader possibly returning datasets without a valid start_time...

I am not convinced that the generic_image reader should guarantee a (valid) start_time. We could possibly expose whatever times are in image metadata (such as EXIF headers) or file metadata, but the type of imagery read by the generic_image reader is too diverse for any of those to be generally valid as a start_time, as elsewhere in Satpy, this refers to the measurement time, not to the image creation time. Dealing with images that don't have a start_time would rather seem to be the responsibility of downstream users.

Collaborator

@gerritholl gerritholl left a comment

Thanks!

@pnuu
Member Author

pnuu commented Feb 14, 2024

@ameraner good point, but skimming the changes in this PR again, I don't think the generic_image reader's behavior has changed at all. It was already returning a start_time of None and it was up to the user to override that...right?

Yes, indeed. Changing that would be outside the scope of this PR, and I'm not sure what the best solution would be anyway (since giving a "dummy" start_time can mess up other calculations, as we see here).

This could be handled in the writer (in another PR) with a simple (pseudo-code)

if self.start_time is None:
    self.start_time = dt.datetime.utcnow()
fname = self.compose_fname_from_stuff()

@djhoese
Member

djhoese commented Feb 14, 2024

This could be handled in the writer (in another PR) with a simple (pseudo-code)

Eh, too much magic. If the start_time being None is a problem then the user should have to work around it. For example, if the filename generation is the problem then they should save it with a different filename template string.
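
On the user side, that workaround could look roughly like the sketch below. It again uses plain `str.format` as a stand-in for Satpy's filename handling, which is an assumption, and the template strings and the `choose_template` helper are hypothetical.

```python
import datetime as dt

# Hypothetical templates: one with a time field, one without.
DATED_TEMPLATE = "{name}_{start_time:%Y%m%d_%H%M%S}.tif"
UNDATED_TEMPLATE = "{name}.tif"


def choose_template(start_time):
    """Pick a template that does not need start_time when it is missing."""
    return UNDATED_TEMPLATE if start_time is None else DATED_TEMPLATE


# Extra keyword arguments that the template does not use are ignored.
static_name = choose_template(None).format(name="static_image", start_time=None)
start = dt.datetime(2024, 2, 14, 9, 30, 0)
dated_name = choose_template(start).format(name="overview", start_time=start)
```

The choice stays explicit in user code, and nothing invents a timestamp on the user's behalf.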

@mraspaud
Member

Everybody seems happy about this one, merging

@mraspaud mraspaud merged commit 25d5357 into pytroll:main Feb 15, 2024
18 of 19 checks passed
@pnuu pnuu deleted the min-max-dataset-times branch February 15, 2024 14:18