
Change start_time and end_time handling in combine_metadata #2737

Merged Feb 15, 2024 (21 commits)

Changes from 15 commits

Commits
8319b4f  Change time attribute averaging to min/max for start/end times (lahtinep, Feb 1, 2024)
782ac68  Change time attribute averaging to min/max for start/end times (lahtinep, Feb 1, 2024)
cf98aa2  Allow None as time value (lahtinep, Feb 1, 2024)
ab90428  Update combine_metadata docstring (lahtinep, Feb 1, 2024)
cbe2549  Merge branch 'main' into min-max-dataset-times (lahtinep, Feb 1, 2024)
942be9c  Do not include top-level non-time objects in shared_info as times (lahtinep, Feb 1, 2024)
db2bf31  Remove combine_times kwarg from multiscene.stack and default to its d… (lahtinep, Feb 1, 2024)
e0fe845  Remove obsolete private function (lahtinep, Feb 2, 2024)
6e1342f  Remove unnecessary setting of start/end time attributes (lahtinep, Feb 5, 2024)
13168cf  Remove assertions that don't hold when not loading an actual image (lahtinep, Feb 5, 2024)
f7e7570  Combine also values of 'time_parameters' dictionary items (lahtinep, Feb 5, 2024)
4e85a8f  Separate value combination to a new function (lahtinep, Feb 6, 2024)
91dbc8e  Reword docstring of combine_metadata() to include time_parameters dict (lahtinep, Feb 8, 2024)
9d5665c  Clarify combine_metadata() docstring (lahtinep, Feb 9, 2024)
83f6bc8  Update satpy/dataset/metadata.py (djhoese, Feb 9, 2024)
b68920f  Remove douple handling of start/end times and time parameters (lahtinep, Feb 12, 2024)
d1c33a1  Use datetimes when testing times (lahtinep, Feb 12, 2024)
3e050b0  Refactor to remove unnecessary return (lahtinep, Feb 12, 2024)
aee140a  Update satpy/dataset/metadata.py (pnuu, Feb 13, 2024)
52b2d41  Update satpy/dataset/metadata.py (pnuu, Feb 13, 2024)
b8a47a9  Add a warning when trying to use removed 'average_times' kwarg (lahtinep, Feb 14, 2024)
7 changes: 0 additions & 7 deletions satpy/composites/__init__.py
@@ -1,4 +1,4 @@
# Copyright (c) 2015-2023 Satpy developers
#
# This file is part of satpy.
#
@@ -1665,13 +1665,6 @@
img.attrs["mode"] = "".join(img.bands.data)
img.attrs.pop("modifiers", None)
img.attrs.pop("calibration", None)
# Add start time if not present in the filename
if "start_time" not in img.attrs or not img.attrs["start_time"]:
import datetime as dt
img.attrs["start_time"] = dt.datetime.utcnow()
if "end_time" not in img.attrs or not img.attrs["end_time"]:
import datetime as dt
img.attrs["end_time"] = dt.datetime.utcnow()

return img

66 changes: 53 additions & 13 deletions satpy/dataset/metadata.py
@@ -27,20 +27,25 @@
from satpy.writers.utils import flatten_dict


def combine_metadata(*metadata_objects, average_times=True):
Collaborator:
Do we need a deprecation path, where a warning is raised if code passes average_times?

Member:
I'm not sure anyone is actually using this, but the fact that it exists means we thought someone might want to control it, so I agree it should be documented at the very least. A specific deprecation warning would be nice to have.

The changes in the multiscene code are also backwards incompatible, but very unlikely to be used by anyone except maybe Adam and Ernst. If I remember correctly, the default behavior is preserved and was changed when the related kwarg was added to the multiscene stacking function. So my vote is no deprecation warning on the multiscene stuff, but a warning on the metadata.py average_times would be nice to have.

Member Author:
I'll see about the deprecation warning, hopefully tomorrow.

Member Author:
I added a UserWarning if someone tries to use the average_times kwarg.
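
The warning commit (b8a47a9) is not part of the diff shown here. A minimal sketch of what such a guard could look like, assuming the removed keyword is caught via **kwargs; the exact wording and mechanism in the merged code may differ:

```python
import warnings


def combine_metadata(*metadata_objects, **kwargs):
    """Combine metadata; warn if the removed 'average_times' kwarg is still passed."""
    if "average_times" in kwargs:
        # Sketch only: the kwarg no longer has any effect, so just warn the caller.
        warnings.warn(
            "'average_times' is deprecated and has no effect; start/end times "
            "are now combined with min/max and other times are averaged.",
            UserWarning,
            stacklevel=2,
        )
    # ... actual combination logic follows ...
```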

def combine_metadata(*metadata_objects):
"""Combine the metadata of two or more Datasets.

If the values corresponding to any keys are not equal or do not
exist in all provided dictionaries then they are not included in
the returned dictionary. By default any keys with the word 'time'
in them and consisting of datetime objects will be averaged. This
is to handle cases where data were observed at almost the same time
but not exactly. In the interest of time, lazy arrays are compared by
object identity rather than by their contents.
the returned dictionary.

All values of keys containing the substring 'start_time' will be set to
the earliest value, and similarly 'end_time' keys are set to the latest
value. All other keys containing the word 'time' are averaged. Before
these adjustments, `None` values resulting from data that don't have
times associated with them are removed. These rules are also applied to
values in the 'time_parameters' dictionary.

pnuu marked this conversation as resolved.
In the interest of processing time, lazy arrays are compared by object
identity rather than by their contents.

Args:
*metadata_objects: MetadataObject or dict objects to combine
average_times (bool): Average any keys with 'time' in the name

Returns:
dict: the combined metadata
@@ -53,7 +58,7 @@

shared_keys = _shared_keys(info_dicts)

return _combine_shared_info(shared_keys, info_dicts, average_times)
return _combine_shared_info(shared_keys, info_dicts)


def _get_valid_dicts(metadata_objects):
@@ -75,17 +80,52 @@
return reduce(set.intersection, key_sets)


def _combine_shared_info(shared_keys, info_dicts, average_times):
def _combine_shared_info(shared_keys, info_dicts):
shared_info = {}
for key in shared_keys:
values = [info[key] for info in info_dicts]
if "time" in key and isinstance(values[0], datetime) and average_times:
shared_info[key] = average_datetimes(values)
elif _are_values_combinable(values):
shared_info[key] = values[0]
_combine_values(key, values, shared_info)

return shared_info


def _combine_values(key, values, shared_info):
if "time" in key:
times = _combine_times(key, values)
if times is None:
return
shared_info[key] = times
sfinkens marked this conversation as resolved.
elif _are_values_combinable(values):
shared_info[key] = values[0]


def _combine_times(key, values):
if key == "time_parameters":
return _combine_time_parameters(values)
filtered_values = _filter_time_values(values)
if not filtered_values:
return None
if "end_time" in key:
return max(filtered_values)
elif "start_time" in key:
return min(filtered_values)
return average_datetimes(filtered_values)


def _combine_time_parameters(values):
# Assume the first item has all the keys
keys = values[0].keys()
res = {}
for key in keys:
sub_values = [itm[key] for itm in values]
res[key] = _combine_times(key, sub_values)
return res


def _filter_time_values(values):
"""Remove values that are not datetime objects."""
return [v for v in values if isinstance(v, datetime)]


def average_datetimes(datetime_list):
"""Average a series of datetime objects.

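To make the new rules concrete, here is a small usage sketch of combine_metadata() under the behaviour documented in the updated docstring; the metadata values and platform name are made up for illustration:

```python
from datetime import datetime

from satpy.dataset.metadata import combine_metadata

meta1 = {
    "start_time": datetime(2024, 2, 1, 12, 0),
    "end_time": datetime(2024, 2, 1, 12, 10),
    "creation_time": datetime(2024, 2, 1, 12, 30),
    "time_parameters": {"observation_start_time": datetime(2024, 2, 1, 12, 0)},
    "platform_name": "Meteosat-12",
}
meta2 = {
    "start_time": None,  # e.g. data with no time; dropped before combining
    "end_time": datetime(2024, 2, 1, 12, 25),
    "creation_time": datetime(2024, 2, 1, 12, 40),
    "time_parameters": {"observation_start_time": datetime(2024, 2, 1, 12, 5)},
    "platform_name": "Meteosat-12",
}

combined = combine_metadata(meta1, meta2)
# combined["start_time"]    -> datetime(2024, 2, 1, 12, 0)   (earliest non-None value)
# combined["end_time"]      -> datetime(2024, 2, 1, 12, 25)  (latest value)
# combined["creation_time"] -> datetime(2024, 2, 1, 12, 35)  (other 'time' keys are averaged)
# combined["time_parameters"]["observation_start_time"] -> datetime(2024, 2, 1, 12, 0)
# combined["platform_name"] -> "Meteosat-12"                 (identical non-time values are kept)
```
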
41 changes: 8 additions & 33 deletions satpy/multiscene/_blend_funcs.py
@@ -1,6 +1,5 @@
from __future__ import annotations

from datetime import datetime
from typing import Callable, Iterable, Mapping, Optional, Sequence

import pandas as pd
@@ -13,7 +12,6 @@
def stack(
data_arrays: Sequence[xr.DataArray],
weights: Optional[Sequence[xr.DataArray]] = None,
combine_times: bool = True,
blend_type: str = "select_with_weights"
) -> xr.DataArray:
"""Combine a series of datasets in different ways.
@@ -39,19 +37,18 @@

"""
if weights:
return _stack_with_weights(data_arrays, weights, combine_times, blend_type)
return _stack_no_weights(data_arrays, combine_times)
return _stack_with_weights(data_arrays, weights, blend_type)
return _stack_no_weights(data_arrays)


def _stack_with_weights(
datasets: Sequence[xr.DataArray],
weights: Sequence[xr.DataArray],
combine_times: bool,
blend_type: str
) -> xr.DataArray:
blend_func = _get_weighted_blending_func(blend_type)
filled_weights = list(_fill_weights_for_invalid_dataset_pixels(datasets, weights))
return blend_func(datasets, filled_weights, combine_times)
return blend_func(datasets, filled_weights)


def _get_weighted_blending_func(blend_type: str) -> Callable:
@@ -84,10 +81,9 @@
def _stack_blend_by_weights(
datasets: Sequence[xr.DataArray],
weights: Sequence[xr.DataArray],
combine_times: bool
) -> xr.DataArray:
"""Stack datasets blending overlap using weights."""
attrs = _combine_stacked_attrs([data_arr.attrs for data_arr in datasets], combine_times)
attrs = _combine_stacked_attrs([data_arr.attrs for data_arr in datasets])

overlays = []
for weight, overlay in zip(weights, datasets):
@@ -109,14 +105,13 @@
def _stack_select_by_weights(
datasets: Sequence[xr.DataArray],
weights: Sequence[xr.DataArray],
combine_times: bool
) -> xr.DataArray:
"""Stack datasets selecting pixels using weights."""
indices = da.argmax(da.dstack(weights), axis=-1)
if "bands" in datasets[0].dims:
indices = [indices] * datasets[0].sizes["bands"]

attrs = _combine_stacked_attrs([data_arr.attrs for data_arr in datasets], combine_times)
attrs = _combine_stacked_attrs([data_arr.attrs for data_arr in datasets])
dims = datasets[0].dims
coords = datasets[0].coords
selected_array = xr.DataArray(da.choose(indices, datasets), dims=dims, coords=coords, attrs=attrs)
@@ -125,7 +120,6 @@

def _stack_no_weights(
datasets: Sequence[xr.DataArray],
combine_times: bool
) -> xr.DataArray:
base = datasets[0].copy()
collected_attrs = [base.attrs]
@@ -136,32 +130,13 @@
except KeyError:
base = base.where(data_arr.isnull(), data_arr)

attrs = _combine_stacked_attrs(collected_attrs, combine_times)
attrs = _combine_stacked_attrs(collected_attrs)
base.attrs = attrs
return base


def _combine_stacked_attrs(collected_attrs: Sequence[Mapping], combine_times: bool) -> dict:
attrs = combine_metadata(*collected_attrs)
if combine_times and ("start_time" in attrs or "end_time" in attrs):
new_start, new_end = _get_combined_start_end_times(collected_attrs)
if new_start:
attrs["start_time"] = new_start
if new_end:
attrs["end_time"] = new_end
return attrs


def _get_combined_start_end_times(metadata_objects: Iterable[Mapping]) -> tuple[datetime | None, datetime | None]:
"""Get the start and end times attributes valid for the entire dataset series."""
start_time = None
end_time = None
for md_obj in metadata_objects:
if "start_time" in md_obj and (start_time is None or md_obj["start_time"] < start_time):
start_time = md_obj["start_time"]
if "end_time" in md_obj and (end_time is None or md_obj["end_time"] > end_time):
end_time = md_obj["end_time"]
return start_time, end_time
def _combine_stacked_attrs(collected_attrs: Sequence[Mapping]) -> dict:
return combine_metadata(*collected_attrs)


def timeseries(datasets):
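
With the kwarg gone, multiscene stacking callers simply drop combine_times, and the combined attributes always span the full series. A rough usage sketch of the no-weights path; the import path and toy data are assumptions for illustration:

```python
from datetime import datetime

import numpy as np
import xarray as xr

from satpy.multiscene import stack  # import path assumed

first = xr.DataArray(
    np.array([[1.0, np.nan], [1.0, np.nan]]), dims=("y", "x"),
    attrs={"start_time": datetime(2023, 1, 16, 11, 9, 17),
           "end_time": datetime(2023, 1, 16, 11, 20, 12)},
)
second = xr.DataArray(
    np.full((2, 2), 2.0), dims=("y", "x"),
    attrs={"start_time": datetime(2023, 1, 16, 11, 11, 7),
           "end_time": datetime(2023, 1, 16, 11, 28, 2)},
)

blended = stack([first, second])  # no combine_times kwarg anymore
# The stacked result covers the whole series:
# blended.attrs["start_time"] -> datetime(2023, 1, 16, 11, 9, 17)  (earliest)
# blended.attrs["end_time"]   -> datetime(2023, 1, 16, 11, 28, 2)  (latest)
```
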
15 changes: 5 additions & 10 deletions satpy/tests/multiscene_tests/test_blend.py
@@ -245,42 +245,37 @@
("select_with_weights", _get_expected_stack_select),
("blend_with_weights", _get_expected_stack_blend),
])
@pytest.mark.parametrize("combine_times", [False, True])
def test_blend_two_scenes_using_stack_weighted(self, multi_scene_and_weights, groups,
scene1_with_weights, scene2_with_weights,
combine_times, blend_func, exp_result_func):
blend_func, exp_result_func):
"""Test stacking two scenes using weights.

Here we test that the start and end times can be combined so that they
describe the start and end times of the entire data series. We also test
the various types of weighted stacking functions (ex. select, blend).

"""
from functools import partial

multi_scene, weights = multi_scene_and_weights
scene1, weights1 = scene1_with_weights
scene2, weights2 = scene2_with_weights

simple_groups = {DataQuery(name="CloudType"): groups[DataQuery(name="CloudType")]}
multi_scene.group(simple_groups)

weights = [weights[0][0], weights[1][0]]
stack_func = partial(stack, weights=weights, blend_type=blend_func, combine_times=combine_times)
stack_func = partial(stack, weights=weights, blend_type=blend_func)
weighted_blend = multi_scene.blend(blend_function=stack_func)

expected = exp_result_func(scene1, scene2)
result = weighted_blend["CloudType"].compute()
# result has NaNs and xarray's xr.testing.assert_equal doesn't support NaN comparison
np.testing.assert_allclose(result.data, expected.data)

_check_stacked_metadata(result, "CloudType")
if combine_times:
assert result.attrs["start_time"] == datetime(2023, 1, 16, 11, 9, 17)
assert result.attrs["end_time"] == datetime(2023, 1, 16, 11, 28, 1, 900000)
else:
assert result.attrs["start_time"] == datetime(2023, 1, 16, 11, 11, 7, 250000)
assert result.attrs["end_time"] == datetime(2023, 1, 16, 11, 20, 11, 950000)
assert result.attrs["start_time"] == datetime(2023, 1, 16, 11, 9, 17)
assert result.attrs["end_time"] == datetime(2023, 1, 16, 11, 28, 1, 900000)

@pytest.fixture()
def datasets_and_weights(self):
@@ -329,7 +324,7 @@
input_data["weights"][1][line, :] = 2
input_data["weights"][2][:, column] = 2

stack_with_weights = partial(stack, weights=input_data["weights"], combine_times=False)
stack_with_weights = partial(stack, weights=input_data["weights"])
blend_result = stack_with_weights(input_data["datasets"][0:3])

ds1 = input_data["datasets"][0]
4 changes: 0 additions & 4 deletions satpy/tests/test_composites.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (c) 2018-2020 Satpy developers
#
@@ -1420,8 +1420,6 @@
filenames=["/foo.tif"])
register.assert_not_called()
retrieve.assert_not_called()
assert "start_time" in res.attrs
assert "end_time" in res.attrs
assert res.attrs["sensor"] is None
assert "modifiers" not in res.attrs
assert "calibration" not in res.attrs
@@ -1434,8 +1432,6 @@
res = comp()
Scene.assert_called_once_with(reader="generic_image",
filenames=["data_dir/foo.tif"])
assert "start_time" in res.attrs
assert "end_time" in res.attrs
assert res.attrs["sensor"] is None
assert "modifiers" not in res.attrs
assert "calibration" not in res.attrs
66 changes: 56 additions & 10 deletions satpy/tests/test_dataset.py
@@ -1,4 +1,4 @@
# Copyright (c) 2015-2023 Satpy developers
#
# This file is part of satpy.
#
@@ -101,13 +101,42 @@

def setUp(self):
"""Set up the test case."""
self.datetime_dts = (
# The times need to be in ascending order (oldest first)
self.start_time_dts = (
{"start_time": datetime(2018, 2, 1, 11, 58, 0)},
{"start_time": datetime(2018, 2, 1, 11, 59, 0)},
{"start_time": datetime(2018, 2, 1, 12, 0, 0)},
{"start_time": datetime(2018, 2, 1, 12, 1, 0)},
{"start_time": datetime(2018, 2, 1, 12, 2, 0)},
)
self.end_time_dts = (
{"end_time": datetime(2018, 2, 1, 11, 58, 0)},
{"end_time": datetime(2018, 2, 1, 11, 59, 0)},
{"end_time": datetime(2018, 2, 1, 12, 0, 0)},
{"end_time": datetime(2018, 2, 1, 12, 1, 0)},
{"end_time": datetime(2018, 2, 1, 12, 2, 0)},
)
self.other_time_dts = (
{"other_time": datetime(2018, 2, 1, 11, 58, 0)},
{"other_time": datetime(2018, 2, 1, 11, 59, 0)},
{"other_time": datetime(2018, 2, 1, 12, 0, 0)},
{"other_time": datetime(2018, 2, 1, 12, 1, 0)},
{"other_time": datetime(2018, 2, 1, 12, 2, 0)},
)
self.start_time_dts_with_none = (
{"start_time": None},
{"start_time": datetime(2018, 2, 1, 11, 59, 0)},
{"start_time": datetime(2018, 2, 1, 12, 0, 0)},
{"start_time": datetime(2018, 2, 1, 12, 1, 0)},
{"start_time": datetime(2018, 2, 1, 12, 2, 0)},
)
self.end_time_dts_with_none = (
{"end_time": datetime(2018, 2, 1, 11, 58, 0)},
{"end_time": datetime(2018, 2, 1, 11, 59, 0)},
{"end_time": datetime(2018, 2, 1, 12, 0, 0)},
{"end_time": datetime(2018, 2, 1, 12, 1, 0)},
{"end_time": None},
)

def test_average_datetimes(self):
"""Test the average_datetimes helper function."""
@@ -122,18 +151,35 @@
ret = average_datetimes(dts)
assert dts[2] == ret

def test_combine_times_with_averaging(self):
"""Test the combine_metadata with times with averaging."""
def test_combine_start_times(self):
"""Test the combine_metadata with start times."""
from satpy.dataset.metadata import combine_metadata
ret = combine_metadata(*self.start_time_dts)
assert ret["start_time"] == self.start_time_dts[0]["start_time"]

def test_combine_end_times(self):
"""Test the combine_metadata with end times."""
from satpy.dataset.metadata import combine_metadata
ret = combine_metadata(*self.end_time_dts)
assert ret["end_time"] == self.end_time_dts[-1]["end_time"]

def test_combine_start_times_with_none(self):
"""Test the combine_metadata with start times when there's a None included."""
from satpy.dataset.metadata import combine_metadata
ret = combine_metadata(*self.start_time_dts_with_none)
assert ret["start_time"] == self.start_time_dts_with_none[1]["start_time"]

def test_combine_end_times_with_none(self):
"""Test the combine_metadata with end times when there's a None included."""
from satpy.dataset.metadata import combine_metadata
ret = combine_metadata(*self.datetime_dts)
assert self.datetime_dts[2]["start_time"] == ret["start_time"]
ret = combine_metadata(*self.end_time_dts_with_none)
assert ret["end_time"] == self.end_time_dts_with_none[-2]["end_time"]

def test_combine_times_without_averaging(self):
"""Test the combine_metadata with times without averaging."""
def test_combine_other_times(self):
"""Test the combine_metadata with other time values than start or end times."""
from satpy.dataset.metadata import combine_metadata
ret = combine_metadata(*self.datetime_dts, average_times=False)
# times are not equal so don't include it in the final result
assert "start_time" not in ret
ret = combine_metadata(*self.other_time_dts)
assert ret["other_time"] == self.other_time_dts[2]["other_time"]

def test_combine_arrays(self):
"""Test the combine_metadata with arrays."""