Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: xarray/main #413

Draft
wants to merge 31 commits into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
8b92e71
import directly to xarray and rewrite utils_transformation to work wi…
mats-knmi Jul 17, 2024
23825dd
convert to_rainrate function also
mats-knmi Jul 17, 2024
9952bd3
Rewrite more code to xarray
mats-knmi Jul 18, 2024
904bce8
conversion.py works on the xarray datamodel
mats-knmi Jul 18, 2024
b3aa009
fix converters.py
mats-knmi Jul 18, 2024
016d28f
make dimension.py xarray compatible (#397)
mats-knmi Aug 12, 2024
e7c081c
make all nowcast methods xarray compatible (#414)
mats-knmi Sep 2, 2024
c2ae9db
Added member and time dimension (#432)
gjm174 Sep 23, 2024
9ea07fc
Initial commit to branch
sidekock Oct 8, 2024
53a98a9
Working on _apply_noise_and_ar_model method
sidekock Oct 8, 2024
0c5185f
Only update function needs to be added
sidekock Oct 8, 2024
46bc44a
Fully refactored code
sidekock Oct 9, 2024
eb33c06
Possible solution for errors solved
sidekock Oct 9, 2024
a1ce4bc
Fixed small bug in _nowcast_main
sidekock Oct 9, 2024
b002354
Name changes from feedback Ruben
sidekock Oct 22, 2024
e105a24
Name changes from feedback Ruben v2
sidekock Oct 22, 2024
b6c47af
Name changes from feedback Ruben v3
sidekock Oct 22, 2024
5aaf24d
Added config dataclass to steps nowcast
sidekock Oct 31, 2024
fa9a1ef
Added config dataclass to steps nowcast, v2
sidekock Oct 31, 2024
3da1696
Added config dataclass to steps nowcast, v3
sidekock Oct 31, 2024
8c7982c
Added config dataclass to steps nowcast, v4
sidekock Oct 31, 2024
aa26517
Added config dataclass to steps nowcast, fixed some assignment issues…
sidekock Oct 31, 2024
ff2e2be
Fixed num_ensemble_workers bug
sidekock Oct 31, 2024
2543683
Added params and state dataclasses
sidekock Oct 31, 2024
37739de
Implemented a reset of the states and params to allow for multiple fo…
sidekock Oct 31, 2024
f123bde
Removed some redundant fields
sidekock Oct 31, 2024
1a71e61
Update pysteps/nowcasts/steps.py
sidekock Nov 4, 2024
db953fb
Added suggested changed by Mats regarding __ and typing
sidekock Nov 4, 2024
eff9248
Possible fix for static code analysis
sidekock Nov 4, 2024
663a9a2
Added the needed documentation to the class
sidekock Nov 4, 2024
3226042
Merge remote-tracking branch 'origin/refactoring_steps' into xarray/main
mats-knmi Nov 6, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
repos:
- repo: https://github.com/psf/black
rev: 24.8.0
rev: 24.10.0
hooks:
- id: black
language_version: python3
1 change: 1 addition & 0 deletions ci/ci_test_env.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ dependencies:
- pillow
- pyproj
- scipy
- xarray
# Optional dependencies
- dask
- pyfftw
Expand Down
12 changes: 9 additions & 3 deletions pysteps/decorators.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,8 @@

import numpy as np

from pysteps.xarray_helpers import convert_input_to_xarray_dataset


def _add_extra_kwrds_to_docstrings(target_func, extra_kwargs_doc_text):
"""
Expand Down Expand Up @@ -66,7 +68,7 @@ def postprocess_import(fillna=np.nan, dtype="double"):
def _postprocess_import(importer):
@wraps(importer)
def _import_with_postprocessing(*args, **kwargs):
precip, *other_args = importer(*args, **kwargs)
precip, quality, metadata = importer(*args, **kwargs)

_dtype = kwargs.get("dtype", dtype)

Expand All @@ -88,7 +90,9 @@ def _import_with_postprocessing(*args, **kwargs):
mask = ~np.isfinite(precip)
precip[mask] = _fillna

return (precip.astype(_dtype),) + tuple(other_args)
return convert_input_to_xarray_dataset(
precip.astype(_dtype), quality, metadata
)

extra_kwargs_doc = """
Other Parameters
Expand Down Expand Up @@ -124,7 +128,9 @@ def new_function(*args, **kwargs):
target motion_method_func function.
"""

input_images = args[0]
dataset = args[0]
precip_var = dataset.attrs["precip_var"]
input_images = dataset[precip_var].values
if input_images.ndim != 3:
raise ValueError(
"input_images dimension mismatch.\n"
Expand Down
102 changes: 99 additions & 3 deletions pysteps/io/importers.py
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,104 @@
| zr_b | the Z-R exponent b in Z = a*R**b |
+------------------+----------------------------------------------------------+

The data and metadata is then postprocessed into an xarray dataset. This dataset will
always contain an x and y dimension, but can be extended with a time dimension and/or
an ensemble member dimension over the course of the process.

The dataset can contain the following coordinate variables:


.. tabularcolumns:: |p{2cm}|L|

+--------------------+-------------------------------------------------------------------------------------------+
| Coordinate | Description |
+====================+===========================================================================================+
| y | y-coordinate in Cartesian system, with units determined by ``metadata["cartesian_unit"]`` |
+--------------------+-------------------------------------------------------------------------------------------+
| x | x-coordinate in Cartesian system, with units determined by ``metadata["cartesian_unit"]`` |
+--------------------+-------------------------------------------------------------------------------------------+
| lat | latitude coordinate in degrees |
+--------------------+-------------------------------------------------------------------------------------------+
| lon | longitude coordinate in degrees |
+--------------------+-------------------------------------------------------------------------------------------+
| time | forecast time in seconds since forecast start time |
+--------------------+-------------------------------------------------------------------------------------------+
| ens_number | ensemble member number (integer) |
+--------------------+-------------------------------------------------------------------------------------------+
| direction | used by proesmans to return the forward and backward advection and consistency fields |
+--------------------+-------------------------------------------------------------------------------------------+

The time, x and y dimensions all MUST be regularly spaced, with the stepsize included
in a ``stepsize`` attribute. The stepsize is given in the unit of the dimension (this
is alwyas seconds for the time dimension).

The dataset can contain the following data variables:

.. tabularcolumns:: |p{2cm}|L|

+-------------------+-----------------------------------------------------------------------------------------------------------+
| Variable | Description |
+===================+===========================================================================================================+
| precip_intensity, | precipitation data, based on the unit the data has it is stored in one of these 3 possible variables |
| precip_accum | precip_intensity if unit is ``mm/h``, precip_accum if unit is ``mm`` and reflectivity if unit is ``dBZ``, |
| or reflectivity | the attributes of this variable contain metadata relevant to this attribute (see below) |
+-------------------+-----------------------------------------------------------------------------------------------------------+
| velocity_x | x-component of the advection field in cartesian_unit per timestep |
+-------------------+-----------------------------------------------------------------------------------------------------------+
| velocity_y | y-component of the advection field in cartesian_unit per timestep |
+-------------------+-----------------------------------------------------------------------------------------------------------+
| quality | value between 0 and 1 denoting the quality of the precipitation data, currently not used for anything |
+-------------------+-----------------------------------------------------------------------------------------------------------+
| velocity_quality | value between 0 and 1 denoting the quality of the velocity data, currently only returned by proesmans |
+-------------------+-----------------------------------------------------------------------------------------------------------+

Some of the metadata in the metadata dictionary is not explicitely stored in the dataset,
but is still implicitly present. For example ``x1`` can easily be found by taking the first
value from the x coordinate variable. Metadata that is not implicitly present is explicitly
stored either in the datasets global attributes or as attributes of the precipitation variable.
Data that relates to the entire dataset is stored in the global attributes. The following data
is stored in the global attributes:

.. tabularcolumns:: |p{2cm}|L|

+------------------+----------------------------------------------------------+
| Key | Value |
+==================+==========================================================+
| projection | PROJ.4-compatible projection definition |
+------------------+----------------------------------------------------------+
| institution | name of the institution who provides the data |
+------------------+----------------------------------------------------------+
| precip_var | the name of the precipitation variable in this dataset |
+------------------+----------------------------------------------------------+

The following data is stored as attributes of the precipitation variable:

.. tabularcolumns:: |p{2cm}|L|

+------------------+----------------------------------------------------------+
| Key | Value |
+==================+==========================================================+
| units | the physical unit of the data: 'mm/h', 'mm' or 'dBZ' |
+------------------+----------------------------------------------------------+
| transform | the transformation of the data: None, 'dB', 'Box-Cox' or |
| | others |
+------------------+----------------------------------------------------------+
| accutime | the accumulation time in minutes of the data, float |
+------------------+----------------------------------------------------------+
| threshold | the rain/no rain threshold with the same unit, |
| | transformation and accutime of the data. |
+------------------+----------------------------------------------------------+
| zerovalue | the value assigned to the no rain pixels with the same |
| | unit, transformation and accutime of the data. |
+------------------+----------------------------------------------------------+
| zr_a | the Z-R constant a in Z = a*R**b |
+------------------+----------------------------------------------------------+
| zr_b | the Z-R exponent b in Z = a*R**b |
+------------------+----------------------------------------------------------+

Furthermore the dataset can contain some additional metadata to make the dataset
CF-compliant.

Available Importers
-------------------

Expand All @@ -89,12 +187,10 @@
from functools import partial

import numpy as np

from matplotlib.pyplot import imread

from pysteps.decorators import postprocess_import
from pysteps.exceptions import DataModelError
from pysteps.exceptions import MissingOptionalDependency
from pysteps.exceptions import DataModelError, MissingOptionalDependency
from pysteps.utils import aggregate_fields

try:
Expand Down
77 changes: 46 additions & 31 deletions pysteps/io/readers.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,13 +12,14 @@
"""

import numpy as np
import xarray as xr


def read_timeseries(inputfns, importer, **kwargs):
def read_timeseries(inputfns, importer, timestep=None, **kwargs) -> xr.Dataset | None:
"""
Read a time series of input files using the methods implemented in the
:py:mod:`pysteps.io.importers` module and stack them into a 3d array of
shape (num_timesteps, height, width).
:py:mod:`pysteps.io.importers` module and stack them into a 3d xarray
dataset of shape (num_timesteps, height, width).

Parameters
----------
Expand All @@ -27,55 +28,69 @@ def read_timeseries(inputfns, importer, **kwargs):
:py:mod:`pysteps.io.archive` module.
importer: function
A function implemented in the :py:mod:`pysteps.io.importers` module.
timestep: int, optional
The timestep in seconds, this value is optional if more than 1 inputfns
are given.
kwargs: dict
Optional keyword arguments for the importer.

Returns
-------
out: tuple
A three-element tuple containing the read data and quality rasters and
out: Dataset
A dataset containing the read data and quality rasters and
associated metadata. If an input file name is None, the corresponding
precipitation and quality fields are filled with nan values. If all
input file names are None or if the length of the file name list is
zero, a three-element tuple containing None values is returned.
zero, None is returned.

"""

# check for missing data
precip_ref = None
dataset_ref = None
if all(ifn is None for ifn in inputfns):
return None, None, None
return None
else:
if len(inputfns[0]) == 0:
return None, None, None
return None
for ifn in inputfns[0]:
if ifn is not None:
precip_ref, quality_ref, metadata = importer(ifn, **kwargs)
dataset_ref = importer(ifn, **kwargs)
break

if precip_ref is None:
return None, None, None
if dataset_ref is None:
return None

precip = []
quality = []
timestamps = []
startdate = min(inputfns[1])
sorted_dates = sorted(inputfns[1])
timestep_dates = int((sorted_dates[1] - sorted_dates[0]).total_seconds())

if timestep is None:
timestep = timestep_dates
if timestep != timestep_dates:
raise ValueError("given timestep does not match inputfns")
for i in range(len(sorted_dates) - 1):
if int((sorted_dates[i + 1] - sorted_dates[i]).total_seconds()) != timestep:
raise ValueError("supplied dates are not evenly spaced")

datasets = []
for i, ifn in enumerate(inputfns[0]):
if ifn is not None:
precip_, quality_, _ = importer(ifn, **kwargs)
precip.append(precip_)
quality.append(quality_)
timestamps.append(inputfns[1][i])
dataset_ = importer(ifn, **kwargs)
else:
precip.append(precip_ref * np.nan)
if quality_ref is not None:
quality.append(quality_ref * np.nan)
else:
quality.append(None)
timestamps.append(inputfns[1][i])

# Replace this with stack?
precip = np.concatenate([precip_[None, :, :] for precip_ in precip])
# TODO: Q should be organized as R, but this is not trivial as Q_ can be also None or a scalar
metadata["timestamps"] = np.array(timestamps)
dataset_ = dataset_ref * np.nan
dataset_ = dataset_.expand_dims(dim="time", axis=0)
dataset_ = dataset_.assign_coords(
time=(
"time",
[inputfns[1][i]],
{
"long_name": "forecast time",
"units": f"seconds since {startdate:%Y-%m-%d %H:%M:%S}",
"stepsize": timestep,
},
)
)
datasets.append(dataset_)

return precip, quality, metadata
dataset = xr.concat(datasets, dim="time")
return dataset
26 changes: 17 additions & 9 deletions pysteps/motion/constant.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,27 +14,32 @@

import numpy as np
import scipy.optimize as op
import xarray as xr
from scipy.ndimage import map_coordinates


def constant(R, **kwargs):
def constant(dataset: xr.Dataset, **kwargs):
"""
Compute a constant advection field by finding a translation vector that
maximizes the correlation between two successive images.

Parameters
----------
R: array_like
Array of shape (T,m,n) containing a sequence of T two-dimensional input
images of shape (m,n). If T > 2, two last elements along axis 0 are used.
dataset: xarray.Dataset
Input dataset as described in the documentation of
:py:mod:`pysteps.io.importers`. It has to contain a precipitation data variable.
The dataset has to have a time dimension. If the size of this dimension
is larger than 2, the last 2 entries of this dimension are used.

Returns
-------
out: array_like
The constant advection field having shape (2, m, n), where out[0, :, :]
contains the x-components of the motion vectors and out[1, :, :]
contains the y-components.
out: xarray.Dataset
The input dataset with the constant advection field added in the ``velocity_x``
and ``velocity_y`` data variables.
"""
dataset = dataset.copy(deep=True)
precip_var = dataset.attrs["precip_var"]
R = dataset[precip_var].values
m, n = R.shape[1:]
X, Y = np.meshgrid(np.arange(n), np.arange(m))

Expand All @@ -51,4 +56,7 @@ def f(v):
options = {"initial_simplex": (np.array([(0, 1), (1, 0), (1, 1)]))}
result = op.minimize(f, (1, 1), method="Nelder-Mead", options=options)

return np.stack([-result.x[0] * np.ones((m, n)), -result.x[1] * np.ones((m, n))])
output = np.stack([-result.x[0] * np.ones((m, n)), -result.x[1] * np.ones((m, n))])
dataset["velocity_x"] = (["y", "x"], output[0])
dataset["velocity_y"] = (["y", "x"], output[1])
return dataset
Loading
Loading