-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow Variable type as dim argument to concat #8384
Conversation
Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @maresb !
In order to test this, would you be up for finding a test which passes a Variable, and adding a -> None
to its return type? Then mypy will check the types. (and mypy should raise an error on the old code).
Thanks @max-sixty for the response! If I understand what you mean, then this test already meets your criteria except for mypy erroring on old code: [EDIT: See code snipped from the comment below because I selected the wrong function.] So now the mystery is why mypy allows this? By replacing the annotations which I modified in this PR with In contrast, if I
among other errors, which is more logical to me. Any idea what's going on? |
That is odd! BTW I think the function that tests this is the one above: xarray/xarray/tests/test_concat.py Lines 898 to 903 in d40609a
I had a look through — it's possible that I tried removing the overload and using diff --git a/xarray/core/concat.py b/xarray/core/concat.py
index a136480b..acbd8c80 100644
--- a/xarray/core/concat.py
+++ b/xarray/core/concat.py
@@ -1,7 +1,7 @@
from __future__ import annotations
from collections.abc import Hashable, Iterable
-from typing import TYPE_CHECKING, Any, Union, overload
+from typing import TYPE_CHECKING, Any, Iterator, Sequence, Union, overload
import numpy as np
import pandas as pd
@@ -16,7 +16,7 @@
merge_attrs,
merge_collected,
)
-from xarray.core.types import T_DataArray, T_Dataset
+from xarray.core.types import T_DataArray, T_Dataset, T_Xarray
from xarray.core.variable import Variable
from xarray.core.variable import concat as concat_vars
@@ -31,47 +31,17 @@
T_DataVars = Union[ConcatOptions, Iterable[Hashable]]
-@overload
def concat(
- objs: Iterable[T_Dataset],
+ objs: Iterable[T_Xarray],
dim: Hashable | T_DataArray | pd.Index,
data_vars: T_DataVars = "all",
- coords: ConcatOptions | list[Hashable] = "different",
- compat: CompatOptions = "equals",
- positions: Iterable[Iterable[int]] | None = None,
- fill_value: object = dtypes.NA,
- join: JoinOptions = "outer",
- combine_attrs: CombineAttrsOptions = "override",
-) -> T_Dataset:
- ...
-
-
-@overload
-def concat(
- objs: Iterable[T_DataArray],
- dim: Hashable | T_DataArray | pd.Index,
- data_vars: T_DataVars = "all",
- coords: ConcatOptions | list[Hashable] = "different",
- compat: CompatOptions = "equals",
- positions: Iterable[Iterable[int]] | None = None,
- fill_value: object = dtypes.NA,
- join: JoinOptions = "outer",
- combine_attrs: CombineAttrsOptions = "override",
-) -> T_DataArray:
- ...
-
-
-def concat(
- objs,
- dim,
- data_vars: T_DataVars = "all",
coords="different",
compat: CompatOptions = "equals",
positions=None,
fill_value=dtypes.NA,
join: JoinOptions = "outer",
combine_attrs: CombineAttrsOptions = "override",
-):
+) -> T_Xarray:
"""Concatenate xarray objects along a new or existing dimension.
Parameters
@@ -449,7 +419,7 @@ def _parse_datasets(
def _dataset_concat(
- datasets: list[T_Dataset],
+ datasets: Iterable[T_Dataset],
dim: str | T_DataArray | pd.Index,
data_vars: T_DataVars,
coords: str | list[str],
diff --git a/xarray/plot/dataarray_plot.py b/xarray/plot/dataarray_plot.py
index 61f2014f..bdb97cac 100644
--- a/xarray/plot/dataarray_plot.py
+++ b/xarray/plot/dataarray_plot.py
@@ -193,8 +193,9 @@ def _prepare_plot1d_data(
for v in ["z", "x"]:
dim = coords_to_plot.get(v, None)
if (dim is not None) and (dim in darray.dims):
- darray_nan = np.nan * darray.isel({dim: -1})
- darray = concat([darray, darray_nan], dim=dim)
+ darray_nan: T_DataArray = np.nan * darray.isel({dim: -1})
+ arrays: list[T_DataArray] = [darray, darray_nan]
+ darray = concat(arrays, dim=dim)
dims_T.append(coords_to_plot[v])
# Lines should never connect to the same coordinate when stacked, Regardless, let's merge, thanks for the change! |
Thanks @max-sixty for the merge! I had a look at your diff and respective mypy error in First of all, one easy fix for darray_nan = np.nan * darray.isel({dim: -1}) which gives a end_of_v = darray.isel({dim: -1})
darray_nan = end_of_v.copy(data=np.full_like(darray.data, np.nan)) Then the error is that In the TypeVar docs, it states:
...
...
Looking at the definition Line 162 in f63ede9
we see that Thus according to the docs, One almost-solution is to switch from Lines 164 to 167 in f63ede9
But this would be bad since the (first) In conclusion, the minimally-disruptive solution appears to be to keep the The more obvious solution would be to not use |
That's a very impressive explanation @maresb — thank you!
We would like to support them better, so I we should try and move towards
So it sounds like this problem is intractable — that we can't use
"retain their concrete type through their methods" is important — IIUC, if we use |
In my current understanding this is correct. (I'm tenacious but no expert in the internals of Python typing.)
Not with
Not with
I don't understand exactly what you mean with your def sel_x(data: T_DataArrayOrSet) -> T_DataArrayOrSet:
return data.sel(x=2) It seems that there is some level of type "evaluation" in mypy since Ugliness of |
The disadvantage of That example is a good one — though if we typed ds = ds.sel(x=2) ...because the variable would be a
Yes, I think you're correct! i.e. in place of making things generic, we could just write out all the concrete types, almost like manual C++ templating.. |
But semantically def identity(data: T_DataArrayOrSet) -> T_DataArrayOrSet:
return data
class DatasetSubclass(Dataset):
pass
ds = DatasetSubclass({"x": 0})
ds2 = identity(ds)
reveal_type(ds2) # DatasetSubclass It's just that for whatever reason mypy is choking on the combination of As far as I can tell, the only fundamental reason to avoid |
OK, that's v helpful — I was wrong in my assumptions about these two constructions — they're actually different:
It's only the
So I think that's correct. And not exactly what the comments above the types in our code state (!)
Yes, I think that's also correct (which makes the somewhat misguided comments to use
I'd be very open to this if it worked out overall. OTOH I would not favor having Thanks a lot for the feedback @maresb ! Lmk if there are some code or doc / comment changes we could make from this... |
Given the following excerpt from the
xr.concat()
source codexarray/xarray/core/concat.py
Lines 475 to 480 in bb489fa
it seems like it's explicitly intended that the
dim=
argument can be of typexr.Variable
. However, it's not indicated as such in the type hints or the documentation. This leads to a type error in pyright when aVariable
type is used fordim
.I'm submitting this PR to fix this apparent shortcoming. Or have I overlooked some reason why this should not be the case? Thanks for your consideration!