Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Declare Dataset, DataArray, Variable, GroupBy unhashable #8392

Merged
merged 3 commits into from
Nov 9, 2023

Conversation

maresb
Copy link
Contributor

@maresb maresb commented Oct 29, 2023

Declare Dataset, DataArray, Variable, DatasetGroupBy, and DataArrayGroupBy unhashable by adding the following type declaration to each class's OpsMixin class:

__hash__: None  # type:ignore[assignment]

Background

This is a follow-up to #8384 (attn: @max-sixty). There I observed that pyright was objecting that Variable is not a valid type to the dim argument of concat. On the other hand mypy accepted this because mypy recognizes Variable as Hashable even though it is not.

The appropriate place to do declare Variable unhashable seems to be beside the definition of __eq__ in VariableOpsMixin since defining __eq__ without __hash__ is what renders an object unhashable.

For a reference to how types are marked as non-hashable, see https://github.com/python/typeshed/pull/3219/files.

After making this change, mypy reported the following additional error:

xarray/core/dataset.py: note: In member "swap_dims" of class "Dataset":
xarray/core/dataset.py:4415: error: Incompatible types in assignment (expression has type "Variable", variable has type "Hashable")  [assignment]
xarray/core/dataset.py:4415: note: Following member(s) of "Variable" have conflicts:
xarray/core/dataset.py:4415: note:     __hash__: expected "Callable[[], int]", got "None"
xarray/core/dataset.py:4416: error: "Hashable" has no attribute "dims"  [attr-defined]
xarray/core/dataset.py:4419: error: "Hashable" has no attribute "to_index_variable"  [attr-defined]
xarray/core/dataset.py:4430: error: "Hashable" has no attribute "to_base_variable"  [attr-defined]

This error was the result of a variable redefinition in Dataset.swap_dims. Giving the variables distinct (and more descriptive) names resolved the error.

@maresb maresb force-pushed the make-variable-unhashable branch from efbc043 to ec1b67e Compare October 29, 2023 22:01
@@ -632,6 +632,8 @@ def __eq__(self, other: VarCompatible) -> Self:
def __eq__(self, other: VarCompatible) -> Self | T_DataArray:
return self._binary_op(other, nputils.array_eq)

__hash__: None # type:ignore[assignment]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good for testing but FYI:

# This file was generated using xarray.util.generate_ops. Do not edit manually.

@max-sixty
Copy link
Collaborator

This looks good! CC @headtr1ck & @Illviljan . I have a faint memory of issues with Hashable, but it may well be on a completely different piece of code.

We'll need to move the __hash__: None # type:ignore[assignment] into the script that generates the file...

@maresb
Copy link
Contributor Author

maresb commented Oct 29, 2023

I'm done for the day, and I'm not sure when I can come back to this, so if someone else is motivated to update the script and get this done, then please don't wait for me.

@maresb maresb force-pushed the make-variable-unhashable branch 3 times, most recently from 2bc7ef6 to 81bde70 Compare November 9, 2023 19:51
The previous commit revealed the following mypy error:

xarray/core/dataset.py: note: In member "swap_dims" of class "Dataset":
xarray/core/dataset.py:4415: error: Incompatible types in assignment (expression has type "Variable", variable has type "Hashable")  [assignment]
xarray/core/dataset.py:4415: note: Following member(s) of "Variable" have conflicts:
xarray/core/dataset.py:4415: note:     __hash__: expected "Callable[[], int]", got "None"
xarray/core/dataset.py:4416: error: "Hashable" has no attribute "dims"  [attr-defined]
xarray/core/dataset.py:4419: error: "Hashable" has no attribute "to_index_variable"  [attr-defined]
xarray/core/dataset.py:4430: error: "Hashable" has no attribute "to_base_variable"  [attr-defined]
@maresb maresb force-pushed the make-variable-unhashable branch from 81bde70 to f28b735 Compare November 9, 2023 19:58
@maresb maresb changed the title Declare xr.Variable unhashable Declare Dataset, DataArray, Variable, GroupBy unhashable Nov 9, 2023
@maresb
Copy link
Contributor Author

maresb commented Nov 9, 2023

@max-sixty, I think I got it via the template. Ready for another round of review.

I verified that the mypy diff from the regenerate commit is empty.

var.dims = dims
variables[k] = var
variables[current_name] = var
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, these are good renames!

@max-sixty max-sixty merged commit 15328b6 into pydata:main Nov 9, 2023
20 of 28 checks passed
@max-sixty
Copy link
Collaborator

Thank you @maresb !

@maresb maresb deleted the make-variable-unhashable branch November 9, 2023 21:57
dcherian added a commit to rabernat/xarray that referenced this pull request Nov 29, 2023
* main:
  [skip-ci] dev whats-new (pydata#8467)
  2023.11.0 Whats-new (pydata#8461)
  migrate the other CI to python 3.11 (pydata#8416)
  preserve vlen string dtypes, allow vlen string fill_values (pydata#7869)
  Pin mypy < 1.7 (pydata#8458)
  Fix typos found by codespell (pydata#8457)
  [skip-ci] Small updates to IO docs. (pydata#8452)
  Deprecate certain cftime frequency strings following pandas (pydata#8415)
  Added driver parameter for h5netcdf (pydata#8360)
  Raise exception in to_dataset if resulting variable is also the name of a coordinate (pydata#8433)
  Automatic region detection and transpose for `to_zarr()` (pydata#8434)
  remove `cdms2` (pydata#8441)
  Remove PseudoNetCDF (pydata#8446)
  Pin pint to >=0.22 (pydata#8445)
  Remove keep_attrs from resample signature (pydata#8444)
  Rename `to_array` to `to_dataarray` (pydata#8438)
  Add missing DataArray.dt.total_seconds() method (pydata#8435)
  Declare Dataset, DataArray, Variable, GroupBy unhashable (pydata#8392)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants