Enable datatree * dataset commutativity #7497
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,4 +41,5 @@ dependencies: | |
- sparse | ||
- toolz | ||
- typing_extensions | ||
- xarray-datatree | ||
- zarr |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -45,4 +45,5 @@ dependencies: | |
# - sparse | ||
- toolz | ||
- typing_extensions | ||
- xarray-datatree | ||
- zarr |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,4 +41,5 @@ dependencies: | |
# - sparse | ||
- toolz | ||
- typing_extensions | ||
- xarray-datatree | ||
- zarr |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,4 +41,5 @@ dependencies: | |
- sparse | ||
- toolz | ||
- typing_extensions | ||
- xarray-datatree | ||
- zarr |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -156,6 +156,9 @@ to the original netCDF file, regardless if they exist in the original dataset. | |
Groups | ||
~~~~~~ | ||
|
||
Single groups as datasets | ||
......................... | ||
|
||
NetCDF groups are not supported as part of the :py:class:`Dataset` data model. | ||
Instead, groups can be loaded individually as Dataset objects. | ||
To do so, pass a ``group`` keyword argument to the | ||
|
@@ -228,10 +231,34 @@ Either of these groups can be loaded from the file as an independent :py:class:` | |
Data variables: | ||
b int64 ... | ||
|
||
.. note:: | ||
.. _io.netcdf_datatree_groups: | ||
|
||
Multiple Groups as a DataTree | ||
............................. | ||
|
||
For native handling of multiple groups with xarray, including I/O, you might be interested in the experimental | ||
`xarray-datatree <https://github.com/xarray-contrib/datatree>`_ package. | ||
If installed, this package's API can be imported directly from xarray, i.e. ``from xarray import DataTree``. | ||
|
||
Whilst netCDF groups can only be loaded individually as Dataset objects, a whole file of many nested groups can be loaded | ||
as a single :py:class:`DataTree` object. | ||
To open a whole netCDF file as a tree of groups use the :py:func:`open_datatree()` function. | ||
To save a DataTree object as a netCDF file containing many groups, use the :py:meth:`DataTree.to_netcdf()` method. | |
|
||
.. _netcdf.group.warning: | ||
|
||
.. warning:: | ||
``DataTree`` objects do not follow the exact same data model as netCDF files, which means that perfect round-tripping | ||
is not always possible. | ||
|
||
In particular in the netCDF data model dimensions are entities that can exist regardless of whether any variable possesses them. | ||
This is in contrast to `xarray's data model <https://docs.xarray.dev/en/stable/user-guide/data-structures.html>`_ | ||
(and hence `datatree's data model <https://xarray-datatree.readthedocs.io/en/latest/data-structures.html>`_) in which the dimensions of a (Dataset/Tree) | ||
object are simply the set of dimensions present across all variables in that dataset. | ||
|
||
For native handling of multiple groups with xarray, including I/O, you might be interested in the experimental | ||
`xarray-datatree <https://github.com/xarray-contrib/datatree>`_ package. | ||
This means that if a netCDF file contains dimensions but no variables which possess those dimensions, | ||
these dimensions will not be present when that file is opened as a DataTree object. | ||
Saving this DataTree object to file will therefore not preserve these "unused" dimensions. | ||
|
||
|
||
.. _io.encoding: | ||
|
@@ -633,6 +660,21 @@ To read back a zarr dataset that has been created this way, we use the | |
ds_zarr = xr.open_zarr("path/to/directory.zarr") | ||
ds_zarr | ||
|
||
Groups | ||
~~~~~~ | ||
|
||
Like for netCDF, zarr groups can either be opened as individual :py:class:`Dataset` objects using the ``group`` keyword argument to :py:func:`open_dataset`, | ||
or alternatively nested groups in zarr stores can be represented by loading the store as a :py:class:`DataTree` object. | ||
(The latter option requires that you have the `xarray-datatree <https://github.com/xarray-contrib/datatree>`_ package installed.) | ||
|
||
To open a whole zarr store as a tree of groups use the :py:func:`open_datatree()` function. | ||
To save a DataTree object as a zarr store containing many groups, use the :py:meth:`DataTree.to_zarr()` method. | ||
|
||
.. note:: | ||
Note that perfect round-tripping should always be possible with a zarr store (:ref:`unlike for netCDF files<netcdf.group.warning>`), | ||
as zarr does not support "unused" dimensions. | ||
Comment on lines +673 to +675
I would just leave this note under the netCDF section; only people concerned about netCDF need to worry about this.
||
|
||
|
||
Cloud Storage Buckets | ||
~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
|
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -3656,6 +3656,48 @@ def reduce( | |||||||||||||||||||||
var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs) | ||||||||||||||||||||||
return self._replace_maybe_drop_dims(var) | ||||||||||||||||||||||
|
||||||||||||||||||||||
def to_datatree(self, node_name: str | None = None, name: str | None = None): | ||||||||||||||||||||||
""" | ||||||||||||||||||||||
Convert this dataarray into a datatree.DataTree. | ||||||||||||||||||||||
|
||||||||||||||||||||||
WARNING: The DataTree structure is considered experimental, | ||||||||||||||||||||||
and the API is less solidified than for other xarray features. | ||||||||||||||||||||||
|
||||||||||||||||||||||
The returned tree will only consist of a single node. | ||||||||||||||||||||||
That node will contain a copy of the dataarray's data, | ||||||||||||||||||||||
meaning including its coordinates, dimensions and attributes. | ||||||||||||||||||||||
|
||||||||||||||||||||||
Requires the xarray-datatree package to be installed. | ||||||||||||||||||||||
Find it at https://github.com/xarray-contrib/datatree. | ||||||||||||||||||||||
|
||||||||||||||||||||||
Parameters | ||||||||||||||||||||||
---------- | ||||||||||||||||||||||
node_name: str, optional | ||||||||||||||||||||||
The name of the datatree node created. | ||||||||||||||||||||||
name: str, optional | ||||||||||||||||||||||
Name to substitute for this array's name. | ||||||||||||||||||||||
|
||||||||||||||||||||||
Returns | ||||||||||||||||||||||
------- | ||||||||||||||||||||||
dt : DataTree | ||||||||||||||||||||||
A single-node datatree object, containing the information from this dataarray. | ||||||||||||||||||||||
|
||||||||||||||||||||||
See Also | ||||||||||||||||||||||
-------- | ||||||||||||||||||||||
datatree.DataTree | ||||||||||||||||||||||
""" | ||||||||||||||||||||||
|
||||||||||||||||||||||
try: | ||||||||||||||||||||||
from datatree import DataTree | ||||||||||||||||||||||
except ImportError: | ||||||||||||||||||||||
raise ImportError( | ||||||||||||||||||||||
"Could not import the datatree package. " | ||||||||||||||||||||||
"Find it at https://github.com/xarray-contrib/datatree" | ||||||||||||||||||||||
) | ||||||||||||||||||||||
|
||||||||||||||||||||||
Comment on lines +3692 to +3697
Python will give more informative error message ("The above exception was the direct cause of the following exception") if you use
Suggested change
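The quoted message is what Python prints when an exception is raised with an explicit cause (`raise ... from err`). The following is a minimal sketch of that pattern for an optional import, not necessarily the exact change the reviewer proposed. The helper `load_optional` and the module name passed to it are hypothetical and exist only for illustration.

```python
import importlib


def load_optional(module_name: str):
    """Import an optional dependency, chaining the original failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # `from err` stores the original exception in __cause__, so the
        # traceback shows it first, followed by "The above exception was
        # the direct cause of the following exception".
        raise ImportError(
            f"Could not import the {module_name} package."
        ) from err


# Hypothetical module name that is assumed not to be installed.
try:
    load_optional("hypothetical_missing_package_xyz")
except ImportError as exc:
    chained = exc
```

Without `from err`, the original exception is still attached implicitly (as `__context__`), but the traceback then reads "During handling of the above exception, another exception occurred", which suggests an accidental failure rather than a deliberate re-raise.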
|
||||||||||||||||||||||
ds = self.to_dataset(name=name) | ||||||||||||||||||||||
return DataTree(data=ds, name=node_name) | ||||||||||||||||||||||
|
||||||||||||||||||||||
def to_pandas(self) -> DataArray | pd.Series | pd.DataFrame: | ||||||||||||||||||||||
"""Convert this array into a pandas object with the same shape. | ||||||||||||||||||||||
|
||||||||||||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6116,6 +6116,45 @@ def to_array( | |
|
||
return DataArray._construct_direct(variable, coords, name, indexes) | ||
|
||
def to_datatree(self, node_name: str | None = None): | ||
""" | ||
Convert this dataset into a datatree.DataTree. | ||
|
||
.. warning:: The DataTree structure is considered experimental, | ||
and the API is less solidified than for other xarray features. | ||
|
||
The returned tree will only consist of a single node. | ||
That node will contain a copy of the dataset's data, | ||
meaning all variables, coordinates, dimensions and attributes. | ||
|
||
Requires the xarray-datatree package to be installed. | ||
Find it at https://github.com/xarray-contrib/datatree. | ||
|
||
Parameters | ||
---------- | ||
node_name: str, optional | ||
The name of the datatree node created. | ||
|
||
Returns | ||
------- | ||
dt : DataTree | ||
A single-node datatree object, containing the information from this dataset. | ||
|
||
See Also | ||
-------- | ||
datatree.DataTree | ||
""" | ||
|
||
try: | ||
from datatree import DataTree | ||
except ImportError: | ||
raise ImportError( | ||
"Could not import the datatree package. " | ||
"Find it at https://github.com/xarray-contrib/datatree" | ||
) | ||
|
||
return DataTree(data=self, name=node_name) | ||
|
||
def _normalize_dim_order( | ||
self, dim_order: Sequence[Hashable] | None = None | ||
) -> dict[Hashable, int]: | ||
|
@@ -6589,6 +6628,14 @@ def _binary_op(self, other, f, reflexive=False, join=None) -> Dataset: | |
from xarray.core.dataarray import DataArray | ||
from xarray.core.groupby import GroupBy | ||
|
||
try: | ||
from datatree import DataTree | ||
|
||
if isinstance(other, DataTree): | ||
return NotImplemented | ||
except ImportError: | ||
pass | ||
Comment on lines +6631 to +6637
Consider using
||
|
||
if isinstance(other, GroupBy): | ||
return NotImplemented | ||
align_type = OPTIONS["arithmetic_join"] if join is None else join | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
import xarray.testing as xrt | ||
from xarray import Dataset | ||
from xarray.tests import requires_datatree | ||
|
||
|
||
@requires_datatree | ||
def test_import_datatree(): | ||
"""Just test importing datatree package from xarray-contrib repo""" | ||
from xarray import DataTree | ||
|
||
DataTree() | ||
|
||
|
||
@requires_datatree | ||
def test_to_datatree(): | ||
from xarray import DataTree | ||
|
||
ds = Dataset({"a": ("x", [1, 2, 3])}) | ||
dt = ds.to_datatree(node_name="group1") | ||
|
||
assert isinstance(dt, DataTree) | ||
assert dt.name == "group1" | ||
xrt.assert_identical(dt.to_dataset(), ds) | ||
|
||
da = ds["a"] | ||
dt = da.to_datatree(node_name="group1") | ||
|
||
assert isinstance(dt, DataTree) | ||
assert dt.name == "group1" | ||
xrt.assert_identical(dt["a"], da) | ||
|
||
|
||
@requires_datatree | ||
def test_binary_ops(): | ||
import datatree.testing as dtt | ||
|
||
from xarray import DataTree | ||
|
||
ds1 = Dataset({"a": [5], "b": [3]}) | ||
ds2 = Dataset({"x": [0.1, 0.2], "y": [10, 20]}) | ||
dt = DataTree(data=ds1) | ||
DataTree(name="subnode", data=ds2, parent=dt) | ||
Does this line do anything?
||
other_ds = Dataset({"z": ("z", [0.1, 0.2])}) | ||
|
||
expected = DataTree(data=ds1 * other_ds) | ||
DataTree(name="subnode", data=ds2 * other_ds, parent=expected) | ||
Same concern as above
||
|
||
result = dt * other_ds | ||
dtt.assert_equal(result, expected) | ||
|
||
# This ordering won't work unless xarray.Dataset defers to DataTree. | ||
# See https://github.com/xarray-contrib/datatree/issues/146 | ||
Comment on lines +51 to +52
Presumably this comment can be deleted now :)
||
result = other_ds * dt | ||
dtt.assert_equal(result, expected) |
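The `other_ds * dt` ordering tested above relies on Python's reflected-operand protocol: when the left operand's `__mul__` returns `NotImplemented`, Python tries the right operand's `__rmul__`. A minimal sketch with toy classes follows. `Tree` and `Flat` are hypothetical stand-ins for `DataTree` and `Dataset`, and nothing here reflects the real implementations beyond the deferral step.

```python
class Tree:
    """Toy stand-in for DataTree: a mapping of node names to numbers."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)

    def __mul__(self, other):
        if isinstance(other, Flat):
            # Apply the operation to every node of the tree.
            return Tree({k: v * other.value for k, v in self.nodes.items()})
        return NotImplemented

    # Reflected multiplication, tried when the left operand defers.
    __rmul__ = __mul__

    def __eq__(self, other):
        return isinstance(other, Tree) and self.nodes == other.nodes


class Flat:
    """Toy stand-in for Dataset: wraps a single number."""

    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        if isinstance(other, Tree):
            # Defer to Tree.__rmul__ rather than handling a tree here.
            return NotImplemented
        if isinstance(other, Flat):
            return Flat(self.value * other.value)
        return NotImplemented


tree = Tree({"root": 2, "subnode": 3})
flat = Flat(10)
left = tree * flat   # handled by Tree.__mul__
right = flat * tree  # Flat.__mul__ defers, so Tree.__rmul__ runs
```

If `Flat.__mul__` tried to handle the tree itself instead of returning `NotImplemented`, `Tree.__rmul__` would never be called and the two orderings could disagree.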
We have this same issue with round-tripping Dataset objects. Unless it's particularly more pronounced for Datatree objects, I would consider consolidating the discussion, maybe in a new section like "netCDF files that Xarray cannot represent."
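The round-tripping limitation described in the docs change can be sketched with a toy model. This uses plain dictionaries, not the xarray or netCDF APIs, and every name in it is hypothetical. It only illustrates the stated difference: the file declares dimensions independently, while the in-memory model derives them from its variables.

```python
# Toy "file": dimensions are declared independently of any variable.
file_dims = {"x": 3, "unused": 5}
# Toy variables: name -> (dimension names, shape).
file_vars = {"a": (("x",), (3,))}


def dims_from_variables(variables):
    """Collect dimension sizes from the variables that actually use them."""
    sizes = {}
    for dim_names, shape in variables.values():
        sizes.update(zip(dim_names, shape))
    return sizes


# The in-memory model only knows the dimensions its variables possess.
in_memory_dims = dims_from_variables(file_vars)

# Dimensions that would be dropped when writing the object back out.
lost = sorted(set(file_dims) - set(in_memory_dims))
```

In this sketch `unused` never reaches the in-memory object, so nothing written back from that object can restore it.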