Enable datatree * dataset commutativity #7497

Closed
wants to merge 26 commits into from
26 commits
7c6fa70
list datatree in public API
TomNicholas Jan 4, 2023
5ef43be
attempt to import datatree API on xarray import
TomNicholas Jan 4, 2023
d184764
incorporate datatree links into io docs on groups
TomNicholas Jan 4, 2023
d986df3
Merge branch 'main' into import_datatree
TomNicholas Jan 4, 2023
d2e8ec3
add Dataset.to_datatree() method
TomNicholas Jan 12, 2023
08ff5c4
Merge branch 'import_datatree' of https://github.com/TomNicholas/xarr…
TomNicholas Jan 12, 2023
1401ca5
Merge branch 'main' into import_datatree
TomNicholas Jan 25, 2023
b153152
Merge branch 'main' into import_datatree
TomNicholas Jan 27, 2023
c5b8d10
add test that DataTree class can be imported
TomNicholas Jan 31, 2023
62b5e27
add to every CI environment that also has flox
TomNicholas Jan 31, 2023
ffa53c4
also check we can import accessor
TomNicholas Feb 1, 2023
a8f752d
whatsnew
TomNicholas Feb 1, 2023
eed3a71
Merge branch 'import_datatree' of https://github.com/TomNicholas/xarr…
TomNicholas Feb 1, 2023
3d3c29f
Update to_node docstring
TomNicholas Feb 1, 2023
74fea3a
Merge branch 'main' into import_datatree
TomNicholas Feb 1, 2023
95d76e6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 1, 2023
caafe90
test .to_datatree method
TomNicholas Feb 1, 2023
462e0b3
Merge branch 'import_datatree' of https://github.com/TomNicholas/xarr…
TomNicholas Feb 1, 2023
91c6ee1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 1, 2023
bc6a538
fix datatree import
TomNicholas Feb 1, 2023
3baf79e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 1, 2023
667d5cd
protect my import from the exacting ruff linter
TomNicholas Feb 1, 2023
dfe763b
Merge branch 'import_datatree' of https://github.com/TomNicholas/xarr…
TomNicholas Feb 1, 2023
d231055
try installing datatree from main
TomNicholas Feb 1, 2023
f61e3af
return NotImplemented
TomNicholas Feb 1, 2023
87f5e25
test
TomNicholas Feb 1, 2023
2 changes: 2 additions & 0 deletions ci/install-upstream-wheels.sh
@@ -20,6 +20,7 @@ conda uninstall -y --force \
bottleneck \
sparse \
flox \
datatree \
h5netcdf \
xarray
# to limit the runtime of Upstream CI
@@ -47,5 +48,6 @@ python -m pip install \
git+https://github.com/intake/filesystem_spec \
git+https://github.com/SciTools/nc-time-axis \
git+https://github.com/xarray-contrib/flox \
git+https://github.com/xarray-contrib/xarray-datatree \
git+https://github.com/h5netcdf/h5netcdf
python -m pip install pytest-timeout
1 change: 1 addition & 0 deletions ci/requirements/all-but-dask.yml
@@ -41,4 +41,5 @@ dependencies:
- sparse
- toolz
- typing_extensions
- xarray-datatree
- zarr
1 change: 1 addition & 0 deletions ci/requirements/environment-py311.yml
@@ -45,4 +45,5 @@ dependencies:
# - sparse
- toolz
- typing_extensions
- xarray-datatree
- zarr
1 change: 1 addition & 0 deletions ci/requirements/environment-windows-py311.yml
@@ -41,4 +41,5 @@ dependencies:
# - sparse
- toolz
- typing_extensions
- xarray-datatree
- zarr
1 change: 1 addition & 0 deletions ci/requirements/environment-windows.yml
@@ -41,4 +41,5 @@ dependencies:
- sparse
- toolz
- typing_extensions
- xarray-datatree
- zarr
2 changes: 2 additions & 0 deletions ci/requirements/environment.yml
@@ -46,3 +46,5 @@ dependencies:
  - toolz
  - typing_extensions
  - zarr
  - pip:
    - git+https://github.com/xarray-contrib/datatree
14 changes: 14 additions & 0 deletions doc/api.rst
@@ -1133,6 +1133,20 @@ used filetypes in the xarray universe.
backends.StoreBackendEntrypoint
backends.ZarrBackendEntrypoint

DataTree
========

Experimental API for handling nested groups of data.
Requires the `xarray-datatree package <https://github.com/xarray-contrib/datatree>`_ to be installed.
See the `datatree documentation <https://xarray-datatree.readthedocs.io/en/latest/>`_ for details.

.. autosummary::
   :toctree: generated/

   DataTree
   open_datatree
   register_datatree_accessor

Deprecated / Pending Deprecation
================================

48 changes: 45 additions & 3 deletions doc/user-guide/io.rst
@@ -156,6 +156,9 @@ to the original netCDF file, regardless if they exist in the original dataset.
Groups
~~~~~~

Single groups as datasets
.........................

NetCDF groups are not supported as part of the :py:class:`Dataset` data model.
Instead, groups can be loaded individually as Dataset objects.
To do so, pass a ``group`` keyword argument to the
@@ -228,10 +231,34 @@ Either of these groups can be loaded from the file as an independent :py:class:`
Data variables:
b int64 ...

.. note::
.. _io.netcdf_datatree_groups:

Multiple Groups as a DataTree
.............................

For native handling of multiple groups with xarray, including I/O, you might be interested in the experimental
`xarray-datatree <https://github.com/xarray-contrib/datatree>`_ package.
If installed, this package's API can be imported directly from xarray, i.e. ``from xarray import DataTree``.

Whilst netCDF groups can only be loaded individually as Dataset objects, a whole file of many nested groups can be loaded
as a single :py:class:`DataTree` object.
To open a whole netCDF file as a tree of groups, use the :py:func:`open_datatree()` function.
To save a DataTree object as a netCDF file containing many groups, use the :py:meth:`DataTree.to_netcdf()` method.

.. _netcdf.group.warning:

.. warning::
Review comment (Member):

We have this same issue with round-tripping Dataset objects. Unless it's particularly more pronounced for Datatree objects, I would consider consolidating the discussion, maybe in a new section like "netCDF files that Xarray cannot represent."

    ``DataTree`` objects do not follow the exact same data model as netCDF files, which means that perfect round-tripping
    is not always possible.

    In particular, in the netCDF data model, dimensions are entities that can exist regardless of whether any variable possesses them.
    This is in contrast to `xarray's data model <https://docs.xarray.dev/en/stable/user-guide/data-structures.html>`_
    (and hence `datatree's data model <https://xarray-datatree.readthedocs.io/en/latest/data-structures.html>`_), in which the dimensions of a (Dataset/Tree)
    object are simply the set of dimensions present across all variables in that dataset.

For native handling of multiple groups with xarray, including I/O, you might be interested in the experimental
`xarray-datatree <https://github.com/xarray-contrib/datatree>`_ package.
    This means that if a netCDF file contains dimensions but no variables which possess those dimensions,
    these dimensions will not be present when that file is opened as a DataTree object.
    Saving this DataTree object to file will therefore not preserve these "unused" dimensions.
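
The loss of "unused" dimensions described in the warning can be illustrated with a toy model. This is only a sketch in plain Python: the dictionary layout and the helper `dims_in_use` are hypothetical stand-ins and are not part of xarray, datatree, or any netCDF library.

```python
# Toy model of a netCDF group: dimensions are declared independently of variables.
# (Hypothetical layout, for illustration only.)
netcdf_group = {
    "dimensions": {"x": 3, "unused": 5},
    "variables": {"a": ("x",), "b": ("x",)},
}


def dims_in_use(group):
    """Return only the dimensions that at least one variable actually uses.

    This mimics a data model in which dimensions exist only through variables.
    """
    used = set()
    for var_dims in group["variables"].values():
        used.update(var_dims)
    return {name: size for name, size in group["dimensions"].items() if name in used}


opened = dims_in_use(netcdf_group)
lost = set(netcdf_group["dimensions"]) - set(opened)
print("dimensions after opening:", opened)
print("dimensions lost on round-trip:", lost)
```

In this model the dimension `unused` never reaches the in-memory object, so writing the object back out cannot restore it.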


.. _io.encoding:
@@ -633,6 +660,21 @@ To read back a zarr dataset that has been created this way, we use the
ds_zarr = xr.open_zarr("path/to/directory.zarr")
ds_zarr

Groups
~~~~~~

As with netCDF, zarr groups can be opened either as individual :py:class:`Dataset` objects, using the ``group`` keyword argument to :py:func:`open_dataset`,
or as a whole, by loading the store's nested groups as a single :py:class:`DataTree` object.
(The latter option requires the `xarray-datatree <https://github.com/xarray-contrib/datatree>`_ package to be installed.)

To open a whole zarr store as a tree of groups, use the :py:func:`open_datatree()` function.
To save a DataTree object as a zarr store containing many groups, use the :py:meth:`DataTree.to_zarr()` method.

.. note::
    Perfect round-tripping should always be possible with a zarr store (:ref:`unlike for netCDF files <netcdf.group.warning>`),
    as zarr does not support "unused" dimensions.
Review comment (Member) on lines +673 to +675:

I would just leave this note under the netCDF section -- only people concerned about netCDF need to worry about this



Cloud Storage Buckets
~~~~~~~~~~~~~~~~~~~~~

9 changes: 9 additions & 0 deletions doc/whats-new.rst
@@ -23,6 +23,15 @@ v2023.01.1 (unreleased)
New Features
~~~~~~~~~~~~

- Allow importing the prototype :py:class:`DataTree` class (as well as the accompanying :py:func:`open_datatree()` and :py:func:`register_datatree_accessor` functions).
Currently ``from xarray import DataTree`` disguises an import from a separate package ``xarray-contrib/xarray-datatree``.
Importing these features will raise an ``ImportError`` unless the datatree package is installed.
Full integration of the :py:class:`DataTree` class in xarray is planned in the future (see our development roadmap),
but for now is proceeding on a provisional basis, and as such the API is still experimental and subject to change without notice.
In the meantime, you are encouraged to try using these features, and please let us know about your experiences!
(:issue:`4118`, :pull:`7418`)
By `Tom Nicholas <https://github.com/TomNicholas>`_.
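
Because the import fails when the optional package is missing, downstream code may want to check for it first. The following is a minimal sketch using only the standard library; the helper name `has_optional` is hypothetical and is not an xarray or datatree function.

```python
import importlib.util


def has_optional(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "datatree" is the import name used throughout this pull request.
DATATREE_AVAILABLE = has_optional("datatree")
print("datatree available:", DATATREE_AVAILABLE)
```

A caller could branch on `DATATREE_AVAILABLE` before attempting `from xarray import DataTree`, instead of catching the `ImportError` afterwards.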


Breaking changes
~~~~~~~~~~~~~~~~
6 changes: 6 additions & 0 deletions xarray/__init__.py
@@ -52,6 +52,12 @@
    # Disable minimum version checks on downstream libraries.
    __version__ = "999"

try:
    from datatree import DataTree  # noqa
except ImportError:
    ...


# A hardcoded __all__ variable is necessary to appease
# `mypy --strict` running in projects that import xarray.
__all__ = (
42 changes: 42 additions & 0 deletions xarray/core/dataarray.py
@@ -3656,6 +3656,48 @@ def reduce(
        var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs)
        return self._replace_maybe_drop_dims(var)

    def to_datatree(self, node_name: str | None = None, name: str | None = None):
        """
        Convert this dataarray into a datatree.DataTree.

        WARNING: The DataTree structure is considered experimental,
        and the API is less solidified than for other xarray features.

        The returned tree will only consist of a single node.
        That node will contain a copy of the dataarray's data,
        including its coordinates, dimensions and attributes.

        Requires the xarray-datatree package to be installed.
        Find it at https://github.com/xarray-contrib/datatree.

        Parameters
        ----------
        node_name: str, optional
            The name of the datatree node created.
        name: str, optional
            Name to substitute for this array's name.

        Returns
        -------
        dt : DataTree
            A single-node datatree object, containing the information from this dataarray.

        See Also
        --------
        datatree.DataTree
        """

        try:
            from datatree import DataTree
        except ImportError:
            raise ImportError(
                "Could not import the datatree package. "
                "Find it at https://github.com/xarray-contrib/datatree"
            )

Review comment (Member) on lines +3692 to +3697:

Python will give a more informative error message ("The above exception was the direct cause of the following exception") if you use raise ... from ... here:

Suggested change
-        except ImportError:
-            raise ImportError(
-                "Could not import the datatree package. "
-                "Find it at https://github.com/xarray-contrib/datatree"
-            )
+        except ImportError as e:
+            raise ImportError(
+                "Could not import the datatree package. "
+                "Find it at https://github.com/xarray-contrib/datatree"
+            ) from e
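
The effect of the suggested `raise ... from ...` form can be checked in a small standalone sketch. The helper `load_optional` and the package name below are hypothetical and chosen only so that the import is certain to fail; they are not part of xarray.

```python
import importlib


def load_optional(module_name: str):
    """Import a module, re-raising with a friendlier message on failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # "from e" records the original error as the explicit cause.
        raise ImportError(f"Could not import the {module_name} package.") from e


try:
    load_optional("hypothetical_missing_package_xyz")
except ImportError as err:
    chained = err  # keep a reference; "err" is cleared when the block ends

print("cause:", repr(chained.__cause__))
print("context suppressed:", chained.__suppress_context__)
```

With an explicit cause set, the traceback links the two errors with the "direct cause" wording quoted in the comment, rather than reporting the second error as something that happened while handling the first.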

        ds = self.to_dataset(name=name)
        return DataTree(data=ds, name=node_name)

    def to_pandas(self) -> DataArray | pd.Series | pd.DataFrame:
        """Convert this array into a pandas object with the same shape.

47 changes: 47 additions & 0 deletions xarray/core/dataset.py
@@ -6116,6 +6116,45 @@ def to_array(

        return DataArray._construct_direct(variable, coords, name, indexes)

    def to_datatree(self, node_name: str | None = None):
        """
        Convert this dataset into a datatree.DataTree.

        .. warning:: The DataTree structure is considered experimental,
            and the API is less solidified than for other xarray features.

        The returned tree will only consist of a single node.
        That node will contain a copy of the dataset's data,
        meaning all variables, coordinates, dimensions and attributes.

        Requires the xarray-datatree package to be installed.
        Find it at https://github.com/xarray-contrib/datatree.

        Parameters
        ----------
        node_name: str, optional
            The name of the datatree node created.

        Returns
        -------
        dt : DataTree
            A single-node datatree object, containing the information from this dataset.

        See Also
        --------
        datatree.DataTree
        """

        try:
            from datatree import DataTree
        except ImportError as e:
            raise ImportError(
                "Could not import the datatree package. "
                "Find it at https://github.com/xarray-contrib/datatree"
            ) from e

        return DataTree(data=self, name=node_name)

    def _normalize_dim_order(
        self, dim_order: Sequence[Hashable] | None = None
    ) -> dict[Hashable, int]:
@@ -6589,6 +6628,14 @@ def _binary_op(self, other, f, reflexive=False, join=None) -> Dataset:
        from xarray.core.dataarray import DataArray
        from xarray.core.groupby import GroupBy

        try:
            from datatree import DataTree

            if isinstance(other, DataTree):
                return NotImplemented
        except ImportError:
            pass

        if isinstance(other, GroupBy):
            return NotImplemented
        align_type = OPTIONS["arithmetic_join"] if join is None else join
5 changes: 5 additions & 0 deletions xarray/core/types.py
@@ -28,6 +28,11 @@
    from xarray.core.indexes import Index
    from xarray.core.variable import Variable

    try:
        from datatree import DataTree as T_DataTree
    except ImportError:
        T_DataTree = Any

    try:
        from dask.array import Array as DaskArray
    except ImportError:
1 change: 1 addition & 0 deletions xarray/tests/__init__.py
@@ -82,6 +82,7 @@ def _importorskip(
has_pint, requires_pint = _importorskip("pint")
has_numexpr, requires_numexpr = _importorskip("numexpr")
has_flox, requires_flox = _importorskip("flox")
has_datatree, requires_datatree = _importorskip("datatree")


# some special cases
54 changes: 54 additions & 0 deletions xarray/tests/test_datatree.py
@@ -0,0 +1,54 @@
import xarray.testing as xrt
from xarray import Dataset
from xarray.tests import requires_datatree


@requires_datatree
def test_import_datatree():
    """Just test importing datatree package from xarray-contrib repo"""
    from xarray import DataTree

    DataTree()


@requires_datatree
def test_to_datatree():
    from xarray import DataTree

    ds = Dataset({"a": ("x", [1, 2, 3])})
    dt = ds.to_datatree(node_name="group1")

    assert isinstance(dt, DataTree)
    assert dt.name == "group1"
    xrt.assert_identical(dt.to_dataset(), ds)

    da = ds["a"]
    dt = da.to_datatree(node_name="group1")

    assert isinstance(dt, DataTree)
    assert dt.name == "group1"
    xrt.assert_identical(dt["a"], da)


@requires_datatree
def test_binary_ops():
    import datatree.testing as dtt

    from xarray import DataTree

    ds1 = Dataset({"a": [5], "b": [3]})
    ds2 = Dataset({"x": [0.1, 0.2], "y": [10, 20]})
    dt = DataTree(data=ds1)
    DataTree(name="subnode", data=ds2, parent=dt)
Review comment (Member):

Does this line do anything?

    other_ds = Dataset({"z": ("z", [0.1, 0.2])})

    expected = DataTree(data=ds1 * other_ds)
    DataTree(name="subnode", data=ds2 * other_ds, parent=expected)
Review comment (Member):

Same concern as above


    result = dt * other_ds
    dtt.assert_equal(result, expected)

    # This ordering won't work unless xarray.Dataset defers to DataTree.
    # See https://github.com/xarray-contrib/datatree/issues/146
Review comment (Member) on lines +51 to +52:

Presumably this comment can be deleted now :)

    result = other_ds * dt
    dtt.assert_equal(result, expected)
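
The second ordering in this test depends on Python's reflected-operator protocol: when the left operand's `__mul__` returns `NotImplemented`, Python tries the right operand's `__rmul__`. The sketch below shows that mechanism with two toy classes. `Flat` and `Tree` are hypothetical stand-ins for `Dataset` and `DataTree`, and this is not how either library implements arithmetic.

```python
class Tree:
    """Hypothetical stand-in for DataTree: maps node names to numbers."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)

    def __mul__(self, other):
        # Multiply every node by a Flat's value or by a plain number.
        factor = other.value if isinstance(other, Flat) else other
        return Tree({name: value * factor for name, value in self.nodes.items()})

    # Multiplication is commutative in this toy, so the reflected
    # operation can reuse the same implementation.
    __rmul__ = __mul__


class Flat:
    """Hypothetical stand-in for Dataset: wraps a single number."""

    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        if isinstance(other, Tree):
            # Defer to the tree: Python will next try Tree.__rmul__(other, self).
            return NotImplemented
        factor = other.value if isinstance(other, Flat) else other
        return Flat(self.value * factor)


tree = Tree({"root": 2, "subnode": 3})
flat = Flat(10)

left = tree * flat   # handled directly by Tree.__mul__
right = flat * tree  # Flat.__mul__ defers, so Tree.__rmul__ handles it
print(left.nodes, right.nodes)
```

If `Flat.__mul__` tried to handle the `Tree` itself instead of returning `NotImplemented`, the two orderings could give different results or fail, which is the situation the early return added to `_binary_op` is meant to avoid.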