` for a guide on how to use these strategies.
+
+.. warning::
+ These strategies should be considered highly experimental, and liable to change at any time.
+
+.. autosummary::
+ :toctree: generated/
+
+ testing.strategies.supported_dtypes
+ testing.strategies.names
+ testing.strategies.dimension_names
+ testing.strategies.dimension_sizes
+ testing.strategies.attrs
+ testing.strategies.variables
+ testing.strategies.unique_subset_of
+
Exceptions
==========
@@ -1083,12 +1107,14 @@ Advanced API
.. autosummary::
:toctree: generated/
+ Coordinates
Dataset.variables
DataArray.variable
Variable
IndexVariable
as_variable
- indexes.Index
+ Index
+ IndexSelResult
Context
register_dataset_accessor
register_dataarray_accessor
@@ -1096,6 +1122,7 @@ Advanced API
backends.BackendArray
backends.BackendEntrypoint
backends.list_engines
+ backends.refresh_engines
Default, pandas-backed indexes built-in Xarray:
@@ -1111,7 +1138,6 @@ arguments for the ``load_store`` and ``dump_to_store`` Dataset methods:
backends.NetCDF4DataStore
backends.H5NetCDFStore
- backends.PseudoNetCDFDataStore
backends.PydapDataStore
backends.ScipyDataStore
backends.ZarrStore
@@ -1127,7 +1153,6 @@ used filetypes in the xarray universe.
backends.NetCDF4BackendEntrypoint
backends.H5netcdfBackendEntrypoint
- backends.PseudoNetCDFBackendEntrypoint
backends.PydapBackendEntrypoint
backends.ScipyBackendEntrypoint
backends.StoreBackendEntrypoint
diff --git a/doc/conf.py b/doc/conf.py
index 0b6c6766c3b..152eb6794b4 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -49,25 +49,16 @@
matplotlib.use("Agg")
-try:
- import rasterio # noqa: F401
-except ImportError:
- allowed_failures.update(
- ["gallery/plot_rasterio_rgb.py", "gallery/plot_rasterio.py"]
- )
-
try:
import cartopy # noqa: F401
except ImportError:
allowed_failures.update(
[
"gallery/plot_cartopy_facetgrid.py",
- "gallery/plot_rasterio_rgb.py",
- "gallery/plot_rasterio.py",
]
)
-nbsphinx_allow_errors = True
+nbsphinx_allow_errors = False
# -- General configuration ------------------------------------------------
@@ -93,6 +84,7 @@
"sphinx_copybutton",
"sphinxext.rediraffe",
"sphinx_design",
+ "sphinx_inline_tabs",
]
@@ -239,6 +231,7 @@
# canonical_url="",
repository_url="https://github.com/pydata/xarray",
repository_branch="main",
+ navigation_with_keys=False, # pydata/pydata-sphinx-theme#1492
path_to_docs="doc",
use_edit_page_button=True,
use_repository_button=True,
@@ -247,19 +240,20 @@
extra_footer="""Xarray is a fiscally sponsored project of NumFOCUS,
a nonprofit dedicated to supporting the open-source scientific computing community.
Theme by the Executable Book Project
""",
- twitter_url="https://twitter.com/xarray_devs",
+ twitter_url="https://twitter.com/xarray_dev",
icon_links=[], # workaround for pydata/pydata-sphinx-theme#1220
+ announcement="🍾 Xarray is now 10 years old! 🎉",
)
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
-html_logo = "_static/dataset-diagram-logo.png"
+html_logo = "_static/logos/Xarray_Logo_RGB_Final.svg"
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
-html_favicon = "_static/favicon.ico"
+html_favicon = "_static/logos/Xarray_Icon_Final.svg"
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
@@ -270,11 +264,11 @@
# configuration for sphinxext.opengraph
ogp_site_url = "https://docs.xarray.dev/en/latest/"
-ogp_image = "https://docs.xarray.dev/en/stable/_static/dataset-diagram-logo.png"
+ogp_image = "https://docs.xarray.dev/en/stable/_static/logos/Xarray_Logo_RGB_Final.png"
ogp_custom_meta_tags = [
'',
'',
- '',
+ '',
]
# Redirects for pages that were moved to new locations
@@ -322,17 +316,22 @@
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {
- "python": ("https://docs.python.org/3/", None),
- "pandas": ("https://pandas.pydata.org/pandas-docs/stable", None),
+ "cftime": ("https://unidata.github.io/cftime", None),
+ "cubed": ("https://cubed-dev.github.io/cubed/", None),
+ "dask": ("https://docs.dask.org/en/latest", None),
+ "datatree": ("https://xarray-datatree.readthedocs.io/en/latest/", None),
+ "flox": ("https://flox.readthedocs.io/en/latest/", None),
+ "hypothesis": ("https://hypothesis.readthedocs.io/en/latest/", None),
"iris": ("https://scitools-iris.readthedocs.io/en/latest", None),
+ "matplotlib": ("https://matplotlib.org/stable/", None),
+ "numba": ("https://numba.readthedocs.io/en/stable/", None),
"numpy": ("https://numpy.org/doc/stable", None),
+ "pandas": ("https://pandas.pydata.org/pandas-docs/stable", None),
+ "python": ("https://docs.python.org/3/", None),
"scipy": ("https://docs.scipy.org/doc/scipy", None),
- "numba": ("https://numba.readthedocs.io/en/stable/", None),
- "matplotlib": ("https://matplotlib.org/stable/", None),
- "dask": ("https://docs.dask.org/en/latest", None),
- "cftime": ("https://unidata.github.io/cftime", None),
- "rasterio": ("https://rasterio.readthedocs.io/en/latest", None),
"sparse": ("https://sparse.pydata.org/en/latest/", None),
+ "xarray-tutorial": ("https://tutorial.xarray.dev/", None),
+ "zarr": ("https://zarr.readthedocs.io/en/latest/", None),
}
diff --git a/doc/contributing.rst b/doc/contributing.rst
index 07938f23c9f..c3dc484f4c1 100644
--- a/doc/contributing.rst
+++ b/doc/contributing.rst
@@ -4,29 +4,46 @@
Contributing to xarray
**********************
-
.. note::
Large parts of this document came from the `Pandas Contributing
Guide `_.
+Overview
+========
+
+We welcome your skills and enthusiasm at the xarray project! There are numerous opportunities to
+contribute beyond just writing code.
+All contributions, including bug reports, bug fixes, documentation improvements, enhancement suggestions,
+and other ideas, are welcome.
+
+If you have any questions about the process or how to fix something, feel free to ask us!
+The recommended place to ask a question is on `GitHub Discussions `_,
+but we also have a `Discord `_ and a
+`mailing list `_. There is also a
+`"python-xarray" tag on Stack Overflow `_ which we monitor for questions.
+
+We also have a biweekly community call, whose details are announced on the
+`Developers meeting `_ page.
+You are very welcome to join! Though we would love to hear from you, there is no expectation to
+contribute during the meeting - you are always welcome to just sit in and listen.
+
+This project is a community effort, and everyone is welcome to contribute. Everyone within the community
+is expected to abide by our `code of conduct `_.
+
Where to start?
===============
-All contributions, bug reports, bug fixes, documentation improvements,
-enhancements, and ideas are welcome.
-
If you are brand new to *xarray* or open-source development, we recommend going
through the `GitHub "issues" tab `_
-to find issues that interest you. There are a number of issues listed under
-`Documentation `_
+to find issues that interest you.
+Some issues are particularly suited for new contributors, such as those marked with the labels `Documentation `_
and `good first issue
-`_
-where you could start out. Once you've found an interesting issue, you can
-return here to get your development environment setup.
+`_ where you could start out.
+These are well-documented issues that do not require a deep understanding of the internals of xarray.
-Feel free to ask questions on the `mailing list
-`_.
+Once you've found an interesting issue, you can return here to get your development environment set up.
+The xarray project does not assign issues. Issues are "assigned" by opening a Pull Request (PR).
.. _contributing.bug_reports:
@@ -34,15 +51,20 @@ Bug reports and enhancement requests
====================================
Bug reports are an important part of making *xarray* more stable. Having a complete bug
-report will allow others to reproduce the bug and provide insight into fixing. See
-this `stackoverflow article for tips on
-writing a good bug report `_ .
+report will allow others to reproduce the bug and provide insight into fixing it.
Trying out the bug-producing code on the *main* branch is often a worthwhile exercise
to confirm that the bug still exists. It is also worth searching existing bug reports and
pull requests to see if the issue has already been reported and/or fixed.
-Bug reports must:
+Submitting a bug report
+-----------------------
+
+If you find a bug in the code or documentation, do not hesitate to submit a ticket to the
+`Issue Tracker `_.
+You are also welcome to post feature requests or pull requests.
+
+If you are reporting a bug, please use the provided template which includes the following:
#. Include a short, self-contained Python snippet reproducing the problem.
You can format the code nicely by using `GitHub Flavored Markdown
@@ -67,13 +89,12 @@ Bug reports must:
#. Explain why the current behavior is wrong/not desired and what you expect instead.
-The issue will then show up to the *xarray* community and be open to comments/ideas
-from others.
+The issue will then be visible to the *xarray* community and open to comments/ideas from others.
-.. _contributing.github:
+See this `stackoverflow article for tips on writing a good bug report `_.
-Working with the code
-=====================
+
+.. _contributing.github:
Now that you have an issue you want to fix, enhancement to add, or documentation
to improve, you need to learn how to work with GitHub and the *xarray* code base.
@@ -81,12 +102,7 @@ to improve, you need to learn how to work with GitHub and the *xarray* code base
.. _contributing.version_control:
Version control, Git, and GitHub
---------------------------------
-
-To the new user, working with Git is one of the more daunting aspects of contributing
-to *xarray*. It can very quickly become overwhelming, but sticking to the guidelines
-below will help keep the process straightforward and mostly trouble free. As always,
-if you are having difficulties please feel free to ask for help.
+================================
The code is hosted on `GitHub `_. To
contribute you will need to sign up for a `free GitHub account
@@ -112,41 +128,41 @@ you can work seamlessly between your local repository and GitHub.
but contributors who are new to git may find it easier to use other tools instead such as
`Github Desktop `_.
-.. _contributing.forking:
+Development workflow
+====================
+
+To keep your work well organized and its history readable - and in turn make it easier for project
+maintainers to see what you've done and why - we recommend you follow this workflow:
-Forking
--------
+1. `Create an account `_ on GitHub if you do not already have one.
-You will need your own fork to work on the code. Go to the `xarray project
-page `_ and hit the ``Fork`` button. You will
-want to clone your fork to your machine::
+2. You will need your own fork to work on the code. Go to the `xarray project
+ page `_ and hit the ``Fork`` button near the top of the page.
+ This creates a copy of the code under your account on the GitHub server.
+
+3. Clone your fork to your machine::
git clone https://github.com/your-user-name/xarray.git
cd xarray
git remote add upstream https://github.com/pydata/xarray.git
-This creates the directory `xarray` and connects your repository to
-the upstream (main project) *xarray* repository.
-
-Creating a branch
------------------
-
-You want your ``main`` branch to reflect only production-ready code, so create a
-feature branch before making your changes. For example::
+ This creates the directory `xarray` and connects your repository to
+ the upstream (main project) *xarray* repository.
- git branch shiny-new-feature
- git checkout shiny-new-feature
+Creating a development environment
+----------------------------------
-The above can be simplified to::
+To test out code changes locally, you'll need to build *xarray* from source, which requires you to
+`create a local development environment `_.
- git checkout -b shiny-new-feature
+Update the ``main`` branch
+--------------------------
-This changes your working directory to the shiny-new-feature branch. Keep any
-changes in this branch specific to one bug or feature so it is clear
-what the branch brings to *xarray*. You can have many "shiny-new-features"
-and switch in between them using the ``git checkout`` command.
+First make sure you have followed `Setting up xarray for development
+`_.
-To update this branch, you need to retrieve the changes from the ``main`` branch::
+Before starting a new set of changes, fetch all changes from ``upstream/main``, and start a new
+feature branch from that. From time to time you should fetch the upstream changes from GitHub: ::
git fetch upstream
git merge upstream/main
@@ -157,10 +173,83 @@ request. If you have uncommitted changes, you will need to ``git stash`` them
prior to updating. This will effectively store your changes, which can be
reapplied after updating.
+Create a new feature branch
+---------------------------
+
+Create a branch to save your changes, even before you start making changes. You want your
+``main`` branch to contain only production-ready code::
+
+ git checkout -b shiny-new-feature
+
+This changes your working directory to the ``shiny-new-feature`` branch. Keep any changes in this
+branch specific to one bug or feature so it is clear what the branch brings to *xarray*. You can have
+many "shiny-new-features" and switch in between them using the ``git checkout`` command.
+
+Generally, you will want to keep your feature branches on your public GitHub fork of xarray. To do this,
+you ``git push`` this new branch up to your GitHub repo. Generally (if you followed the instructions in
+these pages, and by default), git will have a link to your fork of the GitHub repo, called ``origin``.
+You push up to your own fork with: ::
+
+ git push origin shiny-new-feature
+
+In git >= 1.7 you can ensure that the link is correctly set by using the ``--set-upstream`` option: ::
+
+ git push --set-upstream origin shiny-new-feature
+
+From now on git will know that ``shiny-new-feature`` is related to the ``shiny-new-feature`` branch in the GitHub repo.
+
+The editing workflow
+--------------------
+
+1. Make some changes
+
+2. See which files have changed with ``git status``. You'll see a listing like this one: ::
+
+ # On branch shiny-new-feature
+ # Changed but not updated:
+ # (use "git add ..." to update what will be committed)
+ # (use "git checkout -- ..." to discard changes in working directory)
+ #
+ # modified: README
+
+3. Check what the actual changes are with ``git diff``.
+
+4. Build the `documentation `_ for
+   documentation changes, and `run the test suite `_ for code changes.
+
+Commit and push your changes
+----------------------------
+
+1. To commit all modified files into the local copy of your repo, do ``git commit -am 'A commit message'``.
+
+2. To push the changes up to your forked repo on GitHub, do a ``git push``.
+
+Open a pull request
+-------------------
+
+When you're ready or need feedback on your code, open a Pull Request (PR) so that the xarray developers can
+give feedback and eventually include your suggested code into the ``main`` branch.
+`Pull requests (PRs) on GitHub `_
+are the mechanism for contributing to xarray's code and documentation.
+
+Enter a title for the set of changes with some explanation of what you've done.
+Follow the PR template, which looks like this::
+
+    [ ] Closes #xxxx
+    [ ] Tests added
+    [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
+    [ ] New functions/methods are listed in api.rst
+
+Mention anything you'd like particular attention for - such as a complicated change or some code you are not happy with.
+If you don't think your request is ready to be merged, just say so in your pull request message and use
+the "Draft PR" feature of GitHub. This is a good way of getting some preliminary code review.
+
.. _contributing.dev_env:
Creating a development environment
-----------------------------------
+==================================
To test out code changes locally, you'll need to build *xarray* from source, which
requires a Python environment. If you're making documentation changes, you can
@@ -182,7 +271,7 @@ documentation locally before pushing your changes.
.. _contributing.dev_python:
Creating a Python Environment
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-----------------------------
Before starting any development, you'll need to create an isolated xarray
development environment:
@@ -240,6 +329,22 @@ To return to your root environment::
See the full `conda docs here `__.
+Install pre-commit hooks
+------------------------
+
+We highly recommend that you set up `pre-commit `_ hooks to automatically
+run all the above tools every time you make a git commit. To install the hooks::
+
+ python -m pip install pre-commit
+ pre-commit install
+
+Afterwards, the hooks will run automatically on every commit. You can also run them manually with: ::
+
+ pre-commit run
+
+from the root of the xarray repository. You can skip the pre-commit checks with
+``git commit --no-verify``.
+
.. _contributing.documentation:
Contributing to the documentation
@@ -363,6 +468,60 @@ If you want to do a full clean build, do::
make clean
make html
+Writing ReST pages
+------------------
+
+Most documentation is either in the docstrings of individual classes and methods, in explicit
+``.rst`` files, or in examples and tutorials. All of these use the
+`ReST `_ syntax and are processed by
+`Sphinx `_.
+
+This section contains additional information and conventions for how ReST is used in the
+xarray documentation.
+
+Section formatting
+~~~~~~~~~~~~~~~~~~
+
+We aim to follow the recommendations from the
+`Python documentation `_
+and the `Sphinx reStructuredText documentation `_
+for section markup characters, illustrated by the example following this list:
+
+- ``*`` with overline, for chapters
+
+- ``=``, for headings
+
+- ``-``, for sections
+
+- ``~``, for subsections
+
+- ``**text**``, for **bold** text
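+
+For example, under these conventions the markup for a chapter, a heading, and a section
+would look like this (the titles are illustrative)::
+
+    ***************
+    A chapter title
+    ***************
+
+    A heading
+    =========
+
+    A section
+    ---------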
+
+Referring to other documents and sections
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+`Sphinx `_ allows internal
+`references `_ between documents.
+
+Documents can be linked with the ``:doc:`` role:
+
+::
+
+ See the :doc:`/getting-started-guide/installing`
+
+ See the :doc:`/getting-started-guide/quick-overview`
+
+will render as:
+
+See the `Installation `_
+
+See the `Quick Overview `_
+
+Including figures and files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Image files can be directly included in pages with the ``image::`` directive.
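+
+For example (the path and width are illustrative)::
+
+    .. image:: ../_static/example-figure.png
+       :width: 400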
+
.. _contributing.code:
Contributing to the code base
@@ -490,9 +649,7 @@ Writing tests
All tests should go into the ``tests`` subdirectory of the specific package.
This folder contains many current examples of tests, and we suggest looking to these for
-inspiration. If your test requires working with files or
-network connectivity, there is more information on the `testing page
-`_ of the wiki.
+inspiration.
The ``xarray.testing`` module has many special ``assert`` functions that
make it easier to make statements about whether DataArray or Dataset objects are
@@ -513,8 +670,7 @@ typically find tests wrapped in a class.
.. code-block:: python
- class TestReallyCoolFeature:
- ...
+ class TestReallyCoolFeature: ...
Going forward, we are moving to a more *functional* style using the
`pytest `__ framework, which offers a richer
@@ -523,8 +679,7 @@ writing test classes, we will write test functions like this:
.. code-block:: python
- def test_really_cool_feature():
- ...
+ def test_really_cool_feature(): ...
Using ``pytest``
~~~~~~~~~~~~~~~~
@@ -672,17 +827,17 @@ Running the performance test suite
Performance matters and it is worth considering whether your code has introduced
performance regressions. *xarray* is starting to write a suite of benchmarking tests
-using `asv `__
+using `asv `__
to enable easy monitoring of the performance of critical *xarray* operations.
These benchmarks are all found in the ``xarray/asv_bench`` directory.
To use all features of asv, you will need either ``conda`` or
``virtualenv``. For more details please check the `asv installation
-webpage `_.
+webpage `_.
To install asv::
- pip install git+https://github.com/spacetelescope/asv
+ python -m pip install asv
If you need to run a benchmark, change your directory to ``asv_bench/`` and run::
@@ -912,7 +1067,7 @@ PR checklist
- Write new tests if needed. See `"Test-driven development/code writing" `_.
- Test the code using `Pytest `_. Running all tests (type ``pytest`` in the root directory) takes a while, so feel free to only run the tests you think are needed based on your PR (example: ``pytest xarray/tests/test_dataarray.py``). CI will catch any failing tests.
- - By default, the upstream dev CI is disabled on pull request and push events. You can override this behavior per commit by adding a [test-upstream] tag to the first line of the commit message. For documentation-only commits, you can skip the CI per commit by adding a "[skip-ci]" tag to the first line of the commit message.
+ - By default, the upstream dev CI is disabled on pull request and push events. You can override this behavior per commit by adding a ``[test-upstream]`` tag to the first line of the commit message. For documentation-only commits, you can skip the CI per commit by adding a ``[skip-ci]`` tag to the first line of the commit message.
- **Properly format your code** and verify that it passes the formatting guidelines set by `Black `_ and `Flake8 `_. See `"Code formatting" `_. You can use `pre-commit `_ to run these automatically on each commit.
diff --git a/doc/developers-meeting.rst b/doc/developers-meeting.rst
index 1c49a900f66..153f3520f26 100644
--- a/doc/developers-meeting.rst
+++ b/doc/developers-meeting.rst
@@ -3,18 +3,18 @@ Developers meeting
Xarray developers meet bi-weekly every other Wednesday.
-The meeting occurs on `Zoom `__.
+The meeting occurs on `Zoom `__.
-Find the `notes for the meeting here `__.
+Find the `notes for the meeting here `__.
There is a :issue:`GitHub issue for changes to the meeting<4001>`.
You can subscribe to this calendar to be notified of changes:
-* `Google Calendar `__
-* `iCal `__
+* `Google Calendar `__
+* `iCal `__
.. raw:: html
-
+
diff --git a/doc/ecosystem.rst b/doc/ecosystem.rst
index e6e970c6239..076874d82f3 100644
--- a/doc/ecosystem.rst
+++ b/doc/ecosystem.rst
@@ -36,11 +36,13 @@ Geosciences
- `rioxarray `_: geospatial xarray extension powered by rasterio
- `salem `_: Adds geolocalised subsetting, masking, and plotting operations to xarray's data structures via accessors.
- `SatPy `_ : Library for reading and manipulating meteorological remote sensing data and writing it to various image and data file formats.
+- `SARXarray `_: xarray extension for reading and processing large Synthetic Aperture Radar (SAR) data stacks.
- `Spyfit `_: FTIR spectroscopy of the atmosphere
- `windspharm `_: Spherical
harmonic wind analysis in Python.
- `wradlib `_: An Open Source Library for Weather Radar Data Processing.
- `wrf-python `_: A collection of diagnostic and interpolation routines for use with output of the Weather Research and Forecasting (WRF-ARW) Model.
+- `xarray-regrid `_: xarray extension for regridding rectilinear data.
- `xarray-simlab `_: xarray extension for computer model simulations.
- `xarray-spatial `_: Numba-accelerated raster-based spatial processing tools (NDVI, curvature, zonal-statistics, proximity, hillshading, viewshed, etc.)
- `xarray-topo `_: xarray extension for topographic analysis and modelling.
@@ -77,6 +79,7 @@ Extend xarray capabilities
- `xarray-dataclasses `_: xarray extension for typed DataArray and Dataset creation.
- `xarray_einstats `_: Statistics, linear algebra and einops for xarray
- `xarray_extras `_: Advanced algorithms for xarray objects (e.g. integrations/interpolations).
+- `xeofs `_: PCA/EOF analysis and related techniques, integrated with xarray and Dask for efficient handling of large-scale data.
- `xpublish `_: Publish Xarray Datasets via a Zarr compatible REST API.
- `xrft `_: Fourier transforms for xarray data.
- `xr-scipy `_: A lightweight scipy wrapper for xarray.
@@ -96,7 +99,6 @@ Visualization
Non-Python projects
~~~~~~~~~~~~~~~~~~~
- `xframe `_: C++ data structures inspired by xarray.
-- `AxisArrays `_ and
- `NamedArrays `_: similar data structures for Julia.
+- `AxisArrays `_, `NamedArrays `_ and `YAXArrays.jl `_: similar data structures for Julia.
More projects can be found at the `"xarray" Github topic `_.
diff --git a/doc/examples/apply_ufunc_vectorize_1d.ipynb b/doc/examples/apply_ufunc_vectorize_1d.ipynb
index 68d011d0725..c2ab7271873 100644
--- a/doc/examples/apply_ufunc_vectorize_1d.ipynb
+++ b/doc/examples/apply_ufunc_vectorize_1d.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "This example will illustrate how to conveniently apply an unvectorized function `func` to xarray objects using `apply_ufunc`. `func` expects 1D numpy arrays and returns a 1D numpy array. Our goal is to coveniently apply this function along a dimension of xarray objects that may or may not wrap dask arrays with a signature.\n",
+ "This example will illustrate how to conveniently apply an unvectorized function `func` to xarray objects using `apply_ufunc`. `func` expects 1D numpy arrays and returns a 1D numpy array. Our goal is to conveniently apply this function along a dimension of xarray objects that may or may not wrap dask arrays with a signature.\n",
"\n",
"We will illustrate this using `np.interp`: \n",
"\n",
diff --git a/doc/examples/multidimensional-coords.ipynb b/doc/examples/multidimensional-coords.ipynb
index f7471f05e5d..ce8a091a5da 100644
--- a/doc/examples/multidimensional-coords.ipynb
+++ b/doc/examples/multidimensional-coords.ipynb
@@ -56,7 +56,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "In this example, the _logical coordinates_ are `x` and `y`, while the _physical coordinates_ are `xc` and `yc`, which represent the latitudes and longitude of the data."
+ "In this example, the _logical coordinates_ are `x` and `y`, while the _physical coordinates_ are `xc` and `yc`, which represent the longitudes and latitudes of the data."
]
},
{
diff --git a/doc/examples/visualization_gallery.ipynb b/doc/examples/visualization_gallery.ipynb
index e6fa564db0d..e7e9196a6f6 100644
--- a/doc/examples/visualization_gallery.ipynb
+++ b/doc/examples/visualization_gallery.ipynb
@@ -193,90 +193,6 @@
"# Show\n",
"plt.tight_layout()"
]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "jp-MarkdownHeadingCollapsed": true,
- "tags": []
- },
- "source": [
- "## `imshow()` and rasterio map projections\n",
- "\n",
- "\n",
- "Using rasterio's projection information for more accurate plots.\n",
- "\n",
- "This example extends `recipes.rasterio` and plots the image in the\n",
- "original map projection instead of relying on pcolormesh and a map\n",
- "transformation."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "da = xr.tutorial.open_rasterio(\"RGB.byte\")\n",
- "\n",
- "# The data is in UTM projection. We have to set it manually until\n",
- "# https://github.com/SciTools/cartopy/issues/813 is implemented\n",
- "crs = ccrs.UTM(\"18\")\n",
- "\n",
- "# Plot on a map\n",
- "ax = plt.subplot(projection=crs)\n",
- "da.plot.imshow(ax=ax, rgb=\"band\", transform=crs)\n",
- "ax.coastlines(\"10m\", color=\"r\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Parsing rasterio geocoordinates\n",
- "\n",
- "Converting a projection's cartesian coordinates into 2D longitudes and\n",
- "latitudes.\n",
- "\n",
- "These new coordinates might be handy for plotting and indexing, but it should\n",
- "be kept in mind that a grid which is regular in projection coordinates will\n",
- "likely be irregular in lon/lat. It is often recommended to work in the data's\n",
- "original map projection (see `recipes.rasterio_rgb`)."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from pyproj import Transformer\n",
- "import numpy as np\n",
- "\n",
- "da = xr.tutorial.open_rasterio(\"RGB.byte\")\n",
- "\n",
- "x, y = np.meshgrid(da[\"x\"], da[\"y\"])\n",
- "transformer = Transformer.from_crs(da.crs, \"EPSG:4326\", always_xy=True)\n",
- "lon, lat = transformer.transform(x, y)\n",
- "da.coords[\"lon\"] = ((\"y\", \"x\"), lon)\n",
- "da.coords[\"lat\"] = ((\"y\", \"x\"), lat)\n",
- "\n",
- "# Compute a greyscale out of the rgb image\n",
- "greyscale = da.mean(dim=\"band\")\n",
- "\n",
- "# Plot on a map\n",
- "ax = plt.subplot(projection=ccrs.PlateCarree())\n",
- "greyscale.plot(\n",
- " ax=ax,\n",
- " x=\"lon\",\n",
- " y=\"lat\",\n",
- " transform=ccrs.PlateCarree(),\n",
- " cmap=\"Greys_r\",\n",
- " shading=\"auto\",\n",
- " add_colorbar=False,\n",
- ")\n",
- "ax.coastlines(\"10m\", color=\"r\")"
- ]
}
],
"metadata": {
@@ -296,6 +212,13 @@
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
+ },
+ "widgets": {
+ "application/vnd.jupyter.widget-state+json": {
+ "state": {},
+ "version_major": 2,
+ "version_minor": 0
+ }
}
},
"nbformat": 4,
diff --git a/doc/gallery.yml b/doc/gallery.yml
index f1a147dae87..f8316017d8c 100644
--- a/doc/gallery.yml
+++ b/doc/gallery.yml
@@ -25,12 +25,12 @@ notebooks-examples:
- title: Applying unvectorized functions with apply_ufunc
path: examples/apply_ufunc_vectorize_1d.html
- thumbnail: _static/dataset-diagram-square-logo.png
+ thumbnail: _static/logos/Xarray_Logo_RGB_Final.svg
external-examples:
- title: Managing raster data with rioxarray
path: https://corteva.github.io/rioxarray/stable/examples/examples.html
- thumbnail: _static/dataset-diagram-square-logo.png
+ thumbnail: _static/logos/Xarray_Logo_RGB_Final.svg
- title: Xarray and dask on the cloud with Pangeo
path: https://gallery.pangeo.io/
@@ -38,7 +38,7 @@ external-examples:
- title: Xarray with Dask Arrays
path: https://examples.dask.org/xarray.html_
- thumbnail: _static/dataset-diagram-square-logo.png
+ thumbnail: _static/logos/Xarray_Logo_RGB_Final.svg
- title: Project Pythia Foundations Book
path: https://foundations.projectpythia.org/core/xarray.html
diff --git a/doc/gallery/plot_cartopy_facetgrid.py b/doc/gallery/plot_cartopy_facetgrid.py
index d8f5e73ee56..faa148938d6 100644
--- a/doc/gallery/plot_cartopy_facetgrid.py
+++ b/doc/gallery/plot_cartopy_facetgrid.py
@@ -13,7 +13,6 @@
.. _this discussion: https://github.com/pydata/xarray/issues/1397#issuecomment-299190567
"""
-
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
@@ -30,7 +29,7 @@
transform=ccrs.PlateCarree(), # the data's projection
col="time",
col_wrap=1, # multiplot settings
- aspect=ds.dims["lon"] / ds.dims["lat"], # for a sensible figsize
+ aspect=ds.sizes["lon"] / ds.sizes["lat"], # for a sensible figsize
subplot_kws={"projection": map_proj}, # the plot's projection
)
diff --git a/doc/gallery/plot_control_colorbar.py b/doc/gallery/plot_control_colorbar.py
index 8fb8d7f8be6..280e753db9a 100644
--- a/doc/gallery/plot_control_colorbar.py
+++ b/doc/gallery/plot_control_colorbar.py
@@ -6,6 +6,7 @@
Use ``cbar_kwargs`` keyword to specify the number of ticks.
The ``spacing`` kwarg can be used to draw proportional ticks.
"""
+
import matplotlib.pyplot as plt
import xarray as xr
diff --git a/doc/gallery/plot_rasterio.py b/doc/gallery/plot_rasterio.py
deleted file mode 100644
index 853923a38bd..00000000000
--- a/doc/gallery/plot_rasterio.py
+++ /dev/null
@@ -1,49 +0,0 @@
-"""
-.. _recipes.rasterio:
-
-=================================
-Parsing rasterio's geocoordinates
-=================================
-
-
-Converting a projection's cartesian coordinates into 2D longitudes and
-latitudes.
-
-These new coordinates might be handy for plotting and indexing, but it should
-be kept in mind that a grid which is regular in projection coordinates will
-likely be irregular in lon/lat. It is often recommended to work in the data's
-original map projection (see :ref:`recipes.rasterio_rgb`).
-"""
-
-import cartopy.crs as ccrs
-import matplotlib.pyplot as plt
-import numpy as np
-from pyproj import Transformer
-
-import xarray as xr
-
-# Read the data
-url = "https://github.com/rasterio/rasterio/raw/master/tests/data/RGB.byte.tif"
-da = xr.open_rasterio(url)
-
-# Compute the lon/lat coordinates with pyproj
-transformer = Transformer.from_crs(da.crs, "EPSG:4326", always_xy=True)
-lon, lat = transformer.transform(*np.meshgrid(da["x"], da["y"]))
-da.coords["lon"] = (("y", "x"), lon)
-da.coords["lat"] = (("y", "x"), lat)
-
-# Compute a greyscale out of the rgb image
-greyscale = da.mean(dim="band")
-
-# Plot on a map
-ax = plt.subplot(projection=ccrs.PlateCarree())
-greyscale.plot(
- ax=ax,
- x="lon",
- y="lat",
- transform=ccrs.PlateCarree(),
- cmap="Greys_r",
- add_colorbar=False,
-)
-ax.coastlines("10m", color="r")
-plt.show()
diff --git a/doc/gallery/plot_rasterio_rgb.py b/doc/gallery/plot_rasterio_rgb.py
deleted file mode 100644
index 912224ac132..00000000000
--- a/doc/gallery/plot_rasterio_rgb.py
+++ /dev/null
@@ -1,32 +0,0 @@
-"""
-.. _recipes.rasterio_rgb:
-
-============================
-imshow() and map projections
-============================
-
-Using rasterio's projection information for more accurate plots.
-
-This example extends :ref:`recipes.rasterio` and plots the image in the
-original map projection instead of relying on pcolormesh and a map
-transformation.
-"""
-
-import cartopy.crs as ccrs
-import matplotlib.pyplot as plt
-
-import xarray as xr
-
-# Read the data
-url = "https://github.com/rasterio/rasterio/raw/master/tests/data/RGB.byte.tif"
-da = xr.open_rasterio(url)
-
-# The data is in UTM projection. We have to set it manually until
-# https://github.com/SciTools/cartopy/issues/813 is implemented
-crs = ccrs.UTM("18N")
-
-# Plot on a map
-ax = plt.subplot(projection=crs)
-da.plot.imshow(ax=ax, rgb="band", transform=crs)
-ax.coastlines("10m", color="r")
-plt.show()
diff --git a/doc/getting-started-guide/faq.rst b/doc/getting-started-guide/faq.rst
index 08cb9646f94..7f99fa77e3a 100644
--- a/doc/getting-started-guide/faq.rst
+++ b/doc/getting-started-guide/faq.rst
@@ -168,18 +168,11 @@ integration with Cartopy_.
.. _Iris: https://scitools-iris.readthedocs.io/en/stable/
.. _Cartopy: https://scitools.org.uk/cartopy/docs/latest/
-`UV-CDAT`__ is another Python library that implements in-memory netCDF-like
-variables and `tools for working with climate data`__.
-
-__ https://uvcdat.llnl.gov/
-__ https://drclimate.wordpress.com/2014/01/02/a-beginners-guide-to-scripting-with-uv-cdat/
-
We think the design decisions we have made for xarray (namely, basing it on
pandas) make it a faster and more flexible data analysis tool. That said, Iris
-and CDAT have some great domain specific functionality, and xarray includes
-methods for converting back and forth between xarray and these libraries. See
-:py:meth:`~xarray.DataArray.to_iris` and :py:meth:`~xarray.DataArray.to_cdms2`
-for more details.
+has some great domain specific functionality, and xarray includes
+methods for converting back and forth between xarray and Iris. See
+:py:meth:`~xarray.DataArray.to_iris` for more details.
What other projects leverage xarray?
------------------------------------
@@ -356,6 +349,25 @@ There may be situations where you need to specify the engine manually using the
Some packages may have additional functionality beyond what is shown here. You can refer to the documentation for each package for more information.
+How does xarray handle missing values?
+--------------------------------------
+
+**xarray can handle missing values using** ``np.NaN``
+
+- ``np.NaN``, NumPy's constant for "Not a Number", is used to represent missing values in labeled arrays and datasets. It is a commonly used standard for representing missing or undefined numerical data in scientific computing.
+
+- Most of xarray's computation methods are designed to automatically handle missing values appropriately.
+
+ For example, when performing operations like addition or multiplication on arrays that contain missing values, xarray will automatically ignore the missing values and only perform the operation on the valid data. This makes it easy to work with data that may contain missing or undefined values without having to worry about handling them explicitly.
+
+- Many of xarray's `aggregation methods `_, such as ``sum()``, ``mean()``, ``min()``, ``max()``, and others, have a ``skipna`` argument that controls whether missing values (represented by NaN) should be skipped (``True``) or treated as NaN (``False``) when performing the calculation (see the sketch after this list).
+
+  By default, ``skipna`` is set to ``True``, so missing values are ignored when computing the result. However, you can set ``skipna`` to ``False`` if you want missing values to be treated as NaN and included in the calculation.
+
+- When `plotting `_ an xarray dataset or array that contains missing values, xarray will simply leave them as blank spaces in the plot.
+
+- We have a set of `methods `_ for manipulating missing and filling values.
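+
+As a quick sketch of the ``skipna`` behaviour described above (the values are illustrative)::
+
+    import numpy as np
+    import xarray as xr
+
+    da = xr.DataArray([1.0, np.nan, 3.0], dims="x")
+    da.mean()              # NaN is skipped by default -> 2.0
+    da.mean(skipna=False)  # NaN propagates -> nan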
+
How should I cite xarray?
-------------------------
diff --git a/doc/getting-started-guide/installing.rst b/doc/getting-started-guide/installing.rst
index 6b3283adcbd..f7eaf92f9cf 100644
--- a/doc/getting-started-guide/installing.rst
+++ b/doc/getting-started-guide/installing.rst
@@ -7,9 +7,9 @@ Required dependencies
---------------------
- Python (3.9 or later)
-- `numpy `__ (1.21 or later)
-- `packaging `__ (21.3 or later)
-- `pandas `__ (1.4 or later)
+- `numpy `__ (1.23 or later)
+- `packaging `__ (22 or later)
+- `pandas `__ (1.5 or later)
.. _optional-dependencies:
@@ -38,11 +38,6 @@ For netCDF and IO
- `cftime `__: recommended if you
want to encode/decode datetimes for non-standard calendars or dates before
year 1678 or after year 2262.
-- `PseudoNetCDF `__: recommended
- for accessing CAMx, GEOS-Chem (bpch), NOAA ARL files, ICARTT files
- (ffi1001) and many other.
-- `rasterio `__: for reading GeoTiffs and
- other gridded raster datasets.
- `iris `__: for conversion to and from iris'
Cube objects
@@ -88,7 +83,7 @@ Minimum dependency versions
Xarray adopts a rolling policy regarding the minimum supported version of its
dependencies:
-- **Python:** 24 months
+- **Python:** 30 months
(`NEP-29 `_)
- **numpy:** 18 months
(`NEP-29 `_)
@@ -137,13 +132,13 @@ We also maintain other dependency sets for different subsets of functionality::
The above commands should install most of the `optional dependencies`_. However,
some packages which are either not listed on PyPI or require extra
installation steps are excluded. To know which dependencies would be
-installed, take a look at the ``[options.extras_require]`` section in
-``setup.cfg``:
+installed, take a look at the ``[project.optional-dependencies]`` section in
+``pyproject.toml``:
-.. literalinclude:: ../../setup.cfg
- :language: ini
- :start-at: [options.extras_require]
- :end-before: [options.package_data]
+.. literalinclude:: ../../pyproject.toml
+ :language: toml
+ :start-at: [project.optional-dependencies]
+ :end-before: [build-system]
Development versions
--------------------
diff --git a/doc/howdoi.rst b/doc/howdoi.rst
index b6374cc5100..97b0872fdc4 100644
--- a/doc/howdoi.rst
+++ b/doc/howdoi.rst
@@ -36,13 +36,13 @@ How do I ...
* - rename a variable, dimension or coordinate
- :py:meth:`Dataset.rename`, :py:meth:`DataArray.rename`, :py:meth:`Dataset.rename_vars`, :py:meth:`Dataset.rename_dims`,
* - convert a DataArray to Dataset or vice versa
- - :py:meth:`DataArray.to_dataset`, :py:meth:`Dataset.to_array`, :py:meth:`Dataset.to_stacked_array`, :py:meth:`DataArray.to_unstacked_dataset`
+ - :py:meth:`DataArray.to_dataset`, :py:meth:`Dataset.to_dataarray`, :py:meth:`Dataset.to_stacked_array`, :py:meth:`DataArray.to_unstacked_dataset`
* - extract variables that have certain attributes
- :py:meth:`Dataset.filter_by_attrs`
* - extract the underlying array (e.g. NumPy or Dask arrays)
- :py:attr:`DataArray.data`
* - convert to and extract the underlying NumPy array
- - :py:attr:`DataArray.values`
+ - :py:attr:`DataArray.to_numpy`
* - convert to a pandas DataFrame
- :py:attr:`Dataset.to_dataframe`
* - sort values
diff --git a/doc/internals/chunked-arrays.rst b/doc/internals/chunked-arrays.rst
new file mode 100644
index 00000000000..ba7ce72c834
--- /dev/null
+++ b/doc/internals/chunked-arrays.rst
@@ -0,0 +1,102 @@
+.. currentmodule:: xarray
+
+.. _internals.chunkedarrays:
+
+Alternative chunked array types
+===============================
+
+.. warning::
+
+ This is a *highly* experimental feature. Please report any bugs or other difficulties on `xarray's issue tracker `_.
+ In particular, see the discussion on `xarray issue #6807 `_.
+
+Xarray can wrap chunked dask arrays (see :ref:`dask`), but can also wrap any other chunked array type that exposes the correct interface.
+This allows us to support using other frameworks for distributed and out-of-core processing, with user code still written as xarray commands.
+In particular xarray also supports wrapping :py:class:`cubed.Array` objects
+(see `Cubed's documentation `_ and the `cubed-xarray package `_).
+
+The basic idea is that by wrapping an array that has an explicit notion of ``.chunks``, xarray can expose control over
+the choice of chunking scheme to users via methods like :py:meth:`DataArray.chunk`, whilst the wrapped array actually
+implements the processing of each chunk.
+
+Chunked array methods and "core operations"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A chunked array needs to meet all the :ref:`requirements for normal duck arrays `, but must also
+implement additional features.
+
+Chunked arrays have additional attributes and methods, such as ``.chunks`` and ``.rechunk``.
+Furthermore, Xarray dispatches chunk-aware computations across one or more chunked arrays using special functions known
+as "core operations". Examples include ``map_blocks``, ``blockwise``, and ``apply_gufunc``.
+
+The core operations are generalizations of functions first implemented in :py:mod:`dask.array`.
+The implementation of these functions is specific to the type of arrays passed to them. For example, when applying the
+``map_blocks`` core operation, :py:class:`dask.array.Array` objects must be processed by :py:func:`dask.array.map_blocks`,
+whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`.
+
+In order to use the correct implementation of a core operation for the array type encountered, xarray dispatches to the
+corresponding subclass of :py:class:`~xarray.namedarray.parallelcompat.ChunkManagerEntrypoint`,
+also known as a "Chunk Manager". Therefore **a full list of the operations that need to be defined is set by the
+API of the** :py:class:`~xarray.namedarray.parallelcompat.ChunkManagerEntrypoint` **abstract base class**. Note that chunked array
+methods are also currently dispatched using this class.
+
+Chunked array creation is also handled by this class. As chunked array objects have a one-to-one correspondence with
+in-memory numpy arrays, it should be possible to create a chunked array from a numpy array by passing the desired
+chunking pattern to an implementation of :py:meth:`~xarray.namedarray.parallelcompat.ChunkManagerEntrypoint.from_array`.
+
+.. note::
+
+ The :py:class:`~xarray.namedarray.parallelcompat.ChunkManagerEntrypoint` abstract base class is mostly just acting as a
+ namespace containing the chunk-aware function primitives. Ideally in the future we would have an API standard
+ for chunked array types which codified this structure, making the entrypoint system unnecessary.
+
+.. currentmodule:: xarray.namedarray.parallelcompat
+
+.. autoclass:: xarray.namedarray.parallelcompat.ChunkManagerEntrypoint
+ :members:
+
+Registering a new ChunkManagerEntrypoint subclass
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Rather than hard-coding various chunk managers to deal with specific chunked array implementations, xarray uses an
+entrypoint system to allow developers of new chunked array implementations to register their corresponding subclass of
+:py:class:`~xarray.namedarray.parallelcompat.ChunkManagerEntrypoint`.
+
+
+To register a new entrypoint you need to add an entry to the ``setup.cfg`` like this::
+
+ [options.entry_points]
+ xarray.chunkmanagers =
+ dask = xarray.namedarray.daskmanager:DaskManager
+
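+If your project is configured via ``pyproject.toml`` instead, the equivalent entry should look
+like this (a sketch of the same entrypoint in the newer entry-points syntax)::
+
+    [project.entry-points."xarray.chunkmanagers"]
+    dask = "xarray.namedarray.daskmanager:DaskManager"
+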
+See also `cubed-xarray `_ for another example.
+
+To check that the entrypoint has worked correctly, you may find it useful to display the available chunkmanagers using
+the internal function :py:func:`~xarray.namedarray.parallelcompat.list_chunkmanagers`.
+
+.. autofunction:: list_chunkmanagers
+
+
+User interface
+~~~~~~~~~~~~~~
+
+Once the chunkmanager subclass has been registered, xarray objects wrapping the desired array type can be created in three ways:
+
+#. By manually passing the array type to the :py:class:`~xarray.DataArray` constructor, see the examples for :ref:`numpy-like arrays `,
+
+#. Calling :py:meth:`~xarray.DataArray.chunk`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``,
+
+#. Calling :py:func:`~xarray.open_dataset`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``.
+
+The latter two methods ultimately call the chunkmanager's implementation of ``.from_array``, to which they pass the ``from_array_kwargs`` dict.
+The ``chunked_array_type`` kwarg selects which registered chunkmanager subclass to dispatch to. It defaults to ``'dask'``
+if Dask is installed; otherwise it defaults to the single registered chunkmanager if exactly one is registered,
+and raises an error by default if multiple chunkmanagers are registered.
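+
+As a minimal sketch of the :py:func:`~xarray.open_dataset` route (``"example.nc"`` is a hypothetical file)::
+
+    import xarray as xr
+
+    ds = xr.open_dataset(
+        "example.nc",
+        chunks={},  # chunking must be requested for a chunkmanager to be used
+        chunked_array_type="dask",
+        from_array_kwargs={},
+    )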
+
+Parallel processing without chunks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To use a parallel array type that does not expose a concept of chunks explicitly, none of the information on this page
+is theoretically required. Such an array type (e.g. `Ramba `_ or
+`Arkouda `_) could be wrapped using xarray's existing support for
+:ref:`numpy-like "duck" arrays `.
diff --git a/doc/internals/duck-arrays-integration.rst b/doc/internals/duck-arrays-integration.rst
index d403328aa2f..43b17be8bb8 100644
--- a/doc/internals/duck-arrays-integration.rst
+++ b/doc/internals/duck-arrays-integration.rst
@@ -1,23 +1,59 @@
-.. _internals.duck_arrays:
+.. _internals.duckarrays:
Integrating with duck arrays
=============================
.. warning::
- This is a experimental feature.
+ This is an experimental feature. Please report any bugs or other difficulties on `xarray's issue tracker `_.
-Xarray can wrap custom :term:`duck array` objects as long as they define numpy's
-``shape``, ``dtype`` and ``ndim`` properties and the ``__array__``,
-``__array_ufunc__`` and ``__array_function__`` methods.
+Xarray can wrap custom numpy-like arrays (":term:`duck array`\s") - see the :ref:`user guide documentation `.
+This page is intended for developers who are interested in wrapping a new custom array type with xarray.
+
+.. _internals.duckarrays.requirements:
+
+Duck array requirements
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Xarray does not explicitly check that required methods are defined by the underlying duck array object before
+attempting to wrap the given array. However, a wrapped array type should at a minimum define these attributes:
+
+* ``shape`` property,
+* ``dtype`` property,
+* ``ndim`` property,
+* ``__array__`` method,
+* ``__array_ufunc__`` method,
+* ``__array_function__`` method.
+
+These need to be defined consistently with :py:class:`numpy.ndarray`, for example the array ``shape``
+property needs to obey `numpy's broadcasting rules `_
+(see also the `Python Array API standard's explanation `_
+of these same rules).
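+
+A minimal sketch of a wrapper exposing these attributes by delegating to an internal numpy
+array (``MyDuckArray`` and its internals are hypothetical)::
+
+    import numpy as np
+
+
+    class MyDuckArray:
+        def __init__(self, data):
+            self._data = np.asarray(data)
+
+        @property
+        def shape(self):
+            return self._data.shape
+
+        @property
+        def dtype(self):
+            return self._data.dtype
+
+        @property
+        def ndim(self):
+            return self._data.ndim
+
+        def __array__(self, dtype=None):
+            return np.asarray(self._data, dtype=dtype)
+
+        def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
+            # unwrap any MyDuckArray inputs, apply the ufunc, then rewrap the result
+            inputs = tuple(x._data if isinstance(x, MyDuckArray) else x for x in inputs)
+            return type(self)(getattr(ufunc, method)(*inputs, **kwargs))
+
+        def __array_function__(self, func, types, args, kwargs):
+            # defer to numpy's implementation on the unwrapped data
+            args = tuple(x._data if isinstance(x, MyDuckArray) else x for x in args)
+            return type(self)(func(*args, **kwargs))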
+
+.. _internals.duckarrays.array_api_standard:
+
+Python Array API standard support
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As an integration library xarray benefits greatly from the standardization of duck-array libraries' APIs, and so is a
+big supporter of the `Python Array API Standard `_.
+
+We aim to support any array libraries that follow the Array API standard out-of-the-box. However, xarray does occasionally
+call some numpy functions which are not (yet) part of the standard (e.g. :py:meth:`xarray.DataArray.pad` calls :py:func:`numpy.pad`).
+See `xarray issue #7848 `_ for a list of such functions. We can still support dispatching on these functions through
+the array protocols above; it just means that if you exclusively implement the methods in the Python Array API standard
+then some features in xarray will not work.
+
+Custom inline reprs
+~~~~~~~~~~~~~~~~~~~
In certain situations (e.g. when printing the collapsed preview of
variables of a ``Dataset``), xarray will display the repr of a :term:`duck array`
in a single line, truncating it to a certain number of characters. If that
would drop too much information, the :term:`duck array` may define a
``_repr_inline_`` method that takes ``max_width`` (number of characters) as an
-argument:
+argument:
.. code:: python
diff --git a/doc/internals/extending-xarray.rst b/doc/internals/extending-xarray.rst
index f8b61d12a2f..0537ae85389 100644
--- a/doc/internals/extending-xarray.rst
+++ b/doc/internals/extending-xarray.rst
@@ -1,6 +1,8 @@
-Extending xarray
-================
+.. _internals.accessors:
+
+Extending xarray using accessors
+================================
.. ipython:: python
:suppress:
@@ -8,11 +10,16 @@ Extending xarray
import xarray as xr
-Xarray is designed as a general purpose library, and hence tries to avoid
+Xarray is designed as a general purpose library and hence tries to avoid
including overly domain specific functionality. But inevitably, the need for more
domain specific logic arises.
-One standard solution to this problem is to subclass Dataset and/or DataArray to
+.. _internals.accessors.composition:
+
+Composition over Inheritance
+----------------------------
+
+One potential solution to this problem is to subclass Dataset and/or DataArray to
add domain specific functionality. However, inheritance is not very robust. It's
easy to inadvertently use internal APIs when subclassing, which means that your
code may break when xarray upgrades. Furthermore, many builtin methods will
@@ -21,15 +28,23 @@ only return native xarray objects.
The standard advice is to use :issue:`composition over inheritance <706>`, but
reimplementing an API as large as xarray's on your own objects can be an onerous
task, even if most methods are only forwarding to xarray implementations.
+(For an example of a project which took this approach of subclassing see `UXarray `_).
If you simply want the ability to call a function with the syntax of a
method call, then the builtin :py:meth:`~xarray.DataArray.pipe` method (copied
from pandas) may suffice.
+.. _internals.accessors.writing accessors:
+
+Writing Custom Accessors
+------------------------
+
To resolve this issue for more complex cases, xarray has the
:py:func:`~xarray.register_dataset_accessor` and
:py:func:`~xarray.register_dataarray_accessor` decorators for adding custom
-"accessors" on xarray objects. Here's how you might use these decorators to
+"accessors" on xarray objects, thereby "extending" the functionality of your xarray object.
+
+Here's how you might use these decorators to
write a custom "geo" accessor implementing a geography specific extension to
xarray:
@@ -88,7 +103,7 @@ The intent here is that libraries that extend xarray could add such an accessor
to implement subclass specific functionality rather than using actual subclasses
or patching in a large number of domain specific methods. For further reading
on ways to write new accessors and the philosophy behind the approach, see
-:issue:`1080`.
+https://github.com/pydata/xarray/issues/1080.
To help users keep things straight, please `let us know
`_ if you plan to write a new accessor
diff --git a/doc/internals/how-to-add-new-backend.rst b/doc/internals/how-to-add-new-backend.rst
index a106232958e..4352dd3df5b 100644
--- a/doc/internals/how-to-add-new-backend.rst
+++ b/doc/internals/how-to-add-new-backend.rst
@@ -9,7 +9,8 @@ to integrate any code in Xarray; all you need to do is:
- Create a class that inherits from Xarray :py:class:`~xarray.backends.BackendEntrypoint`
and implements the method ``open_dataset`` see :ref:`RST backend_entrypoint`
-- Declare this class as an external plugin in your ``setup.py``, see :ref:`RST backend_registration`
+- Declare this class as an external plugin in your project configuration, see :ref:`RST
+ backend_registration`
If you also want to support lazy loading and dask see :ref:`RST lazy_loading`.
@@ -267,42 +268,57 @@ interface only the boolean keywords related to the supported decoders.
How to register a backend
+++++++++++++++++++++++++
-Define a new entrypoint in your ``setup.py`` (or ``setup.cfg``) with:
+Define a new entrypoint in your ``pyproject.toml`` (or ``setup.cfg/setup.py`` for older
+configurations), with:
- group: ``xarray.backends``
- name: the name to be passed to :py:meth:`~xarray.open_dataset` as ``engine``
- object reference: the reference of the class that you have implemented.
-You can declare the entrypoint in ``setup.py`` using the following syntax:
+You can declare the entrypoint in your project configuration like so:
-.. code-block::
+.. tab:: pyproject.toml
- setuptools.setup(
- entry_points={
- "xarray.backends": ["my_engine=my_package.my_module:MyBackendEntryClass"],
- },
- )
+ .. code:: toml
+
+ [project.entry-points."xarray.backends"]
+ my_engine = "my_package.my_module:MyBackendEntrypoint"
+
+.. tab:: pyproject.toml [Poetry]
+
+ .. code-block:: toml
+
+ [tool.poetry.plugins."xarray.backends"]
+ my_engine = "my_package.my_module:MyBackendEntrypoint"
-in ``setup.cfg``:
+.. tab:: setup.cfg
-.. code-block:: cfg
+ .. code-block:: cfg
- [options.entry_points]
- xarray.backends =
- my_engine = my_package.my_module:MyBackendEntryClass
+ [options.entry_points]
+ xarray.backends =
+ my_engine = my_package.my_module:MyBackendEntrypoint
+.. tab:: setup.py
-See https://packaging.python.org/specifications/entry-points/#data-model
-for more information
+ .. code-block::
-If you are using `Poetry `_ for your build system, you can accomplish the same thing using "plugins". In this case you would need to add the following to your ``pyproject.toml`` file:
+ setuptools.setup(
+ entry_points={
+ "xarray.backends": [
+ "my_engine=my_package.my_module:MyBackendEntrypoint"
+ ],
+ },
+ )
-.. code-block:: toml
- [tool.poetry.plugins."xarray.backends"]
- "my_engine" = "my_package.my_module:MyBackendEntryClass"
+See the `Python Packaging User Guide
+`_ for more
+information on entrypoints and details of the syntax.
-See https://python-poetry.org/docs/pyproject/#plugins for more information on Poetry plugins.
+If you're using Poetry, note that the table name in ``pyproject.toml`` is slightly different.
+See `the Poetry docs `_ for more
+information on plugins.
.. _RST lazy_loading:
diff --git a/doc/internals/how-to-create-custom-index.rst b/doc/internals/how-to-create-custom-index.rst
new file mode 100644
index 00000000000..90b3412c2cb
--- /dev/null
+++ b/doc/internals/how-to-create-custom-index.rst
@@ -0,0 +1,235 @@
+.. currentmodule:: xarray
+
+.. _internals.custom indexes:
+
+How to create a custom index
+============================
+
+.. warning::
+
+ This feature is highly experimental. Support for custom indexes has been
+ introduced in v2022.06.0 and is still incomplete. The API is subject to change
+ without deprecation notice. However, we encourage you to experiment and report any issues that arise.
+
+Xarray's built-in support for label-based indexing (e.g. ``ds.sel(latitude=40, method="nearest")``) and alignment operations
+relies on :py:class:`pandas.Index` objects. Pandas Indexes are powerful and suitable for many
+applications but also have some limitations:
+
+- they only work with 1-dimensional coordinates where explicit labels
+  are fully loaded in memory
+- they are hard to reuse with irregular data for which there exist more
+  efficient, tree-based structures to perform data selection
+- they don't support extra metadata that may be required for indexing and
+  alignment (e.g., a coordinate reference system)
+
+Fortunately, Xarray now allows extending this functionality with custom indexes,
+which can be implemented in 3rd-party libraries.
+
+The Index base class
+--------------------
+
+Every Xarray index must inherit from the :py:class:`Index` base class. It is for
+example the case of Xarray built-in ``PandasIndex`` and ``PandasMultiIndex``
+subclasses, which wrap :py:class:`pandas.Index` and
+:py:class:`pandas.MultiIndex` respectively.
+
+The ``Index`` API closely follows the :py:class:`Dataset` and
+:py:class:`DataArray` API, e.g., for an index to support :py:meth:`DataArray.sel` it needs to
+implement :py:meth:`Index.sel`, to support :py:meth:`DataArray.stack` and :py:meth:`DataArray.unstack` it
+needs to implement :py:meth:`Index.stack` and :py:meth:`Index.unstack`, etc.
+
+Some guidelines and examples are given below. More details can be found in the
+documented :py:class:`Index` API.
+
+Minimal requirements
+--------------------
+
+Every index must at least implement the :py:meth:`Index.from_variables` class
+method, which is used by Xarray to build a new index instance from one or more
+existing coordinates in a Dataset or DataArray.
+
+Since any collection of coordinates can be passed to that method (i.e., the
+number, order and dimensions of the coordinates are all arbitrary), it is the
+responsibility of the index to check the consistency and validity of those input
+coordinates.
+
+For example, :py:class:`~xarray.core.indexes.PandasIndex` accepts only one coordinate and
+:py:class:`~xarray.core.indexes.PandasMultiIndex` accepts one or more 1-dimensional coordinates that must all
+share the same dimension. Other, custom indexes need not have the same
+constraints, e.g.,
+
+- a georeferenced raster index which only accepts two 1-d coordinates with
+ distinct dimensions
+- a staggered grid index which takes coordinates with different dimension name
+ suffixes (e.g., "_c" and "_l" for center and left)
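+
+As an illustration, here is a minimal sketch of such validation for a
+hypothetical index that accepts a single 1-dimensional coordinate (the class
+and attribute names are our own):
+
+.. code-block:: python
+
+    from xarray import Index
+
+
+    class MyIndex(Index):
+        def __init__(self, data, dim):
+            self.data = data
+            self.dim = dim
+
+        @classmethod
+        def from_variables(cls, variables):
+            if len(variables) != 1:
+                raise ValueError("MyIndex only accepts a single coordinate")
+            name, var = next(iter(variables.items()))
+            if var.ndim != 1:
+                raise ValueError("MyIndex only accepts a 1-dimensional coordinate")
+            return cls(var.data, var.dims[0])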
+
+Optional requirements
+---------------------
+
+Pretty much everything else is optional. Depending on the method, in the absence
+of a (re)implementation, an index will either raise a ``NotImplementedError``
+or do nothing specific (just drop, pass or copy itself
+from/to the resulting Dataset or DataArray).
+
+For example, you can just skip re-implementing :py:meth:`Index.rename` if there
+is no internal attribute or object to rename according to the new desired
+coordinate or dimension names. In the case of ``PandasIndex``, we rename the
+underlying ``pandas.Index`` object and/or update the ``PandasIndex.dim``
+attribute since the associated dimension name has been changed.
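+
+If you do need to reimplement it, a hypothetical ``rename`` for an index that
+stores its dimension name in a ``dim`` attribute might look like this sketch
+(not ``PandasIndex``'s actual implementation):
+
+.. code-block:: python
+
+    import copy
+
+
+    def rename(self, name_dict, dims_dict):
+        if self.dim not in dims_dict:
+            # nothing to update, so keep the same index object
+            return self
+        new_index = copy.copy(self)
+        new_index.dim = dims_dict[self.dim]
+        return new_index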
+
+Wrap index data as coordinate data
+----------------------------------
+
+In some cases it is possible to reuse the index's underlying object or structure
+as coordinate data and hence avoid data duplication.
+
+For ``PandasIndex`` and ``PandasMultiIndex``, we
+leverage the fact that ``pandas.Index`` objects expose some array-like API. In
+Xarray we use some wrappers around those underlying objects as a thin
+compatibility layer to preserve dtypes, handle explicit and n-dimensional
+indexing, etc.
+
+Other structures like tree-based indexes (e.g., kd-tree) may differ too much
+from arrays to be reused as coordinate data.
+
+If the index data can be reused as coordinate data, the ``Index`` subclass
+should implement :py:meth:`Index.create_variables`. This method accepts a
+dictionary of variable names as keys and :py:class:`Variable` objects as values (used for propagating
+variable metadata) and should return a dictionary of new :py:class:`Variable` or
+:py:class:`IndexVariable` objects.
+
+Data selection
+--------------
+
+For an index to support label-based selection, it needs to at least implement
+:py:meth:`Index.sel`. This method accepts a dictionary of labels where the keys
+are coordinate names (already filtered for the current index) and the values can
+be pretty much anything (e.g., a slice, a tuple, a list, a numpy array, a
+:py:class:`Variable` or a :py:class:`DataArray`). It is the responsibility of
+the index to properly handle those input labels.
+
+:py:meth:`Index.sel` must return an instance of :py:class:`IndexSelResult`. The
+latter is a small data class that holds positional indexers (indices) and that
+may also hold new variables, new indexes, names of variables or indexes to drop,
+names of dimensions to rename, etc. For example, this is useful in the case of
+``PandasMultiIndex`` as it allows Xarray to convert it into a single ``PandasIndex``
+when only one level remains after the selection.
+
+The :py:class:`IndexSelResult` class is also used to merge results from label-based
+selection performed by different indexes. Note that it is now possible to have
+two distinct indexes for two 1-d coordinates sharing the same dimension, but it
+is not currently possible to use those two indexes in the same call to
+:py:meth:`Dataset.sel`.
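+
+For illustration, a minimal ``sel`` for a hypothetical single-coordinate index
+(the ``_lookup`` helper and the attribute names are our own):
+
+.. code-block:: python
+
+    from xarray.core.indexing import IndexSelResult
+
+
+    def sel(self, labels):
+        # this hypothetical index is associated with a single coordinate
+        label = labels[self.coord_name]
+        # translate the label query into integer positions along self.dim
+        indexer = self._lookup(label)
+        return IndexSelResult({self.dim: indexer})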
+
+Optionally, the index may also implement :py:meth:`Index.isel`. In the case of
+``PandasIndex`` we use it to create a new index object by just indexing the
+underlying ``pandas.Index`` object. In other cases this may not be possible,
+e.g., a kd-tree object may not be easily indexed. If ``Index.isel()`` is not
+implemented, the index is just dropped in the DataArray or Dataset resulting
+from the selection.
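+
+A hypothetical ``isel`` for such an index could look like the following sketch,
+where returning ``None`` signals that the index should be dropped:
+
+.. code-block:: python
+
+    def isel(self, indexers):
+        indexer = indexers.get(self.dim)
+        if isinstance(indexer, slice):
+            # hypothetical: the wrapped data supports slicing directly
+            return type(self)(self.data[indexer], self.dim)
+        # give up on fancier indexing and drop the index from the result
+        return None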
+
+Alignment
+---------
+
+For an index to support alignment, it needs to implement:
+
+- :py:meth:`Index.equals`, which compares the index with another index and
+ returns either ``True`` or ``False``
+- :py:meth:`Index.join`, which combines the index with another index and returns
+ a new Index object
+- :py:meth:`Index.reindex_like`, which queries the index with another index and
+ returns positional indexers that are used to re-index Dataset or DataArray
+ variables along one or more dimensions
+
+Xarray ensures that those three methods are called with an index of the same
+type as argument.
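+
+For example, a hypothetical ``equals`` for an index wrapping a
+:py:class:`pandas.Index` (the ``index`` attribute name is our own):
+
+.. code-block:: python
+
+    def equals(self, other):
+        # Xarray guarantees that ``other`` is an index of the same type
+        return self.index.equals(other.index)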
+
+Meta-indexes
+------------
+
+Nothing prevents writing a custom Xarray index that itself encapsulates other
+Xarray index(es). We call such an index a "meta-index".
+
+Here is a small example of a meta-index for geospatial, raster datasets (i.e.,
+regularly spaced 2-dimensional data) that internally relies on two
+``PandasIndex`` instances for the x and y dimensions respectively:
+
+.. code-block:: python
+
+ from xarray import Index
+ from xarray.core.indexes import PandasIndex
+ from xarray.core.indexing import merge_sel_results
+
+
+ class RasterIndex(Index):
+ def __init__(self, xy_indexes):
+ assert len(xy_indexes) == 2
+
+ # must have two distinct dimensions
+ dim = [idx.dim for idx in xy_indexes.values()]
+ assert dim[0] != dim[1]
+
+ self._xy_indexes = xy_indexes
+
+ @classmethod
+ def from_variables(cls, variables):
+ assert len(variables) == 2
+
+ xy_indexes = {
+ k: PandasIndex.from_variables({k: v}) for k, v in variables.items()
+ }
+
+ return cls(xy_indexes)
+
+ def create_variables(self, variables):
+ idx_variables = {}
+
+ for index in self._xy_indexes.values():
+ idx_variables.update(index.create_variables(variables))
+
+ return idx_variables
+
+ def sel(self, labels):
+ results = []
+
+ for k, index in self._xy_indexes.items():
+ if k in labels:
+ results.append(index.sel({k: labels[k]}))
+
+ return merge_sel_results(results)
+
+
+This basic index only supports label-based selection. Providing a full-featured
+index by implementing the other ``Index`` methods should be pretty
+straightforward for this example, though.
+
+This example is also not very useful unless we add some extra functionality on
+top of the two encapsulated ``PandasIndex`` objects, such as a coordinate
+reference system.
+
+How to use a custom index
+-------------------------
+
+You can use :py:meth:`Dataset.set_xindex` or :py:meth:`DataArray.set_xindex` to assign a
+custom index to a Dataset or DataArray, e.g., using the ``RasterIndex`` above:
+
+.. code-block:: python
+
+ import numpy as np
+ import xarray as xr
+
+ da = xr.DataArray(
+ np.random.uniform(size=(100, 50)),
+ coords={"x": ("x", np.arange(50)), "y": ("y", np.arange(100))},
+ dims=("y", "x"),
+ )
+
+    # Xarray creates default indexes for the 'x' and 'y' coordinates,
+    # which we first need to explicitly drop
+ da = da.drop_indexes(["x", "y"])
+
+ # Build a RasterIndex from the 'x' and 'y' coordinates
+ da_raster = da.set_xindex(["x", "y"], RasterIndex)
+
+ # RasterIndex now takes care of label-based selection
+ selected = da_raster.sel(x=10, y=slice(20, 50))
diff --git a/doc/internals/index.rst b/doc/internals/index.rst
index e4ca9779dd7..b2a37900338 100644
--- a/doc/internals/index.rst
+++ b/doc/internals/index.rst
@@ -1,6 +1,6 @@
.. _internals:
-xarray Internals
+Xarray Internals
================
Xarray builds upon two of the foundational libraries of the scientific Python
@@ -8,13 +8,21 @@ stack, NumPy and pandas. It is written in pure Python (no C or Cython
extensions), which makes it easy to develop and extend. Instead, we push
compiled code to :ref:`optional dependencies`.
+The pages in this section are intended for:
+
+* Contributors to xarray who wish to better understand some of the internals,
+* Developers from other fields who wish to extend xarray with domain-specific logic, perhaps to support a new scientific community of users,
+* Developers of other packages who wish to interface xarray with their existing tools, e.g. by creating a backend for reading a new file format, or wrapping a custom array type.
.. toctree::
:maxdepth: 2
:hidden:
- variable-objects
+ internal-design
+ interoperability
duck-arrays-integration
+ chunked-arrays
extending-xarray
- zarr-encoding-spec
how-to-add-new-backend
+ how-to-create-custom-index
+ zarr-encoding-spec
diff --git a/doc/internals/internal-design.rst b/doc/internals/internal-design.rst
new file mode 100644
index 00000000000..55ab2d79dbe
--- /dev/null
+++ b/doc/internals/internal-design.rst
@@ -0,0 +1,224 @@
+.. ipython:: python
+ :suppress:
+
+ import numpy as np
+ import pandas as pd
+ import xarray as xr
+
+ np.random.seed(123456)
+ np.set_printoptions(threshold=20)
+
+.. _internal design:
+
+Internal Design
+===============
+
+This page gives an overview of the internal design of xarray.
+
+In total, the Xarray project defines four key data structures.
+In order of increasing complexity, they are:
+
+- :py:class:`xarray.Variable`,
+- :py:class:`xarray.DataArray`,
+- :py:class:`xarray.Dataset`,
+- :py:class:`datatree.DataTree`.
+
+The user guide lists only :py:class:`xarray.DataArray` and :py:class:`xarray.Dataset`,
+but :py:class:`~xarray.Variable` is the fundamental object internally,
+and :py:class:`~datatree.DataTree` is a natural generalisation of :py:class:`xarray.Dataset`.
+
+.. note::
+
+ Our :ref:`roadmap` includes plans both to document :py:class:`~xarray.Variable` as fully public API,
+ and to merge the `xarray-datatree `_ package into xarray's main repository.
+
+Internally private :ref:`lazy indexing classes ` are used to avoid loading more data than necessary,
+and flexible index classes (derived from :py:class:`~xarray.indexes.Index`) provide performant label-based lookups.
+
+
+.. _internal design.data structures:
+
+Data Structures
+---------------
+
+The :ref:`data structures` page in the user guide explains the basics and concentrates on user-facing behavior,
+whereas this section explains how xarray's data structure classes actually work internally.
+
+
+.. _internal design.data structures.variable:
+
+Variable Objects
+~~~~~~~~~~~~~~~~
+
+The core internal data structure in xarray is the :py:class:`~xarray.Variable`,
+which is used as the basic building block behind xarray's
+:py:class:`~xarray.Dataset` and :py:class:`~xarray.DataArray` types. A
+:py:class:`~xarray.Variable` consists of:
+
+- ``dims``: A tuple of dimension names.
+- ``data``: The N-dimensional array (typically a NumPy or Dask array) storing
+ the Variable's data. It must have the same number of dimensions as the length
+ of ``dims``.
+- ``attrs``: A dictionary of metadata associated with this array. By
+ convention, xarray's built-in operations never use this metadata.
+- ``encoding``: Another dictionary used to store information about how
+ this variable's data is represented on disk. See :ref:`io.encoding` for more
+ details.
+
+:py:class:`~xarray.Variable` has an interface similar to NumPy arrays, but extended to make use
+of named dimensions. For example, it uses ``dim`` in preference to an ``axis``
+argument for methods like ``mean``, and supports :ref:`compute.broadcasting`.
+
+However, unlike ``Dataset`` and ``DataArray``, the basic ``Variable`` does not
+include coordinate labels along each axis.
+
+:py:class:`~xarray.Variable` is public API, but because of its incomplete support for labeled
+data, it is mostly intended for advanced uses, such as in xarray itself, for
+writing new backends, or when creating custom indexes.
+You can access the variable objects that correspond to xarray objects via the (readonly)
+:py:attr:`Dataset.variables ` and
+:py:attr:`DataArray.variable ` attributes.
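+
+For example, a ``Variable`` can be constructed directly (the data here is
+arbitrary, purely for illustration):
+
+.. ipython:: python
+
+    var = xr.Variable(
+        dims=("x", "y"),
+        data=np.arange(6).reshape(2, 3),
+        attrs={"units": "m"},  # ignored by xarray's built-in operations
+    )
+    var
+    var.mean(dim="x")  # named dimensions instead of integer axes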
+
+
+.. _internal design.dataarray:
+
+DataArray Objects
+~~~~~~~~~~~~~~~~~
+
+The simplest data structure used by most users is :py:class:`~xarray.DataArray`.
+A :py:class:`~xarray.DataArray` is a composite object consisting of multiple
+:py:class:`~xarray.core.variable.Variable` objects which store related data.
+
+A single :py:class:`~xarray.Variable` is referred to as the "data variable", and stored under the :py:attr:`~xarray.DataArray.variable` attribute.
+A :py:class:`~xarray.DataArray` inherits all of the properties of this data variable, i.e. ``dims``, ``data``, ``attrs`` and ``encoding``,
+all of which are implemented by forwarding on to the underlying ``Variable`` object.
+
+In addition, a :py:class:`~xarray.DataArray` stores additional ``Variable`` objects in a dict under the private ``_coords`` attribute,
+each of which is referred to as a "Coordinate Variable". These coordinate variable objects are only allowed to have ``dims`` that are a subset of the data variable's ``dims``,
+and each dim has a specific length. This means that the full :py:attr:`~xarray.DataArray.sizes` of the dataarray can be represented by a dictionary mapping dimension names to integer sizes.
+The underlying data variable has this exact same size, and the attached coordinate variables have sizes which are some subset of the size of the data variable.
+Another way of saying this is that all coordinate variables must be "alignable" with the data variable.
+
+When a coordinate is accessed by the user (e.g. via the dict-like :py:meth:`~xarray.DataArray.__getitem__` syntax),
+a new ``DataArray`` is constructed by finding all coordinate variables that have compatible dimensions and re-attaching them before the result is returned.
+This is why most users never see the ``Variable`` class underlying each coordinate variable - it is always promoted to a ``DataArray`` before being returned.
+
+Lookups are performed by special :py:class:`~xarray.indexes.Index` objects, which are stored in a dict under the private ``_indexes`` attribute.
+Indexes must be associated with one or more coordinates, and essentially act by translating a query given in physical coordinate space
+(typically via the :py:meth:`~xarray.DataArray.sel` method) into a set of integer indices in array index space that can be used to index the underlying n-dimensional array-like ``data``.
+Indexing in array index space (typically performed via the :py:meth:`~xarray.DataArray.isel` method) does not require consulting an ``Index`` object.
+
+Finally a :py:class:`~xarray.DataArray` defines a :py:attr:`~xarray.DataArray.name` attribute, which refers to its data
+variable but is stored on the wrapping ``DataArray`` class.
+The ``name`` attribute is primarily used when one or more :py:class:`~xarray.DataArray` objects are promoted into a :py:class:`~xarray.Dataset`
+(e.g. via :py:meth:`~xarray.DataArray.to_dataset`).
+Note that the underlying :py:class:`~xarray.Variable` objects are all unnamed, so they can always be referred to uniquely via a
+dict-like mapping.
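+
+To make this wrapping concrete, here is a small illustration (the array
+contents are arbitrary):
+
+.. ipython:: python
+
+    da = xr.DataArray(
+        np.zeros((2, 3)),
+        dims=("x", "y"),
+        coords={"x": [10, 20]},
+        name="foo",
+    )
+    da.variable  # the wrapped "data variable"
+    da.coords["x"].variable  # the Variable behind a coordinate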
+
+.. _internal design.dataset:
+
+Dataset Objects
+~~~~~~~~~~~~~~~
+
+The :py:class:`~xarray.Dataset` class is a generalization of the :py:class:`~xarray.DataArray` class that can hold multiple data variables.
+Internally all data variables and coordinate variables are stored under a single ``variables`` dict, and coordinates are
+specified by storing their names in a private ``_coord_names`` dict.
+
+The dataset's ``dims`` are the set of all dims present across any variable, but (as with dataarrays) coordinate
+variables cannot have a dimension that is not present on any data variable.
+
+When a data variable or coordinate variable is accessed, a new ``DataArray`` is again constructed from all compatible
+coordinates before returning.
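+
+A quick illustration of this internal layout (again with arbitrary data):
+
+.. ipython:: python
+
+    ds = xr.Dataset(
+        {"foo": (("x",), np.arange(3))},
+        coords={"x": [10, 20, 30]},
+    )
+    ds.variables  # a single mapping holding data *and* coordinate variables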
+
+.. _internal design.subclassing:
+
+.. note::
+
+ The fact that selecting a variable from a ``DataArray`` or ``Dataset`` internally involves wrapping the
+ ``Variable`` object back up into a ``DataArray``/``Dataset`` is the primary reason :ref:`we recommend against subclassing `
+ Xarray objects. The main problem it creates is that we currently cannot easily guarantee that, for example, selecting
+ a coordinate variable from your ``SubclassedDataArray`` would return an instance of ``SubclassedDataArray`` instead
+ of just an :py:class:`xarray.DataArray`. See `GH issue `_ for more details.
+
+.. _internal design.lazy indexing:
+
+Lazy Indexing Classes
+---------------------
+
+Lazy Loading
+~~~~~~~~~~~~
+
+If we open a ``Variable`` object from disk using :py:func:`~xarray.open_dataset` we can see that the actual values of
+the array wrapped by the data variable are not displayed.
+
+.. ipython:: python
+
+ da = xr.tutorial.open_dataset("air_temperature")["air"]
+ var = da.variable
+ var
+
+We can see the size and dtype of the underlying array, but not the actual values.
+This is because the values have not yet been loaded.
+
+If we look at the private attribute :py:attr:`~xarray.Variable._data` containing the underlying array object, we see
+something interesting:
+
+.. ipython:: python
+
+ var._data
+
+You're looking at one of xarray's internal `Lazy Indexing Classes`. These powerful classes are hidden from the user,
+but provide important functionality.
+
+Accessing the public :py:attr:`~xarray.Variable.data` property loads the underlying array into memory.
+
+.. ipython:: python
+
+ var.data
+
+This array is now cached, which we can see by accessing the private attribute again:
+
+.. ipython:: python
+
+ var._data
+
+Lazy Indexing
+~~~~~~~~~~~~~
+
+The purpose of these lazy indexing classes is to prevent more data being loaded into memory than is necessary for the
+subsequent analysis, by deferring loading data until after indexing is performed.
+
+Let's open the data from disk again.
+
+.. ipython:: python
+
+ da = xr.tutorial.open_dataset("air_temperature")["air"]
+ var = da.variable
+
+Now, notice how even after subsetting, the data does not get loaded:
+
+.. ipython:: python
+
+ var.isel(time=0)
+
+The shape has changed, but the values are still not shown.
+
+Looking at the private attribute again shows how this indexing information was propagated via the hidden lazy indexing classes:
+
+.. ipython:: python
+
+ var.isel(time=0)._data
+
+.. note::
+
+ Currently only certain indexing operations are lazy, not all array operations. For discussion of making all array
+ operations lazy see `GH issue #5081 `_.
+
+
+Lazy Dask Arrays
+~~~~~~~~~~~~~~~~
+
+Note that xarray's implementation of Lazy Indexing classes is completely separate from how :py:class:`dask.array.Array`
+objects evaluate lazily. Dask-backed xarray objects delay almost all operations until :py:meth:`~xarray.DataArray.compute`
+is called (either explicitly or implicitly via :py:meth:`~xarray.DataArray.plot` for example). The exceptions to this
+laziness are operations whose output shape is data-dependent, such as when calling :py:meth:`~xarray.DataArray.where`.
diff --git a/doc/internals/interoperability.rst b/doc/internals/interoperability.rst
new file mode 100644
index 00000000000..a45363bcab7
--- /dev/null
+++ b/doc/internals/interoperability.rst
@@ -0,0 +1,45 @@
+.. _interoperability:
+
+Interoperability of Xarray
+==========================
+
+Xarray is designed to be extremely interoperable, in many orthogonal ways.
+Making xarray as flexible as possible is the common theme of most of the goals on our :ref:`roadmap`.
+
+This interoperability comes via a set of flexible abstractions that the user can plug into. The current full list is:
+
+- :ref:`Custom file backends ` via the :py:class:`~xarray.backends.BackendEntrypoint` system,
+- Numpy-like :ref:`"duck" array wrapping `, which supports the `Python Array API Standard `_,
+- :ref:`Chunked distributed array computation ` via the :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` system,
+- Custom :py:class:`~xarray.Index` objects for :ref:`flexible label-based lookups `,
+- Extending xarray objects with domain-specific methods via :ref:`custom accessors `.
+
+.. warning::
+
+ One obvious way in which xarray could be more flexible is that whilst subclassing xarray objects is possible, we
+ currently don't support it in most transformations, instead recommending composition over inheritance. See the
+ :ref:`internal design page ` for the rationale and look at the corresponding `GH issue `_
+ if you're interested in improving support for subclassing!
+
+.. note::
+
+ If you think there is another way in which xarray could become more generically flexible then please
+ tell us your ideas by `raising an issue to request the feature `_!
+
+
+Whilst xarray was originally designed specifically to open ``netCDF4`` files as :py:class:`numpy.ndarray` objects labelled by :py:class:`pandas.Index` objects,
+it is entirely possible today to:
+
+- lazily open an xarray object directly from a custom binary file format (e.g. using ``xarray.open_dataset(path, engine='my_custom_format')``),
+- handle the data as any API-compliant numpy-like array type (e.g. sparse or GPU-backed),
+- distribute out-of-core computation across that array type in parallel (e.g. via :ref:`dask`),
+- track the physical units of the data through computations (e.g. via `pint-xarray `_),
+- query the data via custom index logic optimized for specific applications (e.g. an :py:class:`~xarray.Index` object backed by a KDTree structure),
+- attach domain-specific logic via accessor methods (e.g. to understand geographic Coordinate Reference System metadata),
+- organize hierarchical groups of xarray data in a :py:class:`~datatree.DataTree` (e.g. to treat heterogeneous simulation and observational data together during analysis).
+
+All of these features can be provided simultaneously, using libraries compatible with the rest of the scientific python ecosystem.
+In this situation xarray would essentially be a thin wrapper acting as a pure-Python framework, providing a common
+interface and separation of concerns via various domain-agnostic abstractions.
+
+Most of the remaining pages in the documentation of xarray's internals describe these various types of interoperability in more detail.
diff --git a/doc/internals/variable-objects.rst b/doc/internals/variable-objects.rst
deleted file mode 100644
index 6ae3c2f7e6d..00000000000
--- a/doc/internals/variable-objects.rst
+++ /dev/null
@@ -1,31 +0,0 @@
-Variable objects
-================
-
-The core internal data structure in xarray is the :py:class:`~xarray.Variable`,
-which is used as the basic building block behind xarray's
-:py:class:`~xarray.Dataset` and :py:class:`~xarray.DataArray` types. A
-``Variable`` consists of:
-
-- ``dims``: A tuple of dimension names.
-- ``data``: The N-dimensional array (typically, a NumPy or Dask array) storing
- the Variable's data. It must have the same number of dimensions as the length
- of ``dims``.
-- ``attrs``: An ordered dictionary of metadata associated with this array. By
- convention, xarray's built-in operations never use this metadata.
-- ``encoding``: Another ordered dictionary used to store information about how
- these variable's data is represented on disk. See :ref:`io.encoding` for more
- details.
-
-``Variable`` has an interface similar to NumPy arrays, but extended to make use
-of named dimensions. For example, it uses ``dim`` in preference to an ``axis``
-argument for methods like ``mean``, and supports :ref:`compute.broadcasting`.
-
-However, unlike ``Dataset`` and ``DataArray``, the basic ``Variable`` does not
-include coordinate labels along each axis.
-
-``Variable`` is public API, but because of its incomplete support for labeled
-data, it is mostly intended for advanced uses, such as in xarray itself or for
-writing new backends. You can access the variable objects that correspond to
-xarray objects via the (readonly) :py:attr:`Dataset.variables
-` and
-:py:attr:`DataArray.variable ` attributes.
diff --git a/doc/roadmap.rst b/doc/roadmap.rst
index eeaaf10813b..820ff82151c 100644
--- a/doc/roadmap.rst
+++ b/doc/roadmap.rst
@@ -156,7 +156,7 @@ types would also be highly useful for xarray users.
By pursuing these improvements in NumPy we hope to extend the benefits
to the full scientific Python community, and avoid tight coupling
between xarray and specific third-party libraries (e.g., for
-implementing untis). This will allow xarray to maintain its domain
+implementing units). This will allow xarray to maintain its domain
agnostic strengths.
We expect that we may eventually add some minimal interfaces in xarray
diff --git a/doc/user-guide/computation.rst b/doc/user-guide/computation.rst
index f913ea41a91..f8141f40321 100644
--- a/doc/user-guide/computation.rst
+++ b/doc/user-guide/computation.rst
@@ -63,33 +63,121 @@ Data arrays also implement many :py:class:`numpy.ndarray` methods:
arr.round(2)
arr.T
+ intarr = xr.DataArray([0, 1, 2, 3, 4, 5])
+ intarr << 2 # only supported for int types
+ intarr >> 1
+
.. _missing_values:
Missing values
==============
+Xarray represents missing values using the "NaN" (Not a Number) value from NumPy, which is a
+special floating-point value indicating data that is undefined or unrepresentable.
+There are several methods for handling missing values in xarray:
+
Xarray objects borrow the :py:meth:`~xarray.DataArray.isnull`,
:py:meth:`~xarray.DataArray.notnull`, :py:meth:`~xarray.DataArray.count`,
:py:meth:`~xarray.DataArray.dropna`, :py:meth:`~xarray.DataArray.fillna`,
:py:meth:`~xarray.DataArray.ffill`, and :py:meth:`~xarray.DataArray.bfill`
methods for working with missing data from pandas:
+:py:meth:`~xarray.DataArray.isnull` checks for missing or null values in an xarray object.
+It returns a new xarray object with the same dimensions as the original, but with boolean values
+indicating where **missing values** are present.
+
.. ipython:: python
x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"])
x.isnull()
+
+In this example, the third and fourth elements of 'x' are NaN, so the resulting :py:class:`~xarray.DataArray`
+object has ``True`` values in the third and fourth positions and ``False`` values in the other positions.
+
+:py:meth:`~xarray.DataArray.notnull` checks for non-missing or non-null values in an xarray
+object. It returns a new xarray object with the same dimensions as the original, but with boolean
+values indicating where **non-missing values** are present.
+
+.. ipython:: python
+
+ x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"])
x.notnull()
+
+In this example, the first two and the last elements of 'x' are not NaN, so the resulting
+:py:class:`~xarray.DataArray` object has ``True`` values in these positions, and ``False`` values in the
+third and fourth positions where NaN is located.
+
+:py:meth:`~xarray.DataArray.count` counts the number of
+non-missing values along one or more dimensions of an xarray object. It returns a new xarray object
+in which the counted dimensions are reduced, with each element replaced by the count of non-missing
+values along the specified dimensions.
+
+.. ipython:: python
+
+ x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"])
x.count()
+
+In this example, 'x' has five elements, but two of them are NaN, so the resulting
+:py:class:`~xarray.DataArray` object is a scalar containing the value ``3``, which represents
+the number of non-null elements in 'x'.
+
+:py:meth:`~xarray.DataArray.dropna` removes missing or null values from an xarray object.
+It returns a new xarray object in which the labels containing missing values are dropped
+along the given dimension.
+
+.. ipython:: python
+
+ x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"])
x.dropna(dim="x")
+
+In this example, calling x.dropna(dim="x") removes the missing values and returns a new
+:py:class:`~xarray.DataArray` object with only the non-null elements [0, 1, 2] of 'x', in the
+original order.
+
+:py:meth:`~xarray.DataArray.fillna` fills missing or null values in an xarray object with a
+specified value or method. It returns a new xarray object with the same dimensions as the original, but with missing values filled.
+
+.. ipython:: python
+
+ x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"])
x.fillna(-1)
+
+In this example, there are two NaN values in 'x', so calling x.fillna(-1) replaces these values with -1 and
+returns a new :py:class:`~xarray.DataArray` object with five elements, containing the values
+[0, 1, -1, -1, 2] in the original order.
+
+:py:meth:`~xarray.DataArray.ffill` forward fills missing values in an
+xarray object along one or more dimensions. It returns a new xarray object with the same dimensions as the
+original, but with missing values replaced by the last non-missing value along the specified dimensions.
+
+.. ipython:: python
+
+ x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"])
x.ffill("x")
+
+In this example, there are two NaN values in 'x', so calling x.ffill("x") fills both of them with the last
+non-null value that precedes them, which is 1. The resulting :py:class:`~xarray.DataArray` object has
+five elements, containing the values [0, 1, 1, 1, 2] in the original order.
+
+:py:meth:`~xarray.DataArray.bfill` backward fills missing values in an
+xarray object along one or more dimensions. It returns a new xarray object with the same dimensions as the original, but
+with missing values replaced by the next non-missing value along the specified dimensions.
+
+.. ipython:: python
+
+ x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"])
x.bfill("x")
+
+In this example, there are two NaN values in 'x', so calling x.bfill("x") fills both of them with the next
+non-null value that follows them, which is 2. The resulting :py:class:`~xarray.DataArray` object has
+five elements, containing the values [0, 1, 2, 2, 2] in the original order.
+
Like pandas, xarray uses the float value ``np.nan`` (not-a-number) to represent
missing values.
Xarray objects also have an :py:meth:`~xarray.DataArray.interpolate_na` method
-for filling missing values via 1D interpolation.
+for filling missing values via 1D interpolation. It returns a new xarray object with the same dimensions
+as the original object, but with missing values interpolated.
.. ipython:: python
@@ -100,6 +188,13 @@ for filling missing values via 1D interpolation.
)
x.interpolate_na(dim="x", method="linear", use_coordinate="xx")
+In this example, there are two NaN values in 'x', so calling x.interpolate_na(dim="x", method="linear",
+use_coordinate="xx") fills these values with interpolated values along the "x" dimension using linear
+interpolation based on the values of the xx coordinate. The resulting :py:class:`~xarray.DataArray` object has five elements,
+containing the values [0., 1., 1.05, 1.45, 2.] in the original order. Note that the interpolated values
+are calculated based on the values of the 'xx' coordinate, which has non-integer values, resulting in
+non-integer interpolated values.
+
Note that xarray slightly diverges from the pandas ``interpolate`` syntax by
providing the ``use_coordinate`` keyword which facilitates a clear specification
of which values to use as the index in the interpolation.
diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst
index e0fd4bd0d25..64e7b3625ac 100644
--- a/doc/user-guide/data-structures.rst
+++ b/doc/user-guide/data-structures.rst
@@ -19,7 +19,8 @@ DataArray
:py:class:`xarray.DataArray` is xarray's implementation of a labeled,
multi-dimensional array. It has several key properties:
-- ``values``: a :py:class:`numpy.ndarray` holding the array's values
+- ``values``: a :py:class:`numpy.ndarray` or
+ :ref:`numpy-like array ` holding the array's values
- ``dims``: dimension names for each axis (e.g., ``('x', 'y', 'z')``)
- ``coords``: a dict-like container of arrays (*coordinates*) that label each
point (e.g., 1-dimensional arrays of numbers, datetime objects or
@@ -46,7 +47,8 @@ Creating a DataArray
The :py:class:`~xarray.DataArray` constructor takes:
- ``data``: a multi-dimensional array of values (e.g., a numpy ndarray,
- :py:class:`~pandas.Series`, :py:class:`~pandas.DataFrame` or ``pandas.Panel``)
+ a :ref:`numpy-like array `, :py:class:`~pandas.Series`,
+ :py:class:`~pandas.DataFrame` or ``pandas.Panel``)
- ``coords``: a list or dictionary of coordinates. If a list, it should be a
list of tuples where the first element is the dimension name and the second
element is the corresponding coordinate array_like object.
diff --git a/doc/user-guide/duckarrays.rst b/doc/user-guide/duckarrays.rst
index 78c7d1e572a..f0650ac61b5 100644
--- a/doc/user-guide/duckarrays.rst
+++ b/doc/user-guide/duckarrays.rst
@@ -1,30 +1,183 @@
.. currentmodule:: xarray
+.. _userguide.duckarrays:
+
Working with numpy-like arrays
==============================
+NumPy-like arrays (often known as :term:`duck array`\s) are drop-in replacements for the :py:class:`numpy.ndarray`
+class but with different features, such as propagating physical units or a different layout in memory.
+Xarray can often wrap these array types, allowing you to use labelled dimensions and indexes whilst benefiting from the
+additional features of these array libraries.
+
+Some numpy-like array types that xarray already has some support for:
+
+* `Cupy `_ - GPU support (see `cupy-xarray `_),
+* `Sparse `_ - for performant arrays with many zero elements,
+* `Pint `_ - for tracking the physical units of your data (see `pint-xarray `_),
+* `Dask `_ - parallel computing on larger-than-memory arrays (see :ref:`using dask with xarray `),
+* `Cubed `_ - another parallel computing framework that emphasises reliability (see `cubed-xarray `_).
+
.. warning::
- This feature should be considered experimental. Please report any bug you may find on
- xarray’s github repository.
+ This feature should be considered somewhat experimental. Please report any bugs you find on
+ `xarray’s issue tracker `_.
+
+.. note::
+
+ For information on wrapping dask arrays see :ref:`dask`. Whilst xarray wraps dask arrays in a similar way to that
+ described on this page, chunked array types like :py:class:`dask.array.Array` implement additional methods that require
+ slightly different user code (e.g. calling ``.chunk`` or ``.compute``). See the docs on :ref:`wrapping chunked arrays `.
+
+Why "duck"?
+-----------
+
+Why is it also called a "duck" array? This comes from a common statement of object-oriented programming -
+"If it walks like a duck, and quacks like a duck, treat it like a duck". In other words, a library like xarray that
+is capable of using multiple different types of arrays does not have to explicitly check that each one it encounters is
+permitted (e.g. ``if dask``, ``if numpy``, ``if sparse`` etc.). Instead xarray can take the more permissive approach of simply
+treating the wrapped array as valid, attempting to call the relevant methods (e.g. ``.mean()``) and only raising an
+error if a problem occurs (e.g. the method is not found on the wrapped class). This is much more flexible, and allows
+objects and classes from different libraries to work together more easily.
+
+What is a numpy-like array?
+---------------------------
+
+A "numpy-like array" (also known as a "duck array") is a class that contains array-like data, and implements key
+numpy-like functionality such as indexing, broadcasting, and computation methods.
+
+For example, the `sparse `_ library provides a sparse array type which is useful for representing nD array objects like sparse matrices
+in a memory-efficient manner. We can create a sparse array object (of the :py:class:`sparse.COO` type) from a numpy array like this:
+
+.. ipython:: python
+
+ from sparse import COO
+
+ x = np.eye(4, dtype=np.uint8) # create diagonal identity matrix
+ s = COO.from_numpy(x)
+ s
-NumPy-like arrays (:term:`duck array`) extend the :py:class:`numpy.ndarray` with
-additional features, like propagating physical units or a different layout in memory.
+This sparse object does not attempt to explicitly store every element in the array, only the non-zero elements.
+This approach is much more efficient for large arrays with only a few non-zero elements (such as tri-diagonal matrices).
+Sparse array objects can be converted back to a "dense" numpy array by calling :py:meth:`sparse.COO.todense`.
-:py:class:`DataArray` and :py:class:`Dataset` objects can wrap these duck arrays, as
-long as they satisfy certain conditions (see :ref:`internals.duck_arrays`).
+Just like :py:class:`numpy.ndarray` objects, :py:class:`sparse.COO` arrays support indexing
+
+.. ipython:: python
+
+ s[1, 1] # diagonal elements should be ones
+ s[2, 3] # off-diagonal elements should be zero
+
+broadcasting,
+
+.. ipython:: python
+
+    x2 = np.zeros((4, 1), dtype=np.uint8)  # create a second array with a different shape
+ s2 = COO.from_numpy(x2)
+ (s * s2) # multiplication requires broadcasting
+
+and various computation methods
+
+.. ipython:: python
+
+ s.sum(axis=1)
+
+This numpy-like array also supports calling so-called `numpy ufuncs `_
+("universal functions") on it directly:
+
+.. ipython:: python
+
+ np.sum(s, axis=1)
+
+
+Notice that in each case the API for calling the operation on the sparse array is identical to that of calling it on the
+equivalent numpy array - this is the sense in which the sparse array is "numpy-like".
.. note::
- For ``dask`` support see :ref:`dask`.
+ For discussion on exactly which methods a class needs to implement to be considered "numpy-like", see :ref:`internals.duckarrays`.
+
+Wrapping numpy-like arrays in xarray
+------------------------------------
+
+:py:class:`DataArray`, :py:class:`Dataset`, and :py:class:`Variable` objects can wrap these numpy-like arrays.
+Constructing xarray objects which wrap numpy-like arrays
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Missing features
-----------------
-Most of the API does support :term:`duck array` objects, but there are a few areas where
-the code will still cast to ``numpy`` arrays:
+The primary way to create an xarray object which wraps a numpy-like array is to pass that numpy-like array instance directly
+to the constructor of the xarray class. The :ref:`page on xarray data structures ` shows how :py:class:`DataArray` and :py:class:`Dataset`
+both accept data in various forms through their ``data`` argument, but in fact this data can also be any wrappable numpy-like array.
-- dimension coordinates, and thus all indexing operations:
+For example, we can wrap the sparse array we created earlier inside a new DataArray object:
+
+.. ipython:: python
+
+ s_da = xr.DataArray(s, dims=["i", "j"])
+ s_da
+
+We can see what's inside - the printable representation of our xarray object (the repr) automatically uses the printable
+representation of the underlying wrapped array.
+
+Of course our sparse array object is still there underneath - it's stored under the ``.data`` attribute of the dataarray:
+
+.. ipython:: python
+
+ s_da.data
+
+Array methods
+~~~~~~~~~~~~~
+
+We saw above that numpy-like arrays provide numpy methods. Xarray automatically uses these when you call the corresponding xarray method:
+
+.. ipython:: python
+
+ s_da.sum(dim="j")
+
+Converting wrapped types
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want to change the type inside your xarray object you can use :py:meth:`DataArray.as_numpy`:
+
+.. ipython:: python
+
+ s_da.as_numpy()
+
+This returns a new :py:class:`DataArray` object, but now wrapping a normal numpy array.
+
+If instead you want to convert to numpy and return that numpy array, you can use either :py:meth:`DataArray.to_numpy` or
+:py:attr:`DataArray.values`, where the former is strongly preferred. The difference is in the way they coerce to numpy - :py:attr:`~DataArray.values`
+always uses :py:func:`numpy.asarray`, which will fail for some array types (e.g. ``cupy``), whereas :py:meth:`~DataArray.to_numpy`
+uses the correct method depending on the array type.
+
+.. ipython:: python
+
+ s_da.to_numpy()
+
+.. ipython:: python
+ :okexcept:
+
+ s_da.values
+
+This illustrates the difference between :py:attr:`~DataArray.data` and :py:attr:`~DataArray.values`,
+which is sometimes a point of confusion for new xarray users.
+Explicitly: :py:attr:`DataArray.data` returns the underlying numpy-like array, regardless of type, whereas
+:py:attr:`DataArray.values` converts the underlying array to a numpy array before returning it.
+(This is another reason to use :py:meth:`~DataArray.to_numpy` over :py:attr:`~DataArray.values` - the intention is clearer.)
+
+Conversion to numpy as a fallback
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If a wrapped array does not implement the corresponding array method then xarray will often attempt to convert the
+underlying array to a numpy array so that the operation can be performed. You may want to watch out for this behavior,
+and report any instances in which it causes problems.
+
+Most of xarray's API does support using :term:`duck array` objects, but there are a few areas where
+the code will still convert to ``numpy`` arrays:
+
+- Dimension coordinates, and thus all indexing operations:
* :py:meth:`Dataset.sel` and :py:meth:`DataArray.sel`
* :py:meth:`Dataset.loc` and :py:meth:`DataArray.loc`
@@ -33,7 +186,7 @@ the code will still cast to ``numpy`` arrays:
:py:meth:`DataArray.reindex` and :py:meth:`DataArray.reindex_like`: duck arrays in
data variables and non-dimension coordinates won't be cast
-- functions and methods that depend on external libraries or features of ``numpy`` not
+- Functions and methods that depend on external libraries or features of ``numpy`` not
covered by ``__array_function__`` / ``__array_ufunc__``:
* :py:meth:`Dataset.ffill` and :py:meth:`DataArray.ffill` (uses ``bottleneck``)
@@ -49,17 +202,25 @@ the code will still cast to ``numpy`` arrays:
:py:class:`numpy.vectorize`)
* :py:func:`apply_ufunc` with ``vectorize=True`` (uses :py:class:`numpy.vectorize`)
-- incompatibilities between different :term:`duck array` libraries:
+- Incompatibilities between different :term:`duck array` libraries:
* :py:meth:`Dataset.chunk` and :py:meth:`DataArray.chunk`: this fails if the data was
not already chunked and the :term:`duck array` (e.g. a ``pint`` quantity) should
- wrap the new ``dask`` array; changing the chunk sizes works.
-
+  wrap the new ``dask`` array; changing the chunk sizes works, however.
Extensions using duck arrays
----------------------------
-Here's a list of libraries extending ``xarray`` to make working with wrapped duck arrays
-easier:
+
+Whilst the features above allow many numpy-like array libraries to be used pretty seamlessly with xarray, it often also
+makes sense to use an interfacing package to make certain tasks easier.
+
+For example, the `pint-xarray package `_ offers a custom ``.pint`` accessor (see :ref:`internals.accessors`) which provides
+convenient access to information stored within the wrapped array (e.g. ``.units`` and ``.magnitude``), and makes
+creating wrapped pint arrays (and especially xarray-wrapping-pint-wrapping-dask arrays) simpler for the user.
+
+We maintain a list of libraries extending ``xarray`` to make working with particular wrapped duck arrays
+easier. If you know of more that aren't on this list, please raise an issue to add them!
- `pint-xarray `_
- `cupy-xarray `_
+- `cubed-xarray `_
diff --git a/doc/user-guide/groupby.rst b/doc/user-guide/groupby.rst
index dce20dce228..1ad2d52fc00 100644
--- a/doc/user-guide/groupby.rst
+++ b/doc/user-guide/groupby.rst
@@ -177,28 +177,18 @@ This last line is roughly equivalent to the following::
results.append(group - alt.sel(letters=label))
xr.concat(results, dim='x')
-Squeezing
-~~~~~~~~~
+Iterating and Squeezing
+~~~~~~~~~~~~~~~~~~~~~~~
-When grouping over a dimension, you can control whether the dimension is
-squeezed out or if it should remain with length one on each group by using
-the ``squeeze`` parameter:
-
-.. ipython:: python
-
- next(iter(arr.groupby("x")))
+Previously, Xarray defaulted to squeezing out dimensions of size one when iterating over
+a GroupBy object. This behaviour is being removed.
+You can always squeeze explicitly later with the Dataset or DataArray
+:py:meth:`~xarray.DataArray.squeeze` methods.
.. ipython:: python
next(iter(arr.groupby("x", squeeze=False)))
-Although xarray will attempt to automatically
-:py:attr:`~xarray.DataArray.transpose` dimensions back into their original order
-when you use apply, it is sometimes useful to set ``squeeze=False`` to
-guarantee that all original dimensions remain unchanged.
-
-You can always squeeze explicitly later with the Dataset or DataArray
-:py:meth:`~xarray.DataArray.squeeze` methods.
.. _groupby.multidim:
diff --git a/doc/user-guide/index.rst b/doc/user-guide/index.rst
index 0ac25d68930..45f0ce352de 100644
--- a/doc/user-guide/index.rst
+++ b/doc/user-guide/index.rst
@@ -25,4 +25,5 @@ examples that describe many common tasks that you can accomplish with xarray.
dask
plotting
options
+ testing
duckarrays
diff --git a/doc/user-guide/indexing.rst b/doc/user-guide/indexing.rst
index 492316f898f..fba9dd585ab 100644
--- a/doc/user-guide/indexing.rst
+++ b/doc/user-guide/indexing.rst
@@ -352,7 +352,6 @@ dimensions:
ind_x = xr.DataArray([0, 1], dims=["x"])
ind_y = xr.DataArray([0, 1], dims=["y"])
da[ind_x, ind_y] # orthogonal indexing
- da[ind_x, ind_x] # vectorized indexing
Slices or sequences/arrays without named-dimensions are treated as if they have
the same dimension which is indexed along:
@@ -399,6 +398,12 @@ These methods may also be applied to ``Dataset`` objects
Vectorized indexing may be used to extract information from the nearest
grid cells of interest, for example, the nearest climate model grid cells
to a collection of specified weather station latitudes and longitudes.
+To trigger vectorized indexing behavior
+you will need to provide the selection dimensions with a new
+shared output dimension name. In the example below, the selections
+of the closest latitude and longitude are renamed to an output
+dimension named "points":
+
.. ipython:: python
@@ -544,6 +549,7 @@ __ https://numpy.org/doc/stable/user/basics.indexing.html#assigning-values-to-in
You can also assign values to all variables of a :py:class:`Dataset` at once:
.. ipython:: python
+ :okwarning:
ds_org = xr.tutorial.open_dataset("eraint_uvz").isel(
latitude=slice(56, 59), longitude=slice(255, 258), level=0
diff --git a/doc/user-guide/interpolation.rst b/doc/user-guide/interpolation.rst
index 7b40962e826..311e1bf0129 100644
--- a/doc/user-guide/interpolation.rst
+++ b/doc/user-guide/interpolation.rst
@@ -292,8 +292,8 @@ Let's see how :py:meth:`~xarray.DataArray.interp` works on real data.
axes[0].set_title("Raw data")
# Interpolated data
- new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.dims["lon"] * 4)
- new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.dims["lat"] * 4)
+ new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.sizes["lon"] * 4)
+ new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.sizes["lat"] * 4)
dsi = ds.interp(lat=new_lat, lon=new_lon)
dsi.air.plot(ax=axes[1])
@savefig interpolation_sample3.png width=8in
diff --git a/doc/user-guide/io.rst b/doc/user-guide/io.rst
index d5de181f562..48751c5f299 100644
--- a/doc/user-guide/io.rst
+++ b/doc/user-guide/io.rst
@@ -44,9 +44,9 @@ __ https://www.unidata.ucar.edu/software/netcdf/
.. _netCDF FAQ: https://www.unidata.ucar.edu/software/netcdf/docs/faq.html#What-Is-netCDF
-Reading and writing netCDF files with xarray requires scipy or the
-`netCDF4-Python`__ library to be installed (the latter is required to
-read/write netCDF V4 files and use the compression options described below).
+Reading and writing netCDF files with xarray requires scipy, h5netcdf, or the
+`netCDF4-Python`__ library to be installed. SciPy only supports reading and writing
+of netCDF V3 files.
__ https://github.com/Unidata/netcdf4-python
@@ -115,10 +115,7 @@ you try to perform some sort of actual computation. For an example of how these
lazy arrays work, see the OPeNDAP section below.
There may be minor differences in the :py:class:`Dataset` object returned
-when reading a NetCDF file with different engines. For example,
-single-valued attributes are returned as scalars by the default
-``engine=netcdf4``, but as arrays of size ``(1,)`` when reading with
-``engine=h5netcdf``.
+when reading a NetCDF file with different engines.
It is important to note that when you modify values of a Dataset, even one
linked to files on disk, only the in-memory copy you are manipulating in xarray
@@ -254,31 +251,22 @@ You can view this encoding information (among others) in the
:py:attr:`DataArray.encoding` and
:py:attr:`DataArray.encoding` attributes:
-.. ipython::
- :verbatim:
+.. ipython:: python
- In [1]: ds_disk["y"].encoding
- Out[1]:
- {'zlib': False,
- 'shuffle': False,
- 'complevel': 0,
- 'fletcher32': False,
- 'contiguous': True,
- 'chunksizes': None,
- 'source': 'saved_on_disk.nc',
- 'original_shape': (5,),
- 'dtype': dtype('int64'),
- 'units': 'days since 2000-01-01 00:00:00',
- 'calendar': 'proleptic_gregorian'}
-
- In [9]: ds_disk.encoding
- Out[9]:
- {'unlimited_dims': set(),
- 'source': 'saved_on_disk.nc'}
+ ds_disk["y"].encoding
+ ds_disk.encoding
Note that all operations that manipulate variables other than indexing
will remove encoding information.
+In some cases it is useful to intentionally reset a dataset's original encoding values.
+This can be done with either the :py:meth:`Dataset.drop_encoding` or
+:py:meth:`DataArray.drop_encoding` methods.
+
+.. ipython:: python
+
+ ds_no_encoding = ds_disk.drop_encoding()
+ ds_no_encoding.encoding
.. _combining multiple files:
@@ -568,6 +556,67 @@ and currently raises a warning unless ``invalid_netcdf=True`` is set:
Note that this produces a file that is likely to be not readable by other netCDF
libraries!
+.. _io.hdf5:
+
+HDF5
+----
+`HDF5`_ is both a file format and a data model for storing information. HDF5 stores
+data hierarchically, using groups to create a nested structure. HDF5 is a more
+general version of the netCDF4 data model, so the nested structure is one of many
+similarities between the two data formats.
+
+Reading HDF5 files in xarray requires the ``h5netcdf`` engine, which can be installed
+with ``conda install h5netcdf``. Once installed we can use xarray to open HDF5 files:
+
+.. code:: python
+
+ xr.open_dataset("/path/to/my/file.h5")
+
+The similarities between HDF5 and netCDF4 mean that HDF5 data can be written with the
+same :py:meth:`Dataset.to_netcdf` method as used for netCDF4 data:
+
+.. ipython:: python
+
+ ds = xr.Dataset(
+ {"foo": (("x", "y"), np.random.rand(4, 5))},
+ coords={
+ "x": [10, 20, 30, 40],
+ "y": pd.date_range("2000-01-01", periods=5),
+ "z": ("x", list("abcd")),
+ },
+ )
+
+ ds.to_netcdf("saved_on_disk.h5")
+
+Groups
+~~~~~~
+
+If you have multiple or highly nested groups, xarray by default may not read the group
+that you want. A particular group of an HDF5 file can be specified using the ``group``
+argument:
+
+.. code:: python
+
+ xr.open_dataset("/path/to/my/file.h5", group="/my/group")
+
+While xarray cannot interrogate an HDF5 file to determine which groups are available,
+the HDF5 Python reader `h5py`_ can be used instead.
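+
+For example, one way to list all groups in a file with `h5py`_ (a sketch; the
+file path is a placeholder):
+
+.. code:: python
+
+    import h5py
+
+
+    def print_groups(name, obj):
+        # callback for visititems; returning None keeps the iteration going
+        if isinstance(obj, h5py.Group):
+            print(name)
+
+
+    with h5py.File("/path/to/my/file.h5", "r") as f:
+        f.visititems(print_groups)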
+
+Natively the xarray data structures can only handle one level of nesting, organized as
+DataArrays inside of Datasets. If your HDF5 file has additional levels of hierarchy you
+can only access one group at a time and will need to specify group names.
+
+.. note::
+
+ For native handling of multiple HDF5 groups with xarray, including I/O, you might be
+ interested in the experimental
+ `xarray-datatree `_ package.
+
+
+.. _HDF5: https://hdfgroup.github.io/hdf5/index.html
+.. _h5py: https://www.h5py.org/
+
+
.. _io.zarr:
Zarr
@@ -617,10 +666,17 @@ store is already present at that path, an error will be raised, preventing it
from being overwritten. To override this behavior and overwrite an existing
store, add ``mode='w'`` when invoking :py:meth:`~Dataset.to_zarr`.
+DataArrays can also be saved to disk using the :py:meth:`DataArray.to_zarr` method,
+and loaded from disk using the :py:func:`open_dataarray` function with ``engine='zarr'``.
+Similar to :py:meth:`DataArray.to_netcdf`, :py:meth:`DataArray.to_zarr` will
+convert the ``DataArray`` to a ``Dataset`` before saving, and then convert back
+when loading, ensuring that the ``DataArray`` that is loaded is always exactly
+the same as the one that was saved.
+
.. note::
- xarray does not write NCZarr attributes. Therefore, NCZarr data must be
- opened in read-only mode.
+ xarray does not write `NCZarr `_ attributes.
+ Therefore, NCZarr data must be opened in read-only mode.
To store variable length strings, convert them to object arrays first with
``dtype=object``.
@@ -640,10 +696,10 @@ It is possible to read and write xarray datasets directly from / to cloud
storage buckets using zarr. This example uses the `gcsfs`_ package to provide
an interface to `Google Cloud Storage`_.
-From v0.16.2: general `fsspec`_ URLs are parsed and the store set up for you
-automatically when reading, such that you can open a dataset in a single
-call. You should include any arguments to the storage backend as the
-key ``storage_options``, part of ``backend_kwargs``.
+General `fsspec`_ URLs, those that begin with ``s3://`` or ``gcs://`` for example,
+are parsed and the store set up for you automatically when reading.
+You should include any arguments to the storage backend as the
+key ``storage_options``, part of ``backend_kwargs``.
.. code:: python
@@ -659,7 +715,7 @@ key ``storage_options``, part of ``backend_kwargs``.
This also works with ``open_mfdataset``, allowing you to pass a list of paths or
a URL to be interpreted as a glob string.
-For older versions, and for writing, you must explicitly set up a ``MutableMapping``
+For writing, you must explicitly set up a ``MutableMapping``
instance and pass this, as follows:
.. code:: python
@@ -713,10 +769,10 @@ Consolidated Metadata
~~~~~~~~~~~~~~~~~~~~~
Xarray needs to read all of the zarr metadata when it opens a dataset.
-In some storage mediums, such as with cloud object storage (e.g. amazon S3),
+In some storage mediums, such as with cloud object storage (e.g. `Amazon S3`_),
this can introduce significant overhead, because two separate HTTP calls to the
object store must be made for each variable in the dataset.
-As of xarray version 0.18, xarray by default uses a feature called
+By default Xarray uses a feature called
*consolidated metadata*, storing all metadata for the entire dataset with a
single key (by default called ``.zmetadata``). This typically drastically speeds
up opening the store. (For more information on this feature, consult the
@@ -740,16 +796,20 @@ reads. Because this fall-back option is so much slower, xarray issues a
.. _io.zarr.appending:
-Appending to existing Zarr stores
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Modifying existing Zarr stores
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Xarray supports several ways of incrementally writing variables to a Zarr
store. These options are useful for scenarios when it is infeasible or
undesirable to write your entire dataset at once.
+1. Use ``mode='a'`` to add or overwrite entire variables,
+2. Use ``append_dim`` to resize and append to existing variables, and
+3. Use ``region`` to write to limited regions of existing arrays.
+
.. tip::
- If you can load all of your data into a single ``Dataset`` using dask, a
+ For ``Dataset`` objects containing dask arrays, a
single call to ``to_zarr()`` will write all of your data in parallel.
.. warning::
@@ -763,7 +823,7 @@ with ``mode='a'`` on a Dataset containing the new variables, passing in an
existing Zarr store or path to a Zarr store.
To resize and then append values along an existing dimension in a store, set
-``append_dim``. This is a good option if data always arives in a particular
+``append_dim``. This is a good option if data always arrives in a particular
order, e.g., for time-stepping a simulation:
.. ipython:: python
@@ -820,17 +880,20 @@ and then calling ``to_zarr`` with ``compute=False`` to write only metadata
ds.to_zarr(path, compute=False)
Now, a Zarr store with the correct variable shapes and attributes exists that
-can be filled out by subsequent calls to ``to_zarr``. The ``region`` provides a
-mapping from dimension names to Python ``slice`` objects indicating where the
-data should be written (in index space, not coordinate space), e.g.,
+can be filled out by subsequent calls to ``to_zarr``.
+The ``region`` argument controls where the new data is written: set it to
+``"auto"`` to open the existing store and determine the correct alignment of
+the new data with the existing coordinates, or pass an explicit mapping from
+dimension names to Python ``slice`` objects indicating where the data should
+be written (in index space, not label space), e.g.,
.. ipython:: python
# For convenience, we'll slice a single dataset, but in the real use-case
 # we would create them separately, possibly even from separate processes.
ds = xr.Dataset({"foo": ("x", np.arange(30))})
- ds.isel(x=slice(0, 10)).to_zarr(path, region={"x": slice(0, 10)})
- ds.isel(x=slice(10, 20)).to_zarr(path, region={"x": slice(10, 20)})
+ # Any of the following region specifications are valid
+ ds.isel(x=slice(0, 10)).to_zarr(path, region="auto")
+ ds.isel(x=slice(10, 20)).to_zarr(path, region={"x": "auto"})
ds.isel(x=slice(20, 30)).to_zarr(path, region={"x": slice(20, 30)})
Concurrent writes with ``region`` are safe as long as they modify distinct
@@ -1157,46 +1220,7 @@ search indices or other automated data discovery tools.
Rasterio
--------
-GeoTIFFs and other gridded raster datasets can be opened using `rasterio`_, if
-rasterio is installed. Here is an example of how to use
-:py:func:`open_rasterio` to read one of rasterio's `test files`_:
-
-.. deprecated:: 0.20.0
-
- Deprecated in favor of rioxarray.
- For information about transitioning, see:
- `rioxarray getting started docs``
-
-.. ipython::
- :verbatim:
-
- In [7]: rio = xr.open_rasterio("RGB.byte.tif")
-
- In [8]: rio
- Out[8]:
-
- [1703814 values with dtype=uint8]
- Coordinates:
- * band (band) int64 1 2 3
- * y (y) float64 2.827e+06 2.826e+06 2.826e+06 2.826e+06 2.826e+06 ...
- * x (x) float64 1.021e+05 1.024e+05 1.027e+05 1.03e+05 1.033e+05 ...
- Attributes:
- res: (300.0379266750948, 300.041782729805)
- transform: (300.0379266750948, 0.0, 101985.0, 0.0, -300.041782729805, 28...
- is_tiled: 0
- crs: +init=epsg:32618
-
-
-The ``x`` and ``y`` coordinates are generated out of the file's metadata
-(``bounds``, ``width``, ``height``), and they can be understood as cartesian
-coordinates defined in the file's projection provided by the ``crs`` attribute.
-``crs`` is a PROJ4 string which can be parsed by e.g. `pyproj`_ or rasterio.
-See :ref:`/examples/visualization_gallery.ipynb#Parsing-rasterio-geocoordinates`
-for an example of how to convert these to longitudes and latitudes.
-
-
-Additionally, you can use `rioxarray`_ for reading in GeoTiff, netCDF or other
-GDAL readable raster data using `rasterio`_ as well as for exporting to a geoTIFF.
+GDAL-readable raster data, such as GeoTIFFs, can be opened using the `rioxarray`_ extension, which builds on `rasterio`_.
`rioxarray`_ can also handle geospatial related tasks such as re-projecting and clipping.
.. ipython::
@@ -1291,27 +1315,6 @@ We recommend installing PyNIO via conda::
.. _PyNIO backend is deprecated: https://github.com/pydata/xarray/issues/4491
.. _PyNIO is no longer maintained: https://github.com/NCAR/pynio/issues/53
-.. _io.PseudoNetCDF:
-
-Formats supported by PseudoNetCDF
----------------------------------
-
-Xarray can also read CAMx, BPCH, ARL PACKED BIT, and many other file
-formats supported by PseudoNetCDF_, if PseudoNetCDF is installed.
-PseudoNetCDF can also provide Climate Forecasting Conventions to
-CMAQ files. In addition, PseudoNetCDF can automatically register custom
-readers that subclass PseudoNetCDF.PseudoNetCDFFile. PseudoNetCDF can
-identify readers either heuristically, or by a format specified via a key in
-`backend_kwargs`.
-
-To use PseudoNetCDF to read such files, supply
-``engine='pseudonetcdf'`` to :py:func:`open_dataset`.
-
-Add ``backend_kwargs={'format': ''}`` where ``
-options are listed on the PseudoNetCDF page.
-
-.. _PseudoNetCDF: https://github.com/barronh/PseudoNetCDF
-
CSV and other formats supported by pandas
-----------------------------------------
diff --git a/doc/user-guide/reshaping.rst b/doc/user-guide/reshaping.rst
index 95bf21a71b0..14b343549e2 100644
--- a/doc/user-guide/reshaping.rst
+++ b/doc/user-guide/reshaping.rst
@@ -4,7 +4,12 @@
Reshaping and reorganizing data
###############################
-These methods allow you to reorganize your data by changing dimensions, array shape, order of values, or indexes.
+Reshaping and reorganizing data refers to the process of changing the structure or organization of data by modifying dimensions, array shapes, order of values, or indexes. Xarray provides several methods to accomplish these tasks.
+
+These methods are particularly useful for reshaping xarray objects for use in machine learning packages, such as scikit-learn, that usually require two-dimensional numpy arrays as inputs. Reshaping can also be required before passing data to external visualization tools: for example, geospatial tools may expect input organized as a stack of satellite images.
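+
+For example (a minimal sketch), :py:meth:`~xarray.DataArray.stack` collapses several
+dimensions into one, which is one way to produce the two-dimensional layout such packages expect:
+
+.. code-block:: python
+
+ da = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y"))
+ stacked = da.stack(z=("x", "y"))  # dims ("x", "y") -> ("z",)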
+
+Importing the library
+---------------------
.. ipython:: python
:suppress:
@@ -54,11 +59,11 @@ use :py:meth:`~xarray.DataArray.squeeze`
Converting between datasets and arrays
--------------------------------------
-To convert from a Dataset to a DataArray, use :py:meth:`~xarray.Dataset.to_array`:
+To convert from a Dataset to a DataArray, use :py:meth:`~xarray.Dataset.to_dataarray`:
.. ipython:: python
- arr = ds.to_array()
+ arr = ds.to_dataarray()
arr
This method broadcasts all data variables in the dataset against each other,
@@ -72,7 +77,7 @@ To convert back from a DataArray to a Dataset, use
arr.to_dataset(dim="variable")
-The broadcasting behavior of ``to_array`` means that the resulting array
+The broadcasting behavior of ``to_dataarray`` means that the resulting array
includes the union of data variable dimensions:
.. ipython:: python
@@ -83,7 +88,7 @@ includes the union of data variable dimensions:
ds2
# the resulting array has 6 elements
- ds2.to_array()
+ ds2.to_dataarray()
Otherwise, the result could not be represented as an orthogonal array.
@@ -156,8 +161,8 @@ arrays as inputs. For datasets with only one variable, we only need ``stack``
and ``unstack``, but combining multiple variables in a
:py:class:`xarray.Dataset` is more complicated. If the variables in the dataset
have matching numbers of dimensions, we can call
-:py:meth:`~xarray.Dataset.to_array` and then stack along the the new coordinate.
-But :py:meth:`~xarray.Dataset.to_array` will broadcast the dataarrays together,
+:py:meth:`~xarray.Dataset.to_dataarray` and then stack along the the new coordinate.
+But :py:meth:`~xarray.Dataset.to_dataarray` will broadcast the dataarrays together,
which will effectively tile the lower dimensional variable along the missing
dimensions. The method :py:meth:`xarray.Dataset.to_stacked_array` allows
combining variables of differing dimensions without this wasteful copying while
@@ -269,7 +274,7 @@ Sort
----
One may sort a DataArray/Dataset via :py:meth:`~xarray.DataArray.sortby` and
-:py:meth:`~xarray.DataArray.sortby`. The input can be an individual or list of
+:py:meth:`~xarray.Dataset.sortby`. The input can be an individual or list of
1D ``DataArray`` objects:
.. ipython:: python
diff --git a/doc/user-guide/terminology.rst b/doc/user-guide/terminology.rst
index 24e6ab69927..55937310827 100644
--- a/doc/user-guide/terminology.rst
+++ b/doc/user-guide/terminology.rst
@@ -47,30 +47,28 @@ complete examples, please consult the relevant documentation.*
all but one of these degrees of freedom is fixed. We can think of each
dimension axis as having a name, for example the "x dimension". In
xarray, a ``DataArray`` object's *dimensions* are its named dimension
- axes, and the name of the ``i``-th dimension is ``arr.dims[i]``. If an
- array is created without dimension names, the default dimension names are
- ``dim_0``, ``dim_1``, and so forth.
+ axes ``da.dims``, and the name of the ``i``-th dimension is ``da.dims[i]``.
+ If an array is created without specifying dimension names, the default dimension
+ names will be ``dim_0``, ``dim_1``, and so forth.
Coordinate
An array that labels a dimension or set of dimensions of another
``DataArray``. In the usual one-dimensional case, the coordinate array's
- values can loosely be thought of as tick labels along a dimension. There
- are two types of coordinate arrays: *dimension coordinates* and
- *non-dimension coordinates* (see below). A coordinate named ``x`` can be
- retrieved from ``arr.coords[x]``. A ``DataArray`` can have more
- coordinates than dimensions because a single dimension can be labeled by
- multiple coordinate arrays. However, only one coordinate array can be a
- assigned as a particular dimension's dimension coordinate array. As a
- consequence, ``len(arr.dims) <= len(arr.coords)`` in general.
+ values can loosely be thought of as tick labels along a dimension. We
+ distinguish :term:`Dimension coordinate` vs. :term:`Non-dimension
+ coordinate` and :term:`Indexed coordinate` vs. :term:`Non-indexed
+ coordinate`. A coordinate named ``x`` can be retrieved from
+ ``arr.coords["x"]``. A ``DataArray`` can have more coordinates than
+ dimensions because a single dimension can be labeled by multiple
+ coordinate arrays. However, only one coordinate array can be assigned
+ as a particular dimension's dimension coordinate array.
Dimension coordinate
A one-dimensional coordinate array assigned to ``arr`` with both a name
- and dimension name in ``arr.dims``. Dimension coordinates are used for
- label-based indexing and alignment, like the index found on a
- :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`. In fact,
- dimension coordinates use :py:class:`pandas.Index` objects under the
- hood for efficient computation. Dimension coordinates are marked by
- ``*`` when printing a ``DataArray`` or ``Dataset``.
+ and dimension name in ``arr.dims``. Usually (but not always), a
+ dimension coordinate is also an :term:`Indexed coordinate` so that it can
+ be used for label-based indexing and alignment, like the index found on
+ a :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`.
Non-dimension coordinate
A coordinate array assigned to ``arr`` with a name in ``arr.coords`` but
@@ -79,20 +77,40 @@ complete examples, please consult the relevant documentation.*
example, multidimensional coordinates are often used in geoscience
datasets when :doc:`the data's physical coordinates (such as latitude
and longitude) differ from their logical coordinates
- <../examples/multidimensional-coords>`. However, non-dimension coordinates
- are not indexed, and any operation on non-dimension coordinates that
- leverages indexing will fail. Printing ``arr.coords`` will print all of
- ``arr``'s coordinate names, with the corresponding dimension(s) in
- parentheses. For example, ``coord_name (dim_name) 1 2 3 ...``.
+ <../examples/multidimensional-coords>`. Printing ``arr.coords`` will
+ print all of ``arr``'s coordinate names, with the corresponding
+ dimension(s) in parentheses. For example, ``coord_name (dim_name) 1 2 3
+ ...``.
+
+ Indexed coordinate
+ A coordinate which has an associated :term:`Index`. Generally this means
+ that the coordinate labels can be used for indexing (selection) and/or
+ alignment. An indexed coordinate may have one or more arbitrary
+ dimensions although in most cases it is also a :term:`Dimension
+ coordinate`. It may or may not be grouped with other indexed coordinates
+ depending on whether they share the same index. Indexed coordinates are
+ marked by an asterisk ``*`` when printing a ``DataArray`` or ``Dataset``.
+
+ Non-indexed coordinate
+ A coordinate which has no associated :term:`Index`. It may still
+ represent fixed labels along one or more dimensions but it cannot be
+ used for label-based indexing and alignment.
Index
- An *index* is a data structure optimized for efficient selecting and
- slicing of an associated array. Xarray creates indexes for dimension
- coordinates so that operations along dimensions are fast, while
- non-dimension coordinates are not indexed. Under the hood, indexes are
- implemented as :py:class:`pandas.Index` objects. The index associated
- with dimension name ``x`` can be retrieved by ``arr.indexes[x]``. By
- construction, ``len(arr.dims) == len(arr.indexes)``
+ An *index* is a data structure optimized for efficient data selection
+ and alignment within a discrete or continuous space that is defined by
+ coordinate labels (unless it is a functional index). By default, Xarray
+ creates a :py:class:`~xarray.indexes.PandasIndex` object (i.e., a
+ :py:class:`pandas.Index` wrapper) for each :term:`Dimension coordinate`.
+ For more advanced use cases (e.g., staggered or irregular grids,
+ geospatial indexes), Xarray also accepts any instance of a specialized
+ :py:class:`~xarray.indexes.Index` subclass that is associated to one or
+ more arbitrary coordinates. The index associated with the coordinate
+ ``x`` can be retrieved by ``arr.xindexes["x"]`` (or ``arr.indexes["x"]``
+ if the index is convertible to a :py:class:`pandas.Index` object). If
+ two coordinates ``x`` and ``y`` share the same index,
+ ``arr.xindexes["x"]`` and ``arr.xindexes["y"]`` both return the same
+ :py:class:`~xarray.indexes.Index` object.
name
The names of dimensions, coordinates, DataArray objects and data
@@ -112,3 +130,128 @@ complete examples, please consult the relevant documentation.*
``__array_ufunc__`` and ``__array_function__`` protocols are also required.
__ https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
+
+ .. ipython:: python
+ :suppress:
+
+ import numpy as np
+ import xarray as xr
+
+ Aligning
+ Aligning refers to the process of ensuring that two or more DataArrays or Datasets
+ have the same dimensions and coordinates, so that they can be combined or compared properly.
+
+ .. ipython:: python
+
+ x = xr.DataArray(
+ [[25, 35], [10, 24]],
+ dims=("lat", "lon"),
+ coords={"lat": [35.0, 40.0], "lon": [100.0, 120.0]},
+ )
+ y = xr.DataArray(
+ [[20, 5], [7, 13]],
+ dims=("lat", "lon"),
+ coords={"lat": [35.0, 42.0], "lon": [100.0, 120.0]},
+ )
+ x
+ y
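+
+ # a minimal sketch: xr.align trims both objects to their common labels;
+ # with join="inner" only lat=35.0 is shared, so each result has one row
+ x2, y2 = xr.align(x, y, join="inner")
+ x2
+ y2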
+
+ Broadcasting
+ A technique that allows operations to be performed on arrays with different shapes and dimensions.
+ When performing such operations, xarray will automatically attempt to broadcast the
+ arrays to a common shape before the operation is applied.
+
+ .. ipython:: python
+
+ # 'a' has shape (3,) and 'b' has shape (4,)
+ a = xr.DataArray(np.array([1, 2, 3]), dims=["x"])
+ b = xr.DataArray(np.array([4, 5, 6, 7]), dims=["y"])
+
+ # 2D array with shape (3, 4)
+ a + b
+
+ Merging
+ Merging is used to combine two or more Datasets or DataArrays that have different variables or coordinates along
+ the same dimensions. When merging, xarray aligns the variables and coordinates of the different datasets along
+ the specified dimensions and creates a new ``Dataset`` containing all the variables and coordinates.
+
+ .. ipython:: python
+
+ # create two 1D arrays with names
+ arr1 = xr.DataArray(
+ [1, 2, 3], dims=["x"], coords={"x": [10, 20, 30]}, name="arr1"
+ )
+ arr2 = xr.DataArray(
+ [4, 5, 6], dims=["x"], coords={"x": [20, 30, 40]}, name="arr2"
+ )
+
+ # merge the two arrays into a new dataset
+ merged_ds = xr.Dataset({"arr1": arr1, "arr2": arr2})
+ merged_ds
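+
+ # equivalently (a minimal sketch), xr.merge combines a sequence of objects,
+ # aligning them along the shared "x" coordinate (outer join by default)
+ xr.merge([arr1, arr2])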
+
+ Concatenating
+ Concatenating is used to combine two or more Datasets or DataArrays along a dimension. When concatenating,
+ xarray arranges the datasets or dataarrays along a new dimension, and the resulting ``Dataset`` or ``DataArray``
+ will have the same variables and coordinates along the other dimensions.
+
+ .. ipython:: python
+
+ a = xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"))
+ b = xr.DataArray([[5, 6], [7, 8]], dims=("x", "y"))
+ c = xr.concat([a, b], dim="c")
+ c
+
+ Combining
+ Combining is the process of arranging two or more DataArrays or Datasets into a single ``DataArray`` or
+ ``Dataset`` using some combination of merging and concatenation operations.
+
+ .. ipython:: python
+
+ ds1 = xr.Dataset(
+ {"data": xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"))},
+ coords={"x": [1, 2], "y": [3, 4]},
+ )
+ ds2 = xr.Dataset(
+ {"data": xr.DataArray([[5, 6], [7, 8]], dims=("x", "y"))},
+ coords={"x": [2, 3], "y": [4, 5]},
+ )
+
+ # combine the datasets
+ combined_ds = xr.combine_by_coords([ds1, ds2])
+ combined_ds
+
+ lazy
+ Lazily-evaluated operations do not load data into memory until necessary. Instead of performing
+ calculations right away, xarray lets you plan what calculations you want to do, like finding the
+ average temperature in a dataset. This planning is called "lazy evaluation." Later, when you're
+ ready to see the final result, you tell xarray to carry out the planned calculations, and only then
+ does it work through the steps and give you the answer. This lazy approach saves time and memory
+ because xarray only does the work when you actually need the results.
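+
+ A minimal sketch, assuming a netCDF file ``saved_on_disk.nc`` with a ``temperature``
+ variable exists and dask is installed:
+
+ .. code-block:: python
+
+ ds = xr.open_dataset("saved_on_disk.nc", chunks={})  # no data loaded yet
+ mean = ds["temperature"].mean("time")  # plans the calculation lazily
+ result = mean.compute()  # the work happens only here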
+
+ labeled
+ Labeled data has metadata describing the context of the data, not just the raw data values.
+ This contextual information can be labels for array axes (i.e. dimension names), tick labels
+ along axes (stored as coordinate variables), or unique names for each array. These labels
+ provide context and meaning, making the data easier to understand and work with. For example,
+ if you have temperature data for different cities over time, xarray lets you label the
+ dimensions: one for cities and another for time.
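+
+ A minimal sketch of such labeled data:
+
+ .. code-block:: python
+
+ temps = xr.DataArray(
+ [[20.1, 21.3], [25.2, 26.0]],
+ dims=("city", "time"),
+ coords={"city": ["New York", "Los Angeles"], "time": ["2023-07-15", "2023-07-16"]},
+ name="temperature",
+ )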
+
+ serialization
+ Serialization is the process of converting your data into a format that makes it easy to save and share.
+ When you serialize data in xarray, you're taking all those temperature measurements, along with their
+ labels and other information, and turning them into a format that can be stored in a file or sent over
+ the internet. xarray objects can be serialized into formats which store the labels alongside the data.
+ Some supported serialization formats are files that can then be stored or transferred (e.g. netCDF),
+ whilst others are protocols that allow for data access over a network (e.g. Zarr).
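+
+ For example (a minimal sketch, assuming ``ds`` is any Dataset):
+
+ .. code-block:: python
+
+ ds.to_netcdf("example.nc")  # serialize to a netCDF file
+ roundtripped = xr.open_dataset("example.nc")  # read it back, labels intact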
+
+ indexing
+ :ref:`Indexing` is how you select the subsets of your data that you are interested in
+ (see the sketch after this list).
+
+ - Label-based indexing: selecting data by passing a specific label and comparing it to the labels
+ stored in the associated coordinates. You can use labels to specify what you want, like "Give me the
+ temperature for New York on July 15th."
+
+ - Positional indexing: using numbers to refer to positions in the data, like "Give me the third
+ temperature value." This is useful when you know the order of your data but don't need to remember
+ the exact labels.
+
+ - Slicing: taking a "slice" of your data, like all temperatures from July 1st to July 10th.
+ Xarray supports slicing for both positional and label-based indexing.
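+
+ A minimal sketch, assuming ``temps`` is a DataArray with ``city`` and ``time`` coordinates:
+
+ .. code-block:: python
+
+ temps.sel(city="New York", time="2023-07-15")  # label-based indexing
+ temps.isel(time=2)  # positional indexing: the third value along time
+ temps.sel(time=slice("2023-07-01", "2023-07-10"))  # label-based slicing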
diff --git a/doc/user-guide/testing.rst b/doc/user-guide/testing.rst
new file mode 100644
index 00000000000..13279eccb0b
--- /dev/null
+++ b/doc/user-guide/testing.rst
@@ -0,0 +1,303 @@
+.. _testing:
+
+Testing your code
+=================
+
+.. ipython:: python
+ :suppress:
+
+ import numpy as np
+ import pandas as pd
+ import xarray as xr
+
+ np.random.seed(123456)
+
+.. _testing.hypothesis:
+
+Hypothesis testing
+------------------
+
+.. note::
+
+ Testing with hypothesis is a fairly advanced topic. Before reading this section it is recommended that you take a look
+ at our guide to xarray's :ref:`data structures`, are familiar with conventional unit testing in
+ `pytest `_, and have seen the
+ `hypothesis library documentation `_.
+
+`The hypothesis library `_ is a powerful tool for property-based testing.
+Instead of writing tests for one example at a time, it allows you to write tests parameterized by a source of many
+dynamically generated examples. For example, you might have written a test which you wish to be parameterized by the set
+of all possible integers via :py:func:`hypothesis.strategies.integers()`.
+
+Property-based testing is extremely powerful, because (unlike more conventional example-based testing) it can find bugs
+that you did not even think to look for!
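+
+As a toy sketch of the idea (not xarray-specific), a property-based test of integer
+arithmetic might look like this:
+
+.. code-block:: python
+
+ from hypothesis import given
+ import hypothesis.strategies as st
+
+
+ @given(st.integers())
+ def test_addition_is_commutative(x):
+ # hypothesis runs this test against many generated integers
+ assert x + 1 == 1 + x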
+
+Strategies
+~~~~~~~~~~
+
+Each source of examples is called a "strategy", and xarray provides a range of custom strategies which produce xarray
+data structures containing arbitrary data. You can use these to efficiently test downstream code,
+quickly ensuring that your code can handle xarray objects of all possible structures and contents.
+
+These strategies are accessible in the :py:mod:`xarray.testing.strategies` module, which provides
+
+.. currentmodule:: xarray
+
+.. autosummary::
+
+ testing.strategies.supported_dtypes
+ testing.strategies.names
+ testing.strategies.dimension_names
+ testing.strategies.dimension_sizes
+ testing.strategies.attrs
+ testing.strategies.variables
+ testing.strategies.unique_subset_of
+
+These build upon the numpy and array API strategies offered in :py:mod:`hypothesis.extra.numpy` and :py:mod:`hypothesis.extra.array_api`:
+
+.. ipython:: python
+
+ import hypothesis.extra.numpy as npst
+
+Generating Examples
+~~~~~~~~~~~~~~~~~~~
+
+To see an example of what each of these strategies might produce, you can call one followed by the ``.example()`` method,
+which is a general hypothesis method valid for all strategies.
+
+.. ipython:: python
+
+ import xarray.testing.strategies as xrst
+
+ xrst.variables().example()
+ xrst.variables().example()
+ xrst.variables().example()
+
+You can see that calling ``.example()`` multiple times will generate different examples, giving you an idea of the wide
+range of data that the xarray strategies can generate.
+
+In your tests, however, you should not use ``.example()``; instead you should parameterize your tests with the
+:py:func:`hypothesis.given` decorator:
+
+.. ipython:: python
+
+ from hypothesis import given
+
+.. ipython:: python
+
+ @given(xrst.variables())
+ def test_function_that_acts_on_variables(var):
+ assert func(var) == ...
+
+
+Chaining Strategies
+~~~~~~~~~~~~~~~~~~~
+
+Xarray's strategies can accept other strategies as arguments, allowing you to customise the contents of the generated
+examples.
+
+.. ipython:: python
+
+ # generate a Variable containing an array with a complex number dtype, but all other details still arbitrary
+ from hypothesis.extra.numpy import complex_number_dtypes
+
+ xrst.variables(dtype=complex_number_dtypes()).example()
+
+This also works with custom strategies, or strategies defined in other packages.
+For example, you could imagine creating a ``chunks`` strategy to specify particular chunking patterns for a dask-backed array.
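+
+Such a strategy is not built into xarray; a hypothetical sketch of one ingredient might
+generate a valid chunk size along a single dimension of known length:
+
+.. code-block:: python
+
+ import hypothesis.strategies as st
+
+
+ def chunk_sizes(dim_length: int) -> st.SearchStrategy[int]:
+ """Hypothetical strategy: draw any valid chunk size for a dimension of this length."""
+ return st.integers(min_value=1, max_value=dim_length)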
+
+Fixing Arguments
+~~~~~~~~~~~~~~~~
+
+If you want to fix one aspect of the data structure, whilst allowing variation in the generated examples
+over all other aspects, then use :py:func:`hypothesis.strategies.just()`.
+
+.. ipython:: python
+
+ import hypothesis.strategies as st
+
+ # Generates only variable objects with dimensions ["x", "y"]
+ xrst.variables(dims=st.just(["x", "y"])).example()
+
+(This is technically another example of chaining strategies - :py:func:`hypothesis.strategies.just()` is simply a
+special strategy that just contains a single example.)
+
+To fix the length of dimensions you can instead pass ``dims`` as a mapping of dimension names to lengths
+(i.e. following xarray objects' ``.sizes`` property), e.g.
+
+.. ipython:: python
+
+ # Generates only variables with dimensions ["x", "y"], of lengths 2 & 3 respectively
+ xrst.variables(dims=st.just({"x": 2, "y": 3})).example()
+
+You can also use this to specify that you want examples which are missing some part of the data structure, for instance
+
+.. ipython:: python
+
+ # Generates a Variable with no attributes
+ xrst.variables(attrs=st.just({})).example()
+
+Through a combination of chaining strategies and fixing arguments, you can specify quite complicated requirements on the
+objects your chained strategy will generate.
+
+.. ipython:: python
+
+ fixed_x_variable_y_maybe_z = st.fixed_dictionaries(
+ {"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)}
+ )
+ fixed_x_variable_y_maybe_z.example()
+
+ special_variables = xrst.variables(dims=fixed_x_variable_y_maybe_z)
+
+ special_variables.example()
+ special_variables.example()
+
+Here we have used one of hypothesis' built-in strategies :py:func:`hypothesis.strategies.fixed_dictionaries` to create a
+strategy which generates mappings of dimension names to lengths (i.e. the ``size`` of the xarray object we want).
+This particular strategy will always generate an ``x`` dimension of length 2, and a ``y`` dimension of
+length either 3 or 4, and will sometimes also generate a ``z`` dimension of length 2.
+By feeding this strategy for dictionaries into the ``dims`` argument of xarray's :py:func:`~xarray.testing.strategies.variables` strategy,
+we can generate arbitrary :py:class:`~xarray.Variable` objects whose dimensions will always match these specifications.
+
+Generating Duck-type Arrays
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Xarray objects don't have to wrap numpy arrays; in fact, they can wrap any array type which presents the same API as a
+numpy array (so-called "duck array wrapping", see :ref:`wrapping numpy-like arrays `).
+
+Imagine we want to write a strategy which generates arbitrary ``Variable`` objects, each of which wraps a
+:py:class:`sparse.COO` array instead of a ``numpy.ndarray``. How could we do that? There are two ways:
+
+1. Create an xarray object with numpy data and use hypothesis' ``.map()`` method to convert the underlying array to a
+different type:
+
+.. ipython:: python
+
+ import sparse
+
+.. ipython:: python
+
+ def convert_to_sparse(var):
+ return var.copy(data=sparse.COO.from_numpy(var.to_numpy()))
+
+.. ipython:: python
+
+ sparse_variables = xrst.variables(dims=xrst.dimension_names(min_dims=1)).map(
+ convert_to_sparse
+ )
+
+ sparse_variables.example()
+ sparse_variables.example()
+
+2. Pass a function which returns a strategy which generates the duck-typed arrays directly to the ``array_strategy_fn`` argument of the xarray strategies:
+
+.. ipython:: python
+
+ def sparse_random_arrays(shape: tuple[int, ...]) -> st.SearchStrategy[sparse._coo.core.COO]:
+ """Strategy which generates random sparse.COO arrays"""
+ if shape is None:
+ shape = npst.array_shapes()
+ else:
+ shape = st.just(shape)
+ # density is the fraction of non-zero elements, drawn from [0, 1]
+ density = st.floats(min_value=0, max_value=1)
+ # note sparse.random does not accept a dtype kwarg
+ return st.builds(sparse.random, shape=shape, density=density)
+
+
+ def sparse_random_arrays_fn(
+ *, shape: tuple[int, ...], dtype: np.dtype
+ ) -> st.SearchStrategy[sparse._coo.core.COO]:
+ return sparse_random_arrays(shape=shape)
+
+
+.. ipython:: python
+
+ sparse_random_variables = xrst.variables(
+ array_strategy_fn=sparse_random_arrays_fn, dtype=st.just(np.dtype("float64"))
+ )
+ sparse_random_variables.example()
+
+Either approach is fine, but one may be more convenient than the other depending on the type of the duck array which you
+want to wrap.
+
+Compatibility with the Python Array API Standard
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Xarray aims to be compatible with any duck-array type that conforms to the `Python Array API Standard `_
+(see our :ref:`docs on Array API Standard support `).
+
+.. warning::
+
+ The strategies defined in :py:mod:`testing.strategies` are **not** guaranteed to use array API standard-compliant
+ dtypes by default.
+ For example arrays with the dtype ``np.dtype('float16')`` may be generated by :py:func:`testing.strategies.variables`
+ (assuming the ``dtype`` kwarg was not explicitly passed), despite ``np.dtype('float16')`` not being in the
+ array API standard.
+
+If the array type you want to generate has an array API-compliant top-level namespace
+(e.g. that which is conventionally imported as ``xp`` or similar),
+you can use this neat trick:
+
+.. ipython:: python
+ :okwarning:
+
+ from numpy import array_api as xp # available in numpy 1.26.0
+
+ from hypothesis.extra.array_api import make_strategies_namespace
+
+ xps = make_strategies_namespace(xp)
+
+ xp_variables = xrst.variables(
+ array_strategy_fn=xps.arrays,
+ dtype=xps.scalar_dtypes(),
+ )
+ xp_variables.example()
+
+Another array API-compliant duck array library can be used by replacing the import, e.g. with ``import cupy as cp``.
+
+Testing over Subsets of Dimensions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A common task when testing xarray user code is checking that your function works for all valid input dimensions.
+We can chain strategies to achieve this, for which the helper strategy :py:func:`~testing.strategies.unique_subset_of`
+is useful.
+
+It works for lists of dimension names
+
+.. ipython:: python
+
+ dims = ["x", "y", "z"]
+ xrst.unique_subset_of(dims).example()
+ xrst.unique_subset_of(dims).example()
+
+as well as for mappings of dimension names to sizes
+
+.. ipython:: python
+
+ dim_sizes = {"x": 2, "y": 3, "z": 4}
+ xrst.unique_subset_of(dim_sizes).example()
+ xrst.unique_subset_of(dim_sizes).example()
+
+This is useful because operations like reductions can be performed over any subset of the xarray object's dimensions.
+For example, we can write a pytest test checking that a reduction gives the expected result when applied
+along any possible valid subset of the Variable's dimensions.
+
+.. code-block:: python
+
+ import numpy.testing as npt
+
+
+ @given(st.data(), xrst.variables(dims=xrst.dimension_names(min_dims=1)))
+ def test_mean(data, var):
+ """Test that the mean of an xarray Variable is always equal to the mean of the underlying array."""
+
+ # specify arbitrary reduction along at least one dimension
+ reduction_dims = data.draw(xrst.unique_subset_of(var.dims, min_size=1))
+
+ # create expected result (using nanmean because arrays with NaNs will be generated)
+ reduction_axes = tuple(var.get_axis_num(dim) for dim in reduction_dims)
+ expected = np.nanmean(var.data, axis=reduction_axes)
+
+ # assert property is always satisfied
+ result = var.mean(dim=reduction_dims).data
+ npt.assert_equal(expected, result)
diff --git a/doc/user-guide/time-series.rst b/doc/user-guide/time-series.rst
index d2e15adeba7..82172aa8998 100644
--- a/doc/user-guide/time-series.rst
+++ b/doc/user-guide/time-series.rst
@@ -89,7 +89,7 @@ items and with the `slice` object:
.. ipython:: python
- time = pd.date_range("2000-01-01", freq="H", periods=365 * 24)
+ time = pd.date_range("2000-01-01", freq="h", periods=365 * 24)
ds = xr.Dataset({"foo": ("time", np.arange(365 * 24)), "time": time})
ds.sel(time="2000-01")
ds.sel(time=slice("2000-06-01", "2000-06-10"))
@@ -115,7 +115,7 @@ given ``DataArray`` can be quickly computed using a special ``.dt`` accessor.
.. ipython:: python
- time = pd.date_range("2000-01-01", freq="6H", periods=365 * 4)
+ time = pd.date_range("2000-01-01", freq="6h", periods=365 * 4)
ds = xr.Dataset({"foo": ("time", np.arange(365 * 4)), "time": time})
ds.time.dt.hour
ds.time.dt.dayofweek
@@ -207,7 +207,7 @@ For example, we can downsample our dataset from hourly to 6-hourly:
.. ipython:: python
:okwarning:
- ds.resample(time="6H")
+ ds.resample(time="6h")
This will create a specialized ``Resample`` object which saves information
necessary for resampling. All of the reduction methods which work with
@@ -216,14 +216,21 @@ necessary for resampling. All of the reduction methods which work with
.. ipython:: python
:okwarning:
- ds.resample(time="6H").mean()
+ ds.resample(time="6h").mean()
You can also supply an arbitrary reduction function to aggregate over each
resampling group:
.. ipython:: python
- ds.resample(time="6H").reduce(np.mean)
+ ds.resample(time="6h").reduce(np.mean)
+
+You can also resample on the time dimension while simultaneously reducing along other dimensions
+by specifying the ``dim`` keyword argument:
+
+.. code-block:: python
+
+ ds.resample(time="6h").mean(dim=["time", "latitude", "longitude"])
For upsampling, xarray provides six methods: ``asfreq``, ``ffill``, ``bfill``, ``pad``,
``nearest`` and ``interpolate``. ``interpolate`` extends ``scipy.interpolate.interp1d``
@@ -236,8 +243,20 @@ Data that has indices outside of the given ``tolerance`` are set to ``NaN``.
.. ipython:: python
- ds.resample(time="1H").nearest(tolerance="1H")
+ ds.resample(time="1h").nearest(tolerance="1h")
+
+It is often desirable to center the time values after a resampling operation.
+That can be accomplished by updating the resampled dataset time coordinate values
+using time offset arithmetic via the `pandas.tseries.frequencies.to_offset`_ function.
+
+.. _pandas.tseries.frequencies.to_offset: https://pandas.pydata.org/docs/reference/api/pandas.tseries.frequencies.to_offset.html
+
+.. ipython:: python
+ resampled_ds = ds.resample(time="6h").mean()
+ offset = pd.tseries.frequencies.to_offset("6h") / 2
+ resampled_ds["time"] = resampled_ds.get_index("time") + offset
+ resampled_ds
For more examples of using grouped operations on a time dimension, see
:doc:`../examples/weather-data`.
diff --git a/doc/user-guide/weather-climate.rst b/doc/user-guide/weather-climate.rst
index 30876eb36bc..5014f5a8641 100644
--- a/doc/user-guide/weather-climate.rst
+++ b/doc/user-guide/weather-climate.rst
@@ -57,14 +57,14 @@ CF-compliant coordinate variables
.. _CFTimeIndex:
-Non-standard calendars and dates outside the Timestamp-valid range
-------------------------------------------------------------------
+Non-standard calendars and dates outside the nanosecond-precision range
+-----------------------------------------------------------------------
Through the standalone ``cftime`` library and a custom subclass of
:py:class:`pandas.Index`, xarray supports a subset of the indexing
functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
dates from non-standard calendars commonly used in climate science or dates
-using a standard calendar, but outside the `Timestamp-valid range`_
+using a standard calendar, but outside the `nanosecond-precision range`_
(approximately between years 1678 and 2262).
.. note::
@@ -75,13 +75,19 @@ using a standard calendar, but outside the `Timestamp-valid range`_
any of the following are true:
- The dates are from a non-standard calendar
- - Any dates are outside the Timestamp-valid range.
+ - Any dates are outside the nanosecond-precision range.
Otherwise pandas-compatible dates from a standard calendar will be
represented with the ``np.datetime64[ns]`` data type, enabling the use of a
:py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[ns]``
and their full set of associated features.
+ As of pandas version 2.0.0, pandas supports non-nanosecond precision datetime
+ values. For the time being, xarray still automatically casts datetime values
+ to nanosecond-precision for backwards compatibility with older pandas
+ versions; however, this is something we would like to relax going forward.
+ See :issue:`7493` for more discussion.
+
For example, you can create a DataArray indexed by a time
coordinate with dates from a no-leap calendar and a
:py:class:`~xarray.CFTimeIndex` will automatically be used:
@@ -233,8 +239,8 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:
.. ipython:: python
- da.resample(time="81T", closed="right", label="right", offset="3T").mean()
+ da.resample(time="81min", closed="right", label="right", offset="3min").mean()
-.. _Timestamp-valid range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
+.. _nanosecond-precision range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
.. _ISO 8601 standard: https://en.wikipedia.org/wiki/ISO_8601
.. _partial datetime string indexing: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#partial-string-indexing
diff --git a/doc/whats-new.rst b/doc/whats-new.rst
index a7218ba11da..1ef6f86f20a 100644
--- a/doc/whats-new.rst
+++ b/doc/whats-new.rst
@@ -15,9 +15,9 @@ What's New
np.random.seed(123456)
-.. _whats-new.2023.04.0:
+.. _whats-new.2024.03.0:
-v2023.04.0 (unreleased)
+v2024.03.0 (unreleased)
-----------------------
New Features
@@ -25,14 +25,1048 @@ New Features
- Allow control over padding in rolling. (:issue:`2007`, :pr:`5603`).
By `Kevin Squire `_.
+- Do not broadcast in arithmetic operations when global option ``arithmetic_broadcast=False``
+ (:issue:`6806`, :pull:`8784`).
+ By `Etienne Schalk `_ and `Deepak Cherian `_.
+- Add the ``.oindex`` property to Explicitly Indexed Arrays for orthogonal indexing functionality. (:issue:`8238`, :pull:`8750`)
+ By `Anderson Banihirwe `_.
+
+- Add the ``.vindex`` property to Explicitly Indexed Arrays for vectorized indexing functionality. (:issue:`8238`, :pull:`8780`)
+ By `Anderson Banihirwe `_.
+
+- Expand use of ``.oindex`` and ``.vindex`` properties. (:pull:`8790`)
+ By `Anderson Banihirwe `_ and `Deepak Cherian `_.
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+
+
+Deprecations
+~~~~~~~~~~~~
+
+
+Bug fixes
+~~~~~~~~~
+- The default ``freq`` parameter in :py:meth:`xr.date_range` and :py:meth:`xr.cftime_range` is
+ set to ``'D'`` only if ``periods``, ``start``, or ``end`` are ``None`` (:issue:`8770`, :pull:`8774`).
+ By `Roberto Chang `_.
+- Ensure that non-nanosecond precision :py:class:`numpy.datetime64` and
+ :py:class:`numpy.timedelta64` values are cast to nanosecond precision values
+ when used in :py:meth:`DataArray.expand_dims` and
+ :py:meth:`Dataset.expand_dims` (:pull:`8781`). By `Spencer
+ Clark `_.
+- CF-conformant handling of `_FillValue`/`missing_value` and `dtype` in
+ `CFMaskCoder`/`CFScaleOffsetCoder` (:issue:`2304`, :issue:`5597`,
+ :issue:`7691`, :pull:`8713`, see also discussion in :pull:`7654`).
+ By `Kai Mühlbauer `_.
+- Do not cast `_FillValue`/`missing_value` in `CFMaskCoder` if `_Unsigned` is provided
+ (:issue:`8844`, :pull:`8852`).
+- Adapt handling of copy keyword argument in scipy backend for numpy >= 2.0dev
+ (:issue:`8844`, :pull:`8851`).
+ By `Kai Mühlbauer `_.
+
+Documentation
+~~~~~~~~~~~~~
+
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+- Migrates ``treenode`` functionality into ``xarray/core`` (:pull:`8757`)
+ By `Matt Savoie `_ and `Tom Nicholas
+ `_.
+
+
+.. _whats-new.2024.02.0:
+
+v2024.02.0 (Feb 19, 2024)
+-------------------------
+
+This release brings size information to the text ``repr``, changes to the accepted frequency
+strings, and various bug fixes.
+
+Thanks to our 12 contributors:
+
+Anderson Banihirwe, Deepak Cherian, Eivind Jahren, Etienne Schalk, Justus Magin, Marco Wolsza,
+Mathias Hauser, Matt Savoie, Maximilian Roos, Rambaud Pierrick, Tom Nicholas
+
+New Features
+~~~~~~~~~~~~
+
+- Added a simple ``nbytes`` representation in DataArrays and Dataset ``repr``.
+ (:issue:`8690`, :pull:`8702`).
+ By `Etienne Schalk `_.
+- Allow negative frequency strings (e.g. ``"-1YE"``). These strings are for example used in
+ :py:func:`date_range`, and :py:func:`cftime_range` (:pull:`8651`).
+ By `Mathias Hauser `_.
+- Add :py:meth:`NamedArray.expand_dims`, :py:meth:`NamedArray.permute_dims` and
+ :py:meth:`NamedArray.broadcast_to` (:pull:`8380`)
+ By `Anderson Banihirwe `_.
+- Xarray now defers to `flox's heuristics `_
+ to set the default `method` for groupby problems. This only applies to ``flox>=0.9``.
+ By `Deepak Cherian `_.
+- All `quantile` methods (e.g. :py:meth:`DataArray.quantile`) now use `numbagg`
+ for the calculation of nanquantiles (i.e., `skipna=True`) if it is installed.
+ This is currently limited to the linear interpolation method (`method='linear'`).
+ (:issue:`7377`, :pull:`8684`)
+ By `Marco Wolsza `_.
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+
+- :py:func:`infer_freq` always returns the frequency strings as defined in pandas 2.2
+ (:issue:`8612`, :pull:`8627`).
+ By `Mathias Hauser `_.
+
+Deprecations
+~~~~~~~~~~~~
+- The ``dt.weekday_name`` attribute wasn't functional on modern pandas versions and has been
+ removed. (:issue:`8610`, :pull:`8664`)
+ By `Sam Coleman `_.
+
+
+Bug fixes
+~~~~~~~~~
+
+- Fixed a regression that prevented multi-index level coordinates being serialized after resetting
+ or dropping the multi-index (:issue:`8628`, :pull:`8672`).
+ By `Benoit Bovy `_.
+- Fix bug with broadcasting when wrapping array API-compliant classes. (:issue:`8665`, :pull:`8669`)
+ By `Tom Nicholas `_.
+- Ensure :py:meth:`DataArray.unstack` works when wrapping array API-compliant
+ classes. (:issue:`8666`, :pull:`8668`)
+ By `Tom Nicholas `_.
+- Fix negative slicing of Zarr arrays without dask installed. (:issue:`8252`)
+ By `Deepak Cherian `_.
+- Preserve chunks when writing time-like variables to zarr by enabling lazy CF encoding of time-like
+ variables (:issue:`7132`, :issue:`8230`, :issue:`8432`, :pull:`8253`, :pull:`8575`; see also
+ discussion in :pull:`8253`).
+ By `Spencer Clark `_ and `Mattia Almansi `_.
+- Raise an informative error if dtype encoding of time-like variables would lead to integer overflow
+ or unsafe conversion from floating point to integer values (:issue:`8542`, :pull:`8575`).
+ By `Spencer Clark `_.
+- Raise an error when unstacking a MultiIndex that has duplicates as this would lead to silent data
+ loss (:issue:`7104`, :pull:`8737`).
+ By `Mathias Hauser `_.
+
+Documentation
+~~~~~~~~~~~~~
+- Fix `variables` arg typo in `Dataset.sortby()` docstring (:issue:`8663`, :pull:`8670`)
+ By `Tom Vo `_.
+- Fixed documentation where the use of the deprecated pandas frequency string prevented the
+ documentation from being built. (:pull:`8638`)
+ By `Sam Coleman `_.
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+
+- ``DataArray.dt`` now raises an ``AttributeError`` rather than a ``TypeError`` when the data isn't
+ datetime-like. (:issue:`8718`, :pull:`8724`)
+ By `Maximilian Roos `_.
+- Move ``parallelcompat`` and ``chunk managers`` modules from ``xarray/core`` to
+ ``xarray/namedarray``. (:pull:`8319`)
+ By `Tom Nicholas `_ and `Anderson Banihirwe `_.
+- Imports ``datatree`` repository and history into internal location. (:pull:`8688`)
+ By `Matt Savoie `_, `Justus Magin `_
+ and `Tom Nicholas `_.
+- Adds :py:func:`open_datatree` into ``xarray/backends`` (:pull:`8697`)
+ By `Matt Savoie `_ and `Tom Nicholas
+ `_.
+- Refactor :py:meth:`xarray.core.indexing.DaskIndexingAdapter.__getitem__` to remove an unnecessary
+ rewrite of the indexer key (:issue:`8377`, :pull:`8758`)
+ By `Anderson Banihirwe `_.
+
+.. _whats-new.2024.01.1:
+
+v2024.01.1 (23 Jan, 2024)
+-------------------------
+
+This release fixes a bug with the rendering of the documentation, and it also includes changes to the handling of pandas frequency strings.
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+
+- Following pandas, :py:meth:`infer_freq` will return ``"YE"``, instead of ``"Y"`` (formerly ``"A"``).
+ This is to be consistent with the deprecation of the latter frequency string in pandas 2.2.
+ This is a follow up to :pull:`8415` (:issue:`8612`, :pull:`8642`).
+ By `Mathias Hauser `_.
+
+Deprecations
+~~~~~~~~~~~~
+
+- Following pandas, the frequency string ``"Y"`` (formerly ``"A"``) is deprecated in
+ favor of ``"YE"``. These strings are used, for example, in :py:func:`date_range`,
+ :py:func:`cftime_range`, :py:meth:`DataArray.resample`, and :py:meth:`Dataset.resample`
+ among others (:issue:`8612`, :pull:`8629`).
+ By `Mathias Hauser `_.
+
+Documentation
+~~~~~~~~~~~~~
+
+- Pin ``sphinx-book-theme`` to ``1.0.1`` to fix a rendering issue with the sidebar in the docs. (:issue:`8619`, :pull:`8632`)
+ By `Tom Nicholas `_.
+
+.. _whats-new.2024.01.0:
+
+v2024.01.0 (17 Jan, 2024)
+-------------------------
+
+This release brings support for weights in correlation and covariance functions,
+a new `DataArray.cumulative` aggregation, improvements to `xr.map_blocks`,
+an update to our minimum dependencies, and various bugfixes.
+
+Thanks to our 17 contributors to this release:
+
+Abel Aoun, Deepak Cherian, Illviljan, Johan Mathe, Justus Magin, Kai Mühlbauer,
+Llorenç Lledó, Mark Harfouche, Markel, Mathias Hauser, Maximilian Roos, Michael Niklas,
+Niclas Rieger, Sébastien Celles, Tom Nicholas, Trinh Quoc Anh, and crusaderky.
+
+New Features
+~~~~~~~~~~~~
+
+- :py:meth:`xr.cov` and :py:meth:`xr.corr` now support using weights (:issue:`8527`, :pull:`7392`).
+ By `Llorenç Lledó `_.
+- Accept the compression arguments new in netCDF 1.6.0 in the netCDF4 backend.
+ See `netCDF4 documentation `_ for details.
+ Note that some new compression filters need plugins to be installed, which may not be available in all netCDF distributions.
+ By `Markel García-Díez `_. (:issue:`6929`, :pull:`7551`)
+- Add :py:meth:`DataArray.cumulative` & :py:meth:`Dataset.cumulative` to compute
+ cumulative aggregations, such as ``sum``, along a dimension — for example
+ ``da.cumulative('time').sum()``. This is similar to pandas' ``.expanding``,
+ and mostly equivalent to ``.cumsum`` methods, or to
+ :py:meth:`DataArray.rolling` with a window length equal to the dimension size.
+ By `Maximilian Roos `_. (:pull:`8512`)
+- Decode/Encode netCDF4 enums and store the enum definition in dataarrays' dtype metadata.
+ If multiple variables share the same enum in netCDF4, each dataarray will have its own
+ enum definition in their respective dtype metadata.
+ By `Abel Aoun `_. (:issue:`8144`, :pull:`8147`)
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+
+- The minimum versions of some dependencies were changed (:pull:`8586`):
+
+ ===================== ========= ========
+ Package Old New
+ ===================== ========= ========
+ cartopy 0.20 0.21
+ dask-core 2022.7 2022.12
+ distributed 2022.7 2022.12
+ flox 0.5 0.7
+ iris 3.2 3.4
+ matplotlib-base 3.5 3.6
+ numpy 1.22 1.23
+ numba 0.55 0.56
+ packaging 21.3 22.0
+ seaborn 0.11 0.12
+ scipy 1.8 1.10
+ typing_extensions 4.3 4.4
+ zarr 2.12 2.13
+ ===================== ========= ========
+
+Deprecations
+~~~~~~~~~~~~
+
+- The `squeeze` kwarg to GroupBy is now deprecated. (:issue:`2157`, :pull:`8507`)
+ By `Deepak Cherian `_.
+
+Bug fixes
+~~~~~~~~~
+
+- Support non-string hashable dimensions in :py:class:`xarray.DataArray` (:issue:`8546`, :pull:`8559`).
+ By `Michael Niklas `_.
+- Reverse index output of bottleneck's rolling move_argmax/move_argmin functions (:issue:`8541`, :pull:`8552`).
+ By `Kai Mühlbauer `_.
+- Vendor `SerializableLock` from dask and use as default lock for netcdf4 backends (:issue:`8442`, :pull:`8571`).
+ By `Kai Mühlbauer `_.
+- Add tests and fixes for empty :py:class:`CFTimeIndex`, including broken html repr (:issue:`7298`, :pull:`8600`).
+ By `Mathias Hauser `_.
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+
+- The implementation of :py:func:`map_blocks` has changed to minimize graph size and duplication of data.
+ This should be a strict improvement even though the graphs are not always embarrassingly parallel any more.
+ Please open an issue if you spot a regression. (:pull:`8412`, :issue:`8409`).
+ By `Deepak Cherian `_.
+- Remove null values before plotting. (:pull:`8535`).
+ By `Jimmy Westling `_.
+- Redirect cumulative reduction functions internally through the :py:class:`ChunkManagerEntryPoint`,
+ potentially allowing :py:meth:`~xarray.DataArray.ffill` and :py:meth:`~xarray.DataArray.bfill` to
+ use non-dask chunked array types.
+ (:pull:`8019`) By `Tom Nicholas `_.
+
+.. _whats-new.2023.12.0:
+
+v2023.12.0 (2023 Dec 08)
+------------------------
+
+This release brings new `hypothesis `_ strategies for testing, significantly faster rolling aggregations as well as
+``ffill`` and ``bfill`` with ``numbagg``, a new :py:meth:`Dataset.eval` method, and improvements to
+reading and writing Zarr arrays (including a new ``"a-"`` mode).
+
+Thanks to our 16 contributors:
+
+Anderson Banihirwe, Ben Mares, Carl Andersson, Deepak Cherian, Doug Latornell, Gregorio L. Trevisan, Illviljan, Jens Hedegaard Nielsen, Justus Magin, Mathias Hauser, Max Jones, Maximilian Roos, Michael Niklas, Patrick Hoefler, Ryan Abernathey, Tom Nicholas
+
+New Features
+~~~~~~~~~~~~
+
+- Added hypothesis strategies for generating :py:class:`xarray.Variable` objects containing arbitrary data, useful for parametrizing downstream tests.
+ Accessible under :py:mod:`testing.strategies`, and documented in a new page on testing in the User Guide.
+ (:issue:`6911`, :pull:`8404`)
+ By `Tom Nicholas `_.
+- :py:meth:`rolling` uses `numbagg `_ for
+ most of its computations by default. Numbagg is up to 5x faster than bottleneck
+ where parallelization is possible. Where parallelization isn't possible — for
+ example a 1D array — it's about the same speed as bottleneck, and 2-5x faster
+ than pandas' default functions. (:pull:`8493`). numbagg is an optional
+ dependency, so requires installing separately.
+- Use a concise format when plotting datetime arrays. (:pull:`8449`).
+ By `Jimmy Westling `_.
+- Avoid overwriting unchanged existing coordinate variables when appending with :py:meth:`Dataset.to_zarr` by setting ``mode='a-'``.
+ By `Ryan Abernathey `_ and `Deepak Cherian `_.
+- :py:meth:`~xarray.DataArray.rank` now operates on dask-backed arrays, assuming
+ the core dim has exactly one chunk. (:pull:`8475`).
+ By `Maximilian Roos `_.
+- Add a :py:meth:`Dataset.eval` method, similar to pandas' method of the
+ same name. (:pull:`7163`). This is currently marked as experimental and
+ doesn't yet support the ``numexpr`` engine.
+- :py:meth:`Dataset.drop_vars` & :py:meth:`DataArray.drop_vars` allow passing a
+ callable, similar to :py:meth:`Dataset.where` & :py:meth:`Dataset.sortby` & others.
+ (:pull:`8511`).
+ By `Maximilian Roos `_.
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+
+- Explicitly warn when creating xarray objects with repeated dimension names.
+ Such objects will also now raise when :py:meth:`DataArray.get_axis_num` is called,
+ which means many functions will raise.
+ This latter change is technically a breaking change, but whilst allowed,
+ this behaviour was never actually supported! (:issue:`3731`, :pull:`8491`)
+ By `Tom Nicholas `_.
+
+Deprecations
+~~~~~~~~~~~~
+- As part of an effort to standardize the API, we're renaming the ``dims``
+ keyword arg to ``dim`` for the minority of functions which currently use
+ ``dims``. This started with :py:func:`xarray.dot` & :py:meth:`DataArray.dot`
+ and we'll gradually roll this out across all functions. The warnings are
+ currently ``PendingDeprecationWarning``, which are silenced by default. We'll
+ convert these to ``DeprecationWarning`` in a future release.
+ By `Maximilian Roos `_.
+- Raise a ``FutureWarning`` warning that the type of :py:meth:`Dataset.dims` will be changed
+ from a mapping of dimension names to lengths to a set of dimension names.
+ This is to increase consistency with :py:meth:`DataArray.dims`.
+ To access a mapping of dimension names to lengths please use :py:meth:`Dataset.sizes`.
+ The same change also applies to `DatasetGroupBy.dims`.
+ (:issue:`8496`, :pull:`8500`)
+ By `Tom Nicholas `_.
+- :py:meth:`Dataset.drop` & :py:meth:`DataArray.drop` are now deprecated, having been pending
+ deprecation for several years. :py:meth:`DataArray.drop_sel` & :py:meth:`DataArray.drop_vars`
+ replace them for labels & variables respectively. (:pull:`8497`)
+ By `Maximilian Roos `_.
+
+Bug fixes
+~~~~~~~~~
+
+- Fix dtype inference for ``pd.CategoricalIndex`` when categories are backed by a ``pd.ExtensionDtype`` (:pull:`8481`)
+- Fix writing a variable that requires transposing when not writing to a region (:pull:`8484`)
+ By `Maximilian Roos `_.
+- Static typing of ``p0`` and ``bounds`` arguments of :py:func:`xarray.DataArray.curvefit` and :py:func:`xarray.Dataset.curvefit`
+ was changed to ``Mapping`` (:pull:`8502`).
+ By `Michael Niklas `_.
+- Fix typing of :py:func:`xarray.DataArray.to_netcdf` and :py:func:`xarray.Dataset.to_netcdf`
+ when ``compute`` is evaluated to bool instead of a Literal (:pull:`8268`).
+ By `Jens Hedegaard Nielsen `_.
+
+Documentation
+~~~~~~~~~~~~~
+
+- Added illustration of updating the time coordinate values of a resampled dataset using
+ time offset arithmetic.
+ This is the recommended technique to replace the use of the deprecated ``loffset`` parameter
+ in ``resample`` (:pull:`8479`).
+ By `Doug Latornell `_.
+- Improved error message when attempting to get a variable which doesn't exist from a Dataset.
+ (:pull:`8474`)
+ By `Maximilian Roos `_.
+- Fix default value of ``combine_attrs`` in :py:func:`xarray.combine_by_coords` (:pull:`8471`)
+ By `Gregorio L. Trevisan `_.
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+
+- :py:meth:`DataArray.bfill` & :py:meth:`DataArray.ffill` now use `numbagg `_ by
+ default, which is up to 5x faster where parallelization is possible. (:pull:`8339`)
+ By `Maximilian Roos `_.
+- Update mypy version to 1.7 (:issue:`8448`, :pull:`8501`).
+ By `Michael Niklas `_.
+
+.. _whats-new.2023.11.0:
+
+v2023.11.0 (Nov 16, 2023)
+-------------------------
+
+
+.. tip::
+
+ `This is our 10th year anniversary release! `_ Thank you for your love and support.
+
+
+This release brings the ability to use ``opt_einsum`` for :py:func:`xarray.dot` by default,
+support for auto-detecting ``region`` when writing partial datasets to Zarr, and the use of h5py
+drivers with ``h5netcdf``.
+
+Thanks to the 19 contributors to this release:
+Aman Bagrecha, Anderson Banihirwe, Ben Mares, Deepak Cherian, Dimitri Papadopoulos Orfanos, Ezequiel Cimadevilla Alvarez,
+Illviljan, Justus Magin, Katelyn FitzGerald, Kai Muehlbauer, Martin Durant, Maximilian Roos, Metamess, Sam Levang, Spencer Clark, Tom Nicholas, mgunyho, templiert
+
+New Features
+~~~~~~~~~~~~
+
+- Use `opt_einsum `_ for :py:func:`xarray.dot` by default if installed.
+ By `Deepak Cherian `_. (:issue:`7764`, :pull:`8373`).
+- Add ``DataArray.dt.total_seconds()`` method to match the Pandas API. (:pull:`8435`).
+ By `Ben Mares `_.
+- Allow passing ``region="auto"`` in :py:meth:`Dataset.to_zarr` to automatically infer the
+ region to write in the original store. Also implement automatic transpose when dimension
+ order does not match the original store. (:issue:`7702`, :issue:`8421`, :pull:`8434`).
+ By `Sam Levang `_.
+- Allow the usage of h5py drivers (e.g. ros3) via h5netcdf (:pull:`8360`).
+ By `Ezequiel Cimadevilla `_.
+- Enable VLEN string fill_values, preserve VLEN string dtypes (:issue:`1647`, :issue:`7652`, :issue:`7868`, :pull:`7869`).
+ By `Kai Mühlbauer `_.
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+- drop support for `cdms2 `_. Please use
+ `xcdat `_ instead (:pull:`8441`).
+ By `Justus Magin `_.
+- Following pandas, :py:meth:`infer_freq` will return ``"Y"``, ``"YS"``,
+ ``"QE"``, ``"ME"``, ``"h"``, ``"min"``, ``"s"``, ``"ms"``, ``"us"``, or
+ ``"ns"`` instead of ``"A"``, ``"AS"``, ``"Q"``, ``"M"``, ``"H"``, ``"T"``,
+ ``"S"``, ``"L"``, ``"U"``, or ``"N"``. This is to be consistent with the
+ deprecation of the latter frequency strings (:issue:`8394`, :pull:`8415`). By
+ `Spencer Clark `_.
+- Bump minimum tested pint version to ``>=0.22``. By `Deepak Cherian `_.
+- Minimum supported versions for the following packages have changed: ``h5py >=3.7``, ``h5netcdf>=1.1``.
+ By `Kai Mühlbauer `_.
+
+Deprecations
+~~~~~~~~~~~~
+- The PseudoNetCDF backend has been removed. By `Deepak Cherian `_.
+- Supplying dimension-ordered sequences to :py:meth:`DataArray.chunk` &
+ :py:meth:`Dataset.chunk` is deprecated in favor of supplying a dictionary of
+ dimensions, or a single ``int`` or ``"auto"`` argument covering all
+ dimensions. Xarray favors using dimensions names rather than positions, and
+ this was one place in the API where dimension positions were used.
+ (:pull:`8341`)
+ By `Maximilian Roos `_.
+- Following pandas, the frequency strings ``"A"``, ``"AS"``, ``"Q"``, ``"M"``,
+ ``"H"``, ``"T"``, ``"S"``, ``"L"``, ``"U"``, and ``"N"`` are deprecated in
+ favor of ``"Y"``, ``"YS"``, ``"QE"``, ``"ME"``, ``"h"``, ``"min"``, ``"s"``,
+ ``"ms"``, ``"us"``, and ``"ns"``, respectively. These strings are used, for
+ example, in :py:func:`date_range`, :py:func:`cftime_range`,
+ :py:meth:`DataArray.resample`, and :py:meth:`Dataset.resample` among others
+ (:issue:`8394`, :pull:`8415`). By `Spencer Clark
+ `_.
+- Rename :py:meth:`Dataset.to_array` to :py:meth:`Dataset.to_dataarray` for
+ consistency with :py:meth:`DataArray.to_dataset` &
+ :py:func:`open_dataarray` functions. This is a "soft" deprecation — the
+ existing methods work and don't raise any warnings, given the relatively small
+ benefits of the change.
+ By `Maximilian Roos `_.
+- Finally remove ``keep_attrs`` kwarg from :py:meth:`DataArray.resample` and
+ :py:meth:`Dataset.resample`. These were deprecated a long time ago.
+ By `Deepak Cherian `_.
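+
+A minimal sketch of the preferred dictionary-based chunking (requires ``dask``):
+
+.. code-block:: python
+
+    import numpy as np
+    import xarray as xr
+
+    da = xr.DataArray(np.zeros((4, 6)), dims=("x", "y"))
+
+    # deprecated: a dimension-ordered sequence, e.g. da.chunk((2, 3))
+    # preferred: an explicit mapping from dimension name to chunk size
+    chunked = da.chunk({"x": 2, "y": 3})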
+
+Bug fixes
+~~~~~~~~~
+
+- Port a `bug fix from pandas `_
+  to xarray's resample implementation for data indexed by a
+  :py:class:`CFTimeIndex`: resample bin edges are no longer adjusted when the
+  resampling frequency has units of days and is greater than one day
+  (e.g. ``"2D"``, ``"3D"``) and the ``closed`` argument is set to ``"right"``
+  (:pull:`8393`).
+  By `Spencer Clark `_.
+- Fix a regression so that date offset strings are once again accepted as
+  input to the ``loffset`` parameter of ``resample``, and add tests for this
+  functionality (:pull:`8422`, :issue:`8399`).
+  By `Katelyn FitzGerald `_.
+- Fix a bug where :py:meth:`DataArray.to_dataset` silently drops a variable
+ if a coordinate with the same name already exists (:pull:`8433`, :issue:`7823`).
+ By `András Gunyhó `_.
+- Fix :py:meth:`DataArray.to_zarr` & :py:meth:`Dataset.to_zarr` to close the
+  created Zarr store when passing a path with a ``.zip`` extension
+  (:pull:`8425`); see the sketch after this list.
+  By `Carl Andersson `_.
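+
+A minimal sketch of the zip-store fix (requires ``zarr``; the archive name is
+hypothetical):
+
+.. code-block:: python
+
+    import numpy as np
+    import xarray as xr
+
+    ds = xr.Dataset({"a": ("x", np.arange(3))})
+
+    # the ZipStore created for a ".zip" path is now closed when the write
+    # returns, so the archive is finalized and readable afterwards
+    ds.to_zarr("archive.zarr.zip", mode="w")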
+
+Documentation
+~~~~~~~~~~~~~
+- Small updates to the documentation on distributed writes: see
+  :ref:`io.zarr.appending` for appending to Zarr stores.
+  By `Deepak Cherian `_.
+
+.. _whats-new.2023.10.1:
+
+v2023.10.1 (19 Oct, 2023)
+-------------------------
+
+This release updates our minimum numpy version in ``pyproject.toml`` to 1.22,
+consistent with our documentation below.
+
+.. _whats-new.2023.10.0:
+
+v2023.10.0 (19 Oct, 2023)
+-------------------------
+
+This release brings performance enhancements to reading Zarr datasets, the ability to use `numbagg `_ for reductions,
+an expanded ``rolling_exp`` API, fixes for two regressions in datetime decoding,
+and many other bugfixes and improvements. Groupby reductions will also use ``numbagg`` if ``flox>=0.8.1`` and ``numbagg`` are both installed.
+
+Thanks to our 13 contributors:
+Anderson Banihirwe, Bart Schilperoort, Deepak Cherian, Illviljan, Kai Mühlbauer, Mathias Hauser, Maximilian Roos, Michael Niklas, Pieter Eendebak, Simon Høxbro Hansen, Spencer Clark, Tom White, olimcc
+
+New Features
+~~~~~~~~~~~~
+- Support high-performance reductions with `numbagg `_.
+ This is enabled by default if ``numbagg`` is installed.
+ By `Deepak Cherian `_. (:pull:`8316`)
+- Add ``corr``, ``cov``, ``std`` & ``var`` to ``.rolling_exp``.
+ By `Maximilian Roos `_. (:pull:`8307`)
+- :py:meth:`DataArray.where` & :py:meth:`Dataset.where` accept a callable for
+ the ``other`` parameter, passing the object as the only argument. Previously,
+  this was only valid for the ``cond`` parameter (:issue:`8255`); see the
+  sketch after this list.
+  By `Maximilian Roos `_.
+- ``.rolling_exp`` functions can now take a ``min_weight`` parameter, to only
+ output values when there are sufficient recent non-nan values.
+ ``numbagg>=0.3.1`` is required. (:pull:`8285`)
+ By `Maximilian Roos `_.
+- :py:meth:`DataArray.sortby` & :py:meth:`Dataset.sortby` accept a callable for
+  the ``variables`` parameter, passing the object as the only argument; the
+  sketch after this list shows an example.
+  By `Maximilian Roos `_.
+- ``.rolling_exp`` functions can now operate on dask-backed arrays, assuming the
+ core dim has exactly one chunk. (:pull:`8284`).
+ By `Maximilian Roos `_.
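+
+A minimal sketch of the new callable arguments to ``where`` and ``sortby``:
+
+.. code-block:: python
+
+    import numpy as np
+    import xarray as xr
+
+    da = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y"))
+
+    # ``other`` may now be a callable receiving the object itself:
+    # keep values below 4 and replace the rest with the array's mean
+    filled = da.where(da < 4, lambda obj: obj.mean())
+
+    # ``variables`` may likewise be a callable: sort along x by the values of "a"
+    ds = xr.Dataset({"a": ("x", [3, 1, 2])})
+    ordered = ds.sortby(lambda obj: obj["a"])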
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+
+- Made more arguments keyword-only (e.g. ``keep_attrs``, ``skipna``) for many :py:class:`xarray.DataArray` and
+ :py:class:`xarray.Dataset` methods (:pull:`6403`). By `Mathias Hauser `_.
+- :py:meth:`Dataset.to_zarr` & :py:meth:`DataArray.to_zarr` now require all
+  arguments after the initial 7 positional ones to be passed by keyword.
+  By `Maximilian Roos `_.
+
+
+Deprecations
+~~~~~~~~~~~~
+- Rename :py:meth:`Dataset.reset_encoding` & :py:meth:`DataArray.reset_encoding`
+ to :py:meth:`Dataset.drop_encoding` & :py:meth:`DataArray.drop_encoding` for
+ consistency with other ``drop`` & ``reset`` methods — ``drop`` generally
+ removes something, while ``reset`` generally resets to some default or
+  standard value (:pull:`8287`, :issue:`8259`); see the sketch below.
+  By `Maximilian Roos `_.
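+
+A minimal sketch under the new name:
+
+.. code-block:: python
+
+    import xarray as xr
+
+    ds = xr.Dataset({"a": ("x", [1.0, 2.0])})
+    ds["a"].encoding["dtype"] = "float32"
+
+    # drop_encoding returns a copy whose variables have empty encoding dicts
+    clean = ds.drop_encoding()
+    assert clean["a"].encoding == {}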
+
+Bug fixes
+~~~~~~~~~
+
+- :py:meth:`DataArray.rename` & :py:meth:`Dataset.rename` would emit a warning
+ when the operation was a no-op. (:issue:`8266`)
+ By `Simon Hansen