forked from pydata/xarray

Merge remote-tracking branch 'upstream/master' into groupby-plot
* upstream/master:
  allow passing any iterable to drop when dropping variables (pydata#3693)
  Typo on DataSet/DataArray.to_dict documentation (pydata#3692)
  Fix mypy type checking tests failure in ds.merge (pydata#3690)
  Explicitly convert result of pd.to_datetime to a timezone-naive type (pydata#3688)
  ds.merge(da) bugfix (pydata#3677)
  fix docstring for combine_first: returns a Dataset (pydata#3683)
  Add option to choose mfdataset attributes source. (pydata#3498)
  How do I add a new variable to dataset. (pydata#3679)
  Add map_blocks example to whats-new (pydata#3682)
  Make dask names change when chunking Variables by different amounts. (pydata#3584)
  raise an error when renaming dimensions to existing names (pydata#3645)
  Support swap_dims to dimension names that are not existing variables (pydata#3636)
  Add map_blocks example to docs. (pydata#3667)
  add multiindex level name checking to .rename() (pydata#3658)
dcherian committed Jan 14, 2020
2 parents cb78770 + e0fd480 commit a834dde
Showing 14 changed files with 240 additions and 30 deletions.
2 changes: 2 additions & 0 deletions doc/data-structures.rst
@@ -353,6 +353,8 @@ setting) variables and attributes:
This is particularly useful in an exploratory context, because you can
tab-complete these variable names with tools like IPython.

.. _dictionary_like_methods:

Dictionary like methods
~~~~~~~~~~~~~~~~~~~~~~~

3 changes: 3 additions & 0 deletions doc/howdoi.rst
@@ -11,6 +11,8 @@ How do I ...

* - How do I...
- Solution
* - add a DataArray to my dataset as a new variable
- ``my_dataset[varname] = my_dataArray`` or :py:meth:`Dataset.assign` (see also :ref:`dictionary_like_methods`)
* - add variables from other datasets to my dataset
- :py:meth:`Dataset.merge`
* - add a new dimension and/or coordinate
@@ -57,3 +59,4 @@ How do I ...
- ``obj.dt.ceil``, ``obj.dt.floor``, ``obj.dt.round``. See :ref:`dt_accessor` for more.
* - make a mask that is ``True`` where an object contains any of the values in a array
- :py:meth:`Dataset.isin`, :py:meth:`DataArray.isin`

22 changes: 22 additions & 0 deletions doc/whats-new.rst
@@ -37,13 +37,20 @@ New Features
- Added the ``count`` reduction method to both :py:class:`~core.rolling.DatasetCoarsen`
and :py:class:`~core.rolling.DataArrayCoarsen` objects. (:pull:`3500`)
By `Deepak Cherian <https://github.com/dcherian>`_
- Add `attrs_file` option in :py:func:`~xarray.open_mfdataset` to choose the
source file for global attributes in a multi-file dataset (:issue:`2382`,
:pull:`3498`) by `Julien Seguinot <https://github.com/juseg>_`.
- :py:meth:`Dataset.swap_dims` and :py:meth:`DataArray.swap_dims`
now allow swapping to dimension names that don't exist yet. (:pull:`3636`)
By `Justus Magin <https://github.com/keewis>`_.
- Extend :py:class:`core.accessor_dt.DatetimeAccessor` properties
and support `.dt` accessor for timedelta
via :py:class:`core.accessor_dt.TimedeltaAccessor` (:pull:`3612`)
By `Anderson Banihirwe <https://github.com/andersy005>`_.

Bug fixes
~~~~~~~~~

- Fix :py:meth:`xarray.combine_by_coords` to allow for combining incomplete
hypercubes of Datasets (:issue:`3648`). By `Ian Bolliger
<https://github.com/bolliger32>`_.
@@ -62,6 +69,16 @@ Bug fixes
By `Tom Augspurger <https://github.com/TomAugspurger>`_.
- Ensure :py:meth:`Dataset.quantile`, :py:meth:`DataArray.quantile` issue the correct error
when ``q`` is out of bounds (:issue:`3634`) by `Mathias Hauser <https://github.com/mathause>`_.
- Raise an error when trying to use :py:meth:`Dataset.rename_dims` to
rename to an existing name (:issue:`3438`, :pull:`3645`)
By `Justus Magin <https://github.com/keewis>`_.
- :py:meth:`Dataset.rename`, :py:meth:`DataArray.rename` now check for conflicts with
MultiIndex level names.
- :py:meth:`Dataset.merge` no longer fails when passed a `DataArray` instead of a `Dataset` object.
By `Tom Nicholas <https://github.com/TomNicholas>`_.
- Fix a regression in :py:meth:`Dataset.drop`: allow passing any
iterable when dropping variables (:issue:`3552`, :pull:`3693`)
By `Justus Magin <https://github.com/keewis>`_.

Documentation
~~~~~~~~~~~~~
@@ -80,9 +97,14 @@ Documentation
- Added examples for :py:meth:`DataArray.quantile`, :py:meth:`Dataset.quantile` and
``GroupBy.quantile``. (:pull:`3576`)
By `Justus Magin <https://github.com/keewis>`_.
- Added example for :py:func:`~xarray.map_blocks`. (:pull:`3667`)
By `Riley X. Brady <https://github.com/bradyrx>`_.

Internal Changes
~~~~~~~~~~~~~~~~
- Make sure dask names change when rechunking by different chunk sizes. Conversely, make sure they
stay the same when rechunking by the same chunk size. (:issue:`3350`)
By `Deepak Cherian <https://github.com/dcherian>`_.
- 2x to 5x speed boost (on small arrays) for :py:meth:`Dataset.isel`,
:py:meth:`DataArray.isel`, and :py:meth:`DataArray.__getitem__` when indexing by int,
slice, list of int, scalar ndarray, or 1-dimensional ndarray.
19 changes: 16 additions & 3 deletions xarray/backends/api.py
@@ -718,6 +718,7 @@ def open_mfdataset(
autoclose=None,
parallel=False,
join="outer",
attrs_file=None,
**kwargs,
):
"""Open multiple files as a single dataset.
@@ -729,8 +730,8 @@ def open_mfdataset(
``combine_by_coords`` and ``combine_nested``. By default the old (now deprecated)
``auto_combine`` will be used, please specify either ``combine='by_coords'`` or
``combine='nested'`` in future. Requires dask to be installed. See documentation for
details on dask [1]_. Attributes from the first dataset file are used for the
combined dataset.
details on dask [1]_. Global attributes from the ``attrs_file`` are used
for the combined dataset.
Parameters
----------
@@ -827,6 +828,10 @@ def open_mfdataset(
- 'override': if indexes are of same size, rewrite indexes to be
those of the first object with that dimension. Indexes for the same
dimension must have the same size in all objects.
attrs_file : str or pathlib.Path, optional
Path of the file used to read global attributes from.
By default global attributes are read from the first file provided,
with wildcard matches sorted by filename.
**kwargs : optional
Additional arguments passed on to :py:func:`xarray.open_dataset`.
@@ -961,7 +966,15 @@ def open_mfdataset(
raise

combined._file_obj = _MultiFileCloser(file_objs)
combined.attrs = datasets[0].attrs

# read global attributes from the attrs_file or from the first dataset
if attrs_file is not None:
if isinstance(attrs_file, Path):
attrs_file = str(attrs_file)
combined.attrs = datasets[paths.index(attrs_file)].attrs
else:
combined.attrs = datasets[0].attrs

return combined
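A minimal sketch of the `attrs_file` selection logic above, using plain dicts in place of datasets. The helper name `select_global_attrs` and the sample file names are hypothetical; the sketch assumes, as the `paths.index(attrs_file)` call implies, that `paths` holds strings in the same order as `datasets`.

```python
from pathlib import Path
from typing import Dict, List, Optional, Union


def select_global_attrs(
    paths: List[str],
    attrs_per_file: List[Dict[str, str]],
    attrs_file: Optional[Union[str, Path]] = None,
) -> Dict[str, str]:
    """Hypothetical helper mirroring the attrs_file branch of open_mfdataset."""
    if attrs_file is not None:
        # Path objects are converted to str so they can be compared with `paths`
        if isinstance(attrs_file, Path):
            attrs_file = str(attrs_file)
        # exact string lookup; list.index raises ValueError if there is no match
        return attrs_per_file[paths.index(attrs_file)]
    # default: global attributes come from the first file
    return attrs_per_file[0]


paths = ["a.nc", "b.nc"]
attrs = [{"source": "a"}, {"source": "b"}]
print(select_global_attrs(paths, attrs))  # {'source': 'a'}
print(select_global_attrs(paths, attrs, Path("b.nc")))  # {'source': 'b'}
```

Because the lookup is an exact string match, in this sketch `attrs_file` has to be written the same way as the matching entry in `paths`; an equivalent path spelled differently (for example absolute instead of relative) raises `ValueError`.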


12 changes: 9 additions & 3 deletions xarray/core/dataarray.py
@@ -1480,8 +1480,7 @@ def swap_dims(self, dims_dict: Mapping[Hashable, Hashable]) -> "DataArray":
----------
dims_dict : dict-like
Dictionary whose keys are current dimension names and whose values
are new names. Each value must already be a coordinate on this
array.
are new names.
Returns
-------
@@ -1504,6 +1503,13 @@ def swap_dims(self, dims_dict: Mapping[Hashable, Hashable]) -> "DataArray":
Coordinates:
x (y) <U1 'a' 'b'
* y (y) int64 0 1
>>> arr.swap_dims({"x": "z"})
<xarray.DataArray (z: 2)>
array([0, 1])
Coordinates:
x (z) <U1 'a' 'b'
y (z) int64 0 1
Dimensions without coordinates: z
See Also
--------
@@ -2362,7 +2368,7 @@ def to_dict(self, data: bool = True) -> dict:
naming conventions.
Converts all variables and attributes to native Python objects.
Useful for coverting to json. To avoid datetime incompatibility
Useful for converting to json. To avoid datetime incompatibility
use decode_times=False kwarg in xarrray.open_dataset.
Parameters
51 changes: 38 additions & 13 deletions xarray/core/dataset.py
@@ -85,11 +85,16 @@
either_dict_or_kwargs,
hashable,
is_dict_like,
is_list_like,
is_scalar,
maybe_wrap_array,
)
from .variable import IndexVariable, Variable, as_variable, broadcast_variables
from .variable import (
IndexVariable,
Variable,
as_variable,
broadcast_variables,
assert_unique_multiindex_level_names,
)

if TYPE_CHECKING:
from ..backends import AbstractDataStore, ZarrStore
@@ -1748,7 +1753,10 @@ def maybe_chunk(name, var, chunks):
if not chunks:
chunks = None
if var.ndim > 0:
token2 = tokenize(name, token if token else var._data)
# when rechunking by different amounts, make sure dask names change
# by provinding chunks as an input to tokenize.
# subtle bugs result otherwise. see GH3350
token2 = tokenize(name, token if token else var._data, chunks)
name2 = f"{name_prefix}{name}-{token2}"
return var.chunk(chunks, name=name2, lock=lock)
else:
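The `maybe_chunk` change adds `chunks` to the `tokenize` inputs, so rechunking the same data by different amounts yields different dask names while identical chunks keep the same name. A stdlib-only sketch of that effect: `toy_tokenize` is a hypothetical, deterministic stand-in for `dask.base.tokenize` (it hashes the `repr` of its inputs), and `chunked_name` and the `"xarray-"` prefix are illustrative assumptions.

```python
import hashlib
from typing import Any, Mapping


def toy_tokenize(*args: Any) -> str:
    """Deterministic stand-in for dask.base.tokenize: hash the repr of the inputs."""
    return hashlib.sha256(repr(args).encode()).hexdigest()[:16]


def chunked_name(name: str, data_token: str, chunks: Mapping[str, int],
                 include_chunks: bool = True, name_prefix: str = "xarray-") -> str:
    # include_chunks=False reproduces the pre-fix token, which ignored chunk sizes
    inputs = (name, data_token, dict(chunks)) if include_chunks else (name, data_token)
    return f"{name_prefix}{name}-{toy_tokenize(*inputs)}"


old_a = chunked_name("foo", "data-token", {"x": 5}, include_chunks=False)
old_b = chunked_name("foo", "data-token", {"x": 10}, include_chunks=False)
new_a = chunked_name("foo", "data-token", {"x": 5})
new_b = chunked_name("foo", "data-token", {"x": 10})
print(old_a == old_b)  # True: same name despite different chunks
print(new_a == new_b)  # False: chunk sizes now change the name
```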
@@ -2780,6 +2788,7 @@ def rename(
variables, coord_names, dims, indexes = self._rename_all(
name_dict=name_dict, dims_dict=name_dict
)
assert_unique_multiindex_level_names(variables)
return self._replace(variables, coord_names, dims=dims, indexes=indexes)
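`Dataset.rename` now runs `assert_unique_multiindex_level_names` on the renamed variables. Below is a simplified, hypothetical sketch of the kind of check involved, not xarray's implementation: variables are reduced to a mapping from variable name to MultiIndex level names (`None` for a plain variable), and only two conflicts are covered, a level name used twice and a level name that equals a variable name.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Optional


def check_unique_level_names(levels_by_var: Mapping[str, Optional[List[str]]]) -> None:
    """Raise ValueError if a MultiIndex level name is reused or shadows a variable name."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for var_name, levels in levels_by_var.items():
        for level in levels or []:
            owners[level].append(f"{level!r} ({var_name})")
    for level, found in owners.items():
        # a variable with the same name as a level counts as a conflict too
        if level in levels_by_var:
            found.append(f"({level})")
    conflicts = [found for found in owners.values() if len(found) > 1]
    if conflicts:
        details = "; ".join(", ".join(found) for found in conflicts)
        raise ValueError(f"conflicting MultiIndex level name(s): {details}")


def rename_vars(levels_by_var, name_dict):
    """Hypothetical rename: only variable keys change, level names stay as they are."""
    return {name_dict.get(k, k): v for k, v in levels_by_var.items()}


variables = {"x": ["one", "two"], "foo": None}
check_unique_level_names(variables)  # no conflict
try:
    check_unique_level_names(rename_vars(variables, {"foo": "one"}))
except ValueError as err:
    print(err)
```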

def rename_dims(
@@ -2791,7 +2800,8 @@ def rename_dims(
----------
dims_dict : dict-like, optional
Dictionary whose keys are current dimension names and
whose values are the desired names.
whose values are the desired names. The desired names must
not be the name of an existing dimension or Variable in the Dataset.
**dims, optional
Keyword form of ``dims_dict``.
One of dims_dict or dims must be provided.
@@ -2809,12 +2819,17 @@ def rename_dims(
DataArray.rename
"""
dims_dict = either_dict_or_kwargs(dims_dict, dims, "rename_dims")
for k in dims_dict:
for k, v in dims_dict.items():
if k not in self.dims:
raise ValueError(
"cannot rename %r because it is not a "
"dimension in this dataset" % k
)
if v in self.dims or v in self:
raise ValueError(
f"Cannot rename {k} to {v} because {v} already exists. "
"Try using swap_dims instead."
)

variables, coord_names, sizes, indexes = self._rename_all(
name_dict={}, dims_dict=dims_dict
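A minimal sketch of the two checks `rename_dims` performs after this change, written against plain collections. `validate_rename_dims` is a hypothetical name; `dims` stands in for `self.dims`, and `names` for the variable names that `v in self` tests.

```python
from typing import Hashable, Iterable, Mapping


def validate_rename_dims(dims: Iterable[Hashable], names: Iterable[Hashable],
                         dims_dict: Mapping[Hashable, Hashable]) -> None:
    dims, names = set(dims), set(names)
    for k, v in dims_dict.items():
        # the source must be an existing dimension
        if k not in dims:
            raise ValueError(f"cannot rename {k!r} because it is not a dimension in this dataset")
        # new check: the target must not already be a dimension or a variable
        if v in dims or v in names:
            raise ValueError(f"Cannot rename {k} to {v} because {v} already exists. "
                             "Try using swap_dims instead.")


dims = ["x"]
names = ["x", "y", "a"]  # e.g. coordinates x, y and data variable a, all along x
validate_rename_dims(dims, names, {"x": "z"})  # accepted: z is a new name
```

Renaming `x` to `y` or `a` in this setup raises, which is the situation where the new error message points to `swap_dims`.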
@@ -2868,8 +2883,7 @@ def swap_dims(
----------
dims_dict : dict-like
Dictionary whose keys are current dimension names and whose values
are new names. Each value must already be a variable in the
dataset.
are new names.
Returns
-------
@@ -2898,6 +2912,16 @@ def swap_dims(
Data variables:
a (y) int64 5 7
b (y) float64 0.1 2.4
>>> ds.swap_dims({"x": "z"})
<xarray.Dataset>
Dimensions: (z: 2)
Coordinates:
x (z) <U1 'a' 'b'
y (z) int64 0 1
Dimensions without coordinates: z
Data variables:
a (z) int64 5 7
b (z) float64 0.1 2.4
See Also
--------
@@ -2914,7 +2938,7 @@ def swap_dims(
"cannot swap from dimension %r because it is "
"not an existing dimension" % k
)
if self.variables[v].dims != (k,):
if v in self.variables and self.variables[v].dims != (k,):
raise ValueError(
"replacement dimension %r is not a 1D "
"variable along the old dimension %r" % (v, k)
@@ -2923,7 +2947,7 @@ def swap_dims(
result_dims = {dims_dict.get(dim, dim) for dim in self.dims}

coord_names = self._coord_names.copy()
coord_names.update(dims_dict.values())
coord_names.update({dim for dim in dims_dict.values() if dim in self.variables})

variables: Dict[Hashable, Variable] = {}
indexes: Dict[Hashable, pd.Index] = {}
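The relaxed `swap_dims` checks above can be sketched with plain collections. A replacement name only has to be a 1-D variable along the old dimension if such a variable exists, and only replacement names that are variables are added to the coordinate names. `plan_swap_dims` is hypothetical and models each variable by its tuple of dimension names; the sample layout follows the docstring example (`x`, `y`, `a`, `b` along `x`).

```python
from typing import Hashable, Mapping, Set, Tuple


def plan_swap_dims(
    dims: Set[Hashable],
    variable_dims: Mapping[Hashable, Tuple[Hashable, ...]],
    coord_names: Set[Hashable],
    dims_dict: Mapping[Hashable, Hashable],
) -> Tuple[Set[Hashable], Set[Hashable]]:
    for k, v in dims_dict.items():
        if k not in dims:
            raise ValueError(f"cannot swap from dimension {k!r} because it is not an existing dimension")
        # only checked when the replacement already exists as a variable
        if v in variable_dims and variable_dims[v] != (k,):
            raise ValueError(f"replacement dimension {v!r} is not a 1D variable "
                             f"along the old dimension {k!r}")
    result_dims = {dims_dict.get(dim, dim) for dim in dims}
    new_coord_names = set(coord_names)
    # names that are not variables (like "z") become dimensions without coordinates
    new_coord_names.update(d for d in dims_dict.values() if d in variable_dims)
    return result_dims, new_coord_names


variable_dims = {"x": ("x",), "y": ("x",), "a": ("x",), "b": ("x",)}
print(plan_swap_dims({"x"}, variable_dims, {"x", "y"}, {"x": "z"}))
```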
@@ -3525,7 +3549,7 @@ def update(self, other: "CoercibleMapping", inplace: bool = None) -> "Dataset":

def merge(
self,
other: "CoercibleMapping",
other: Union["CoercibleMapping", "DataArray"],
inplace: bool = None,
overwrite_vars: Union[Hashable, Iterable[Hashable]] = frozenset(),
compat: str = "no_conflicts",
@@ -3582,6 +3606,7 @@ def merge(
If any variables conflict (see ``compat``).
"""
_check_inplace(inplace)
other = other.to_dataset() if isinstance(other, xr.DataArray) else other
merge_result = dataset_merge_method(
self,
other,
@@ -3664,7 +3689,7 @@ def drop(self, labels=None, dim=None, *, errors="raise", **labels_kwargs):
raise ValueError("cannot specify dim and dict-like arguments.")
labels = either_dict_or_kwargs(labels, labels_kwargs, "drop")

if dim is None and (is_list_like(labels) or is_scalar(labels)):
if dim is None and (is_scalar(labels) or isinstance(labels, Iterable)):
warnings.warn(
"dropping variables using `drop` will be deprecated; using drop_vars is encouraged.",
PendingDeprecationWarning,
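The `drop` fix widens the test that sends `labels` down the variable-dropping path when `dim is None`. The sketch below compares the old and new predicates using simplified stand-ins: `old_is_list_like` assumes the pre-fix helper accepted only lists and tuples, and `simple_is_scalar` is a reduced version of xarray's `is_scalar` that treats strings, bytes and non-iterables as scalars. Both helpers are hypothetical, and dict-like labels, which the real method handles separately, are left out.

```python
from collections.abc import Iterable
from typing import Any


def old_is_list_like(value: Any) -> bool:
    # assumption: the pre-fix helper only recognised lists and tuples
    return isinstance(value, (list, tuple))


def simple_is_scalar(value: Any) -> bool:
    # reduced stand-in for xarray's is_scalar
    return isinstance(value, (str, bytes)) or not isinstance(value, Iterable)


def drops_variables_old(labels: Any) -> bool:
    return old_is_list_like(labels) or simple_is_scalar(labels)


def drops_variables_new(labels: Any) -> bool:
    return simple_is_scalar(labels) or isinstance(labels, Iterable)


for labels in ["a", ["a", "b"], {"a", "b"}, (n for n in "ab")]:
    print(type(labels).__name__, drops_variables_old(labels), drops_variables_new(labels))
```

Sets and generators are the inputs whose result changes: the old predicate rejected them, the new one accepts any iterable.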
@@ -4127,7 +4152,7 @@ def combine_first(self, other: "Dataset") -> "Dataset":
Returns
-------
DataArray
Dataset
"""
out = ops.fillna(self, other, join="outer", dataset_join="outer")
return out
@@ -4641,7 +4666,7 @@ def to_dict(self, data=True):
conventions.
Converts all variables and attributes to native Python objects
Useful for coverting to json. To avoid datetime incompatibility
Useful for converting to json. To avoid datetime incompatibility
use decode_times=False kwarg in xarrray.open_dataset.
Parameters
42 changes: 42 additions & 0 deletions xarray/core/parallel.py
@@ -154,6 +154,48 @@ def map_blocks(
--------
dask.array.map_blocks, xarray.apply_ufunc, xarray.Dataset.map_blocks,
xarray.DataArray.map_blocks
Examples
--------
Calculate an anomaly from climatology using ``.groupby()``. Using
``xr.map_blocks()`` allows for parallel operations with knowledge of ``xarray``,
its indices, and its methods like ``.groupby()``.
>>> def calculate_anomaly(da, groupby_type='time.month'):
... # Necessary workaround to xarray's check with zero dimensions
... # https://github.com/pydata/xarray/issues/3575
... if sum(da.shape) == 0:
... return da
... gb = da.groupby(groupby_type)
... clim = gb.mean(dim='time')
... return gb - clim
>>> time = xr.cftime_range('1990-01', '1992-01', freq='M')
>>> np.random.seed(123)
>>> array = xr.DataArray(np.random.rand(len(time)),
... dims="time", coords=[time]).chunk()
>>> xr.map_blocks(calculate_anomaly, array).compute()
<xarray.DataArray (time: 24)>
array([ 0.12894847, 0.11323072, -0.0855964 , -0.09334032, 0.26848862,
0.12382735, 0.22460641, 0.07650108, -0.07673453, -0.22865714,
-0.19063865, 0.0590131 , -0.12894847, -0.11323072, 0.0855964 ,
0.09334032, -0.26848862, -0.12382735, -0.22460641, -0.07650108,
0.07673453, 0.22865714, 0.19063865, -0.0590131 ])
Coordinates:
* time (time) object 1990-01-31 00:00:00 ... 1991-12-31 00:00:00
Note that one must explicitly use ``args=[]`` and ``kwargs={}`` to pass arguments
to the function being applied in ``xr.map_blocks()``:
>>> xr.map_blocks(calculate_anomaly, array, kwargs={'groupby_type': 'time.year'})
<xarray.DataArray (time: 24)>
array([ 0.15361741, -0.25671244, -0.31600032, 0.008463 , 0.1766172 ,
-0.11974531, 0.43791243, 0.14197797, -0.06191987, -0.15073425,
-0.19967375, 0.18619794, -0.05100474, -0.42989909, -0.09153273,
0.24841842, -0.30708526, -0.31412523, 0.04197439, 0.0422506 ,
0.14482397, 0.35985481, 0.23487834, 0.12144652])
Coordinates:
* time (time) object 1990-01-31 00:00:00 ... 1991-12-31 00:00:00
"""

def _wrapper(func, obj, to_array, args, kwargs):
7 changes: 6 additions & 1 deletion xarray/core/utils.py
@@ -547,7 +547,12 @@ def __eq__(self, other) -> bool:
return False

def __hash__(self) -> int:
return hash((ReprObject, self._value))
return hash((type(self), self._value))

def __dask_tokenize__(self):
from dask.base import normalize_token

return normalize_token((type(self), self._value))
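A stdlib-only sketch of why `ReprObject` gains `__dask_tokenize__`: a tokenizer can only give two equal sentinels the same token if the object describes what should be hashed. `SentinelSketch`, `PlainSentinel` and `toy_tokenize` are hypothetical. The toy tokenizer uses `__dask_tokenize__` when present and otherwise falls back to a random token, an assumed simplification of how a tokenizer may treat objects it knows nothing about.

```python
import hashlib
import uuid
from typing import Any


class SentinelSketch:
    """Hypothetical stand-in for xarray's ReprObject: a named sentinel with a fixed repr."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __repr__(self) -> str:
        return self._value

    def __eq__(self, other: Any) -> bool:
        if isinstance(other, SentinelSketch):
            return self._value == other._value
        return False

    def __hash__(self) -> int:
        # hash type(self) rather than a hard-coded class, as in the diff
        return hash((type(self), self._value))

    def __dask_tokenize__(self):
        # deterministic description of the object for tokenizers
        return (type(self), self._value)


class PlainSentinel:
    """Same idea without __dask_tokenize__."""

    def __init__(self, value: str) -> None:
        self._value = value


def toy_tokenize(obj: Any) -> str:
    method = getattr(obj, "__dask_tokenize__", None)
    # assumed fallback for unknown objects: a random token, so results are not reproducible
    payload = repr(method()) if method is not None else uuid.uuid4().hex
    return hashlib.sha256(payload.encode()).hexdigest()


a, b = SentinelSketch("<this-array>"), SentinelSketch("<this-array>")
print(a == b, hash(a) == hash(b), toy_tokenize(a) == toy_tokenize(b))
print(toy_tokenize(PlainSentinel("<this-array>")) == toy_tokenize(PlainSentinel("<this-array>")))
```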


@contextlib.contextmanager