Optional indexes
shoyer committed Nov 5, 2016
1 parent 3f490a3 commit fefd741
Showing 24 changed files with 925 additions and 613 deletions.
2 changes: 2 additions & 0 deletions doc/api.rst
@@ -44,6 +44,7 @@ Attributes
Dataset.coords
Dataset.attrs
Dataset.indexes
Dataset.get_index

Dictionary interface
--------------------
@@ -193,6 +194,7 @@ Attributes
DataArray.attrs
DataArray.encoding
DataArray.indexes
DataArray.get_index

**ndarray attributes**:
:py:attr:`~DataArray.ndim`
27 changes: 27 additions & 0 deletions doc/whats-new.rst
@@ -21,6 +21,26 @@ v0.9.0 (unreleased)
Breaking changes
~~~~~~~~~~~~~~~~

- Index coordinates for each dimension are now optional, and are no longer
created by default. This has a number of implications:

- :py:func:`~align` and :py:meth:`~Dataset.reindex` can now raise an error if
dimension labels are missing and dimensions have different sizes.
- Because pandas does not support missing indexes, methods such as
``to_dataframe``/``from_dataframe`` and ``stack``/``unstack`` no longer
roundtrip faithfully on all inputs. Use :py:meth:`~Dataset.reset_index` to
remove undesired indexes.
- ``Dataset.__delitem__`` and :py:meth:`~Dataset.drop` no longer delete/drop
variables that have dimensions matching a deleted/dropped variable.
- ``DataArray.coords.__delitem__`` is now allowed on variables matching
dimension names.
- ``.sel`` and ``.loc`` now handle indexing along a dimension without
coordinate labels by doing integer-based indexing.
- :py:attr:`~Dataset.indexes` is no longer guaranteed to include all
dimension names as keys. The new method :py:meth:`~Dataset.get_index` always
returns an index for a dimension, falling back to a default ``RangeIndex``
if necessary.

- The default behavior of ``merge`` is now ``compat='no_conflicts'``, so some
merges will now succeed in cases that previously raised
``xarray.MergeError``. Set ``compat='broadcast_equals'`` to restore the
@@ -113,6 +133,13 @@ Bug fixes
- ``Dataset.concat()`` now preserves variables order (:issue:`1027`).
By `Fabien Maussion <https://github.com/fmaussion>`_.

- Grouping over a dimension with non-unique values with ``groupby`` now gives
correct groups.

- Fixed accessing coordinate variables with non-string names from ``.coords``
(:issue:`TBD`).
By `Stephan Hoyer <https://github.com/shoyer>`_.
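
The optional-index behaviour listed under the breaking changes can be illustrated with a minimal sketch. It assumes an xarray version that includes this change (0.9.0 or later); the variable name ``temperature`` and the data values are made up for illustration and do not come from the commit.

```python
import pandas as pd
import xarray as xr

# Hypothetical data: one variable along dimension "x", with no coordinate
# labels supplied for "x".
ds = xr.Dataset({"temperature": ("x", [10.0, 11.5, 12.0])})

# With optional indexes, no index coordinate is created for "x" by default,
# so it is not a key of ``indexes``.
has_coord = "x" in ds.coords
has_index = "x" in ds.indexes

# ``get_index`` still returns an index, falling back to a default RangeIndex.
default_index = ds.get_index("x")

# Once labels are assigned, ``get_index`` returns the real index instead.
labeled = ds.assign_coords(x=[10, 20, 30])
labeled_index = labeled.get_index("x")

print(has_coord, has_index)
print(default_index)
print(labeled_index)
```

Code that previously looked up ``ds.indexes[dim]`` for every dimension would likely need to switch to ``ds.get_index(dim)`` to keep working on datasets without labels.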

.. _whats-new.0.8.2:

v0.8.2 (18 August 2016)
25 changes: 0 additions & 25 deletions xarray/backends/common.py
@@ -30,25 +30,6 @@ def _decode_variable_name(name):
return name


def is_trivial_index(var):
"""
Determines if an index is 'trivial', meaning that it is
equivalent to np.arange(). This is determined by
checking if there are any attributes or encodings,
if ndims is one, dtype is int and finally by comparing
the actual values to np.arange()
"""
# if either attributes or encodings are defined
# the index is not trivial.
if len(var.attrs) or len(var.encoding):
return False
# if the index is not a 1d integer array
if var.ndim > 1 or not var.dtype.kind == 'i':
return False
arange = np.arange(var.size, dtype=var.dtype)
return np.all(var.values == arange)


def robust_getitem(array, key, catch=Exception, max_retries=6,
initial_delay=500):
"""
@@ -200,12 +181,6 @@ def store_dataset(self, dataset):

def store(self, variables, attributes, check_encoding_set=frozenset()):
self.set_attributes(attributes)
neccesary_dims = [v.dims for v in variables.values()]
neccesary_dims = set(itertools.chain(*neccesary_dims))
# set all non-indexes and any index which is not trivial.
variables = OrderedDict((k, v) for k, v in iteritems(variables)
if not (k in neccesary_dims and
is_trivial_index(v)))
self.set_variables(variables, check_encoding_set)

def set_attributes(self, attributes):
4 changes: 2 additions & 2 deletions xarray/conventions.py
@@ -910,7 +910,7 @@ def decode_cf(obj, concat_characters=True, mask_and_scale=True,
identify coordinates.
drop_variables: string or iterable, optional
A variable or list of variables to exclude from being parsed from the
dataset.This may be useful to drop variables with problems or
dataset. This may be useful to drop variables with problems or
inconsistent values.
Returns
@@ -936,7 +936,7 @@ def decode_cf(obj, concat_characters=True, mask_and_scale=True,
vars, attrs, concat_characters, mask_and_scale, decode_times,
decode_coords, drop_variables=drop_variables)
ds = Dataset(vars, attrs=attrs)
ds = ds.set_coords(coord_names.union(extra_coords))
ds = ds.set_coords(coord_names.union(extra_coords).intersection(vars))
ds._file_obj = file_obj
return ds
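
The effect of the added ``.intersection(vars)`` in ``decode_cf`` can be sketched with plain Python sets. The names and values below are hypothetical stand-ins, not data from the commit; the sketch only shows that the intersection keeps the coordinate names that are actually keys of the variables mapping.

```python
# Hypothetical stand-ins for the objects used in decode_cf.
coord_names = {"time", "lat"}
extra_coords = {"lon", "height"}  # assume "height" has no matching variable
variables = {
    "time": [0, 1],
    "lat": [45.0],
    "lon": [7.5],
    "temperature": [[280.0], [281.5]],
}

# Old expression: every candidate name, whether or not a variable exists.
candidates = coord_names.union(extra_coords)

# New expression: intersecting with the mapping iterates over its keys, so
# only names that are real variables remain.
safe_coords = candidates.intersection(variables)

print(sorted(candidates))   # ['height', 'lat', 'lon', 'time']
print(sorted(safe_coords))  # ['lat', 'lon', 'time']
```

Passing only existing names to ``set_coords`` avoids asking the dataset to promote a variable that is not there.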
