OrderedDict --> dict, some python3.5 cleanup too (#3389)
* OrderedDict --> dict, some python3.5 cleanup too

* respond to part of @shoyer's review

* fix set attr syntax on netcdf4 vars

* fix typing errors

* update whats new and todo comments

* Typing annotations

* Typing annotations

* Fix regression

* More substantial changes

* More polish

* Typing annotations

* Rerun notebooks
Joe Hamman authored Oct 12, 2019
1 parent 6851e3e commit 863e490
Showing 42 changed files with 485 additions and 638 deletions.
12 changes: 6 additions & 6 deletions doc/data-structures.rst
@@ -23,7 +23,7 @@ multi-dimensional array. It has several key properties:
- ``coords``: a dict-like container of arrays (*coordinates*) that label each
point (e.g., 1-dimensional arrays of numbers, datetime objects or
strings)
- ``attrs``: an ``OrderedDict`` to hold arbitrary metadata (*attributes*)
- ``attrs``: :py:class:`dict` to hold arbitrary metadata (*attributes*)

xarray uses ``dims`` and ``coords`` to enable its core metadata-aware operations.
Dimensions provide names that xarray uses instead of the ``axis`` argument found
@@ -32,10 +32,10 @@ alignment, building on the functionality of the ``index`` found on a pandas
:py:class:`~pandas.DataFrame` or :py:class:`~pandas.Series`.

DataArray objects also can have a ``name`` and can hold arbitrary metadata in
the form of their ``attrs`` property (an ordered dictionary). Names and
attributes are strictly for users and user-written code: xarray makes no attempt
to interpret them, and propagates them only in unambiguous cases (see FAQ,
:ref:`approach to metadata`).
the form of their ``attrs`` property. Names and attributes are strictly for
users and user-written code: xarray makes no attempt to interpret them, and
propagates them only in unambiguous cases
(see FAQ, :ref:`approach to metadata`).

.. _creating a dataarray:

@@ -222,7 +222,7 @@ to access any variable in a dataset, datasets have four key properties:
- ``data_vars``: a dict-like container of DataArrays corresponding to variables
- ``coords``: another dict-like container of DataArrays intended to label points
used in ``data_vars`` (e.g., arrays of numbers, datetime objects or strings)
- ``attrs``: an ``OrderedDict`` to hold arbitrary metadata
- ``attrs``: :py:class:`dict` to hold arbitrary metadata

The distinction between whether a variable falls in data or coordinates
(borrowed from `CF conventions`_) is mostly semantic, and you can probably get
8 changes: 8 additions & 0 deletions doc/whats-new.rst
@@ -46,6 +46,14 @@ Breaking changes
It was unused and doesn't make sense for a Variable.
(:pull:`3375`) by `Guido Imperiale <https://github.com/crusaderky>`_.

- Remove internal usage of ``collections.OrderedDict``. After dropping support for
Python <=3.5, most uses of ``OrderedDict`` in xarray were no longer necessary. We
have removed them in favor of Python's built-in ``dict``, which now preserves
insertion order itself. This change will be most obvious when interacting with
the ``attrs`` property of Dataset and DataArray objects.
(:issue:`3380`, :pull:`3389`) by `Joe Hamman <https://github.com/jhamman>`_.
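
A minimal sketch of what this change can mean for user code, using only the standard library. No xarray calls are made here, and the names `attrs_old` and `attrs_new` are illustrative stand-ins, not part of xarray. A plain `dict` preserves insertion order on Python 3.7+, but unlike `OrderedDict` its equality comparison ignores order and it has no `move_to_end` method:

```python
from collections import OrderedDict

# Hypothetical attribute mappings: what ``attrs`` used to be (OrderedDict)
# and what it is after this change (plain dict).
attrs_old = OrderedDict([("units", "K"), ("long_name", "temperature")])
attrs_new = {"units": "K", "long_name": "temperature"}

# Insertion order is preserved by both types.
same_order = list(attrs_old) == list(attrs_new)

# Equality between two OrderedDicts is order-sensitive...
reordered_old = OrderedDict([("long_name", "temperature"), ("units", "K")])
ordered_equal = attrs_old == reordered_old

# ...while equality between plain dicts ignores order.
reordered_new = {"long_name": "temperature", "units": "K"}
plain_equal = attrs_new == reordered_new

# OrderedDict-only methods such as move_to_end do not exist on dict.
has_move_to_end = hasattr(attrs_new, "move_to_end")
```

Code that relied on `OrderedDict`-specific behavior of `attrs` (order-sensitive equality, `move_to_end`, `popitem(last=False)`) may therefore need adjusting.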

New functions/methods
~~~~~~~~~~~~~~~~~~~~~

54 changes: 27 additions & 27 deletions examples/xarray_multidimensional_coords.ipynb

Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions examples/xarray_seasonal_means.ipynb

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions xarray/backends/cfgrib_.py
@@ -2,7 +2,7 @@

from .. import Variable
from ..core import indexing
from ..core.utils import Frozen, FrozenOrderedDict
from ..core.utils import Frozen, FrozenDict
from .common import AbstractDataStore, BackendArray
from .locks import SerializableLock, ensure_lock

@@ -55,7 +55,7 @@ def open_store_variable(self, name, var):
return Variable(var.dimensions, data, var.attributes, encoding)

def get_variables(self):
return FrozenOrderedDict(
return FrozenDict(
(k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items()
)
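
The definition of `FrozenDict` is not shown in this excerpt. As an assumption, it can be read as "build a plain `dict`, then wrap it so callers cannot mutate it". The sketch below illustrates that idea with hypothetical names (`ReadOnlyMapping`, `read_only_dict`); it is not xarray's actual implementation:

```python
from collections.abc import Mapping


class ReadOnlyMapping(Mapping):
    """Hypothetical read-only wrapper; not xarray's actual Frozen class."""

    def __init__(self, mapping):
        self._mapping = mapping

    def __getitem__(self, key):
        return self._mapping[key]

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)


def read_only_dict(*args, **kwargs):
    # Accepts the same arguments as dict(), e.g. an iterable of (key, value)
    # pairs, and wraps the resulting plain dict.
    return ReadOnlyMapping(dict(*args, **kwargs))


store = read_only_dict((name, len(name)) for name in ["lat", "lon", "time"])

# Item assignment is not defined on the wrapper, so this raises TypeError.
try:
    store["lat"] = 0
    mutated = True
except TypeError:
    mutated = False
```

Because the wrapped object is a plain `dict`, iteration order still follows insertion order.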

25 changes: 8 additions & 17 deletions xarray/backends/common.py
@@ -2,15 +2,14 @@
import time
import traceback
import warnings
from collections import OrderedDict
from collections.abc import Mapping

import numpy as np

from ..conventions import cf_encoder
from ..core import indexing
from ..core.pycompat import dask_array_type
from ..core.utils import FrozenOrderedDict, NdimSizeLenMixin
from ..core.utils import FrozenDict, NdimSizeLenMixin

# Create a logger object, but don't add any handlers. Leave that to user code.
logger = logging.getLogger(__name__)
@@ -120,10 +119,10 @@ def load(self):
This function will be called anytime variables or attributes
are requested, so care should be taken to make sure it's fast.
"""
variables = FrozenOrderedDict(
variables = FrozenDict(
(_decode_variable_name(k), v) for k, v in self.get_variables().items()
)
attributes = FrozenOrderedDict(self.get_attrs())
attributes = FrozenDict(self.get_attrs())
return variables, attributes

@property
@@ -230,12 +229,8 @@ def encode(self, variables, attributes):
attributes : dict-like
"""
variables = OrderedDict(
[(k, self.encode_variable(v)) for k, v in variables.items()]
)
attributes = OrderedDict(
[(k, self.encode_attribute(v)) for k, v in attributes.items()]
)
variables = {k: self.encode_variable(v) for k, v in variables.items()}
attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
return variables, attributes

def encode_variable(self, v):
@@ -361,7 +356,7 @@ def set_dimensions(self, variables, unlimited_dims=None):

existing_dims = self.get_dimensions()

dims = OrderedDict()
dims = {}
for v in unlimited_dims: # put unlimited_dims first
dims[v] = None
for v in variables.values():
@@ -385,10 +380,6 @@ def encode(self, variables, attributes):
# All NetCDF files get CF encoded by default, without this attempting
# to write times, for example, would fail.
variables, attributes = cf_encoder(variables, attributes)
variables = OrderedDict(
[(k, self.encode_variable(v)) for k, v in variables.items()]
)
attributes = OrderedDict(
[(k, self.encode_attribute(v)) for k, v in attributes.items()]
)
variables = {k: self.encode_variable(v) for k, v in variables.items()}
attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
return variables, attributes
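
The `encode` hunks above swap `OrderedDict([...])` for a dict comprehension. A small sketch of why the two forms are interchangeable on Python 3.7+; `encode_value` is a hypothetical stand-in for `encode_variable` / `encode_attribute`, and the sample data is made up:

```python
from collections import OrderedDict


def encode_value(value):
    # Hypothetical stand-in for encode_variable / encode_attribute.
    return str(value).upper()


source = {"b": "second", "a": "first", "c": "third"}

# Old pattern: build an OrderedDict from a list of (key, value) pairs.
old_style = OrderedDict([(k, encode_value(v)) for k, v in source.items()])

# New pattern: a dict comprehension over the same items.
new_style = {k: encode_value(v) for k, v in source.items()}

# Both hold the same keys, in the same order, with the same values.
same_items = list(old_style.items()) == list(new_style.items())
```

The comprehension also avoids building the intermediate list of tuples.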
9 changes: 4 additions & 5 deletions xarray/backends/h5netcdf_.py
@@ -1,11 +1,10 @@
import functools
from collections import OrderedDict

import numpy as np

from .. import Variable
from ..core import indexing
from ..core.utils import FrozenOrderedDict
from ..core.utils import FrozenDict
from .common import WritableCFDataStore
from .file_manager import CachingFileManager
from .locks import HDF5_LOCK, combine_locks, ensure_lock, get_write_lock
@@ -49,7 +48,7 @@ def _read_attributes(h5netcdf_var):
# GH451
# to ensure conventions decoding works properly on Python 3, decode all
# bytes attributes to strings
attrs = OrderedDict()
attrs = {}
for k, v in h5netcdf_var.attrs.items():
if k not in ["_FillValue", "missing_value"]:
v = maybe_decode_bytes(v)
@@ -153,12 +152,12 @@ def open_store_variable(self, name, var):
return Variable(dimensions, data, attrs, encoding)

def get_variables(self):
return FrozenOrderedDict(
return FrozenDict(
(k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items()
)

def get_attrs(self):
return FrozenOrderedDict(_read_attributes(self.ds))
return FrozenDict(_read_attributes(self.ds))

def get_dimensions(self):
return self.ds.dimensions
7 changes: 3 additions & 4 deletions xarray/backends/memory.py
@@ -1,5 +1,4 @@
import copy
from collections import OrderedDict

import numpy as np

@@ -16,8 +15,8 @@ class InMemoryDataStore(AbstractWritableDataStore):
"""

def __init__(self, variables=None, attributes=None):
self._variables = OrderedDict() if variables is None else variables
self._attributes = OrderedDict() if attributes is None else attributes
self._variables = {} if variables is None else variables
self._attributes = {} if attributes is None else attributes

def get_attrs(self):
return self._attributes
@@ -26,7 +25,7 @@ def get_variables(self):
return self._variables

def get_dimensions(self):
dims = OrderedDict()
dims = {}
for v in self._variables.values():
for d, s in v.dims.items():
dims[d] = s
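
The accumulation pattern in `get_dimensions` relies on a plain `dict` keeping keys in first-seen order. A standalone sketch with made-up data; the `variables` mapping here is a hypothetical simplification in which each variable maps dimension names to sizes, not the store's real structure:

```python
# Hypothetical variables: each maps dimension names to sizes.
variables = {
    "temperature": {"time": 4, "x": 3},
    "elevation": {"x": 3, "y": 2},
}

dims = {}
for var_dims in variables.values():
    for name, size in var_dims.items():
        # Re-assigning an existing key updates the value but keeps the key's
        # original position, so dimensions stay in first-seen order.
        dims[name] = size
```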
41 changes: 11 additions & 30 deletions xarray/backends/netCDF4_.py
@@ -1,14 +1,13 @@
import functools
import operator
from collections import OrderedDict
from contextlib import suppress

import numpy as np

from .. import Variable, coding
from ..coding.variables import pop_to
from ..core import indexing
from ..core.utils import FrozenOrderedDict, is_remote_uri
from ..core.utils import FrozenDict, is_remote_uri
from .common import (
BackendArray,
WritableCFDataStore,
@@ -274,25 +273,6 @@ def _is_list_of_strings(value):
return False


def _set_nc_attribute(obj, key, value):
if _is_list_of_strings(value):
# encode as NC_STRING if attr is list of strings
try:
obj.setncattr_string(key, value)
except AttributeError:
# Inform users with old netCDF that does not support
# NC_STRING that we can't serialize lists of strings
# as attrs
msg = (
"Attributes which are lists of strings are not "
"supported with this version of netCDF. Please "
"upgrade to netCDF4-python 1.2.4 or greater."
)
raise AttributeError(msg)
else:
obj.setncattr(key, value)


class NetCDF4DataStore(WritableCFDataStore):
"""Store for reading and writing data via the Python-NetCDF4 library.
@@ -388,7 +368,7 @@ def ds(self):
def open_store_variable(self, name, var):
dimensions = var.dimensions
data = indexing.LazilyOuterIndexedArray(NetCDF4ArrayWrapper(name, self))
attributes = OrderedDict((k, var.getncattr(k)) for k in var.ncattrs())
attributes = {k: var.getncattr(k) for k in var.ncattrs()}
_ensure_fill_value_valid(data, attributes)
# netCDF4 specific encoding; save _FillValue for later
encoding = {}
@@ -415,17 +395,17 @@ def open_store_variable(self, name, var):
return Variable(dimensions, data, attributes, encoding)

def get_variables(self):
dsvars = FrozenOrderedDict(
dsvars = FrozenDict(
(k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items()
)
return dsvars

def get_attrs(self):
attrs = FrozenOrderedDict((k, self.ds.getncattr(k)) for k in self.ds.ncattrs())
attrs = FrozenDict((k, self.ds.getncattr(k)) for k in self.ds.ncattrs())
return attrs

def get_dimensions(self):
dims = FrozenOrderedDict((k, len(v)) for k, v in self.ds.dimensions.items())
dims = FrozenDict((k, len(v)) for k, v in self.ds.dimensions.items())
return dims

def get_encoding(self):
@@ -442,7 +422,11 @@ def set_dimension(self, name, length, is_unlimited=False):
def set_attribute(self, key, value):
if self.format != "NETCDF4":
value = encode_nc3_attr_value(value)
_set_nc_attribute(self.ds, key, value)
if _is_list_of_strings(value):
# encode as NC_STRING if attr is list of strings
self.ds.setncattr_string(key, value)
else:
self.ds.setncattr(key, value)

def encode_variable(self, variable):
variable = _force_native_endianness(variable)
@@ -494,10 +478,7 @@ def prepare_variable(
fill_value=fill_value,
)

for k, v in attrs.items():
# set attributes one-by-one since netCDF4<1.0.10 can't handle
# OrderedDict as the input to setncatts
_set_nc_attribute(nc4_var, k, v)
nc4_var.setncatts(attrs)

target = NetCDF4ArrayWrapper(name, self)

3 changes: 1 addition & 2 deletions xarray/backends/netcdf3.py
@@ -1,5 +1,4 @@
import unicodedata
from collections import OrderedDict

import numpy as np

@@ -70,7 +69,7 @@ def encode_nc3_attr_value(value):


def encode_nc3_attrs(attrs):
return OrderedDict([(k, encode_nc3_attr_value(v)) for k, v in attrs.items()])
return {k: encode_nc3_attr_value(v) for k, v in attrs.items()}


def encode_nc3_variable(var):
8 changes: 3 additions & 5 deletions xarray/backends/pseudonetcdf_.py
@@ -1,10 +1,8 @@
from collections import OrderedDict

import numpy as np

from .. import Variable
from ..core import indexing
from ..core.utils import Frozen, FrozenOrderedDict
from ..core.utils import Frozen, FrozenDict
from .common import AbstractDataStore, BackendArray
from .file_manager import CachingFileManager
from .locks import HDF5_LOCK, NETCDFC_LOCK, combine_locks, ensure_lock
@@ -65,11 +63,11 @@ def ds(self):

def open_store_variable(self, name, var):
data = indexing.LazilyOuterIndexedArray(PncArrayWrapper(name, self))
attrs = OrderedDict((k, getattr(var, k)) for k in var.ncattrs())
attrs = {k: getattr(var, k) for k in var.ncattrs()}
return Variable(var.dimensions, data, attrs)

def get_variables(self):
return FrozenOrderedDict(
return FrozenDict(
(k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items()
)

4 changes: 2 additions & 2 deletions xarray/backends/pydap_.py
@@ -3,7 +3,7 @@
from .. import Variable
from ..core import indexing
from ..core.pycompat import integer_types
from ..core.utils import Frozen, FrozenOrderedDict, is_dict_like
from ..core.utils import Frozen, FrozenDict, is_dict_like
from .common import AbstractDataStore, BackendArray, robust_getitem


@@ -83,7 +83,7 @@ def open_store_variable(self, var):
return Variable(var.dimensions, data, _fix_attributes(var.attributes))

def get_variables(self):
return FrozenOrderedDict(
return FrozenDict(
(k, self.open_store_variable(self.ds[k])) for k in self.ds.keys()
)

4 changes: 2 additions & 2 deletions xarray/backends/pynio_.py
@@ -2,7 +2,7 @@

from .. import Variable
from ..core import indexing
from ..core.utils import Frozen, FrozenOrderedDict
from ..core.utils import Frozen, FrozenDict
from .common import AbstractDataStore, BackendArray
from .file_manager import CachingFileManager
from .locks import HDF5_LOCK, NETCDFC_LOCK, SerializableLock, combine_locks, ensure_lock
@@ -66,7 +66,7 @@ def open_store_variable(self, name, var):
return Variable(var.dimensions, data, var.attributes)

def get_variables(self):
return FrozenOrderedDict(
return FrozenDict(
(k, self.open_store_variable(k, v)) for k, v in self.ds.variables.items()
)

5 changes: 2 additions & 3 deletions xarray/backends/rasterio_.py
@@ -1,6 +1,5 @@
import os
import warnings
from collections import OrderedDict

import numpy as np

@@ -244,7 +243,7 @@ def open_rasterio(filename, parse_coordinates=None, chunks=None, cache=None, loc
if cache is None:
cache = chunks is None

coords = OrderedDict()
coords = {}

# Get bands
if riods.count < 1:
@@ -276,7 +275,7 @@ def open_rasterio(filename, parse_coordinates=None, chunks=None, cache=None, loc
)

# Attributes
attrs = dict()
attrs = {}
# Affine transformation matrix (always available)
# This describes coefficients mapping pixel coordinates to CRS
# For serialization store as tuple of 6 floats, the last row being