Read grid mapping and bounds as coords #2844

Merged Feb 17, 2021 (40 commits)
62152d0
Read and save `grid_mapping` and `bounds` as coordinates.
DWesl Mar 21, 2019
2ae8a7e
Add tests for (de)serialization of `grid_mapping` and `bounds`.
DWesl Mar 21, 2019
fff73c8
BUG: Use only encoding for tracking bounds and grid_mapping.
DWesl Mar 31, 2019
b3696d3
Address feedback on PR.
DWesl May 31, 2019
315d39d
Merge branch 'master' into read_grid_mapping_and_bounds_as_coords
DWesl May 31, 2019
c82cd47
Merge branch 'master' into read_grid_mapping_and_bounds_as_coords
DWesl Feb 14, 2020
02aff73
Style fixes: newline before binary operator.
DWesl Feb 14, 2020
0721506
Style fixes: double quotes for string literals, rewrap lines.
DWesl Feb 14, 2020
239761e
Address comments from review.
DWesl Jul 9, 2020
e0b8e99
Fix style issues and complete name changes.
DWesl Jul 9, 2020
9ba7485
Add more attributes from the CF conventions.
DWesl Aug 2, 2020
bf97fe1
Merge branch 'master' into read_grid_mapping_and_bounds_as_coords
DWesl Aug 2, 2020
4274730
Remove a trailing comma in a one-element dict literal.
DWesl Aug 2, 2020
ca0f805
Merge branch 'master' into read_grid_mapping_and_bounds_as_coords
DWesl Aug 7, 2020
7027767
Stop moving ancillary_variables to coords
DWesl Aug 9, 2020
8d96a66
Expand the list of attributes in the documentation.
DWesl Aug 16, 2020
1a5b35d
Make sure to run the pip associated with the running python.
DWesl Aug 16, 2020
9f53fbb
Warn about new locations for some variables.
DWesl Aug 16, 2020
1b8218d
Merge branch 'master' into read_grid_mapping_and_bounds_as_coords
DWesl Aug 17, 2020
546b43e
Move ancillary variables back to data_vars in test.
DWesl Aug 23, 2020
8ec4af3
Update warnings to provide a more useful stack level.
DWesl Aug 23, 2020
bc0b1d1
Split the CF attribute test into multiple smaller tests.
DWesl Aug 23, 2020
5c085e1
Add a test of a roundtrip after dropping bounds.
DWesl Aug 23, 2020
a5a67d1
Merge work from github back into local branch.
DWesl Aug 23, 2020
a864b83
Run black on changes.
DWesl Aug 23, 2020
c8d1bdc
Check whether round-trip to iris breaks things.
DWesl Aug 23, 2020
478be8a
Remove trailing comma.
DWesl Aug 23, 2020
036695c
Merge branch 'master' into read_grid_mapping_and_bounds_as_coords
DWesl Jan 5, 2021
b0e7a85
Style fixes from black.
DWesl Jan 5, 2021
1a9b201
Include suggestions from review.
DWesl Jan 16, 2021
6f3d55e
Update xarray/tests/test_backends.py
DWesl Jan 17, 2021
5268500
Update xarray/conventions.py
DWesl Jan 17, 2021
2edd367
Mention that there are other attributes not listed
DWesl Jan 17, 2021
948465c
Fix .rst syntax in whats-new
DWesl Jan 17, 2021
c68d372
Shorten name of another test.
DWesl Jan 17, 2021
9ee7c3a
Update docs.
dcherian Jan 17, 2021
b65e579
Merge remote-tracking branch 'upstream/master' into read_grid_mapping…
dcherian Feb 11, 2021
94b8153
fix merge.
dcherian Feb 11, 2021
c8896f3
Activate new behaviour only with `decode_coords="all"`
dcherian Feb 11, 2021
d3ec7ab
[skip-ci] fix docstrings
dcherian Feb 11, 2021
2 changes: 1 addition & 1 deletion doc/conf.py
Original file line number Diff line number Diff line change
@@ -34,7 +34,7 @@
subprocess.run(["conda", "list"])
else:
print("pip environment:")
subprocess.run(["pip", "list"])
subprocess.run([sys.executable, "-m", "pip", "list"])

print(f"xarray: {xarray.__version__}, {xarray.__file__}")

30 changes: 30 additions & 0 deletions doc/weather-climate.rst
@@ -12,6 +12,36 @@ Weather and climate data

.. _Climate and Forecast (CF) conventions: http://cfconventions.org

.. _cf_variables:

Related Variables
-----------------

Several CF variable attributes list other variables that are
associated with the variable carrying the attribute. A few of these
are now parsed by xarray, with the attribute value popped to
``encoding`` on read and the variables named in that value
interpreted as non-dimension coordinates:

- ``coordinates``
- ``bounds``
- ``grid_mapping``
- ``climatology``
- ``geometry``
- ``node_coordinates``
- ``node_count``
- ``part_node_count``
- ``interior_ring``
- ``cell_measures``
- ``formula_terms``

This decoding is controlled by the ``decode_coords`` kwarg to
:py:func:`open_dataset` and :py:func:`open_mfdataset`.
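Most of these attributes hold a plain space-separated list of variable
names, but ``cell_measures`` and ``formula_terms`` pair each name with a
role (e.g. ``"area: areas"``). A minimal, self-contained sketch of how
such values can be split into variable names, simplified from the parsing
this PR adds to ``conventions.py`` (the helper name is invented for
illustration):

```python
def referenced_variable_names(attr_name, attr_val):
    """Extract the variable names from a CF list-of-variables attribute."""
    if attr_name not in ("cell_measures", "formula_terms"):
        # Plain attributes such as ``bounds`` are space-separated names.
        return attr_val.split()
    # Role-prefixed attributes look like "p0: P0 lev: ln_p".  Splitting on
    # ":" and then on whitespace yields alternating roles and names.
    roles_and_names = [
        token for part in attr_val.split(":") for token in part.split()
    ]
    if len(roles_and_names) % 2 == 1:
        raise ValueError(f"Attribute {attr_name} malformed")
    return roles_and_names[1::2]  # every second token is a variable name
```

In the PR itself a malformed value only emits a ``UserWarning`` rather
than raising, and if any referenced name is missing from the dataset the
attribute is dropped with a warning instead of being moved to encoding.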

The CF attribute ``ancillary_variables`` was not included in the list
due to the variables listed there being associated primarily with the
variable with the attribute, rather than with the dimensions.
@DWesl (Contributor Author) commented on Jan 5, 2021:

Suggested change:

    -variable with the attribute, rather than with the dimensions.
    +variable with the attribute, rather than with the dimensions
    +associated with that variable.

@andersy005 (Member) commented on Feb 13, 2021:

@DWesl, do you still want to commit this suggestion? Wasn't sure whether you missed it or not...

.. _metpy_accessor:

CF-compliant coordinate variables
8 changes: 7 additions & 1 deletion doc/whats-new.rst
@@ -39,6 +39,13 @@ Breaking changes
always be set such that ``int64`` values can be used. In the past, no units
finer than "seconds" were chosen, which would sometimes mean that ``float64``
values were required, which would lead to inaccurate I/O round-trips.
- Variables referred to in attributes like ``bounds`` and ``grid_mapping``
  can now be set as coordinate variables. These attributes
are moved to :py:attr:`DataArray.encoding` from
:py:attr:`DataArray.attrs`. This behaviour is controlled by the
``decode_coords`` kwarg to :py:func:`open_dataset` and
:py:func:`open_mfdataset`. The full list of decoded attributes is in
:ref:`weather-climate` (:pull:`2844`, :issue:`3689`).
- remove deprecated ``autoclose`` kwargs from :py:func:`open_dataset` (:pull:`4725`).
By `Aureliana Barghini <https://github.com/aurghs>`_.

@@ -340,7 +347,6 @@ New Features
- Expose ``use_cftime`` option in :py:func:`~xarray.open_zarr` (:issue:`2886`, :pull:`3229`)
By `Samnan Rahee <https://github.com/Geektrovert>`_ and `Anderson Banihirwe <https://github.com/andersy005>`_.


Bug fixes
~~~~~~~~~

22 changes: 16 additions & 6 deletions xarray/backends/api.py
@@ -354,9 +354,14 @@ def open_dataset(
form string arrays. Dimensions will only be concatenated over (and
removed) if they have no corresponding variable and if they are only
used as the last dimension of character arrays.
decode_coords : bool, optional
If True, decode the 'coordinates' attribute to identify coordinates in
the resulting dataset.
decode_coords : bool or {"coordinates", "all"}, optional
Controls which variables are set as coordinate variables:

- "coordinates" or True: Set variables referred to in the
``'coordinates'`` attribute of the datasets or individual variables
as coordinate variables.
- "all": Set variables referred to in ``'grid_mapping'``, ``'bounds'`` and
other attributes as coordinate variables.
engine : {"netcdf4", "scipy", "pydap", "h5netcdf", "pynio", "cfgrib", \
"pseudonetcdf", "zarr"}, optional
Engine to use when reading files. If not provided, the default engine
@@ -613,9 +618,14 @@ def open_dataarray(
form string arrays. Dimensions will only be concatenated over (and
removed) if they have no corresponding variable and if they are only
used as the last dimension of character arrays.
decode_coords : bool, optional
If True, decode the 'coordinates' attribute to identify coordinates in
the resulting dataset.
decode_coords : bool or {"coordinates", "all"}, optional
Controls which variables are set as coordinate variables:

- "coordinates" or True: Set variables referred to in the
``'coordinates'`` attribute of the datasets or individual variables
as coordinate variables.
- "all": Set variables referred to in ``'grid_mapping'``, ``'bounds'`` and
other attributes as coordinate variables.
engine : {"netcdf4", "scipy", "pydap", "h5netcdf", "pynio", "cfgrib"}, \
optional
Engine to use when reading files. If not provided, the default engine
12 changes: 8 additions & 4 deletions xarray/backends/apiv2.py
@@ -195,10 +195,14 @@ open_dataset(
removed) if they have no corresponding variable and if they are only
used as the last dimension of character arrays.
This keyword may not be supported by all the backends.
decode_coords : bool, optional
If True, decode the 'coordinates' attribute to identify coordinates in
the resulting dataset. This keyword may not be supported by all the
backends.
decode_coords : bool or {"coordinates", "all"}, optional
Controls which variables are set as coordinate variables:

- "coordinates" or True: Set variables referred to in the
``'coordinates'`` attribute of the datasets or individual variables
as coordinate variables.
- "all": Set variables referred to in ``'grid_mapping'``, ``'bounds'`` and
other attributes as coordinate variables.
drop_variables: str or iterable, optional
A variable or list of variables to exclude from the dataset parsing.
This may be useful to drop variables with problems or
79 changes: 74 additions & 5 deletions xarray/conventions.py
@@ -11,6 +11,23 @@
from .core.pycompat import is_duck_dask_array
from .core.variable import IndexVariable, Variable, as_variable

CF_RELATED_DATA = (
"bounds",
"grid_mapping",
"climatology",
"geometry",
"node_coordinates",
"node_count",
"part_node_count",
"interior_ring",
"cell_measures",
"formula_terms",
)
CF_RELATED_DATA_NEEDS_PARSING = (
"cell_measures",
"formula_terms",
)


class NativeEndiannessArray(indexing.ExplicitlyIndexedNDArrayMixin):
"""Decode arrays on the fly from non-native to native endianness
@@ -256,6 +273,9 @@ def encode_cf_variable(var, needs_copy=True, name=None):
var = maybe_default_fill_value(var)
var = maybe_encode_bools(var)
var = ensure_dtype_not_object(var, name=name)

for attr_name in CF_RELATED_DATA:
pop_to(var.encoding, var.attrs, attr_name)
return var


@@ -499,7 +519,7 @@ def stackable(dim):
use_cftime=use_cftime,
decode_timedelta=decode_timedelta,
)
if decode_coords:
if decode_coords in [True, "coordinates", "all"]:
var_attrs = new_vars[k].attrs
if "coordinates" in var_attrs:
coord_str = var_attrs["coordinates"]
@@ -509,6 +529,38 @@ def stackable(dim):
del var_attrs["coordinates"]
coord_names.update(var_coord_names)

if decode_coords == "all":
for attr_name in CF_RELATED_DATA:
if attr_name in var_attrs:
attr_val = var_attrs[attr_name]
if attr_name not in CF_RELATED_DATA_NEEDS_PARSING:
var_names = attr_val.split()
else:
roles_and_names = [
role_or_name
for part in attr_val.split(":")
for role_or_name in part.split()
]
if len(roles_and_names) % 2 == 1:
warnings.warn(
f"Attribute {attr_name:s} malformed", stacklevel=5
)
var_names = roles_and_names[1::2]
if all(var_name in variables for var_name in var_names):
new_vars[k].encoding[attr_name] = attr_val
coord_names.update(var_names)
else:
referenced_vars_not_in_variables = [
proj_name
for proj_name in var_names
if proj_name not in variables
]
warnings.warn(
f"Variable(s) referenced in {attr_name:s} not in variables: {referenced_vars_not_in_variables!s}",
stacklevel=5,
)
del var_attrs[attr_name]

if decode_coords and "coordinates" in attributes:
attributes = dict(attributes)
coord_names.update(attributes.pop("coordinates").split())
@@ -542,9 +594,14 @@ def decode_cf(
decode_times : bool, optional
Decode cf times (e.g., integers since "hours since 2000-01-01") to
np.datetime64.
decode_coords : bool, optional
Use the 'coordinates' attribute on variable (or the dataset itself) to
identify coordinates.
decode_coords : bool or {"coordinates", "all"}, optional
Controls which variables are set as coordinate variables:

- "coordinates" or True: Set variables referred to in the
``'coordinates'`` attribute of the datasets or individual variables
as coordinate variables.
- "all": Set variables referred to in ``'grid_mapping'``, ``'bounds'`` and
other attributes as coordinate variables.
drop_variables : str or iterable, optional
A variable or list of variables to exclude from being parsed from the
dataset. This may be useful to drop variables with problems or
@@ -664,6 +721,7 @@ def _encode_coordinates(variables, attributes, non_dim_coord_names):

global_coordinates = non_dim_coord_names.copy()
variable_coordinates = defaultdict(set)
not_technically_coordinates = set()
for coord_name in non_dim_coord_names:
target_dims = variables[coord_name].dims
for k, v in variables.items():
@@ -674,6 +732,13 @@ def _encode_coordinates(variables, attributes, non_dim_coord_names):
):
variable_coordinates[k].add(coord_name)

if any(
attr_name in v.encoding and coord_name in v.encoding.get(attr_name)
for attr_name in CF_RELATED_DATA
):
not_technically_coordinates.add(coord_name)
global_coordinates.discard(coord_name)

variables = {k: v.copy(deep=False) for k, v in variables.items()}

# keep track of variable names written to file under the "coordinates" attributes
@@ -691,7 +756,11 @@
# we get support for attrs["coordinates"] for free.
coords_str = pop_to(encoding, attrs, "coordinates")
if not coords_str and variable_coordinates[name]:
attrs["coordinates"] = " ".join(map(str, variable_coordinates[name]))
attrs["coordinates"] = " ".join(
str(coord_name)
for coord_name in variable_coordinates[name]
if coord_name not in not_technically_coordinates
)
if "coordinates" in attrs:
written_coords.update(attrs["coordinates"].split())

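On the encode side, the ``not_technically_coordinates`` set introduced
above keeps variables that are referenced through ``grid_mapping``- and
``bounds``-style encodings out of the CF ``coordinates`` attribute. A
simplified, hypothetical sketch of that filtering (the helper signature
is invented for illustration; the real logic lives in
``_encode_coordinates`` and matches substrings rather than split tokens):

```python
CF_RELATED_DATA = (
    "bounds", "grid_mapping", "climatology", "geometry",
    "cell_measures", "formula_terms",
)

def coordinates_attr(coord_names, var_encodings):
    """Join non-dimension coordinate names into a ``coordinates`` value,
    skipping any name already referenced by a CF related-data attribute
    in some variable's encoding (e.g. a grid mapping variable)."""
    not_technically_coordinates = {
        coord
        for enc in var_encodings.values()
        for attr_name in CF_RELATED_DATA
        if attr_name in enc
        for coord in coord_names
        if coord in enc[attr_name].split()
    }
    return " ".join(
        c for c in coord_names if c not in not_technically_coordinates
    )
```

With the test dataset from this PR, ``latlon`` (grid mapping) and
``areas`` (cell measure) would be filtered out, leaving only the genuine
auxiliary coordinates in the ``coordinates`` attribute.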
113 changes: 113 additions & 0 deletions xarray/tests/test_backends.py
@@ -55,6 +55,7 @@
requires_cftime,
requires_dask,
requires_h5netcdf,
requires_iris,
requires_netCDF4,
requires_pseudonetcdf,
requires_pydap,
@@ -857,6 +858,118 @@ def test_roundtrip_mask_and_scale(self, decoded_fn, encoded_fn):
assert decoded.variables[k].dtype == actual.variables[k].dtype
assert_allclose(decoded, actual, decode_bytes=False)

@staticmethod
def _create_cf_dataset():
original = Dataset(
dict(
variable=(
("ln_p", "latitude", "longitude"),
np.arange(8, dtype="f4").reshape(2, 2, 2),
{"ancillary_variables": "std_devs det_lim"},
),
std_devs=(
("ln_p", "latitude", "longitude"),
np.arange(0.1, 0.9, 0.1).reshape(2, 2, 2),
{"standard_name": "standard_error"},
),
det_lim=(
(),
0.1,
{"standard_name": "detection_minimum"},
),
),
dict(
latitude=("latitude", [0, 1], {"units": "degrees_north"}),
longitude=("longitude", [0, 1], {"units": "degrees_east"}),
latlon=((), -1, {"grid_mapping_name": "latitude_longitude"}),
latitude_bnds=(("latitude", "bnds2"), [[0, 1], [1, 2]]),
longitude_bnds=(("longitude", "bnds2"), [[0, 1], [1, 2]]),
areas=(
("latitude", "longitude"),
[[1, 1], [1, 1]],
{"units": "degree^2"},
),
ln_p=(
"ln_p",
[1.0, 0.5],
{
"standard_name": "atmosphere_ln_pressure_coordinate",
"computed_standard_name": "air_pressure",
},
),
P0=((), 1013.25, {"units": "hPa"}),
),
)
original["variable"].encoding.update(
{"cell_measures": "area: areas", "grid_mapping": "latlon"},
)
original.coords["latitude"].encoding.update(
dict(grid_mapping="latlon", bounds="latitude_bnds")
)
original.coords["longitude"].encoding.update(
dict(grid_mapping="latlon", bounds="longitude_bnds")
)
original.coords["ln_p"].encoding.update({"formula_terms": "p0: P0 lev : ln_p"})
return original

def test_grid_mapping_and_bounds_are_not_coordinates_in_file(self):
original = self._create_cf_dataset()
with create_tmp_file() as tmp_file:
original.to_netcdf(tmp_file)
with open_dataset(tmp_file, decode_coords=False) as ds:
assert ds.coords["latitude"].attrs["bounds"] == "latitude_bnds"
assert ds.coords["longitude"].attrs["bounds"] == "longitude_bnds"
assert "latlon" not in ds["variable"].attrs["coordinates"]
assert "coordinates" not in ds.attrs

def test_coordinate_variables_after_dataset_roundtrip(self):
original = self._create_cf_dataset()
with self.roundtrip(original, open_kwargs={"decode_coords": "all"}) as actual:
assert_identical(actual, original)

with self.roundtrip(original) as actual:
expected = original.reset_coords(
["latitude_bnds", "longitude_bnds", "areas", "P0", "latlon"]
)
# equal checks that coords and data_vars are equal which
# should be enough
# identical would require resetting a number of attributes
# skip that.
assert_equal(actual, expected)

def test_grid_mapping_and_bounds_are_coordinates_after_dataarray_roundtrip(self):
original = self._create_cf_dataset()
# The DataArray roundtrip should have the same warnings as the
# Dataset, but we already tested for those, so just go for the
# new warnings. It would appear that there is no way to tell
# pytest "This warning and also this warning should both be
# present".
# xarray/tests/test_conventions.py::TestCFEncodedDataStore
# needs the to_dataset. The other backends should be fine
# without it.
with pytest.warns(
UserWarning,
match=(
r"Variable\(s\) referenced in bounds not in variables: "
r"\['l(at|ong)itude_bnds'\]"
),
):
with self.roundtrip(
original["variable"].to_dataset(), open_kwargs={"decode_coords": "all"}
) as actual:
assert_identical(actual, original["variable"].to_dataset())

@requires_iris
def test_coordinate_variables_after_iris_roundtrip(self):
original = self._create_cf_dataset()
iris_cube = original["variable"].to_iris()
actual = DataArray.from_iris(iris_cube)
# Bounds will be missing (xfail)
del original.coords["latitude_bnds"], original.coords["longitude_bnds"]
# Ancillary vars will be missing
# Those are data_vars, and will be dropped when grabbing the variable
assert_identical(actual, original["variable"])

def test_coordinates_encoding(self):
def equals_latlon(obj):
return obj == "lat lon" or obj == "lon lat"