Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Del _FillValue from coordinates at cf_encoder #6346

Closed
wants to merge 6 commits into from

Conversation

BorjaEst
Copy link

@BorjaEst BorjaEst commented Mar 10, 2022

@BorjaEst
Copy link
Author

Probably there is a better way to solve this.
It also gets in conflict with backend tests (which conflict CF conventions as described at the comments).

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for trying to fix this longstanding issue @BorjaEst.

I've added a couple of comments. Please let us know if you need help finish it.

I imagine some of the tests will need to be updated to remove the _FillValue attribute

# a coordinate variable is a one-dimensional variable with the same name as its dimension
# see coordinate variable in http://cfconventions.org/cf-conventions/cf-conventions.html#terminology
for var in new_vars.keys():
if new_vars[var].dims == (var,) and new_vars[var].attrs.get("_FillValue"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Xarray does allow dimension coordinates with NaNs. So we'll need to check for the presence of nans using np.any(duck_array_ops.isnull(new_vars[var])). If there are no NaNs then we avoid the _FillValue.
  2. Can we move this logic to
    class CFMaskCoder(VariableCoder):
    """Mask or unmask fill values according to CF conventions."""
    def encode(self, variable, name=None):
    dims, data, attrs, encoding = unpack_for_encoding(variable)
  3. IIRC we always add _FillValue so the second half of the condition can be removed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From CF terminology:

coordinate variable
We use this term precisely as it is defined in the NUG section on coordinate variables. It is a one-dimensional variable with the same name as its dimension [e.g., time(time) ], and it is defined as a numeric data type with values that are ordered monotonically. Missing values are not allowed in coordinate variables.

I suppose the best is raise an Excepton ValueError: Missing values are not allowed in coordinate variables on the logic you indicated if is a coordinate variable and there are null_values.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we move this logic to xarray/xarray/coding/variables.py

I tired, but in the variable context, it is not possible to access to bound variables (which are kind of coordinate).
I also tried to modify here:

def maybe_default_fill_value(var):
# make NaN the fill value for float types:
if (
"_FillValue" not in var.attrs
and "_FillValue" not in var.encoding
and np.issubdtype(var.dtype, np.floating)
):
var.attrs["_FillValue"] = var.dtype.type(np.nan)
return var

But I think that overcomplicates the logic later if we have to search for the bounds.

Any ideas?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

to_netcdf -> _fill_value without NaN
2 participants