Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

to_netcdf -> _fill_value without NaN #2037

Open
mathause opened this issue Apr 3, 2018 · 8 comments
Open

to_netcdf -> _fill_value without NaN #2037

mathause opened this issue Apr 3, 2018 · 8 comments

Comments

@mathause
Copy link
Collaborator

mathause commented Apr 3, 2018

Code Sample, a copy-pastable example if possible

# Your code here
import xarray as xr
import numpy as np
x = np.arange(10.)
da = xr.Dataset(data_vars=dict(data=('dim1', x)), coords=dict(dim1=('dim1', x)))
da.to_netcdf('tst.nc')

Problem description

Apologies if this was discussed somwhere and it probably does not matter much, but tst.nc has _FillValue although it is not really necessary.

Output of xr.show_versions()

# Paste the output here xr.show_versions() here
@dcherian
Copy link
Contributor

dcherian commented Jun 7, 2021

From #5448 we should skip it for bounds variables too.

@ellesmith88
Copy link
Contributor

ellesmith88 commented Jun 9, 2021

I had a go at working on this, as mentioned in #5448 it could be a fix in

def _encode_coordinates(variables, attributes, non_dim_coord_names):

I couldn't see how to fix it in there but tried a fix at the end of

def cf_encoder(variables, attributes):

for bounds and coordinate variables. See below:

def cf_encoder(variables, attributes):
    """
    Encode a set of CF encoded variables and attributes.
    Takes a dicts of variables and attributes and encodes them
    to conform to CF conventions as much as possible.
    This includes masking, scaling, character array handling,
    and CF-time encoding.

    Parameters
    ----------
    variables : dict
        A dictionary mapping from variable name to xarray.Variable
    attributes : dict
        A dictionary mapping from attribute name to value

    Returns
    -------
    encoded_variables : dict
        A dictionary mapping from variable name to xarray.Variable,
    encoded_attributes : dict
        A dictionary mapping from attribute name to value

    See Also
    --------
    decode_cf_variable, encode_cf_variable
    """

    # add encoding for time bounds variables if present.
    _update_bounds_encoding(variables)
    
    new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}

    # Remove attrs from bounds variables (issue #2921)
    for var in new_vars.values():
        bounds = var.attrs["bounds"] if "bounds" in var.attrs else None
        if bounds and bounds in new_vars:
            # see http://cfconventions.org/cf-conventions/cf-conventions.html#cell-boundaries
            for attr in [
                "units",
                "standard_name",
                "axis",
                "positive",
                "calendar",
                "long_name",
                "leap_month",
                "leap_year",
                "month_lengths",
            ]:
                if attr in new_vars[bounds].attrs and attr in var.attrs:
                    if new_vars[bounds].attrs[attr] == var.attrs[attr]:
                        new_vars[bounds].attrs.pop(attr)
            # remove _FillValue for bounds
            if new_vars[bounds].attrs.get("_FillValue"):
                 new_vars[bounds].attrs["_FillValue"] = None


    # remove _FillValue for coordinate variables as missing values are not permitted
    # a coordinate variable is a one-dimensional variable with the same name as its dimension
    # see coordinate variable in http://cfconventions.org/cf-conventions/cf-conventions.html#terminology
    for var in new_vars.keys():
        if new_vars[var].dims == (var,) and new_vars[var].attrs.get("_FillValue"):
            new_vars[var].attrs["_FillValue"] = None

    return new_vars, attributes

tests to add into test_conventions.py could be something like:

def test_fill_value_none_for_coordinate_variables():
    # need to be floats so _FillValue is added
    temp = 273.15 + 25 * np.random.randn(2, 2)
    lon = [0.0, 5.0]
    lat = [10.0, 20.0]
    
    ds = Dataset({'temperature': (['lat', 'lon'], temp)},
                coords={'lat': lat,
                        'lon': lon})

    ds['lat'].attrs = {
        'standard_name': 'latitude',
        'long_name': 'latitude',
        'units': 'degrees_north',
        'axis': 'Y'}
    ds['lon'].attrs = {
        'standard_name': 'longitude',
        'long_name': 'longitude',
        'units': 'degrees_east',
        'axis': 'X'}
    ds['temperature'].attrs = {
        'standard_name': 'air_temperature',
        'units': 'K'}


    v, _ = conventions.cf_encoder(ds.variables, ds.attrs)

    assert v["lat"].attrs['_FillValue'] is None
    assert v["lon"].attrs['_FillValue'] is None
    # check _FillValue still inserted for temperature
    assert v["temperature"].attrs['_FillValue'] is not None

def test_fill_value_none_for_bounds_variables():
    # need to be floats so _FillValue is added
    temp = 273.15 + 25 * np.random.randn(2, 2)
    lat_bnds = np.arange(4.0).reshape((2, 2))
    lon = [0.0, 5.0]
    lat = [10.0, 20.0]

    ds = Dataset({'temperature': (['lat', 'lon'], temp), 
                  'lat_bnds': (['lat', 'nbnds'], lat_bnds),
                  'lon_bnds': (['lon', 'nbnds'], lat_bnds)},
                coords={'lat': lat,
                        'lon': lon})

    
    ds['lat'].attrs = {
        'standard_name': 'latitude',
        'long_name': 'latitude',
        'units': 'degrees_north',
        'axis': 'Y',
        'bounds': 'lat_bnds'}
    ds['lon'].attrs = {
        'standard_name': 'longitude',
        'long_name': 'longitude',
        'units': 'degrees_east',
        'axis': 'X',
        'bounds': 'lon_bnds'}
    ds['temperature'].attrs = {
        'standard_name': 'air_temperature',
        'units': 'K'}

    v, _ = conventions.cf_encoder(ds.variables, ds.attrs)
    assert v["lat_bnds"].attrs['_FillValue'] is None
    assert v["lon_bnds"].attrs['_FillValue'] is None

but I couldn't quite get this to work, with some tests in test_backends.py failing. I could see that the fill value is being added in by maybe_default_fill_value in conventions.py so maybe it would be better to skip coordinate variable and bounds here. I can't really spend anymore time on this but putting it here in case its useful.

@BorjaEst
Copy link

I am posting here a temporal solution to this in case someone has the same issue.
Based on https://stackoverflow.com/questions/45693688/xarray-automatically-applying-fillvalue-to-coordinates-on-netcdf-output

for var in ds.variables:
    if '_FillValue' not in ds[var].attrs:
        ds[var].attrs['_FillValue'] = False

@BorjaEst
Copy link

BorjaEst commented Mar 10, 2022

@ellesmith88 ellesmith88

but I couldn't quite get this to work, with some tests in test_backends.py failing. I could see that the fill value is being added in by maybe_default_fill_value in conventions.py so maybe it would be better to skip coordinate variable and bounds here. I can't really spend anymore time on this but putting it here in case its useful.

I see you are setting _FillValue to None but probably it should be set to False instead.
See https://stackoverflow.com/questions/45693688/xarray-automatically-applying-fillvalue-to-coordinates-on-netcdf-output

I can give it a try. In which PR are you working on this?

@ellesmith88
Copy link
Contributor

@BorjaEst I haven't opened a PR for this 🙂

@BorjaEst
Copy link

@BorjaEst I haven't opened a PR for this slightly_smiling_face

@ellesmith88 is there a branch/fork where you implemented your trials or shall I start one and add your code?

@ellesmith88
Copy link
Contributor

@BorjaEst I don't think I ever pushed anything up, sorry. Probably best to start a new one.

@BorjaEst
Copy link

BorjaEst commented Mar 10, 2022

I see know why it did not had sense to push something. Basically cf requirements are in conflict with backend requirements.
It would be needed to restructure tests and review which from backend have sense to apply.

======================================================== short test summary info ========================================================
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_write_store - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_write_store - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_ondisk_after_print - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_vectorized_indexing - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_isel_dataarray - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_roundtrip_test_data - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_roundtrip_timedelta_data - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_isel_dataarray - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_load - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_write_store - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_coordinate_variables_after_dataset_roundtrip - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_roundtrip_numpy_datetime_data - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_roundtrip_test_data - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_orthogonal_indexing - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_dataset_compute - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_roundtrip_timedelta_data - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_zero_dimensional_variable - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_dataset_compute - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_array_type_after_indexing - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_append_write - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_vectorized_indexing - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_coordinate_variables_after_dataset_roundtrip - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_isel_dataarray - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_append_overwrite_values - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_load - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_grid_mapping_and_bounds_are_coordinates_after_dataarray_roundtrip - Ke...
FAILED xarray/tests/test_backends.py::TestGenericNetCDFData::test_cross_engine_read_write_netcdf3 - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_vectorized_indexing - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_grid_mapping_and_bounds_are_coordinates_after_dataarray_roundtrip - ...
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_append_overwrite_values - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_zero_dimensional_variable - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_ondisk_after_print - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_roundtrip_timedelta_data - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_append_with_invalid_dim_raises - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_orthogonal_indexing - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_array_type_after_indexing - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_orthogonal_indexing - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyInMemoryData::test_append_with_invalid_dim_raises - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_roundtrip_numpy_datetime_data - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_zero_dimensional_variable - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_roundtrip_test_data - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFileObject::test_append_write - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_ondisk_after_print - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_roundtrip_numpy_datetime_data - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_dataset_compute - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_append_write - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_load - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_array_type_after_indexing - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestGenericNetCDFData::test_engine - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_grid_mapping_and_bounds_are_coordinates_after_dataarray_roundtrip - KeyE...
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_coordinate_variables_after_dataset_roundtrip - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_append_with_invalid_dim_raises - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::TestScipyFilePath::test_append_overwrite_values - KeyError: ('O', 8)
FAILED xarray/tests/test_backends.py::test_scipy_entrypoint - KeyError: ('O', 8)

However your snipet worked, but it is not applying to boundary variables, which is a cf requirement as well.
Probably I would edit it to something like:

    # remove _FillValue for coordinate variables as missing values are not permitted (issue #2037)
    # a coordinate variable is a one-dimensional variable with the same name as its dimension
    # see coordinate variable in http://cfconventions.org/cf-conventions/cf-conventions.html#terminology
    for var in new_vars.keys():
        if new_vars[var].dims == (var,):
            new_vars[var].attrs["_FillValue"] = None
            if "bounds" in new_vars[var].attrs:
                bnds = new_vars[var].attrs['bounds']
                new_vars[bnds].attrs["_FillValue"] = None 

I will create a PR with the hope a developer will take a look into this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants