File metadata change when subsetting data #198
Comments
@atmodatcode We spotted this a while back and I thought that the new version of
And:
We will look into it further. @ellesmith88 can you remember if we looked into this problem with
@agstephens We have written fixes to solve these issues (as part of the decadal work), so they're not implemented for all files yet. Each dataset would need to have a fix for this to be removed. See functions The fix I wrote in xarray allows the coordinate attribute to be removed (https://github.com/pydata/xarray/pull/5514/files). Removing the The fill value fix could be used for all files during processing, but removing the coordinate attribute requires us to know which variables have had this added by xarray.
Thanks @ellesmith88, I think the correct fix for this is in
@agstephens Agreed. They have an issue open for it (pydata/xarray#2037) that has been open since 2018, so it's not a priority. I had a go at it in June but didn't solve it. I can put the patch into clisops.
Thanks @ellesmith88, please do that in a new PR. That would be great.
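The workaround discussed in the comments above can be sketched with xarray roughly as follows. This is a minimal sketch under stated assumptions, not the clisops fix itself: the helper name `strip_added_metadata` and its `no_coordinates_attr` parameter are hypothetical. Setting `encoding["_FillValue"] = None` asks xarray not to write a `_FillValue` attribute for that variable; setting `encoding["coordinates"] = None` relies on the behaviour added in pydata/xarray#5514, so it assumes an xarray version that includes that change.

```python
import xarray as xr


def strip_added_metadata(ds, no_coordinates_attr=()):
    """Hypothetical helper: keep xarray from writing attributes the source lacked.

    ds: an xarray.Dataset about to be written with to_netcdf().
    no_coordinates_attr: names of variables (e.g. "time_bnds") that should
        not receive a "coordinates" attribute; the caller has to supply
        these, because the dataset does not record which ones xarray added.
    """
    # Collect the coordinate variables plus the bounds variables they name.
    names = set(ds.coords)
    for coord in list(ds.coords):
        bounds = ds.variables[coord].attrs.get("bounds")
        if bounds in ds.variables:
            names.add(bounds)

    # None means "do not write a _FillValue attribute" for this variable.
    for name in names:
        ds.variables[name].encoding["_FillValue"] = None

    # Assumes the behaviour from pydata/xarray#5514: None disables the
    # automatically generated "coordinates" attribute.
    for name in no_coordinates_attr:
        ds.variables[name].encoding["coordinates"] = None

    return ds
```

The helper would be called on the subsetted dataset just before writing it out. Data variables such as `tas` are left untouched, which matches the observation that only coordinate and bounds variables are affected.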
Description
I retrieved data (variable tas) from https://cds.climate.copernicus.eu/cdsapp#!/dataset/projections-cmip6?tab=form in two ways:
I noted that the file metadata differ depending on this choice.
For example, in contrast to the original data, the subsetted data contain
time:_FillValue = NaN ;
lat:_FillValue = NaN ;
time_bnds:_FillValue = NaN ;
time_bnds:coordinates = "height" ;
and so forth, which are not present in the original data.
Coordinate variables with a NaN _FillValue are not typical and do not follow the CF recommendations (http://cfconventions.org/cf-conventions/cf-conventions.html#missing-data), where it is stated that the _FillValue should have the same units as the variable itself.
Also, some software cannot handle missing values defined as NaN (e.g. CDO produces weird results in such cases).
The NaN issue affects the coordinate variables and the bounds variables, not the data variables.
In addition, associating the auxiliary coordinate variable "height" with the time_bnds variable makes no sense.
Here is what the diff command shows when comparing the ncdump -h outputs of the original and the subsetted results:
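A comparison of this kind can also be sketched in Python with the standard library. This is only an illustration: the function name `header_changes` is hypothetical, and it assumes the two `ncdump -h` outputs have already been captured as strings.

```python
import difflib


def header_changes(original_header, subset_header):
    """Return only the header lines that were removed ("- ") or added ("+ ")."""
    delta = difflib.ndiff(
        original_header.splitlines(),
        subset_header.splitlines(),
    )
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; drop those.
    return [line for line in delta if line.startswith(("+ ", "- "))]
```

Lines prefixed with `+ ` exist only in the subsetted header, which is where attributes such as `time:_FillValue = NaN ;` would show up.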
Maybe you can have a look at this when you find time.
Thanks and cheers
Angelika