Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using DataArray for xarray Accessor, attribute promotion #184

Closed
jlandmann opened this issue Oct 10, 2020 · 6 comments
Closed

Using DataArray for xarray Accessor, attribute promotion #184

jlandmann opened this issue Oct 10, 2020 · 6 comments

Comments

@jlandmann
Copy link
Contributor

When using an xr.DataArray to instantiate the xarray Accessor, there should be an explicit promote_attrs=True passed to to_dataset for the latest releases of xarray:

xarray_obj = xarray_obj.to_dataset(name='var')

Otherwise, instatiating the accessor fails, because the projection info in the attributes is not passed and defining the grid fails:

Traceback (most recent call last):
  File "C:\Program Files\Anaconda3\envs\glaciersat\lib\site-packages\salem\gis.py", line 911, in map_gridded_data
    grid = data.salem.grid  # try xarray
  File "C:\Program Files\Anaconda3\envs\glaciersat\lib\site-packages\xarray\core\extensions.py", line 36, in __get__
    accessor_obj = self._accessor(obj)
  File "C:\Program Files\Anaconda3\envs\glaciersat\lib\site-packages\salem\sio.py", line 420, in __init__
    raise RuntimeError('dataset Grid not understood.')
RuntimeError: dataset Grid not understood.

This seems to be untested, since all tests for xarray-dev pass (last run 2 months ago, but code was added 7 months ago to xarray).
I'm not sure if you maybe want to specify an older xarray version as default for salem, or make the change and require xarray=0.16.

@fmaussion
Copy link
Owner

Can you provide a code snippet reproducing the error? Supporting both versions should not be a problem.

@jlandmann
Copy link
Contributor Author

jlandmann commented Oct 11, 2020

I tried it with this one, which raises the above error:

xrda = xr.DataArray(attrs={'pyproj_srs': 'epsg:32632'})
xrda.salem

The same actually also happens if a Dataset is passed, as pyproj_srs is then not retrieved from the variable attributes, but only searched for as a global attribute in grid_from_dataset.

xrds = xrda.to_dataset(name='var')
xrds.salem

Does this have a specific reason? I mean, in principle the variables could have different projections, the question is only if that makes sense...

@fmaussion
Copy link
Owner

fmaussion commented Oct 11, 2020

I'm still not sure to understand the issue and how it relates to xarray v0.16. The code you sent never worked, because your data array does not have any coordinate to parse. This works for example:

import xarray as xr
import salem

print(xr.__version__)

xrda = xr.DataArray(dims=('x', 'y'), 
                    coords={'x':[1, 2], 'y':[1, 2]}, 
                    attrs={'pyproj_srs': 'epsg:32632'})
xrda.salem

output:

0.16.1
<salem.sio.DataArrayAccessor at 0x7f66a08d3e20>

Can you be more specific?

Does this have a specific reason? I mean, in principle the variables could have different projections, the question is only if that makes sense...

I think I never thought of a situation where data arrays of a different dataset have different projections, and I don't think that I ever encountered such a dataset (it would be super confusing).

That being said, the current implementation of crs attrs in salem is suboptimal, because it always needs one: i.e. if you do things manually, you need to add the crs as attribute to all variables and all datasets you want to parse / plot. This was one of the reason to wrap open_dataset in salem - in this thin layer, salem propagates the crs to all variables of a dataset for example:

salem/salem/sio.py

Lines 943 to 946 in 697762b

# add pyproj string everywhere
ds.attrs['pyproj_srs'] = grid.proj.srs
for v in ds.data_vars:
ds[v].attrs['pyproj_srs'] = grid.proj.srs

@fmaussion
Copy link
Owner

See also: #62 (but I don't think I'll ever come to implementing this tbh...)

@jlandmann
Copy link
Contributor Author

Ohhhhh.....sure, ouch! Sorry for this utterly bad example code.
My actual piece of code is the following:

grid_ds.salem
<salem.sio.DatasetAccessor object at 0x0000019649F12308>
ds.salem
<salem.sio.DatasetAccessor object at 0x000001964A81A5C8>
grid_ds.salem.transform(ds, interp='linear')
Traceback (most recent call last):
  File "PyCharm 2019.2\plugins\python\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "Anaconda3\envs\glaciersat\lib\site-packages\salem\sio.py", line 676, in transform
    return self._apply_transform(transform, grid, other)
  File "Anaconda3\envs\glaciersat\lib\site-packages\salem\sio.py", line 618, in _apply_transform
    rdata = transform(var)
  File "Anaconda3\envs\glaciersat\lib\site-packages\salem\gis.py", line 911, in map_gridded_data
    grid = data.salem.grid  # try xarray
  File "Anaconda3\envs\glaciersat\lib\site-packages\xarray\core\extensions.py", line 36, in __get__
    accessor_obj = self._accessor(obj)
   File "Anaconda3\envs\glaciersat\lib\site-packages\salem\sio.py", line 420, in __init__
     raise RuntimeError('dataset Grid not understood.')
 RuntimeError: dataset Grid not understood.

The problem is indeed as you said that ds has two variables and a dataset attribute pyproj_srs, but one of the two variables doesn't have a pyproj_srs attribute. This is because the dataset is merged from two independent dataarrays. The latter fact lets salem then raise the RuntimeError when applying the transform, since this is done on the data variables.
So in sum, yes, probably #62 would help avoiding this. It might be worth considering this for a potential hackathon todo-list, but for now it's probably easiest to just add the missing projection attribute to the variable. Adding an additional check in _XarrayAccessorBase._apply_transform() if the variable has the projection attribute and, if not, transferring it from the dataset attribute seems like a complete overkill.
Feel free to close, especially because this has nothing to do with promoting attributes anymore as you showed above. This was a misunderstanding from my side as the attribute was not visible in the representation anymore when the dataarray was converted to a dataset. I don't know why setting this argument to True helped me in the beginning. Sorry for the confusion!

@fmaussion
Copy link
Owner

Thanks!

Yes OK so at core it is the same problem as #62 - closing this if you agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants