Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode an AreaDefinition into a netCDF/CF file #278

Open
TomLav opened this issue May 11, 2020 · 17 comments
Open

Encode an AreaDefinition into a netCDF/CF file #278

TomLav opened this issue May 11, 2020 · 17 comments

Comments

@TomLav
Copy link
Contributor

TomLav commented May 11, 2020

This is the pendant of #269 (where we loaded an AreaDefinition from a netCDF/CF file).

In Pytroll, writing to netCDF/CF files is already covered in https://github.com/pytroll/satpy/blob/master/satpy/writers/cf_writer.py

I propose to extract the part related to the AreaDefinition in pyresample (utils/_cf.py) so that the read/write logic is contained in one place.

Comments?

@djhoese
Copy link
Member

djhoese commented Dec 2, 2020

@sfinkens What do you think about this for a PCW task (for yourself 😉 )?

@sfinkens
Copy link
Member

sfinkens commented Dec 2, 2020

Sounds good! But it will have to wait for the next PCW 🙂

@wodowiesel
Copy link

would appreciate that "feature too

@gerritholl
Copy link
Collaborator

This is already supported with area.crs.to_cf()? Or is something missing? Apart from some bugs, it appears to work.

@TomLav
Copy link
Contributor Author

TomLav commented Oct 12, 2022

@gerritholl area.crs.to_cf() takes us a long way. But it only encodes the "crs" variable, not the projection_(x|y)_coordinate variables, nor create the associated dimensions, etc...

@TomLav
Copy link
Contributor Author

TomLav commented Oct 13, 2022

I opened this feature request a while ago, and had honestly forgotten about it. @gerritholl 's comment made me think a bit more.

I recall how AreaDefinition.load_cf_area() works:

>> area_def, cf_info = pr.utils.load_cf_area('https://thredds.met.no/thredds/dodsC/osisaf/met.no/reprocessed/ice/drift_455m_files/merged/2019/11/ice_drift_nh_ease2-750_cdr-v1p0_24h-201911211200.nc')
>> print(area_def)
Area ID: Lambert_Azimuthal_Equal_Area
Description: Lambert_Azimuthal_Equal_Area
Projection: {'ellps': 'WGS84', 'lat_0': '90', 'lon_0': '0', 'no_defs': 'None', 'proj': 'laea', 'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
Number of columns: 144
Number of rows: 144
Area extent: (-5400000.0, -5400000.0, 5400000.0, 5400000.0)
>> print(cf_info)
{'variable': 't0', 'grid_mapping_variable': 'Lambert_Azimuthal_Equal_Area', 'type_of_grid_mapping': 'lambert_azimuthal_equal_area', 'x': {'varname': 'xc', 'first': -5362.5, 'last': 5362.5, 'spacing': 75.0, 'nb': 144, 'sign': 1.0, 'unit': 'km'}, 'y': {'varname': 'yc', 'first': 5362.5, 'last': -5362.5, 'spacing': 75.0, 'nb': 144, 'sign': -1.0, 'unit': 'km'}, 'lon': 'lon', 'lat': 'lat'}

So the routine I feel would be useful is ds = area_def.to_cf(ds=None, cf_info=None, skip_lonlat=True)

It returns a new xarray Dataset object (or updates an existing one if ds= is not None) with all needed info to define the projection information in a netCDF/CF file (the crs variable, the 'x' and 'y' variables, and the 'x' and 'y' dimensions). If those variables / dimensions already exist in the the ds=Dataset it does nothing. cf_info=dict can be used to specify names for the variables and dimensions. Default values are used if cf_info=None.

I am aware that the cf_info returned by load_cf_area() contains too much information. It was meant for two purposes: 1) to hold things that are not directly stored in an area_def object (e.g. name of the dimension variables), and 2) as a way for the user to control what had been read.

The cf_info to be used as input to area_def.to_cf() would be smaller, e.g.

>> cf_info = {'grid_mapping_variable': 'Lambert_Azimuthal_Equal_Area', 'x': {'varname': 'xc'}, 'y': {'varname': 'yc'}, 'lon': 'lon', 'lat': 'lat'}

Keyworkd skip_lonlat=True tells if the latitude and longitude 2D fields should be written in the netCDF/CF file (not mandatory for AreaDef-type content since CF-1.8).

A possible workflow would then be:

area_def, cf_info = pr.utils.cf_area(url_or_ds)
new_ds = area_def.to_cf(cf_info=cf_info)

Ideally, satpy's cf_writer.py would make direct use of the new to_cf() routine.

@djhoese
Copy link
Member

djhoese commented Oct 13, 2022

Keyworkd skip_lonlat=True tells if the latitude and longitude 2D fields should be written in the netCDF/CF file (not mandatory for AreaDef-type content since CF-1.8).

Really? I didn't think it was ever required to have lon/lats.

I'm a little torn on the design of this. On one hand it seems really useful and does everything for you. On the other, it does a lot and has complex arguments. What about following pyproj's strategy where CRS.to_cf returns a dictionary of parameters, AreaDefinition could return a series of DataArrays: x, y, and gm variables and then the user only needs to specify the name of the x and y dimensions in a keyword argument. It would be up to the user then to add these to one or more Dataset objects if they'd like.

Additionally, this makes it easier for someone not using xarray to use a future kwarg use_xarray=False where they get numpy arrays and a dictionary instead of DataArray objects.

@TomLav
Copy link
Contributor Author

TomLav commented Oct 17, 2022

It is indeed only from CF-1.8 that lat/lon could be omitted in case the coordinate system is fully defined by x, y, crs. cf-convention/cf-conventions#133

I don't have a strong feeling for whether the to_cf() should return individual arrays or a DataSet, but I think many (incl. myself) will use to_cf() to exactly bulid a DataSet / netCDF file, so we could bridge the last mile for them.

@TomLav
Copy link
Contributor Author

TomLav commented Sep 7, 2023

I badly needed this feature for a project I am working on, so I did a prototype.

It is fine for me to keep this outside the pyresample repo, but I know I'm going to use this code often, so I wouldn't mind having something similar in.

What do people think?

@djhoese
Copy link
Member

djhoese commented Sep 7, 2023

Isn't the encoder dictionary something that can be added to each individual .encoding in each DataArray?

@TomLav
Copy link
Contributor Author

TomLav commented Sep 7, 2023

Indeed. I was not aware of this. This will simplify the UI.

@TomLav
Copy link
Contributor Author

TomLav commented Sep 13, 2023

Also, I must clarify that this issue started from the bold aim to have the satpy to_cf() writer to use the new code from pyresample to_cf(). But I don't think this is a pre-requisite for moving forward: pyresample could get its to_cf() and we can later consider if satpy should make use of it. But there are some pytroll users that use pyresample without satpy (me!) who would benefit from a pyresample to_cf().

Also, I am not particularly happy with the solution to create an xarray Dataset with a phony variable that the user can drop later (see my prototype linked above), but this is the only way I found so far. Alternative approaches are most welcome.

@ghiggi
Copy link
Contributor

ghiggi commented Sep 13, 2023

I would also be in favor of having an area_def.to_cf() and swath_def method that returns a xr.Dataset (with empty variables) and the x,y (latitude, longitude), crs, (crs_wgs84) and a latitude, longitude, crs coordinate respectively.
In PR pytroll/satpy#2524 I started to refactor and reorganize all the code related to the creation of CF-compliant xr.Dataset from a satpy.Scene, and in satpy/writers/cf/area.py there is most of the code that could be moved to pyresample to accomplish the task. If we implement this in pyresample, than in satpy for each DataArray we just need to merge the area_def.to_cf dataset coordinates and add the grid_mapping attribute to the DataArray attribute (or we could have this already in pyresample ...).
I will be on holiday for the next 2 weeks, but then I could fix the last stuff to merge the PR and help implement this.
I also noted some bugs in the load_cf_area that I would like to fix ... and I would also like to have the option to include (for AreaDefinition) the possibility of adding the geodetic CRS (WGS84) of the lat/lon coordinates when they are included. And adapt then the load_cf_area to be able to deal with this case when the grid_mapping attribute is defined like CRS_PROJ: x y , CRS_WGS84: lat, lon.

@djhoese how much of these stuffs you already included in geoxarray? Is the plan to make geoxarray a dependency of pyresample/satpy ?

@TomLav
Copy link
Contributor Author

TomLav commented Sep 13, 2023

Hello @ghiggi, this is awesome. I am looking forward to discussing this further when you are back. It sounds like a good idea to first let you finish with your satpy PR, then consider what can be moved to pyresample (also depending on David's answer re: geoxarray).

I suggest opening separate issues for the bug you found in load_cf_area() and the other two features you want to add.

@ghiggi
Copy link
Contributor

ghiggi commented Sep 13, 2023

Yep this is the plan. I will open the issues with reproducible examples as soon as I am back ;)
David is already aware of it, but in case I also have code here ( https://github.com/ghiggi/gpm_api/blob/main/gpm_api/dataset/crs.py ) regarding adding the correct CF-compliant CRS coordinate and attributes if a xr.Dataset has already x/y/lat/lon coordinates (but also there we need to see what David has done in geoxarray). We also had a discussion if something related to this should not be put in the cf_xarray package maybe (see geoxarray/geoxarray#21)

@djhoese
Copy link
Member

djhoese commented Sep 13, 2023

@ghiggi @TomLav I haven't done anything with geoxarray since its first (0.1) release, but was hoping to start working on it again this week (but things keep popping up) but they wouldn't be this CF work. I could see geoxarray being a good place for this, but...

Since SwathDefinitions were brought up, what about instead of to_cf this is a to_xarray method on these geometries? Given Xarray's general preference for CF standards I don't think this is a huge break in expectations and it could even be a keyword argument on whether CF if followed or not. In both cases (Area and Swath) the logic could be dependent on geoxarray, but I'm not sure it has to be. Maybe I'm oversimplifying the task here though.

@TomLav
Copy link
Contributor Author

TomLav commented Sep 28, 2023

Hello. Just to mention that I patched my prototype implementation to automatically select the order of the x and y dimentions when creating the xarray Dataset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants