Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a from_cf() mechanism for AreaDefinition #269

Closed
TomLav opened this issue Apr 22, 2020 · 6 comments · Fixed by #271
Closed

add a from_cf() mechanism for AreaDefinition #269

TomLav opened this issue Apr 22, 2020 · 6 comments · Fixed by #271

Comments

@TomLav
Copy link
Contributor

TomLav commented Apr 22, 2020

We would like to get AreaDefinition objects directly initialized from a netCDF/CF file.

The user should be able to point the from_cf() to an existing netCDF/CF file (e.g. a forecast model file), and in return get an areadef object. Then he/she can resample other data (e.g. satellite swath) on that area. This is a new way of building an areadef, and does not touch upon the resampling methods.

Some initial notes:

  • we should rely on pyproj's from_cf() to get the Proj object;
  • we still need to guess the area_extent from the coordinate variables;
  • input can be both a filename, or a netCDF handle;
  • in case several grids are stored in the same netCDF file (is it allowed in CF?) we should help the user pick the right one (e.g. option to provide the names of variables in the netCDF file).

It is not clear how implementation of #264 affects the implementation of this feature, and if one must come first.

@TomLav
Copy link
Contributor Author

TomLav commented Apr 22, 2020

This is an example of how I implemented something similar outside pyresample. The from_cf() should be more robust, and guess the name of the grid_mapping information.

def load_grid_defs_from_OSISAF_ncCF_file(ncCF_file,):
    # Searches for a variable with attribute 'grid_mapping'. Use the value of that attribute
    # to load proj string information, and then use xc and yc for defining the extent
    # of the pyresample area_def object.
    
    def get_area_id(area, projection, resolution):
        reso = int(float(resolution)*10)
        area_id = '{}{}'.format(area,reso,)
        pcs_id = '{}_{}-{}'.format(area, projection, reso)
        return area_id, pcs_id

    with Dataset(ncCF_file,"r") as gf: 
        # proj_dict
        crs_name = 'crs'
        for varn in gf.variables.keys():
            try:
                a_var     = gf.variables[varn]
                crs_name  = a_var.grid_mapping
                break
            except AttributeError:
                pass
        if crs_name is None:
            raise ValueError("ERROR: did not find any variable with 'grid_mapping' attribute")
        else:
            print "Info: read grid_mapping information from {} (found in {})".format(crs_name,varn)
            
        proj_str  = gf.variables[crs_name].proj4_string
        proj_dict = dict([s[1:].split('=') for s in proj_str.split( )]) 
        # xcs
        xcs = gf.variables['xc']
        if xcs.units == 'km':
            xcs = xcs[:] * 1000.
        elif xcs.units == 'm':
            xcs = xcs[:]
        else:
            raise NotImplementedError("Do not know how to handle 'xc' with units {}".format(xcs.units))
        # ycs
        ycs = gf.variables['yc']
        if ycs.units == 'km':
            ycs = ycs[:] * 1000.
        elif ycs.units == 'm':
            ycs = ycs[:]
        else:
            raise NotImplementedError("Do not know how to handle 'yc' with units {}".format(ycs.units))
        # shape
        shape = a_var.shape[::-1]
        if len(shape) == 3:
            shape = (shape[0],shape[1])
        # grid spacing
        xgs = fabs(xcs[1]-xcs[0])
        ygs = fabs(ycs[1]-ycs[0])
        # extent
        extent = [xcs.min(),ycs.min(),xcs.max(),ycs.max(),]
        extent[0] = extent[0] - 0.5*xgs
        extent[1] = extent[1] - 0.5*ygs
        extent[2] = extent[2] + 0.5*xgs
        extent[3] = extent[3] + 0.5*ygs
        
        # try and guess an area_id from global attributes:
        area_id, pcs_id = crs_name, crs_name
        try:
            area_id, pcs_id  = get_area_id(gf.area,gf.projection,gf.resolution.split(' ')[0])
        except AttributeError as ae:
            print "Missing global attribute {} to create an area ID".format(ae,)
        except Exception as ex:
            print "Got {}".format(ex,)
            pass
        
        area_def  = pr.geometry.AreaDefinition(area_id,crs_name,pcs_id,
                                        proj_dict,
                                        shape[0],shape[1],
                                        extent)
        return area_def

@djhoese
Copy link
Member

djhoese commented Apr 22, 2020

I like the idea of this feature. I think the design will have to depend heavily on what the end goal is. It would be best if we split this in to multiple functions that handle various levels of the problem. That said, are you against an xarray dependency? It may make some of this easier to write.

Here is what a CF compliant NetCDF file typically looks like when it has CRS information provided:

    var1(y, x):
        ...
        grid_mapping = gmap1
    var2(y, x):
        grid_mapping = gmap2
    gmap1():
        ...
        grid_mapping_name = "geostationary"
    gmap2():
        ...
        grid_mapping_name = "mercator"
    y(y):
        units = "meters"
    x(x):
        units = "meters" 

I think at the highest level, the function needs to allow the user to specify a particular variable. If not provided then the function can search through every variable that has a grid_mapping attribute. If there are more than one overall grid_mapping (as you mentioned) then we should raise an exception telling the user to specify what variable to use. Additionally, if a variable name isn't provided and all variables have the same grid_mapping but they have different sizes then there should be another exception that the user has to provide the name of the variable to look at. Without that we don't know the size of the AreaDefinition.

The other optional keyword argument is what the names of the dimensions are to look at. In the above example the y and x variable were already the dimensions of the variables, but this isn't always the case. Sometimes they are called row and col (or any other name) and don't have a corresponding y and x variable defined. If you're lucky the are listed as coordinate variables on a variable via coordinates = "y x" (see http://www.cgd.ucar.edu/cms/eaton/netcdf/CF-20010629.htm#altc). So in these cases a y and x optional keyword argument should be provided by the user. Otherwise we have no way of computing the extents of the area (the y/x variables are the centers of the pixels so we can calculate the outer extents from that). So the definition of the highest-level function might look like:

def load_nc_area(nc_file, variable=None, y=None, x=None):

I think it is important that this function doesn't magically find the variables/attributes in the file. It should be based on the CF standard with maybe some common conventions that aren't CF. This high-level function should be just one interface too. If an xarray dependency is OK then I feel like the function should take an xarray Dataset (or maybe DataArray). This could even be made in to an xarray accessor so you could do my_dataset.pyresample.get_area. We could also have functions that take a crs object, a 1D y array, and a 1D x array and generates the AreaDefinition.

This is all my opinion. I'm open to discussion and other solutions.

@TomLav
Copy link
Contributor Author

TomLav commented Apr 22, 2020

Thanks David. I agree with all you write above.

I would try and develop the routine first without the xarray dependency, and then wrap it into something xarray, would that work? (I don't know xarray so I'll need some help :).

I think I'd like first to collect several/many "minimal" netCDF/CF example files in advance in a test suite.

Did you see my comment about #264, and do you think it interferes?

I'll give this a try during PCW. How to I move this issue to the PCW issue board?

@djhoese
Copy link
Member

djhoese commented Apr 22, 2020

I added it to the project for you. I think for you to do it you have to go to the project page and add it there. I don't remember.

As for #264. No this should only change the internals of the AreaDefinition. It should actually make this issue easier as you should now be able to pass a CRS object directly to the AreaDefinition instead of converting it to PROJ parameters first:

crs = CRS.from_cf(...)
area = AreaDefinition(..., crs, ...)

TomLav added a commit to TomLav/pyresample that referenced this issue May 4, 2020
@TomLav
Copy link
Contributor Author

TomLav commented May 4, 2020

@djhoese
Copy link
Member

djhoese commented May 4, 2020

Once you make a PR then you can mention this issue and it'll make the changes easier to review. Thanks for working on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants