Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

some comments #19

Closed
dcherian opened this issue Jan 29, 2022 · 6 comments
Closed

some comments #19

dcherian opened this issue Jan 29, 2022 · 6 comments
Labels
community support Discussion new feature New feature or request

Comments

@dcherian
Copy link
Contributor

The proposed API is looking nice to me. Nice work!

Here are a few suggestions

  • uxarray.Grid.init(self, string file) \
  • (*) uxarray.Grid.init(self, string gridspec) \
  • uxarray.Grid.init(self, np.float64.list vertices) \

How about making the last two classmethods instead to avoid overloading __init__

  1. grid = uxarray.Grid.from_gridspec(...),
  2. grid = uxarray.Grid.from_vertices(...)

Even Grid.from_file looks nice to me but if that's the most common usecase, perhaps Grid.__init__(self, file) would be most convenient.

  • uxarray.Grid.calculatefaceareas(self)
  • uxarray.Grid.build_node_face_connectivity(self)
  • uxarray.Grid.build_edge_face_connectivity(self)
  • uxarray.Grid.buildlatlon_bounds(self)

Will these return values, or assign properties on the Grid object. Since these arrays are a property of the Grid, it seems like they should be a property Grid.connectivity = dict(node=..., edge=...)?

For areas (and other cell metrics like lengths and volumes), xgcm has the idea of metrics. Are areas the only relevant metric for unstructured grids? Seems like a vertical cell thickness would also be useful so perhaps there should be a metrics property too?

xarray.DataArray.grid: uxarray.Grid
xarray.type: enumeration
(*) xarray.np: integer

Are you thinking of storing these in DataArray.attrs?

We have this vague idea of a XgcmGridIndex that will be present on a Dataset and propagated when DataArrays are extracted, or appropriately subsetted when calling .isel or sel. This Index object can store arbitrary properties. I'm thinking you may want to build something like that too i.e. UgridIndex that stores a

  1. uxarray.Grid,
  2. types: dict(DataArray_name: enumeration{...}) and
  3. poly_order: dict(DataArray_name: integer)
    This suggestion can't be implemented until Explicit indexes pydata/xarray#5692 is merged though, so I suggest delaying this API for a bit until we can experiment with that new interface.

laplacian, relative_vorticity et al.

(I have only worked with structured grids so ignore if this comment doesn't make sense)
These look cool, but don't they depend on lower order functions. For e.g. xgcm provides very simple Grid.diff and Grid.interp, everything else is built up from there (Grid.derivative, Grid.integrate etc.). Does something like that make sense for unstructured grids? Would it be useful for those methods to be public too?

Minor comment: It's not clear to me why these methods are uxarray.laplacian and not Grid.laplacian.

@dcherian dcherian added the new feature New feature or request label Jan 29, 2022
@erogluorhan
Copy link
Member

Hi @dcherian , this is great input for moving forward, thanks a lot! I'd like to put my personal thoughts in the following comments, but your inputs would be of great help for the broader development team, cc @anissa111 @michaelavs @rajeeja @paullric

@erogluorhan
Copy link
Member

  • uxarray.Grid.init(self, string file) \
  • (*) uxarray.Grid.init(self, string gridspec) \
  • uxarray.Grid.init(self, np.float64.list vertices) \

How about making the last two class methods instead to avoid overloading __init__

  1. grid = uxarray.Grid.from_gridspec(...),
  2. grid = uxarray.Grid.from_vertices(...)

Even Grid.from_file looks nice to me but if that's the most common use case, perhaps Grid.__init__(self, file) would be most convenient.

I don't have a strong preference, but it might help @anissa111 @michaelavs @rajeeja think about function signatures/names

@erogluorhan
Copy link
Member

  • uxarray.Grid.calculatefaceareas(self)
  • uxarray.Grid.build_node_face_connectivity(self)
  • uxarray.Grid.build_edge_face_connectivity(self)
  • uxarray.Grid.buildlatlon_bounds(self)

Will these return values, or assign properties on the Grid object. Since these arrays are a property of the Grid, it seems like they should be a property Grid.connectivity = dict(node=..., edge=...)?

I'd go with assigning Grid propoerties when these functions are ran. @paullric any thoughts?

rajeeja added a commit that referenced this issue Feb 4, 2022
o Fix error with file not found
o Start write_netcdf example
@Huite
Copy link

Huite commented Feb 10, 2022

Hey all, I imagine I went through a similar process when I was working on xugrid last year. These are some of the things that I came across, maybe it's helpful:

Optional attributes

The UGRID convention describes many optional attributes (face_edge_connecitivity, face_face_connectivity, etc.). These are often required for operations, but can be somewhat costly to derive from the required attributes. If they're present in the file, you can obviously read them but otherwise, you can defer computation until needed as @erogluorhan suggests:

    @property
    def edge_node_connectivity(self) -> IntArray:
        """
        Edge to node connectivity. Every edge consists of a connection between
        two nodes.
        Returns
        -------
        connectivity: ndarray of integers with shape ``(n_edge, 2)``.
        """
        if self._edge_node_connectivity is None:
            self._edge_connectivity()
        return self._edge_node_connectivity

Relatedly, I've written a number of connectivity utilities here: https://github.com/Deltares/xugrid/blob/main/xugrid/connectivity.py
Although there's a numba import there, it's used in a single function which is only used elsewhere (so I should probably move it). At any rate, it's all based on numpy and scipy sparse arrays. It sometimes took a bit of thinking to write as vectorized numpy operations without any hot loops -- maybe they could be useful.

Types of Grids

The UGRID convention describes a number of grids: "1D" network topology, 2D triangular, 2D mixed mesh, 3D layered, fully 3D layered.

2D triangular, 2D mixed mesh, 3D layered all fall within the same category; since the layer dimension can simply be represented in xarray and is fully orthogonal like a regular xarray Index.

Are you thinking of supporting the network topology as well? It shares a number of attributes, but obviously has less attributes than a 2D topology. I included these because as a hydrologist, I can use these nicely for representing stream reaches and such.

I chose to put the shared stuff in a UgridBase class. I personally would like to have workable unstructured 3D UGRIDs as well in the future, even though these don't seem to exist (yet) in the wild much.

Initialization

My default way of initializing is by providing the numpy arrays for the node coordinates, and the face_node_connectivity array. My thinking was that it would be relatively rare to see a grid without associated data. Most of the time, people will look to read a file into a DataArray with the associated topology. However, I'm looking at a use case where grids are often different (e.g. they are generated by a mesh generator for different regional groundwater models). If you're looking at the same grid over and over, it's likely stored separately.

There are two additional methods to construct a grid from either an xarray dataset or a geopandas GeoDataFrame (from_dataset, from_geodataframe). Using e.g. geopandas has the benefit of allowing many vector based formats, and something like from_dataset opens up the world to other formats from xarray (e.g. zarr).

I've also written some conversion methods for vector to UGRID data (and back): https://github.com/Deltares/xugrid/blob/main/xugrid/conversion.py

My thinking is that the Grid object is really just a UgridIndex as @dcherian suggests. It's primarily responsibility is to give the appropriate face/node/edge index whenever a x or y (lon or lat) coordinate is involved.

UgridDataArray / UgridDataset

However, I couldn't implement this without the explicit xarray indexes, so unfortunately I created new classes which propagate the index. This results in this API: https://deltares.github.io/xugrid/api.html

The biggest pain point is not being able to stash the information in an Index, and therefore having to wrap the xarray methods. However, once merged, I think it should result in relatively minor API changes as a user would still do the unstructured stuff via the .ugrid accessor (and it can access data in the index as needed).

Ugrid IO

I've written some functions to pull the UGRID variables out of an xarray Dataset:
https://github.com/Deltares/xugrid/blob/main/xugrid/ugrid/ugrid_io.py

Iris also recently seems to have been adding UGRID IO support, but I haven't been able to look into it much:
https://github.com/SciTools/iris/tree/main/lib/iris/experimental/ugrid
(I'm also not nearly as familiar with the iris data structures.)

Feel free to use code (hopefully goes without saying since it's MIT licensed)!

(and let me know if I've made mistakes...)

@erogluorhan
Copy link
Member

Hi @Huite thanks a lot for putting your input here! I will go through these when I find some good chunk of time and get back to you. BTW, your discussions at several repositories were very much helpful for us to figure things out initially!

@philipc2
Copy link
Member

Closing since most of the discussion as been resolved, moved the discussion of using class methods into a separate issue: #378

@github-project-automation github-project-automation bot moved this from 📝 To Do to ✅ Done in UXarray Development Jul 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community support Discussion new feature New feature or request
Projects
Status: ✅ Done
Development

No branches or pull requests

5 participants