Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SCHISM compataibility #8

Open
mpiannucci opened this issue May 16, 2024 · 29 comments
Open

Add SCHISM compataibility #8

mpiannucci opened this issue May 16, 2024 · 29 comments
Labels
enhancement New feature or request
Milestone

Comments

@mpiannucci
Copy link
Collaborator

SCHISM datasets should be compatible with the UGRID accessor, but to ensure this is so we need to test, and fix any issues that arise. This issue also covers adding example notebooks and test cases for SCHISM datasets

@mpiannucci mpiannucci added this to the GSOC 2024 milestone May 16, 2024
@AtiehAlipour-NOAA
Copy link

Description:

This issue serves as a general ticket for the development of a subsetting tool for STOFS 3D model outputs. The aim is to add functionality to xarray-subset-grid code for efficiently subsetting data from STOFS 3D outputs while preserving key features of the model.

This issue serves as a starting point for the development of the subsetting tool. Subsequent issues may be created for smaller tasks related to its development.

For STOFS-3D, we use SCHISM as the base model.

Preservation of Mesh Structure: SCHISM use both triangular and quadratic meshes. It's essential that xarray-subset-grid maintains the structural of the mesh after subsetting.

Subset Data for Different Variables at Different Vertical Layers: STOFS 3D outputs contain data across multiple vertical layers for different variables. The xarray-subset-grid should incorporate functionalities to subset data based on different layers and variable names.

@mpiannucci mpiannucci added the enhancement New feature or request label May 23, 2024
@mpiannucci
Copy link
Collaborator Author

Here is the target file: s3://noaa-nos-stofs3d-pds/STOFS-3D-Atl/stofs_3d_atl.20240619/stofs_3d_atl.t12z.fields.out2d_f037_048.nc

@omkar-334
Copy link
Contributor

From Discussions

'stofs_3d_atl.t12z.fields.horizontalVelX_f001_012.nc'
'stofs_3d_atl.t12z.fields.horizontalVelY_f001_012.nc'
'stofs_3d_atl.t12z.fields.salinity_f001_012.nc'
'stofs_3d_atl.t12z.fields.temperature_f001_012.nc'
'stofs_3d_atl.t12z.fields.zCoordinates_f001_012.nc'
'stofs_3d_atl.t12z.fields.out2d_f001_012.nc'
'schout_adcirc_20240619.nc'
As expected, only the last 2 files, out2d and schout are detected as SCHISM and ADCIRC respectively.

According to this solution that Atieh had sent, I tried on out2d and schout.
The problem is that when we rename the variables, the CF metadata remains the same.
image

In our code, ugrid.py, We obtain the x and y variable names using the ds.cf['mesh_topology']['node_coordinates'] Here the code breaks since it checks for the original variable names after we've renamed.
One solution for this is to rewrite the CF metadata. I've tried this using schoutds.cf['mesh_topology'].assign_attrs(node_coordinates ='x y') but this seems to return a new object and the CF accessor doesn't support item assignment.
out.cf["mesh_topology"]['node_coordinates'] = 'x y' doesn't resolve it either.

Another solution would be not rename the variables.
After dropping and adding the nele dimension,

ds2 = ds2.drop_dims('nele')  
ds2['nele'] = ds['nele']  

This variable disappears - SCHISM_hgrid_face_nodes - (nSCHISM_hgrid_face, nMaxSCHISM_hgrid_face_nodes)
Here, nSCHISM_hgrid_face is renamed to nele. It could be that the new nele does not retain the connectivity and this happens.

@omkar-334
Copy link
Contributor

From Discussion

One solution is to first read files with complete information, and then read files with missing information like temperature for subsetting. We can write a wrapper for this.

'stofs_3d_atl.t12z.fields.horizontalVelX_f001_012.nc'
'stofs_3d_atl.t12z.fields.horizontalVelY_f001_012.nc'
'stofs_3d_atl.t12z.fields.salinity_f001_012.nc'
'stofs_3d_atl.t12z.fields.temperature_f001_012.nc'
'stofs_3d_atl.t12z.fields.zCoordinates_f001_012.nc'

If a file from above is chosen for subsetting, we can grab the respective schout and out2d, given that they exist for every file.
However, files like temperature and salinity are about 7GB. SHould we add this info only if it is needed, or do we chunk it and add to all subsetting jobs?

@ChrisBarker-NOAA
Copy link
Collaborator

hmm.

I think that xarray should "lazy load" the data, so the 7GB would only be loading up when saving out the results.

And we want a way for the user to be able to specify what variables they want.

So it should be fine to include everything up front.

@ChrisBarker-NOAA
Copy link
Collaborator

One solution for this is to rewrite the CF metadata. I've tried this using > schoutds.cf['mesh_topology'].assign_attrs(node_coordinates ='x y') but this seems to return a new object > and the CF accessor doesn't support item assignment.
out.cf["mesh_topology"]['node_coordinates'] = 'x y' doesn't resolve it either.

I think what you need to do is use .cf['mesh_topology'] to get the xarray variable, and then you can change the attrs directly on that variable.

Another option would be to use:

ds.subset_grid.assign_ugrid_topology() -- it will overwrite the mesh variable attribute for anything you pass in.
(only partially tested, but it should work)

Another solution would be not rename the variables.

is there a reason to rename them? if not, then yes, that's the easier way to go.

@ChrisBarker-NOAA
Copy link
Collaborator

One more thought: I think it's a goal for the resulting subset file to be as similar to the original as possible -- e.g. all variables have the same names. So I don't think any renaming should be done, unless absolutely required.

[while the CF standard applies almost no meaning to variable names, a lot of users do -- code tends to simply look for known variable names]

@mpiannucci
Copy link
Collaborator Author

Yes we very much do not want to rename any variables as a part of this process

@saeed-moghimi-noaa
Copy link

saeed-moghimi-noaa commented Jul 18, 2024

@josephzhang8 @gseroka @feiye-vims @yangz888 @omkar-334 @AtiehAlipour-NOAA @mpiannucci

Hi Joseph and Greg

Omkar is working on performing subsetting capability for STOFS-3D-Atl. He was trying to do it for adcirc like out puts. He wants to start looking into perform sub setting on native 2D and 3D schism outputs.

If he wants to start with water surface elevation. Would you please point him to the right SCHSIM native file name on STOFS-3D-ATl S3 bucket and file name ( I mean the ones that has quads and triangle elements).

Are those ugrid and cf compliant?

https://noaa-nos-stofs3d-pds.s3.amazonaws.com/README.html

Thanks,
-Saeed

@mpiannucci
Copy link
Collaborator Author

mpiannucci commented Jul 18, 2024

What we are seeing is that the face_node_connectivity is loaded as a float64 array and not integer. This breaks indexing when trying to collect the polygons. Also polygons that are triangles have NaNs and we need to check on the CF/ugrid compliance for that

@ChrisBarker-NOAA
Copy link
Collaborator

A bit more detail:

we'd like it to comply with:

https://ugrid-conventions.github.io/ugrid-conventions/#2d-flexible-mesh-mixed-triangles-quadrilaterals-etc-topology

In particular:
The face_node_connectivity should be:

  • integers -- indexes into the nodes array.
  • The fourth vertex, if it's not there (i.e. a triangle), should use the _FillValue, which should be specified.

I recommend -1 for _FillValue but the spec allows anything that's not a valid index -- so -999, or ....

But it's can't be NaN, as integers don't have NaN

@josephzhang8
Copy link

Hi all-

We only recently added UGRID outputs to shadow forecast (Zizang is still working on adding it on NOAA side). Anyway, Dan (@danishyo) and I just checked outputs from the shadow forecast and the format is as @ChrisBarker-NOAA described: int for the table, -1 as fill in value. Dan will share with you all the AWS path, but can someone please add him into the repo? His github ID is:

danishyo

Thanks.

@danishyo
Copy link

danishyo commented Jul 18, 2024 via email

@mpiannucci
Copy link
Collaborator Author

Added initial, untested with STOFS 3d, support here: #50

@mpiannucci
Copy link
Collaborator Author

Thanks for the help all, #50 now has a working example notebook for STOFS 3D.

@mpiannucci
Copy link
Collaborator Author

This is now merged :)

@felicio93
Copy link

I found what seems to be a bug with the subsetting in cases where the schism mesh has tria and quads.
In these cases the subsetting step fills NaNs in the triangulation with a value (which is equals to the highest node index).

for instance, the original triangulation was (i.e. ds['SCHISM_hgrid_face_nodes']):
[[0,1,2,nan]
[2,1,3,nan]
[1,4,5,3]]

after subsetting it becomes:
[[0,1,2,5]
[2,1,3,5]
[1,4,5,3]]

Here is the example using the stofs_3d example:

image

@ChrisBarker-NOAA
Copy link
Collaborator

Darn, this does look like a bug -- hopefully I, or someone will get a chance to look into it.

Another note, and I forget where we are with that is, that the face_node_connectivity variable really should be an integer type which doesn't allow NaN. Not sure why the schism folks have done that :-).

In which case, the missing node should be set to _FillVAlue, wich is usually -1 in this context.

Anyway -- more to be done here.

-CHB

@josephzhang8
Copy link

@ChrisBarker-NOAA: I checked the code and indeed from raw outputs, 'SCHISM_hgrid_face_nodes' is declared as an int, and SCHISM_hgrid_edge_nodes:_FillValue = -1 is used for the last index if the cell is triangle. What you were seeing must be from a post-processed file.

@felicio93
Copy link

felicio93 commented Oct 5, 2024

@josephzhang8 and @ChrisBarker-NOAA
Also, if I try to work with STOFS Pacific data I find the following problems.

First if I try to use "s3://noaa-nos-stofs3d-pds/STOFS-3D-Pac/para/stofs_3d_pac.20240601/stofs_3d_pac.t12z.n001_024.field2d.nc" data, using xarray_subset_grid.grids.ugrid.assign_ugrid_topology(ds), I get the following error:

face_node_connectivity is a required parameter if it is not in the mesh_topology variable

I explored this problem a bit, and it seems this files does not have "face_node_connectivity" in the cf_roles. I tried to change the code so it just passes the except condition, but then other things down the line break. I might be able to assign a cf_role, but that might not be the best solution.

The second problem arises when I try to use the "s3://noaa-nos-stofs3d-pds/STOFS-3D-Pac/para/stofs_3d_pac.20240601/stofs_3d_pac.t12z.fields.out2d_nowcast.nc"

I can open/subset this file, but when I plot it I get some weird "stripes", for instance:
image

if I use the exact same code for STOFS Atlantic, I don't see these stripes.

@josephzhang8, is there a reason why stofs_3d_pac.t12z.n001_024.field2d.nc is triangular mesh and stofs_3d_pac.t12z.fields.out2d_nowcast.nc is triangular+quad? the fact these two meshes don't match (in terms of 'SCHISM_hgrid_face_nodes', elements #), can make it tricky to handle them together (like plotting, subsetting, etc.)

@josephzhang8
Copy link

The mismatch is a direct result of splitting quads in post-processing. I'm no fan of doing this (expedient) way, just to avoid modifying the downstream tools that can only handle triangles. I think it's high time we paid attention to this and implement tools that can handle hybrid meshes.

@ChrisBarker-NOAA
Copy link
Collaborator

Agreed -- the goal of this project is to support the mixed tri/quads form natively, though apparently there's still a bug on that code :-(.

@felicio93: the face_node_connectivity needs to be specified on order for the following code to work.

See the UGRID spec for details:

https://ugrid-conventions.github.io/ugrid-conventions/

In this case, I think it's the

'SCHISM_hgrid_face_nodes' variable, so you would do:

assign_ugrid_topology(ds, face_node_connectivity='SCHISM_hgrid_face_nodes' )

@josephzhang8: Can you provide a small (<10MB) example of a native SCHISM output file with tris + quads -- maybe 100 or so cells, and a few timesteps, and velocity data and maybe a couple other variables as a netCDF file?

It would be really helpful to have a tiny sample like that we can put in the repo and use for tests and demos.

@josephzhang8
Copy link

@ChrisBarker-NOAA: here are some very small outputs on a hybrid mesh:

https://ccrm.vims.edu/yinglong/SVN_large_files/Scribe_IO_outputs/

@felicio93
Copy link

Hi @ChrisBarker-NOAA, thanks for sharing the function for assigning ugrid topology!
Just for the record, in addition to 'SCHISM_hgrid_face_nodes' I also had to assign lat and lon:
ds=xarray_subset_grid.grids.ugrid.assign_ugrid_topology(ds_complete,face_node_connectivity='SCHISM_hgrid_face_nodes',node_coordinates=('SCHISM_hgrid_node_x','SCHISM_hgrid_node_y'))

I am using the stofs_3d_pac.t12z.nxxx_xxx.field2d.nc files (which are triangles only, i.e. all quads are broken into triangles). When plotting the results there are gaps that I guess coincide with places where the quads were spilt (highlighted in red)?:
image
I will investigate why that is happening, maybe the split quads are written differently and thus cannot be read?

@josephzhang8, note that if I make the exact same plot, and just change the variable from temp_surface to elev, I get some weird "stripes" (the same thing I mentioned before):
image
I don't think this is a problem with the subsetting or the plotting.

@gseroka
Copy link

gseroka commented Oct 7, 2024

@felicio93 can you try a more recent file? This reminds me that there was this issue that we (Lei Shi) fixed in the semi-operational STOFS-3D-Pacific runs a couple months ago to my recollection. Also, please check if those files are still only triangles. If so, we should change that to quads/tris as STOFS-3D-Atlantic has.

@felicio93
Copy link

Greg, you are right, the newer stofs_3d_pac.t12z.field2d_n001_012.nc "elev" files do not show the same issue.

These files are triangles only, while the stofs_3d_pac.t12z.fields.out2d_n001_012.nc are tria+quad

@felicio93
Copy link

Regarding the other problem (highlighted in red here). I found a solution.

If I try to work with the stofs_3d_pac.t12z.field2d_n001_012.nc (the file that is triangles only) I get gaps in what I believe are places where quads were split into triangles during data post-processing:
image

Now, if I take the 'SCHISM_hgrid_face_nodes' from stofs_3d_pac.t12z.fields.out2d_n001_012.nc, break the quads myself using a split quads function (like this), and try to plot that with z values from stofs_3d_pac.t12z.field2d_n001_012.nc, I don't see the gaps anymore:
image

maybe there is something unexpected happening with the postprocessing steps that create the triangles for stofs_3d_pac.t12z.field2d_nXXX_XXX.nc, files?

@ChrisBarker-NOAA, this is just a suggestion but maybe if the subsetting tool was able to create an attribute "SCHISM_hgrid_face_nodes_TRIANGLESONLY" (using some slip_quad function) you wouldn't have to create something new just to handle quads. You could still subset it as a triangular mesh. Also, you could still keep the original "SCHISM_hgrid_face_nodes" on the subseted file for reference. In the final subset file you would only have to call all "SCHISM_hgrid_face_nodes" whose nodes are part of the subset (and update their id's of course).

It would also be helpful for the user to have a SCHISM_hgrid_face_nodes_TRIANGLESONLY attribute, since for plotting it in python you need to recreate the triangulation (with triangulation = tri.Triangulation(x, y, triangles=connectivity).

@ChrisBarker-NOAA
Copy link
Collaborator

@felicio93: we're trying to keep this package focused on the subsetting -- any splitting into triangles and the like should be done by pre-or-post processors, of the visualization tools, etc.

I do think we need better / cleaner tools for working with SCHISM and STOFS in particular, but they should be maintained elsewhere.

@feiye-vims
Copy link

feiye-vims commented Oct 15, 2024

@felicio93 As Greg suggested, please see this thread #146 which may have implications on your plots.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

10 participants