Update regrid.pl for GOCART2G #228

Open
mathomp4 opened this issue Nov 3, 2021 · 19 comments

@mathomp4
Member

mathomp4 commented Nov 3, 2021

Per @pcolarco and @mmanyin in email, there is a desire to regrid the new GOCART2G restarts. Looking at a set of them, the (possible) new files seem to be:

  • achem_internal
  • cabc_internal
  • cabr_internal
  • caoc_internal
  • du_internal
  • hemco_internal
  • ni_internal
  • ss_internal
  • su_internal

Now, normally we could just add these to The List™ that's in regrid.pl but it's not that simple.

I looked at these restarts with @bena-nasa and we noticed that:

  • hemco_internal is a 2D restart (no levels)
  • du_internal and ss_internal are 4D restarts (ungridded dims)

The main issue is the underlying regridding code was not set up for files like these.

So, I suppose the first question for @pcolarco or @amdasilva or @christophkeller is: Do we need to worry about hemco_internal? If not, can we just always bootstrap it and focus on the 4D restarts?

@mathomp4
Member Author

mathomp4 commented Nov 3, 2021

CC @weiyuan-jiang, @tclune, @aoloso in re the regrid refactoring effort

@bena-nasa
Collaborator

Adding the ability for interp_restarts.x to handle these new cases (an ungridded dimension plus level) is straightforward from a scientific perspective: just treat each index of the ungridded dimension as a 3-D variable in its own right and do the usual horizontal and vertical regridding. However, the code has gotten to the point where adding new capabilities like this calls for a refactoring, at the very least splitting the binary and NetCDF paths into separate codes so I can focus on adding these capabilities to the NetCDF restarts without breaking the binary path. Splitting this apart and adding new capabilities is a non-trivial exercise, so lead time and dedicated time will be necessary.
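The "treat each index of the ungridded dimension as a 3-D variable" idea can be sketched in a few lines of Python. This is only an illustration of the loop structure, not the interp_restarts.x code (which is not shown in this thread): `regrid_4d` and `toy_regrid_3d` are hypothetical names, the field is assumed to be ordered [ungridded][lev][lat][lon], and the toy regridder merely subsamples the horizontal grid in place of the real horizontal and vertical regridding.

```python
from typing import Callable, List, Sequence

Grid2D = List[List[float]]  # [lat][lon]
Grid3D = List[Grid2D]       # [lev][lat][lon]


def regrid_4d(field: Sequence[Grid3D],
              regrid_3d: Callable[[Grid3D], Grid3D]) -> List[Grid3D]:
    """Regrid a [ungridded][lev][lat][lon] field one ungridded index at a time.

    Each slab along the ungridded dimension is handed to the ordinary 3-D
    regridder as if it were a 3-D variable in its own right.
    """
    return [regrid_3d(slab) for slab in field]


def toy_regrid_3d(slab: Grid3D) -> Grid3D:
    """Placeholder for a real 3-D regridder: keep every other horizontal point."""
    return [[row[::2] for row in level[::2]] for level in slab]


# Toy input: 2 ungridded indices, 3 levels, 4x4 horizontal grid.
# Every value in slab b is set to float(b) so the slabs are distinguishable.
field = [[[[float(b) for _ in range(4)] for _ in range(4)] for _ in range(3)]
         for b in range(2)]

out = regrid_4d(field, toy_regrid_3d)
print(len(out), len(out[0]), len(out[0][0]), len(out[0][0][0]))
```

The ungridded dimension and the number of levels pass through unchanged; only the horizontal extent differs in the output.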

@bena-nasa
Collaborator

bena-nasa commented Nov 8, 2021

@mmanyin @pcolarco It had been a long time since I looked at the code; I split interp_restarts.x into a binary and NetCDF code. That was fairly straightforward. Adding support for these 4-D variables in the NetCDF version of the program is actually more straightforward than I thought. Although this will mean a multiple repo PR for regrid.pl and the underlying regridding ...

@bena-nasa
Collaborator

bena-nasa commented Nov 15, 2021

@mmanyin @pcolarco @gmao-jstassi @mathomp4
I've updated interp_restarts.x to handle these new restarts (as well as refactoring and splitting interp_restarts.x into separate binary and NetCDF versions to make my life easier going forward). I've confirmed it works with the new GOCART2G restarts that are either 2D only or 4D with the unknown_dim + level. I've made contingent PRs for these in the FV3 and fvdycore repos.

The issue now is regrid.pl itself. It needs to know about the restart names, but I think we are getting to the point where it needs some more flexibility. Since each species in GOCART and each instance can have its own restart, like
cabc_internal_rst
cabr_internal_rst
caoc_internal_rst
adding these explicitly in regrid.pl seems problematic. What if someone else has other instances of ca, for example?
It seems like regrid.pl needs to be able to do some sort of wildcarding. Instead of adding these explicitly, you would give it a wildcard like this:
ca*_internal_rst
and it finds any restarts of that form.
Pinging Joe on this, as my Perl is too shaky to do this myself.
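The wildcard lookup described above could behave roughly as follows. This is a Python sketch of the matching logic only, not regrid.pl code; `find_restarts` is a hypothetical helper, and the file list (including `ca2_internal_rst` and `catch_internal_rst`) is made up for illustration.

```python
import fnmatch
from typing import Iterable, List


def find_restarts(filenames: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the file names matching any shell-style pattern, without duplicates.

    Matches are grouped by pattern (in the order the patterns are given)
    and sorted by name within each pattern.
    """
    names = list(filenames)
    matched: List[str] = []
    for pattern in patterns:
        for name in sorted(fnmatch.filter(names, pattern)):
            if name not in matched:
                matched.append(name)
    return matched


# Hypothetical directory listing used only for this example.
files = [
    "cabc_internal_rst",
    "cabr_internal_rst",
    "caoc_internal_rst",
    "ca2_internal_rst",
    "catch_internal_rst",
    "fvcore_internal_rst",
]

hits = find_restarts(files, ["ca*_internal_rst"])
print(hits)
```

One design point the sketch makes visible: a pattern as broad as `ca*_internal_rst` picks up every restart whose name starts with `ca`, including the unrelated `catch_internal_rst` in this made-up listing, so the wildcard approach depends on instance names being distinguishable from other restart names.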

@weiyuan-jiang
Contributor

Does this only affect the input name? Is there any special option? Going forward, the Python regridder only cares about the names that are listed in the YAML file.

@mmanyin
Contributor

mmanyin commented Nov 15, 2021

Sounds like good progress. Will it eventually be possible to regrid from the old GOCART 1G format to the GOCART 2G?

@pcolarco

I would second @mmanyin. I think we would want to be able to regrid to GOCART2G from MERRA-2 for instance.

@bena-nasa
Collaborator

> Sounds like good progress. Will it eventually be possible to regrid from the old GOCART 1G format to the GOCART 2G?

@mmanyin Can you elaborate on what that would entail, or what that even means? Is it just a matter of splitting an old GOCART 1G restart into separate restarts?

That's beyond the scope of what the underlying regridding code (interp_restarts.x) would handle; it just regrids what is there already.

It would have to be some other script but someone who understands this would have to write it or give me a precise recipe for what that operation would entail.

@bena-nasa
Collaborator

bena-nasa commented Nov 15, 2021

> I would second @mmanyin. I think we would want to be able to regrid to GOCART2G from MERRA-2 for instance.

@pcolarco what does this mean? Would every field in a gocart 1g restart (merra2 or otherwise) go to a specific field in a specific gocart2 split restart? If not, I'm not sure what this operation of going from GOCART1G to GOCART2G means.

As I said, I will not support this in interp_restarts.x; that's outside the scope of that program. If it is as simple as splitting the fields into separate restarts, that is another program's job, and it would be a trivial Python script.

As far as MERRA2 goes, this is a can of worms. Regridding directly from MERRA2 to GOCART2G is complicated by the fact that MERRA2 was binary, so there is no metadata in the file. I painstakingly figured out the order of the fields in the MERRA2 binary restarts a long time ago and wrote a converter to convert them from binary to NetCDF using descriptor files that document the variable order in each file:

https://github.com/bena-nasa/GEOS5_restart_converter

If you want to do something with MERRA2 in this hypothetical GOCART1G to GOCART2G operation, you would first need to use my tool to convert the restarts to NetCDF, and then apply whatever solution exists for going from GOCART1G to GOCART2G with the NetCDF file.
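To illustrate why a descriptor of the variable order is needed for metadata-free binary files, here is a small Python sketch. It does not reproduce the linked converter or its descriptor format. It assumes Fortran sequential unformatted records with 4-byte little-endian length markers before and after each payload, 32-bit floats, and a hypothetical descriptor that is simply a list with one variable name per record; `var_a` and `var_b` are made-up names.

```python
import io
import struct
from typing import BinaryIO, Dict, Iterator, List, Sequence


def read_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield record payloads, assuming 4-byte little-endian record markers."""
    while True:
        head = stream.read(4)
        if not head:
            return  # clean end of file
        if len(head) != 4:
            raise ValueError("truncated record marker")
        (nbytes,) = struct.unpack("<i", head)
        payload = stream.read(nbytes)
        tail = stream.read(4)
        if len(payload) != nbytes or len(tail) != 4:
            raise ValueError("truncated record")
        if struct.unpack("<i", tail)[0] != nbytes:
            raise ValueError("record markers disagree")
        yield payload


def label_records(stream: BinaryIO,
                  descriptor: Sequence[str]) -> Dict[str, List[List[float]]]:
    """Attach the descriptor's name to each record, in file order.

    A name that appears several times in a row (for example once per level)
    collects several records under the same key.
    """
    fields: Dict[str, List[List[float]]] = {}
    for name, payload in zip(descriptor, read_records(stream)):
        count = len(payload) // 4
        values = list(struct.unpack(f"<{count}f", payload))
        fields.setdefault(name, []).append(values)
    return fields


def make_record(values: Sequence[float]) -> bytes:
    """Build one record in the assumed layout (used only to create test data)."""
    payload = struct.pack(f"<{len(values)}f", *values)
    marker = struct.pack("<i", len(payload))
    return marker + payload + marker


# Three records with no names in the file itself; the descriptor supplies them.
raw = make_record([1.0, 2.0]) + make_record([3.0, 4.0]) + make_record([5.0, 6.0])
fields = label_records(io.BytesIO(raw), ["var_a", "var_a", "var_b"])
print(fields)
```

Without the descriptor, the three records are indistinguishable arrays of numbers, which is the situation described above for the MERRA2 binary restarts.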

@pcolarco

I understand. A boy can dream.
Splitting a NetCDF legacy GOCART restart into GOCART2G restarts should be straightforward with NCO or some other tool.
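As a rough sketch of the bookkeeping such a split would involve, the Python below plans which variable goes to which output restart. Everything specific here is an assumption: the thread does not say how legacy fields map to GOCART2G restarts, so the prefix table and the variable names (`du001`, `ss001`, `CO`) are hypothetical, and `plan_split` is an illustrative helper rather than an existing tool.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from a variable-name prefix to the split restart
# that would receive it. The real mapping is not given in this thread.
PREFIX_TO_RESTART: Dict[str, str] = {
    "du": "du_internal_rst",
    "ss": "ss_internal_rst",
}


def plan_split(variables: Iterable[str],
               prefix_to_restart: Dict[str, str]
               ) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group variable names by target restart; report names with no target."""
    plan: Dict[str, List[str]] = {}
    unassigned: List[str] = []
    for var in variables:
        target = next(
            (restart for prefix, restart in prefix_to_restart.items()
             if var.lower().startswith(prefix)),
            None,
        )
        if target is None:
            unassigned.append(var)
        else:
            plan.setdefault(target, []).append(var)
    return plan, unassigned


# Hypothetical variable names from a legacy restart.
plan, unassigned = plan_split(["du001", "du002", "ss001", "CO"], PREFIX_TO_RESTART)
print(plan)
print(unassigned)
```

Each entry in `plan` could then drive one extraction per output file with a tool such as NCO. The `unassigned` list matters because, as asked earlier in the thread, it is not established that every legacy field has a home in a specific GOCART2G restart.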

@mmanyin
Contributor

mmanyin commented Nov 15, 2021

@bena-nasa I have been impressed in the past when needing to convert legacy restarts to a more recent version, that regrid.pl could identify what was missing and provide at least a bare bones set of restarts. Without really knowing what the limitations are, I was posing the general question -- can the program generate a set of G2G restarts that approximates an older G1G set? Sounds like this is not the tool for doing it.

@bena-nasa
Collaborator

bena-nasa commented Nov 15, 2021

> @bena-nasa I have been impressed in the past when needing to convert legacy restarts to a more recent version, that regrid.pl could identify what was missing and provide at least a bare bones set of restarts. Without really knowing what the limitations are, I was posing the general question -- can the program generate a set of G2G restarts that approximates an older G1G set? Sounds like this is not the tool for doing it.

No, regrid.pl just calls other programs that regrid the restarts that are there, using the boundary conditions it thinks are appropriate based on your answers to the questions; no more, no less. If you have a gocart_internal_rst in, you get one out.
After I fix up regrid.pl, if you have a set of restarts from GOCART2G in, you get a set out.

@mmanyin @pcolarco This does bring up a point: what are all the "base" component names it needs to be aware of? I'm going to try to implement a wildcard feature in regrid.pl.
So looking at the restarts Pete provided, we have
ca,ss,du,ni,su
and these could potentially have multiple instances (in the restarts I have, only ca actually does), which I will use the wildcard feature to find.
What am I missing, if anything? I'll code it to the list above, so please let me know if I need to include others.

@pcolarco

@bena-nasa Your list looks complete for GOCART2G, although note it is "cabr," "cabc," and "caoc". You might anticipate an eventual refactoring of the remainder of the legacy GOCART, which would split what is still left in gocart_internal_rst into subsequent things like co, co2, ch4, ... _internal_rst.

@bena-nasa
Collaborator

bena-nasa commented Nov 15, 2021

> @bena-nasa Your list looks complete for GOCART2G, although note it is "cabr," "cabc," and "caoc". You might anticipate an eventual refactoring of the remainder of the legacy GOCART, which would split what is still left in gocart_internal_rst into subsequent things like co, co2, ch4, ... _internal_rst.

@pcolarco Oh, did I misread that? I thought cabr, cabc, and caoc were separate instances of one species, but now that I'm reading it again, those are just brown carbon, black carbon, and organic carbon. So per species there is only one restart, no matter how many instances? If so, then I can just hard-code the names and I was making a problem out of nothing.

@pcolarco

@bena-nasa Hmm... Each instance has its own restart. By default we are running three carbonaceous instances: brown, black, and organic carbon. We need to be able to regrid each such instance. Some guidance, then, on how to handle multiple instances for later (i.e., so far not tried out) cases will be helpful. Does that make sense?

@bena-nasa
Collaborator

@pcolarco @jstassi I was only asking about the instances because the way regrid.pl currently works is that it has hard-coded restart names it looks for. So if the name is not in the list, it won't regrid it. This could be a problem if, for example, someone runs a new GOCART case with multiple instances and wants to regrid those, but the script is unaware of them.
I see two possible solutions for this in regrid.pl:

  1. Allow the user to specify "extra" restarts that the program is unaware of on the command line or when answering the questions when running interactively
  2. Implement some wildcard type thing where the program is given something like this ca*_internal_rst and it tries to find all the restarts that match that pattern. This would require the instance pattern to have been added and for the instances to be consistently named (I've never run gocart2g so how it works is a total mystery to me).

It sounds like for now the PR I've made handles the current uses but still needs another extension. Any thoughts on which method sounds better as an end user?
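Option 1 above (user-specified "extra" restarts) might look something like this from the user's side. This is a Python sketch under stated assumptions, not regrid.pl itself, which is a Perl script: the `--extra-restart` flag, the `restarts_to_regrid` helper, and the example name `ca2_internal_rst` are all hypothetical, and the known list is simply the names from the top of this issue with an `_rst` suffix.

```python
import argparse
from typing import List, Sequence

# Names from the list at the top of this issue, with the _rst suffix used
# elsewhere in the thread. Treat this as illustrative, not authoritative.
KNOWN_RESTARTS: List[str] = [
    "achem_internal_rst",
    "cabc_internal_rst",
    "cabr_internal_rst",
    "caoc_internal_rst",
    "du_internal_rst",
    "hemco_internal_rst",
    "ni_internal_rst",
    "ss_internal_rst",
    "su_internal_rst",
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of an 'extra restarts' option")
    # Hypothetical flag: may be repeated, once per additional restart name.
    parser.add_argument(
        "--extra-restart",
        action="append",
        default=[],
        metavar="NAME",
        help="additional restart name to regrid (may be given more than once)",
    )
    return parser


def restarts_to_regrid(argv: Sequence[str]) -> List[str]:
    """Return the known restart names followed by any new user-supplied ones."""
    args = build_parser().parse_args(list(argv))
    names = list(KNOWN_RESTARTS)
    for extra in args.extra_restart:
        if extra not in names:
            names.append(extra)
    return names


selected = restarts_to_regrid(["--extra-restart", "ca2_internal_rst"])
print(selected)
```

Compared with the wildcard approach, this keeps the script's behavior explicit, at the cost of the user having to know and type every instance name.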

@pcolarco

@bena-nasa Hey, Ben, is this functionality available now in some more recent model version for me to try out? Thanks

@mathomp4
Member Author

@pcolarco I think it should be in GEOSgcm v10.21.0 for sure.

@pcolarco

pcolarco commented Jan 13, 2022 via email
