
Develop LandUse x PFT static data file tool #30

Open · 1 task done · glemieux opened this issue Aug 22, 2023 · 11 comments

@glemieux
Owner

glemieux commented Aug 22, 2023

For the upcoming no-comp version of NGEET#1040, we need the landuse x pft mapping data from https://gdex.ucar.edu/dataset/188b_oleson/file.html and a tool to concatenate the data into a single file (a rough sketch of that step is below). Note that this data is already present on Cheyenne under /cesm_tools/clm5landusedatatool/CLM5_CURRENT_GEOG_DEG025/.

  • Double check that this data is not already available in some other format.
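As a rough illustration of the concatenation step, here is a minimal Python sketch using xarray. The input filenames and variable names are hypothetical placeholders, since the actual layout of the gdex dataset isn't spelled out here:

```python
import xarray as xr

# Hypothetical per-land-use-category input files, one PFT map each;
# both the filenames and "PCT_PFT" are placeholder assumptions.
input_files = {
    "frac_primary": "clm5_primary_deg025.nc",
    "frac_pasture": "clm5_pasture_deg025.nc",
    "frac_other": "clm5_other_deg025.nc",
}

merged = xr.Dataset()
for out_name, path in input_files.items():
    with xr.open_dataset(path) as ds:
        # Assume each input carries a single lat x lon x PFT variable
        merged[out_name] = ds["PCT_PFT"].load()

# Write the concatenated static file
merged.to_netcdf("fates_landuse_x_pft_static.nc")
```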
@glemieux glemieux changed the title Develop LandUse x PFT static data file tool and associate API code Develop LandUse x PFT static data file tool and associated API code Aug 22, 2023
@glemieux
Owner Author

glemieux commented Aug 22, 2023

When reviewing the available infrastructure for pulling in static files, @ekluzek recommended utilizing streams, which can be used to import static data as well (e.g. ExcessIceStreamType).

There are MCT and ESMF versions of the stream code, so it would appear to be nominally compatible with both HLMs, although, as always, things will depend on E3SM's eventual move toward the MOAB coupler.

@ckoven
Collaborator

ckoven commented Aug 23, 2023

I guess one question is what the relative advantages are of the streams approach versus using the same method for both the land use and the land use x PFT datasets. I think there is some risk in doing anything differently between the two datasets (in particular regridding), since one could end up in a situation where the regridded land use x PFT dataset doesn't correspond as well as it should to the land use dataset itself.

@ckoven
Collaborator

ckoven commented Sep 1, 2023

One thing that occurred to me this morning, particularly in the interest of designing this expansion of the interface scope so that it requires HLM-side updates as infrequently as possible in the future, is that we could design this slightly differently from what we have done in ESCOMP/CTSM#2076 and E3SM-Project/E3SM#5760 in one specific way. Instead of looping over a hard-coded list of variable names (which makes sense for the LUH2 data itself, since that is the full list of fields in LUH2), I would suggest that the HLM-side code instead loop over all variables in the netcdf file that have the desired dimensionality (lat x lon x PFT) and pass to FATES both a vector of the variable names and a vector of the data for each variable in that gridcell; a rough sketch is below. That way, if we want to do things like more finely resolve the land use categories in FATES, or try to align the crop types with PFTs as described in NGEET#1061, etc., we should be able to restrict those future changes to just the FATES code itself (i.e. the python code that makes the netcdf file, and then the fortran code that receives the info).
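For illustration only, here is a minimal Python sketch of that dimensionality-driven discovery (the actual HLM-side code would be Fortran; the file name and dimension names are assumptions):

```python
from netCDF4 import Dataset

# Assumed dimension names; the real file may use e.g. lsmlat/lsmlon
wanted_dims = {"lat", "lon", "pft"}

var_names, var_data = [], []
with Dataset("fates_landuse_x_pft_static.nc") as nc:
    for name, var in nc.variables.items():
        if set(var.dimensions) == wanted_dims:
            var_names.append(name)
            var_data.append(var[:])  # full lat x lon x PFT array

# var_names plus the per-gridcell slices of var_data are what would be
# handed across the interface to FATES
```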

@ekluzek

ekluzek commented Sep 1, 2023

@ckoven that makes sense to me as well. Long variable lists can change periodically, so a "recipe" can be a better way to handle it. Do we expect the variable list to change in the future, or is it unlikely? It sounds like the proposed recipe itself is unlikely to change.

Actually, maybe a slightly different thing might make sense -- similar to what we do in CTSM for init_interp. We have metadata on the fields to tell init_interp how to process them. So you could add metadata on the fields that need to be processed by the HLM to say what should be done with them. Obviously the details differ here, but you could add some attribute to each field to signal that it should be handled by the HLM.
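For concreteness, a minimal Python sketch of that idea, with a hypothetical hlm_handling attribute standing in for whatever flag name would actually be chosen (by analogy with interpinic_flag):

```python
from netCDF4 import Dataset

# Tool side: tag the fields the HLM should read and pass through.
# The attribute name and value here are made up for illustration.
with Dataset("fates_landuse_x_pft_static.nc", "a") as nc:
    for name in ("frac_primary", "frac_pasture", "frac_other"):
        nc.variables[name].setncattr("hlm_handling", "pass_to_fates")

# Reader side: select on the attribute instead of on dimensionality.
with Dataset("fates_landuse_x_pft_static.nc") as nc:
    tagged = [name for name, var in nc.variables.items()
              if getattr(var, "hlm_handling", None) == "pass_to_fates"]
```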

Just an idea...

Thanks for thinking about maintenance on the API...

@ckoven
Collaborator

ckoven commented Sep 1, 2023

Thanks @ekluzek. We could do something like that, i.e. loop over all variables with some metadata attribute rather than over all variables with a given dimensionality. Though if we then added a new variable to the netcdf file that had that attribute but a different dimensionality, the fortran code would still need to be modified to know how to handle that different dimensionality. So I think the two approaches would end up being equivalent?

I don't think we expect these variables to change; the thing I'd maybe want to allow for is switching from method 3.ii (which is how I've now written the relevant FATES code) to method 3.iv in NGEET#1061.

@ekluzek

ekluzek commented Sep 1, 2023

One advantage of the metadata approach is that if you add fields that have the right dimensions -- but shouldn't be in the list -- they'll be handled appropriately without adding logic that says "this dimensionality -- but NOT that name". The other advantage of the metadata is that it's human readable, so I can look at the file and know what is going to happen even if I have no clue about this stuff. We did originally handle interpolation in CLM by dimensionality, but then started to have things that broke the rules. It's been better, clearer, more flexible, and more robust to have it in the metadata.

If the list is going to be pretty static it won't matter much, and as you point out new dimensional fields will require code changes anyway. I still think it's good to use metadata for readability to everyone though.

@glemieux
Owner Author

glemieux commented Sep 1, 2023

Both ideas sound reasonable to me. I have a few followup questions:

  1. @ekluzek is this initInterp the particular routine that you mentioned?
  2. Is there a difference among the approaches (including the streams method suggested above) that makes one more applicable than the others? (Basically a reiteration of Charlie's previous question.)

I didn't note this directly above, but I should state that, from a data management standpoint, I think having two separate files, one for the states+transitions+management data and one for the landuse x pft static data, is preferable. Given that assumption, and the stated desire for consistency of method: can we utilize an existing HLM method for both static and time-series data?

@ekluzek

ekluzek commented Sep 1, 2023

Yes, on line 467 in that file it reads in "interpinic_flag" for the variable and determines what should happen based on its value. I imagine something similar would be done here, although it wouldn't have the same flag values.

On the second part, you raise a good question that we should think about, and I also never got back to Charlie's previous question, so let me make sure I answer that too.

A short answer is that, at least in CTSM, we have custom code to read in both the static fields on the fsurdat file and the time-series file flanduse.timeseries. That custom code could be used for these files, but it's not very extensible or flexible, and it would likely need to be customized for these files. That's a lot of why I like to suggest stream files when they can be used: they have more extensive testing as well as flexibility built into them. But let me also give a longer answer...

@glemieux
Owner Author

Noting that, per discussion at the FATES software meeting, we agreed that this initial version will use a simple import method.

@glemieux glemieux changed the title Develop LandUse x PFT static data file tool and associated API code Develop LandUse x PFT static data file tool Sep 19, 2023
@glemieux
Owner Author

glemieux commented Sep 19, 2023

@ckoven the list of percentages to pass from the tool to FATES is forest, pasture, and other, correct? Is the bareground value necessary as well?

@ckoven
Collaborator

ckoven commented Sep 19, 2023

The list of variables that we want is the following: primary, secondary, pasture, rangeland, and current-surfdata (or similar wording), all with a PFT dimension. And then yes, also a bareground map that does not have a PFT dimension.

Each of these is calculated in either cell 4 or 5 of https://github.com/ckoven/clm_landusedata/blob/main/clm_landusedata.ipynb and plotted in subsequent cells (a quick structural check of the resulting file is sketched after the list):

lat x lon x PFT variables:

  • Primary: primary_secondary_percent calculated in cell 5, plotted in cell 7
  • Secondary: also primary_secondary_percent calculated in cell 5, plotted in cell 7
  • Pasture: pasture_pft_percent calculated in cell 4, plotted in cell 8
  • Rangeland: other_pft_percent calculated in cell 4, plotted in cell 9
  • current-surfdata: current_surfdata_percent calculated in cell 4, plotted in cell 12

lat x lon variable:

  • bareground: bareground_percent calculated in cell 4, plotted in cell 6
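As a quick structural check of the resulting file, a minimal xarray sketch (the file name, dimension name, and variable names are assumptions):

```python
import xarray as xr

ds = xr.open_dataset("fates_landuse_x_pft_static.nc")

# Split the data variables by whether they carry the PFT dimension
pft_vars = [v for v in ds.data_vars if "pft" in ds[v].dims]
flat_vars = [v for v in ds.data_vars if "pft" not in ds[v].dims]

print("lat x lon x PFT:", pft_vars)   # expect the five PFT-dimensioned maps
print("lat x lon only:", flat_vars)   # expect bareground
```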
