FATES LUH2 data curation tool #1032

Merged
merged 55 commits into from
Aug 3, 2023
Commits
a453cba
Add luh2 module and conda environment yaml file
glemieux Mar 1, 2023
fda4aff
Add function to update luh2 time units
glemieux Mar 1, 2023
c7871a9
Update conda environment for luh2 tools
glemieux Mar 1, 2023
f464b7b
Adding functionality notes
glemieux Mar 2, 2023
202b09c
Start bringing in Charlie's xesmf code
glemieux Mar 7, 2023
6ef45bb
Update yml environment file
glemieux Mar 13, 2023
4c9a47f
Continue converting Charlie's script
glemieux Mar 14, 2023
134ed79
Update notes
glemieux Mar 14, 2023
da4a06d
Update comments
glemieux Mar 16, 2023
2b7364e
Renamed luh2 module and added primary script
glemieux Mar 20, 2023
6a32ff0
Moving script into module function
glemieux Mar 31, 2023
d3e347f
Intermediate improvements
glemieux Mar 31, 2023
f334f22
Fixup
glemieux Mar 31, 2023
f8fd35f
Update to xarray 0.19.0
glemieux Apr 6, 2023
0490748
Truncate to the yml file and update prep function
glemieux Apr 6, 2023
91ef322
more intermediate updates
glemieux Apr 7, 2023
5b65edd
Add command line argument parsing
glemieux Apr 7, 2023
05f52dd
continuing to add updates and debug
glemieux Apr 21, 2023
bf07950
add year variable as copy of time dimension
glemieux May 1, 2023
d879aea
converting variables to snake case
glemieux May 4, 2023
848bbea
adjust regridding methods
glemieux May 4, 2023
c2e4e24
various fixes and improvements
glemieux May 5, 2023
99eac94
Separated luh2 module and luh2 call script
glemieux May 5, 2023
b04fd2e
More updates
glemieux May 8, 2023
472eaa6
start working on allowing read in of regridder file
glemieux May 8, 2023
2e1c736
Fix regridding loop
glemieux May 9, 2023
92a57d0
Remove unnecessary xarray import call
glemieux May 9, 2023
5c72c2f
Adding state correction function
glemieux May 12, 2023
0b86cd2
Add handling of regridder argument option
glemieux May 13, 2023
430bd03
switch the importdata and prepdataset functions
glemieux May 13, 2023
3a31c2a
Correct some issues
glemieux May 13, 2023
9af66f0
adding type checks
glemieux May 13, 2023
23dabb7
adding check against the time variable for the input dataset
glemieux May 13, 2023
61306a0
fix import function
glemieux May 13, 2023
4e830a8
fix argument call
glemieux May 13, 2023
e4b43ac
update check function outputs
glemieux May 13, 2023
f22540d
Add bash script
glemieux May 16, 2023
eb79dd8
refactoring luh2 module
glemieux May 17, 2023
1941e66
Add merge flag to import data
glemieux May 18, 2023
1cee030
fix order of operations
glemieux May 18, 2023
4eadfb7
add commands to remove intermediate file copies
glemieux May 18, 2023
e6278f6
move all luh2 code into named folder
glemieux May 18, 2023
28bcafc
convert YEAR from cftime object to number
glemieux May 23, 2023
a178f9d
bugfixes to reduce memory usage and interpret cftime reference point
ckoven May 31, 2023
846f7d3
Merge pull request #28 from ckoven/fates-luh2_data
glemieux May 31, 2023
d652c21
implement decode_times=False
glemieux Jul 24, 2023
14153c5
remove call to attribute update in shell script
glemieux Jul 24, 2023
35dee92
reworking luh2 shell script call
glemieux Jul 24, 2023
ec19393
fix choice range and add shell script input options
glemieux Jul 28, 2023
67b8d01
Add comments and remove unnecessary dependence to yaml file
glemieux Aug 1, 2023
63124fd
add luh2 readme
glemieux Aug 1, 2023
e44bc90
remove old comments
glemieux Aug 1, 2023
4615b8a
Add system exit if unrecognized file type provided as argument
glemieux Aug 1, 2023
8cbb7ab
update luh2 tool argparse options and time handling
glemieux Aug 2, 2023
9cbfe4f
update the usage description in luh2.py
glemieux Aug 2, 2023
55 changes: 55 additions & 0 deletions tools/luh2/README.md
@@ -0,0 +1,55 @@
# FATES LUH2 data tool README

## Purpose

This tool takes the raw Land Use Harmonization (LUH2, https://luh.umd.edu/) data files as
input and prepares them for use with FATES. The tool concatenates the raw data sets into
a single file and can regrid the source data to a target resolution designated by the user.
The output data is then usable by FATES, mediated through
a host land model (currently either CTSM or E3SM).

For more information on how FATES utilizes this information see https://github.com/NGEET/fates/pull/1040.

## Installation

This tool requires the usage of conda with python3. See https://docs.conda.io/en/latest/miniconda.html#installing
for information on installing conda on your system. To install the conda environment necessary to run the tool
execute the following command:

    conda env create -f conda-luh2.yml

This will create a conda environment named "luh2". To activate this environment run:

    conda activate luh2

For more information on creating conda environments see
https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file
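
Because the environment file does not pin versions, it can be useful to confirm that the key packages are importable before running the tool. The following is a minimal sketch and not part of the tool: the helper name `missing_packages` is hypothetical, and the import names (`xesmf`, `xarray`, `netCDF4`) are assumptions based on the dependencies named in conda-luh2.yml.

```python
import importlib.util

def missing_packages(module_names):
    """Return the subset of module_names that cannot be located for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    # Assumed import names for the packages listed in conda-luh2.yml
    required = ["xesmf", "xarray", "netCDF4"]
    missing = missing_packages(required)
    if missing:
        print("missing packages: {}".format(", ".join(missing)))
    else:
        print("all required packages found")
```

Run inside the activated "luh2" environment, an empty result suggests the environment was created as intended. It does not check package versions.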

Note that a subset of host land models (HLMs) and HLM-supported machines are planned to incorporate this tool into the surface dataset workflow.
As such, if you are working on one of these machines, the output from this tool may be precomputed and available for the grid resolution of interest.

## Usage

After activating the "luh2" environment the tool can be run from the command line with the following minimum required inputs:

    python luh2.py -l <raw-luh2-datafile> -s <luh2-static-datafile> -r <regrid-targetfile> -w <regridder-output> -o <outputfile>

The description of the minimum required input arguments is as follows:
- raw-luh2-datafile: this is one of three raw luh2 datafiles, either states, transitions, or management. This is the data to be regridded and used by FATES.
- luh2-static-datafile: supplementary 0.25 deg resolution static data used in the construction of the raw luh2 datafiles. This is utilized to help set the gridcell mask for the output file.
- regrid-targetfile: host land model surface data file intended to be used in conjunction with the FATES run at a specific grid resolution. This is used as the regridder target resolution.
- regridder-output: the path and filename to write out the regridding weights file or to use an existing regridding weights file.
- outputfile: the path and filename to which the output is written
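
The argument list above maps onto a command line roughly as sketched below. This is illustrative only: `minimal_luh2_command` is a hypothetical helper, and the file names in the usage lines are placeholders. Only the flags (`-l`, `-s`, `-r`, `-w`, `-o`) are taken from the usage line above.

```python
import shlex

def minimal_luh2_command(raw_file, static_file, target_file,
                         weights_file, output_file):
    """Build the argument vector for one luh2.py call with the minimum inputs."""
    return ["python", "luh2.py",
            "-l", raw_file,       # raw states, transitions, or management data
            "-s", static_file,    # 0.25 deg static data used for masking
            "-r", target_file,    # surface data file defining the target grid
            "-w", weights_file,   # regridding weights file to write or reuse
            "-o", output_file]    # regridded output file

# Placeholder file names, for illustration only
cmd = minimal_luh2_command("states.nc", "staticData_quarterdeg.nc",
                           "surfdata.nc", "regridder.nc", "states_regrid.nc")
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` from within the activated "luh2" environment.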

The tool is intended to be run three times, sequentially, to concatenate the raw states, transitions, and management data into a single file. After the first run of
the tool, a merge option should also be included in the argument list, pointing to the most recent output file. This ensures that the output of the previous run
is merged into the current run's output and that the previously written regridding weights file is reused (to avoid duplicate computation).
The luh2.sh file in this directory provides an example shell script that uses the python tool in this sequential manner. Additional help is available
by passing the `--help` option to the command line call.
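
The three sequential runs can be sketched as follows. This is a sketch under stated assumptions, not part of the tool: `sequential_luh2_commands` is a hypothetical helper, the intermediate file names mirror those used in luh2.sh, and `-m` is the merge option defined in luh2.py.

```python
import os

def sequential_luh2_commands(states, transitions, management, static_file,
                             target_file, weights_file, output_dir, final_name):
    """Return three luh2.py argument vectors, chained through the merge option."""
    steps = [
        (states, "states_regrid.nc"),
        (transitions, "states_trans_regrid.nc"),
        (management, final_name),
    ]
    commands = []
    previous_output = None
    for raw_file, out_name in steps:
        output_file = os.path.join(output_dir, out_name)
        cmd = ["python", "luh2.py", "-l", raw_file, "-s", static_file,
               "-r", target_file, "-w", weights_file]
        if previous_output is not None:
            # Merge in the previous run's output; luh2.py then also reuses
            # the weights file written by the first run
            cmd += ["-m", previous_output]
        cmd += ["-o", output_file]
        commands.append(cmd)
        previous_output = output_file
    return commands

# Placeholder paths, for illustration only
for cmd in sequential_luh2_commands("states.nc", "transitions.nc",
                                    "management.nc", "static.nc", "surfdata.nc",
                                    "out/regridder.nc", "out", "LUH2_merged.nc"):
    print(" ".join(cmd))
```

Only the first command omits `-m`. Each later command points `-m` at the output of the run before it, so the final file accumulates all three data sets.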

## Description of directory contents

- luh2.py: main luh2 python script
- luh2mod.py: python module source file for the functions called in luh2.py
- luh2.sh: example bash shell script file demonstrating how to call luh2.py
- conda-luh2.yml: conda environment yaml file which defines the minimum set of package dependencies for luh2.py
11 changes: 11 additions & 0 deletions tools/luh2/conda-luh2.yml
@@ -0,0 +1,11 @@
# This yaml file is intended for users who wish to utilize the luh2.py tool on their own machines.
# The file is not yet tested regularly to determine if the latest versions of the dependencies will
# always work. This regular testing is expected to be implemented in the future.
name: luh2
channels:
- conda-forge
- defaults
dependencies:
- xesmf
# xarray which is autodownloaded as xesmf dependency, uses scipy, which needs netcdf4 to open datasets
- netcdf4
138 changes: 138 additions & 0 deletions tools/luh2/luh2.py
@@ -0,0 +1,138 @@
#!/usr/bin/env python3

# LUH2 python script
# Usage: python luh2.py -l <raw-luh2-datafile> -s <luh2-static-datafile> \
# -r <regrid-targetfile> -w <regridder-output> -o <outputfile>

import argparse, os, sys
from luh2mod import ImportData, SetMaskLUH2, SetMaskSurfData
from luh2mod import RegridConservative, RegridLoop, CorrectStateSum

# Add version checking here in case environment.yml not used
def main():

    # Add argument parser - subfunction? Separate common module?
    # input_files and range should be the only arguments
    # Allow variable input files (state and/or transitions and/or management)
    args = CommandLineArgs()

    # Import and prep the LUH2 datasets and regrid target
    ds_luh2 = ImportData(args.luh2_file,args.begin,args.end)
    ds_regrid_target = ImportData(args.regridder_target_file,args.begin,args.end)

    # Import the LUH2 static data to use for masking
    ds_luh2_static = ImportData(args.luh2_static_file)

    # Create new variable where the ice water fraction is inverted
    ds_luh2_static["landfrac"] = 1 - ds_luh2_static.icwtr

    # Mask all LUH2 input data using the ice/water fraction for the LUH2 static data
    ds_luh2 = SetMaskLUH2(ds_luh2, ds_luh2_static)
    ds_luh2_static = SetMaskLUH2(ds_luh2_static, ds_luh2_static)

    # Mask the regrid target
    ds_regrid_target = SetMaskSurfData(ds_regrid_target)

    # Determine if we are saving a new regridder or using an old one
    # TO DO: add check to handle if the user enters the full path
    # TO DO: check if its possible to enter nothing with the argument
    regrid_reuse = False
    # If we are merging files together, we assume that the weights file
    # being supplied exists on file
    if (not isinstance(args.luh2_merge_file,type(None))):
        regrid_reuse = True

    # Regrid the luh2 data to the target grid
    # TO DO: provide a check for the save argument based on the input arguments
    regrid_luh2,regridder_luh2 = RegridConservative(ds_luh2, ds_regrid_target,
                                                    args.regridder_weights, regrid_reuse)

    # Regrid the inverted ice/water fraction data to the target grid
    regrid_land_fraction = regridder_luh2(ds_luh2_static)

    # Adjust the luh2 data by the land fraction
    # TO DO: determine if this is necessary for the transitions and management data
    regrid_luh2 = regrid_luh2 / regrid_land_fraction.landfrac

    # Correct the state sum (checks if argument passed is state file in the function)
    regrid_luh2 = CorrectStateSum(regrid_luh2)

    # Add additional required variables for the host land model
    # Add 'YEAR' as a variable.
    # This is an old requirement of the HLM and should simply be a copy of the `time` dimension
    # If we are merging, we might not need to do this, so check to see if its there already
    if (not "YEAR" in list(regrid_luh2.variables)):
        regrid_luh2["YEAR"] = regrid_luh2.time
        regrid_luh2["LONGXY"] = ds_regrid_target["LONGXY"] # TO DO: double check if this is strictly necessary
        regrid_luh2["LATIXY"] = ds_regrid_target["LATIXY"] # TO DO: double check if this is strictly necessary

    # Rename the dimensions for the output. This needs to happen after the "LONGXY/LATIXY" assignment
    if (not 'lsmlat' in list(regrid_luh2.dims)):
        regrid_luh2 = regrid_luh2.rename_dims({'lat':'lsmlat','lon':'lsmlon'})

    # Merge existing regridded luh2 file with merge input target
    # TO DO: check that the grid resolution
    # We could do this with an append during the write phase instead of the merge
    if (not(isinstance(args.luh2_merge_file,type(None)))):
        ds_luh2_merge = ImportData(args.luh2_merge_file,args.begin,args.end,merge_flag=True)
        #ds_luh2_merge = ds_luh2_merge.merge(regrid_luh2)
        regrid_luh2 = regrid_luh2.merge(ds_luh2_merge)

    # Write the files
    # TO DO: add check to handle if the user enters the full path
    output_file = os.path.join(os.getcwd(),args.output)
    print("generating output: {}".format(output_file))
    regrid_luh2.to_netcdf(output_file)

def CommandLineArgs():

    parser = argparse.ArgumentParser(description="placeholder desc")

    # Required input luh2 datafile
    # TO DO: using the checking function to report back if invalid file input
    parser.add_argument("-l","--luh2_file",
                        required=True,
                        help = "luh2 raw states, transitions, or management data file")

    # Required static luh2 data to get the ice/water fraction for masking
    parser.add_argument("-s", "--luh2_static_file",
                        required=True,
                        help = "luh2 static data file")

    # File to use as regridder target (e.g. a surface dataset)
    parser.add_argument("-r","--regridder_target_file",
                        required=True,
                        help = "target file with desired resolution to regrid luh2 data to")

    # Filename to use or save for the regridder weights
    parser.add_argument("-w", "--regridder_weights",
                        default = 'regridder.nc',
                        help = "filename of regridder weights to write to or reuse (if -m option used)")

    # Optional input to subset the time range of the data
    # TODO: add support for parsing the input and checking against the allowable date range
    parser.add_argument("-b","--begin",
                        type = int,
                        default = None,
                        help = "beginning of date range of interest")
    parser.add_argument("-e","--end",
                        type = int,
                        default = None,
                        help = "ending of date range to slice")

    # Optional output argument
    parser.add_argument("-o","--output",
                        default = 'LUH2_timeseries.nc',
                        help = "output filename")

    # Optional merge argument to enable merging of other files
    parser.add_argument("-m", "--luh2_merge_file",
                        default = None,
                        help = "previous luh2 output filename to merge into current run output")

    args = parser.parse_args()

    return(args)

if __name__ == "__main__":
    main()
57 changes: 57 additions & 0 deletions tools/luh2/luh2.sh
@@ -0,0 +1,57 @@
#!/bin/bash
# WARNING: This script generates intermediate copies of the LUH2
# data which at its peak takes up approximately 42G of space.
#
# Note that this script must be run with the luh2 conda environment active.
# It requires four positional arguments giving the full paths to the
# luh2 data, the luh2 static data, the regrid target dataset, and the
# output directory, in that order.

# LUH2 data names
DATA_LOC=$1
STATIC_LOC=$2
TARGET_LOC=$3
OUTPUT_LOC=$4
STATES_FILE=states.nc
TRANSITIONS_FILE=transitions.nc
MANAGE_FILE=management.nc
STATIC_FILE=staticData_quarterdeg.nc
REGRID_TARGET_FILE=surfdata_4x5_16pfts_Irrig_CMIP6_simyr2000_c170824.nc

START=1850
END=2015

# Save files
REGRID_SAVE=regridder.nc
OUTPUT_FILE=LUH2_historical_0850_2015_4x5.nc

# Combine strings
STATES=${DATA_LOC}/${STATES_FILE}
TRANSITIONS=${DATA_LOC}/${TRANSITIONS_FILE}
MANAGE=${DATA_LOC}/${MANAGE_FILE}
STATIC=${STATIC_LOC}/${STATIC_FILE}
REGRID_TARGET=${TARGET_LOC}/${REGRID_TARGET_FILE}
REGRIDDER=${OUTPUT_LOC}/${REGRID_SAVE}

# Comment this out if the user already has the modified datasets available

# Regrid the luh2 states data against a target surface data set and save the regridder weights file
echo "starting storage"
du -h ${OUTPUT_LOC}
python luh2.py -b ${START} -e ${END} -l ${STATES} -s ${STATIC} -r ${REGRID_TARGET} -w ${REGRIDDER} -o ${OUTPUT_LOC}/states_regrid.nc
echo -e "storage status:\n"
du -h ${OUTPUT_LOC}

# Regrid the luh2 transitions data using the saved regridder weights file and merge into previous regrid output
python luh2.py -b ${START} -e ${END} -l ${TRANSITIONS} -s ${STATIC} -r ${REGRID_TARGET} -w ${REGRIDDER} \
-m ${OUTPUT_LOC}/states_regrid.nc -o ${OUTPUT_LOC}/states_trans_regrid.nc
echo -e "storage status:\n"
du -h ${OUTPUT_LOC}
rm ${OUTPUT_LOC}/states_regrid.nc

# Regrid the luh2 management data using the saved regridder file and merge into previous regrid output
python luh2.py -b ${START} -e ${END} -l ${MANAGE} -s ${STATIC} -r ${REGRID_TARGET} -w ${REGRIDDER} \
-m ${OUTPUT_LOC}/states_trans_regrid.nc -o ${OUTPUT_LOC}/${OUTPUT_FILE}
echo -e "storage status:\n"
du -h ${OUTPUT_LOC}
rm ${OUTPUT_LOC}/states_trans_regrid.nc
rm ${REGRIDDER}