-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Viewer support #9
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar I'm trying to port over E3SM-Project/zppy#616 to post-refactored code. It's actually an excellent example of why it was a good idea to pull the code out into a separate package in the first place: there are three packages that my editor is showing as unfound, meaning pre-refactor I must have just been picking them up from Unified.
Is every Unified dependency included in https://github.com/E3SM-Project/e3sm-unified/blob/main/recipes/e3sm-unified/meta.yaml? I'm wondering if some of these packages are inside others. (I see output_viewer
but not bs4
or distutils
).
@@ -1,7 +1,10 @@ | |||
# Script to plot some global atmosphere and ocean time series | |||
import csv | |||
import distutils.dir_util |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could not be resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not be using distutils anymore. Try to find a python 3.13 compatible alternative to this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've asked @altheaden if she would be willing to look for the preferred replacement for this function. Hopefully, she can come up with something.
from output_viewer.build import build_page, build_viewer | ||
from output_viewer.index import ( | ||
OutputFile, | ||
OutputGroup, | ||
OutputIndex, | ||
OutputPage, | ||
OutputRow, | ||
) | ||
from output_viewer.utils import rechmod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output_viewer
could not be resolved.
No, that only shows direct dependencies. There are also dependencies-of-dependencies and so on. There isn't a great way to know what those are or what their constraints are. |
Is bs4 a different implementation of beautiful_soup? If so, what a mess! |
|
Glad that this work is proving its value! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now have a viewer index pointing to both the atm and lnd viewers! (Images below).
However, I have 5 remaining issues I need some help on (described in this review's comments).
- @chengzhuzhang -- Issues 2,3
- @tomvothecoder -- Issues 1,2,4,5
- @xylar -- Issue 4 (and maybe some of the others too)
Thanks!
.github/workflows/build_workflow.yml
Outdated
@@ -47,6 +47,7 @@ jobs: | |||
# since the action is run on a branch in detached head state. | |||
# This is the equivalent of running "pre-commit run --all-files" locally. | |||
# If you commit with the `--no-verify` flag, this check may fail. | |||
# TODO: this doesn't seem to run when I run `git commit` locally. I always have to run `pre-commit run --all-files` manually. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue 1: When I run git commit
it doesn't actually do the pre-commit checks. I don't get the passed
messages. If I run pre-commit run --all-files
afterward, there are in fact changes made, which I then need to amend in to the commit if I didn't think to run pre-commit run --all-files
before committing. I'm not sure why it's not working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomvothecoder Not sure if you've had a chance to look at this yet, but I'm not quite sure how to fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you run pre-commit install
beforehand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that fixes this, thanks!
header = True | ||
# TODO: how do we make sure the csv is actually accessible???? | ||
# The current directory is where we ran the code from, which is not necessarily where the csv is. | ||
csv_path = INCLUSIONS_DIR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue 4a: See Issue 4 comment.
annual_average_dataset_for_var: xarray.core.dataset.Dataset | ||
if metric == Metric.AVERAGE: | ||
annual_average_dataset_for_var: xarray.core.dataset.Dataset = ( | ||
self.f.temporal.group_average(var, "year") | ||
annual_average_dataset_for_var = self.f.temporal.group_average( | ||
var, "year" | ||
) | ||
data_array = annual_average_dataset_for_var.data_vars[var] | ||
elif metric == Metric.TOTAL: | ||
annual_average_dataset_for_var = self.f.temporal.group_average( | ||
var, "year" | ||
) | ||
data_array = annual_average_dataset_for_var.data_vars[var] | ||
# import pprint | ||
# pprint.pprint( | ||
# f"annual_average_dataset_for_var attributes={annual_average_dataset_for_var.attrs}" | ||
# ) | ||
# pprint.pprint(f"data_array attributes={data_array.attrs}") | ||
# data_array *= area*landfrac | ||
# TODO: Determine how to get area and landfrac |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue 2: I know we're aiming for data_array *= area*landfrac
for the Metric.TOTAL
calculation, but I'm still a little confused about how to extract those. Something like the following?
self.f.temporal.group_average(var, "area")
self.f.temporal.group_average(var, "landfrac")
But those are known scalars, not computed averages, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use xarray
to extract area
or landfrac
-- specify variable with []
for indexing. In annual_average_dataset_for_var.data_vars[var]
, instead of var
use "area" or "landfrac"
def coupled_global(parameters: Parameters) -> None: | ||
requested_variables = RequestedVariables(parameters) | ||
for rgn in parameters.regions: | ||
run(parameters, requested_variables, rgn) | ||
plots_per_page = parameters.nrows * parameters.ncols | ||
# TODO: Is this how we want to determine when to make a viewer or should we have a `make_viewer` parameter in the cfg? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue 3: this is a design decision question. I guess it would be best to have a specific make_viewer
parameter rather than relying on a user specifying they want 1 row and 1 column in the parameters. It's possible a user would want single plot images, but not a Viewer. (Although, the inverse doesn't work; we need 1x1 images to use in the Viewer). Once #3 is implemented too, that will make parameter specification less clunky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon further thought, I think I should just go ahead and add the make_viewer
parameter. It's clearer to understand, and allows for the case of wanting single plot images but not a Viewer.
@@ -0,0 +1,42 @@ | |||
from enum import Enum | |||
|
|||
# TODO: how to determine this automatically? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue 4: There are two cases where we need to access non-python files. (See the Issue 4a and 4b comments). The code can't seem to access these files without me hard-coding a path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@forsyth2, I think you need a manifest.in.
@altheaden and I worked hard to find an alternative with luck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, good to know, thanks!
# import sys | ||
# logger.debug(f"sys.prefix: {sys.prefix}, ls sys.prefix: {os.listdir(sys.prefix)}") | ||
# TODO: figure out install_path | ||
install_path: str = INCLUSIONS_DIR | ||
path: str = os.path.join(install_path, "index_template.html") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue 4b: See Issue 4 comment.
E3SM Diags uses sys.prefix
(INSTALL_PATH = os.path.join(sys.prefix, "share/e3sm_diags/")
in e3sm_diags/__init__.py
, but there's no zppy-interfaces
in my {sys.prefix}/share
dir)
@@ -0,0 +1,126 @@ | |||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhangshixuan1987 For E3SM-Project/zppy#647, this Viewer file may be useful for generating Viewers for PCMDI Diags. (This will be in the main
branch of zppy-interfaces
after this PR merges).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar I tried adding MANIFEST.in
in the latest commit (b9d16ca), but I'm still getting FileNotFoundError: [Errno 2] No such file or directory
.
I just need to do the pip install .
step, right? Or does this need to be loaded into conda? https://github.com/E3SM-Project/polaris/pull/64/files appears to literally just add MANIFEST.in
and nothing else.
MANIFEST.in
Outdated
recursive-include zppy-interfaces zppy_interfaces/global_time_series/zppy_land_fields.csv | ||
recursive-include zppy-interfaces zppy_interfaces/global_time_series/index_template.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried zppy-interfaces
, zppy_interfaces
, and zi-global-time-series
here.
MANIFEST.in
Outdated
recursive-include zppy-interfaces zppy_interfaces/global_time_series/zppy_land_fields.csv | ||
recursive-include zppy-interfaces zppy_interfaces/global_time_series/index_template.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This command is for recursively including all files of a given type in the package:
recursive-include zppy-interfaces zppy_interfaces/global_time_series/zppy_land_fields.csv | |
recursive-include zppy-interfaces zppy_interfaces/global_time_series/index_template.html | |
recursive-include zppy_interfaces *.csv | |
recursive-include zppy_interfaces *.html |
This is not the syntax to use for including individual files, which I would not recommend. You should not have cvs or html files in the python package that you don't want to include in the conda package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar Hmm I'm still getting the same error even with this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@forsyth2, I see. I think the problem is 2-fold and I hadn't realized that. I think we have solved the first problem. Can you verify that, after calling pip install .
, you can find the place where zppy-interfaces
is installed and that both the csv
and html
file are included there along with the python files? I think that will be the case with the MANIFEST.in
as we have it now.
The next problem is that you are looking for files in a location that e3sm_diags would move them to, but this is not how zppy-interfaces works. You need to use importlib-resources to find the files within the zppy-interfaces
package. I will try to make the relevant suggested changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is how I would handle finding the csv and html files in the package. INCLUSIONS_DIR
should not be needed.
from bs4 import BeautifulSoup | ||
|
||
from zppy_interfaces.global_time_series.coupled_global_utils import ( | ||
INCLUSIONS_DIR, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INCLUSIONS_DIR, |
td.append(a) | ||
row_obj.append(td) | ||
|
||
path: str = os.path.join(INCLUSIONS_DIR, "index_template.html") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path: str = os.path.join(INCLUSIONS_DIR, "index_template.html") | |
path: str = str(imp_res.files("zppy_interfaces.global_time_series") / | |
"index_template.html") |
# Relies on MANIFEST.in to include files | ||
INCLUSIONS_DIR = "zppy_interfaces/global_time_series" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Relies on MANIFEST.in to include files | |
INCLUSIONS_DIR = "zppy_interfaces/global_time_series" |
|
||
from zppy_interfaces.global_time_series.coupled_global_plotting import make_plot_pdfs | ||
from zppy_interfaces.global_time_series.coupled_global_utils import ( | ||
INCLUSIONS_DIR, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INCLUSIONS_DIR, |
def construct_land_variables(requested_vars: List[str]) -> List[Variable]: | ||
var_list: List[Variable] = [] | ||
header = True | ||
with open(f"{INCLUSIONS_DIR}/zppy_land_fields.csv", newline="") as csv_file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with open(f"{INCLUSIONS_DIR}/zppy_land_fields.csv", newline="") as csv_file: | |
path: csv_filename = str(imp_res.files("zppy_interfaces.global_time_series") / | |
"zppy_land_fields.html") | |
with open(csv_filename, newline="") as csv_file: |
@xylar Thanks, those changes work! |
Wonderful! Glad I could help. |
# ['AR', 'time_bounds', 'CWDC', 'FSH', 'GPP', 'H2OSNO', 'HR', 'LAISHA', 'LAISUN', 'NBP', 'QINTR', 'QOVER', 'QRUNOFF', 'QSOIL', 'QVEGE', 'QVEGT', 'RH2M', 'SOIL1C', 'SOIL2C', 'SOIL3C', 'SOIL4C', 'SOILWATER_10CM', 'TOTLITC', 'TOTVEGC', 'TSA', 'WOOD_HARVESTC', 'lon_bnds', 'lat_bnds'] | ||
# TODO: looks like we don't actually have area or landfrac in the dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looked like we didn't have area
or landfrac
in our test input data (see the results of this logger line).
I tried adding the extra_vars = "area,landfrac"
line below in the zppy
cfg
, to produce a new set of test input data, but zppy-interfaces is still showing that the key "area" doesn't exist: [ERROR]: coupled_global.py(set_var:184) >> "No variable named 'area'.
[[ lnd_monthly_glb ]]
extra_vars = "area,landfrac"
frequency = "monthly"
input_files = "elm.h0"
input_subdir = "archive/lnd/hist"
mapping_file = "glb"
vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
years = "1985:1995:5",
These variables exist in the zppy
input though:
cd /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051/archive/lnd/hist
ncdump -h v3.LR.historical_0051.elm.h0.1850-01.nc | grep "float area"
# float area(lat, lon) ;
ncdump -h v3.LR.historical_0051.elm.h0.1850-01.nc | grep "float landfrac"
# float landfrac(lat, lon) ;
Maybe they need to be added to vars
rather than extra_vars
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@forsyth2 do you have a copy of the global time series files that ncclimo
generated? I thought both should be fixed variables included there, if not we should try include both as extra variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have a copy of the global time series files that ncclimo generated
Yeah, that's the input data I'm testing on.
$ cd /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_until_20241203/lnd/glb/ts/monthly/5yr
$ ls
AR_198501_198912.nc H2OSNO_199001_199412.nc QINTR_198501_198912.nc QVEGE_199001_199412.nc SOIL3C_198501_198912.nc TOTVEGC_199001_199412.nc
AR_199001_199412.nc HR_198501_198912.nc QINTR_199001_199412.nc QVEGT_198501_198912.nc SOIL3C_199001_199412.nc TSA_198501_198912.nc
CWDC_198501_198912.nc HR_199001_199412.nc QOVER_198501_198912.nc QVEGT_199001_199412.nc SOIL4C_198501_198912.nc TSA_199001_199412.nc
CWDC_199001_199412.nc LAISHA_198501_198912.nc QOVER_199001_199412.nc RH2M_198501_198912.nc SOIL4C_199001_199412.nc WOOD_HARVESTC_198501_198912.nc
FSH_198501_198912.nc LAISHA_199001_199412.nc QRUNOFF_198501_198912.nc RH2M_199001_199412.nc SOILWATER_10CM_198501_198912.nc WOOD_HARVESTC_199001_199412.nc
FSH_199001_199412.nc LAISUN_198501_198912.nc QRUNOFF_199001_199412.nc SOIL1C_198501_198912.nc SOILWATER_10CM_199001_199412.nc
GPP_198501_198912.nc LAISUN_199001_199412.nc QSOIL_198501_198912.nc SOIL1C_199001_199412.nc TOTLITC_198501_198912.nc
GPP_199001_199412.nc NBP_198501_198912.nc QSOIL_199001_199412.nc SOIL2C_198501_198912.nc TOTLITC_199001_199412.nc
H2OSNO_198501_198912.nc NBP_199001_199412.nc QVEGE_198501_198912.nc SOIL2C_199001_199412.nc TOTVEGC_198501_198912.nc
$ cd /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_20241203_area_landfrac_as_extra_vars/lnd/glb/ts/monthly/5yr
$ ls
AR_198501_198912.nc H2OSNO_199001_199412.nc QINTR_198501_198912.nc QVEGE_199001_199412.nc SOIL3C_198501_198912.nc TOTVEGC_199001_199412.nc
AR_199001_199412.nc HR_198501_198912.nc QINTR_199001_199412.nc QVEGT_198501_198912.nc SOIL3C_199001_199412.nc TSA_198501_198912.nc
CWDC_198501_198912.nc HR_199001_199412.nc QOVER_198501_198912.nc QVEGT_199001_199412.nc SOIL4C_198501_198912.nc TSA_199001_199412.nc
CWDC_199001_199412.nc LAISHA_198501_198912.nc QOVER_199001_199412.nc RH2M_198501_198912.nc SOIL4C_199001_199412.nc WOOD_HARVESTC_198501_198912.nc
FSH_198501_198912.nc LAISHA_199001_199412.nc QRUNOFF_198501_198912.nc RH2M_199001_199412.nc SOILWATER_10CM_198501_198912.nc WOOD_HARVESTC_199001_199412.nc
FSH_199001_199412.nc LAISUN_198501_198912.nc QRUNOFF_199001_199412.nc SOIL1C_198501_198912.nc SOILWATER_10CM_199001_199412.nc
GPP_198501_198912.nc LAISUN_199001_199412.nc QSOIL_198501_198912.nc SOIL1C_199001_199412.nc TOTLITC_198501_198912.nc
GPP_199001_199412.nc NBP_198501_198912.nc QSOIL_199001_199412.nc SOIL2C_198501_198912.nc TOTLITC_199001_199412.nc
H2OSNO_198501_198912.nc NBP_199001_199412.nc QVEGE_198501_198912.nc SOIL2C_199001_199412.nc TOTVEGC_198501_198912.nc
$ cd /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_20241203_area_landfrac_as_vars
# no lnd directory
$ cd scripts
$ grep -in "error" *.o*
#ts_lnd_monthly_glb_1985-1989-0005.o639437#:15:ncrcat: ERROR no variables fit criteria for processing
#ts_lnd_monthly_glb_1985-1989-0005.o639437#:17:ncrcat: ERROR no variables fit criteria for processing
#ts_lnd_monthly_glb_1985-1989-0005.o639437#:19:ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:
ts_lnd_monthly_glb_1985-1989-0005.o639437:15:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1985-1989-0005.o639437:17:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1985-1989-0005.o639437:19:ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:
ts_lnd_monthly_glb_1990-1994-0005.o639438:15:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1990-1994-0005.o639438:17:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1990-1994-0005.o639438:19:ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. You are right, neither area
and landfrac
in global mean datasets. I think it makes sense to try if both can be added to vars
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But adding to vars
doesn't work either -- that's that third ls
block above.
From ts_lnd_monthly_glb_1985-1989-0005.o639437
:
ncrcat: ERROR no variables fit criteria for processing
ncrcat: HINT Extraction list must contain a record variable to concatenate. A record variable is a variable defined with a record dimension. Often the record dimension\
, aka unlimited dimension, refers to time. To change an existing dimension from a fixed to a record dimensions see http://nco.sf.net/nco.html#mk_rec_dmn or to add a ne\
w record dimension to all variables see http://nco.sf.net/nco.html#ncecat_rnm
ncrcat: ERROR no variables fit criteria for processing
ncrcat: HINT Extraction list must contain a record variable to concatenate. A record variable is a variable defined with a record dimension. Often the record dimension\
, aka unlimited dimension, refers to time. To change an existing dimension from a fixed to a record dimensions see http://nco.sf.net/nco.html#mk_rec_dmn or to add a ne\
w record dimension to all variables see http://nco.sf.net/nco.html#ncecat_rnm
ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:
That is, area
, and landfrac
clearly are included in the original nc
files but I just can't seem to extract them either through extra_vars
or vars
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@forsyth2 thanks for testing both. It looks like extra_vars
does work with generic variable splitting (e.x. land_monthly task) but not when --glb
flag is on (e.x. land_monthly_global). I think there are a few possibilities for moving forward:
- include a [land_monthly] task to get both
area
andlandfrac
, and use both in downstream. - based on @czender 's comment , there is a new version of
ncclimo
that supports global integral scaling. To do so, we need additional logic to filter through variables in the time-series task. - we can ask @czender if it is possible to add
extra-vars
support in--glb
mode, to keeparea
andlandfrac
in the global time-series files. In the mean time, I think you could tentatively use the first approach to get values of two variables..
logger.error(f"area.shape={area.shape}") # area.shape=(180, 360) | ||
logger.error( | ||
f"landfrac.shape={landfrac.shape}" | ||
) # landfrac.shape=(180, 360) | ||
logger.error( | ||
f"data_array.shape={data_array.shape}" | ||
) # data_array.shape=(10, 3) | ||
# e: dimensions cannot change for in-place operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang Ok, using the separate land_monthly
non-global ts
task (option 1 above), I can get area
and landfrac
, but now I'm having issues with the dimensions. I'm not sure how to take (180,360)
to (10,3)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok I think I have a reasonable dimensionality reduction:
if metric == Metric.TOTAL:
logger.debug(
f"self.extra_var_dataset.keys()={list(self.extra_var_dataset.keys())}"
)
area: xarray.core.dataarray.DataArray = self.extra_var_dataset["area"]
landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset["landfrac"]
# area.shape() = (180, 360)
total_area = area.sum() # Sum over all dimensions
# landfrac.shape=(180, 360)
average_landfrac = landfrac.mean() # Mean over all dimensions
total_land_area = total_area * average_landfrac
# data_array.shape = (number of years, number of regions)
data_array *= total_land_area
logger.info(
"for Metric.TOTAL, data_array has been scaled by total land area"
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @forsyth2 and @chengzhuzhang, I'm trying to understand the change(s) you would like in ncclimo
. As you know, area
and landfrac
can be output in every default (non-spatially averaged) timeseries by requesting them with the --var_xtr=area,landfrac
. However, as @forsyth2 has discovered, this behavior changes when requesting global/regional-average timeseries. Currently the spatial-mean timeseries output includes only the geophysical field of interest, no matter what was requested with --var_xtr
.
I would be happy to change that behavior so that in spatial-averaged timeseries files ...
- Always contain the full (i.e., non-spatially averaged) variables requested with
--var_xtr
OR - Always contain the full
area
andlandfrac
variables OR - Behave as currently by default, and contain the full
area
andlandfrac
variables when an additional switch is provided - Behave as currently by default, and contain the full variables requested with
--var_xtr
when an additional switch is provided.
These changes would be relatively straightforward to implement. Keep in mind that the current behavior was implemented so that spatial-mean timeseries files are orders of magnitude smaller than their non-spatial-average counterparts. Adding full fields to spatial average timeseries will significantly inflate their size. However, area
and landfrac
are time-constant variables so the resulting filesizes would still be much smaller than full timeseries files if only these variables are added.
Also happy to discuss other options, like providing a sane way for zppy
to provide the desired scale factors to ncclimo
which would then apply them to the fields during spatial-average timeseries generation. Feedback welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@forsyth2 The "dimensionality reduction" you mention above triggers some red flags with me. In particular this line
total_land_area = total_area * average_landfrac
is questionable because area and landfrac can vary indepently so I would expect the total land area to be defined by total_land_area = (area*landfrac).sum
. If I am right, the former method would produce small errors that might not be noticed (e.g., 1-2%) until/unless carefully examined. Or I could be totally off base because I'm not sure how you're using the variables in the code :) In any case, you might try both methods and then compare against a known-to-be-correct answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally would say (1) or (4) since it allows users to set whatever extra variable they want rather than hard-coding in area or landfrac.
I appear to have a workaround here working now without these changes in ncclimo
, but it is a bit clunky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@czender I was thinking the same, to have the product area_times_landfrac
would be easiest for zppy in this use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and perhaps making the new additon default would be fine, cause the new variables is small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I will add a field called something likearea_times_landfrac
to the output of all spatially-averaged timeseries. I'm headed to AGU on Friday and will return 12/16. Hopefully finish and release this in NCO 5.3.0 sometime that week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your help, Charlie! Have a nice trip to DC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you @czender!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang I think this is getting pretty close to done. I just have a few comments, and then I'll ask the Land team to review further. I also want to update the unit tests too.
Results are visible at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/ (note: that's not a stable link, meaning it will update with every time I run updated code). How does that look?
Corresponding changes on the zppy side are at https://github.com/E3SM-Project/zppy/pull/648/files
.github/workflows/build_workflow.yml
Outdated
@@ -47,6 +47,7 @@ jobs: | |||
# since the action is run on a branch in detached head state. | |||
# This is the equivalent of running "pre-commit run --all-files" locally. | |||
# If you commit with the `--no-verify` flag, this check may fail. | |||
# TODO: this doesn't seem to run when I run `git commit` locally. I always have to run `pre-commit run --all-files` manually. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomvothecoder Not sure if you've had a chance to look at this yet, but I'm not quite sure how to fix this.
if "area_times_landfrac" in keys: | ||
total_land_area = self.extra_var_dataset["area_times_landfrac"] | ||
else: | ||
area: xarray.core.dataarray.DataArray = self.extra_var_dataset[ | ||
"area" | ||
] | ||
landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset[ | ||
"landfrac" | ||
] | ||
# area.shape() = (180, 360) | ||
# landfrac.shape() =(180, 360) | ||
total_land_area = (area * landfrac).sum() # Sum over all dimensions | ||
# data_array.shape = (number of years, number of regions) | ||
data_array *= total_land_area |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang This is how I'm handling the total_land_area
scaling now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@forsyth2 it is okay to use this logic for now for demonstration. The scaling is only good for global area, but not for N./S. hemisphere, which we can update later after ncclimo
includes the scaling factor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is possible to get N_land_area and S_land_area, but sub-select with lat >= 0 or <= 0 for both area and landfrac.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried updating to the following but I'm still getting some blank plots (strangely it's inconsistent if it's the glb or n or s plot that's empty).
if metric == Metric.TOTAL:
keys = list(self.extra_var_dataset.keys())
logger.debug(f"self.extra_var_dataset.keys()={keys}")
if "area_times_landfrac" in keys:
total_land_area = self.extra_var_dataset["area_times_landfrac"]
else:
area: xarray.core.dataarray.DataArray = self.extra_var_dataset[
"area"
]
landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset[
"landfrac"
]
# area.shape() = (180, 360)
# landfrac.shape() = (180, 360)
total_land_area = (area * landfrac).sum() # Sum over all dimensions
# Account for hemispheric plots:
north_area = area.where(area.lat >= 0)
south_area = area.where(area.lat <= 0)
north_landfrac = landfrac.where(landfrac.lat >= 0)
south_landfrac = landfrac.where(landfrac.lat <= 0)
north_land_area = (north_area * north_landfrac).sum()
south_land_area = (south_area * south_landfrac).sum()
# data_array.shape = (number of years, number of regions)
# We want to keep those dimensions, but with these values:
# (glb*total_land_area, n*north_land_area, s*south_land_area)
data_array[:, 0] *= total_land_area
data_array[:, 1] *= north_land_area
data_array[:, 2] *= south_land_area
if extra_vars_directory: | ||
# If an extra_vars_directory is provided, we'll use that to open a new dataset | ||
self.extra_var_dataset = xcdat.open_mfdataset( | ||
f"{extra_vars_directory}*.nc", | ||
center_times=True, | ||
) | ||
else: | ||
# Otherwise, we'll use the same dataset. | ||
self.extra_var_dataset = self.dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang Until ncclimo
is updated to allow extra vars in glb
runs, we can just specify an alternative path (i.e., to the [land_monthly]
output).
This looks good! I would suggest to update |
zppy_interfaces/global_time_series/coupled_global_dataset_wrapper.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thorntonpe @BunnyVon (and maybe also @dmricciuto @wlin7) This is ready for review.
For background context, we recently decided to split zppy into two packages: zppy
itself will do its original purpose of orchestrating workflows while this new package zppy-interfaces
will handle the "last mile" stretches of glue code to plot or otherwise post-process data. (I.e. zppy
should not be doing any post-processing itself, only coordinating other post-processing tools.)
I have test results at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/.
How to run your own tests
I think you also wanted to run some of your own test cases before we merge this PR. To do that:
# Set up zppy-interfaces
cd ~
git clone [email protected]:E3SM-Project/zppy-interfaces.git
cd zppy-interfaces
git fetch origin issue-601-viewers
git checkout -b test_land_viewers origin/issue-601-viewers
conda clean --all --y
conda env create -f conda/dev.yml -n my_zi_env
conda activate my_zi_env
pip install .
# Set up zppy
cd ~
git clone [email protected]:E3SM-Project/zppy.git
cd zppy
git fetch origin issue-601-viewers
git checkout -b test_land_viewers origin/issue-601-viewers
conda clean --all --y
conda env create -f conda/dev.yml -n my_zppy_env
conda activate my_zppy_env
pip install .
# Create a cfg
cd ~
emacs test_land_viewers.cfg
zppy -c test_land_viewers.cfg
Sample cfg
Below is an example cfg
that you could modify with your own input data and settings. (In particular, these parameters will likely need changing: case
, input
, output
, www
, experiment_name
, figstr
)
[default]
case = "v3.LR.historical_0051"
constraint = ""
dry_run = "False"
environment_commands = ""
fail_on_dependency_skip = True
guess_path_parameters = False
guess_section_parameters = False
input = /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051
input_subdir = archive/atm/hist
mapping_file = "map_ne30pg2_to_cmip6_180x360_aave.20200201.nc"
output = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/unique_id/v3.LR.historical_0051"
partition = "debug"
qos = "regular"
www = "/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_www/unique_id"
years = "1985:1989:2",
[ts]
active = True
e3sm_to_cmip_environment_commands = ""
walltime = "00:30:00"
[[ land_monthly ]]
extra_vars = "area,landfrac"
frequency = "monthly"
input_files = "elm.h0"
input_subdir = "archive/lnd/hist"
mapping_file = "map_r05_to_cmip6_180x360_aave.20231110.nc"
years = "1985:1995:5",
vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
[[ atm_monthly_glb ]]
# Note global average won't work for 3D variables.
frequency = "monthly"
input_files = "eam.h0"
input_subdir = "archive/atm/hist"
mapping_file = "glb"
years = "1985:1995:5",
[[ lnd_monthly_glb ]]
frequency = "monthly"
input_files = "elm.h0"
input_subdir = "archive/lnd/hist"
mapping_file = "glb"
vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
years = "1985:1995:5",
[global_time_series]
active = True
climo_years = "1985-1989", "1990-1995",
environment_commands = "source <INSERT PATH TO CONDA>/conda.sh; conda activate my_zi_env"
experiment_name = "v3.LR.historical_0051"
figstr = "v3.LR.historical_0051"
make_viewer = True
num_cols = 1
num_rows = 1
plots_lnd = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
ts_num_years = 5
ts_years = "1985-1989", "1985-1995",
walltime = "00:30:00"
years = "1985-1995",
Reviewing code changes
I'm not sure how relevant the code changes themselves are to you, but I've marked some important areas as part of this review. Notably, the bulk of the changes are in this PR, but there are also a few changes on the zppy side: https://github.com/E3SM-Project/zppy/pull/648/files
# Non-derived variables | ||
annual_average_dataset_for_var: xarray.core.dataset.Dataset = ( | ||
self.dataset.temporal.group_average(var, "year") | ||
) | ||
data_array = annual_average_dataset_for_var.data_vars[var] | ||
if metric == Metric.TOTAL: | ||
keys = list(self.extra_var_dataset.keys()) | ||
logger.debug(f"self.extra_var_dataset.keys()={keys}") | ||
if "area_times_landfrac" in keys: | ||
total_land_area = self.extra_var_dataset["area_times_landfrac"] | ||
else: | ||
area: xarray.core.dataarray.DataArray = self.extra_var_dataset[ | ||
"area" | ||
] | ||
landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset[ | ||
"landfrac" | ||
] | ||
# area.shape() = (180, 360) | ||
# landfrac.shape() = (180, 360) | ||
total_land_area = (area * landfrac).sum() # Sum over all dimensions | ||
|
||
# Account for hemispheric plots: | ||
north_area = area.where(area.lat >= 0) | ||
south_area = area.where(area.lat <= 0) | ||
north_landfrac = landfrac.where(landfrac.lat >= 0) | ||
south_landfrac = landfrac.where(landfrac.lat <= 0) | ||
north_land_area = (north_area * north_landfrac).sum() | ||
south_land_area = (south_area * south_landfrac).sum() | ||
|
||
# data_array.shape = (number of years, number of regions) | ||
# We want to keep those dimensions, but with these values: | ||
# (glb*total_land_area, n*north_land_area, s*south_land_area) | ||
data_array[:, 0] *= total_land_area | ||
data_array[:, 1] *= north_land_area | ||
data_array[:, 2] *= south_land_area | ||
units = data_array.units | ||
# `units` will be "1" if it's a dimensionless quantity | ||
if (units != "1") and (original_units != "") and original_units != units: | ||
raise ValueError( | ||
f"Units don't match up: Have {units} but expected {original_units}. This renders the supplied scale_factor ({scale_factor}) unusable." | ||
) | ||
if (scale_factor != 1) and (final_units != ""): | ||
data_array *= scale_factor | ||
units = final_units | ||
return data_array, units |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the code block where we do average or total calculations.
valid_vars, | ||
invalid_vars, | ||
rgn, | ||
extra_vars_dict=exp["land"].replace("glb", "180x360_aave"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, ncclimo
can't generate area
and landfrac
for globally averaged datasets. So, we're currently relying on having a separate land_monthly
task defined in the cfg
to set up these variables.
@forsyth2 and @chengzhuzhang, While Hence I suggest that The latest It is unclear whether you want Feedback welcome. Charlie |
Thank you for the update, Charlie @czender |
@czender Thanks, I like the
My personal preference would be to enable that functionality. It's nice to have the flexibility.
I'm working on Chrysalis, so I'll try |
@czender I get the following regardless if I set
|
@forsyth2 That's because I haven't built NCO on Chrysalis (as opposed to Perlmutter) in a long time. I will try to do so today and ping you then. |
@forsyth2 Please try again on Chrysalis (and/or Perlmutter). I have rebuilt the latest snapshot there. BTW, the latest snapshot always includes |
@czender Hmm now I'm getting
That's with this cfg snippet:
Alternatively the error in #9 (comment) appears when using:
What's the proper usage?
The one change in
That seems reasonable to me. |
@forsyth2 Sorry for the problems. The intended proper usage is your last row, don't add any switches at all, and everything you might want will be in the output files in global timeseries mode. As for why you get shared library errors when invoking |
Thanks @czender! To put everything in one place, I just ran cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/zppy_601_viewers_updated_nco_no_extra_vars/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# ts_lnd_monthly_glb_1985-1989-0005.status:ERROR (2)
# ts_lnd_monthly_glb_1990-1994-0005.status:ERROR (2)
cat ts_lnd_monthly_glb_19*.o* That gives:
(For reference, E3SM-Project/zppy@ee60560 includes the changes for this) |
And the cfg is
|
You have given me all the information I need. I think. I altered the |
@forsyth2 Did those latest |
@czender Thanks, I've gotten a lot further, but still seeing blank plots on the zppy-interfaces side. However, that could be an implementation problem on my end*. I'm still debugging. It does seem like *For reference, the latest relevant logic is:
|
@forsyth2 Your logic looks good. Not sure why plots are blank. I will hold-off releasing until this works in zppy. However, these lines will double-count gridcells centered on the equator so that north+south != total:
To avoid this NCO defines |
@czender I'm still running into issues with properly scaling the data by the valid area. I do feel like the problem is on the zppy-interfaces side, but obviously can't fully confirm that yet. I'm going to have to look into this more next week. As far as the release schedule, mainly we just need the new NCO by the Jan. 15 release candidate deadline for E3SM Unified, for testing. |
Mostly for my reference: zppy steps
zppy-interfaces steps
mv /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post/ /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_until_20241220
mv /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/zppy_601_viewers_updated_nco_no_extra_vars_v3/v3.LR.historical_0051/post/ /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post/
Plots are always showing up blank, even after deleting cache. |
Turns out it may have been a cache problem after all. I'm seeing plots now on https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/table_lnd/index.html. Maybe I needed to delete more cached data?? Or use a different browser? |
@czender As far as I can tell, the NCO fix is in fact working. The cached empty plot issue was very unfortunate; it turns out the NCO fixes were working on Friday after all. |
FYI NCO 5.3.0 has been released and is in my personal directories on Perlmutter, Chrsalis, and Acme1. |
d09c1a1
to
dbcd011
Compare
dbcd011
to
f26e7ab
Compare
An example of our new commit paradigm, discussed on E3SM-Project/zstash#355: Rebase stepsgit checkout issue-601-viewers
# This part is unnecessary, but I like to create a backup branch in case I mess something up
git checkout -b issue-601-viewers-until-20250114
git checkout issue-601-viewers
git log
# We have 19 commits here
# We can probably group them as follows:
# 1. Add Viewer support
# 2. Working land viewer
# 3. Atm viewer exists but is not linked to
# 4. Index points to atm and lnd viewers
# 5. Misc updates
# 6. Refactored coupled_global
# 7. Misc updates
# 8. MANIFEST.in not working
# 9. Updates
# 10. Attempt adding extra_vars
# 11. Working Total plots
# 12. Improved Total plotting
# 13. Add make_viewer parameter
# 14. Only show non empty viewers
# 15. Address comments
# 16. Fix unit tests
# 17. Clean up code
# 18. Attempt using new NCO
# 19. TOTAL plots working
# So, now we can rebase accordingly
git rebase -i 87c9e54c13afa470b879ae4672e5ecfb06cbb514 That gives:
Which we'll change to:
git log
# Now, we're down to 9 commits
# Let's give these commits better names
git rebase -i 87c9e54c13afa470b879ae4672e5ecfb06cbb514 We have:
Let's do:
Rename those 4 to:
GitHub tells us we're expecting a merge conflict: git fetch upstream
git rebase upstream/main We have:
Change to:
git grep -n "<<<" tests
# No more diffs
git add tests/integration/global_time_series/cases_global_time_series.py
git rebase --continue
git push -f upstream issue-601-viewers |
E3SM-Project/zppy#654 removed
in |
Issue resolution
zppy
PR: Add Viewer support zppy#648Select one: This pull request is...
1. Does this do what we want it to do?
Objectives:
global_time_series
, similar to those of E3SM Diagsglobal_time_series
plotsRequired:
2. Are the implementation details accurate & efficient?
Required:
If applicable:
zppy-interfaces/conda
, not just animport
statement.beautifulsoup4
,lxml
,output_viewer
3. Is this well documented?
Required:
4. Is this code clean?
Required:
If applicable: