Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plumber2 Implementation #2406

Open
wants to merge 31 commits into
base: b4b-dev
Choose a base branch
from
Open

Plumber2 Implementation #2406

wants to merge 31 commits into from

Conversation

TeaganKing
Copy link
Contributor

@TeaganKing TeaganKing commented Mar 7, 2024

This PR will use the TowerSite class (parent to both NEON and PLUBMER sites) and create a Plumber2Site class as well as implement capabilities for running single point simulations at PLUMBER sites.

Contributors other than yourself, if any:
@ekluzek
@adrifoster

CTSM Issues Fixed (include github issue #):
Addresses part of #1487

Are answers expected to change (and if so in what way)?
No, this PR should be BFB. However, it should expand existing capabilities to allow users to run at PLUMBER sites.

Any User Interface Changes (namelist or namelist defaults changes)?
Additional flags will be implemented for run_neon

@TeaganKing
Copy link
Contributor Author

This PR will also address #2186

@TeaganKing
Copy link
Contributor Author

We'll also need to add the new CDEPS tag to this PR.

@TeaganKing
Copy link
Contributor Author

Per conversation with Erik, we'll move this to bfb once the other Plumber-related PRs are in bfb next week.

@TeaganKing
Copy link
Contributor Author

TeaganKing commented Jun 25, 2024

To Do for this PR:

  • Update branch / resolve merge conflicts
  • Add new CDEPS tag to this PR once the CDEPS PR is merged.
  • Bring in CTSM PLUMBER5.2 PR
  • Check for any sort of equivalent of user_version for plumber
  • Include any plumber-specific user namelist lines
  • Update check_plumber_data
  • Determine valid sites based on tower_type?
  • Define available_plumber_list
  • Be able to run NEON & PLUMBER simultaneously (if both are same run type)
  • Provide option for None for both NEON and PLUMBER cases so they can be run indepdently
  • Ensure that we don't use a NEON base case for PLUMBER or vice versa
  • Update run_type defaults in arg parse
  • Resolve unknown case.run error for PLUMBER given need for setup/build
  • Test out various tower options/arguments
  • Update tests
  • Update documentation

@slevis-lmwg slevis-lmwg mentioned this pull request Jul 2, 2024
@TeaganKing
Copy link
Contributor Author

TeaganKing commented Aug 6, 2024

We don't currently have restart files for PLUMBER2, which has brought up the question: to what extent do we want to provide support for PLUMBER tower sites? Two options that @wwieder and I discussed this morning are described below:

  1. Users spin up their own AD & postAD cases and then can run a transient case (no restart files needed). This is somewhat contrary to the purpose of run_neon because it requires users to run case setup, build, submit, and archive.
  2. My vote would be that we provide (and update upon changes to model versions) restart files and allow users to run a transient case out of the box (this is how NEON tower sites work, and consistency with run_tower would be nice as well as ease of use, but there is more overhead to maintain the feature). This would be a slightly easier implementation given that run_tower currently expects a transient case (and automatically submits case.run, which is not automatically created for the ad run but is automatically created for transient cases).

@danicalombardozzi @slevis-lmwg @olyson let me know if you have thoughts on this. We were planning to chat on Tuesday at 9am but it looks like there is a NCAR-NEON-Community partnership meeting that will be taking that time slot and I'd ideally like to make a decision before August 20th; also happy to tag up an additional time if that's easier than a GitHub discussion.

@olyson
Copy link
Contributor

olyson commented Aug 6, 2024

My guess is that most users will want to change the model and will likely have to do their own spinups anyway. So I might lean toward not providing initial files if it requires significant work to generate, keep track of, and update those files.

@dlawrenncar
Copy link
Contributor

dlawrenncar commented Aug 7, 2024 via email

@TeaganKing
Copy link
Contributor Author

Ok, thank you both for this feedback!

@wwieder
Copy link
Contributor

wwieder commented Aug 7, 2024

Thanks for weighing in here. Teagan since Dave and Keith came to the same conclusion we did yesterday lets move forward with the plan to support 'spinup' and 'transient' PLUMBER2 cases, without providing initial conditions.

It might be nice to differentiate how we do this for SP (fast spinup, ~20 or 40 years) vs. BGC (more involved, like NEON) cases. Would that be difficult to integrate into the run_towers workflow?

@TeaganKing
Copy link
Contributor Author

TeaganKing commented Aug 7, 2024

That sounds good. I have successfully run the AD & postAD case (with a bit of hard-coding), so this will require the following updates, which should be feasible:

  • finidat file name updates for each site in user_nl_clm (expecting <SITE>.ad.clm2.r.YYYY-MM-DD-00000.nc files but we have for instance *50400.nc files)
  • Update run_type defaults in run_case in both tower_site.py and plumber_site.py such that the AD case is the default (instead of the transient default that's set in neon_site.py).

@wwieder To clarify, regarding the option for SP vs BGC, are you envisioning a flag that allows users to pick between these options?

@wwieder
Copy link
Contributor

wwieder commented Aug 7, 2024

A few comments:

  • I don't know that we need to "automatically" run postAD and transient runs, as I think it's a good idea for users to have to check the status of their AD and postAD cases to check that they meet steady state criteria (e.g., Alaska NEON sites take significantly longer to spin up than CONUS NEON sites).
  • What I like about the current run_neon implementation is that running a postAD cases automatically creates a branch that looks for an similarly names AD case and then using the --run from postAD flag also creates a transient case uses the off the postAD ref case and initial conditions. Both of these options just take the last restart files that are available (regardless of simulation date).
  • Regarding SP vs. BGC cases, yes a flag that allows users to chose between these two options would be excellent. I'm not sure about the best way to implement this and don't want to introduce scope creep / complexity to this work but having the option to set up SP vs. BGC cases would be very useful for PLUMBER2 (and even NEON) cases.

Happy to have a quick that if it's helpful.

@TeaganKing
Copy link
Contributor Author

Ok, in that case, I'll just plan to change the default for PLUMBER to AD and add some documentation on this anticipated workflow for users. Thanks for clarifying!

Why don't I plan to get that first part implemented and then we can talk more about the BGC vs. SP options.

@danicalombardozzi
Copy link
Contributor

danicalombardozzi commented Aug 7, 2024 via email

@TeaganKing
Copy link
Contributor Author

TeaganKing commented Aug 20, 2024

This is about ready to go, but once #2485 is in, I'll run the unit and system tests once more and perform a few final tests to make sure the various run_tower flags work as expected.

@TeaganKing
Copy link
Contributor Author

TeaganKing commented Aug 23, 2024

I just wanted to note here a few things that I still need to fix:

  • Tests for test_sys_run_tower.py are currently failing. Fix and then run all ctsm py tests.
  • There seem to be some mixups when running with additional flags, eg plumber sites run with 'experiment' end up being run in a transient case instead of an ad case
  • Need to test various argument options (note some neon options irrelevant for plumber)

@TeaganKing
Copy link
Contributor Author

TeaganKing commented Aug 23, 2024

One limitation of this implementation is that if both neon and plumber cases are run simultaneously (not recommended), that the default run type becomes transient. While this could be changed by adding the following lines at line 197 to tower_arg_parse.py, implementing this as a valid parser option, and then setting run_type dependent on the type of tower in run_tower.py, we run into difficulties when the compset is changed, a step which seems reasonable to not repeat for every site if most times people use this will be for either all-neon or all-plumber sites.

if args.plumber_sites:
    run_type = "default_neon_and_plumber"

Overall, I think it's most reasonable to expect users to run plumber and neon cases separately (although both with the run_tower command), but I'm open to others thoughts.

@TeaganKing
Copy link
Contributor Author

Tests on derecho currently pass for make all and ./run_ctsm_py_tests --sys.

The test_sys_run_tower.py tests take about 30 minutes to run now that it involves setting up three cases; I can remove the ad neon case if this seems too excessive.

@TeaganKing TeaganKing added PR status: awaiting review Work on this PR is paused while waiting for review. and removed PR status: work in progress labels Aug 23, 2024
@TeaganKing TeaganKing marked this pull request as ready for review August 23, 2024 19:49
@TeaganKing TeaganKing changed the title [WIP] Plumber2 Implementation Plumber2 Implementation Aug 23, 2024
Copy link
Contributor

@wwieder wwieder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Thanks @TeaganKing. My only bigger comment is that by default PLUMBER is typically run in SP mode. I think it's out of scope in this PR to add this additional functionality, but it's something we may want to think about down the road to define compsets with an --SP or --BGC flag

python/ctsm/site_and_regional/plumber_site.py Outdated Show resolved Hide resolved
python/ctsm/site_and_regional/plumber_site.py Show resolved Hide resolved
python/ctsm/site_and_regional/plumber_site.py Outdated Show resolved Hide resolved
python/ctsm/site_and_regional/run_tower.py Outdated Show resolved Hide resolved
for site_name in valid_plumber_sites:

# start_year and end_year are set in shell commands, so these get overwritten
start_year = 2018
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? I'll let you investigate.

Copy link
Contributor Author

@TeaganKing TeaganKing Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed these start and and years to be very clearly dummy years. I think this is necessary because the PlumberSite class must be implemented with a start/end time. However, this is changed later in the shell commands and is not a known parameter at the time of the object instantiation.

That said, if someone has a better suggestion for implementation, I'm all ears!

python/ctsm/site_and_regional/tower_site.py Show resolved Hide resolved


if __name__ == "__main__":
unit_testing.setup_for_tests()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding more testing. Will defer to Erik on extent and timing of tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add some detail, this system test now takes 30 minutes (each case setup takes about ten minutes). We currently test:

  • setup of a NEON site (BART) with a specified experiment and output root
  • setup of a NEON site (ABBY) AD case and specified output root
  • setup of a PLUMBER site (AR-SLu) with a specified experiment and output root

If we were going to remove one of these for the sake of time, I would probably remove the ABBY case since it is a secondary NEON case and PLUMBER is also going to run AD as default behavior. This does however test that non-default run types are working, which is useful.

python/ctsm/test/test_unit_run_tower.py Show resolved Hide resolved
@samsrabin samsrabin added enhancement new capability or improved behavior of existing capability science Enhancement to or bug impacting science bfb bit-for-bit labels Aug 26, 2024
@TeaganKing
Copy link
Contributor Author

Hi @ekluzek ,

I have now addressed the comments from @wwieder 's review. Two outstanding issues that Will and I discussed are the following (more details in comments above):

  • The Plumber2Site class is called in a somewhat clunky way with dummy years that get overwritten. I'm not sure of a better way to do this, and it works, but let me know if you have alternative suggestions?
  • Running system tests takes a while. There are of course benefits to doing more testing (as I have done here), but the time the tests take may be on the excessive side. Your feedback on whether these tests are worth it would be helpful!

And a software-focused review of the code would be helpful now that Will reviewed some of the functionality pieces.

Before merging (and after @ekluzek 's review), I will be sure to run the following tests one last time (I ran them before Will's review but there are changes that we should test again before merging):

  • run ./run_tower --plumber-sites with various options as a final check
  • make all
  • run ctsm system tests with ./python/run_ctsm_py_tests -s

Lastly, note that there are some related outstanding issues that have been documented that are outside the scope of this PR.

@wwieder
Copy link
Contributor

wwieder commented Sep 3, 2024

It seems like this PR also fixes #2739

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bfb bit-for-bit enhancement new capability or improved behavior of existing capability PR status: awaiting review Work on this PR is paused while waiting for review. science Enhancement to or bug impacting science
Projects
Status: In progress - master/b4b-dev
Status: Todo
Development

Successfully merging this pull request may close these issues.

run_neon not working as expected for AD simulations
7 participants