Compass package design document #29

xylar · 2020-11-16T14:18:38Z

A design document for the compass python package that could eventually replace the existing COMPASS approach.

xylar · 2020-11-16T14:21:25Z

@mark-petersen, @vanroekel and @matthewhoffman, I'll be working on this design document over the next few days and then asking for your full review of it and my prototype compass package after that.

For now, I would appreciate it if you could each look over just the Requirements listed and let me know 1) if you think they are reasonable and 2) if there are others you think are missing.

Here is a link to the file for easier viewing:
https://github.com/xylar/compass/blob/compass_package_design_doc/docs/design_docs/compass_package.rst

xylar · 2020-11-16T17:01:49Z

@caozd999 and @sbrus89, as COMPASS testcase developers, you might be interested in giving feedback on this, too.

matthewhoffman · 2020-11-30T19:01:39Z

@xylar, can you elaborate on what is going on in the section "Readability would be improved by using Jinja2 templates for code generation"?

xylar · 2020-11-30T20:08:50Z

can you elaborate on what is going on in the section "Readability would be improved by using Jinja2 templates for code generation"?

Here is the relevant section for reference: https://github.com/xylar/compass/blob/compass_package_design_doc/docs/design_docs/compass_package.rst#design-solution-make-testcases-easy-to-understand-and-modify

I have elaborated as follows:

A Jinja2 template uses curly braces (e.g. {{ step.module }}) to indicate where an element of the template will be replaced by a python variable or dictionary value. In this example, {{ step.module }} will be replaced with the contents of step['module'] in the python code, and similarly for other replacements in the template. Other than the replacements, the code can be read as normal, in contrast to the existing approach of python scripts that define other python scripts via a series of string formatting statements.
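As a minimal sketch of that substitution, rendering with the `jinja2` package might look like the following. The template text, the module name `compass.ocean.example.forward` and the `cores` entry are made up for illustration and are not taken from the prototype.

```python
from jinja2 import Template

# Hypothetical template for a generated run script; only the
# {{ ... }} placeholders are replaced, the rest reads as ordinary python.
template_text = (
    "from {{ step.module }} import run\n"
    "\n"
    "run(cores={{ step.cores }})\n"
)

# Hypothetical step dictionary; Jinja2 resolves {{ step.module }}
# to step['module'] when `step` is a dict.
step = {'module': 'compass.ocean.example.forward', 'cores': 4}

script = Template(template_text).render(step=step)
print(script)
```

The printed script contains no braces, just the import and the call with the values from the dictionary filled in.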

Let me know if that is sufficient or if you still have questions.

matthewhoffman · 2020-12-01T21:41:43Z

@xylar, the requirements section looks good to me. The only other thought I have is the ability to run multiple tests in a regression suite simultaneously.

xylar · 2020-12-01T22:04:21Z

The only other thought I have is the ability to run multiple tests in a regression suite simultaneously.

Good timing! I'm actually working on that right now. I wanted to have a prototype before I added that to the design and I was originally thinking it would be an additional step (with its own design doc). But I now realize that it will be better to design with this in mind from the beginning because it will mean less retrofitting of test cases to support this capability.

So I'll add that as a requirement tomorrow along with my proposed design solution. It will apply not only in the regression but also in individual test cases where it makes sense to run multiple steps simultaneously. The only example of this I know of is the baroclinic channel test case, which can be run at 5 different viscosities at a given resolution to explore convergence. But it would likely open up the door for other possibilities that we haven't bothered to explore yet.
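A rough sketch of the idea of running independent steps at the same time, using only the standard library's `concurrent.futures` rather than any tool chosen for the design. The `run_step` function and the viscosity values are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def run_step(viscosity):
    """Placeholder for one step; a real step would launch the model."""
    return f'finished nu={viscosity}'


# Placeholder viscosities, one independent step per value
viscosities = [1, 5, 10, 20, 200]

# Steps that do not depend on one another can be submitted together;
# map() returns results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=len(viscosities)) as pool:
    results = list(pool.map(run_step, viscosities))

for line in results:
    print(line)
```

A real implementation would also have to divide the available cores among the steps, which this sketch ignores.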

mark-petersen · 2020-12-02T15:25:59Z

@xylar wow, this is an incredibly thorough and well-written document. I think you really hit the nail on the head; these are exactly my concerns as well. When I wrote down my own issues, they were:

It is hard to explain to others how the xml files work (to such an extent that others don't adopt compass!)
There are now many standard python libraries, and we should spend our time learning those, rather than a custom system.
The hierarchy of config files is confusing and prone to cause errors
The directory structure is too constraining, and doesn't suit many of our needs
The mix of command-line calls versus embedded python functions is confusing

I think you covered all those fine in your document. Here are two more:

1. Resolution as a parameter. Currently, resolution is hard-coded in the directory name and build_base_mesh.py file within. This is appropriate for more complex domains. But for convergence tests, I would like to not have a directory per resolution in compass, but be able to choose a dx = {min, max, step} (linear or log step) that is passed into the function that creates the initial condition and runs the tests. For convergence tests, resolution is a parameter, rather than something fundamental. That also avoids excessive cases on the list.
2. Easy alterations when developing test cases. In the current compass, the created directories include soft links to build_base_mesh.py and add_initial_condition.py. I loved how easy it was to edit that file and rerun it, and quickly iterate until it was done. New people also understood how these worked. If we move from scripts to functions, I want to make sure it is still easy to work with. For example, I find ocean/global_ocean/scripts/cull_mesh.py baffling, and calls within like conversion.mask and conversion.cull are very hard to follow.
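The dx = {min, max, step} idea could be sketched as below. The function name `resolutions` and its arguments are hypothetical, and treating a log step as a multiplicative factor is an assumption about what is meant.

```python
def resolutions(dx_min, dx_max, step, spacing='linear'):
    """Return resolutions from dx_min up to dx_max (inclusive).

    For 'linear', step is added each time; for 'log', step is the
    factor applied each time (e.g. 2 doubles the resolution).
    """
    if spacing not in ('linear', 'log'):
        raise ValueError(f'unknown spacing: {spacing}')
    if step <= 0 or (spacing == 'log' and step <= 1):
        raise ValueError('step must be > 0 (linear) or > 1 (log)')
    values = []
    dx = dx_min
    # small tolerance so floating-point round-off does not drop dx_max
    while dx <= dx_max * (1 + 1e-12):
        values.append(dx)
        dx = dx + step if spacing == 'linear' else dx * step
    return values


print(resolutions(10, 40, 10))       # linear spacing
print(resolutions(5, 40, 2, 'log'))  # doubling each time
```

A convergence test could then loop over the returned list instead of needing one directory per resolution.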

mark-petersen · 2020-12-02T16:40:46Z

Another requested design requirement:
3. Choice of pre-made or new init file. For low resolution (QU240, idealized), it is easy to create a new init file every time. For any higher resolution, this can take several hours, and should not be a requirement to set up forward runs with compass. I would like the initial_condition_database to have the same sub-directory structure as compass (ocean/global_ocean/EC60to30 etc.) and be populated with the initial_state.nc files. When we set up compass runs, we can choose whether the initial condition is created or points to pre-made files. Along with this, the exact directories and files should be housed at https://web.lcrc.anl.gov/public/e3sm/mpas_standalonedata/mpas-ocean/initial_condition_database/ and a wget command can download them if they are missing.

The part that would need thought is that the meshes and initial conditions are revised over time, so they would need a date stamp or our mesh version number, and a pointer file to the most recent one. Perhaps that is redundant with our https://web.lcrc.anl.gov/public/e3sm/inputdata/ocn/mpas-o/, so we could point to those instead.

A good example for this is that I would like to add performance-scaling tests for many of our standard meshes, including high-resolution, and add some standard performance plots to help us track over time. For these, initial_state.nc should just point to some standard place.

A lighter version of this request is that some cases always point to pre-made files, and some always make new ones. That would not require a new flag to specify.
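A minimal sketch of the "use the pre-made file, download it if missing" behavior, using `urllib` in place of wget. The function name `get_initial_state`, the `cache_root` argument and the `download` hook are hypothetical, and the sketch does not address the date stamp or mesh version question.

```python
import os
from urllib.request import urlretrieve

# Base URL from the request above
BASE_URL = ('https://web.lcrc.anl.gov/public/e3sm/mpas_standalonedata/'
            'mpas-ocean/initial_condition_database')


def get_initial_state(subdir, cache_root, filename='initial_state.nc',
                      download=urlretrieve):
    """Return a local path to a pre-made file, downloading it if missing.

    subdir mirrors the compass layout, e.g. 'ocean/global_ocean/EC60to30'.
    """
    local_path = os.path.join(cache_root, subdir, filename)
    if not os.path.exists(local_path):
        # create the mirrored sub-directory, then fetch the file once
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        url = f'{BASE_URL}/{subdir}/{filename}'
        download(url, local_path)
    return local_path
```

A forward run could then symlink the returned path as its initial_state.nc instead of creating the file itself.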

mark-petersen · 2020-12-02T17:19:22Z

Addition of a sample batch submission script so that users do not need to figure it out every time. We did that previously for performance tests; see this example of a single-line command:
https://github.com/MPAS-Dev/MPAS-Tools/blob/master/ocean/performance_testing/submit_performance_test_to_queue.py#L96
The script would be similar.

xylar · 2020-12-04T20:12:43Z

@mark-petersen and @matthewhoffman, I added 5 new requirements following your reviews. Let me know if the design looks good so far or if additional requirements are needed. In the meantime, I'll proceed with the prototype.

vanroekel · 2020-12-04T20:27:22Z

@xylar sorry for not getting around to taking a look sooner. I just read through and think it is really great. It captures all the things I would want in a design doc. It's really great work.

I only have one minor comment. One requirement is to have test cases that are easy to understand and modify. I would like to add "create" to this list. It is implied in the description, but I'd like to call it out. Relatedly, I'd also like to emphasize the need to balance readability and reusability. One worry I have is that if the compass redesign becomes heavily pythonic, it may be difficult for developers to contribute, but we can't go too far the other way either. Stressing that balance is important to me.

xylar · 2020-12-04T20:35:38Z

@vanroekel, I'll add that and it fits with my thinking. I think the best way to strike that balance is to have some examples that are easier to follow and others that are very efficient in code reuse. I have some such examples.

xylar · 2020-12-04T21:31:59Z

@vanroekel I made some changes based on your comment. Let me know what you think.

@vanroekel and @mark-petersen, I do feel like I have to push back a little bit on both of your concerns about things becoming too pythonic for new developers to understand. I realize that a full use of python capabilities (functions, packages, etc.) may not be part of your standard workflow. But I think calls like conversion.mask() and conversion.cull() should be easy enough to follow if you (and future developers) take a little time to become familiar with how python functions, modules and packages work.

In my redesign, I have completely avoided the use of classes and inheritance, which would have been a more intuitive approach for me and the approach I used in MPAS-Analysis. But there is an almost inevitable need to pass some sort of generic state information around, otherwise it becomes nearly impossible to generalize what a configuration, testcase and step are. In my prototype, I have addressed this need by passing a dictionary containing information about a testcase and its steps. New users may find this complicated and similarly unintuitive compared with classes. I'm not sure what to do about this. At some level, it is truly hard to write code that is generic enough to have a level of reusability that I believe is a fundamental requirement of the rewrite without introducing a level of complexity that may also prove challenging.
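A minimal sketch of what passing such a dictionary around could look like. The keys (`name`, `steps`, `cores`) and the function names are hypothetical and are not the prototype's actual layout.

```python
def setup_testcase(name, step_names):
    """Build a plain dictionary describing a test case and its steps."""
    return {
        'name': name,
        'steps': {step: {'name': step, 'cores': 1} for step in step_names},
    }


def run_testcase(testcase, run_functions):
    """Run each step in order, handing the step's dictionary to its function."""
    completed = []
    for step_name, step in testcase['steps'].items():
        run_functions[step_name](step)
        completed.append(step_name)
    return completed


def initial_state(step):
    print(f"creating initial state on {step['cores']} core(s)")


def forward(step):
    print(f"running forward model on {step['cores']} core(s)")


testcase = setup_testcase('default', ['initial_state', 'forward'])
# generic code can modify a step without knowing anything about the test case
testcase['steps']['forward']['cores'] = 4
completed = run_testcase(
    testcase, {'initial_state': initial_state, 'forward': forward})
```

The generic `run_testcase` function needs no knowledge of any particular test case, which is the reusability being described, at the cost of the dictionary's contents being less discoverable than class attributes.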

For now, I will continue to work on the prototype as I envision it. I will add a few more configurations beyond baroclinic_channel but won't try to add all that many. But I do want to prove that the basic functionality is there. Once that much is in place, perhaps each of you can find some time to work with me on what you find challenging in the prototype and what we can do to redesign it in a simpler, more intuitive way. Without your help, I may not be able to let go of what I see as "better", "more elegant" solutions in favor of simpler ones.

vanroekel · 2020-12-04T21:55:43Z

@xylar, thanks, the changes look great. With regard to being too pythonic, I fully agree with your comments. In my view, having examples for me to learn from fully addresses my concerns about readability vs. reusability. I don't think we can expect to make everyone happy with the redesign. And I'm with you that developers (truthfully me) should be expected to do some legwork in learning some of the more advanced python features. It is extremely reasonable.

I'd be very happy to be a beta tester of the new prototype. I'd like to add a new vertical advection convergence test and this would be a good opportunity to do so.

trhille · 2020-12-08T23:37:43Z

@xylar, a comment that maybe encompasses both Shared Code and Machine-Specific Data requirements. I've noticed that when I use a geojson file to define the mask, it creates a symlink instead of copying the file into the directory. This seems fine, but for whatever reason, the MpasMaskCreator.x call cannot use the symlink and I end up having to copy the geojson file into the create_mesh directory manually, which obviously results in an undesirable proliferation of files.

xylar · 2020-12-09T06:10:28Z

@trhille, that's not a problem I've encountered. Could you open an issue on MPAS-Tools about it with more details? One solution could be to use the python wrapper, which will read the link and write out a temp file. But I'm surprised the C++ mask creator doesn't handle symlinks.
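A generic workaround in the same spirit, shown only as a sketch: follow the symlink and copy its target to a temporary file. This is not the mpas_tools wrapper's actual code, and the function name `copy_link_target_to_temp` is hypothetical.

```python
import os
import shutil
import tempfile


def copy_link_target_to_temp(path):
    """Copy the file a symlink points to into a new temporary file.

    Returns the temporary file's path; the caller is responsible for
    deleting it when done.
    """
    real_path = os.path.realpath(path)  # follow any chain of symlinks
    suffix = os.path.splitext(real_path)[1]
    fd, temp_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    shutil.copyfile(real_path, temp_path)
    return temp_path
```

The temporary copy can be passed to a tool that does not follow links and removed afterwards, which avoids leaving extra copies of the geojson file in the work directory.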

xylar · 2020-12-09T08:14:55Z

@trhille, just a quick update that I ran a test in which both the base mesh and the test geojson mask are symlinks and didn't encounter a problem:

$ ls -lah base_mesh.nc test.geojson 
lrwxrwxrwx 1 xylar xylar 25 Dec  9 09:04 base_mesh.nc -> ../base_mesh/base_mesh.nc
lrwxrwxrwx 1 xylar xylar 21 Dec  9 09:09 test.geojson -> land_coverage.geojson
$ MpasMaskCreator.x base_mesh.nc test_mask.nc -f test.geojson 


************************************************************
MPAS_MASK_CREATOR:
  C++ version
  Creates a set of masks for a given MPAS mesh and a set of feature files. 


  Compiled on Nov 25 2020 at 23:12:01.
************************************************************
...

This is using MpasMaskCreator.x from the latest mpas_tools package but the same should be true of any conda environment since the mpas_tools package was created.

trhille · 2020-12-09T14:45:53Z

@xylar Interesting. I'll look further into it. Maybe I'm doing something wrong.

xylar · 2020-12-10T19:17:19Z

@matthewhoffman (cc @mark-petersen and @vanroekel), I've spent several days looking into Parsl and it's so promising! But there are features still in beta that aren't yet part of a released package that would make it a lot more useful for us.

I'm torn between working on compass 1.0 using these new features (which are possibly buggy and the API for which could change), trying to work with only fully supported Parsl features, or doing my best to keep an eye on Parsl for the future but proceeding with compass 1.0 without the ability to run testcases in a test suite (and steps within a testcase) in parallel with one another. I'm leaning toward the last, with the idea that I would do a separate design doc for running testcases in parallel sometime early next year.

Any thoughts before I revise the design doc accordingly?

matthewhoffman · 2020-12-10T20:44:07Z

I would also vote for the last choice. Parallel test case execution would be a very nice feature, but not having it has not been a significant limitation. I'd rather wait to implement that around a stable tool than end up having to do it twice or something like that. I also think parallel execution is distinct enough from the other design requirements that it can be delayed without major impacts on the other goals.

vanroekel · 2020-12-10T20:51:33Z

I think the last choice does make the most sense. My only question for you @xylar is how much work this could result in for you? Will implementing the parsl features on top of what you propose in this design doc result in a significant rewrite? Or just modifications? If you won't have to reinvent the whole package in only a couple months, I'd vote for the last one. But I think @matthewhoffman makes a great point that it would also be good to have a stable tool prior to adding new features on top.

xylar · 2020-12-10T20:56:57Z

My only question for you @xylar is how much work this could result in for you? Will implementing the parsl features on top of what you propose in this design doc result in a significant rewrite? Or just modifications? If you won't have to reinvent the whole package in only a couple months, I'd vote for the last one. But I think @matthewhoffman makes a great point that it would also be good to have a stable tool prior to adding new features on top.

Thanks to both of you for the discussion. I think it does make sense to have a test branch with this new Parsl tool going as I continue with compass 1.0. I also don't want to implement too many test cases before at least having a prototype of the Parsl parallelism because it certainly may impact the design in ways that would require updating all test cases.

So I think the way forward is for me to take parallel execution of testcases out of the current design, but to keep it firmly in mind anyway as I continue to design the prototype so I can hopefully minimize changes in the second phase of this work. Does that make sense?

xylar · 2021-04-13T13:13:31Z

At this point, I think I am done with the design document unless there are trivial (e.g. formatting) edits to be made. I think it is my best description of this design as it stands today, and my effort from now on is better put into documentation instead of this document.

mark-petersen

Yes, this looks ready to merge. Thanks!

compass python package (compass v1.0)

This merge creates a `compass` python package that can be used to list, set up, run, validate, and clean up test cases and set up, run, and clean up test suites. This prototype contains `landice` and `ocean` MPAS cores. The `landice` core has all test cases in the `sia_integration` test suite and a few other related test cases in the test groups that are in that test suite. The `ocean` core has all of the test cases for the `baroclinic_channel` test group, all test cases from the `nightly` regression suite, and the `QU240`, `QUwISC240`, `EC30to60`, `ECwIsC30to60` and `SOwISC12to60` meshes from "legacy" COMPASS (COMPASS as it currently exists).

A significant number of python modules (python files) and packages (subdirectories) have been added within `compass` that break up the tasks related to listing, setting up, and cleaning up test cases that were previously in `list_testcases.py`, `setup_testcase.py` and `clean_testcase.py` as well as functionality related to test suites that was in `manage_regression_suite.py` before. New functionality has also been added to make setting up and running test cases easier, and to promote more code reuse than the current COMPASS framework permits. See the design document in #29 for more details.

Each test case has an associated configuration file (`.cfg`) that is constructed by combining config options from various sources: a core-specific config file, one associated with the machine (if running on a "supported" machine), one for the configuration, one for the test case, and a user-defined config file. Typically, these contain different types of options with different purposes. For example, the core's config file points to namelist and streams templates that are core specific. The machine config file contains information about the batch queuing system, the number of cores per node, etc. The test group and test case config files contain options that can be treated as the python equivalents of the namelist options used in the Fortran MPAS-Model code.

The user-defined config file can override any config options from these other sources, and can provide paths to initial conditions, meshes and other data that the core or test case may require. An effort has been made to automate most of this so a user config file should only be needed on a non-supported machine. Users can also edit the local config file within a test case (and symlinked within each step) that collects all of the config options for that test case before running.
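The layering of config sources described above can be sketched with the standard library's `configparser`. The section names, option names and values below are illustrative assumptions, not the package's actual config options.

```python
import io
from configparser import ConfigParser

# Hypothetical config contents for each source
core_cfg = """
[namelists]
forward = namelist.ocean.forward
"""
machine_cfg = """
[parallel]
cores_per_node = 36
"""
test_case_cfg = """
[vertical_grid]
vert_levels = 20
"""
user_cfg = """
[parallel]
cores_per_node = 4
"""

config = ConfigParser()
# Later sources override earlier ones, so the user config is read last
for text in (core_cfg, machine_cfg, test_case_cfg, user_cfg):
    config.read_string(text)

print(config.get('parallel', 'cores_per_node'))

# Collect the combined options as they would be written to one local file
buffer = io.StringIO()
config.write(buffer)
combined = buffer.getvalue()
print(combined)
```

Because the user config is read last, its `cores_per_node` replaces the machine's value, while options it does not mention keep the values from the earlier sources.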

xylar added the documentation Improvements or additions to documentation label Nov 16, 2020

xylar self-assigned this Nov 16, 2020

xylar requested review from vanroekel, mark-petersen and matthewhoffman November 16, 2020 14:19

xylar force-pushed the compass_package_design_doc branch 2 times, most recently from d8428d5 to c13d5a5 Compare November 18, 2020 14:05

mark-petersen mentioned this pull request Dec 2, 2020

compass python package (compass v1.0) #28

Merged

mark-petersen approved these changes Apr 13, 2021

xylar added 21 commits April 14, 2021 10:22

Add a template for design documents

9f43014

Add the first draft of a compass package design doc

1833895

This will be fleshed out more in the coming days.

Add more details on design solutions

2a295ad

Elaborate on Jinja2 templates

c0e66af

Add new requirements following review

1ad6cf6

Add balance between readability and reusability

98811db

Change testcase parallelism to be a "consideration"

1a4f91c

Rename testcase --> test case

5c338cf

Test case is not a single word, as I was using it.

Update template to match MPAS-Model

c1de853

Rename Design solution --> Algorithm design

a4021eb

Some other clean-up

Update algorithm design sections

0bc2986

Add implementation of making the code "easy"

0955f07

Add design docs to main docs

deb51db

Fix up some syntax problems that were giving build errors/warnings.

Add implementation of shared code

8ce69c6

Add implementation of shared config options

ea08176

Add implementation of modify core count

a6c09f9

Add the template to the docs for reference

8b525d3

Add implementation of machine-specific data

e27e9d7

Add implementation of flexible directories

8860451

Add remaining implementation

62edf7a

Add testing

221469a

xylar force-pushed the compass_package_design_doc branch from a1cbef5 to 77ea5d4 Compare April 14, 2021 08:22

Update design doc based on recent changes

699717c

Mostly based on classes and updated terminology

xylar force-pushed the compass_package_design_doc branch from 77ea5d4 to 699717c Compare April 14, 2021 08:28

xylar merged commit 42927e3 into MPAS-Dev:master Apr 14, 2021

xylar deleted the compass_package_design_doc branch April 14, 2021 08:29
