Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compass package design document #29

Merged
merged 22 commits into from
Apr 14, 2021

Conversation

xylar
Copy link
Collaborator

@xylar xylar commented Nov 16, 2020

A design document for the compass python package that could eventually replace the existing COMPASS approach.

@xylar xylar added the documentation Improvements or additions to documentation label Nov 16, 2020
@xylar xylar self-assigned this Nov 16, 2020
@xylar
Copy link
Collaborator Author

xylar commented Nov 16, 2020

@mark-petersen, @vanroekel and @matthewhoffman, I'll be working on this design document over the next few days and then asking for your full review of it and my prototype compass package after that.

For now, I would appreciate if you could each look over just the Requirements listed and let me know 1) if you think they are reasonable and 2) if there are others you think are missing.

Here is a link to the file for easier viewing:
https://github.com/xylar/compass/blob/compass_package_design_doc/docs/design_docs/compass_package.rst

@xylar
Copy link
Collaborator Author

xylar commented Nov 16, 2020

@caozd999 and @sbrus89, as COMPASS testcase developers, you might be interested in giving feedback on this, too.

@xylar xylar force-pushed the compass_package_design_doc branch 2 times, most recently from d8428d5 to c13d5a5 Compare November 18, 2020 14:05
@matthewhoffman
Copy link
Member

@xylar , can you elaborate on what is going on in the section "Readability would be improved by using Jinja2 templates for code generation"?

@xylar
Copy link
Collaborator Author

xylar commented Nov 30, 2020

can you elaborate on what is going on in the section "Readability would be improved by using Jinja2 templates for code generation"?

Here is the relevant section for reference: https://github.com/xylar/compass/blob/compass_package_design_doc/docs/design_docs/compass_package.rst#design-solution-make-testcases-easy-to-understand-and-modify

I have elaborated as follows:

A Jinja2 template uses curly braces (e.g. {{ step.module }}) to indicate where an element of the template will be replaced by a python variable or dictionary value. In this example, {{ step.module }} will be replaced with the contents of step['module'] in the python code, and similarly for other replacements in the template. Other than the replacements, the code can be read as normal, in contrast to the existing approach of python scripts that define other python scripts via a series of string formatting statements.

Let me know if that is sufficient or if you still have questions.

@matthewhoffman
Copy link
Member

@xylar , the requirements section looks good to me. The only other thought I have is the ability to run multiple tests in a regression suite simultaneously.

@xylar
Copy link
Collaborator Author

xylar commented Dec 1, 2020

The only other thought I have is the ability to run multiple tests in a regression suite simultaneously.

Good timing! I'm actually working on that right now. I wanted to have a prototype before I added that to the design and I was originally thinking it would be an additional step (with its own design doc). But I now realize that it will be better to design with this in mind from the beginning because it will mean less retrofitting of test cases to support this capability.

So I'll add that as a requirement tomorrow along with my proposed design solution. It will apply not only in the regression but also in individual test cases where it makes sense to run multiple steps simultaneously. The only example of this I know of is the baroclinic channel test case, which can be run at 5 different viscosities at a given resolution to explore convergence. But it would likely open up the door for other possibilities that we haven't bothered to explore yet.

@mark-petersen
Copy link
Collaborator

@xylar wow, this is an incredibly thorough and well-written document. I think you really hit the nail on the head, these are exactly my concerns as well. When I wrote down my own issues, they were:

  1. It is hard to explain to others how the xml files work (to such an extent that others don't adopt compass!)
  2. There are now many standard python libraries, and we should spend our time learning those, rather than a custom system.
  3. The hierarchy of config files is confusing and prone to cause errors
  4. The directory structure is too constraining, and doesn't suit many of our needs
  5. The mix of command-line calls versus embedded python functions is confusing

I think you covered all those fine in your document. Here are two more:

  1. Resolution as a parameter Currently, resolution is hard-coded in the directory name and build_base_mesh.py file within. This is appropriate for more complex domains. But for convergence tests, I would like to not have a directory per resolution in compass, but be able to choose a dx = {min, max, step} (linear or log step) that is passed into the function that creates the initial condition and runs the tests. For convergence tests, resolution is a parameter, rather than something fundamental. That also avoids excessive cases on the list.
  2. Easy alterations when developing test cases In the current compass, the created directories include soft links to build_base_mesh.py and add_initial_condition.py. I loved how easy it was to edit that file and rerun it, and quickly iterate until it was done. New people also understood how these worked. If we move from scripts to functions, I want to make sure it is still easy to work with. For example, I find ocean/global_ocean/scripts/cull_mesh.py baffling, and calls within like conversion.mask and conversion.cull are very hard to follow.

@mark-petersen
Copy link
Collaborator

mark-petersen commented Dec 2, 2020

Another requested design requirement:
3. Choice of pre-made or new init file For low-resolution (QU240, idealized), it is easy to create a new init file every time. For any higher resolution, this can take several hours, and should not be a requirement to set up forward runs with compass. I would like the initial_condition_database to have the same sub-directory structure as compass (ocean/global_ocean/EC60to30 etc) and be populated with the initial_state.nc files. When we set up compass runs, we can choose whether the initial condition is created, or points to pre-made files. Along with this, the exact directories and files should be housed here https://web.lcrc.anl.gov/public/e3sm/mpas_standalonedata/mpas-ocean/initial_condition_database/ and a wget command can download it if it is missing.

The part that would need thought is that the meshes and initial conditions are revised over time, so would need a date stamp or our mesh version number, and a pointer file to the most recent one. Perhaps that is redundant with our https://web.lcrc.anl.gov/public/e3sm/inputdata/ocn/mpas-o/, so we could point to those instead.

A good example for this is that I would like to add performance-scaling tests for many of our standard meshes, including high-resolution, and add some standard performance plots to help us track over time. For these, initial_state.nc should just point to some standard place.

A lighter version of this request is that some cases always point to pre-made files, and some always make new ones. That would not require a new flag to specify.

@mark-petersen
Copy link
Collaborator

  1. Addition of sample batch submission script so that users do not need to figure it out every time. We had done that previously for performance tests, see example here:
    https://github.com/MPAS-Dev/MPAS-Tools/blob/master/ocean/performance_testing/submit_performance_test_to_queue.py#L96
    for single line command. Script would be similar.

@xylar
Copy link
Collaborator Author

xylar commented Dec 4, 2020

@mark-petersen and @matthewhoffman, I added 5 new requirements following your reviews. Let me know if the design looks good so far or if additional requirements are needed. In the meantime, I'll proceed with the prototype.

@vanroekel
Copy link
Collaborator

@xylar sorry for not getting to taking a look sooner. I just read through and think it is really great. It captures all the things I would want in a design doc. It's really great work.

I only have one minor comment, one requirement is to have test cases easy to understand and modify. I would like to add "create" to this list. It is implied in the description but I'd like to call this out. Relatedly I'd also like to emphasize the need to balance readability and reusability. One worry I have is if the compass redesign becomes heavily pythonic it may be difficult for developers to contribute, but we can't go too far the other way either. Stressing that balance is important to me.

@xylar
Copy link
Collaborator Author

xylar commented Dec 4, 2020

@vanroekel, I'll add that and it fits with my thinking. I think the best way to strike that balance is to have some examples that are easier to follow and others that are very efficient in code reuse. I have some such examples.

@xylar
Copy link
Collaborator Author

xylar commented Dec 4, 2020

@vanroekel I made some changes based on your comment. let me know what you think.

@vanroekel and @mark-petersen, I do feel like I have to push back a little bit on both of your concerns about things becoming too pythonic for new developers to understand. I realize that a full use of python capabilities (functions, packages, etc.) may not be part of your standard workflow. But I think confusion about calls like conversion.mask() and conversion.cull() should be easy enough to follow if you (and future developers) take a little time to become familiar with how python functions, modules and packages work.

In my redesign, I have completely avoided the use of classes and inheritance, which would have been a more intuitive approach for me and the approach I used in MPAS-Analysis. But there is an almost inevitable need to pass some sort generic state information around, otherwise it becomes nearly impossible to generalize what a configuration, testcase and step are. In my prototype, I have addressed this need by passing a dictionary containing information about a testcase and its steps. New users may find this complicated and similarly unintuitive compare with classes. I'm not sure what to do about this. At some level, it is truly hard to write code that is generic enough to have a level of reusability that I believe is a fundamental requirement of the rewrite without introducing an level of complexity that may also prove challenging.

For now, I will continue to work on the prototype as I envision it. I will add a few more configurations beyond baroclinic_channel but won't try to add all that many. But I do want to prove that the basic functionality is there. Once that much is in place, perhaps each of you can find some time to work with me on what you find challenging in the prototype and what we can to to redesign it in a simpler, more intuitive way. Without your help, I may not be able to let go of what I see as "better", "more elegant" solutions in favor of simpler ones.

@vanroekel
Copy link
Collaborator

@xylar, thanks the changes look great. With regard to being too pythonic. I do fully agree with your comments. In my view if there are examples for me to learn from that fully addresses my concerns about readability vs. reusability. I don't think we can expect to make everyone happy with the redesign. And I'm with you that developers (truthfully me) should be expected to do some legwork in learning some of the more advanced python features. It is extremely reasonable.

I'd be very happy to be a beta tester of the new prototype. I'd like to add a new vertical advection convergence test and this would be a good opportunity to do so.

@trhille
Copy link
Collaborator

trhille commented Dec 8, 2020

@xylar, a comment that maybe encompasses both Shared Code and Machine-Specific Data requirements. I've noticed that when I use a geojson file to define the mask, it creates a symlink instead of copying the file into the directory. This seems fine, but for whatever reason, the MpasMaskCreator.x call cannot use the symlink and I end up having to copy the geojson file into the create_mesh directory manually, which obviously results in an undesirable proliferation of files.

@xylar
Copy link
Collaborator Author

xylar commented Dec 9, 2020

@trhille, that's not a problem I've encountered. Could you open an issue on MPAS-Tools about it with more details? One solution could be to use the python wrapper, which will read the link and write out a temp file. But I'm surprised the c++ mask creator doesn't handle symlinks

@xylar
Copy link
Collaborator Author

xylar commented Dec 9, 2020

@trhille, just a quick update that I ran a test in which both the base mesh and the test geojson mask are symlinks and didn't encounter a problem:

$ ls -lah base_mesh.nc test.geojson 
lrwxrwxrwx 1 xylar xylar 25 Dec  9 09:04 base_mesh.nc -> ../base_mesh/base_mesh.nc
lrwxrwxrwx 1 xylar xylar 21 Dec  9 09:09 test.geojson -> land_coverage.geojson
$ MpasMaskCreator.x base_mesh.nc test_mask.nc -f test.geojson 


************************************************************
MPAS_MASK_CREATOR:
  C++ version
  Creates a set of masks for a given MPAS mesh and a set of feature files. 


  Compiled on Nov 25 2020 at 23:12:01.
************************************************************
...

This is using MpasMaskCreator.x from the latest mpas_tools package but the same should be true of any conda environment since the mpas_tools package was created.

@trhille
Copy link
Collaborator

trhille commented Dec 9, 2020

@xylar Interesting. I'll look further into it. Maybe I'm doing something wrong.

@xylar
Copy link
Collaborator Author

xylar commented Dec 10, 2020

@matthewhoffman (cc @mark-petersen and @vanroekel), I've spent several days looking into Parsl and it's so promising! But there are features still in beta that aren't yet part of a released package that would make it a lot more useful for us.

I'm torn between working on compass 1.0 using these new features (which are possibly buggy and the API for which could change), trying to work with only fully supported Parsl features, or doing my best to keep an eye on Parsl for the future but proceeding with compass 1.0 without the ability to run testcases in a test suite (and steps within a testcase) in parallel with one another. I'm leaning toward the last, with the idea that I would do a separate design doc for running testcases in parallel sometime early next year.

Any thoughts before I revise the design doc accordingly?

@matthewhoffman
Copy link
Member

I would also vote for the last choice. Parallel test case execution would be a very nice feature, but not having it has not been a significant limitation. I'd rather wait to implement that around a stable tool than end up having to it twice or something like that. I also think parallel execution is distinct enough from the other design requirements that it can be delayed without major impacts on the other goals.

@vanroekel
Copy link
Collaborator

I think the last choice does make the most sense. My only question for you @xylar is how much work this could result in for you? Will implementing the parsl features on top of what you propose in this design doc result in a significant rewrite? Or just modifications? If you won't have to reinvent the whole package in only a couple months, I'd vote for the last one. But I think @matthewhoffman makes a great point that it would also be good to have a stable tool prior to adding new features on top.

@xylar
Copy link
Collaborator Author

xylar commented Dec 10, 2020

My only question for you @xylar is how much work this could result in for you? Will implementing the parsl features on top of what you propose in this design doc result in a significant rewrite? Or just modifications? If you won't have to reinvent the whole package in only a couple months, I'd vote for the last one. But I think @matthewhoffman makes a great point that it would also be good to have a stable tool prior to adding new features on top.

Thanks to both of you for the discussion. I think it does make sense to have a test branch with this new Parsl tool going as I continue with compass 1.0. I also don't want to implement too many test cases before at least having a prototype of the Parsl parallelism because it certainly may impact the design in ways that would require updating all test cases.

So I think the way forward is for me to take parallel execution of testcases out of the current design, but to keep it firmly in mind anyway as I continue to design the prototype so I can hopefully minimize changes in the second phase of this work. Does that make sense?

@xylar
Copy link
Collaborator Author

xylar commented Apr 13, 2021

At this point, I think I am done with the design document unless there are trivial (e.g. formatting) edits to be made. I think it is my best description of this design as it stands today, and my effort from now on is better put into documentation instead of this document.

Copy link
Collaborator

@mark-petersen mark-petersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks ready to merge. Thanks!

@xylar xylar force-pushed the compass_package_design_doc branch from a1cbef5 to 77ea5d4 Compare April 14, 2021 08:22
Mostly ased on classes and updated terminology
@xylar xylar force-pushed the compass_package_design_doc branch from 77ea5d4 to 699717c Compare April 14, 2021 08:28
@xylar xylar merged commit 42927e3 into MPAS-Dev:master Apr 14, 2021
@xylar xylar deleted the compass_package_design_doc branch April 14, 2021 08:29
xylar added a commit that referenced this pull request Apr 23, 2021
compass python package (compass v1.0)

This merge creates a `compass` python package that can be used to list, set up, run, validate, and clean up test cases and set up, run, and clean up test suites.

This prototype contains  `landice` and `ocean` MPAS cores.  The `landice` core has all test cases in the `sia_integration` test suite and a few other related test cases in the test groups that are in that test suite.  The `ocean` core has all of the test cases for the `baroclinic_channel` test group, all test cases from the `nightly` regression suite, and the `QU240`, `QUwISC240`, `EC30to60`, `ECwIsC30to60` and `SOwISC12to60` meshes from "legacy" COMPASS (COMPASS as it currently exists).

A significant number of python modules (python files) and packages (subdirectories) have been added within `compass` that break up the tasks related to listing, setting up, and cleaning up test cases that were previously in `list_testcases.py`, `setup_testcase.py` and `clean_testcase.py` as well as functionality related to test suites that was in `manage_regression_suite.py` before.  New functionality has also been added to make setting up and running test cases easier, and to promote more code reuse than the current COMPASS framework permits.  See the design document in #29 for more details.

Each test case has an associated configuration file (`.cfg`) that is constructed by combining config options from various sources: a core-specific config file, one associated with the machine (if running on a "supported" machine), one for the configuration, one for the test case, and a user-defined config file.  Typically, these contain different types of options with different purposes.  For example, the core's config file points to namelist and streams templates that are core specific.  The machine config file contains information about the batch queuing system, the number of cores per node, etc.  The test group and test case config files contain options that can be treated as the python equivalents of the namelist options used in the Fortran MPAS-Model code.  The user-defined config file can override any config options from these other sources, and can provide paths to initial conditions, meshes and other data that the core or test case may require.  An effort has been made to automate most of this so a user config file should only be needed on a non-supported machine.  Users can also edit the local config file within a test case (and symlinked within each step) that collects all of the config options for that test case before running.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants