Update to v1.2.0-alpha.5 #555
@mark-petersen, @matthewhoffman, @trhille, @darincomeau and @jonbob, A need has arisen for new spack environments. The changes @jonbob has made for Chicoma in E3SM-Project/E3SM#5499 also affect spack (E3SM-Project/mache#112). We also need to solve the problems in #539 and #549, the relevant commits from which I have included here. @jonbob and I will test a few builds on a few different machines first to make sure there aren't any unpleasant surprises. Then, I will merge E3SM-Project/mache#112 and ask you all to do some more thorough testing (see the PR description). Then, I will create a new mache release and we will deploy (again, see above). There is nothing for any of the rest of you to do for now. I just wanted to make sure you're aware that I'll be asking for this soon.
As a reminder, instructions are here:
A question for everyone (but especially @trhille, @mark-petersen and @darincomeau who would do the work): Is there any point in updating on Cori at this point? Or should we drop support with this update?
I'd be fine with leaving Cori out of the update.
Okay, given @trhille's blessing, I'm removing Cori (we can always take out that commit).
This is how I am testing this branch on perlmutter and chicoma:
Is that the correct mache branch? @xylar is this what I'm supposed to be doing? On perlmutter it dies with this
It gets a lot further on chicoma, but is extremely slow on these builds (but not hung)
@mark-petersen, yes, that's the correct mache branch. You do, indeed, need to run the spack clean thing. You'll need to figure out the right file to source to get the
I've also been having trouble on chicoma.
@mark-petersen, one more thing: you should be able to see the
Using the commands in my previous post, chicoma ran to completion - it was just slow. I was then able to
build MPAS-Ocean, and run successfully with I still need to figure out perlmutter.
@mark-petersen, Chrysalis with Gnu and OpenMPI would be the other to try. That one has been failing reliably for a long time. #500
I take that back. Using compass build with this pr on chicoma passes the nightly test suite, but I get these failures with the pr test suite:
This one also has trouble:
It appears to hang on this line in the log file, but sometimes recovers.
This looks like the same error as in #500. Since this is a long-standing documented error, it does not affect this compass update.
@mark-petersen, the problems with The
I'm inclined to say the same, even though it's frustrating that it's a long-standing error.
I am now waiting on:
@xylar the
Chrysalis:
gnu and openmpi:
Anvil: Build error, will revisit when next test completes
gnu and openmpi: Built, compass test in queue
@darincomeau, the failed tests on Chrysalis aren't a good sign. Could you point me to where you ran them so I can look at the log files?
@xylar the four test directories are here:
Thanks for taking a look!
Thanks @darincomeau. The fact that we're seeing state validation errors indicates to me that Sara might have changed something on her branch that has broken the test. This is part of what makes the
@xylar, any thoughts on how to deal with this?
@trhille, I hadn't seen that before. I don't know why you're seeing it and everyone else isn't. The only solution I've found is the one you've tried. I'll keep looking...
@trhille, reading more, it seems like having others using a git repo that I created is not considered to be safe. The only solution I can come up with (which I was avoiding in the interest of saving disk space and build time) is to have a different spack clone for each spack environment, rather than trying to share one. I can make that change but it will mean we all have to start over with deployment. My preference would be for me to do all the deployment on Perlmutter (and any other machines with this error) for this version and then to fix the problem in a subsequent update when we're not so far along.
... except that @mark-petersen is the owner of the spack clone on Perlmutter. So that complicates things.
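To make the ownership problem above concrete, here is a minimal Python sketch. It assumes (the thread does not show the error text) that the failure comes from git declining to work in a repository owned by a different user. The function `owned_by_current_user` and the `SPACK_CLONE` variable are hypothetical and not part of compass, mache, or spack.

```python
import os


def owned_by_current_user(path):
    """Return True if `path` is owned by the user running this script.

    Assumption: a shared spack clone created by one user and used by
    another would return False here for the second user.
    """
    return os.stat(path).st_uid == os.getuid()


if __name__ == "__main__":
    # Hypothetical location of a shared spack clone; replace with a real path.
    clone = os.environ.get("SPACK_CLONE", ".")
    print(clone, "owned by current user:", owned_by_current_user(clone))
```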
@xylar is this perhaps something that removing
@trhille, I am currently rebuilding the spack environment, after moving @mark-petersen's build aside. Once that's done, you can test without
@darincomeau, I think the
@trhille, you can test now on Perlmutter.
@xylar, we're getting (hopefully) closer.
(Note: without
@trhille, let me see if I can reproduce this.
@trhille, I'm getting other errors even earlier. I'll try again tomorrow...
Deployed and tested nonhydro stratified_seiche for supported compilers on Chrysalis and Anvil.
Thanks for all the guidance @xylar !
@trhille, I was able to build MALI just fine on Perlmutter. Did you do the following?
I'm running the test suite now and it seems to be passing so far. Update: the
@trhille, I checked the box but would still appreciate you verifying that things work for you. If not, we need to debug that library error.
I had no problem building MALI, it was just when I ran
@xylar, I'm still waiting for your go-ahead to re-test on Chicoma without
@trhille, on Chicoma, please use
@xylar, Chicoma still gives me
okay, thanks. I'll do all 3 envs on Chicoma, too.
@trhille, I've made the @jonbob, I've made the shared PETSc environment, too. Did you run into the same issue as @trhille on Chicoma and Perlmutter (not being able to make changes to @mark-petersen's spack clone)? Or have you been using a different space?
@scalandr suggested the following namelist changes to make
She will make a compass pr to change these. With these changes, I was able to run both
@trhille, I was able to run
Thanks everyone for your help on this! I know it was a slog. Hopefully, the next time will be smoother because of what we learned this time.
@xylar - I had run into that issue and more. I finally gave up, or at least stopped pushing on it...
Updates
mache to v1.14.0, which brings in many updates since v1.10.0 (current compass version)
Explicitly gives a version of CMake in spack, needed by Trilinos (from #549)
Removes Albany (therefore MALI) support on Anvil, as we haven't been able to create a compatible configuration.
Updates the spack, compass and mpas-standalone locations to be in /usr/projects/e3sm on Chicoma.
Removes support for Cori-Haswell, which will be decommissioned in April.
Removes support for Gnu and OpenMPI on Compy, which has not been working in testing.
Testing
MPAS-Ocean with pr:
ocean/global_ocean/QU240/PHC/RK4/decomp_test failing on Compy with Intel #502
gnu and openmpi - Failing similarly to pr test suite not successful with Gnu on Compy #503
MALI with full_integration:
gnu and openmpi - nearly all tests are failing (invalid memory reference)
MPAS-Ocean with nonhydro:
Deployed
MPAS-Ocean with pr:
MALI with full_integration:
MPAS-Ocean with nonhydro:
closes #335