-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update albedo initialization for identical order of operations #303
Conversation
It would be good to get this fix into the repo by Friday evening for weekend testing. We may also want to update CICE prior to the weekend if we can as well. Will see how reviews go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there are some leftover print statements. Otherwise, it looks fine.
@@ -339,7 +339,8 @@ subroutine read_restart_field(nu,work,ndim) | |||
|
|||
minw = minval(work) | |||
maxw = maxval(work) | |||
write(nu_diag,*) minw, maxw | |||
sumw = sum(work) | |||
write(nu_diag,*) subname, minw, maxw, sumw | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still want the print statements here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They were there before, and I've found them to be handy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were there before and I just extended the feature a bit. One of the first things I wanted to check was whether all the fields were read as written and whether they were identical. It's helpful to have the same diagnostics written when writing a restart as when reading it, and the sum is a new diagnostic that provides something like a checksum value of the field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Thank you @apcraig
There's no such thing as "bug-free code"... it's surprising this one hasn't popped up before.
@@ -339,7 +339,8 @@ subroutine read_restart_field(nu,work,ndim) | |||
|
|||
minw = minval(work) | |||
maxw = maxval(work) | |||
write(nu_diag,*) minw, maxw | |||
sumw = sum(work) | |||
write(nu_diag,*) subname, minw, maxw, sumw | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They were there before, and I've found them to be handy.
Just one other comment. The problem, in some ways, is that the same code is implemented twice. I fixed it so they would compute fields with the same order of operations. But the right thing to do would be to pull out this code and create a subroutine with it where both parts of the model are calling it. This is all in the icepack driver, so it wouldn't affect icepack columnphysics. I didn't do that as it's a bit more intrusive, I thought I'd just keep the fix as simple as possible. But it's something that could or maybe should be done. |
* serial fix for cheyenne * fix hobart compile
PR checklist
fix albedo initialization which resulted in one non-exact restart
apcraig
Tested on conrad full suite with 4 compilers, all pass, all bit-for-bit (results)
Tested on conrad full suite with 4 compilers with iso1, all tests pass, a few tests fail regression due to compiler optimization related to iso1 change (not this PR) results
With the iso1 changes, one test was failing exact restart on one machine and compiler. After some debugging, it turns out to be an order of operations issue in initialization of albedos that has been there forever but never resulted in problems with exact restart. For whatever reason, this one test on one compiler on one machine picked it up finally. It is fixed and the answers are also bit-for-bit with prior results.
I also fixed initialization of nt_bgc_S when solve_zsal is off. It was not uniquely defined but this did not seem to matter to runs. Still, it should be set.
I added some diagnostics for restart read and write to more easily confirm identical fields written and read.
I modified how the ice_diag files are saved to the log directory by attaching the run timestamp to the filenames. This was long overdue. Prior, runs in the same directory (including restart tests) just overwrote the files on each run. We may want to eventually also change the way we save baselines or rename files in the run directory as this feature is only implemented in the copy of the ice_diag files to the case log directory.
I removed an OMP directive in the albedo initialization as it will not materially help performance since the code is run once. It's also a trivially cheap piece of code.