Several weekly GPU longruns are broken #2934
Comments
SSP baroclinic wave also seems consistently broken: |
Yeah, we have been ignoring this one. Hope we don't need it anymore after SSPKnoth:) |
Would it make sense to look at all of those cases after SSPKnoth lands? |
Yes, I just tagged you to let you know there is an open issue. Also make sure they don't break diagnostic EDMF:) |
We will revisit this after SSPKnoth; I'm adding some fixes to a bycolumn op (in the use of |
I found that PR #2855 breaks one run that works otherwise with SSPKnoth |
Interesting, so that PR breaks two of the working runs, although I see no reason that it should cause physical changes. What is the job that failed? |
https://buildkite.com/clima/climaatmos-ci/builds/19073#018fe547-837c-4eab-a70f-2691b203a9fa |
Could someone take another look at PR #2855 to see whether the change in MSE comes only from the change in the order of computation? Maybe @juliasloan25? |
I am looking at it right now because it is blocking my work on Rosenbrock. |
I took one integration step (for the job that is failing for me) and compared before and after the patch. The results are very different:
|
I suggest that we revert it, unless @akshaysridhar or @glwagner understands why that PR causes a substantial change in surface fluxes. |
@Sbozzolo Could you check to see if |
Steps to reproduce: First, check out the commit before or after the change, e.g.
git checkout b51150ee4
Then run:
import ClimaAtmos as CA
import SciMLBase
working = CA.get_simulation(CA.AtmosConfig("config/mpi_configs/mpi_sphere_aquaplanet_rhoe_equilmoist_clearsky.yml"))
SciMLBase.step!(working.integrator)
println(working.integrator.p.precomputed.sfc_conditions.ρ_flux_uₕ) |
After the first step, they are the same. |
(Maybe @dennisYatunin too) |
Wait, I think there is a bug in that PR.
f = C12(f₁₃ * xz + f₂₃ * yz, L) ?
|
@Sbozzolo could you check whether that fixes the issue? (And we should somehow prevent this from happening in the future...) |
Yes, this is a bug |
Will try this now. |
Yes, that works! |
Opening a PR now |
The longrun experiments have been cleaned up and most of these are irrelevant now. Closing the issue for now. |
[build](https://buildkite.com/clima/climaatmos-gpulongruns/builds/253) on 6/14
longrun_aquaplanet_clearsky_1M: unstable after ~17 weeks. t_end was extended to 120 days for this run, so it is now broken. Status: not fixed
longrun_aquaplanet_rhoe_equil_55km_nz63_clearsky_tvinsol_0M_slabocean: stable, but the conservation test failed due to the change to SSP. Status: fixed by #3159, which ensures the surface state uses the correct precipitation tendency.
build on 6/14
longrun_aquaplanet_clearsky_1M: unstable after ~5 days. It seems 1M doesn't have a regression test now. Maybe related to #3084? Status: fixed by #3095. Now it runs for longer and becomes unstable eventually, which is a separate issue.
build on 6/7
longrun_bw_rhoe_equil_highres: This is from PR #3074, which removes the reference state. Reordering the u3 tendency to make it similar to before in this commit fixes it. This suggests that we may be close to the instability regime. We will see if it works with SSPKnoth. Status: fixed by switching to SSP
longrun_aquaplanet_rhoe_equil_55km_nz63_clearsky_0M_earth: This is from PR #3074, which removes the reference state. Status: not fixed
build on 5/27
longrun_sphere_hydrostatic_balance_rhoe: This is from a change to a higher horizontal and vertical resolution (PR #3028). Status: fixed by switching to SSP
longrun_aquaplanet_rhoe_equil_55km_nz63_gray_0M: PR #2855 is the only one that changes MSE this week. Status: fixed by fixing a bug in PR #2855
build on 4/19
longrun_aquaplanet_rhoe_equil_55km_nz63_clearsky_0M_earth_sleve: cc @akshaysridhar. Changing SST to a function of surface height results in behavior changes in this job. Status: not fixed
longrun_aquaplanet_clearsky_1M: cc @trontrytel. I don't know why this job has behavior changes. Status: fixed (as in, it works now, I don't know why)