
non-BFB normalVelocity in wetting/drying test case for MPAS-O when using more than 9 procs #5902

Open
gcapodag opened this issue Aug 29, 2023 · 4 comments


@gcapodag
Contributor

normalVelocity is not bit-for-bit (BFB) identical to a serial run when using 10 or more processes in parallel (I tested 11, 12 and 13).
This has been observed in the Compass test case ocean/drying_slope/1km/single_layer/ramp.
The results are BFB when running with 2, 3, ..., 9 processes.
See also the companion issue on the Compass GitHub repository: MPAS-Dev/compass#686

@gcapodag
Contributor Author

gcapodag commented Sep 1, 2023

It looks like, with dt = 30 s, the results are BFB for a run duration of 31 min 30 s, but after the next time step (run duration 32 min) they become non-BFB. I saved the intermediate normalVelocityProvis solutions during that last time step (from 31:30 to 32:00) at the first three RK4 stages and compared the serial and 10-process runs using ncdiff. They are exactly the same, in both single and double precision. However, when I print normalVelocityProvis from within the code after the first RK4 stage, the values differ, because the tendency differs. At the first stage, the tendency and diagnostics are computed from the old solution. Also, this does not seem to have anything to do with wetting and drying, since wettingVelocityFactor is zero when the provisional solution is advanced.
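For anyone repeating this kind of serial vs. parallel comparison, the sketch below shows one way to check whether two fields are BFB: compare the IEEE-754 bit patterns element by element instead of relying on a computed difference. It is a minimal, hypothetical Python helper (first_non_bfb is not part of MPAS, Compass or NCO), and it assumes the two fields have already been read into flat sequences of floats by whatever NetCDF reader you use. The single flag rounds values to 32-bit floats before comparing, as a rough analogue of a single-precision comparison.

```python
import struct
from typing import Optional, Sequence


def first_non_bfb(a: Sequence[float], b: Sequence[float], single: bool = False) -> Optional[int]:
    """Return the index of the first element whose bit pattern differs, or None if a and b are BFB."""
    if len(a) != len(b):
        raise ValueError("fields have different lengths")
    # '<f' packs each value as a 32-bit float (single precision), '<d' as a 64-bit double.
    fmt = "<f" if single else "<d"
    for i, (x, y) in enumerate(zip(a, b)):
        if struct.pack(fmt, x) != struct.pack(fmt, y):
            return i
    return None


if __name__ == "__main__":
    serial = [1.0, 2.0, 3.0]
    parallel = [1.0, 2.0 + 1e-15, 3.0]
    print(first_non_bfb(serial, parallel))               # 1
    print(first_non_bfb(serial, parallel, single=True))  # None (difference lost in float32)
    print(first_non_bfb([0.0], [-0.0]))                  # 0 (sign bit differs)
```

A bitwise check is slightly stricter than a zero difference: 0.0 and -0.0 subtract to zero but have different bit patterns.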

@xylar
Contributor

xylar commented Sep 1, 2023

Insufficient halo updates on that first provisional solution?

@gcapodag
Contributor Author

gcapodag commented Sep 1, 2023

@xylar After opening countless matryoshkas, I finally found the problem: a halo update on layerThickEdgeFlux seems to be missing from the code. After I added this halo update right before the call to ocn_time_integrator_rk4_compute_vel_tends in RK4, runs with more than 9 processes are finally BFB with respect to the serial 12 hr run.
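To illustrate why a missing halo update only shows up in parallel runs, here is a toy 1-D sketch in Python. It is not MPAS code: flux stands in for a derived field such as layerThickEdgeFlux, each simulated rank computes it only on the cells it owns, and the tendency at a partition boundary reads one neighbouring value from the halo. Without a halo exchange that value is stale, so the tendency differs from the serial result; with the exchange the results are BFB. The partitioning, the flux formula, the halo width of one cell, and the stale value of 0.0 are all assumptions chosen for illustration.

```python
"""Toy 1-D illustration of a missing halo update (hypothetical; not MPAS code)."""
from typing import Dict, List, Tuple


def flux_of(h: float) -> float:
    # Hypothetical derived field standing in for an edge flux such as layerThickEdgeFlux.
    return 0.5 * h * h


def centred_tendency(flux: Dict[int, float], i: int, n: int) -> float:
    # Difference of neighbouring flux values; zero-gradient closure at the domain ends.
    left = flux[i - 1] if i > 0 else flux[i]
    right = flux[i + 1] if i < n - 1 else flux[i]
    return right - left


def serial_tendency(h: List[float]) -> List[float]:
    n = len(h)
    flux = {i: flux_of(h[i]) for i in range(n)}
    return [centred_tendency(flux, i, n) for i in range(n)]


def partition(n: int, nranks: int) -> List[Tuple[int, int]]:
    # Split cells 0..n-1 into contiguous [lo, hi) blocks, one per simulated rank.
    base, extra = divmod(n, nranks)
    bounds, start = [], 0
    for r in range(nranks):
        size = base + (1 if r < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def parallel_tendency(h: List[float], nranks: int, update_flux_halo: bool) -> List[float]:
    n = len(h)
    bounds = partition(n, nranks)
    views: List[Dict[int, float]] = []
    for lo, hi in bounds:
        # Owned cells plus a one-cell halo on each side; halo entries start stale (0.0).
        view = {i: 0.0 for i in range(max(lo - 1, 0), min(hi + 1, n))}
        for i in range(lo, hi):
            view[i] = flux_of(h[i])  # derived field computed on owned cells only
        views.append(view)
    if update_flux_halo:
        # Halo exchange: every rank receives the owning rank's value for its halo cells.
        owned = {i: view[i] for view, (lo, hi) in zip(views, bounds) for i in range(lo, hi)}
        for view in views:
            for i in view:
                view[i] = owned[i]
    tend = [0.0] * n
    for view, (lo, hi) in zip(views, bounds):
        for i in range(lo, hi):
            tend[i] = centred_tendency(view, i, n)
    return tend


if __name__ == "__main__":
    h = [1.0 + 0.1 * i for i in range(12)]
    serial = serial_tendency(h)
    print(parallel_tendency(h, 4, update_flux_halo=True) == serial)   # True
    print(parallel_tendency(h, 4, update_flux_halo=False) == serial)  # False
```

In this toy, any split into more than one rank exposes the stale halo values; the reason the MPAS-O run only diverged at 10 or more processes is not modelled here.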

@xylar
Contributor

xylar commented Sep 1, 2023

That's wonderful! Not a fun debugging process, I'm sure.
