Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sod perpendicular velocities #376

Merged
merged 28 commits into from
Dec 7, 2021
Merged

Fix sod perpendicular velocities #376

merged 28 commits into from
Dec 7, 2021

Conversation

G-071
Copy link
Member

@G-071 G-071 commented Aug 18, 2021

This PR addresses the concerns raised #366.

When all kernels are executed on the same device, the perpendicular velocities are zero. If some kernels are executed on the CPU, others on the GPU there were some values not being zero (though still within machine precision).

This can be avoiding by setting a different fp_contract:

  • Changed CMakeList.txt to build the different hydro kernels with the same fp_contract.
  • Added CMake Option OCTOTIGER_WITH_FAST_FP_CONTRACT to set the fp_contract to off for all hydro kernels. This also addresses the issue with the LEGACY kernel resulting in non-zero values for sz and zx (even though sy is 0.0).
  • Added CMake Option OCTOTIGER_ARCH_FLAG to pick the architecture to compile for. This is a work-around to pick the correct architecture for the files where we are overwriting the cmake compile flags to set a different fp contract.

The issue can also be avoided by always running everything on the GPU:

  • Added runtime option DEVICE_ONLY to all host kernels, guaranteeing all kernels of that type to be executed by the GPU.

Additional tests to check for this issue automatically in the future:

  • Added additional SOD ctests, checking for sy sz and zx to be zero.
  • Added additional SOD test suite running with max_level=3 instead of 1.
  • Added Jenkins test, running with fp_contract=off

@G-071 G-071 requested a review from diehlpk August 18, 2021 16:37
@G-071 G-071 marked this pull request as ready for review August 19, 2021 12:52
@G-071 G-071 requested a review from shibersag August 19, 2021 12:53
@diehlpk
Copy link
Member

diehlpk commented Aug 20, 2021

@G-071 I think that @shibersag would be the better choice since he reported the bug.

@G-071 G-071 removed the request for review from diehlpk August 20, 2021 13:15
@diehlpk diehlpk requested a review from dmarce1 September 6, 2021 01:56
@diehlpk
Copy link
Member

diehlpk commented Sep 6, 2021

@shibersag and @dmarce1 could you please have a look at this pr?

@shibersag
Copy link
Contributor

@G-071, I made several tests and this PR does not seem to resolve the issue.
I ran on Queen-Bee with:
problem=sod
odt=0.02
stop_time=0.2
gravity=off
max_level=3
#n_species=1
disable_diagnostics=on
xscale=0.5
correct_am_hydro=0
cuda_number_gpus=1
#reflect_bc=1
cuda_streams_per_gpu=32
cuda_buffer_capacity=1
#monopole_host_kernel_type=VC
#multipole_host_kernel_type=VC
#monopole_device_kernel_type=CUDA
#multipole_device_kernel_type=CUDA
hydro_device_kernel_type=CUDA
hydro_host_kernel_type=DEVICE_ONLY

And the output I get is:
sx 4.384558e-03 8.361244e-03
sy 0.000000e+00 0.000000e+00
sz 1.764596e-15 7.976589e-15
zx 4.412263e-16 2.302379e-15

@diehlpk
Copy link
Member

diehlpk commented Oct 22, 2021

@G-071 and @shibersag any update?

@diehlpk
Copy link
Member

diehlpk commented Nov 18, 2021

@shibersag and @dmarce1 Can one of you please review the pull request?

@shibersag
Copy link
Contributor

@diehlpk yeah sure. I had to compile the code again on Queen-Bee and now I will continue with testing this PR.

@shibersag
Copy link
Contributor

I've checked the DEVICE_ONLY option and it indeed does the trick.
Compiling with OCTOTIGER_WITH_FAST_FP_CONTRACT=OFF fixes the issue as well.

See the comments in #366 for more details.
I would recommend adding a test of a long Sod run to detect this kind of issues in the future.

@G-071
Copy link
Member Author

G-071 commented Dec 7, 2021

@shibersag Part of the PR already some adds some extended testing with a larger sod problem that uses a larger grid and more time-steps than our usual one. It is being tested multiple times (CPU-only, GPU-only and CPU+GPU). We can make it a bit larger still, but I would not go all out as it is run multiple times for each of our automatic tests (10 build configuration times times all kernel configurations)

Copy link
Contributor

@shibersag shibersag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LVGTM!
Thanks @G-071 for working on that.

@shibersag
Copy link
Contributor

I'm merging this PR.
We can discuss about extending the tests in a separate thread.
For now all looks good.

@shibersag shibersag merged commit 5044bd8 into master Dec 7, 2021
@shibersag shibersag deleted the fix_sod_non_zeros branch December 7, 2021 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants