Lbgpu node vel #2878
Codecov Report
@@          Coverage Diff          @@
##          python   #2878   +/-   ##
======================================
- Coverage     82%     82%    -1%
======================================
  Files        525     525
  Lines      26807   26807
======================================
- Hits       22015   22013     -2
- Misses      4792    4794     +2
Continue to review full report at Codecov.
@mkuron have you seen the rocm error before? ( |
No idea. Could be some variation of an out-of-memory error or out-of-registers. |
Actually, I don't understand why the compilation is not terminated; there are a number of compile errors:
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:104:37: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ik'
int_pow<2 * cao>(Utils::sinc(meshi.x * nx) * Utils::sinc(meshi.y * ny) *
^
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:104:37: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ik'
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:104:65: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ik'
int_pow<2 * cao>(Utils::sinc(meshi.x * nx) * Utils::sinc(meshi.y * ny) *
^
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:104:65: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ik'
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:105:37: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ik'
Utils::sinc(meshi.z * nz));
^
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:105:37: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ik'
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:151:39: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ad'
U2 = pow((double)Utils::sinc(meshi.x * nmx) *
^
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:151:39: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ad'
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:152:35: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ad'
Utils::sinc(meshi.y * nmy) * Utils::sinc(meshi.z * nmz),
^
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:152:35: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ad'
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:152:64: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ad'
Utils::sinc(meshi.y * nmy) * Utils::sinc(meshi.z * nmz),
^
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:152:64: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ad'
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:210:39: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ik_i'
U2 = pow((double)Utils::sinc(meshi.x * nmx) *
^
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:210:39: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ik_i'
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:211:35: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ik_i'
Utils::sinc(meshi.y * nmy) * Utils::sinc(meshi.z * nmz),
^
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:211:35: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ik_i'
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:211:64: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ik_i'
Utils::sinc(meshi.y * nmy) * Utils::sinc(meshi.z * nmz),
^
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:211:64: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ik_i'
/builds/espressomd/espresso/src/core/electrostatics_magnetostatics/p3m_gpu_error_cuda.cu:273:39: error: 'Utils::sinc': no overloaded function has restriction specifiers that are compatible with the ambient context 'p3m_k_space_error_gpu_kernel_ad_i'
U2 = pow((double)Utils::sinc(meshi.x * nmx) *
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated. |
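The `Utils::sinc` errors above are what a HIP compiler reports when a kernel calls a function that is only compiled for the host. A common remedy is to mark such helpers as callable from both host and device. The following is a minimal host-testable sketch, not the project's actual code: the macro name `DEVICE_QUALIFIER`, the namespace `Sketch`, and the normalized definition sin(pi x)/(pi x) are assumptions made for illustration, and the device build path has not been verified here.

```cpp
#include <cmath>

// Hypothetical macro: expands to the CUDA/HIP execution-space qualifiers only
// when a device compiler is in use, so the header remains valid plain C++.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define DEVICE_QUALIFIER __host__ __device__
#else
#define DEVICE_QUALIFIER
#endif

namespace Sketch {
// Normalized sinc, sin(pi x) / (pi x) (assumed definition).
// Near zero the quotient is replaced by the first terms of its Taylor series
// to avoid dividing by a vanishing argument.
DEVICE_QUALIFIER inline double sinc(double x) {
  double const px = 3.14159265358979323846 * x;
  if (std::fabs(px) < 1e-4) {
    return 1.0 - px * px / 6.0;
  }
  return std::sin(px) / px;
}
} // namespace Sketch
```

With the qualifier in place, the same function can be called from ordinary host code and, in a device build, from inside a `__global__` kernel.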
Good find. So this is actually something like an unresolved symbol error.
That's a known hipcc bug. It doesn't produce a nonzero exit code when compilation fails. Usually that's not a problem because the object file would subsequently be missing and CMake terminates anyway, but in the current case, an (incomplete?) object file seems to have already been written. That issue was fixed last month: ROCm/HIP#1117. |
Concerning the
|
@KaiSzuttor there is a merge regression now that reintroduces the |
I'll try again... |
*/
__global__ void integrate(LB_nodes_gpu n_a, LB_nodes_gpu n_b, LB_rho_v_gpu *d_v,
                          LB_node_force_density_gpu node_f,
                          EK_parameters *ek_parameters_gpu,
Here e.g.
this was intended by this PR, right?
No, those are not needed and were removed in master after this branch was forked off.
was there before merge:
|
Still this reverts #2877 which it should not. |
Phew, then maybe you should merge... |
This is too difficult to debug for me, giving up. |
float *boundary_velocity = nullptr;
int *boundary_node_list = nullptr;
int *boundary_index_list = nullptr;
size_t size_of_boundindex = 0;
These variables shadow globals, and those globals are not even used anymore. It's possible that this was the cause of the HSA_STATUS_ERROR_INVALID_ISA you were seeing.
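To illustrate the shadowing pattern mentioned in this comment, here is a small host-only sketch. It reuses two of the variable names from the snippet, but the functions `init_shadowing` and `make_boundary_index` and the struct `BoundaryIndex` are hypothetical and not taken from the pull request. It only shows why shadowed globals silently stay untouched; it makes no claim about the cause of the ROCm error.

```cpp
#include <cstddef>
#include <vector>

// File-scope variables, analogous to the leftover globals.
int *boundary_node_list = nullptr;
std::size_t size_of_boundindex = 0;

// Shadowing pattern: the local declarations hide the globals of the same
// name, so the assignments below never reach the file-scope variables.
void init_shadowing(std::vector<int> &storage) {
  int *boundary_node_list = nullptr;  // shadows the global
  std::size_t size_of_boundindex = 0; // shadows the global
  boundary_node_list = storage.data();
  size_of_boundindex = storage.size();
  (void)boundary_node_list; // silence unused-variable warnings
  (void)size_of_boundindex;
}

// Alternative without globals: return the state to the caller explicitly.
struct BoundaryIndex {
  int *nodes;
  std::size_t size;
};

BoundaryIndex make_boundary_index(std::vector<int> &storage) {
  BoundaryIndex result;
  result.nodes = storage.data();
  result.size = storage.size();
  return result;
}
```

Compiling with a shadowing warning enabled (for example `-Wshadow` on GCC and Clang) flags the two local declarations in `init_shadowing`.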
HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid.
That's just a generic error code. It is used in many places for which no specific error code has been defined.
@fweik it's only the rocm test that fails... maybe we should not throw away this PR just because of that |
Well you know where to find the code. I'm done with this. |
This is where the error comes from: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/e2d1950bd8fc83dd5b8352a41b829a7c57fc1073/src/core/runtime/amd_aql_queue.cpp#L820. It supposedly means "Out of VGPRs". On CUDA, an out-of-registers condition would cause registers to spill into local memory, which is backed by device memory: this slows things down, but still works. I guess we don't have infrastructure in place that would detect this kind of performance regression on CUDA though. So it's plausible that a function requiring a very large number of registers would be broken on AMD. Adding I think AMD can deal with 255 VGPRs too, but it's unclear to me how to measure how many of them a kernel is using. Maybe this pull request just crosses the limit. To reduce register consumption, large kernels would need to be broken up into multiple smaller ones. |
It's the call to |
2982: Reduce excessive loop unrolling in lbgpu velocity interpolation r=KaiSzuttor a=mkuron
This caused excessive register usage, especially when combined with thrust. Issue discovered by @fweik in #2878. It turns out that this is a problem for CUDA too, it just exhibits a different behavior. Instead of crashing like on HIP, CUDA just produces a large binary and slower code. In a perfect world, the compiler should display a warning, but I guess neither AMD nor Nvidia operate in a perfect world.
Co-authored-by: Michael Kuron <[email protected]>
@KaiSzuttor the webhook didn't trigger CI for 81b9c33 |
bors r+ |
2878: Lbgpu node vel r=KaiSzuttor a=fweik
Description of changes:
- Factored out velocity getter, and obey boundary velocity for both interpolation schemes
- Removed some globals
- Leak less
Co-authored-by: Florian Weik <[email protected]>
Co-authored-by: RudolfWeeber <[email protected]>
Co-authored-by: Kai Szuttor <[email protected]>
Co-authored-by: Kai Szuttor <[email protected]>
Build succeeded |