Kokkos: disable failing CUDA+DEBUG test #2494

Merged: 2 commits, Apr 2, 2018

ibaned
Contributor

@ibaned ibaned commented Apr 2, 2018

@trilinos/kokkos
@trilinos/framework

Description

Disable instances of a test that request a hardcoded
32 CUDA threads per warp. With debugging enabled,
the CUDA kernel uses too many registers
and can run on at most 16 threads per warp.

Motivation and Context

This test is failing in key ATDM build configurations

Related Issues

kokkos/kokkos#1514, kokkos/kokkos#1513, #2471

@ibaned ibaned added the labels "type: bug" (the primary issue is a bug in Trilinos code or tests) and "pkg: Kokkos" Apr 2, 2018
@ibaned ibaned self-assigned this Apr 2, 2018
@ibaned ibaned requested review from bartlettroscoe and crtrott April 2, 2018 16:43
@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3

  • Build Num: 418
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 2494
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH kokkos-debug-cuda-fix
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA ca2465c
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA d191960

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 135
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.9.3
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 2494
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH kokkos-debug-cuda-fix
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA ca2465c
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA d191960

Using Repos:

Repo: TRILINOS (trilinos/Trilinos)
  • Branch: kokkos-debug-cuda-fix
  • SHA: ca2465c
  • Mode: TEST_REPO

Pull Request Author: ibaned

Member

@bartlettroscoe bartlettroscoe left a comment

Not sure how that if() test works, but if it disables this one unit test on the ATDM platforms, then that is good.

Does this disable the test for all CUDA builds? If so, I think we can disable this one unit test for just the ATDM platforms. I just need a variable to set that will trigger the disable.

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 4:00:00. If a change to the Pull Request source branch occurs, the testing will be attempted again.

Pull Request Auto Testing has FAILED

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3

  • Build Num: 418
  • Status: FAILED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 2494
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH kokkos-debug-cuda-fix
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA ca2465c
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA d191960

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 135
  • Status: FAILED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.9.3
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 2494
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH kokkos-debug-cuda-fix
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA ca2465c
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA d191960

@ibaned
Contributor Author

ibaned commented Apr 2, 2018

@bartlettroscoe this will disable those unit tests only if KOKKOS_ENABLE_DEBUG=ON and the test is running on CUDA. I think it's right to disable this for all GPU platforms, because so far all the ones we've looked at cannot run this configuration due to running out of registers.

@ibaned
Contributor Author

ibaned commented Apr 2, 2018

@allevin can we get details on why testing failed for this pull request?

@bartlettroscoe
Member

Ok. Sounds like this is the right thing to do then.

@allevin

allevin commented Apr 2, 2018

@ibaned

I believe the data is on CDash at https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos#!#Pull_Request

From the build listing 418 on gcc 4.9.3:

In file included from /scratch/trilinos/workspace/trilinos-folder/Trilinos_pullrequest_gcc_4.9.3/Trilinos/packages/kokkos/core/unit_test/serial/TestSerial_Team.cpp:73:0:
/scratch/trilinos/workspace/trilinos-folder/Trilinos_pullrequest_gcc_4.9.3/Trilinos/packages/kokkos/core/unit_test/TestTeamVector.hpp: In member function ‘virtual void Test::serial_triple_nested_parallelism_Test::TestBody()’:
/scratch/trilinos/workspace/trilinos-folder/Trilinos_pullrequest_gcc_4.9.3/Trilinos/packages/kokkos/core/unit_test/TestTeamVector.hpp:907:37: error: ‘Cuda’ is not a member of ‘Kokkos’
if (!std::is_same<TEST_EXECSPACE, Kokkos::Cuda>::value) {
^
/scratch/trilinos/workspace/trilinos-folder/Trilinos_pullrequest_gcc_4.9.3/Trilinos/packages/kokkos/core/unit_test/TestTeamVector.hpp:907:37: error: ‘Cuda’ is not a member of ‘Kokkos’
/scratch/trilinos/workspace/trilinos-folder/Trilinos_pullrequest_gcc_4.9.3/Trilinos/packages/kokkos/core/unit_test/TestTeamVector.hpp:907:49: error: template argument 2 is invalid
if (!std::is_same<TEST_EXECSPACE, Kokkos::Cuda>::value) {
^
[ 32%] Built target teuchos_xml_pl_test_helpers

@ibaned
Contributor Author

ibaned commented Apr 2, 2018

Thanks @allevin !

@jwillenbring
Member

@ibaned You can find the 4.8.4 results here:

https://testing-vm.sandia.gov/cdash/index.php?project=Trilinos&parentid=3409296

You can locate this page by going to the new CDash dashboard, selecting the correct day, then scrolling down to the "Pull Request" section and matching your PR number with PR-XXXX at the beginning of the job name.

The 4.9.3 build info isn't available yet via CDash.

@ibaned
Contributor Author

ibaned commented Apr 2, 2018

@jwillenbring thank you! I'll try using that directly next time I run into a PR failure.

@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request.

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3

  • Build Num: 419
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 2494
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH kokkos-debug-cuda-fix
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3c09c73
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA d191960

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 136
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.9.3
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 2494
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH kokkos-debug-cuda-fix
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3c09c73
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA d191960

Using Repos:

Repo: TRILINOS (trilinos/Trilinos)
  • Branch: kokkos-debug-cuda-fix
  • SHA: 3c09c73
  • Mode: TEST_REPO

Pull Request Author: ibaned

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED

Build Information

Test Name: Trilinos_pullrequest_gcc_4.9.3

  • Build Num: 419
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PULLREQUESTNUM 2494
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH kokkos-debug-cuda-fix
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3c09c73
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA d191960

Build Information

Test Name: Trilinos_pullrequest_gcc_4.8.4

  • Build Num: 136
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
COMPILER_MODULE sems-gcc/4.9.3
JENKINS_BUILD_TYPE Release
JENKINS_COMM_TYPE MPI
JENKINS_DO_COMPLEX OFF
JENKINS_JOB_TYPE Experimental
MPI_MODULE sems-openmpi/1.8.7
PULLREQUESTNUM 2494
TEST_REPO_ALIAS TRILINOS
TRILINOS_SOURCE_BRANCH kokkos-debug-cuda-fix
TRILINOS_SOURCE_REPO https://github.com/trilinos/Trilinos
TRILINOS_SOURCE_SHA 3c09c73
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA d191960

@trilinos-autotester
Contributor

Status Flag 'Pre-Merge Inspection' - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS NOT BEEN REVIEWED YET!

@trilinos-autotester
Contributor

All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur...

@ibaned
Contributor Author

ibaned commented Apr 2, 2018

@bartlettroscoe approved the main changes. The only difference since then was introduced to work around GCC 4.8.4 not being C++11 compliant, and it should not affect visible behavior.

@ibaned ibaned merged commit 100aaf3 into develop Apr 2, 2018
@ibaned ibaned deleted the kokkos-debug-cuda-fix branch April 2, 2018 20:40
Labels
pkg: Kokkos; type: bug (the primary issue is a bug in Trilinos code or tests)
5 participants