Kokkos promotion #5432

Merged 42 commits on Jun 26, 2019
Showing changes from 35 commits
Commits (42):
aafa65e
stokhos: update MPVector ParallelFor specialization
ndellingwood Apr 4, 2019
be47e60
Merge branch 'develop' into kokkos-promotion
ndellingwood Apr 9, 2019
9d93aa8
stokhos: Replace StaticAssert with static_assert in perf tests
ndellingwood May 10, 2019
6c6b1d4
Merge branch 'develop' into kokkos-promotion
ndellingwood May 13, 2019
87d3f94
sacado: add enum to ViewMapping
ndellingwood May 15, 2019
4b16453
stokhos: add enum to ViewMapping classes
ndellingwood May 15, 2019
09ec03c
Update fences in agreement with Kokkos deprecations
ndellingwood May 16, 2019
8d12be2
ifpack2: Updates for Kokkos deprecations
ndellingwood May 16, 2019
5dfabb4
tacho: Round 1 updates for Kokkos tasking backend changes
ndellingwood May 16, 2019
4b408e4
Tacho - it compiles with new kokkos tasking api
kyungjoo-kim May 23, 2019
6781177
Tacho - testing all schedulers
kyungjoo-kim May 23, 2019
bc7cec9
Tacho - fix for new kokkos tasking api
kyungjoo-kim May 24, 2019
f0f1976
Tacho - change pool interface according to kokkos changes and remove …
kyungjoo-kim May 31, 2019
90ace06
Amesos2: Tacho fix
ndellingwood Jun 3, 2019
29f7124
Merge branch 'develop' into kokkos-promotion
ndellingwood Jun 3, 2019
ac08578
nox: unit test fix with tpetra
ndellingwood Jun 3, 2019
cf5d7a9
stokhos: another fix to support kokkos streams
ndellingwood Jun 4, 2019
7d5ee4b
run_repo_comparison: update time and check tests complete
ndellingwood Jun 4, 2019
966368d
Merge branch 'develop' into kokkos-promotion
ndellingwood Jun 4, 2019
77a3e7c
run_repo_comparison_lsf: increase test time
ndellingwood Jun 5, 2019
85fcf39
More kokkos integration script updates
ndellingwood Jun 5, 2019
9982b0a
Merge branch 'develop' into kokkos-promotion
ndellingwood Jun 5, 2019
a14fa50
tpetra: replace static exec_space fence in CrsMatrix_StaticImportExport
ndellingwood Jun 5, 2019
06c363f
stokhos: Fix for compatibility with kokkos deprecated impl change
ndellingwood Jun 6, 2019
0d546f4
muelu: update fences in perf test
ndellingwood Jun 6, 2019
7409adb
Merge branch 'develop' into kokkos-promotion
ndellingwood Jun 7, 2019
f02eb0c
Merge branch 'develop' into kokkos-promotion
ndellingwood Jun 12, 2019
6021529
Tacho - working version on cuda with david kokkos branch
Jun 12, 2019
38057c7
Tacho - add when_all lambda interface
Jun 12, 2019
7cdc807
Merge branch 'kokkos-promotion' of https://github.com/trilinos/Trilin…
Jun 12, 2019
4233c0a
Merge branch 'develop' into kokkos-promotion
ndellingwood Jun 20, 2019
e379ba9
panzer: Replace static exec_space fence calls
ndellingwood Jun 20, 2019
14f7492
bddc: Update interface for Tacho::Solver
ndellingwood Jun 21, 2019
91773d3
Snapshot of kokkos.git from commit 2983b80d9aeafabb81f2c8c1c5a49b40cc…
ndellingwood Jun 24, 2019
b395c9f
Snapshot of kokkos-kernels.git from commit d86db111124cea12e23dd3447b…
ndellingwood Jun 24, 2019
92bbdda
Disable Tacho+Cuda testing until RDC is enabled
ndellingwood Jun 25, 2019
589fe1a
Tacho - do not include test and example if rdc is off.
kyungjoo-kim Jun 25, 2019
f5a99cc
Merge branch 'kokkos-promotion' of https://github.com/trilinos/Trilin…
kyungjoo-kim Jun 25, 2019
4b0a09b
For you Paul ;)
ndellingwood Jun 25, 2019
cfb9d06
tacho: Fix -Werror
ndellingwood Jun 25, 2019
a508234
tacho: remove unused scheduler_name from unit test file
ndellingwood Jun 25, 2019
d4524dc
tacho: Replace deleted TaskSchedulerType alias
ndellingwood Jun 25, 2019
5 changes: 3 additions & 2 deletions packages/amesos2/src/Amesos2_Tacho_decl.hpp
@@ -96,6 +96,7 @@ class TachoSolver : public SolverCore<Amesos2::TachoSolver, Matrix, Vector>
typedef Tacho::ordinal_type ordinal_type;
typedef Tacho::size_type size_type;
typedef Kokkos::DefaultHostExecutionSpace HostSpaceType;
+ typedef Kokkos::TaskScheduler<HostSpaceType> SchedulerType;
typedef Kokkos::View<size_type*,HostSpaceType> size_type_array;
typedef Kokkos::View<ordinal_type*,HostSpaceType> ordinal_type_array;
typedef Kokkos::View<tacho_type*, HostSpaceType> value_type_array;
@@ -203,7 +204,7 @@ class TachoSolver : public SolverCore<Amesos2::TachoSolver, Matrix, Vector>

// struct holds all data necessary to make a tacho factorization or solve call
mutable struct TACHOData {
- typename Tacho::Solver<tacho_type,HostSpaceType> solver;
+ typename Tacho::Solver<tacho_type,SchedulerType> solver;

// TODO: Implement the paramter options - confirm which we want and which have been implemented
// int num_kokkos_threads;
@@ -224,7 +225,7 @@ class TachoSolver : public SolverCore<Amesos2::TachoSolver, Matrix, Vector>
#else
typedef Kokkos::Serial DeviceSpaceType;
#endif
- typedef typename Tacho::Solver<tacho_type,DeviceSpaceType>::value_type_matrix
+ typedef typename Tacho::Solver<tacho_type,SchedulerType>::value_type_matrix
solve_array_t;

// used as an internal workspace - possibly we can store this better in TACHOData
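
The hunks above change the second template argument of `Tacho::Solver` from an execution space to a `Kokkos::TaskScheduler` type. A minimal sketch of that pattern follows. It uses stand-in types only (`MockHostSpace`, `MockTaskScheduler` and `MockSolver` are hypothetical, not the Kokkos or Tacho classes) and assumes the solver recovers the execution space from an `execution_space` alias on the scheduler.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for a host execution space (hypothetical, not Kokkos).
struct MockHostSpace {};

// Stand-in scheduler that records which space it schedules tasks on.
template <class Space>
struct MockTaskScheduler {
  using execution_space = Space;
};

// Stand-in solver: the second parameter is a scheduler, and the execution
// space is derived from it instead of being passed directly (an assumption
// about the design, not taken from the Tacho sources).
template <class ValueType, class SchedulerType>
struct MockSolver {
  using value_type      = ValueType;
  using scheduler_type  = SchedulerType;
  using execution_space = typename SchedulerType::execution_space;
};

using HostSpaceType = MockHostSpace;
using SchedulerType = MockTaskScheduler<HostSpaceType>;
using SolverType    = MockSolver<double, SchedulerType>;

// The space is still recoverable from the solver type.
static_assert(
    std::is_same<SolverType::execution_space, HostSpaceType>::value,
    "solver should expose the scheduler's execution space");
```

With this shape, callers name the scheduler once and every dependent alias (such as `value_type_matrix` in the diff) follows from it.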
4 changes: 2 additions & 2 deletions packages/ifpack2/src/Ifpack2_BlockTriDiContainer_impl.hpp
@@ -127,8 +127,8 @@ namespace KB = KokkosBatched::Experimental;
using do_not_initialize_tag = Kokkos::ViewAllocateWithoutInitializing;

template <typename MemoryTraitsType, Kokkos::MemoryTraitsFlags flag>
- using MemoryTraits = Kokkos::MemoryTraits<MemoryTraitsType::Unmanaged |
- MemoryTraitsType::RandomAccess |
+ using MemoryTraits = Kokkos::MemoryTraits<MemoryTraitsType::is_unmanaged |
+ MemoryTraitsType::is_random_access |
flag>;

template <typename ViewType>
@@ -432,7 +432,7 @@ initialize ()
decltype(val) newval ("val", val.extent (0));

// FIXME: The code below assumes UVM
- crs_matrix_type::execution_space::fence();
+ typename crs_matrix_type::execution_space().fence();
newptr(0) = 0;
for (local_ordinal_type row = 0, rowStart = 0; row < numRows; ++row) {
auto A_r = Alocal.row(numRows-1 - row);
@@ -445,7 +445,7 @@ initialize ()
rowStart += numEnt;
newptr(row+1) = rowStart;
}
- crs_matrix_type::execution_space::fence();
+ typename crs_matrix_type::execution_space().fence();

// Reverse maps
using map_type = typename crs_matrix_type::map_type;
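
The two hunks above replace a static `execution_space::fence()` call with a call on a temporary instance. The sketch below illustrates only that call shape, using stand-in types (`MockExecSpace` and `MockCrsMatrix` are hypothetical, not Kokkos or Tpetra classes). It assumes the enclosing code is a template, which is why `typename` is needed to mark the dependent alias as a type.

```cpp
#include <cassert>

// Stand-in execution space (hypothetical, not Kokkos). fence() is a
// non-static member, so it must be called on an instance.
struct MockExecSpace {
  void fence() const { ++fence_calls(); }
  static int& fence_calls() {
    static int count = 0;
    return count;
  }
};

// Stand-in matrix type exposing an execution_space alias.
struct MockCrsMatrix {
  using execution_space = MockExecSpace;
};

// Inside a template the alias is a dependent name, hence `typename`.
// The expression builds a temporary space and fences through it.
template <class CrsMatrixType>
void fence_matrix_space() {
  typename CrsMatrixType::execution_space().fence();
}
```

Calling `fence_matrix_space<MockCrsMatrix>()` constructs a default `MockExecSpace` and invokes its member `fence()`, which is the same expression form the diff introduces.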
8 changes: 4 additions & 4 deletions packages/intrepid2/perf-test/ComputeBasis/test_hgrad.hpp
@@ -183,7 +183,7 @@ namespace Intrepid2 {

flush.run();

- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
timer.reset();

cts::setJacobian(jacobian, refPoints, worksetCells, cellTopo);
@@ -197,7 +197,7 @@ namespace Intrepid2 {
fts::HGRADtransformGRAD(phyBasisGrads, jacobianInv, refBasisGrads);
fts::multiplyMeasure(weightedBasisGrads, cellMeasure, phyBasisGrads);

- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
t_horizontal += (iwork >= 0)*timer.seconds();
}
}
@@ -226,12 +226,12 @@ namespace Intrepid2 {
for (ordinal_type iwork=ibegin;iwork<nworkset;++iwork) {
flush.run();

- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
timer.reset();

Kokkos::parallel_for(policy, functor);

- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
t_vertical += (iwork >= 0)*timer.seconds();
}
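
The perf tests in this file bracket each timed region with two fences: one before the timer is reset and one before it is read. The sketch below shows why both matter, using a stand-in device whose launches are deferred until `fence()` is called. `MockDeviceSpace` and `time_region` are hypothetical names for illustration and are not part of Kokkos or Intrepid2.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>
#include <vector>

// Stand-in device (hypothetical, not Kokkos): launches are queued and only
// executed when fence() is called, imitating asynchronous kernel launches.
struct MockDeviceSpace {
  void launch(std::function<void()> kernel) const {
    pending().push_back(std::move(kernel));
  }
  void fence() const {
    // Swap the queue out first so the loop runs over a stable batch.
    std::vector<std::function<void()>> batch;
    batch.swap(pending());
    for (auto& kernel : batch) kernel();
  }

 private:
  static std::vector<std::function<void()>>& pending() {
    static std::vector<std::function<void()>> queue;
    return queue;
  }
};

// Time `repeats` launches of `kernel`, fencing on both sides of the region.
template <class DeviceSpaceType, class Kernel>
double time_region(Kernel kernel, int repeats) {
  DeviceSpaceType().fence();  // drain earlier work so it is not counted
  const auto start = std::chrono::steady_clock::now();

  for (int i = 0; i < repeats; ++i) DeviceSpaceType().launch(kernel);

  DeviceSpaceType().fence();  // wait for the launched work before stopping
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Without the second fence the clock would be read while the queued work had not yet run, so the measurement would cover only the launch overhead.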

@@ -189,12 +189,12 @@ namespace Intrepid2 {
for (ordinal_type iwork=ibegin;iwork<nworkset;++iwork) {
flush.run();

- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
timer.reset();

Kokkos::parallel_for(policy, functor);

- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
t_vectorize += (iwork >= 0)*timer.seconds();
}

8 changes: 4 additions & 4 deletions packages/intrepid2/perf-test/DynRankView/test_01.hpp
@@ -191,13 +191,13 @@ namespace Intrepid2 {

Kokkos::deep_copy(in, 1.0);

- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
timer.reset();

for (ordinal_type i=0;i<nworkset;++i)
Kokkos::parallel_for( policy, FunctorType(out, in) );

- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
t_view[itest] = timer.seconds();
}

@@ -214,13 +214,13 @@ namespace Intrepid2 {
Kokkos::deep_copy(in, 1.0);


- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
timer.reset();

for (ordinal_type i=0;i<nworkset;++i)
Kokkos::parallel_for( policy, FunctorType(out, in) );

- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
t_dynrankview[itest] = timer.seconds();
}
} catch (std::exception err) {
24 changes: 12 additions & 12 deletions packages/intrepid2/perf-test/DynRankView/test_02.hpp
@@ -180,26 +180,26 @@ namespace Intrepid2 {
Kokkos::deep_copy(in, 1.0);
{
*verboseStream << " -> with subview \n";
- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
timer.reset();

typedef F_clone<ViewType,ViewType,-1> FunctorType;
for (ordinal_type i=0;i<nworkset;++i) {
Kokkos::parallel_for( policy, FunctorType(out, in) );
}
- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
t_with_subview[itest] = timer.seconds();
}
{
*verboseStream << " -> without subview \n";
- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
timer.reset();

typedef F_clone<ViewType,ViewType,0> FunctorType;
for (ordinal_type i=0;i<nworkset;++i) {
Kokkos::parallel_for( policy, FunctorType(out, in) );
}
- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
t_without_subview[itest] = timer.seconds();
}
}
@@ -210,26 +210,26 @@ namespace Intrepid2 {
Kokkos::deep_copy(in, 1.0);
{
*verboseStream << " -> with subview \n";
- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
timer.reset();

typedef F_clone<ViewType,ViewType,-1> FunctorType;
for (ordinal_type i=0;i<nworkset;++i) {
Kokkos::parallel_for( policy, FunctorType(out, in) );
}
- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
t_with_subview[itest] = timer.seconds();
}
{
*verboseStream << " -> without subview \n";
- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
timer.reset();

typedef F_clone<ViewType,ViewType,1> FunctorType;
for (ordinal_type i=0;i<nworkset;++i) {
Kokkos::parallel_for( policy, FunctorType(out, in) );
}
- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
t_without_subview[itest] = timer.seconds();
}
}
@@ -241,26 +241,26 @@ namespace Intrepid2 {
Kokkos::deep_copy(in, 1.0);
{
*verboseStream << " -> with subview \n";
- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
timer.reset();

typedef F_clone<ViewType,ViewType,-1> FunctorType;
for (ordinal_type i=0;i<nworkset;++i) {
Kokkos::parallel_for( policy, FunctorType(out, in) );
}
- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
t_with_subview[itest] = timer.seconds();
}
{
*verboseStream << " -> without subview \n";
- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
timer.reset();

typedef F_clone<ViewType,ViewType,2> FunctorType;
for (ordinal_type i=0;i<nworkset;++i) {
Kokkos::parallel_for( policy, FunctorType(out, in) );
}
- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();
t_without_subview[itest] = timer.seconds();
}
}
@@ -250,7 +250,7 @@ namespace Intrepid2 {
transformed_value_of_basis_at_cub_points,
weighted_transformed_value_of_basis_at_cub_points);

- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();

/******************* STOP COMPUTATION ***********************/

@@ -285,7 +285,7 @@ namespace Intrepid2 {
// apply field signs (after the fact, as a post-processing step)
fst::applyLeftFieldSigns(mass_matrices, field_signs);
fst::applyRightFieldSigns(mass_matrices, field_signs);
- DeviceSpaceType::fence();
+ DeviceSpaceType().fence();

/******************* STOP COMPUTATION ***********************/

17 changes: 17 additions & 0 deletions packages/kokkos-kernels/CHANGELOG.md
@@ -1,5 +1,22 @@
# Change Log

+ ## [2.9.00](https://github.com/kokkos/kokkos-kernels/tree/2.9.00) (2019-06-24)
+ [Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/2.8.00...2.9.00)

+ **Implemented enhancements:**

+ - KokkosBatched: Add specialization for float2, float4 and double4 [\#427](https://github.com/kokkos/kokkos-kernels/pull/427)
+ - KokkosBatched: Reduce VectorLength (16 to 8) [\#432](https://github.com/kokkos/kokkos-kernels/pull/432)
+ - KokkosBatched: Remove experimental name space for batched blas [\#371](https://github.com/kokkos/kokkos-kernels/issues/371)
+ - Capability: Initial sparse triangular solve capability [\#435](https://github.com/kokkos/kokkos-kernels/pull/435)
+ - Capability: Add support for MAGMA GESV TPL [\#409](https://github.com/kokkos/kokkos-kernels/pull/409)
+ - cuBLAS: Add CudaUVMSpace specializations for GEMM [\#397](https://github.com/kokkos/kokkos-kernels/issues/397)

+ **Fixed bugs:**

+ - Deprecated Code Fixes [\#411](https://github.com/kokkos/kokkos-kernels/issues/411)
+ - BuildSystem: Compilation error on rzansel [\#401](https://github.com/kokkos/kokkos-kernels/issues/401)

## [2.8.00](https://github.com/kokkos/kokkos-kernels/tree/2.8.00) (2019-02-05)
[Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/2.7.24...2.8.00)

75 changes: 64 additions & 11 deletions packages/kokkos-kernels/CMakeLists.txt
@@ -290,23 +290,64 @@ ENDIF()
# Enable Third Party Libraries
# ==================================================================

- IF (TPL_ENABLE_BLAS)
- SET(KOKKOSKERNELS_ENABLE_TPL_BLAS ${TPL_ENABLE_BLAS})
+ IF (DEFINED KOKKOSKERNELS_ENABLE_TPL_BLAS)
+ # user overriding kokkoskernels
+ IF (KOKKOSKERNELS_ENABLE_TPL_BLAS)
+ IF (NOT TPL_ENABLE_BLAS)
+ MESSAGE( WARNING "KOKKOSKERNELS_ENABLE_TPL_BLAS is ON but TPL_ENABLE_BLAS is OFF. Please set TPL_ENABLE_BLAS:BOOL=ON")
+ SET(KOKKOSKERNELS_ENABLE_TPL_BLAS OFF)
+ ENDIF()
+ ENDIF()
+ ELSE()
+ # default behavior
+ IF (TPL_ENABLE_BLAS)
+ SET(KOKKOSKERNELS_ENABLE_TPL_BLAS ${TPL_ENABLE_BLAS})
+ ENDIF()
ENDIF()
- IF (TPL_ENABLE_MKL)
- SET(KOKKOSKERNELS_ENABLE_TPL_MKL ${TPL_ENABLE_MKL})

+ IF (DEFINED KOKKOSKERNELS_ENABLE_TPL_MKL)
+ # user overriding kokkoskernels
+ IF (KOKKOSKERNELS_ENABLE_TPL_MKL)
+ IF (NOT TPL_ENABLE_MKL)
+ MESSAGE( WARNING "KOKKOSKERNELS_ENABLE_TPL_MKL is ON but TPL_ENABLE_MKL is OFF. Please set TPL_ENABLE_MKL:BOOL=ON")
+ SET(KOKKOSKERNELS_ENABLE_TPL_MKL OFF)
+ ENDIF()
+ ENDIF()
+ ELSE()
+ IF (TPL_ENABLE_MKL)
+ SET(KOKKOSKERNELS_ENABLE_TPL_MKL ${TPL_ENABLE_MKL})
+ ENDIF()
ENDIF()

IF(${Kokkos_ENABLE_Cuda})
- IF (NOT KOKKOSKERNELS_ENABLE_TPL_BLAS)
- SET(KOKKOSKERNELS_ENABLE_TPL_BLAS ON)
- LIST(APPEND TPL_LIST "BLAS")
- ENDIF()
# CUBLAS is ON by default when CUDA is enabled
- SET(KOKKOSKERNELS_ENABLE_TPL_CUBLAS ON)
+ IF (NOT DEFINED KOKKOSKERNELS_ENABLE_TPL_CUBLAS)
+ SET(KOKKOSKERNELS_ENABLE_TPL_CUBLAS ON)
+ ENDIF()

# Tribit provides TPL mechanism for CUSPARSE; thus, use it
- IF (TPL_ENABLE_CUSPARSE)
- SET(KOKKOSKERNELS_ENABLE_TPL_CUSPARSE ${TPL_ENABLE_CUSPARSE})
+ IF (DEFINED KOKKOSKERNELS_ENABLE_TPL_CUSPARSE)
+ IF (NOT TPL_ENABLE_CUSPARSE)
+ MESSAGE( WARNING "KOKKOSKERNELS_ENABLE_TPL_CUSPARSE is ON but TPL_ENABLE_CUSPARSE is OFF. Please set TPL_ENABLE_CUSPARSE:BOOL=ON")
+ SET(KOKKOSKERNELS_ENABLE_TPL_CUSPARSE OFF)
+ ENDIF()
+ ELSE()
+ IF (TPL_ENABLE_CUSPARSE)
+ SET(KOKKOSKERNELS_ENABLE_TPL_CUSPARSE ${TPL_ENABLE_CUSPARSE})
+ ENDIF()
ENDIF()

+ IF (DEFINED KOKKOSKERNELS_ENABLE_TPL_MAGMA)
+ IF (KOKKOSKERNELS_ENABLE_TPL_MAGMA)
+ IF (NOT TPL_ENABLE_MAGMA)
+ MESSAGE( WARNING "KOKKOSKERNELS_ENABLE_TPL_MAGMA is ON but TPL_ENABLE_MAGMA is OFF. Please set TPL_ENABLE_MAGMA:BOOL=ON")
+ SET(KOKKOSKERNELS_ENABLE_TPL_MAGMA OFF)
+ ENDIF()
+ ENDIF()
+ ELSE()
+ IF (TPL_ENABLE_MAGMA)
+ SET(KOKKOSKERNELS_ENABLE_TPL_MAGMA ${TPL_ENABLE_MAGMA})
+ ENDIF()
+ ENDIF()
ENDIF()
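
The BLAS, MKL and MAGMA branches in this hunk follow one decision rule: a user-provided `KOKKOSKERNELS_ENABLE_TPL_*` value wins unless it asks for ON while the matching `TPL_ENABLE_*` is OFF, and with no user value the package flag follows `TPL_ENABLE_*`. The sketch below restates that rule in C++ for readability. `UserSetting`, `TplDecision` and `resolve_tpl` are hypothetical names, and the sketch does not model the CUSPARSE branch (which omits the inner ON check) or the CUBLAS default.

```cpp
#include <cassert>
#include <string>

// Tri-state for a CMake variable: not set by the user, or set OFF/ON.
enum class UserSetting { Undefined, Off, On };

struct TplDecision {
  bool enabled;         // final value of the package-level enable flag
  std::string warning;  // non-empty when the user's request was overridden
};

// Hypothetical helper mirroring the BLAS/MKL/MAGMA branches:
//  - user asked for ON but the global TPL is OFF -> warn and force OFF
//  - user set a value otherwise                  -> keep it
//  - user set nothing                            -> follow the global TPL flag
inline TplDecision resolve_tpl(const std::string& name, UserSetting user,
                               bool tpl_enabled) {
  if (user == UserSetting::On && !tpl_enabled) {
    return {false, "KOKKOSKERNELS_ENABLE_TPL_" + name +
                       " is ON but TPL_ENABLE_" + name + " is OFF"};
  }
  if (user != UserSetting::Undefined) {
    return {user == UserSetting::On, ""};
  }
  return {tpl_enabled, ""};
}
```

For example, `resolve_tpl("BLAS", UserSetting::On, false)` yields a disabled flag together with a warning, matching the `MESSAGE( WARNING ...)` plus `SET(... OFF)` pair in the CMake code.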

@@ -322,6 +363,18 @@ ENDIF()
IF (KOKKOSKERNELS_ENABLE_TPL_CUBLAS)
LIST(APPEND TPL_LIST "CUBLAS")
ENDIF()
+ IF (KOKKOSKERNELS_ENABLE_TPL_MAGMA)
+ LIST(APPEND TPL_LIST "MAGMA")
+ IF (F77_BLAS_MANGLE STREQUAL "(name,NAME) name ## _")
+ SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DADD_ -fopenmp -lgfortran")
+ ELSEIF (F77_BLAS_MANGLE STREQUAL "(name,NAME) NAME")
+ SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUPCASE -fopenmp -lgfortran")
+ ELSEIF (F77_BLAS_MANGLE STREQUAL "(name,NAME) name")
+ SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DNOCHANGE -fopenmp -lgfortran")
+ ELSE ()
+ MESSAGE(FATAL_ERROR "F77_BLAS_MANGLE ${F77_BLAS_MANGLE} detected while MAGMA only accepts Fortran mangling that is one of single underscore (-DADD_), uppercase (-DUPCASE), and no change (-DNOCHANGE)")
+ ENDIF()
+ ENDIF()

# ==================================================================
# Fortran Complex BLAS
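
The MAGMA branch above maps the detected `F77_BLAS_MANGLE` string to one of three defines. The sketch below shows what each mangling scheme does to a routine name and restates the mapping as a function. The macro names, `magma_define_for` and the `symbol_*` helpers are illustrative assumptions, and `dgesv` is used only as an example routine name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Two-level stringification so the mangling macro expands before quoting.
#define SKETCH_STR_IMPL(x) #x
#define SKETCH_STR(x) SKETCH_STR_IMPL(x)

// The three mangling schemes the CMake branch recognizes (macro names are
// illustrative; the bodies match the F77_BLAS_MANGLE strings in the diff).
#define MANGLE_ADD_(name, NAME) name##_
#define MANGLE_UPCASE(name, NAME) NAME
#define MANGLE_NOCHANGE(name, NAME) name

// Hypothetical helper: map a mangling string to the define named in the diff.
inline std::string magma_define_for(const std::string& mangle) {
  if (mangle == "(name,NAME) name ## _") return "-DADD_";
  if (mangle == "(name,NAME) NAME") return "-DUPCASE";
  if (mangle == "(name,NAME) name") return "-DNOCHANGE";
  throw std::invalid_argument("unsupported Fortran mangling: " + mangle);
}

// Symbol names each scheme would produce for an example routine.
inline std::string symbol_add_() {
  return SKETCH_STR(MANGLE_ADD_(dgesv, DGESV));
}
inline std::string symbol_upcase() {
  return SKETCH_STR(MANGLE_UPCASE(dgesv, DGESV));
}
inline std::string symbol_nochange() {
  return SKETCH_STR(MANGLE_NOCHANGE(dgesv, DGESV));
}
```

Any other mangling string throws here, which corresponds to the `FATAL_ERROR` branch in the CMake code.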