
Change Log

2.5.00 (2017-12-15)

Full Changelog

Part of the Kokkos C++ Performance Portability Programming EcoSystem 2.5

Implemented enhancements:

  • Provide Makefile.kokkos logic for CMake and TriBITS #878
  • Add Scatter View #825
  • Drop gcc 4.7 and intel 14 from supported compiler list #603
  • Enable construction of unmanaged view using common_view_alloc_prop #1170
  • Unused Function Warning with XL #1267
  • Add memory pool parameter check #1218
  • CUDA9: Fix warning for unsupported long double #1189
  • CUDA9: fix warning on defaulted function marking #1188
  • CUDA9: fix warnings for deprecated warp level functions #1187
  • Add CUDA 9.0 nightly testing #1174
  • {OMPI,MPICH}_CXX hack breaks nvcc_wrapper use case #1166
  • KOKKOS_HAVE_CUDA_LAMBDA became KOKKOS_CUDA_USE_LAMBDA #1274

Fixed bugs:

  • MinMax Reducer with tagged operator doesn't compile #1251
  • Reducers for Tagged operators give wrong answer #1250
  • Kokkos not Compatible with Big Endian Machines? #1235
  • Parallel Scan hangs forever on BG/Q #1234
  • Threads backend doesn't compile with Clang on OS X #1232
  • $(shell date) needs quotes #1264
  • Unqualified parallel_for call conflicts with user-defined parallel_for #1219
  • KokkosAlgorithms: CMake issue in unit tests #1212
  • Intel 18 Error: "simd pragma has been deprecated" #1210
  • Memory leak in Kokkos::initialize #1194
  • CUDA9: compiler error with static assert template arguments #1190
  • Kokkos::Serial::is_initialized always returns true #1184
  • Triple nested parallelism still fails on bowman #1093
  • OpenMP openmp.range on Develop Runs Forever on POWER7+ with RHEL7 and GCC4.8.5 #995
  • Rendezvous performance at global scope #985

2.04.11 (2017-10-28)

Full Changelog

Implemented enhancements:

  • Add Subview pattern. #648
  • Add Kokkos "global" is_initialized #1060
  • Add create_mirror_view_and_copy #1161
  • Add KokkosConcepts SpaceAccessibility function #1092
  • Option to Disable Initialize Warnings #1142
  • Mature task-DAG capability #320
  • Promote Work DAG from experimental #1126
  • Implement new WorkGraph push/pop #1108
  • Kokkos_ENABLE_Cuda_Lambda should default ON #1101
  • Add multidimensional parallel for example and improve unit test #1064
  • Fix ROCm: Performance tests not building #1038
  • Make KOKKOS_ALIGN_SIZE a configure-time option #1004
  • Make alignment consistent #809
  • Improve subview construction on Cuda backend #615

Fixed bugs:

  • Kokkos::vector fixes for application #1134
  • DynamicView non-power of two value_type #1177
  • Memory pool bug #1154
  • Cuda launch bounds performance regression bug #1140
  • Significant performance regression in LAMMPS after updating Kokkos #1139
  • CUDA compile error #1128
  • MDRangePolicy neg idx test failure in debug mode #1113
  • subview construction on Cuda backend #615

2.04.04 (2017-09-11)

Full Changelog

Implemented enhancements:

  • OpenMP partition: set number of threads on nested level #1082
  • Add StaticCrsGraph row() method #1071
  • Enhance Kokkos complex operator overloading #1052
  • Tell Trilinos packages about host+device lambda #1019
  • Function markup for defaulted class members #952
  • Add deterministic random number generator #857

Fixed bugs:

  • Fix reduction_identity<T>::max for floating point numbers #1048
  • Fix MD iteration policy ignores lower bound on GPUs #1041
  • (Experimental) HBWSpace Linking issues in KokkosKernels #1094
  • (Experimental) ROCm: algorithms/unit_tests test_sort failing with segfault #1070
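Entry #1048 concerns the identity value that a max-reduction starts from when T is a floating-point type. The sketch below assumes the problem is a common pitfall of this kind: std::numeric_limits<T>::min() is the smallest positive normal value for floating-point T, not the most negative one, so it is the wrong starting value for a max. `max_identity` and `reduce_max` are hypothetical names for illustration, not Kokkos functions.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical helper, not the Kokkos reduction_identity API.
// The identity for max must be the most negative representable value of T.
// lowest() is correct for both integer and floating-point T, whereas
// min() is a small *positive* number for float and double.
template <class T>
constexpr T max_identity() {
  return std::numeric_limits<T>::lowest();
}

// Serial max-reduction that starts from the identity, so an empty input
// yields the identity and all-negative inputs yield the correct maximum.
template <class T>
T reduce_max(const std::vector<T>& values) {
  T result = max_identity<T>();
  for (const T& v : values) result = std::max(result, v);
  return result;
}
```

With min() as the starting value, reduce_max on all-negative doubles would wrongly return a tiny positive number; with lowest() it returns the true maximum.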

2.04.00 (2017-08-16)

Full Changelog

Implemented enhancements:

  • Added ROCm backend to support AMD GPUs
  • Kokkos::complex<T> behaves slightly differently from std::complex<T> #1011
  • Kokkos::Experimental::Crs constructor arguments were in the wrong order #992
  • Work graph construction ease-of-use (one lambda for count and fill) #991
  • when_all returns pointer of futures (improved interface) #990
  • Allow assignment of LayoutLeft to LayoutRight or vice versa for rank-0 Views #594
  • Changed the meaning of Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA #1035

Fixed bugs:

  • memory pool default constructor does not properly set member variables. #1007

2.03.13 (2017-07-27)

Full Changelog

Implemented enhancements:

  • Disallow enabling both OpenMP and Threads in the same executable #406
  • Make Kokkos::OpenMP respect OMP environment even if hwloc is available #630
  • Improve Atomics Performance on KNL/Broadwell where PREFETCHW/RFO is Available #898
  • Kokkos::resize should test whether dimensions have changed before resizing #904
  • Develop performance-regression/acceptance tests #737
  • Make the deep_copy Profiling hook a start/end system #890
  • Add deep_copy Profiling hook #843
  • Append tag name to parallel construct name for Profiling #842
  • Add view label to View bounds error message for CUDA backend #870
  • Disable printing the loaded profiling library #824
  • "Declared but never referenced" warnings #853
  • Warnings about lock_address_cuda_space #852
  • WorkGraph execution policy #771
  • Simplify makefiles by guarding compilation with appropriate KOKKOS_ENABLE_### macros #716
  • Cmake build: wrong include install directory #668
  • Derived View type and allocation #566
  • Fix Compiler warnings when compiling core unit tests for Cuda #214
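Entry #904 asks that Kokkos::resize check whether the dimensions have actually changed before resizing. A minimal sketch of that check follows, assuming a resize that preserves the overlapping elements. `Array1D` and its member `resize` are hypothetical stand-ins, not Kokkos::View or the free function Kokkos::resize.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical 1-D array, not Kokkos::View. It only illustrates skipping
// reallocation when the requested extent equals the current one.
template <class T>
class Array1D {
 public:
  explicit Array1D(std::size_t n) : data_(new T[n]()), n_(n) {}

  std::size_t extent() const { return n_; }
  const T* data() const { return data_.get(); }
  T& operator[](std::size_t i) { return data_[i]; }

  // Returns true if a new buffer was allocated.
  bool resize(std::size_t n) {
    if (n == n_) return false;  // dimensions unchanged: keep the existing buffer
    std::unique_ptr<T[]> fresh(new T[n]());
    const std::size_t keep = n < n_ ? n : n_;
    for (std::size_t i = 0; i < keep; ++i) fresh[i] = data_[i];  // copy overlap
    data_ = std::move(fresh);
    n_ = n;
    return true;
  }

 private:
  std::unique_ptr<T[]> data_;
  std::size_t n_;
};
```

The early return avoids both the allocation and the element copy, and it leaves existing pointers to the data valid when nothing changed.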

Fixed bugs:

  • Out-of-bounds read in Kokkos_Layout.hpp #975
  • CudaClang: Fix failing test with Clang 4.0 #941
  • Respawn when memory pool allocation fails (not available memory) #940
  • Memory pool aborts on zero allocation request, returns NULL for < minimum #939
  • Error with TaskScheduler query of underlying memory pool #917
  • Profiling::*Callee static variables declared in header #863
  • calling *Space::name() causes compile error #862
  • bug in Profiling::deallocateData #860
  • task_depend test failing, CUDA 8.0 + Pascal + RDC #829
  • [develop branch] Standalone cmake issues #826
  • Kokkos CUDA fails to compile with OMPI_CXX and MPICH_CXX wrappers #776
  • Task Team reduction on Pascal #767
  • CUDA stack overflow with TaskDAG test #758
  • TeamVector test on Cuda #670
  • Clang 4.0 Cuda Build broken again #560

2.03.05 (2017-05-27)

Full Changelog

Implemented enhancements:

  • Harmonize Custom Reductions over nesting levels #802
  • Prevent users directly including KokkosCore_config.h #815
  • DualView aborts on concurrent host/device modify (in debug mode) #814
  • Abort when running on a NVIDIA CC5.0 or higher architecture with code compiled for CC < 5.0 #813
  • Add "name" function to ExecSpaces #806
  • Allow null Future in task spawn dependences #795
  • Add Unit Tests for Kokkos::complex #785
  • Add pow function for Kokkos::complex #784
  • Square root of a complex #729
  • Command line processing of --threads argument prevents users from having any commandline arguments starting with --threads #760
  • Protected deprecated API with appropriate macro #756
  • Allow task scheduler memory pool to be used by tasks #747
  • View bounds checking on host-side performance: constructing a std::string #723
  • Add check for AppleClang as compiler distinct from check for Clang. #705
  • Uninclude source files for specific configurations to prevent link warning. #701
  • Add --small option to snapshot script #697
  • CMake Standalone Support #674
  • CMake build unit test and install #808
  • CMake: Fix having kokkos as a subdirectory in a pure cmake project #629
  • Tribits macro assumes build directory is in top level source directory #654
  • Use bin/nvcc_wrapper, not config/nvcc_wrapper #562
  • Allow MemoryPool::allocate() to be called from multiple threads per warp. #487
  • Move OpenMP 4.5 OpenMPTarget backend into Develop #456
  • Testing on ARM testbed #288

Fixed bugs:

  • Fix label in OpenMP parallel_reduce verify_initialized #834
  • TeamScratch Level 1 on Cuda hangs #820
  • [bug] memory pool. #786
  • Some Reduction Tests fail on Intel 18 with aggressive vectorization on #774
  • Error copying dynamic view on copy of memory pool #773
  • CUDA stack overflow with TaskDAG test #758
  • ThreadVectorRange Customized Reduction Bug #739
  • set_scratch_size overflows #726
  • Get wrong results for compiler checks in Makefile on OS X. #706
  • Fix check if multiple host architectures enabled. #702
  • Threads Backend Does not Pass on Cray Compilers #609
  • Rare bug in memory pool where allocation can finish on superblock in empty state #452
  • LDFLAGS in core/unit_test/Makefile: potential "undefined reference" to pthread lib #148

2.03.00 (2017-04-25)

Full Changelog

Implemented enhancements:

  • UnorderedMap: make it accept Devices or MemorySpaces #711
  • sort to accept DynamicView and [begin,end) indices #691
  • ENABLE Macros should only be used via #ifdef or #if defined #675
  • Remove impl/Kokkos_Synchronic_* #666
  • Turning off IVDEP for Intel 14. #638
  • Using an installed Kokkos in a target application using CMake #633
  • Create Kokkos Bill of Materials #632
  • MDRangePolicy and tagged evaluators #547
  • Add PGI support #289

Fixed bugs:

  • Output from PerTeam fails #733
  • Cuda: architecture flag not added to link line #688
  • Getting large chunks of memory for a thread team in a universal way #664
  • Kokkos RNG normal() function hangs for small seed value #655
  • Kokkos Tests Errors on Shepard/HSW Builds #644

2.02.15 (2017-02-10)

Full Changelog

Implemented enhancements:

  • Containers: Adding block partitioning to StaticCrsGraph #625
  • Kokkos Make System can induce Errors on Cray Volta System #610
  • OpenMP: error out if KOKKOS_HAVE_OPENMP is defined but not _OPENMP #605
  • CMake: fix standalone build with tests #604
  • Change README (that GitHub shows when opening Kokkos project page) to tell users how to submit PRs #597
  • Add correctness testing for all operators of Atomic View #420
  • Allow assignment of Views with compatible memory spaces #290
  • Build only one version of Kokkos library for tests #213
  • Clean out old KOKKOS_HAVE_CXX11 macros clauses #156
  • Harmonize Macro names #150

Fixed bugs:

  • Cray and PGI: Kokkos_Parallel_Reduce #634
  • Kokkos Make System can induce Errors on Cray Volta System #610
  • Normal() function random number generator doesn't give the expected distribution #592

2.02.07 (2016-12-16)

Full Changelog

Implemented enhancements:

  • Add CMake option to enable Cuda Lambda support #589
  • Add CMake option to enable Cuda RDC support #588
  • Add Initial Intel Sky Lake Xeon-HPC Compiler Support to Kokkos Make System #584
  • Building Tutorial Examples #582
  • Internal way for using ThreadVectorRange without TeamHandle #574
  • Testing: Add testing for uvm and rdc #571
  • Profiling: Add Memory Tracing and Region Markers #557
  • nvcc_wrapper not installed with Kokkos built with CUDA through CMake #543
  • Improve DynRankView debug check #541
  • Benchmarks: Add Gather benchmark #536
  • Testing: add spot_check option to test_all_sandia #535
  • Deprecate Kokkos::Impl::VerifyExecutionCanAccessMemorySpace #527
  • Add AtomicAdd support for 64bit float for Pascal #522
  • Add Restrict and Aligned memory trait #517
  • Kokkos Tests are Not Run using Compiler Optimization #501
  • Add support for clang 3.7 w/ openmp backend #393
  • Provide an error throw class #79

Fixed bugs:

  • Cuda UVM Allocation test broken with UVM as default space #586
  • Bug (develop branch only): multiple tests are now failing when forcing uvm usage. #570
  • Error in generate_makefile.sh for Kokkos when Compiler is Empty String/Fails #568
  • XL 13.1.4 incorrect C++11 flag #553
  • Improve DynRankView debug check #541
  • Installing Library on MAC broken due to cp -u #539
  • Intel Nightly Testing with Debug enabled fails #534

2.02.01 (2016-11-01)

Full Changelog

Implemented enhancements:

  • Add Changelog generation to our process. #506

Fixed bugs:

  • Test scratch_request fails in Serial with Debug enabled #520
  • Bug In BoundsCheck for DynRankView #516

2.02.00 (2016-10-30)

Full Changelog

Implemented enhancements:

  • Add PowerPC assembly for grabbing clock register in memory pool #511
  • Add GCC 6.x support #508
  • Test install and build against installed library #498
  • Makefile.kokkos adds expt-extended-lambda to cuda build with clang #490
  • Add top-level makefile option to just test kokkos-core unit-test #485
  • Split and harmonize Object Files of Core UnitTests to increase build parallelism #484
  • LayoutLeft to LayoutLeft subview for 3D and 4D views #473
  • Add official Cuda 8.0 support #468
  • Allow C++1Z Flag for Class Lambda capture #465
  • Add Clang 4.0+ compilation of Cuda code #455
  • Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch #445
  • Add name of view to "View bounds error" #432
  • Move Sort Binning Operators into Kokkos namespace #421
  • TaskPolicy - generate error when attempt to use uninitialized #396
  • Import WithoutInitializing and AllowPadding into Kokkos namespace #325
  • TeamThreadRange requires begin, end to be the same type #305
  • CudaUVMSpace should track # allocations, due to CUDA limit on # UVM allocations #300
  • Remove old View and its infrastructure #259

Fixed bugs:

  • Bug in TestCuda_Other.cpp: most likely assembly inserted into Device code #515
  • Cuda Compute Capability check of GPU is outdated #509
  • multi_scratch test with hwloc and pthreads seg-faults. #504
  • generate_makefile.bash: "make install" is broken #503
  • make clean in Out of Source Build/Tests Does Not Work Correctly #502
  • Makefiles for test and examples have issues in Cuda when CXX is not explicitly specified #497
  • Dispatch lambda test directly inside GTEST macro doesn't work with nvcc #491
  • UnitTests with HWLOC enabled fail if run with mpirun bound to a single core #489
  • Failing Reducer Test on Mac with Pthreads #479
  • make test Dumps Error with Clang Not Found #471
  • OpenMP TeamPolicy member broadcast not using correct volatile shared variable #424
  • TaskPolicy - generate error when attempt to use uninitialized #396
  • New task policy implementation is pulling in old experimental code. #372
  • MemoryPool unit test hangs on Power8 with GCC 6.1.0 #298

2.01.10 (2016-09-27)

Full Changelog

Implemented enhancements:

  • Enable Profiling by default in Tribits build #438
  • parallel_reduce(0), parallel_scan(0) unit tests #436
  • data()==NULL after realloc with LayoutStride #351
  • Fix tutorials to track new Kokkos::View #323
  • Rename team policy set_scratch_size. #195

Fixed bugs:

  • Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch #445
  • Makefile spits syntax error #435
  • Kokkos::sort fails for view with all the same values #422
  • Generic Reducers: can't accept inline constructed reducer #404
  • data()==NULL after realloc with LayoutStride #351
  • const subview of const view with compile time dimensions on Cuda backend #310
  • Kokkos (in Trilinos) Causes Internal Compiler Error on CUDA 8.0.21-EA on POWER8 #307
  • Core Oversubscription Detection Broken? #159

2.01.06 (2016-09-02)

Full Changelog

Implemented enhancements:

  • Add "standard" reducers for lambda-supportable customized reduce #411
  • TaskPolicy - single thread back-end execution #390
  • Kokkos master clone tag #387
  • Query memory requirements from task policy #378
  • Output order of test_atomic.cpp is confusing #373
  • Missing testing for atomics #341
  • Feature request for Kokkos to provide Kokkos::atomic_fetch_max and atomic_fetch_min #336
  • TaskPolicy<Cuda> performance requires teams mapped to warps #218
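Entry #336 requests Kokkos::atomic_fetch_max and atomic_fetch_min. The sketch below is not the Kokkos implementation; it shows one common way to build such an operation, a compare-and-swap loop over std::atomic. `fetch_max_sketch` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper, not Kokkos::atomic_fetch_max.
// Atomically stores max(target, value) and returns the value held before.
template <class T>
T fetch_max_sketch(std::atomic<T>& target, T value) {
  T old = target.load();
  // Stop once the stored value is already >= value, or once our CAS wins.
  // On failure, compare_exchange_weak reloads `old` with the current value,
  // so the comparison is re-evaluated against what another thread wrote.
  while (old < value && !target.compare_exchange_weak(old, value)) {
  }
  return old;
}
```

A fetch_min variant only flips the comparison to `value < old`.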

Fixed bugs:

  • Reduce with Teams broken for custom initialize #407
  • Failing Kokkos build on Debian #402
  • Failing Tests on NVIDIA Pascal GPUs #398
  • Algorithms: fill_random assumes dimensions fit in unsigned int #389
  • Kokkos::subview with RandomAccess Memory Trait #385
  • Build warning (signed / unsigned comparison) in Cuda implementation #365
  • wrong results for a parallel_reduce with CUDA8 / Maxwell50 #352
  • Hierarchical parallelism - 3 level unit test #344
  • Can I allocate a View w/ both WithoutInitializing & AllowPadding? #324
  • subview View layout determination #309
  • Unit tests with Cuda - Maxwell #196

2.01.00 (2016-07-21)

Full Changelog

Implemented enhancements:

  • Edit ViewMapping so assigning Views with the same custom layout compiles when const casting #327
  • DynRankView: Performance improvement for operator() #321
  • Interoperability between static and dynamic rank views #295
  • subview member function? #280
  • Interoperability between View and DynRankView #245
  • (Trilinos) build warning in atomic_assign, with Kokkos::complex #177
  • View<>::shmem_size should runtime check for number of arguments equal to rank #176
  • Custom reduction join via lambda argument #99
  • DynRankView with 0 dimensions passed in at construction #293
  • Inject view_alloc and friends into Kokkos namespace #292
  • Less restrictive TeamPolicy reduction on Cuda #286
  • deep_copy using remap with source execution space #267
  • Suggestion: Enable opt-in L1 caching via nvcc-wrapper #261
  • More flexible create_mirror functions #260
  • Rename View::memory_span to View::required_allocation_size #256
  • Use of subviews and views with compile-time dimensions #237
  • Kokkos::Timer #234
  • Fence CudaUVMSpace allocations #230
  • View::operator() accept std::is_integral and std::is_enum #227
  • Allocating zero size View #216
  • Thread scalable memory pool #212
  • Add a way to disable memory leak output #194
  • Kokkos exec space init should init Kokkos profiling #192
  • Runtime rank wrapper for View #189
  • Profiling Interface #158
  • Fix View assignment (of managed to unmanaged) #153
  • Add unit test for assignment of managed View to unmanaged View #152
  • Check for oversubscription of threads with MPI in Kokkos::initialize #149
  • Dynamic resizeable 1dimensional view #143
  • Develop TaskPolicy for CUDA #142
  • New View : Test Compilation Downstream #138
  • New View Implementation #135
  • Add variant of subview that lets users add traits #134
  • NVCC-WRAPPER: Add --host-only flag #121
  • Address gtest issue with TriBITS Kokkos build outside of Trilinos #117
  • Make tests pass with -expt-extended-lambda on CUDA #108
  • Dynamic scheduling for parallel_for and parallel_reduce #106
  • Runtime or compile time error when reduce functor's join is not properly specified as const member function or with volatile arguments #105
  • Error out when the number of threads is modified after kokkos is initialized #104
  • Porting to POWER and remove assumption of X86 default #103
  • Dynamic scheduling option for RangePolicy #100
  • SharedMemory Support for Lambdas #81
  • Recommended TeamSize for Lambdas #80
  • Add Aggressive Vectorization Compilation mode #72
  • Dynamic scheduling team execution policy #53
  • UVM allocations in multi-GPU systems #50
  • Synchronic in Kokkos::Impl #44
  • index and dimension types in for loops #28
  • Subview assign of 1D Strided with stride 1 to LayoutLeft/Right #1

Fixed bugs:

  • misspelled variable name in Kokkos_Atomic_Fetch + missing unit tests #340
  • seg fault Kokkos::Impl::CudaInternal::print_configuration #338
  • Clang compiler error with named parallel_reduce, tags, and TeamPolicy. #335
  • Shared Memory Allocation Error at parallel_reduce #311
  • DynRankView: Fix resize and realloc #303
  • Scratch memory and dynamic scheduling #279
  • MemoryPool infinite loop when out of memory #312
  • Kokkos DynRankView changes break Sacado and Panzer #299
  • MemoryPool fails to compile on non-cuda non-x86 #297
  • Random Number Generator Fix #296
  • View template parameter ordering Bug #282
  • Serial task policy broken. #281
  • deep_copy with LayoutStride should not memcpy #262
  • DualView::need_sync should be a const method #248
  • Arbitrary-sized atomics on GPUs broken; loop forever #238
  • boolean reduction value_type changes answer #225
  • Custom init() function for parallel_reduce with array value_type #210
  • unit_test Makefile is Broken - Recursively Calls itself until Machine Apocalypse. #202
  • nvcc_wrapper Does Not Support -Xcompiler <compiler option> #198
  • Kokkos exec space init should init Kokkos profiling #192
  • Kokkos Threads Backend impl_shared_alloc Broken on Intel 16.1 (Shepard Haswell) #186
  • pthread back end hangs if used uninitialized #182
  • parallel_reduce of size 0, not calling init/join #175
  • Bug in Threads with OpenMP enabled #173
  • KokkosExp_SharedAlloc, m_team_work_index inaccessible #166
  • 128-bit CAS without Assembly Broken? #161
  • fatal error: Cuda/Kokkos_Cuda_abort.hpp: No such file or directory #157
  • Power8: Fix OpenMP backend #139
  • Data race in Kokkos OpenMP initialization #131
  • parallel_launch_local_memory and cuda 7.5 #125
  • Resize can fail with Cuda due to asynchronous dispatch #119
  • Qthread taskpolicy initialization bug. #92
  • Windows: sys/mman.h #89
  • Windows: atomic_fetch_sub() #88
  • Windows: snprintf #87
  • Parallel_Reduce with TeamPolicy and league size of 0 returns garbage #85
  • Throw with Cuda when using (2D) team_policy parallel_reduce with less than a warp size #76
  • Scalar views don't work with Kokkos::Atomic memory trait #69
  • Reduce the number of threads per team for Cuda #63
  • Named Kernels fail for reductions with CUDA #60
  • Kokkos View dimension_() for long returning unsigned int #20
  • atomic test hangs with LLVM #6
  • OpenMP Test should set omp_set_num_threads to 1 #4

Closed issues:

  • develop branch broken with CUDA 8 and --expt-extended-lambda #354
  • --arch=KNL with Intel 2016 build failure #349
  • Error building with Cuda when passing -DKOKKOS_CUDA_USE_LAMBDA to generate_makefile.bash #343
  • Can I safely use int indices in a 2-D View with capacity > 2B? #318
  • Kokkos::ViewAllocateWithoutInitializing is not working #317
  • Intel build on Mac OS X #277
  • deleted #271
  • Broken Mira build #268
  • 32-bit build #246
  • parallel_reduce with RDC crashes linker #232
  • build of Kokkos_Sparse_MV_impl_spmv_Serial.cpp.o fails if you use nvcc and have cuda disabled #209
  • Kokkos Serial execution space is not tested with TeamPolicy. #207
  • Unit test failure on Hansen KokkosCore_UnitTest_Cuda_MPI_1 #200
  • nvcc compiler warning: calling a __host__ function from a __host__ __device__ function is not allowed #180
  • Intel 15 build error with defaulted "move" operators #171
  • missing libkokkos.a during Trilinos 12.4.2 build, yet other libkokkos*.a libs are there #165
  • Tie atomic updates to execution space or even to thread team? (speculation) #144
  • New View: Compiletime/size Test #137
  • New View : Performance Test #136
  • Signed/unsigned comparison warning in CUDA parallel #130
  • Kokkos::complex: Need op* w/ std::complex & real #126
  • Use uintptr_t for casting pointers #110
  • Default thread mapping behavior between P and Q threads. #91
  • Windows: Atomic_Fetch_Exchange() return type #90
  • Synchronic unit test is way too long #84
  • nvcc_wrapper -> $(NVCC_WRAPPER) #42
  • Check compiler version and print helpful message #39
  • Kokkos shared memory on Cuda uses a lot of registers #31
  • Can not pass unit test cuda.space without a GT 720 #25
  • Makefile.kokkos lacks bounds checking option that CMake has #24
  • Kokkos can not complete unit tests with CUDA UVM enabled #23
  • Simplify teams + shared memory histogram example to remove vectorization #21
  • Kokkos needs to refer to ${PROJECT_NAME}_ENABLE_CXX11 not Trilinos_ENABLE_CXX11 #17
  • Kokkos Base Makefile adds AVX to KNC Build #16
  • MS Visual Studio 2013 Build Errors #9
  • subview(X, ALL(), j) for 2-D LayoutRight View X: should it view a column? #5

End_C++98 (2015-04-15)

* This Change Log was automatically generated by github_changelog_generator