Releases: UO-OACISS/apex
Release v2.7.0 with new features and bug fixes
This release contains a lot of new functionality, as well as refactored untied task support. Here's a list of the new features included in this release:
- Updated Kokkos autotuning support, with new search strategies genetic_search, nelder_mead, and automatic. The complete list is exhaustive, random, simulated_annealing, genetic_search, nelder_mead, and automatic.
- Nested Kokkos autotuning support allows for complicated search strategies when using nested Kokkos search contexts. For example, choose between two execution policies while autotuning the internals of each.
- NVTX pass-through support allows APEX timers to be fed to NVIDIA performance tools if desired.
- dladdr support for symbol name resolution when binutils are not available.
- Robust tracking of pthreads without needing to wrap all pthread functions.
- Added support for the TaskStubs library, a "PerfStubs"-like library for instrumenting task-based runtimes like Iris, PaRSEC, and StarPU. This support includes new events for scheduling and data transfer as well as execution. Task arguments are included in the Google Trace Events output.
- Added support for complicated MxN parent-child task dependencies, not just 1xN. This provides complete support for the above runtimes.
- All runtimes are treated as untied tasks, even standard callpath timer stacks. This allows for complicated task dependency graphs combining asynchronous tasks and callpath timer stacks.
- Enabled measurement of HIP, CUDA, and SYCL in the same executable for Iris support. OpenCL (Intel) can also be supported if needed.
- Added OpenCL support for Iris on Intel GPUs/FPGAs.
- Added Python support with updated PerfStubs.
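Nested autotuning only finds the best combination if an outer context (for example, the choice between a team policy and an MDRange policy) keeps exploring until the searches nested inside it have finished, as the nested-context commit in the list below explains. The following C++ sketch models that convergence rule under stated assumptions: `SearchContext`, `locally_converged` and `children` are hypothetical names for illustration, not APEX's actual classes.

```cpp
#include <memory>
#include <vector>

// Hypothetical model of an autotuning search context (not APEX's real type).
struct SearchContext {
    bool locally_converged = false;  // this context's own search has finished
    std::vector<std::shared_ptr<SearchContext>> children;  // nested (inner) contexts

    // An outer context only reports convergence once its own search AND every
    // nested search below it have converged, so a two-choice outer context
    // cannot settle before both branches have been fully explored.
    bool converged() const {
        if (!locally_converged) return false;
        for (const auto& child : children) {
            if (!child->converged()) return false;
        }
        return true;
    }
};
```

Making convergence a recursive property keeps the rule local to each context: no global coordinator is needed, and deeper nesting works the same way as a single level.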
Complete list of commits in this release:
view commit • Updating kokkos to 4.2.01
view commit • Re-enabling kokkos allocation tracking. When enabled, APEX will keep track of allocations through Kokkos and ensure they are all freed before exit.
view commit • Trying to clean up memory allocation tracking. When tracking allocations on the host, everything seems to be working correctly, but on occasion we see allocation amounts changing on the stack in gdb on Frontier. We can't explain it yet, but some fixes are included in this commit.
view commit • Updating roofline stats to use new CSV output
view commit • Adding NVTX pass-through support. As requested for the pika project, this adds the ability to pass APEX timers through to NVTX. This is not compatible with APEX CUDA support, since that support implements the NVTX API. However, it should work with an application linked with APEX if the APEX_ENABLE_NVTX_HANDOFF environment variable is set.
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Fixing errors in unit tests on apple
view commit • Removing debug print message
view commit • Fixing builds with and without kokkos. If APEX is configured without Kokkos it should build now. And APEX will build examples correctly with kokkos, and will build Kokkos only if the Kokkos examples/tests are requested. Otherwise it just uses the headers.
view commit • Fixing .dylib/.so for apple configurations
view commit • Forgot to update an enum name for CUDA
view commit • Adding OpenMP library when linking against Kokkos unit test with OpenMP back end.
view commit • A few updates to help with periodic sampling of ROCM metrics on Frontier.
view commit • Added env var for specifying libnvToolsExt.so. When using APEX as an NVTX pass-through with the APEX_ENABLE_NVTX_HANDOFF variable, NVIDIA doesn't automatically load the library with NVTX support. Adding APEX_NVTX_LIBRARY, which defaults to "libnvToolsExt.so" but can be overridden with a full path; alternatively, you can add the library's directory to LD_LIBRARY_PATH.
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Changing from backtrace_symbols to dladdr to resolve addresses when BFD is not available.
view commit • Removing sleep during shutdown if the background thread is already done.
view commit • Removing sleeps from unit tests, exit should be clean now even for very short programs.
view commit • Changing NVTX pass-through to use different colors for up to 25 different timers.
view commit • Adding first version of genetic search algorithm based on random
view commit • Fixing osx build errors
view commit • First genetic search implementation.
view commit • Debugging genetic search
view commit • Debugging autotuning of kokkos kernels - problems with genetic search and nested contexts should now be working.
view commit • Fixing cached variable usage. When using cached tunings from Kokkos, the variables might not be declared or provided in the same order, and depending on the execution may not be given at all. So...ignore the IDs other than for mapping from the old context to the old variables, or the new contexts to the new variables. We now match the variable names between context hashes, which should be correct in the future.
view commit • Enhancing autotuning search with nested contexts. When doing autotuning with Kokkos kernels, it is possible to have nested search contexts. When that happens, we want to make sure that we explore all branches of all possible scenarios. For example, the "idk_jmm" test in the https://github.com/khuck/apex-kokkos-tuning repository has one context to choose between a team policy and a mdrange policy, and each of those has tunable parameters. Because the outer context has only two choices, it can converge very quickly unless we prevent it from converging until the search has completed for both team and mdrange policy implementations. That's what this change does - if we have nested search contexts, the outer context(s) won't converge until the inner context searches converge. I also reduced the max_iterations limit for random, genetic_search and simulated_annealing to 500 from 1000.
view commit • Fixing output for exhaustive tuning
view commit • Fixing output for tuning strategies to be more helpful.
view commit • debugging simulated annealing with one variable of only a few values.
view commit • Minor changes to fix thread tracing. When threads are created before main (looking at you, CUDA), or are not terminated before main terminates (looking at you, Iris), we need to make sure that the registered new threads have a thread ID. To do that, the top level timers for those threads should have synchronous start/stop events; otherwise they get a single "unified" event at the end with the thread ID belonging to the main thread (because it is the one cleaning up at the end).
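The cached-tuning fix above switches from matching tuning variables by ID to matching them by name, because IDs and declaration order can change between runs. A minimal C++ sketch of that idea follows; `Variable` and `apply_cached_values` are hypothetical names, and real Kokkos tuning values can be integers, doubles or strings, whereas this sketch assumes doubles only.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical descriptor of a tuning variable as declared in the current run.
struct Variable {
    std::size_t id;    // ID assigned in this run; may differ from the cached run
    std::string name;  // stable name used for matching across runs
};

// Map cached values (keyed by variable name) onto this run's variable IDs.
// Variables that are missing from the cache are skipped rather than guessed.
std::map<std::size_t, double> apply_cached_values(
        const std::vector<Variable>& current,
        const std::map<std::string, double>& cached_by_name) {
    std::map<std::size_t, double> by_id;
    for (const auto& var : current) {
        auto it = cached_by_name.find(var.name);
        if (it != cached_by_name.end()) {
            by_id[var.id] = it->second;
        }
    }
    return by_id;
}
```

Skipping unmatched variables mirrors the note that some variables "may not be given at all" in a particular execution.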
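The NVTX pass-through commits describe how the NVTX library is located: APEX_NVTX_LIBRARY overrides the default "libnvToolsExt.so", and the value may be a full path or a bare name resolved through LD_LIBRARY_PATH. The C++ sketch below shows only that selection rule; the function name `nvtx_library_name` is hypothetical, the handoff itself (enabled by APEX_ENABLE_NVTX_HANDOFF) and the actual dynamic loading are not shown, and this is not APEX's source.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: choose the library name to hand to the dynamic loader.
// A non-empty APEX_NVTX_LIBRARY wins; otherwise fall back to the default
// "libnvToolsExt.so", which the loader resolves via LD_LIBRARY_PATH.
std::string nvtx_library_name() {
    const char* override_name = std::getenv("APEX_NVTX_LIBRARY");
    if (override_name != nullptr && override_name[0] != '\0') {
        return override_name;
    }
    return "libnvToolsExt.so";
}
```

For example, exporting APEX_NVTX_LIBRARY=/path/to/libnvToolsExt.so (an illustrative path) makes the helper return that full path instead of the default name.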
Release v2.6.5 with bug fixes
view commit • Cleaning up kokkos tuning verbosity, shortening simulated annealing minimum iterations.
view commit • Adding common tree construction at end of MPI execution. This will speed up analysis with python and the apex-treesummary.py script.
view commit • Rocm 5.7.0 is missing librocprofiler64.so, so don't expect it to be there. But don't fail if it isn't.
view commit • Adding explicit rocm_smi memory check
view commit • Adding extra calls to query HIP memory periodically, and to query SMI memory at alloc/free points.
view commit • Make sure defines are defined correctly.
view commit • Don't include rocm_smi in ompt code.
view commit • Cleaning up some race condition crashes during short tests
view commit • Removing unused variable
view commit • Fixing bug where tasktree header isn't written to csv file for non-MPI runs
view commit • Reporting tcmalloc preload error when detected. We could probably automatically detect that tcmalloc is a dependency and preload it automatically...
view commit • Adding concurrency options to apex_exec. Also allowing all options to have either underscores or dashes.
view commit • Disabling APEX_BUILD_OMPT, this addresses PR #177 with the correct fix. There's no need to build the LLVM runtime to support the non-compliant GCC compiler.
view commit • Removing OMPT build setting from CI
view commit • Adding Kokkos unit test, but only enable it if APEX builds Kokkos as a submodule. We can't guarantee that the installed Kokkos will only provide host support. This allows us to test Kokkos support on CI.
view commit • Fixing tree post-processing script to assume that the task tree is common across all ranks, this speeds up processing a bunch.
view commit • Adding warning that OMPT got re-initialized
view commit • Fixing startup issues when launched with gdb, or just in general
view commit • Enabled headless batch GDB processing to catch crashes with mpirun.
view commit • Fixing bug in computing receive bytes for non-root ranks and getting a segv because the pointer is null
view commit • Fixing a bug in OMPT support where openmp regions happen after finalize - the kokkos runtime uses openmp regions in destructors.
view commit • Fixing symbol collision between ompt and hip support
view commit • Resetting the static tree node count after each dump, needed for merging common tree of tasktree data.
view commit • Only have a threadpool the size of the allowable cores, not all of them
view commit • Fixing shutdown bug when finalize is called without dump on frontier
view commit • Updating readme
view commit • Updating documentation
view commit • Updating CI build options
view commit • adding documentation link
view commit • adding documentation link
view commit • Debugging Kokkos autotuning of occupancy hints on CUDA. Lots of silly mistakes.
view commit • Forgot to add source files for test
view commit • Forgot another test source file
view commit • Force Kokkos fencing enabled when doing autotuning.
view commit • Flushing trace at dump.
view commit • Fixing bug in trace buffer flush at exit, and reducing minimum iterations for simulated annealing
view commit • Updating documentation for 2.6.4 release
view commit • Updating documentation for 2.6.4 release
view commit • Changing default of Kokkos tuning to false
view commit • Updating binutils support to work with modern compilers
view commit • Fixing binutils hash
view commit • Lots of fixes for tracking memory leaks. Found a few issues with memory tracking on Frontier. These fixes will allow us to delay memory tracking until after apex::dump() has been called some number of times (configurable), and fixes some symbol resolution. This also fixes some trace output for when we crash before exit.
view commit • Updating version number for 2.6.5 patch release
view commit • Merge branch 'develop'
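One commit in this release lets apex_exec accept options spelled with either underscores or dashes. A simple way to get that behavior is to canonicalize each option name before comparing it. The C++ sketch below illustrates the idea only; apex_exec's own parsing is not shown, and both `canonical_option` and the option names in the example are hypothetical.

```cpp
#include <algorithm>
#include <string>

// Hypothetical canonicalizer: treat '_' and '-' as interchangeable in the
// option name, e.g. "--apex:example_option" == "--apex:example-option".
// Only the name is rewritten; any "=value" suffix is left untouched.
std::string canonical_option(const std::string& arg) {
    const std::string::size_type eq = arg.find('=');
    std::string name = arg.substr(0, eq);  // whole string when there is no '='
    const std::string rest = (eq == std::string::npos) ? "" : arg.substr(eq);
    std::replace(name.begin(), name.end(), '_', '-');
    return name + rest;
}
```

Leaving the value untouched matters because values such as file paths may legitimately contain underscores.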
Release v2.6.4 with bug fixes
New release for v2.6.4, providing bug fixes.
view commit • Removing Kokkos header and making kokkos a build dependency, either externally if available or as a git submodule.
view commit • Adding Kokkos submodule support for HPX integrated build.
view commit • Cleaning up Kokkos submodule support for HPX integrated build.
view commit • Fixing compiler warning by checking return value.
view commit • Fixing CMake capitalization for Kokkos
view commit • Fix deprecated CMake cache variables
view commit • Cleaning up kokkos tuning verbosity, shortening simulated annealing minimum iterations.
view commit • Adding common tree construction at end of MPI execution. This will speed up analysis with python and the apex-treesummary.py script.
view commit • Rocm 5.7.0 is missing librocprofiler64.so, so don't expect it to be there. But don't fail if it isn't.
view commit • Adding explicit rocm_smi memory check
view commit • Adding extra calls to query HIP memory periodically, and to query SMI memory at alloc/free points.
view commit • Make sure defines are defined correctly.
view commit • Don't include rocm_smi in ompt code.
view commit • Cleaning up some race condition crashes during short tests
view commit • Removing unused variable
view commit • Fixing bug where tasktree header isn't written to csv file for non-MPI runs
view commit • Reporting tcmalloc preload error when detected. We could probably automatically detect that tcmalloc is a dependency and preload it automatically...
view commit • Adding concurrency options to apex_exec. Also allowing all options to have either underscores or dashes.
view commit • Disabling APEX_BUILD_OMPT, this addresses PR #177 with the correct fix. There's no need to build the LLVM runtime to support the non-compliant GCC compiler.
view commit • Removing OMPT build setting from CI
view commit • Merge branch 'develop' into fix-cmake-deprecated-variables
view commit • Merge pull request #178 from Pansysk75/fix-cmake-deprecated-variables
view commit • Fixing tree post-processing script to assume that the task tree is common across all ranks, this speeds up processing a bunch.
view commit • Adding warning that OMPT got re-initialized
view commit • Fixing startup issues when launched with gdb, or just in general
view commit • Enabled headless batch GDB processing to catch crashes with mpirun.
view commit • Fixing bug in computing receive bytes for non-root ranks and getting a segv because the pointer is null
view commit • Fixing a bug in OMPT support where openmp regions happen after finalize - the kokkos runtime uses openmp regions in destructors.
view commit • Fixing symbol collision between ompt and hip support
view commit • Resetting the static tree node count after each dump, needed for merging common tree of tasktree data.
view commit • Only have a threadpool the size of the allowable cores, not all of them
view commit • Fixing shutdown bug when finalize is called without dump on frontier
view commit • Updating readme
view commit • Updating documentation
view commit • Updating CI build options
view commit • Adding Kokkos unit test, but only enable it if APEX builds Kokkos as a submodule. We can't guarantee that the installed Kokkos will only provide host support. This allows us to test Kokkos support on CI.
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • adding documentation link
view commit • adding documentation link
view commit • Debugging Kokkos autotuning of occupancy hints on CUDA. Lots of silly mistakes.
view commit • Forgot to add source files for test
view commit • Forgot another test source file
view commit • Force Kokkos fencing enabled when doing autotuning.
view commit • Flushing trace at dump.
view commit • Fixing bug in trace buffer flush at exit, and reducing minimum iterations for simulated annealing
view commit • Updating documentation for 2.6.4 release
view commit • Updating documentation for 2.6.4 release
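The thread pool commit in this release sizes the pool to the cores the process is allowed to use rather than all cores on the node, which matters when a batch scheduler restricts CPU affinity. A C++ sketch of counting allowed cores follows, assuming Linux's `sched_getaffinity` and `CPU_COUNT` (with a fallback to `std::thread::hardware_concurrency` elsewhere); the function name `allowed_core_count` is hypothetical and this is not necessarily how APEX computes it.

```cpp
#include <thread>
#ifdef __linux__
#include <sched.h>
#endif

// Hypothetical helper: number of cores this process may run on.
// On Linux this honors the CPU affinity mask; elsewhere it falls back to
// the total hardware concurrency reported by the standard library.
unsigned allowed_core_count() {
#ifdef __linux__
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        const int n = CPU_COUNT(&mask);
        if (n > 0) return static_cast<unsigned>(n);
    }
#endif
    const unsigned n = std::thread::hardware_concurrency();
    return n > 0 ? n : 1;  // hardware_concurrency() may return 0 if unknown
}
```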
Release v2.6.3 with bug fixes
APEX Version 2.6.2
Several changes to the APEX code structure, including separating HIP, CUDA, OMPT, LEVEL0, PERFETTO and MPI support into separate libraries. This allows us to have one configure/build on a system and dynamically add support for different features at runtime. For example, we can have a configuration that provides support for CUDA but doesn't require it. Other new features include:
- Level0 (SYCL/OneAPI support)
- Improved OMPT support for supported compilers
- Phiprof support (VLASIATOR)
- StarPU support
- Perfetto native tracing support (doesn't yet support flow events, JSON trace output still recommended)
- Merged CSV output for both flat profile and task tree data
- Python scripts (apex-summary.py, apex-treesummary.py) to post-process merged CSV output
- Many bug fixes
view commit • Initial Level 0 support. Profiling works, but tracing timestamps are quite bogus.
view commit • Updating OneAPI support, still examining timestamps
view commit • Fixing APPLE pedantic build issues.
view commit • Adding native perfetto tracing support, very rudimentary support.
view commit • Working with correct timestamps now
view commit • Adding program name to program track in perfetto
view commit • Adding async events
view commit • Adding async events
view commit • Adding --apex:pftrace option to apex_exec for Perfetto and deprecating the Google Trace Events option
view commit • Debugging async events with HIP. Apparently the nvhpc 22.11 compiler crashes when compiling the perfetto.cc file.
view commit • Debugging perfetto support with nvhpc
view commit • Debugging hpx build
view commit • Adding perfetto license and readme
view commit • Debugging perfetto trace with hpx - make sure the trace is closed on exit
view commit • Making Perfetto native support optional, with -DAPEX_WITH_PERFETTO flag
view commit • Lots of fixes for starpu and pthreads. All detached threads are now correctly handled at exit, and starpu counters are back.
view commit • Debugging missing stop/starts from threads that are spawned by blas, cuda, starpu, etc.
view commit • Fixing pthread_create wrapper support for library threads launched before main. Also changing the order in which timers are written to taskgraph.0.dot, because when the APEX MAIN, apex_preload_main, and APEX pthread wrapper timers aren't written first, they get lost in the graph. We want the graph anchored at those timers.
view commit • Timer throttling is now a runtime option, disabled by default.
view commit • Phiprof support
view commit • Adding timer throttle options to apex_exec
view commit • Renamed config file, removed a compilation message
view commit • Merge branch 'develop' into oneapi
view commit • Fixing links to HPX web sites
view commit • Debugging Level0 support and adding inclusive time for task lifetimes
view commit • Finally fixed the intel timestamps for tracing.
view commit • Making perfetto off by default.
view commit • Debugging on Crusher
view commit • Replacing broken CSV output with reduced CSV output from all ranks.
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Fully reduced CSV and tasktree output now. The tasktree output can be post-processed to generate any necessary Hatchet, Graphviz or Trilinos output if desired.
view commit • Adding general metrics for tasktree nodes!
view commit • Removing debug message
view commit • Adding ability to check whether MPI can accommodate the memory requested for big transfers. Use the APEX_VALIDATE_MPI_MEMORY_USAGE variable to enable it.
view commit • More advanced statistics for MPI bytes in the tasktree data, including min, max, mode, median, stddev
view commit • Removing BW computation during run, what's the point of adding overhead?
view commit • Merge branch 'develop' of github.com:khuck/xpress-apex into develop
view commit • Fixing non-portable code changes
view commit • Fixing portability bug without HIP/ROCm
view commit • Fixing sorting of csv tasktree output, I think
view commit • Adding reduction support for HPX+MPI. Still need support for other parcels.
view commit • Adding inclusive time to tasktree output.
view commit • Fixing units in tasktree output
view commit • Forgot to write out yields to profile csv
view commit • Fixing sorting errors in tasktree csv output
view commit • Adding script to post-process apex_profiles.csv and provide same output as what we get at the end of the run, but with greater flexibility.
view commit • Merge branch 'develop' of github.com:khuck/xpress-apex into develop
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
...
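The tasktree commits in this release add min, max, mode, median and standard deviation for MPI message sizes. The C++ sketch below computes those statistics over a list of byte counts. `ByteStats` and `summarize` are hypothetical names, and the choices of population standard deviation and "smallest value wins" for mode ties are assumptions, not documented APEX behavior.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical summary of message sizes (in bytes) observed at one call site.
struct ByteStats {
    double min, max, mean, median, mode, stddev;
};

// Assumes `bytes` is non-empty. Uses the population standard deviation.
ByteStats summarize(std::vector<double> bytes) {
    std::sort(bytes.begin(), bytes.end());
    const std::size_t n = bytes.size();
    ByteStats s{};
    s.min = bytes.front();
    s.max = bytes.back();
    double sum = 0.0;
    for (double b : bytes) sum += b;
    s.mean = sum / n;
    s.median = (n % 2 == 1) ? bytes[n / 2]
                            : 0.5 * (bytes[n / 2 - 1] + bytes[n / 2]);
    // Mode: most frequent value. std::map iterates keys in ascending order,
    // so the smallest value reaching the highest count wins ties.
    std::map<double, std::size_t> counts;
    for (double b : bytes) ++counts[b];
    std::size_t best = 0;
    for (const auto& kv : counts) {
        if (kv.second > best) { best = kv.second; s.mode = kv.first; }
    }
    double sq = 0.0;
    for (double b : bytes) sq += (b - s.mean) * (b - s.mean);
    s.stddev = std::sqrt(sq / n);
    return s;
}
```

For the sizes {2, 4, 4, 4, 5, 5, 7, 9}, this gives a mean of 5, a median of 4.5, a mode of 4 and a standard deviation of 2.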
Version 2.6.0
Lots of bug fixes and new features.
Highlights:
- CUDA support updated through nvhpc 22.9 / cuda 11.9
- HIP support updated through rocm 5.2.0
- OMPT target offload support tested with amdclang 5.2.0, intel, nvhpc compilers
The change log:
view commit • Working on OMPT update with target offload on MI250X with AMD 5.1.0 compilers
view commit • Minor changes and bug fixes from testing on crusher. Kokkos doesn't include the device ID any more (assume 0). When accumulated is 0.0, assume the timer hasn't been stopped.
view commit • Merge branch 'ompt_amd_target_update' of git.nic.uoregon.edu:/gitroot/xpress-apex into ompt_amd_target_update
view commit • Working OpenMP target support!
view commit • Fixing bugs in OMPT offload tracing, task dependencies, flow events. All seems working?
view commit • Working OpenMP Target Offload with tracing to GTrace and OTF2
view commit • Adding sync time measurement to collectives.
view commit • Debugging OMPT target offload with latest Intel OneAPI
view commit • Merge branch 'ompt_amd_target_update' into develop
view commit • Merge branch 'develop' of github.com:khuck/xpress-apex into develop
view commit • Adding comment for AMD implementation of OMPT
view commit • Check to make sure a profiler is returned when measuring bandwidth for MPI. This can happen when timers are requested after APEX has shut down.
view commit • Updating JSON tasktree output to support Hatchet analysis
view commit • Fixing tasktree output for Hatchet parsing.
view commit • executable requires -lstdc++ when building with icx
view commit • Found bug in destructor logic. Default thread id to INT_MAX.
view commit • Bug when configuring OMPT but not OTF2
view commit • Fixing OpenMP event annotation, including address whether the code was compiled with debug or not. Demangling now happens in apex_bfd.cpp, before line number or address is added to the symbol name.
view commit • Merge branch 'oneapi' into develop
view commit • Removing debug statement
view commit • Removing debug statement
view commit • More threaded statistic support
view commit • Testing with ROCm 5.2
view commit • Debugging latest PAPI hip/rocm counter support on crusher, fixing bugs with initializing the event set with the rocm component. With 5.2, all hardware counters should work.
view commit • Fixing csv output to be consistent across counters and timers.
view commit • Cleaning up compiler warning message
view commit • Adding support for counter groups, causes re-execution of the application and profile data is written to different directories for each pass.
view commit • Cleaning up papi counter groups, now also tracing the metrics to trace events.
view commit • Splitting GPU and CPU memory allocation leak tracking.
view commit • Debugging memory tracking for gpus and cpus
view commit • Configuring apex to install apex_exec when configured as part of HPX
view commit • Debugging support on polaris.
view commit • Adding apex_environment_help utility to list all APEX environment variables.
view commit • Cleaning up address resolution and providing a backtrace_symbols backup implementation when BFD not used
view commit • Adding new utility to dump all environment variables
view commit • Adding banner at program start to confirm things are working.
view commit • Adding Fortran MPI Wrapper support. Added wrappers for all the C MPI functions that we care about. Also debugged the "delayed start" feature; it will need some more testing with CUDA and OneAPI.
view commit • When started disabled, make sure CUDA returns immediately for callbacks and activity
view commit • Cleaning up build with hipcc & mpi & hpx
view commit • Debugging HPX+Kokkos
view commit • Kokkos renamed their tool library environment variable.
view commit • Trying to fix the Rocprofiler library rpath problem.
view commit • Only include apex_mpi.cpp if the MPI parcel port is used.
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Updating namespace for HPX bind call
view commit • Debugging HPX exit and replacing __func__ with pretty version where available.
view commit • Cleaning up output
view commit • Fixing bug where csv output doesn't happen without screen output
view commit • Updating Cray power counter support on crusher
view commit • HPX doesn't like the MPI wrapper on some systems. Disable it for now.
...
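One commit in this release splits GPU and CPU memory allocation leak tracking. The C++ sketch below keeps a separate table per memory space so that leaks can be reported for host and device independently. `AllocationTracker` and `Space` are hypothetical names; the sketch is not thread-safe, and a real tool would also need locking and hooks into the allocation APIs.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical memory spaces tracked separately.
enum class Space { Host, Device };

class AllocationTracker {
public:
    void on_alloc(Space space, const void* ptr, std::size_t bytes) {
        table(space)[ptr] = bytes;
    }
    // Returns false if the pointer was never recorded in that space
    // (e.g. allocated before tracking started).
    bool on_free(Space space, const void* ptr) {
        return table(space).erase(ptr) > 0;
    }
    // Bytes still allocated in a space; non-zero at exit suggests a leak.
    std::size_t outstanding_bytes(Space space) const {
        std::size_t total = 0;
        for (const auto& entry : table(space)) total += entry.second;
        return total;
    }

private:
    using Table = std::unordered_map<const void*, std::size_t>;
    Table& table(Space s) { return s == Space::Host ? host_ : device_; }
    const Table& table(Space s) const { return s == Space::Host ? host_ : device_; }
    Table host_, device_;
};
```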
v2.5.1
view commit • The HPX configuration was missing ROCPROFILER
view commit • Adding ROC profiler sources for HPX when HIP support is requested
view commit • Cleaning up HIP enabled builds with GCC as compiler
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Fixing apparently broken Range Begin/End and Push/Pop. CUPTI hasn't been making callbacks for range end or pop, so we will wrap those functions instead of processing callbacks.
view commit • Starting fix on range end events
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Updating version number
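The range fix in this release wraps the push/pop functions directly instead of relying on CUPTI callbacks. The wrapped functions are presumably NVTX range calls such as `nvtxRangePushA` and `nvtxRangePop`, but that is an assumption. The C++ sketch below shows only the bookkeeping such wrappers might do: a per-thread stack where a push starts a timer and the matching pop stops the innermost one. `on_range_push`, `on_range_pop` and `range_stack` are hypothetical names.

```cpp
#include <chrono>
#include <stack>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical per-thread stack of open ranges (name, start time).
thread_local std::stack<std::pair<std::string, Clock::time_point>> range_stack;

// Would be called from the wrapper around the range-push function.
void on_range_push(const std::string& name) {
    range_stack.emplace(name, Clock::now());
}

// Would be called from the wrapper around the range-pop function: stops the
// innermost open range and returns its name and duration. An unmatched pop
// returns an empty name and a zero duration instead of crashing.
std::pair<std::string, Clock::duration> on_range_pop() {
    if (range_stack.empty()) return {std::string(), Clock::duration::zero()};
    auto top = range_stack.top();
    range_stack.pop();
    return {top.first, Clock::now() - top.second};
}
```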
APEX Version 2.5.0
view commit • Trying to integrate OpenMP target offload for AMD and NVIDIA
view commit • Adding clock synchronization code, but not enabled yet.
view commit • Working MPI clock sync for OTF2. Still need to test and fully implement the HPX version. In this implementation, all ranks determine a latency between rank 0 and themselves, then compute a clock drift between the two. That drift is added to the archive file creation time as an absolute reference that all ranks can use.
view commit • Updating HPX version of clock sync before test with HPX build
view commit • Updated clock sync for OTF2 trace output for HPX. This doesn't actually do a clock sync, but uses the OTF2 archive creation time as the "baseline" timestamp for all localities. Unfortunately, I need to figure out a safe time to make lco requests from one rank to another - during the apex::initialize phase, HPX isn't quite ready to handle requests yet.
view commit • Fixing initial values for kokkos tuning search
view commit • Minor fix to prevent crash when resolving long Kokkos kernel names
view commit • Fixing initial value for interval set parameters. When tuning Kokkos kernels, we need to use the initial value of the variable, which is (supposed to be) given to us by Kokkos.
view commit • Cleaning up kokkos autotuning code
view commit • First commit for transferred/renamed repo
view commit • Allow for unlimited custom policies. Needed for Kokkos autotuning; allow as many tuning sessions as the system requests. Also reduce the output to the screen.
view commit • Fixing stat structure for OSX
view commit • Adding delayed start and max measurement. Adding the APEX_START_DELAY_SECONDS and APEX_MAX_DURATION_SECONDS options to specify a delayed start of measurement and a maximum length to measure. The max length does not include the delay, so a delay of 1 second and a max length of 2 seconds would record seconds 1-3 of an execution.
view commit • hipcc doesn't define the _OPENMP macro, so don't expect it.
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Cleaning up build as we port to Spock. Need to clean up the installation location of dependent packages to allow for re-building without deleting the install directory.
view commit • updating documentation links
view commit • Updating the readme with hip and pthread info, as well as reference links
view commit • adding implementation of `kokkosp_request_tool_settings` so we can disable fencing when profiling
view commit • Updating Kokkos tuning to support multiple sessions at the same time, of unlimited number. Also tweaked the simulated annealing search to converge in a reasonable time frame.
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Adding HSA events to the hip tracer
view commit • Merge branch 'develop' of git.nic.uoregon.edu:/gitroot/xpress-apex into develop
view commit • Merge branch 'master' into develop
view commit • Adding OpenMP Target examples
view commit • Pushing for David to test
view commit • Working Kokkos tuning with caching
view commit • Fixing compiler bug that slipped through
view commit • Fixing code path for synchronous timer processing. Even when processing profiles synchronously, we were launching the background thread and signalling it when a profiler was created.
view commit • Don't use shared pointers when doing synchronous processing
view commit • lowering overhead in measurement. Don't enable hip or cuda by default, and provide hip command line options for apex_exec.
view commit • Fixing compiler error that shouldn't have been committed?
view commit • Fixing all dependency builds
view commit • Removing google profiler from apex_exec
view commit • Adding active harmony patch
view commit • Cleaning up config for CUDA and GCC and openmp offload
view commit • Fixing namespace for apex options
view commit • Adding output for gtrace output
view commit • cd to the output directory before post-processing
view commit • Cmake changes for subdir builds
view commit • Fixing kokkos tuning cache writing logic, and stopped generating overhead once the tuning is done.
view commit • Merge branch 'feature/allow-subdir-builds' of github.com:DavidPoliakoff/apex into develop
view commit • Fixing CMake variables to allow for subproject builds
view commit • Fixing compiler error with gcc 5.4
view commit • Need a minimum value for 'quarter' which is used in the calculation of the next candidate neighbor to test. If quarter is less than 2, we never get movement.
view commit • Adding ROCm SMI reader
view commit • Fixing bfd install step mistake
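The delayed-start commit gives a worked example: with APEX_START_DELAY_SECONDS=1 and APEX_MAX_DURATION_SECONDS=2, seconds 1 through 3 are recorded, because the maximum length excludes the delay. The C++ sketch below encodes that window. `measuring_at` is a hypothetical name, and treating a maximum of 0 as "no limit" is an assumption for illustration, not documented APEX behavior.

```cpp
// Hypothetical check: is measurement active `elapsed` seconds after start?
// delay        corresponds to APEX_START_DELAY_SECONDS
// max_duration corresponds to APEX_MAX_DURATION_SECONDS (assumed: <= 0 means no limit)
bool measuring_at(double elapsed, double delay, double max_duration) {
    if (elapsed < delay) return false;      // still inside the start delay
    if (max_duration <= 0.0) return true;   // no upper bound configured
    return elapsed < delay + max_duration;  // window is [delay, delay + max)
}
```

With a delay of 1 and a maximum of 2, the function is false at 0.5 seconds, true at 1.0 and 2.9, and false again at 3.0.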
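The OTF2 clock sync commit has each rank measure its latency to rank 0 and derive a clock offset. A common way to do this is a round-trip (Cristian-style) estimate: assume the one-way latency is half the round trip, and compare rank 0's reported time with the local receive time. The C++ sketch below shows that arithmetic; it is a standard technique used for illustration, not necessarily APEX's exact formula, the MPI messaging is omitted, and `OffsetEstimate`, `estimate_offset` and `to_root_time` are hypothetical names.

```cpp
// Round-trip clock offset estimate. All times are in seconds.
struct OffsetEstimate {
    double latency;  // estimated one-way latency to rank 0
    double offset;   // estimated (rank 0 clock) - (local clock)
};

// local_send: local time when the request was sent
// root_time:  rank 0's clock reading carried in its reply
// local_recv: local time when the reply arrived
OffsetEstimate estimate_offset(double local_send, double root_time, double local_recv) {
    OffsetEstimate e;
    e.latency = (local_recv - local_send) / 2.0;
    // When the reply arrives, rank 0's clock reads about root_time + latency.
    e.offset = (root_time + e.latency) - local_recv;
    return e;
}

// Translate a local timestamp into rank 0's time base.
double to_root_time(double local_time, const OffsetEstimate& e) {
    return local_time + e.offset;
}
```

For instance, sending at 10.0, receiving rank 0's reading of 110.5, and getting the reply at 11.0 gives a latency of 0.5 and an offset of 100.0, so local time 20.0 maps to 120.0 on rank 0's clock.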
Patch release v2.4.1
Emergency patch to fix HPX collectives API change in next HPX release.