Meeting 2016 02 Minutes v2.1 Talk

Tomislav Janjusic edited this page Jan 6, 2024 · 1 revision

Geoff's notes for the v2.x series

NOTE: Howard also took notes in the original wiki page. Consider his "better" than mine.

Estimated timeline

  • June '15 master branched to 2.0.0
  • July '16 2.0.0 is released (yes 1 year and 1 month later!)
  • September '16 2.0.1 release
  • October '16 2.1.0 (date-based). Drivers: PMIx 2.0.0, OSHMEM, TCP improvements, usNIC MT.
  • December '16 2.1.0 release

Must-have Features for 2.1.x

  • Thread safety (MPI_THREAD_MULTIPLE) support
    • need to verify which BTLs are thread safe (by testing, not just by assertion) (DONE)
    • need more testing (non-blocking collectives, one-sided, MPI I/O, etc.)
    • need to document what is not thread safe (DONE)
    • performance improvements when using MPI_THREAD_MULTIPLE (i.e., TEST/WAIT improvements) - may wait for a publication before committing
  • MPI-3.1 Compliance
    • Ticket #349 (MPI_Aint_add) - 2.0.X candidate (DONE)
    • Ticket #369 (same_disp_info key for MPI_Win_create) - 2.0.X candidate, maybe (DONE)
    • Ticket #273 (non-blocking collective I/O, non-trivial). This depends on moving the libnbc core out of the libnbc component. (DONE)
    • Ticket #404 (MPI_Aint_diff) - 2.0.X candidate (DONE)
    • Ticket #357 (MPI_Initialized, MPI_Query_thread, MPI_Thread_is_main) always thread safe (probably just verify with a test to see this is true now for OMPI thread models) (DONE)
  • MPI-3 Errata Items
    • Jeff needs to check
    • Jeff needs to check
  • Coverity cleanup (IN PROGRESS, down to ~260) (never ending)
  • Scalable startup work (smarter add_proc in the OB1 PML), needs more work
    • Sparse groups
    • Additional PMIx features (issue 394)
    • PMIx 2.0.0 (just the shared memory for on node PMIx communication)
  • ROMIO refresh - need to be using a released ROMIO package (DONE)
  • Fix Java bindings garbage collection issues (DONE)
  • Hwloc 1.11.3 final (DONE)
  • CUDA extension (to add MPIX_CUDA_IS_AWESOME to mpi.h) and MPI_T Cvar for run-time query of whether CUDA is supported in this OMPI
  • Add MPI-3 features to Java bindings (DONE)

Must-have Features for the 2.1 Series

  • PMIx 2.0 integration
  • pending PRs (Nathan's free list work) (DONE)
  • Multi-rail performance in OB1? What happened? (WONT FIX)
  • TCP latency went up and bandwidth (rendezvous) went way down. What happened? Maybe in 2.x series sometime...
    • AWS - is watching it.
  • Support for thread-based asynchronous progress for BTLs (anyone working on this now?)
  • Improved story on out-of-the-box performance, particularly for collectives. Ideally some kind of auto-tune type of mechanism. (otopo project)
    • Edgar - still working on it
  • mpool rewrite (PR open)
  • OB1 has CUDA enhancements, with potential future NVIDIA collectives enhancements.

Desirable-to-have Features for 2.1

  • Rationalized configuration for Cray XE/XC (DONE)
  • platform file for using OFI MTL on Cray XC/KNL
  • usNIC stuff
    • conversion to libfabric (DONE)
    • usNIC BTL thread safety PR #1233
  • simplified verbs BTL for iWarp? (NOT GOING TO HAPPEN)
  • Mellanox stuff
    • HCOLL datatypes
  • BTL/OpenIB across different subnets PR #1043
  • OpenSHMEM 1.3 compliance PR #1041 and PR #1042
  • OMPI commands (mpirun, orte_info, etc.): deprecate all single-dash options except for the sacrosanct ones (-np, etc.). Print a stderr warning for all the deprecated options.
    • Note that MPI-3.1 8.8 mpiexec mentions: -soft, -host, -arch, -wdir, -path, -file
  • Score-P integration (won't hit 2.0.0, but will get in 2.x)
  • libfabric support (Intel MTL, Cisco BTL, others) (DONE in 1.10)
  • Memkind support both for MPI_Alloc_mem and for Open MPI internals
    • No current owner at Intel for Memkind
  • Nathan Hjelm's BTL 3.0 changes (DONE)
  • MPI-4 features (maybe as extensions?)
    • endpoints proposal
    • ULFM (as of June 2015, Ralph/George are coordinating so that ORTE can give ULFM what it needs)
    • MPI_T extensions
  • Better interop with OpenMP 4 placement - esp. for nested OMP parallelism
  • OFI MTL support MPI_THREAD_MULTIPLE - may already be thread safe
  • OFI OSC component (probably will not happen)
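
A rough sketch of the single-dash deprecation item above (warn on stderr, keep accepting the option). This is illustration only, not the mpirun implementation: the function names are hypothetical, the allow-list is an assumption built from "-np" plus the MPI-3.1 8.8 mpiexec options listed in the notes, and leaving single-letter options such as "-x" alone is also an assumption. It does not try to handle option values that themselves begin with a dash.

```python
import sys

# Assumption: options that keep their single-dash spelling. "-np" comes from
# the notes; the rest are the mpiexec options the notes cite from MPI-3.1 8.8.
SACROSANCT = {"-np", "-soft", "-host", "-arch", "-wdir", "-path", "-file"}


def is_deprecated_single_dash(arg):
    """Return True for a multi-letter single-dash option outside the allow-list."""
    if not arg.startswith("-") or arg.startswith("--"):
        return False  # not an option, or already in double-dash form
    if len(arg) <= 2:
        return False  # bare "-" or a single-letter option (assumed to stay valid)
    return arg not in SACROSANCT


def warn_deprecated(argv, stream=sys.stderr):
    """Write one warning per deprecated option to stream; return the flagged options."""
    flagged = [arg for arg in argv if is_deprecated_single_dash(arg)]
    for arg in flagged:
        stream.write(
            "WARNING: single-dash option '%s' is deprecated; use '-%s' instead\n"
            % (arg, arg)
        )
    return flagged
```

For example, calling warn_deprecated(["-np", "4", "-bind-to", "core"]) would flag only "-bind-to" (used here purely as a sample string) and suggest "--bind-to", while the command line itself is left unchanged.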

Features already in master that made it into v2.0

  • Switch to using OMPI I/O as default
  • Switch to vader as default for shared memory BTL
  • PSM2 MTL

Terminating support

  • Cray XT legacy items (ESS alps component, etc.) (DONE - although new ess/alps for Cray XE/XC)
  • MX BTL (DONE)
  • What other BTLs to delete? SCIF?
  • Clean up README (DONE)
  • Delete coll hierarch component
  • coll ML disabled
  • Delete VampirTrace interface
  • Deprecate mpif77/mpif90: print a stderr warning

Testing

  • What do we want to test?
    • More thread safety tests - non-blocking collectives, etc.
    • OMPI I/O tests, refresh from HDF group? (DONE)