
WeeklyTelcon_20210525

Geoffrey Paulsen edited this page May 25, 2021 · 1 revision

Open MPI Weekly Telecon ---

Attendees (on Web-ex)

  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS)
  • Edgar Gabriel (UH)
  • Geoffrey Paulsen (IBM)
  • Harumi Kuno (HPE)
  • Hessam Mirsadeghi (NVIDIA)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Joseph Schuchart (HLRS)
  • Josh Hursey (IBM)
  • Marisa Roman (Cornelis Networks)
  • Matthew Dosanjh (Sandia)
  • Sam Gutierrez (LANL)
  • Todd Kordenbrock (Sandia)
  • William Zhang (AWS)

not there today (I keep this for easy cut-n-paste for future notes)

  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (NVIDIA)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • Christoph Niethammer (HLRS)
  • David Bernholdt (ORNL)
  • Erik Zeiske (HPE)
  • Geoffroy Vallee (ARM)
  • George Bosilca (UTK)
  • Joshua Ladd (NVIDIA)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Naughton III, Thomas (ORNL)
  • Noah Evans (Sandia)
  • Raghu Raja (secret startup)
  • Ralph Castain (Intel)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Tomislav Janjusic (NVIDIA)
  • Xin Zhao (NVIDIA)

New Items

How is MPI_Sessions implementation going?

  • Pretty close to syncing up with master.
  • Plan to open a draft PR in the next few weeks (before June).
  • Confidence is high on this implementation.
    • High confidence that it will not break MPI_Init / MPI_Finalize.
    • Intern writing more tests on sessions.
    • Code prototype should be okay, but PMIx is more up in the air.
  • Implements everything in current MPI 4.0 proposal.
    • Need to rip out some extra stuff in the branch.
    • Text is in standard
    • Will have a new minimum PMIx version.
      • YES. Might need PMIx v5
    • Will need to have some conversations about what to do with older PMIxes.
  • Also includes making frameworks refcounted

v4.0.x

  • 8982 - MPI_Sendrecv btl/tcp doesn't block an RC
  • We'll do one more RC, and then get a final v4.0.6 out.
  • Where are we on pack/unpack with long and long double?
    • only external32
    • This worked before, but not sure
  • 8918 - pack/unpack with external32
  • 8818 - checking if
  • Brian thinks Issue 8990 would also apply to v4.0.x
    • With --with-libevent=/usr (as Debian packaging does), we add a -L/usr to the wrapper output and put all of the -L flags used to find dependencies before the -L for libmpi.so, and if there is an ompi in /usr/lib as well,

Company asking to use Open-MPI logo

Issue 8763 - Fortran non-blocking Handles

  • Fortran bindings for MPI_Ialltoallw and neighbor version
    • We create an array, pass it into the C binding, and then free it before the C side has completed.
    • Another issue discovered:
      • 4 byte C ints, and 8 byte Fortran integers.
      • This has been in code "forever"
    • Howard thought we discussed this before and that we added a configury check to disallow this.
    • George thinks he has an elegant solution.
      • Won't be as invasive as originally thought.
    • Marked as critical, but not a blocker as it's been in the product forever.
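The lifetime problem above can be sketched without MPI. This is an illustration only, not necessarily the fix George has in mind: it assumes 8-byte Fortran integers and 4-byte C ints, and `fint_t`, `request_t`, `start_nonblocking`, and `complete_request` are hypothetical names. The point is that the converted array is owned by the request and released at completion, not when the binding returns.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: Fortran INTEGER is 8 bytes while C int is 4 bytes. */
typedef int64_t fint_t;

/* Hypothetical stand-in for a non-blocking request; not an Open MPI type. */
typedef struct {
    const int *counts; /* read by the C side until completion */
    int n;
    int *owned_tmp;    /* conversion buffer, released only at completion */
} request_t;

/* Start the operation: convert the Fortran array to C ints and hand
 * ownership of the temporary to the request.
 * Returns 0 on success, -1 on allocation failure or overflow. */
static int start_nonblocking(const fint_t *f, int n, request_t *req)
{
    assert(f != NULL && req != NULL && n > 0);
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        if (f[i] > INT_MAX || f[i] < INT_MIN) {
            free(tmp);          /* value does not fit in a C int */
            return -1;
        }
        tmp[i] = (int)f[i];
    }
    req->counts = tmp;
    req->n = n;
    req->owned_tmp = tmp;       /* deliberately NOT freed here */
    return 0;
}

/* Complete the operation: the C side reads the counts (summed here as a
 * stand-in for real work), and only then is the temporary released. */
static long complete_request(request_t *req)
{
    long sum = 0;
    for (int i = 0; i < req->n; i++) {
        sum += req->counts[i];
    }
    free(req->owned_tmp);
    req->owned_tmp = NULL;
    req->counts = NULL;
    req->n = 0;
    return sum;
}
```

Freeing the temporary inside `start_nonblocking` would reproduce the bug described in the notes, because the request would still hold a pointer to released memory.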

v4.1.x

  • No driver to rush, so now just in bugfix phase.

v5.0.x

  • Need some configury changes in before we RC.
  • Issue 8850, 8990 and more
  • Brian will file 3-ish issues
    • One is configure pmix
  • Unscheduled RC
  • Dynamic Windows fix in for UCX.
  • Any update on debugger support?
  • Need some documentation that Open MPI v5.0 supports PMIx based debuggers, and that if
  • MPIR Shim - pushed up fixes, and enabled CI.
    • Could add it to some more CI, to ensure that PMIx doesn't break
    • IBM is working on some CI testing with MPIR (typically very brittle)
    • Need some guidance on PMIx version.
    • Right now, probably not a big deal, but perhaps in 2 years, when we have 3 release branches each with a different PMIx version, it might make sense to do Open MPI CI testing.
      • Shouldn't be too much work to do.
  • UCC coll component is being updated to be the default when UCX is selected (PR 8969).
    • Intent is that this will eventually replace hcoll.

Reformatting

Master

  • PR 8998 - MPIPy -
    • In shift to PRRTE, --oversubscribe is NOT being handled. If you have more procs than slots on a node, internal oversubscribe var is not yet being set.

MTT

  • Mellanox hasn't been reporting for a while. Tommi will follow up.
  • Austen filed a couple of issues from MTT.

PMIx

  • No discussion

PRRTE v2.0

  • No update

Longer Term discussions

  • No discussion.