WeeklyTelcon_20220712

Geoffrey Paulsen edited this page Jul 12, 2022 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • David Bernhold (ORNL)
  • Edgar Gabriel (UoH)
  • Geoffrey Paulsen (IBM)
  • Howard Pritchard (LANL)
  • Joseph Schuchart
  • Josh Fisher (Cornelis Networks)
  • Josh Hursey (IBM)
  • Todd Kordenbrock (Sandia)
  • Tommy Janjusic (nVidia)

not there today (I keep this for easy cut-n-paste for future notes)

  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (nVidia)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Brian Barrett (AWS)
  • Charles Shereda (LLNL)
  • Christoph Niethammer (HLRS)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • George Bosilca (UTK)
  • Harumi Kuno (HPE)
  • Hessam Mirsadeghi (UCX/nVidia)
  • Jeff Squyres (Cisco)
  • Joshua Ladd (nVidia)
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Sam Gutierrez (LLNL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Thomas Naughton (ORNL)
  • William Zhang (AWS)
  • Xin Zhao (nVidia)

v4.1.x

  • v4.1.5

    • Schedule: targeting ~6 months (Nov 1)
    • No driver on schedule yet.
  • Looks like 5 PRs are reviewed and ready to be merged (https://github.com/open-mpi/ompi/pulls?q=is%3Apr+is%3Aopen+base%3Av4.1.x+)

    • William would like PR #10513 merged.
  • Brendan summarized 3 CVEs in libevent 2.0.x series, that are fixed in newer versions of libevent.

    • Two of the CVEs are almost certainly NOT built when Open MPI builds libevent.
    • Third CVE is in a function supporting search (haven't traced whether it is actually compiled in for Open MPI).
    • Fixed in libevent 2.1.8
    • libevent fixes will be needed on v4.1.x and v4.0.x
    • Brian mentioned on RM Slack channel that upgrading libevent version would be best path
    • Brendan will post PR to show how little change is needed...
      • Balance between risk and reward.
  • Last week's discussion: would not want to try upgrading libevent to v2.1.8 for v4.1.x

    • The fixes are small and self-contained (bounds/range-checking fixes)
    • Cherry-picking them is much less risky than upgrading
    • Decided Brendan will PR the small cherry-picks needed to v4.1.x and try v4.0.x.

v5.0.x

  • Schedule:
  • Keeping an eye on PRRTE branch (Ralph will create a new v3 branch soon).
    • When this happens, we'll swap out the submodule pointer.
    • The OMPI main PRRTE submodule pointer already points to PRRTE main
  • Sessions PR fix merged to v5.0.x

PRRTE

Main branch

  • Joseph is trying to do HAN / Adapt runs.
    • main branch.
    • x86 with InfiniBand.
    • Sometimes the startup hangs (really annoying).
    • Don't know whether it's a PRRTE hang.
    • Tried to run with GDB/DDT, but of course then it doesn't hang.
    • He'll try to see if it's before MPI_Init in PRRTE or something else.

Accelerator framework

  • sm_cuda component was moved into framework.
    • nVidia has some issues building, and will try again to test

Atomics PRs

  • 10492 and the linked 10487
    • Geoff STILL needs to test on ppc64le, will get to this, THIS week.
  • Joseph will post some additional info in the ticket

MTT

Administrative tasks

Face-to-face
