
WeeklyTelcon_20210323

Geoffrey Paulsen edited this page Mar 24, 2021 · 1 revision

Open MPI Weekly Telecon ---

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS)
  • Edgar Gabriel (UH)
  • Geoffrey Paulsen (IBM)
  • Harumi Kuno (HPE)
  • Hessam Mirsadeghi (UCX/nVidia)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Josh Hursey (IBM)
  • Michael Heinz (Cornelis Networks)
  • Naughton III, Thomas (ORNL)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Todd Kordenbrock (Sandia)
  • Tomislav Janjusic
  • William Zhang (AWS)
  • Marisa Roman (Cornelis Networks)
  • Matthew Dosanjh (Sandia)

not there today (I keep this for easy cut-n-paste for future notes)

  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (nVidia/Mellanox)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • Christoph Niethammer (HLRS)
  • David Bernholdt (ORNL)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • George Bosilca (UTK)
  • Joseph Schuchart
  • Joshua Ladd (nVidia/Mellanox)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Xin Zhao (nVidia/Mellanox)

New Items

v5.0.x branch now building nightly tarballs.

  • If you don't have zlib, this affects launching and memory consumption
    • Tools will spit out a warning that you don't have compression
    • We need to write up something for Packagers as well.
  • Brian will document this (really should build with zlib) in a README-packagers.md
    • Hope that packagers will package these things externally.
  • NEWS bullets for zlib as well.
    • Geoff will do this.
  • Please update your CI to run MTT on v5.0.x PRs, and on v5.0.x based PRs
  • Please cherry-pick your bugfix PRs to v5.0.x after they are accepted to master

Reformatting master

  • Doing formatting on master and v5.0.x seems reasonable
  • But reformatting v4.0.x and v4.1.x seems too risky.
  • clang-format instructions are in the format file.
  • He also ran clang-tidy, and we don't have directions for that yet.
  • Requires clang-format v10 or newer (clang-format is versioned separately from the clang compiler)
    • Nathan will try to make it compatible with older v8
    • Geoff ping Nathan to request the v5.0.x version of opal PR.
  • clang-format is separate from the compiler toolchain
  • Will we require developers to run this?
    • Not requiring a github build to require it.
    • Will have a CI test that will check it.
    • Not in a path where every CI will have to have it installed.
  • Do we want to hold off on MORE before v5.0.0 ships? (or 6 months after?)
  • Should be rerun as a non-cherry-pick. Might be easy to lose
    • But the two branches are close.
  • Run it on master, try to PR to v5.0.x, and
  • Nathan can only run certain sections of the code-base with the systems he has.
    • Strongly encourage everyone test their sections.
    • PSM2 - doesn't even build in our CI, so someone should build/test this.
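
A CI check along the lines discussed above could begin by confirming that the installed clang-format meets the v10 minimum. The following is a minimal sketch, not part of the Open MPI tree: the helper names are hypothetical, and it assumes the tool prints a line containing `version X.Y.Z` (as `clang-format --version` typically does, sometimes with a distribution prefix).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Minimum clang-format major version named in the notes above. */
#define MIN_CLANG_FORMAT_MAJOR 10

/*
 * Hypothetical helper: extract the major version from a line such as
 * "clang-format version 10.0.0". Returns -1 if no version is found.
 * Assumes the text "version " directly precedes the version number.
 */
int clang_format_major(const char *version_line)
{
    static const char marker[] = "version ";
    const char *p;
    int major = -1;

    assert(version_line != NULL);

    p = strstr(version_line, marker);
    if (p == NULL) {
        return -1;
    }
    /* Read only the leading integer; ".0.0" and any suffix are ignored. */
    if (sscanf(p + strlen(marker), "%d", &major) != 1) {
        return -1;
    }
    return major;
}

/* Returns 1 if the reported version satisfies the minimum, else 0. */
int clang_format_is_new_enough(const char *version_line)
{
    return clang_format_major(version_line) >= MIN_CLANG_FORMAT_MAJOR;
}
```

A CI job would feed the first line of the tool's version output to `clang_format_is_new_enough` and skip or fail the formatting check when it returns 0. Whether to skip or fail is a policy choice the notes leave open.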

PR 8551 - New coding style enforced via clang-format

  • Needs a squash; a commit is missing its sign-off.
    • Austen will ping Nathan.
    • want in v5.0.x also

Autoconf 2.70

  • This is working just fine at the moment, except for ROMIO.
    • ROMIO is throwing tons of warnings. But okay.
    • Would need to fix it upstream.
  • PMIx/PRRTE is updated.
  • Perhaps now for 3rdParties, configure with --silence-obsolencense flag.
  • Does someone want to ping Rob about it?
    • Jeff will

Testing

  • Intercomm Merge tests are timing out.
    • MTT master on HLS timeouts

32bit? Do we want to continue to support this?

  • Failure in PRRTE on v5.0.x, will be resolved in tonight's nightly tarball.
  • https://github.com/open-mpi/ompi/issues/8566
  • Using an actual 32bit gcc - Compile fail
  • Nathan thinks he might be able to write a compare-and-swap
  • v5.0 - good time to drop 32bit.
    • Jeff will send note to packaging, and see if they will care.
    • Debian is okay, they will just use MPICH
    • OSC/RDMA assumed everything was 64bit, but once we changed
  • On 32bit, if we could use C11 atomics with locks, it might be allowed.
    • So perhaps this would be a path.
    • Is C11 available on older 32bit systems?
    • gcc 6.0+ it should work fine.
  • Nobody has a strong opinion.
    • Pride issue, but it's also time and money
    • Right now the only thing breaking it is Nathan's one-sided code.
    • Let's ask Nathan what he thinks, and if he has time to fix it.

4.0.x

  • Shoot for a next RC of v4.0.6 on March 31st
  • blocking on UCX issues (see New topics above)
    • George, will get to it soon.
  • Too many Open Issues (50)
    • Geoff and Howard will go over v4.0.x issues, and try to close or address many of them.
      • May need to label some as wont_fix, and then close
  • Check status of ROMIO from MPICH vs in v4.1 vs v4.0.x

v4.1.x

  • Same boat, waiting for George's datatype fix.
  • A new v4.1 RC was built last week
  • Most of ROMIO fixes have gone into MPICH
    • 8371 - might be close
  • Intercomm Merge issue
    • may have gone away after PRRTE update on master
    • Investigating
  • blocking on UCX issues (see New topics above)
    • George, will get to soon.

Open-MPI v5.0

  • What do we do with the mpirun Manpage?
    • Didn't want OMPI requiring Sphinx, but if PRRTE and PMIx in same tar
  • Ralph almost has singleton comm spawn working
    • Single node without the mpirun process
  • Static MCA components default still on track for v5.0.x

Video Presentation BOF

  • ECP Community days ( March 30-April 1st )
    • Need SLIDES by close of business FRIDAY (not Saturday)
    • Each day 90 minute time slots.
    • Tuesday March 30th from 1-2:30pm (US Eastern)
      • LIVE
      • Invited some people to speak. They will be our main community speakers.
      • Anyone on OMPI community can send slides to Jeff and George
      • Due Friday March 26th
    • PMIx Wed 31st 11 - 12:30 (US Eastern)
    • Need to ensure no more MPIR, SLURM PMI1/2,

Longer Term discussions

Doc update

  • PR 8329 - convert README, HACKING, and possibly Manpages to restructured text.
    • Uses https://www.sphinx-doc.org/en/master/ (Python tool, can pip install)
    • Intent this is for v5.0
      • mpirun / prrterun - we had quite a bit of details in orte, but are updating as much as possible.
    • Ralph has asked about this for PMIx/PRRTE since this is turning out to work
  • No update - 3/16
    • Could be independent of PMIx and PRRTE.
    • PMIx and PRRTE want to follow suit, and not require both pandoc and Sphinx.

ROMIO Long Term (12/8)

  • OLD
  • What do we want to do about ROMIO in general.
    • OMPIO is the default everywhere.
    • Gilles is saying the changes we made are integration changes.
      • There have been some OMPI specific changes put into ROMIO, meaning upstream maintainers refuse to help us with it.
      • We may be able to work with upstream to make a clear API between the two.
    • As a 3rd party package, should we move it up to the 3rd party packaging area, to be clear that we shouldn't make changes to this area?
  • Need to look at this treematch thing. Upstream package that is now inside of Open-MPI.
  • Might want a CI bot to watch a set of files, and flag PRs that violate principles like this.
  • Putting new tests there
  • ULFM have some tests added there.
  • Need folks to add to MTT
  • Should have some new Sessions tests