
WeeklyTelcon_20220301


Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

Not recorded.

Not there today (kept here for easy cut-and-paste into future notes):

  • Geoffrey Paulsen (IBM)
  • Austen Lauria (IBM)
  • Jeff Squyres (Cisco)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS)
  • Christoph Niethammer (HLRS)
  • David Bernholdt (ORNL)
  • Hessam Mirsadeghi (UCX/nVidia)
  • Howard Pritchard (LANL)
  • Josh Hursey (IBM)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • Tomislav Janjusic (nVidia)
  • William Zhang (AWS)
  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (nVidia)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • Edgar Gabriel (UoH)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • George Bosilca (UTK)
  • Harumi Kuno (HPE)
  • Joseph Schuchart
  • Joshua Ladd (nVidia)
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Sam Gutierrez (LLNL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Xin Zhao (nVidia)

NEW Discussion

  • Ralph has retired and is stepping back from Open MPI; he will be 100% PMIx and PRRTE.

    • He's no longer on our Slack or the OMPI GitHub repo.
    • If you @rhc54 in a comment or pull request here, he won't see it; please go through the PMIx or PRRTE projects instead.
  • New Big Payload test in public test suite.

    • Runs collectives with counts up to close to MAX_INT.
    • We fixed as much as we found and pushed the fixes upstream.
    • Josh also used Jeff Hammond's Big Count test suite.
      • If there's a way to run the exact same tests but call through the Fortran bindings instead, that'd be cool.
      • Wouldn't expect this to turn up anything, since the bugs haven't been in the MPI interface; it's the underlying implementations that have been showing the bugs.
    • What do we want MPI_Count to be?
      • MPI_Count already exists in Open MPI. It's a size_t.
        • We have some MPIX_ routines
      • The size of MPI_Count is implementation dependent.
      • Could easily make the interface take size_t.
        • But then there's still the question of the maximum supported count (see the sketch after this list).
    • Some of the memory pressure does NOT scale.
      • Need a design discussion about what limits we have.
      • Might be a good topic for a face-to-face meeting.
    • We COULD raise an MPI exception if datatype size * count is greater than some internal threshold.
      • Might not be standards compliant, but better than crashing.
      • IBM
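
As background for the MPI_Count / maximum-count discussion above, here is a minimal user-level sketch (illustrative only, not from the meeting; the helper name send_big is made up) of the kind of overflow the Big Payload test exercises: the classic C bindings take an int count, so payloads near MAX_INT have to be wrapped in a derived datatype or go through a large-count interface.

```c
/* Illustrative sketch only: one common workaround for sending more than
 * INT_MAX elements through the classic int-count C bindings is to wrap the
 * payload in a contiguous derived datatype so the outer count stays small. */
#include <mpi.h>
#include <limits.h>
#include <stddef.h>

int send_big(const double *buf, size_t n, int dest, int tag, MPI_Comm comm)
{
    if (n <= (size_t) INT_MAX) {
        /* Small enough: use the classic binding directly. */
        return MPI_Send(buf, (int) n, MPI_DOUBLE, dest, tag, comm);
    }

    /* CHUNK is an arbitrary block size; n / CHUNK is assumed to fit in an int. */
    const size_t CHUNK = (size_t) 1 << 30;
    MPI_Datatype block;
    MPI_Type_contiguous((int) CHUNK, MPI_DOUBLE, &block);
    MPI_Type_commit(&block);

    size_t nblocks = n / CHUNK;
    size_t rem     = n % CHUNK;

    int rc = MPI_Send(buf, (int) nblocks, block, dest, tag, comm);
    if (rc == MPI_SUCCESS && rem > 0) {
        /* Send the leftover elements with the plain datatype. */
        rc = MPI_Send(buf + nblocks * CHUNK, (int) rem, MPI_DOUBLE,
                      dest, tag, comm);
    }
    MPI_Type_free(&block);
    return rc;
}
```

A large-count path based on MPI_Count (or size_t, per the discussion above) would let callers avoid this kind of chunking entirely.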

MPI ABI

  • Some pushback about ABI breaking in v5.0
    • Fortran users mostly care about mpi.h.
      • Removed some of the deprecated functions because no one wanted to keep them.
      • We need a stable ABI for ISV codes; otherwise no one will use v5.0 for years.
  • One of the issues is that the libmpi interface is quite large.
  • Three-part proposal:
    • Put deleted functions back in.
    • Put only the MPI C bindings in libmpi, and everything else in another library.
    • Split out Fortran.
      • The useMPI and useF08 (Fortran module) bindings are "unrecoverable" and need to be fixed.
  • ROMIO is an issue, because it calls the MPI API functions directly.
    • Some possible ways to fix this.
  • Shared libraries.
    • Could keep cached versions of pre-built Open MPI builds and run some tests against them.

MPI Forum related

  • A question came up for implementations.

    • Threads that are synchronized: multiple threads are sending on the same communicator + tag (see the sketch after this list).
      • Should ordering be across all threads (global ordering)?
        • Concern that this would require a single atomic / lock that every thread contends on.
      • Some are arguing the text should specify ordering per thread.
    • We don't actually need a global lock today (maybe; in practice we probably do).
    • From a hardware implementer's point of view, per-thread ordering would be very expensive.
    • The text is not very clear.
    • For single-threaded use, ordering is very well defined.
    • Could applications actually make use of this ordering or not?
  • Two HWLOC issues

    • PRRTE/PMIx hwloc issue: https://github.com/openpmix/prrte/pull/1185 and https://github.com/openpmix/openpmix/pull/2445
    • hwloc, when built with CUDA support, hard-links against the CUDA libraries.
      • This doesn't work in the common case where CUDA isn't installed on login nodes.
    • hwloc v2.5 through v2.7.0 puts strings that live in read-only memory into environ, and PRRTE segfaults when it tries to modify them.
    • PMIx and PRRTE have block-listed hwloc versions 2.5 through 2.7.0.
      • putenv() is segfaulting.
    • Discussions about minimizing mpirun/mpicc to link against only a subset of OPAL.
    • That makes things slightly better, but not really; we'd still have CUDA on some nodes and not on others.
    • Projected solution is to use hwloc plugins (dlopen the CUDA libs).
      • A while back, hwloc changed its default to NOT load components as plugins.
        • That was done for Open MPI's sake (there were some cyclic dependencies).
        • This is no longer an issue for us.
      • Now hwloc has reasonable defaults for loading some things as plugins (dlopened at runtime).
      • Usually customers install to local filesystems.
      • This gets us around the dependencies.
      • Whenever this is actually fixed, Jeff will write docs, and we can touch on the key points.
      • Regarding Josh's hwloc PR: if there are any other suggestions or modifications, please put them on the PR.
  • Resuming MTT development - will send an email.

    • Will send out a Doodle to schedule it.
    • Would like to have a monthly call.
    • Christoph Niethammer is interested.
      • Might need a new cleanup mechanism when rolling out lots of versions.
    • Find out who's using the Python client, and what problems they have.
    • The IU database plugin (what ends up getting data into the MTT viewer) has a number of issues.
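
To make the Forum ordering question above concrete, here is a minimal sketch (illustrative only, not from the Forum discussion) of the scenario: two threads on rank 0 send on the same communicator and tag, and the open question is whether rank 1 can rely on any ordering between the two threads' messages, or only on per-thread ordering.

```c
/* Two threads on rank 0 send on the same communicator and tag.
 * The contested question: is there a global order between the two
 * threads' messages, or only an order within each thread?
 * Requires MPI_THREAD_MULTIPLE. */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

static void *sender(void *arg)
{
    int val = *(int *) arg;
    /* Both threads use tag 7 on MPI_COMM_WORLD. */
    MPI_Send(&val, 1, MPI_INT, 1, 7, MPI_COMM_WORLD);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int a = 1, b = 2;
        pthread_t t1, t2;
        pthread_create(&t1, NULL, sender, &a);
        pthread_create(&t2, NULL, sender, &b);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
    } else if (rank == 1) {
        int v;
        for (int i = 0; i < 2; i++) {
            MPI_Recv(&v, 1, MPI_INT, 0, 7, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("received %d\n", v);  /* arrival order may differ run to run */
        }
    }
    MPI_Finalize();
    return 0;
}
```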

v4.0.x

  • Schedule: No schedule for v4.0.8 yet
  • Winding down v4.0.x; after v5.0.x ships, the v4.0.x series will stop.
  • Really only want small fixes for issues reported by users.
  • Otherwise, point users to v4.1.x release.

v4.1.x

  • Schedule: Shooting for v4.1.3 end of March/Q1.
    • RC in the next week or so.
  • No other update.

v5.0.x

  • Sessions: just one more item on the to-do list, TAG_UB for communicators created from a Session (see the sketch after this list).
    • Howard has a PR ready for this.
  • Each of these PRs has a link to the original Sessions PR to make them easier to find.
  • Will use PRRTE v2.1 for OMPI v5.0
    • Includes ompi_schizo for prrterun command-line parsing.
    • Changes for this will need to go into PRRTE first and then be picked up by updating our submodule pointer.
  • Docs: a tremendous amount of work has happened.
    • All automated transformations have happened for the man pages.
    • A bit of work left to do on the docs configury.
    • Once the configury is ready, we'd like to merge to master.
      • If you git clone and want to build the HTML docs and man pages, you will need Sphinx in your environment before running configure.
      • A git clone without Sphinx just gets no man pages or HTML docs, and won't be able to make dist.
      • The requirement for pandoc is going away.
      • Tarballs will already include the HTML docs and man pages, so Sphinx won't be needed there.
  • libompitrace: removing for v5.0, as no one seems to be using or maintaining it.
  • Docs question
    • When we publish docs for v5.0
      • There will probably be one version that tracks master/HEAD.
      • We can also make docs available from a TAG (for release).
        • Do we want a v5.0.x?
        • If we're going to have latest/master or whatever, we should NOT make many links to that.
  • NO UPDATE THIS WEEK: PR 9996 - bug with the current CUDA common code.
    • A user fixed a problem in smcuda.
      • Ask tommi to
    • Writing the API, and will try to port code over.
    • Ported this code to util to try to fix the bug, but there has been an ask to do a bit more.
    • An accelerator framework has been proposed.
    • Need to figure out how we move forward here. Moving it into util is not the right place.
      • Don't need more things with gnarly dependencies in util.
        • This makes the mpicc problem worse.
    • William will take a stab at it, if it's not a lot of work.
      • Four to six functions that the datatype engine calls.
        • An "is this accelerator memory?" check.
        • Data-movement functions.
        • Need to figure out the memory hooks stuff.
      • libfabric has this abstraction, so we could follow a similar model.
    • No new code, just moving things around.
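
For reference on the Sessions/TAG_UB to-do item at the top of this section, here is a minimal sketch using the standard MPI-4 Sessions calls (this is not Howard's PR, just the user-visible pattern it needs to support): create a communicator from a Session and query MPI_TAG_UB on it.

```c
/* Create a communicator via the Sessions model and query MPI_TAG_UB on it. */
#include <mpi.h>
#include <stdio.h>

int main(void)
{
    MPI_Session session;
    MPI_Group group;
    MPI_Comm comm;
    int *tag_ub, flag;

    MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);
    /* "mpi://WORLD" is the standard-defined process set name. */
    MPI_Group_from_session_pset(session, "mpi://WORLD", &group);
    MPI_Comm_create_from_group(group, "example.tag_ub", MPI_INFO_NULL,
                               MPI_ERRORS_RETURN, &comm);

    /* The attribute the pending to-do item is about. */
    MPI_Comm_get_attr(comm, MPI_TAG_UB, &tag_ub, &flag);
    if (flag) {
        printf("MPI_TAG_UB = %d\n", *tag_ub);
    }

    MPI_Comm_disconnect(&comm);
    MPI_Group_free(&group);
    MPI_Session_finalize(&session);
    return 0;
}
```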

Master

  • No new Gnus

MTT

  • A fix is pending to work around the IBM XL MTT build failure (compiler abort).
  • Issue 9919 - the thinking is that this common component should still be built.
    • Common components get built when it's likely there is a dependency.
    • Common components self-select whether they should be built or not.