
WeeklyTelcon_20230822

Geoffrey Paulsen edited this page Aug 22, 2023 · 1 revision

Geoff Paulsen [IBM], Jeff Squyres [Cisco], Austen Lauria [IBM], David Bernholdt, Edgar Gabriel [AMD], Joseph Schuart, Luke Robison, Matthew Dosanjh, Pedram Alidazeh, Quincey Koziol [AWS], Thomas Huber, Thomas Naughton, Todd Kordenbrock, Tommy Janjusic, Wenduo Wang [AWS]

v4.1

  • Did an RC two weeks ago. Just a couple of bugfixes.
    • Waiting for a minor Fortran fix.
    • Once that is merged, cut another RC quickly.

v5.0

  • mpirun manpage - 90% done

    • Need others to review text.
    • Not going to be perfect for v5.0.0 (okay)
    • Reviews can be a once-over, a full audit, or anything in between.
  • Actively working on Issue #11733

  • No new issues from v5.0 RM.

  • Docs Pair of PRs ready for v5.0

  • Got the Base Params stabilized (#11532)

    • Take a look at old
  • Any progress on Accelerator framework delayed initialization?

    • The issue with delayed initialization: if CUDA wasn't initialized before MPI_Init, components can't get some info from CUDA, and they don't try again later. So the order of initialization between components (MPI_Init) and CUDA is important.
      • In v4.1, users had to call Cuda_Init before MPI_Init().
        • Delayed initialization was supposed to fix this, but doesn't.
      • One idea for a fix is to have MPI initialize CUDA.
        • Always? Yes, even if the user doesn't use CUDA.
    • Not necessarily a blocker for v5.0.0; could be addressed in a future fix/patch.
    • Can we hook cuInit?
    • Can we/MPI do this ourselves? Current main does this. Why is this not possible?
      • Some concern that the CUDA context MPI creates might differ from the one the user would create?
      • But does MPI really use this?
      • Not really, but we set up the default stream in a bad way, and it is then not functional for the user.
      • Can we make it so we don't set up the default context? (Open question.)
    • PR #11617 done on main?
      • This doesn't fix the issue, since we still don't have a context.
  • The behavior currently in v5.0.x should be the same as v4.1, since users will need to call Cuda_Init before MPI_Init.

    • Not a blocker right now.
    • Will document in release notes.
  • Is PR #11689 a blocker for v5.0.0?

    • No, not a blocker for v5.0.0. Maybe v5.0.1.
    • Do folks need a phone call?
      • No. Right now everyone is aligned. There is discussion about follow-up GPU work; might want a call on that.
  • #10657 - Needs a table doc update; coming in the next few days.

  • #11733 - MPIR; coming in the next few days.

  • Discuss if we want to add infrastructure to create a PDF.

    • Defaults are not perfect; it might be nice if the Table of Contents expanded one level deeper (as it's huge).
    • Is it worthwhile to have this available for customers who don't have internet access?
    • Sysadmins are very happy to put HTML on an internal web server.
    • Not a v5.0.0 blocker.
    • Curious what folks think.
    • Static HTML is installed, but you need a browser to read it.
      • When sshing to a node, that's not terribly feasible.
      • A PDF might be easier for folks to scp back to their desktop.
      • 2100 pages.
    • Is static HTML searchable? Yes.