WeeklyTelcon_20190129

Geoffrey Paulsen edited this page Mar 12, 2019 · 2 revisions

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Geoff Paulsen
  • Jeff Squyres
  • Akshay Venkatesh
  • Brian Barrett
  • David Bernholdt
  • Edgar Gabriel
  • Geoffroy Vallee
  • Howard Pritchard
  • Josh Hursey
  • Matias Cabral
  • Ralph Castain
  • Todd Kordenbrock
  • Xin Zhao

not there today (I keep this for easy cut-n-paste for future notes)

  • Aravind Gopalakrishnan (Intel)
  • Joshua Ladd
  • Nathan Hjelm
  • Dan Topa (LANL)
  • Thomas Naughton
  • Akshay Venkatesh (nVidia)
  • Matthew Dosanjh
  • Arm (UTK)
  • George
  • Peter Gottesman (Cisco)
  • mohan

Agenda/New Business

Minutes

Review v3.0.x Milestones v3.0.3

Review v3.1.x Milestones v3.1.0

Review v4.0.x Milestones v4.0.1

  • Schedule: Need a quick turn around for a v4.0.1
  • v4.0.0
  • Merged in PMIx update.
  • Adding OSHMEM API - bugfix. Need to rev .so versions correctly
  • Some fixes to one-sided datatype handling in the past week or two; not sure whether these went in.
  • There have been other non-blocker fixes:
    • hwloc macros, libfabric, ompi-io issues fixed in master
  • https://github.com/open-mpi/ompi/issues/6278
    • The removed symbols and the nice message on master and v4.0.x do not give a compile-time error. What do we want?
      • Do we want a compile-time error, or just a removed symbol and a linker error?
      • Could add a check for C11 and use 'static assert' for a nice message.
      • For older compilers, could just NOT declare the function.
        • But that doesn't work for v4.0.x, since the symbols will still be in the library; the compiler will only issue a warning about the missing prototype, and the code will still compile and link successfully.
        • It was decided that this is okay if the C11 static assert check is in mpi.h. Most users set 'no prototype' as an error.
    • Tests on v4.0.x started passing, but these are possibly false positives. We will look at how the IBM tests are passing with the #6278 issue on master and v4.0.x.
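
The C11 static assert idea discussed above could look roughly like the sketch below. This is only an illustration under stated assumptions: `DEMO_ENABLE_MPI1_COMPAT`, `DEMO_REMOVED`, `demo_address`, and `demo_get_address` are hypothetical names invented for this example and are not the macros or functions in the real mpi.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build-time switch (not a real Open MPI macro):
 * 1 = keep the removed API declared, 0 = reject any use of it. */
#ifndef DEMO_ENABLE_MPI1_COMPAT
#define DEMO_ENABLE_MPI1_COMPAT 0
#endif

/* Hypothetical replacement function that still exists. */
static int demo_get_address(const void *location, intptr_t *address)
{
    *address = (intptr_t)location;
    return 0;
}

#if DEMO_ENABLE_MPI1_COMPAT
/* Compat build: keep the old prototype so existing code still compiles. */
int demo_address(const void *location, intptr_t *address);
#elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)
/* C11 or later: any use of the removed name expands to a static
 * assertion that always fails, so the compiler prints the message. */
#define DEMO_REMOVED(func, newfunc) \
    _Static_assert(0, #func " was removed; use " #newfunc " instead")
#define demo_address(...) DEMO_REMOVED(demo_address, demo_get_address)
#else
/* Older compilers: do not declare the function at all. A call then only
 * triggers a "no prototype" diagnostic, which is a warning unless the
 * user has promoted it to an error. */
#endif

/* Uncommenting the next line in a C11 build with compat disabled should
 * stop compilation with the message above:
 *
 *     void broken(void) { intptr_t a; demo_address(&a, &a); }
 */
```

With compat disabled, code that only calls `demo_get_address` compiles cleanly, while any statement-level use of `demo_address` fails at compile time rather than at link time.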
  • Should resolve https://github.com/open-mpi/ompi/issues/6198 before releasing
  • OOB TCP is ignoring virtual interfaces.
    • What's the right fix? TCP btl allows virtual interfaces, but
    • We want mpirun to work for users on a single node. But if we allow virtual interfaces, some providers don't support loopback.
    • What do we do in the TCP btl? Do we set a default value for the exclude list?
    • Long term we should finish reachability functionality.
    • for v4.0.x may need something in include/exclude default.
    • Any fix for OOB TCP should be pushed up to PRTE/oob/tcp
    • Will create an issue and solve over email with code, rather than solving on phone.
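
As a rough illustration of the "default exclude value" option raised above, the sketch below checks an interface name against a comma-separated exclude list. It is not Open MPI's implementation: the function name `demo_if_is_excluded` and the example list `"lo,docker0,virbr0"` are assumptions made for this example, and the sketch only does exact name matching.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if if_name exactly matches one entry of the
 * comma-separated exclude_list (hypothetical helper). */
static bool demo_if_is_excluded(const char *if_name, const char *exclude_list)
{
    assert(if_name != NULL && exclude_list != NULL);

    size_t name_len = strlen(if_name);
    const char *p = exclude_list;

    while (*p != '\0') {
        /* Find the end of the current entry. */
        const char *comma = strchr(p, ',');
        size_t tok_len = (comma != NULL) ? (size_t)(comma - p) : strlen(p);

        /* Exact match only: "lo" must not match "lo0" or "l". */
        if (tok_len == name_len && strncmp(p, if_name, tok_len) == 0) {
            return true;
        }
        if (comma == NULL) {
            break;
        }
        p = comma + 1;
    }
    return false;
}
```

For instance, with a hypothetical default of `"lo,docker0,virbr0"`, the interface `virbr0` would be skipped while `eth0` would still be used. Whether loopback belongs in such a default is exactly the open question in the notes, since excluding it may break single-node runs.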
  • PR6306 - RegEx - they want to push this into v4.0.x. The problem is that any RegEx we come up with has a problem in some special case. Worried about getting into a mode where a fix for one node-name convention breaks another. PRTE threw this framework out and just uses a PMIx parser. This PR would cause the PMIx parser to get out of sync, and we want the same answer out of both parsers. Need to open an issue on Open MPI to ensure we don't continue breaking patterns. Some ideas:
    • Don't try to do Reg-ex, and instead do compression.
    • Use a 3rd party existing reg-ex generator (generate a reg-ex from a list of hostnames)
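
To make the "compression instead of reg-ex" idea concrete, here is a minimal sketch that folds hostnames sharing one prefix into a bracketed range list. It is not the ORTE, PRTE, or PMIx code; `demo_split_host` and `demo_compress_hosts` are hypothetical helpers. It assumes every name is a common prefix followed by digits, in strictly ascending order, and it drops zero padding (so `node01` becomes `1`), which is the kind of special case the discussion above worries about.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split "node12" into prefix length (4) and numeric suffix (12).
 * Returns 0 on success, -1 if the name has no trailing digits. */
static int demo_split_host(const char *name, size_t *prefix_len, long *num)
{
    size_t len = strlen(name);
    size_t i = len;
    while (i > 0 && isdigit((unsigned char)name[i - 1])) {
        i--;
    }
    if (i == len) {
        return -1;
    }
    *prefix_len = i;
    *num = strtol(name + i, NULL, 10);
    return 0;
}

/* Write "prefix[a-b,c]" into out. Returns 0 on success, -1 if the input
 * breaks the assumptions above or the buffer is too small. */
static int demo_compress_hosts(const char *const *hosts, size_t n,
                               char *out, size_t out_len)
{
    size_t plen0 = 0, used = 0;
    long start = 0, prev = 0;
    int w;

    if (n == 0 || demo_split_host(hosts[0], &plen0, &start) != 0) {
        return -1;
    }
    prev = start;
    w = snprintf(out, out_len, "%.*s[", (int)plen0, hosts[0]);
    if (w < 0 || (size_t)w >= out_len) {
        return -1;
    }
    used = (size_t)w;

    /* The extra iteration (i == n) closes the final run. */
    for (size_t i = 1; i <= n; i++) {
        size_t plen = 0;
        long num = 0;
        int last = (i == n);

        if (!last) {
            if (demo_split_host(hosts[i], &plen, &num) != 0 ||
                plen != plen0 ||
                strncmp(hosts[i], hosts[0], plen0) != 0 || num <= prev) {
                return -1;
            }
            if (num == prev + 1) {   /* still inside the current run */
                prev = num;
                continue;
            }
        }
        /* Emit the finished run [start, prev]. */
        if (start == prev) {
            w = snprintf(out + used, out_len - used, "%ld%s",
                         start, last ? "]" : ",");
        } else {
            w = snprintf(out + used, out_len - used, "%ld-%ld%s",
                         start, prev, last ? "]" : ",");
        }
        if (w < 0 || (size_t)w >= out_len - used) {
            return -1;
        }
        used += (size_t)w;
        start = prev = num;
    }
    return 0;
}
```

Given `node1`, `node2`, `node3`, `node7`, the sketch produces `node[1-3,7]`; a list that mixes prefixes is rejected rather than guessed at, so the caller could fall back to sending the raw list.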

v5.0.0

  • Any Schedule for this yet? Summer of 2019
  • Discussion of schedule depends on scope discussion
    • Do we want to separate ORTE out for that? Might delay a bit.
  • May want to open up release-manager elections.

Master

PMIx

  • There was a problem with PMIx v3.1.0 - should post another today.

MTT

  • Cisco showing build failure.
  • IBM test configure should have caused that.
  • Cisco has a one-sided info check that failed a hundred times.
    • Cisco install fail looks like a legit compile fail (ipv6 master)

New topics

  • PMIX direct call / PRTE replacement for ORTE.

  • Ralph is thinking about the approach.

    • Perhaps we don't worry about PMI1 and PMI2 calls, and let PMIx compat support clients that make older style PMI1 or PMI2 style calls.
    • Ralph will discuss with Howard best way forward.
  • Howard has been changing the places in OMPI or OPAL that call the PMIx framework to use PMIx data structures directly in the code.

    • Doesn't look like Howard would step on Ralph's toes.
  • March 4th is next MPI Forum (then June)

  • We have a new open-mpi SLACK channel for Open MPI developers.

    • Not for users, just developers...
    • Email Jeff if you're interested in being added.

Review Master Pull Requests

  • didn't discuss today.

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon,
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2018 WeeklyTelcon-2018
