
WeeklyTelcon_20160315

Jeff Squyres edited this page Nov 18, 2016 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Jeff Squyres
  • Geoff Paulsen
  • Brad Benton
  • Howard
  • Josh Hursey
  • Nathan Hjelm
  • Ralph
  • Ryan Grant
  • Sylvain Jeaugey
  • Todd Kordenbrock
  • Yohann Burette

Agenda

Review 1.10

  • Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.3
    • PR 1004 - MPI_Ineighbor_alltoallw
      • Needs reviewer
    • PR 1002 - Memory allocation hooks
      • Jeff to review
    • PR 1006 - Ralph will review.
    • PR 1008 - Jeff already reviewed.
    • Anyone testing on PSM2 or Omnipath? Intel guys have been. Howard - Ralph check this.
      • The symbol confusion fix requires 1.10.2 and a newer Omnipath (PSM2) library/driver.
  • 1.10.3 in Late April - Some smaller fixes accumulating in the branch
    • Nothing critical at the moment

Review 2.0.x

  • Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
  • Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
    • Issue 1425 - External PMIx server support
      • Ralph working on a fix - should be quick.
    • Issue 1418 - fix MPI process suicide code
      • Delayed until 2.1 or 3.0
    • Issue 1406 - TCP BTL THREAD_MULTIPLE deadlock
      • Nathan - George is working on a fix, but it is a rewrite, so it might take some time.
        • If the old TCP BTL and the new rewrite are "compatible", we can switch between them based on the thread level requested at MPI_Init time.
        • Or we could require two different TCP components, "tcp" and "tcpmt", and expose this issue to users.
    • Issue 1353 - -host behavior
      • Ticket has been updated with new commits. Jeff to test.
      • Ralph is out of time to work on this.
      • Need to document behavior in different releases, then close this for v2.0.0.
  • Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
    • PR 1003 - Race condition in process matching thread
      • Nathan to review the patch and sign off.
    • PR 1000 - misc warnings and missing include files
      • Ralph to review and update
    • PR 977 - span on heterogeneous clusters
      • Ralph to review and update
    • PR 973 - Parsing of envvars in MCA
      • Nathan pushing update, Jeff to review after the call.
  • Reviewed all Pull Requests, and pinged a few for comments.
  • Really need more people doing testing on v2.0 branch
  • Decision to shoot for April 24th for RC0 of 2.0.
  • Jeff saw a report that OPAL-FIFO is failing for someone on 2.0.
  • Need more thread safety testing.
    • Jeff added that his thread safety tests need a two-night cycle to run in full.
  • Jeff is seeing SIGPIPEs ONLY on master with usNIC - Ralph wonders if it's the valgrind issue Mellanox was seeing.
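
A rough sketch of the first option discussed under Issue 1406 (switching TCP implementations based on the thread level at init time). This is not Open MPI code: the enum, the helper names, and the idea of returning a component name are all hypothetical, and the thread levels are redefined locally (in the same order as the MPI standard's MPI_THREAD_* constants) so the sketch compiles without mpi.h. Only the names "tcp" and "tcpmt" come from the notes.

```c
#include <stdio.h>

/* Local stand-ins for the MPI thread support levels, in the same
 * order as MPI_THREAD_SINGLE .. MPI_THREAD_MULTIPLE. */
typedef enum {
    THREAD_SINGLE = 0,
    THREAD_FUNNELED,
    THREAD_SERIALIZED,
    THREAD_MULTIPLE
} thread_level_t;

/* Hypothetical helper: choose which TCP implementation to use.
 * Only full THREAD_MULTIPLE needs the thread-safe rewrite; every
 * lower level keeps the existing component. */
static const char *select_tcp_component(thread_level_t provided)
{
    return (provided == THREAD_MULTIPLE) ? "tcpmt" : "tcp";
}

/* Hypothetical helper: log the decision so it is visible to the user. */
static void report_tcp_component(thread_level_t provided)
{
    printf("thread level %d -> TCP component \"%s\"\n",
           (int)provided, select_tcp_component(provided));
}
```

Calling report_tcp_component(THREAD_MULTIPLE) would report "tcpmt", and any lower level would report "tcp". Keeping the decision in one place like this is what would let the switch stay hidden from users, in contrast to the second option, where users pick the component themselves.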

Misc

  • MPI Forum - MPI_Info under discussion
    • Don't propagate infos with MPI_Comm_dup - use MPI_Comm_dup_with_info to propagate infos
    • Still discussion about MPI_Info_get/MPI_Info_set behavior
  • Open MPI Developer's Meeting
  • Nathan - Enabling thread_multiple all the time
    • PR 1397 - always enable MPI_THREAD_MULTIPLE support
    • Should we turn this on for everyone? The general feeling is to accept this.
    • Send another note to the devel list to give folks one last chance to comment before commit.
  • Need to do some performance testing
    • v2.0.0 better MPI_THREAD_MULTIPLE correctness
    • focus on performance improvements in next v2.X series release

Review Master?

  • https://github.com/open-mpi/ompi/pull/1417: "RFC: change default build to always be optimized (even for developers)" If no one has any further comments, it's time to merge.
    • Jeff thinks we are in consensus, but wants to check with developers.
    • Developers who want the old behavior would need to turn on --enable-debug, --enable-mem-debug, and --enable-picky explicitly.
  • Nathan - heads up that mpool re-write is ready to go.
    • Will be merged this afternoon.

MTT status:

  • Really need more people doing testing on v2.0 branch

Status Updates:

  • Delayed until next week.

Status Update Rotation

  1. Cisco, ORNL, UTK, NVIDIA
  2. Mellanox, Sandia, Intel
  3. LANL, Houston, IBM

Back to 2016 WeeklyTelcon-2016
