
WeeklyTelcon_20170926

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Jeff Squyres (Cisco)
  • Geoff Paulsen (IBM)
  • David Bernholdt (ORNL)
  • Edgar Gabriel
  • Geoffroy Vallee (ORNL)
  • Howard
  • Joshua Hursey
  • Todd Kordenbrock
  • Artem (Mellanox)
  • Joshua Ladd (Mellanox)
  • Thomas Naughton

Agenda

Review v2.0.x Milestones v2.0.4

  • Going to switch v2.0.x to critical fixes only.
    • The only critical fix we know of now is the MAdvise fix.
  • Ask people to move to v2.1.x or v3.0.0.
  • If nothing else critical comes up, Howard and Jeff will make an RC soon.
  • Targeting Oct 21st for release.
  • Should be pretty easy.

Review v2.x Milestones v2.1.2

  • v2.1.3 (unscheduled, but probably Jan 19, 2018)
    • PR4172 - a mix between feature / bugfix.

Review v3.0.x Milestones v3.0

  • v3.0.1 - Opened the branch for bugfixes Sep 18th.
    • Looking at Oct 17th
  • ortedvm is broken on v3.0.0
    • Discussed and pushing to v3.1 due to high number of orted changes.
  • Plan to branch from Master moved to Tuesday Oct 3rd.
  • Plan to create first RC Tuesday Oct 3rd after branching.
  • Gives us 4 weeks to stabilize and release before Supercomputing.
  • PMIx 2.1 should land in time for v3.1.
    • One new feature is cross-version compatibility.
    • PMIx v2.x will support one step back, to PMIx v1.x. Not sure whether that covers all of v1.0, v1.1, and v1.2.
    • Discuss next week exactly what this supports.
    • Useful for Slurm builds with an older PMIx.

Review Master Master Pull Requests

  • The proc_hostname code has not been correct for 3 years. git bisect points to a PMIx change from 2 weeks ago.
    • Gilles posted a fix.
    • Someone from PMIx should look at it: was this a latent bug that was brought forward, or were we just getting lucky?
    • Does this affect other branches? Other branches also didn't initialize proc_hostname to NULL.
    • Segfaults in Finalize (proc teardown tries to free a bogus value).
  • Related issue: also discuss defensive programming with calloc vs. malloc to prevent future proc_hostname-type issues.
    • Discussion is on the devel mailing list. Need to understand.
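
The calloc vs. malloc point above can be illustrated with a minimal sketch. The struct and function names here (demo_proc_t, demo_proc_new, demo_proc_set_hostname, demo_proc_free) are hypothetical stand-ins, not the actual Open MPI proc code. The sketch assumes a platform where zero-filled memory reads back as a null pointer, which is true of mainstream systems but not strictly guaranteed by the C standard.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a proc object; NOT the real Open MPI struct. */
typedef struct {
    int   rank;
    char *proc_hostname;   /* may legitimately never be set */
} demo_proc_t;

/* Allocate with calloc() so every field starts zeroed. With malloc(),
 * proc_hostname would hold an indeterminate value until explicitly set. */
demo_proc_t *demo_proc_new(int rank)
{
    demo_proc_t *p = calloc(1, sizeof(*p));
    if (NULL == p) {
        return NULL;
    }
    p->rank = rank;
    return p;
}

/* Store a private copy of the hostname, releasing any previous one. */
int demo_proc_set_hostname(demo_proc_t *p, const char *name)
{
    char *copy = malloc(strlen(name) + 1);
    if (NULL == copy) {
        return -1;
    }
    strcpy(copy, name);
    free(p->proc_hostname);      /* free(NULL) is a no-op */
    p->proc_hostname = copy;
    return 0;
}

/* Teardown is safe even if the hostname was never set, because the
 * zeroed pointer makes free() a no-op instead of freeing garbage. */
void demo_proc_free(demo_proc_t *p)
{
    if (NULL == p) {
        return;
    }
    free(p->proc_hostname);
    free(p);
}
```

The trade-off is that calloc can hide a missing initialization rather than expose it, which is presumably part of what the mailing list discussion needs to settle.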

MTT / Jenkins Testing Dev

  • Howard is having issues reaching out to MTT and getting an ID from it. Josh isn't sure why.
    • Josh had a breakthrough on the Python Client.
  • Python 3 users will need this PR to try it: https://github.com/open-mpi/mtt/pull/561
    • Please try out the Python Client if you can.
    • This is the future; it is just a matter of time before everyone should switch.
  • Other than Cisco's new proc_hostname issue, looking pretty good.
  • Artem is seeing an Out of Resource error (filesystem) on AWS.
    • Boris will try to reproduce this; if he can't, it would be nice if AWS could check what is left over there.
    • The error appears at fallocate(), which is used to reserve space for the dstore. Next step: get the contents of the directory.
    • Probably not a file descriptor issue.
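
As a rough illustration of how a space-reservation failure can be told apart from a file descriptor problem, here is a minimal sketch. It uses posix_fallocate() rather than the Linux-specific fallocate(), and the helper name demo_reserve_file is hypothetical; the notes do not say which call or which code path the dstore actually uses.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` and reserve `size` bytes for it.
 * Returns 0 on success, or an errno-style code on failure. */
int demo_reserve_file(const char *path, off_t size)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        /* EMFILE or ENFILE here would point at descriptor exhaustion. */
        int err = errno;
        fprintf(stderr, "open(%s) failed: %s\n", path, strerror(err));
        return err;
    }

    /* posix_fallocate() returns the error number directly and does not
     * set errno. ENOSPC means the filesystem could not provide the space. */
    int rc = posix_fallocate(fd, 0, size);
    close(fd);
    if (0 != rc) {
        fprintf(stderr, "posix_fallocate(%s) failed: %s\n", path, strerror(rc));
        unlink(path);   /* do not leave a partial file behind */
    }
    return rc;
}
```

Logging which call failed and with which code separates an ENOSPC from the allocation step from an EMFILE at open(), which would help confirm or rule out the file descriptor theory.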

This week Discussion Points.

Oldest PR

Oldest Issue

Next face-to-face meeting

  • Jan / Feb
  • Possible locations: San Jose, Portland, Albuquerque, Dallas

Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2017 WeeklyTelcon-2017
