WeeklyTelcon_20161206
Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Artem Polyakov
- Jeff Squyres
- Karen (Executive Director - Software Freedom Conservancy)
- Josh Hursey
- Ralph
- Todd Kordenbrock (HPE @ Sandia)
- Sylvain Jeaugey (NVIDIA)
Conservancy (Karen)
- Introductions.
- Do you have a feel for if the Conservancy would want to invite us?
- Yes, the application looks good, and Karen would expect the Conservancy to invite us.
- We have an invite from SPI that expires in January, that drives the Open MPI time-table.
- Karen will try to speed up the decision, so we know soon whether we have an invite.
- SPI and Conservancy are different organizations, as they have different functions.
- Chances are we'll need one or the other, but we need to consider what we actually need.
- Differences:
- SPI is a different legal model: a project just affiliates with SPI and uses it to hold funds, which SPI disburses in a way similar to a grant-making process (case by case). It's a loose affiliation.
- SPI is more bare-bones. It will do finances, but that is not its main push.
- Conservancy has paid staff, while SPI is run by volunteers.
- Organizations become a part of Conservancy. The project has a legal status. Conservancy can then execute contracts for them.
- Conservancy has hired contractors and does the paperwork.
- Takes 10% from projects (which doesn't cover 1% of Conservancy costs), so it has to do extra fundraising.
- corporate members -
- Require projects to establish governance mechanisms.
- We need an official body in place to
- Decided we don't need different legal protections to safeguard different projects from one another.
- Charity, and unlikely target for lawsuit.
- If we want to sign a contract for a venue, they make sure the subproject has the funds to pay even if no one comes to the venue.
- No fundraising requirements on member projects.
- Conservancy just helps projects come up with ways to fund themselves. Examples: Inkscape, Twisted, PyPy, Selenium.
- Software licensing
- Available if there are problems.
- Some copyleft projects ask for help with enforcement.
- Some have had to relicense.
- A lot of licensing expertise.
- Have helped with attribution requirements of permissive licensing.
- require that the software is "free" and "open" based on ____ definitions.
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.5
- 10 open PRs on 1.10.5 (newly changed in GitHub: look closely under the topic, which should say whether it has been approved): 2 approved, 7 review required, and 1 pushed back.
- The ones that are approved are urgent.
- Schedule a release in January of 1.10.5.
- Nathan's looking at a segv in PSM2, but not PSM. He will create issue after reproducing.
- Not the known issue with PSM2 - Something about interrupt handler.
-
Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
-
Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
-
Known / ongoing issues to discuss
- Signal - "Hey, I'm seeing procs get killed with SIGKILL without being hit by SIGTERM first."
- Ralph fixed, but then requirements changed.
- Other problem - Revealed the real problem.
- We issue a SIGTERM, then go into a timer event that issues the SIGKILL.
- mpirun and the daemon exit and leave the event library, so the timer event that finishes cleanup never fires.
- This is why we're getting zombies. Not trivial to solve. Ralph is going to work on it some more this week.
- Is this a blocker? Yes, because remote procs will now only get a SIGTERM, but not a SIGKILL.
- A regression from 1.8 behavior.
- Ralph has an idea for a simpler fix, should have it today.
- Any other blockers for 2.0.2?
- Opened a new Issue 2505: osc_pt2pt wrong answer on a pretty vanilla one-sided test case.
- blocker: HColl Context Free (PR on 1.10.5, but Mellanox will PR to 2.0.x in next 2 days)
- Still not in master - wondering if there is a reason. No one at Mellanox has merged it into Master.
- Artem will talk to Josh about pulling it into master.
- Gilles pushed a bunch of changes to master, and people were curious whether it was an accident.
- Question: accidental pushes to master are possible. We may want to look at going through PRs instead.
- Add to face to face agenda.
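A minimal sketch of the SIGTERM-then-SIGKILL escalation described in the signal item above. It is written in Python for illustration and is not taken from the Open MPI code base; the helper names (`spawn_stubborn_child`, `terminate_with_escalation`) and the grace period are hypothetical. The point it illustrates: whoever sends the SIGTERM has to stay alive until the grace timer expires, otherwise the SIGKILL is never sent and the child lingers.

```python
import signal
import subprocess
import sys

# Hypothetical child program: it ignores SIGTERM, so only SIGKILL can stop it.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def spawn_stubborn_child():
    """Start a child that ignores SIGTERM (POSIX only for this sketch)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", STUBBORN_CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    # Wait for 'ready' so SIGTERM is not sent before the child ignores it.
    proc.stdout.readline()
    proc.stdout.close()
    return proc


def terminate_with_escalation(proc, grace_seconds=1.0):
    """Send SIGTERM, wait for the grace period, then fall back to SIGKILL.

    Returns the name of the last signal that was sent. The caller blocks
    here on purpose: leaving before the timeout expires would mean the
    SIGKILL is never delivered.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()   # sends SIGKILL on POSIX
        proc.wait()   # reap the child so it does not remain a zombie
        return "SIGKILL"
```

On a POSIX system, `terminate_with_escalation(spawn_stubborn_child(), grace_seconds=0.5)` has to escalate to SIGKILL, because the child ignores SIGTERM.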
-
Next week SPI director will be on call.
- If people are not testing with PMIx Async modex + drop through barriers, maybe they should.
- For libraries that want all endpoints in Init, using PMIx_Dstore shows a 15% improvement.
- Collect the data fixed in master.
- Mellanox is testing 2.1. Until this fix comes in, collect the data. With UCX, any back-end would work,
- because the first message will block until the endpoint is available.
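A toy model of the behavior noted above, where the first message to a peer blocks until that peer's endpoint is available. It is plain Python threading, not the PMIx or UCX API; `EndpointDirectory`, `publish`, `lookup`, and `send_first_message` are hypothetical names used only for illustration.

```python
import threading


class EndpointDirectory:
    """Hypothetical stand-in for an endpoint exchange between peers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._endpoints = {}

    def publish(self, rank, address):
        """Record a peer's endpoint and wake up anyone waiting for it."""
        with self._cond:
            self._endpoints[rank] = address
            self._cond.notify_all()

    def lookup(self, rank, timeout=None):
        """Block until the endpoint for `rank` has been published."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: rank in self._endpoints, timeout=timeout
            )
            if not found:
                raise TimeoutError(f"no endpoint published for rank {rank}")
            return self._endpoints[rank]


def send_first_message(directory, peer_rank, payload, timeout=5.0):
    """Resolve the peer lazily: startup does not wait for every endpoint,
    but the first send to a given peer blocks until its endpoint is known."""
    address = directory.lookup(peer_rank, timeout=timeout)
    return (address, payload)
```

In this sketch nothing waits at startup; the cost of a late endpoint is paid only by the first send to that particular peer.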
-
PMIx update
- A couple of outstanding issues with the dstore.
- Performance on the Power architecture.
- Should help memory footprint at scale.
- Hope to roll a new 1.2 RC2 by Friday.
- Will update PMIx and Open MPI master.
- On track for January Open MPI v2.1? PMIx and integrated with embedded.
- Josh and Artem feel like mid-January for PMIx 1.2 plus integration in Open MPI v2.1.0.
-
OMPI 2.1
- THE blocking issue is PMIx.
- The BSD patcher - Nathan's been asked to work on it. Graceful fail is fine.
- Fuzzy estimate: end of January.
-
Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
Review Master MTT testing (https://mtt.open-mpi.org/)
- Still no morning messages. Need to pester Brian about it. Apparently not allowed to make changes until after the new year.
- Mail from our AWS instance is not getting to us.
- Biggest failures we saw in 2.0.x and 2.1.x
- OSHMEM: the BTL fix fixed a bunch of things, but there are still a few errors (segv; Put or Get on a location that is not registered).
- Jeff will make a ticket for the few remaining OSHMEM failures.
- Sylvain is seeing a bunch of errors in the master oob/ud components.
- Mostly timeouts. Not sure whether they are hanging or just really slow.
- Josh - turned on Jenkins testing at IBM, may result in timeouts. Using PGI on PPC64.
-
Put up a PR for combinatorial executor. Still a bug in submitter.
-
Telcon tomorrow.
-
Face to Face in January - https://github.com/open-mpi/ompi/wiki/Meeting-2017-01
-
SC BOF
- Should we do 2.2 or 3.0? Poll to the community.
- 87% said go for 3.0.
- Went way too long.
- Bad time slot (not sure why): we only had half of the people we normally do.
-
PMIx update - Decided to do a PMIx 2.0 release (what was going to be PMIx 3.0) - January time frame.
-
libevent update - they have put out an RC for 2.1.7 (OMPI 2.x is on libevent 2.0)
- 2 years of code changes, though most are not in our usage path.
- Still some somewhat scary changes in the main path, so we need to test well and evaluate before adding it to OMPI 2.x.
- There is an external component for libevent, so there is that option.
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel
- LANL, Houston, IBM