WeeklyTelcon_20170131
Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Brian Barrett
- Howard
- Josh Hursey
- Josh Ladd
- Ralph
- Sorry - I may have missed a few this week.
Review All Open Blockers
Review Milestones v1.10.6
Review Milestones v2.0.2
- Put out RC yesterday.
- ireduce fix - could and should release today.
- No objections, releasing today, assuming MTT looks okay.
- Close the door on PRs for v2.0.x for about a week.
Review Milestones v2.1.0
- What's going on with the --disable-dlopen failure? Recent PRs are getting a red X due to an undefined reference with --disable-dlopen on the yoohoo cluster.
- Gilles posted on a couple of them that it looks like a dirty tree. The failure is that it can't copy bits to the m4 directory.
- Howard sees that when it links hello_world, it fails with an unsatisfied symbol.
- Don't see this issue in v2.0
- The symbol is opal_declspeced
- Perhaps it's several things going on.
- PR2885 - PMIx 2885 - red X on --disable-dlopen.
- No one can hit this by hand (Jeff and Ralph both tried). GCC 4.8.5 and 4.8.1.
- Ralph is working on an update to PMIx Master that will have the pthread_locking fix.
- If that's okay, Ralph will add a new PR to roll PMIx Master into OMPI 2.1.1 (within a few days).
- Mellanox has some PRs for v2.1.
- Want to resolve the issues with CI before we pull in more PRs.
- Estimated Schedule: Middle of March.
- Amazon has approval to do Release Engineering.
- IBM is still seeking approval.
Review Master Pull Requests
- PR2861 - Stack traces - ready to go. Josh Hursey
- Custom one for v2.x branch - Ralph signed off on both.
- Support Datatypes - most for master look like they've gone through CI. Waiting for someone to click merge.
- Nathan's thread perf fix was cleared.
- PR2838 - removal of -heterogeneous configure option.
- If someone can fix it, we should keep it... depends on how risky the fix is.
- Is it being tested? @ggouaillardet is testing nightly, just not uploading data to MTT.
- If @ggouaillardet agrees to fix it, and to upload tests to MTT nightly on a heterogeneous cluster, then there is no reason for us to remove the support.
Review Master MTT testing
- Got some fixes committed upstream to OSHMEM
- some new tests are supposed to HANG / Timeout.
- Jeff submitted request to them to help MTT hook into this new test use-case.
- MPI_ONE_SIDED + MPI_THREAD_MULTIPLE
- Abort rather than wrong answer when component doesn't want to run due to lack of component.
- osc_pt2pt - Nathan has a fix and wants IBM to test it.
- Been going kinda slow lately. Intel committed some code to be able to report to the database.
- Ralph is going to switch to the Python client to post to the MTT database.
- Ralph's been asked to provide 3 other components:
- Watchdog timer to kill off hung jobs.
- A "harass" component that launches other procs on the side to do nasty things.
- Nightly regression on the DBM - starts the DBM and launches tests against it. Will run MTT in half the time.
- SPI - got a reply on how to initiate, Cisco has the ball.
- Added SPI to website.
- GitHub transitioned us to a free account.
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu