Proposed Changes to Standardization Process #181

SteVwonder · 2019-04-11T02:58:15Z

Problem Statement

During the Runtime Standards meeting in Chattanooga, there was some discontent expressed with regard to the current standardization process. I believe the main sources of discontent were the mechanisms for achieving consensus (two weeks of no objections during the conference call) and stability/backwards-compatibility (i.e., there is no formal support for an "experimental" interface or attribute). Note: this issue focuses only on the former; the latter is discussed in #179.

Speaking personally, I have found great value in going back and reading the PMIx RFCs. For the changes/interfaces/attributes that have RFCs, I find the less formal language and documentation of the thought processes motivating a change very helpful. While rigorous formal processes may seem like an unnecessary burden, I think it is beneficial to have processes in place to ensure that the discussions and thought processes behind changes are documented and preserved.

Background

During last week's (April 5, 2019) concall, we discussed the current standardization process and the above problems. Notes from the call can be found here. We decided it would be appropriate to follow the proposed process when proposing the changes (how meta of us).

The main proposal is to generally follow the COSS model, but use issues & PRs in the pmix-standard repo rather than RFCs. As @rhc54 mentioned in #179, this is essentially what the PMIx community is already doing. The main differences (from what I can tell) between this proposal and what the community has currently documented are formally requiring certain documentation in a PR before it can be merged (dates of concalls where consensus was achieved, a link to the spawning GH issue, etc) and "nitty gritty details" (don't squash commits, use draft PRs, etc.)

Proposed Process

A change to PMIx starts with a new issue on GitHub. The issue should highlight the problem with the current standard but does not need to contain proposed changes, standards language, or working code (although any/all of those are nice additions). The main motivation for opening an issue is to document problems/suggestions that people have, start discussions around potential solutions/changes, and keep everyone in sync with current efforts in the community.

Once the discussion in the GitHub issue has reached a point where concrete changes to the standards documents are ready to be proposed, then a Pull Request (PR) should be opened with the proposed changes. Initially, the PR only needs to contain a reference to the aforementioned GitHub issue and the proposed changes to the wording of the PMIx document. If the PR does not contain a reference to a working implementation, we recommend that a GitHub Draft PR be used. Discussion around the specific proposed changes can then take place within GitHub's PR comment thread. Based on the discussion of the PR, the PR may need to be adjusted. The PR author should push a new commit without squashing so that the history of the PR is preserved. If there is a competing change or the wording of the current PR has diverged significantly from the original wording, a new PR can be created.

The next step for the proposed change is implementation. Once the change has been implemented, the PR shall be updated with a link to the implementation. If the PR was opened as a "Draft PR", at this stage, it should be converted into a regular PR. Discussion on the proposed change should continue online until consensus is reached.

Once consensus is reached online, the PR must then be discussed at least twice on the weekly PMIx conference call. If there are no changes or objections two weeks in a row, then the PR must be updated with the dates of the two concalls as well as links to the notes from those concalls so that it is easy to see the discussion that happened and how consensus was reached. At this point, the PR is eligible to be merged.
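As a rough illustration only (not part of the proposal itself), the merge checklist described above could be sketched in Python. The class names, field names, and messages are hypothetical, and the sketch assumes that `concalls` lists every weekly call on which the PR was discussed, so the last two entries count as "two weeks in a row".

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ConcallRecord:
    """One concall on which the PR was discussed (hypothetical structure)."""
    held_on: date
    notes_url: str
    had_objections: bool  # True if changes or objections were raised


@dataclass
class StandardPR:
    """Hypothetical model of a pmix-standard PR and its checklist items."""
    issue_url: Optional[str] = None
    implementation_url: Optional[str] = None
    is_draft: bool = True
    concalls: List[ConcallRecord] = field(default_factory=list)


def missing_for_merge(pr: StandardPR) -> List[str]:
    """Return the checklist items still missing; an empty list means eligible."""
    missing = []
    if not pr.issue_url:
        missing.append("link to originating issue")
    if not pr.implementation_url:
        missing.append("link to implementation")
    if pr.is_draft:
        missing.append("convert draft PR to regular PR")
    # The two most recent concalls must both be free of objections
    # and must both have linked notes.
    last_two = sorted(pr.concalls, key=lambda c: c.held_on)[-2:]
    if len(last_two) < 2 or any(
        c.had_objections or not c.notes_url for c in last_two
    ):
        missing.append("two consecutive concalls without objections, with notes")
    return missing
```

A reviewer (ideally someone other than the author) could run such a check before merging; anything it returns is what the PR description still lacks.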

Things to discuss/decide

What level of implementation is sufficient for merging a PR? Is a branch of or PR against the reference implementation sufficient? Should we require the code actually be merged into the mainline first? Is it a judgement call by those reviewing the standard document PR?

Who can "press the button" and merge a PR? The Collective Code Construction Contract has a rule that a maintainer cannot accept their own "patch" (PR in this case). With all of these "checklist" items, it seems like a useful convention to have a second pair of eyes go over the PR and ensure all the steps have been followed before a PR is merged.

Does every little change to the standard need to go through this process? Does every change require both an issue and a PR (can spelling mistakes and format fixes go straight to a PR)?

There was a discussion of leveraging GitHub's issue and PR templates. I think that is a good idea. What should they include?

Strawman proposal for PR template:

Note: Please do not create a pull request without creating an issue first
Abstract: A brief description of the proposed change.
Link to relevant issue(s): [required]
Labels:
- Please select at least one of the following labels: (Copy label descriptions over from RFC template along with note about too many labels on one PR)
- (These labels chosen by the author could then be applied by a maintainer to the PR via GitHub's interface)
Description:
- A detailed description of the proposed change. The length and degree of detail should be commensurate with the magnitude of the change. This is not intended to be burdensome, nor are there any awards for verbosity - but clear communication will avoid repeated requests for alterations. The description should indicate what is functionally being modified.
Link to prototype implementation: [optional at creation, required for merging]
- Provide a reference link to the accompanying Pull Request (PR) against an open-source PMIx implementation. If the prototype implementation has been tested against an appropriately modified resource manager and/or client program, then references to those prototypes should be provided. Logs or output from the implementation being used are encouraged (if the output is large, please use a GitHub gist).
Authors:
- Provide a list of authors (if more than just the PR submitter), including contact info. This can be in the form of email address or GitHub ID.
Dates consensus reached on conference call (after implementing): [required for merging]
- Date 1: Link to Notes
- Date 2: Link to Notes

rhc54 · 2019-04-11T03:26:28Z

@SteVwonder Again, a very well thought out and written proposal 😄

I have no objection to any of the above - in fact, FWIW, I strongly endorse it.

Does every little change to the standard need to go through this process? Does every change require both an issue and a PR (can spelling mistakes and format fixes go straight to a PR)?

I'd suggest being rigorous here - it isn't a significant burden and avoids getting into hair-splitting arguments.

There was a discussion of leveraging GitHub's issue and PR templates. I think that is a good idea. What should they include?

We have a template RFC - it is a markdown file in the RFC repo. Easy to convert that to an issue/PR template. It seemed to work well for us - I'd suggest integrating that with your list, at least as a starting point.

What level of implementation is sufficient for merging a PR? Is a branch of or PR against the reference implementation sufficient? Should we require the code actually be merged into the mainline first? Is it a judgement call by those reviewing the standard document PR?

What we have been doing is asking that it be a PR against the reference implementation (so it remains current with the standard). It gets merged when the standard's RFC is committed. In a perfect world, we would always have the merge done by someone other than the author. However, we also didn't want the standards repo to be open for commits by a wide audience, and so we restricted it to a couple of people. Since one of those was me, and I have been the most prolific RFC author, it made the "merge by someone else" somewhat impractical. We substituted a requirement that at least one other person "approve" using the GitHub "review" buttons.

With a larger set of participants, I suspect we can make the "someone else" rule work now. We might also want to "loosen" the requirement about the PR being against the reference implementation - seems somewhat unfair to make people whose interest lies in a different implementation to have to contribute to the reference one. Perhaps we could instead ask that the PR be against some open source implementation (so we can see the code to help understand what is being proposed) combined with some output that helps understand the intended functionality? Dunno - probably worth you folks discussing it.

SteVwonder · 2019-04-11T04:11:17Z

Thanks for the quick feedback @rhc54!

Again, a very well thought out and written proposal

Thanks 😄. Most of the credit goes to those present on the concall last week. I mainly just copied-and-pasted the notes from the discussion into an issue.

We have a template RFC - it is a markdown file in the RFC repo. Easy to convert that to an issue/PR template. It seemed to work well for us - I'd suggest integrating that with your list, at least as a starting point.
We might also want to "loosen" the requirement about the PR being against the reference implementation - seems somewhat unfair to make people whose interest lies in a different implementation to have to contribute to the reference one. Perhaps we could instead ask that the PR be against some open source implementation (so we can see the code to help understand what is being proposed) combined with some output that helps understand the intended functionality?

Excellent suggestions! I think the title and action fields are covered under the existing metadata that GitHub provides on PRs. I also assume a Copyright disclaimer is not necessary for a PR (please correct me if I'm wrong there). I've integrated the other fields into the PR template proposal in the original comment of this issue; as well as the open-source implementation suggestion.

I'd suggest being rigorous here - it isn't a significant burden and avoids getting into hair-splitting arguments.

We substituted a requirement that at least one other person "approve" using the GitHub "review" buttons.

👍 Those both seem reasonable to me.

rhc54 · 2019-04-11T05:53:46Z

I also assume a Copyright disclaimer is not necessary for a PR (please correct me if I'm wrong there).

We require them only for code, not for text going into the standard. However, we have required a "Signed-off-by" line on the PR itself just because it is going into a repo. No strong feelings on that point - just seemed a way to be consistent and help train good habits for people that might be contributing to code.

rhc54 · 2019-04-11T05:59:56Z

BTW: if you'd like to add that template to the pmix-standards repo, please let me know and I'll add write permissions for you. Frankly, your proposal looks good enough I see no reason not to adopt the mechanics now. Even if you decide to modify it in the future, no harm done and (now that you encouraged me to go back and look at the RFCs) I think that RFC format really helped capture the theory of what was being proposed. The text for the standard doesn't necessarily provide a place for it as that is more focused on pedantics - i.e., this is the API and associated attributes, not the theory of operation behind its design.

Guess I'm kinda thinking the "issue" filed on pmix-standard is where the RFC template goes, the pmix-standard PR is the corresponding text mod to the standard doc itself, and the PR against the code repository is the associated implementation (all appropriately "linked").

Does that make sense? Or is my fossil brain lost in the fluffy clouds?

abouteiller · 2019-04-11T15:47:45Z

This captures very well what we discussed over the phone. Like it.

SteVwonder · 2019-04-11T17:55:49Z

Guess I'm kinda thinking the "issue" filed on pmix-standard is where the RFC template goes, the pmix-standard PR is the corresponding text mod to the standard doc itself, and the PR against the code repository is the associated implementation (all appropriately "linked").

Yeah. That is a good point. I guess it depends on the situation. In this case, for example, there was already a proposal in mind when the issue was originally posted. So filling out an RFC-based issue template would be relatively straightforward. In another scenario, someone may have a use-case that they lay out in an issue but not have a solution to propose right away. Over the course of the discussion, a proposed change (or multiple competing changes) to the standard may emerge. How should we handle those types of issues, where the proposed change isn't known at submit time? Or the types of issues where multiple competing changes/proposals emerge?

However, we have required a "Signed-off-by" line on the PR itself just because it is going into a repo. No strong feelings on that point - just seemed a way to be consistent and help train good habits for people that might be contributing to code.

Are the "signed-off-by" line and the GitHub PR review approval interchangeable or do they serve distinct purposes?

Frankly, your proposal looks good enough I see no reason not to adopt the mechanics now.

Ok. I will submit a PR soon. The templates look to be version controlled files that can be placed in the repo by a PR, so I'll include those in the PR along with changes to the README.md and add a "Contributing" document.

rhc54 · 2019-04-11T22:05:39Z

someone may have a use-case that they lay out in an issue but not have a solution to propose right away.

I would suggest we create a "use-case" label for the issue to indicate this is just a description of a desired behavior or problem

Over the course of the discussion, a proposed change (or multiple competing changes) to the standard may emerge.

I would suggest creating an "RFC" label to be applied to each such proposed change, and add a comment "Ref #" to link it to the related use-case

Are the "signed-off-by" line and the GitHub PR review approval interchangeable or do they serve distinct purposes?

They are distinct. The "signed-off-by" line has a very specific meaning derived from the Linux community's practice - it means that the author is signifying that this is their original work and they have the authority (from their company or whatever) to contribute it. The GitHub review is someone other than the author indicating that they approve of the proposed change.

jjhursey · 2019-04-12T14:35:23Z

I think this is looking good.

So we are talking about having:

GH Issue: Use case (specific label)
GH Issue: RFC (specific label) - link to Use case if possible
GH PR: Link to the corresponding RFC

Then on the Wiki we can accumulate links to the Use Case GH issues. This is instead of having a wiki page dedicated to the use case. I like the idea of having the use case in an Issue since then it is open for discussion, whereas a wiki page does not have that option.

We may want some language in the GH Issue template for Use Case authors letting them know that they need not fill out all of the RFC fields if they don't apply.

Once the PR(s) associated with an RFC have been accepted then the RFC issue can be closed.
Once the RFC(s) associated with a use case have been accepted then the Use Case issue can be closed.

The Use Case issue can be reopened if additional future work is needed (e.g., extensions to the model), but it might be more appropriate to open a new Use Case for that extension.

rhc54 · 2019-04-12T14:43:12Z

Yeah, I think that's all true. One minor suggestion: I'd have a "use-case" template that is separate from the "RFC" template. GH allows you to have multiple templates (IIRC) that the user can select from, so we can tailor the use-case one for that purpose.

jjhursey · 2019-04-12T21:08:55Z

On the teleconf today we discussed quorum and voting. Since it is part of the proposal for the standardization process it is best to capture it here for discussion.

Note that all of the below is still under discussion. Please provide constructive feedback so we make sure this works well for the community.

A few additional notes on the PR process:

Once the PR has an implementation associated with it, it can be converted from a 'draft PR' to a PR that is 'ready for reading' in the next meeting.
Once a PR is 'ready for reading' then it is presented as a reading in a subsequent teleconf. The timing of reading is determined by the author, but likely the 'next' teleconf.
- The group must designate a Recorder, which cannot be the presenting author, to record notes on the PR for the reading. This allows the presenting author to focus on the reading and not on note-taking.
After a successful reading (no major objections), the PR is eligible for a vote in the next teleconf.
- If there are objections between the reading and the next teleconf the author may choose to delay the vote until those objections can be addressed.
- Any significant changes (beyond typos, for example) to the PR will require the author to do another reading.

Voting on a Pull Request

PRs up for Reading and Voting will be announced before the meeting
An organization's vote is publicly recorded on the PR
A 2/3 quorum of eligible organizations must be present for a vote to be held.
- If we vote by ballot then maybe we need the participation of 2/3 for the vote to be actionable?
A simple majority of participating organizations must approve to move the motion forward.
- Do we want to make this a 2/3 or 3/4 majority?
- Do we exclude abstains?
Any strong objections should be addressed before a vote.

Voting Eligibility:

Anyone can attend any meeting
1 vote per participating organization
One person representing one organization must attend 2 of the last 3 teleconferences to be eligible to vote.
- Attending a teleconf can be difficult to schedule such that everyone is able to attend.
- Do we want to have rotating or alternating teleconfs to be more convenient for more people?
- Do we want to allow for an organization that cannot attend the teleconfs to still be eligible if they are, say, active in the community over the last 2-3 weeks?
- Should we vote via ballot instead of verbally on the teleconf to allow an eligible organization to vote even if they cannot attend the teleconf?
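To make the quorum and eligibility rules above concrete, here is a minimal Python sketch. It is only an illustration of the rules as currently drafted: the function names are hypothetical, the open questions (majority threshold, whether abstentions count) are exposed as parameters rather than decided, and treating "majority" as strictly greater than the threshold is an assumption.

```python
from fractions import Fraction
from typing import Dict, List, Set


def eligible_orgs(attendance: List[Set[str]]) -> Set[str]:
    """Organizations that attended at least 2 of the last 3 teleconfs.

    `attendance` is ordered oldest to newest; each entry is the set of
    organizations represented on that call.
    """
    last_three = attendance[-3:]
    orgs = set().union(*last_three)
    return {o for o in orgs if sum(o in call for call in last_three) >= 2}


def vote_passes(
    eligible: Set[str],
    present: Set[str],
    votes: Dict[str, str],  # org -> "yes" | "no" | "abstain"
    quorum: Fraction = Fraction(2, 3),
    threshold: Fraction = Fraction(1, 2),
    count_abstentions: bool = True,
) -> bool:
    """Apply the draft rules: 2/3 quorum, then a majority of votes cast."""
    if not eligible:
        return False
    present_eligible = eligible & present
    if Fraction(len(present_eligible), len(eligible)) < quorum:
        return False  # no quorum, so no vote can be held
    # One vote per eligible organization that is present.
    cast = {o: v for o, v in votes.items() if o in present_eligible}
    if not count_abstentions:
        cast = {o: v for o, v in cast.items() if v != "abstain"}
    if not cast:
        return False
    yes = sum(v == "yes" for v in cast.values())
    return Fraction(yes, len(cast)) > threshold
```

Switching `threshold` to `Fraction(2, 3)` or flipping `count_abstentions` shows how much the outcome of a small vote depends on the questions still open above.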

jjhursey · 2019-04-12T21:58:54Z

On the topic of quorum/voting, I'm wondering if we are making the process unnecessarily formal. I worry that we will be spending a lot of time on process tracking when we might not need to.

In the prior RFC process, the PMIx community used a notion of "silence is lack of dissent" which might be a more straightforward way to move forward.

The idea is that folks can discuss issues on the PR and mailing list. Any dissent must be addressed by the author. The teleconfs serve as synchronization points for the discussion. After two teleconfs (first 'reading' and second 'vote to accept') the proposal is accepted if the dissenting opinions are addressed. Any strong objection can hold a PR, but more often it is in the best interest of everyone if we can resolve those objections before accepting a PR.

The presentation of PRs is well publicized so folks can chime in on the GitHub PR if they cannot attend the teleconf.

SteVwonder · 2019-04-12T22:21:50Z

Thanks @jjhursey for the summary of the discussion today.

However we end up implementing this, I think we should strive for a system where the "critical functions" (e.g., voting, objecting to/discussing changes, proposing changes) can be performed "asynchronously" (i.e., you are not required to join a conference call or a face-to-face meeting). The more we can push the "critical functions" onto mediums like GitHub, the better for those that reside outside of the Americas. That is not to say we shouldn't leverage phone calls or face-to-face meetings; I think we should. They are high bandwidth and generally very productive, but their discussions and outcomes should make their way back onto GitHub/the mailing list.

I worry that we will be spending a lot of time on process tracking when we might not need to.

This is a good point. I certainly don't want to be the one sending out emails to coordinate and collect votes, ensure people have met the criteria for being active and eligible to vote, etc. That process is probably necessary at some point down the road, but maybe not now.

The idea is that folks can discuss issues on the PR and mailing list.

This is probably controversial, so take the idea with a grain of salt, but IMO, we should try and keep as much of the discussion on one platform as possible, rather than fragmenting it across platforms. The mailing list is archived and searchable, which is great, but (for example) years from now, it would be arduous to reconstruct a discussion if it is split across GitHub and the mailing list. Maybe we leverage the mailing list for "broadcasts"/announcements, and GitHub issues/PRs for discussions/back-and-forths on particular topics? Just a thought.

After two teleconfs (first 'reading' and second 'vote to accept') the proposal is accepted if the dissenting opinions are addressed. Any strong objection can hold a PR, but more often it is in the best interest of everyone if we can resolve those objections before accepting a PR.
The presentation of PRs is well publicized so folks can chime in on the GitHub PR if they cannot attend the teleconf.

Sounds reasonable to me. My only suggestion would be that in the case that a PR's original author cannot attend the concall, they can nominate someone else to "read" the PR on the concall and represent that change. Then the original author can address feedback/objections asynchronously (after they have been added to the GitHub PR by the representative or nominated recorder).

rhc54 · 2019-04-13T04:42:20Z

I agree about the concern over making this too formal - I don't think we want to create a "PMIx Forum", at least at this stage.

This is probably controversial, so take the idea with a grain of salt, but IMO, we should try and keep as much of the discussion on one platform as possible, rather than fragmenting it across platforms.

Not controversial, IMO - I think we should confine discussion to the GitHub issues and PRs. I would suggest that the mailing list be used as you suggest.

Sounds reasonable to me. My only suggestion would be that in the case that a PR's original author cannot attend the concall, they can nominate someone else to "read" the PR on the concall and represent that change. Then the original author can address feedback/objections asynchronously (after they have been added to the GitHub PR by the representative or nominated recorder).

Agreed - we should also allow that someone wishing to dissent in person but who cannot make the "accept" call can request that the PR be delayed until they can, subject to some reasonable time. For example, if someone is going on sabbatical for a year or taking a 3-month vacation, then they need to get someone else to represent them. Otherwise, dissents posted on the PR/issue can be addressed asynchronously there.

jjhursey · 2019-04-15T14:41:40Z

In thinking about this some more over the weekend I am also leaning more towards the model of "silence is lack of dissent" vs a formal "PMIx Forum" model. We have decided to use the mailing list and GitHub to discuss topics so that the thought process is well archived. This also allows those that cannot attend the teleconferences to still actively participate in the process. Teleconferences are useful synchronization points that provide a high bandwidth discussion medium. We decided that the teleconf meeting notes will be archived on the wiki for those that cannot attend to review and discuss. A nice fallout of this PR is that decisions are made through the GitHub issues (or mailing list if necessary) so that everyone gets an opportunity to voice support/dissent and the discussion is archived. I think that this model allows for the broadest participation with the least amount of logistical overhead and maintains (maybe the most important aspect of) traceability of the discussion. I think it is fine to pick GitHub as the primary medium for communication, with the mailing list being for administrative announcements (e.g., meeting notices) or general project queries that don't fit inside a GitHub issue.

With that in mind, let me see if I can outline this notion of participation as a proposal.

Any individual or organization may participate in the discussion on a topic be it in the form of a GitHub Issue or Pull Request, on the mailing list, or on a teleconference.
Anyone may request that progress on reading or accepting a PR be delayed (for a reasonable amount of time) until they can review and participate in the discussion.
Regular teleconferences are announced on the PMIx Forum mailing list
- Teleconferences provide a high bandwidth discussion format for various topics surrounding the PMIx Standard.
- Notes from the teleconferences will be archived on the PMIx Forum wiki
- Specific notes from the teleconference regarding a GitHub Issue or Pull Request will be made on that Issue or Pull Request directly.
When a Pull Request is ready for "Reading":
- The author(s) must send an announcement email to the PMIx Standard mailing list.
- It will be added to the agenda for the next teleconf.
- One of the authors (or someone designated by them) must present the PR during the teleconf as the "Reading" for the PR.
Pull Request "Reading"
- In the teleconference, the presenter will discuss the Pull Request and address any questions raised.
- During the presentation of the PR in the teleconf someone will be designated to take notes on the GitHub issue for the presenter.
- If there is no objection to the PR then it can move to "Voting" in the next teleconf (at least one week later)
- If there is objection or concern expressed on the PR or in the teleconference then the author must work to address the issue before it can move forward to "Voting".
- If significant changes are required to the PR to address community concern the author(s) should file a replacement PR and link it to the corresponding Issue.
When a Pull Request is ready for "Voting":
- The author(s) must send an announcement email to the PMIx Standard mailing list.
- It will be added to the agenda for the next teleconf.
- Teleconf must be at least one full week after the "Reading" to allow for folks to review and comment.
- One of the authors (or someone designated by them) must be present on the teleconf to address any items for discussion.
Pull Request "Voting"
- In the teleconference, the presenter will be present to address any questions raised.
- During the teleconference, someone will be designated to take notes on the GitHub issue for the presenter.
- If there is no objection during the teleconf then the PR is "Accepted" into the PMIx Standard (possibly labeled as experimental per PR 179) and merged.
- If there is objection raised on the teleconference or on the PR before the meeting then the author must work on resolving the objection before the PR can be put forward for "Voting" in a following teleconference.
- If significant changes are required then the PR may be "Rejected" and a replacement PR will be presented.
- A "Rejected" PR can be brought forward for a "Reading" again once all objections have been addressed.
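The Reading/Voting flow outlined above can be read as a small state machine. The following Python sketch is one possible interpretation, not an agreed design: the `Stage` names and the `step` function are hypothetical, and filing a replacement PR after a Reading is deliberately left out because it starts a new PR rather than changing the state of the old one.

```python
from datetime import date, timedelta
from enum import Enum


class Stage(Enum):
    READING = "reading"
    VOTING = "voting"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


def step(stage: Stage, *, objection: bool = False,
         significant_changes: bool = False) -> Stage:
    """Advance a PR by one teleconf, following the outline above."""
    if stage is Stage.READING:
        # An objection keeps the PR from moving forward to "Voting".
        return Stage.READING if objection else Stage.VOTING
    if stage is Stage.VOTING:
        if significant_changes:
            return Stage.REJECTED
        # An unresolved objection defers the vote to a following teleconf.
        return Stage.VOTING if objection else Stage.ACCEPTED
    if stage is Stage.REJECTED:
        # Here `objection` means objections are still outstanding; once
        # they are addressed the PR may be brought for a "Reading" again.
        return Stage.REJECTED if objection else Stage.READING
    return stage  # ACCEPTED is terminal


def earliest_vote(reading_date: date) -> date:
    """The "Voting" teleconf must be at least one full week after the "Reading"."""
    return reading_date + timedelta(weeks=1)
```

Under this reading, the shortest path to acceptance is two teleconfs one week apart, with no objection raised at either.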

rountree · 2019-04-15T16:37:17Z

There was an idea brought up during the concall that I'd like to be part of the above: the PR must be in use by n>1 end users to qualify for a reading (obviously this applies to changes in substance rather than presentation).

As a related project, I'd like to get the source code from several different projects that use pmix and see what parts of the standard are being used by the wider community. That's probably worth starting a separate thread, though.

rhc54 · 2019-04-15T16:43:24Z

@rountree I'm not sure I quite understand - how can a PR be in use by an end user?? It won't be in master and won't be in a release branch, so how/why would an end user get ahold of it and implement to it?

rountree · 2019-04-15T16:49:47Z

The end users would be running a patched version of the reference implementation (or another implementation). If a feature is only useful to a single user, that feature probably shouldn't be in the standard and that user can keep running patched code. Once multiple users have deployed the feature, then we have high confidence that a) it works as advertised and b) it's broadly useful. At that point both the standard and the official reference implementation can be updated.

rhc54 · 2019-04-15T17:03:12Z

I honestly don't believe any user of an implementation would agree to such a thing, especially with the possibility of it changing on them once the PR starts working thru this process or perpetually being in a state of limbo. How would you even monitor compliance? Would you require that they disclose their code to "prove" they were using it? Proprietary users would never agree to that requirement.

If we look at other standards out there, we find many things that are used by a very small fraction of the community (MPI being a classic example). PMIx has always maintained that anyone has the right to not implement something - i.e., everyone has the right to say "not supported". Seems like that ought to be sufficient and we shouldn't be judging the usefulness of someone's feature based on how many other people want to use it.

Perhaps it would be better if you expressed the concern that you are trying to address with this proposal. Are you worried that the standard incorporates features that are not widely used? If so, your implementation doesn't have to implement them unless your users want to utilize them, so how is it negatively affecting things?

I'm not rejecting the idea, just trying to understand the motivation and why/how this would be necessary.

rountree · 2019-04-15T19:15:05Z

@rhc54
Hi Ralph.

The underlying concern is software specification versus software standardization. It's perfectly appropriate to specify new features for a particular implementation in hopes that they might eventually be useful, but a standard needs to reflect what's being used.

As to compliance --- these are publicly visible changes to public interfaces. Proprietary implementations don't need to be shared. If IBM is shipping a flavor of pmix with a new interface, I'll take Josh's word for whether or not their users are finding that useful. If we start shipping a version of Flux with the same interface, I hope he'll take my word for it that our users like it, too.

I certainly agree that there's a risk that changes won't make it into the standard, but having 2+ implementations using the change lowers that risk. The higher-risk approach is to propose a change that appears to benefit only a single set of users, or no users at all. Creating and maintaining patches that benefit only local users can still be a good investment, but that's not a sufficient reason for all conformant implementations to have to implement it as well.

Which speaks to your next point: "the right not to implement" is fine for a specification, but it doesn't work for standards. When Livermore puts out a request for proposals, we want to be able to say "the successful bidder will provide software conformant with the PMIx standard." The more bits that are optional, the less certain we are about what we're getting, and the more time we have to spend creating our own mini-standard for that contract. Our vendors hate that, and it's not a good use of our time, either.

One way to resolve this might be to have separate development and reference implementations. The current specification document would describe the development implementation. There would be a low barrier to making changes to both and it could be used as a way of advertising new features that might eventually make it into the standard. The reference implementation would reflect only what was in the standard, which in turn would only reflect what was being used in the community. The standard would be mostly composed of "shall"s and "must"s, and it would be straightforward to determine if a particular version of a particular implementation was conformant to a particular version of the standard.

Thoughts?

jjhursey · 2019-04-15T20:24:43Z

On the call, I thought we were discussing that once a PR was accepted it would be merged in and labeled as experimental in the document. Then, to move from experimental to, say, widely used or stable, there had to be more than one user of the interface. I tried to capture that part of the discussion in this comment on PR 179. We would use the PR mechanism to move the stability level in a document (since it would require a document change), and go through the reading/voting process to give everyone time to vet the change.

The interface classes allow us to communicate stability and wide use to the end user while not having to express the need for "PMIx Standard version X + PR 123 + PR 125" but instead "All stable interfaces in PMIx Standard version X + experimental interfaces in section 2.3 + 'widely used' interfaces in section 3.5 and 4.6". Or possibly, depending on the direction of the grouping/slicing discussion, "All interfaces from PMIx Standard version X in the use case chapter Bootstrapping and Fault Tolerance".

I like the idea of accepting new items and using the interface stability classes to express stability a bit better than tracking patches. There are some tricky edges to stability classes that I'd like to explore in that discussion, but nothing that I think we can't resolve. Certainly, implementations can provide functionality beyond the standard but should caution users about them.

I think that all PRs would tie back to a use case or problem statement. Maybe that should be called out in the PR or Issue template - "Description of and links to use case scenarios" or something like that.

rountree · 2019-04-15T21:10:06Z

@jjhursey, thanks, that's indeed what I had (mostly [somewhat?]) remembered. I hadn't caught where the notes had landed; I'll go back and review those now.

rhc54 · 2019-04-15T23:18:17Z

@rountree
Hi Barry

> Which speaks to your next point: "the right not to implement" is fine for a specification, but it doesn't work for standards.

I'm not sure I understand - this is common practice for every standard. Consider MPI as an example. Many MPI implementations don't support the dynamics APIs, and several don't even include those APIs in their headers. The historical way for dealing with this has been the method @jjhursey described - i.e., the Labs stipulate that "the software must support v3.1 of the PMIx Standard, minus the tools chapter", just like they do today for MPI. I'm afraid I don't see the issue here nor why PMIx should behave differently.

I think we also need to be careful here that we recall PMIx is composed of very generic APIs plus attributes that can be rather specific. I'm unaware of any API (past, present, or envisioned) that would be specific to a given environment. However, there are a number of attributes which fall into that category. This is why we provided a method for querying the attributes supported by any given API that returns both machine-parsable and human-readable output.

Acceptance by RMs and others has been predicated on this philosophy that allows them to "not support" various APIs and attributes. What we have seen so far is that they tend to provide support for the launch-related APIs/attributes early, and then gradually expand their coverage over time based on the demands from their target market. Thus, as long as we organize the standard doc in a way that facilitates specifying desired support (both in terms of APIs and attributes), I think we should be okay with the current proposal.
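As a rough illustration of "specifying desired support", the sketch below compares a required set of APIs and attributes against what an implementation reports. The table layout, the `missing_support` helper, and the example support data are all hypothetical and are not part of PMIx; a real check would obtain the reported attributes through the query mechanism mentioned above.

```python
from typing import Dict, Set

# Hypothetical layout: API name -> set of attribute names.
SupportTable = Dict[str, Set[str]]


def missing_support(required: SupportTable, reported: SupportTable) -> SupportTable:
    """Return the required APIs/attributes that the implementation does not report.

    An API that is absent from `reported` is listed with all of its
    required attributes (possibly an empty set).
    """
    gaps: SupportTable = {}
    for api, attrs in required.items():
        if api not in reported:
            # The whole API is unsupported.
            gaps[api] = set(attrs)
            continue
        absent = attrs - reported[api]
        if absent:
            gaps[api] = absent
    return gaps


# Made-up example data: what a contract asks for vs. what a vendor reports.
required = {"PMIx_Fence": {"PMIX_COLLECT_DATA", "PMIX_TIMEOUT"}, "PMIx_Spawn": set()}
reported = {"PMIx_Fence": {"PMIX_COLLECT_DATA"}}
print(missing_support(required, reported))
```

An empty result would mean the implementation covers everything the requirement lists; anything else is a concrete list of gaps to discuss.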

> I certainly agree that there's a risk that changes won't make it into the standard, but having 2+ implementations using the change lowers that risk.

I think we may be mixing terms here. Are you proposing that the PR must have two implementations adopt it, or two users? I can readily see scenarios where only one implementation supports an API or attribute as the decision to support is often based on market segment. For example, a PMIx implementation built by an RM targeting the small cluster market has no need for the APIs associated with "instant on", and thus would have no reason to implement them.

If we see a world where there are dozens of implementations, then requiring at least two to support an API (or at least indicate their intent to support it) might be a reasonable requirement. However, we currently only know of perhaps 2-3 planned implementations, which makes any numerical requirement synonymous with unanimity - a bar that seems too high to me.

If we look at users instead of implementations, then that provides a larger pool but creates its own problems. I have found it difficult to get users to work off of "proposed" APIs. Remember, someone has to implement and distribute the code, and then users have to modify their library/application to use the proposed API/attribute. A lot of risk being taken there. People prefer to review the PR and see it accepted first, then get the updated implementation and begin to integrate it into their app.

This is why I support the "experimental" vs "stable" identification. At least you know the API is in the Standard and therefore can reasonably expect it not to change. Ditto for attributes, though we'll need to figure out more details on how to classify them.

HTH
Ralph

jjhursey · 2019-04-26T16:25:56Z

Notes from Teleconf April 26, 2019:

- General agreement around the "lack of dissent" model.
- Rename "Voting" (the second meeting) to "Acceptance" or something similar; since there is no official vote, this avoids confusion.
- Would like to have a sense of how many people have viewed the PR to assess participation in the process. Suggested adding an unofficial "straw vote"-like mechanism to the "Reading" and "Acceptance" comments on the GitHub PR.
  - This can be done with a doodle poll, but it may be easier to just use the emoji associated with the comment on the PR to signify acceptance (👍) or concern/dissent (👎). For any dissent, there must be a comment on the ticket so that the author can facilitate discussion. This keeps the process lightweight and allows for broader participation for folks that cannot attend calls.
- If folks need more time to review before a Reading or between the "Reading" and "Acceptance" calls, they can request that on the ticket.
- Move to formalize the process outlined in this comment plus these notes into a GitHub PR to README.md.
  - The class that is associated with a newly "Accepted" PR is still under discussion in Issue #179 (Creating PMIx interface "classes" based on stability). So that part of the PR will be missing, and the PR associated with Issue #179 will fill that in.
  - Once the PR is ready it will go through the "Reading" and "Acceptance" process.
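The emoji straw vote described in these notes could be tallied along the lines of the sketch below. It is only an illustration: the `tally_straw_poll` helper, the `"+1"`/`"-1"` encoding of 👍/👎, and the usernames are assumptions, and the reactions are taken as already collected by hand rather than fetched from GitHub.

```python
from typing import Dict, List, Set, Tuple


def tally_straw_poll(
    reactions: Dict[str, str], commenters: Set[str]
) -> Tuple[int, int, List[str]]:
    """Count reactions and list dissenters who have not yet commented.

    reactions:  username -> "+1" (acceptance) or "-1" (concern/dissent)
    commenters: usernames that have left a comment on the ticket
    """
    in_favor = sum(1 for r in reactions.values() if r == "+1")
    dissenters = [user for user, r in reactions.items() if r == "-1"]
    # Per the notes, any dissent must come with a comment on the ticket.
    uncommented = sorted(user for user in dissenters if user not in commenters)
    return in_favor, len(dissenters), uncommented


# Hypothetical participants.
example = tally_straw_poll(
    {"alice": "+1", "bob": "-1", "carol": "-1"}, commenters={"bob"}
)
print(example)
```

The third element of the result names anyone whose 👎 still needs an accompanying comment before the author can follow up.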

jjhursey · 2019-05-04T19:16:49Z

I opened a Draft PR to try and capture the process described in this Issue:

Description of the PMIx standardization process #183

Please take a look and make sure I captured everything from this Issue. A couple things that I did not include in this draft:

- The class that is associated with a newly "Accepted" PR is still under discussion in Issue #179 (Creating PMIx interface "classes" based on stability). So that part of the PR will be missing, and the PR associated with Issue #179 will fill that in.
- Concern was raised about a "reasonable" time limit on delaying progress, so I left out the following bullet point until we can resolve that issue:
  - Anyone may request a delay in the process, for a reasonable amount of time, until they
    can review and participate in the discussion. This must be requested on the PR before
    the next teleconference.

jjhursey · 2019-05-10T16:34:21Z

Notes from Teleconf May 10, 2019, regarding the discussion here

We circled back to the scenario:

Someone raises an objection to a PR. The author works to address the objection. Hopefully, the author and the objector agree that the objection has been addressed. However, if they do not agree that it has been addressed, can the community agree to move the PR forward for acceptance anyway, essentially siding with the PR author?

The general sentiment was that the community may choose to side with the PR author and move the PR forward. The question becomes how do we define that the community is in agreement, or at least a majority of the community, to move it forward?

Suggestion was:

- Once the author believes that the objections have been addressed they can recommend that the PR be considered for Reading again. This triggers a straw poll of those that wish to participate. The straw poll defines the "agreement" of the community to either move forward or request more discussion on the objection.
- If the community decides to move forward then it moves to the next stage of the process. This can be determined by a simple majority of participants (?) in the straw poll.
- If more community members request more consideration of the objection then the author can work with that.

gvallee · 2019-05-10T19:29:53Z

I think it addresses the point I raised. We may want to clarify what "simple majority of participants" means, which, I think, is also what you meant by your question mark. It could be the standard 1/2+1 or 2/3 (2/3 would ensure it is not too controversial).
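To see how the two thresholds differ in practice, here is a small sketch. It assumes "participants" means only those who voted in the straw poll, reads 1/2+1 as floor(n/2) + 1 votes, and reads 2/3 as at least two thirds of the votes, rounded up. The function names are made up for illustration.

```python
def votes_needed(participants: int, rule: str) -> int:
    """Minimum number of votes in favor for a straw poll to pass.

    rule: "simple"     -> floor(n/2) + 1
          "two_thirds" -> ceil(2n/3)
    """
    if participants <= 0:
        raise ValueError("a straw poll needs at least one participant")
    if rule == "simple":
        return participants // 2 + 1
    if rule == "two_thirds":
        # Integer form of ceil(2n/3), avoiding floating point.
        return (2 * participants + 2) // 3
    raise ValueError(f"unknown rule: {rule!r}")


def passes(in_favor: int, participants: int, rule: str) -> bool:
    """True if `in_favor` votes meet the threshold for `rule`."""
    return in_favor >= votes_needed(participants, rule)


# With 10 participants, 6 in favor passes 1/2+1 but not 2/3.
print(votes_needed(10, "simple"), votes_needed(10, "two_thirds"))
print(passes(6, 10, "simple"), passes(6, 10, "two_thirds"))
```

With small groups the two rules can coincide (for example, both need 2 of 3 votes), so the choice matters mainly once more than a handful of people take part.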

jjhursey · 2019-05-17T19:17:37Z

Notes from May 17, 2019 teleconf:

- Schedule PR #183 (Description of the PMIx standardization process) for Reading on May 31, 2019.
- Three additional topics need to be resolved before closing this issue (each will be presented as additional PRs):
  - We want to allow folks to propose delaying the process for some "reasonable amount of time". The community needs to determine the amount of time, and exact wording here.
  - Resolution of PR #179 (Creating PMIx interface "classes" based on stability)
  - Moving forward on more contentious PRs with a majority vote (see Proposed Changes to Standardization Process #181 (comment))

jjhursey · 2019-06-28T15:51:41Z

PR #193 is another proposal towards this issue.

SteVwonder · 2019-07-05T20:36:40Z

FWIW, I ran across a straw poll "in the wild" while looking at json-schema. I thought their template was quite nice. Here is an example poll: json-schema-org/json-schema-spec#15 (comment)

If others like it, maybe we can leverage or tweak it for PMIx's straw polls.

rhc54 · 2019-07-06T23:13:55Z

@SteVwonder Added some definition based on this to the proposed process - see what you think.

jjhursey · 2019-07-25T13:49:09Z

Question: Will PR #193 close this issue or is there more to do?

jjhursey · 2019-08-02T16:00:33Z

Per teleconf July 26, 2019 and Aug. 2, 2019 we think that this can be closed now that PR #193 has been merged.

If there are outstanding issues to resolve, this Issue can be reopened or (preferably) a new issue can be filed for discussion.

jjhursey mentioned this issue May 4, 2019

Description of the PMIx standardization process #183

Closed

SteVwonder mentioned this issue May 24, 2019

Creating PMIx interface "classes" based on stability #179

Closed

rhc54 added a commit to rhc54/pmix-standard that referenced this issue Jul 6, 2019

Define the "straw poll" comment format

4a4b950

Per pmix#181 (comment) Signed-off-by: Ralph Castain <[email protected]>

jjhursey closed this as completed Aug 2, 2019
