Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StableHLO Compatibility RFC v2 #115

Merged
merged 9 commits into from
Dec 13, 2022
Merged

StableHLO Compatibility RFC v2 #115

merged 9 commits into from
Dec 13, 2022

Conversation

GleasonK
Copy link
Member

@GleasonK GleasonK commented Sep 13, 2022

(Edited 11/9/22 for RFC v2)
When we bootstrapped StableHLO in #1, we called out an acute need for stability of interchange dialects in between ML frameworks and ML compilers in the opensource community.

In #1, we informally promised stability for StableHLO:

Speaking of compatibility guarantees, I propose that we initially provide them at the boundary between StableHLO and MHLO (so that StableHLO provides stability, and MHLO can keep evolving mostly freely)

This PR proposes a precise definition of compatibility guarantees for StableHLO and CHLO (#4), using the recently introduced binary serialization infrastructure from MLIR upstream and versioning infrastructure that is feasible in the current state of MLIR.

Note for reviewers: Use "View File" to view the markdown as HTML. This can be found by clicking the three dots next to compatibility.md the top of the markdown diff. The "Rich Diff" feature does not properly load images.

@burmako
Copy link
Contributor

burmako commented Sep 13, 2022

Thank you, Kevin! This is a significant contribution that makes great progress towards addressing one of the P0 feature requests for StableHLO and builds on your earlier work on binary serialization for CHLO/StableHLO. Let's get the community involved to make sure that this proposal takes everyone's perspectives into account!

We don't have an RFC process for StableHLO yet, so we'll need to play it by ear a little bit. I'm hoping that we'll formalize the process in the next 1-2 weeks, so it might be ready in time for when we'll be making a decision on this proposal.

(Upd 9/16/2022) At the upcoming OpenXLA meeting (next Tuesday 9/20), we'll talk about the RFC process for StableHLO. Early next week, we'll have a markdown document in this repository talking about how RFCs will be discussed and how decisions about them will be made. Once this process is set up, we will start providing feedback about incoming RFCs within a predictable timeframe, and it will be clearly documented how decisions on these RFCs are being made.

@GleasonK
Copy link
Member Author

GleasonK commented Sep 13, 2022

Added @joker-eph to reviewers. Any feedback on feasibility of proposed binary serialization versioning hooks is appreciated.

Will post to other relevant threads / forums for additional feedback.

Edit: Posted in various threads with users that helped shape the requirements, including:

Copy link

@silvasean silvasean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG!

docs/compatibility.md Outdated Show resolved Hide resolved
docs/compatibility.md Outdated Show resolved Hide resolved
docs/compatibility.md Outdated Show resolved Hide resolved
docs/compatibility.md Outdated Show resolved Hide resolved
docs/compatibility.md Outdated Show resolved Hide resolved
docs/compatibility.md Outdated Show resolved Hide resolved
docs/compatibility.md Outdated Show resolved Hide resolved
docs/compatibility.md Outdated Show resolved Hide resolved
@GleasonK
Copy link
Member Author

Feedback from @stellaraccident from a separate thread:

+1 - This will be very welcome!

My only comment is whether you want to phase in to a full 6 month compatibility window, starting with something much less aggressive (say 1 month). I suspect that there are some backlogged changes we would like to do to get to more of an LTS state over the next while. If that is the case, I would personally be open to a phase-in period so that the situation can be improved more rapidly during this period where the compatibility guarantees are not yet fully needed (I'm thinking about things like elements attr representations, types, etc).

I am personally very supportive of this in the early phases. Not only will this help with the mentioned backlogged changes, but also will allow us to improve our tooling / processes for StableHLO compatibility without paying a 6month code maintenance price for any mistakes or oversights. I'm happy to start charting out a proposal for a timeline to arrive at that 6month compat window if other are supportive of this.

Will leave this comment as unresolved in case anyone else has thoughts on the matter.

@GleasonK
Copy link
Member Author

GleasonK commented Oct 3, 2022

A recent change landed upstream (D134847) which allows parsing IR/bytecode that may be invalid. This unblocks the creating of a compatibility prototype. I will be investigating this in the coming week. The goal is to make a tool that is resilient to changes within the StableHLO dialect. We still need to further discuss stability in upstream dependencies (bytecode format / core dialects), which will be my focus in the near future.

@vinodgro
Copy link

A couple of comments

  1. the unit of compatibility of StableHLO spec should be version numbers, not time in weeks or months.
  2. minor version changes should not impact compatibility

@stellaraccident
Copy link

A couple of comments

  1. the unit of compatibility of StableHLO spec should be version numbers, not time in weeks or months.
  2. minor version changes should not impact compatibility

I made a similar point recently in an internal design review for on device deployments (and likened it to Android API level). I think there is strong precedent for managing the backwards compatibility with a major/minor version scheme. I'm not sure the forward compatibility is the same: there is a strong element there of head being able to target a prior version for a period which allows for everything to be upgraded in a non lockstep fashion.

@vinodgro
Copy link

A couple of comments

  1. the unit of compatibility of StableHLO spec should be version numbers, not time in weeks or months.
  2. minor version changes should not impact compatibility

I made a similar point recently in an internal design review for on device deployments (and likened it to Android API level). I think there is strong precedent for managing the backwards compatibility with a major/minor version scheme. I'm not sure the forward compatibility is the same: there is a strong element there of head being able to target a prior version for a period which allows for everything to be upgraded in a non lockstep fashion.

forward compatibility is related to backward compatibility. What you need is to be able to use the reader from a backward compatible later version to be able to read backward compatible IR in an earlier version it is backward compatible with :)

@stellaraccident
Copy link

Yeah, I was just going to say something like that :)

@GleasonK GleasonK changed the title StableHLO Compatibility Spec Proposal StableHLO Compatibility RFC Oct 20, 2022
@GleasonK GleasonK changed the title StableHLO Compatibility RFC StableHLO Compatibility RFC v2 Nov 10, 2022
@GleasonK
Copy link
Member Author

GleasonK commented Nov 10, 2022

Thank you everyone for the feedback. I just pushed a commit for StableHLO Compatibility RFC v2 which incorporates much of the feedback on this PR as well as discussions with those who reached out separately. I have re-requested review from everyone who left feedback initially. If needed, v1 of this RFC is still available in the commit log.

Will notify of this change on a few other channels shortly.

Edit: Either a GH bug or there is a maximum amount or reviewers, but I am unable to re-request review from everyone. Disregard the "added request / removed request" notification. Intention is to re-request from all.

@rjpower
Copy link

rjpower commented Dec 5, 2022

Thanks, this looks great!

For the examples section, can we expand on this to highlight the various guarantees and the impact on versioning?

  • existing example: could we update the vhlo.add example so that it explicitly introduces a new attribute, but because it's the default value we can serialize to v1 (keeps forward compat via down-versioning). (Or show this in a comment if it isn't introduced by the legalization)
  • introducing a new op: breaks forwards compat if you use it, requires new minor release (IIUC)
  • removing an old op: breaks backwards compat unless it has a replacement pattern, requires a new major release
  • changing op semantics: introducing a new named op in the stablehlo dialect, or perhaps this is best handled by a new attribute

@GleasonK
Copy link
Member Author

GleasonK commented Dec 6, 2022

For the examples section, can we expand on this to highlight the various guarantees and the impact on versioning?

I think adding a more verbose examples section is a great idea. The original RFC had a section describing how we would maintain compatibility across several change kinds: Addition (new op / attr / type), Migration (replaced old feature with new feature / renames), Deprecation (removed features), but that section is now out of date.

I'll get to revising those examples in light of the RFC v2 changes.

@joker-eph
Copy link
Contributor

Current proposal LGTM, great work! :)

docs/compatibility.md Outdated Show resolved Hide resolved
@burmako
Copy link
Contributor

burmako commented Dec 13, 2022

Thank you, Kevin, for your contribution and thank you everyone for your feedback!

Since the RFC was first introduced on 9/12, it has been discussed with many community members, including at the recent MLIR Summit, and has addressed all the requirements and feedback. Following up on that, this RFC is now approved.

On a related note: so far, our RFC process has been fairly informal, but in Q1 2023 we are planning to sharpen it. As announced at the OpenXLA community meeting today, there is a proposal for OpenXLA governance that will be published shortly. Among other things, this proposal will codify decision making, and we're looking forward to adopting it once reviewed/discussed/revised by the community.

@burmako burmako merged commit ea43664 into openxla:main Dec 13, 2022
@GleasonK GleasonK deleted the compat-doc branch February 10, 2023 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants