Specification of the stable API of base
#105
Perhaps I'm missing something, but would it make sense to make this distinction formal using a new package? Suppose we had a separate package holding the internal parts of `base`. This would have a few advantages:
Now of course some changes to the internal package would still be felt downstream. I know this has been discussed before (e.g. https://gitlab.haskell.org/ghc/ghc/-/wikis/split-base) and it opens up various complicated design questions in the long term. But perhaps it is worth doing something simple to start decoupling the two.
That could be a way forward, but it is one that has repeatedly stalled (mainly due to lack of anyone to push it forward, rather than to any technical problem). I'm seeking an interim solution. Because, like it or not, we have an interim solution right now: some things are informally understood to be stable and others not. I'm just asking us to write down, in one place, our current specification. That doesn't preclude a more ambitious plan later. Moreover, even after such a split, every package (including these) has internal details and a stable external API. So we still need to explain, for any particular package (including these), what is internal and what is stable. Splitting every package on Hackage into two packages would be a bit of a heavyweight solution -- but I grant that it is a possible choice.
Could that perhaps be the Haddocks? Specifically, could everything that is currently considered internal-only be clearly documented as such in the Haddocks? That would stretch your requirement of "one place" a bit, but I think it would be a very helpful first step. If we take further, more formal steps later, the Haddocks could serve as the definitive source of information about what exactly should be formally marked as internal. Put another way, I am suggesting that regardless of the criterion we choose, if we are to formally define what is internal then there are two separate steps to follow: (1) decide what is internal, and (2) formally mark and enforce that decision.
I am suggesting that there is a lot of value in step 1. Documenting the result of it in the Haddocks allows us to release its value quickly, whilst we mull over what the best way of doing 2 is.
Sorry I wasn't clear enough. By "in one place" I meant a single, agreed specification.
The agreed specification might be: (1) a designated list of stable modules, or (2) a general rule identifying them.
It is this agreement that I am seeking. If the answer is (1), the next step is for the CLC to say which modules are designated stable, so we can add the appropriate Haddocks. But specification precedes implementation!
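For concreteness, here is a minimal sketch of what such a Haddock designation could look like, using the module-header fields Haddock already parses; the module name and field values are illustrative, not a decision:

```haskell
{- |
Module      : Data.Example
Description : A module designated stable by the CLC
Stability   : stable
Portability : portable

Everything exported from this module is part of the stable API of
@base@ and changes only with CLC approval.
-}
module Data.Example (stableFunction) where

-- | A function in the stable API.
stableFunction :: Int -> Int
stableFunction = (+ 1)
```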
Some parts of `base` are implementation details.
I'm extremely confused. In #38 we discussed the current charter, which says "Changes which affect performance or laziness and similar are deemed visible" (core-libraries-committee/README.md, line 30 at c0f20cc). What makes you think that changes to the implementation of list fusion would be exempt from that?
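To make the performance-visibility point concrete, here is the textbook kind of rewrite rule at stake; this is a simplified stand-in, not the actual rule set in GHC.Base. Removing or altering such a rule changes no types and no results, only how many intermediate lists get allocated, which the charter deems visible.

```haskell
module MapMap where

-- A classic fusion-style rule: collapse two list traversals into one.
-- Deleting it changes nothing observable in the results, but doubles
-- the traversals -- a "performance-visible" change.
{-# RULES
"map/map" forall f g xs. map f (map g xs) = map (f . g) xs
  #-}

doubleThenInc :: [Int] -> [Int]
doubleThenInc xs = map (+ 1) (map (* 2) xs)
```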
@simonpj We're actually so close! I have https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7898 which starts very conservatively by just having `base` re-export everything from a new internal package. If we at least get everything to work via reexporting, we can incrementally clean things up over time, moving stuff from one package to the other as we settle what is public.
I'm fine with that. But anyway, that's only an example. My question remains: what is the specification of what is stable and what is not? Once that question is answered, we can debate whether a package split is the right mechanism.
Ah, I see. But it underlines my point that "is a stable part of `base`" is currently not well specified.
I see the purview of the CLC as defined in the README.
My difficulty is that I don't know what the API of `base` actually is. I would love to hear from other members of the CLC too.
Out of curiosity, what unstable things would the CLC regard as part of its remit? If the API (once specified) does not change, nor its semantics, nor its performance (all of which I see as part of the API), what else would you like to have on the CLC's radar?
The usual (e.g., used by PVP) definition is that the API of a library consists of all its public functions and their behaviour. While I think it might be a good idea to mark some bits of `base` as less stable than others, that would not remove them from the API.
Thanks @Bodigrim.
Indeed. My question is: what are the public functions of `base`? I'm entirely happy to change terminology from "stable" to "public". I agree that some stuff that is absolutely intended to be public (you mention type-level programming) is not yet very stable. [Indeed the "stability" field of module descriptions is really intended for the latter use, so re-purposing it to mean "public" has some downsides.] Anyway, we agree "public" = "under the purview of CLC". The question is: what functions are public, and how do we communicate that fact to our users?
This has to be done with a split library. We all agree that some things are more stable than others. We should therefore expose narrower interfaces accordingly so that they are subject to less breakage. Do https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7898, which really is just like 1-2 weeks away for an actual expert (not me), and then we have a mechanism to do this. With the mechanism in place, people will actually be motivated to figure out what should go where, and whose responsibility it should be. Without a mechanism this is all too hypothetical and we're not going to get anywhere. The ability to make things stable doesn't just come from abstractly pondering the exposed interface; it also comes from implementation details --- things which are built atop other things we'd like to be stable can be stable for free. Only by going through with the split will we find that out.
In my vision, I would consider all of the `GHC.*` modules internal. I prefer to use functions and types not from `GHC.*` modules whenever possible. In total, I use functions from only a handful of `GHC.*` modules.
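As one illustration of `GHC.*` modules that are public in practice (GHC.Generics and GHC.TypeNats both come up later in this thread as modules worth keeping under review), a small sketch of ordinary user code importing them:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}

import Data.Proxy (Proxy (..))
import GHC.Generics (Generic)
import GHC.TypeNats (natVal)
import Numeric.Natural (Natural)

-- Ordinary, user-facing code that nonetheless imports GHC.* modules.
data Point = Point { px :: Int, py :: Int }
  deriving (Show, Generic)

-- A type-level natural reflected to a term-level value.
three :: Natural
three = natVal (Proxy :: Proxy 3)

main :: IO ()
main = print (Point 1 2) >> print three
```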
I'd love to follow the community-adopted approach of having explicit .Internal modules.
Thanks @chshersh ! I think that kind of data is exactly what this issue needs. Clearly a decision about which declarations and modules are part of the Public API of `base` can simply be made. I.e., we don't need new libraries, changes to the PVP, nor changes to the CLC mandate to simply name the declarations and modules that are part of base's Public API! That's a management decision by fiat, not a technical one. What we do with that data is another step entirely. Having said all that, @simonpj I think the onus might be on the GHC side. The pedantic answer to the original question is "the entire API is Public". I don't think it was the CLC that decided anything should be treated otherwise. How the removal is accomplished technically is something I don't want to think about. :D Let's at least save that for another day....
Perhaps. But that entails the CLC taking two decisions.
If that is the CLC's will, I'm sure we can work with it. But I rather suspect that the CLC would prefer to have a smaller and better-behaved API to manage. I suspect that a better starting point is "all modules not starting with GHC.* are stable", and then to entertain proposals to re-export modules, such as those @chshersh has outlined above, under more civilised names.
@simonpj I think there is a desire, not just from me, not to make ad-hoc rules for specific libraries. The PVP says all the public modules count, and that is the rule every other library follows. Obviously I care about this and keep on bringing it up, but I defer to @Bodigrim's lead here.
I read that as the same sort of thinking. Right now, there is simply no place for things which aren't deep magic and also aren't standard-library-worthy to live: there is nothing between `base` and the compiler's own internals. Likewise with other in-between code.
What I read is a sense that getting the division of labor correct (standard libraries as implementation-agnostic interface vs. how stuff works with primops, RTS, etc.) is somewhat epistemologically prior to stability. I would also agree with that. When we slowly separate out the implementation, we don't just purify interfaces, but also make that division of labor clear. My first step of splitting `base` is aimed at exactly that.
We can also move the implementations out of `base` itself.
We therefore sort out all these issues, slowly but surely, in ways that users don't even need to think about! They can keep treating `base` as they always have. All this proceeds calmly and incrementally, versus a thread like this where there is a surprising amount of confusion!
I think that bringing in the PVP definition here is drastically confusing things. The question is not "how should base be versioned" or anything at all PVP-related. It is just "what changes to base require signoff from the CLC and which don't". I like the long-term vision of a split base that's being presented. But in the meantime, I think it's totally plausible to say "stuff in the GHC.* hierarchy won't typically require a CLC signoff as it's just about things coupled to internals". If one needs some sort of grounding to justify this argument -- here's one: the CLC originally was given jurisdiction over the Prelude and other libraries specified in the libraries section of the Haskell Report. The GHC.* stuff never was part of that, and was never intended to be delegated to the CLC. The fact that currently the GHC.* stuff lives in the same package as the libraries specified in the report (and their descendants, as obviously they have changed over time while the report... has not) is effectively an accident which there are proposals (even in this thread!) to remedy. I think it is fine to let procedure run ahead of implementation of proposals here.
If the problem is GHC <-> CLC communication being hard, sure.
That is a fine starting point. But if I am reading Simon correctly, "stable" comes up more than "CLC". #105 (comment) is a list of things that are widely used in the wild. My guess is the real problem is not that the CLC is hard to work with / an extra hoop to jump through, but that GHC devs have been hounded for there being too much breakage, and want to have clear rules rather than feel like they are operating in a minefield where regular tasks have unclear ramifications wrt outrage. If anything, having the CLC review changes makes things easier --- no need to self-police indefinitely. That there ought to be some modules that are free to change willy-nilly isn't a condemnation of the CLC / process hoops, but just a hope that at least something is an internal implementation detail where everyone can breathe easy and not worry about breakage. @simonpj Am I intuiting your motivations correctly? Is knowing where making changes is always "safe" primary, and how more public/stable interfaces are managed / who does that secondary? If so, I connect that to what I am always being a broken record about, but before I annoyingly repeat myself once more, I want to make sure I am not off-base in my interpretations.
Thanks John. Yes, I think you are. My goal as an implementor is to know: what can I change freely, and what must I keep stable, consulting the CLC about any change?
I believe that users would also welcome clarity on this question, lest they inadvertently rely on something that no one intended to be part of the well-designed, carefully curated API of base, and which then changes unexpectedly. I do not regard the CLC as an extra hoop to jump through. Quite the contrary! I regard it as extremely helpful and supportive to have a group that I can ask questions like those in #104, and receive timely and well-judged guidance from a group that is far closer to the user experience than I am. I don't want to design the exact API of `base` myself; that is the CLC's job. On the other hand, I don't want to consult the CLC about changes to purely internal functions. So yes, my goal is having clear rules. I have opinions, but not strongly held views, on what those rules should be. My only strongly held view is that we should have a clear answer to the question: what is the stable API of base; that is, the API that is curated by the CLC, and that implementors strive to hold stable? Does that line up with what you are thinking, John?
I want to throw in that I believe there's a subtle but important difference between unstable modules and internal modules. Unstable modules to me are those that simply change API frequently. There may be various reasons for that, e.g. lack of design, where the authors want to signal that they can't make PVP guarantees just yet. Then there are internal modules. These expose internals, which might very well be stable, but are not meant for the average user, because they are hard to use correctly, for example (e.g. the ByteString constructor). I'm not sure if both types happen to coincide in base, but I always felt that I want this distinction. E.g. in streamly both concepts are conflated heavily, where "Internal" modules are used to ship beta versions of upcoming API. If GHC has this intention as well (shipping APIs that are not quite ready yet, but not really internal), then this distinction is more interesting. (Of course, both variants are excluded from PVP, and if they happen to coincide, then "Internal" seems to be the more appropriate description.)
To add to this: it would also allow us to make a distinction in terms of expectations. We don't expect internals to give any guarantees and will not bother the GHC team about them. However, we will bother the GHC team about unstable API and say "hi, have you figured out a good design yet?" and occasionally nag them.
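The ByteString case is a concrete anchor for "internal but stable": the representation has been documented and steady for years, yet it is fenced off in an .Internal module because misuse is easy. A sketch using `toForeignPtr`, which Data.ByteString.Internal does export:

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Internal (toForeignPtr)

main :: IO ()
main = do
  let bs = B.pack [104, 105]               -- "hi"
      (_fptr, off, len) = toForeignPtr bs
  -- A representation-level view: offset and length into the
  -- underlying buffer. Stable across many releases, but easy to misuse.
  print (off, len)
```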
Thanks Julian. Yes. In your vocabulary I think I'm talking about "internal" vs "external", rather than "stable" vs "unstable". Sorry for the confusion. (Mind you, being 30 yrs old, I hope most of it is pretty stable by now!) Of course that opens up the question: given a particular type or class, how can I determine whether it is internal or external?
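One low-tech probe available today is GHCi's :info, which reports the defining module. Even a thoroughly public type reports a GHC.* home, which is part of why the module prefix alone cannot be the criterion:

```
ghci> :info Maybe
type Maybe :: * -> *
data Maybe a = Nothing | Just a
        -- Defined in 'GHC.Maybe'
```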
I'm really glad that this discussion is being re-kindled. This is actually a part of the tragedy that plays out each time tools of the Haskell language should interact with each other but do not, because there is no centralised leadership on ergonomics and features. We have metadata in Haddock that is perfectly able to represent such things at the module level. Unfortunately the historical tendency to radically avoid any kind of prescriptivism between any of the tools, except when coming from GHC, has perfectly illustrated the following proverb: "Promises only bind those who believe in them."
And as such, people can choose to believe that a module is stable because it is marked as such, but no tooling would enforce the bump of a major version on the maintainer's side when the API breaks. Another example of the peculiar detachment from the Haddock metadata is the case of modules like Control.Applicative, which were left marked "experimental" for years. I questioned the validity of the stability marker (and later changed it) not because I was sure that the moment was right to do it, but because the moment had been right for more than a decade and nobody prevented me from doing so. When it comes to an action plan, we must also focus on some concrete "user stories", like "what happens when a library author changes the stability or visibility marker of a module?" and "what happens when a library user depends on a module that is marked as unstable or internal?" What tools (especially those bundled with GHC distributions) can react, and how should they react? Haddock can retrieve the metadata from the code; cabal-install (through its own checks) could react to it. And as to "how do we know a module is under CLC jurisdiction?", @gbaz describes it quite well.
Maybe I'm wrong, but reading this thread (and every other thread before it) felt like the answer was to come in the form of a general theory of visibility and stability, with laws and proofs. I feel like we're stalling yet again without having taken any action that could have helped us refine our initial action plan. I'm not saying we must rush in a direction like a bull and not stop whatever the consequences, but it's much easier to correct course to adapt to circumstances than to come up with the perfect plan.
@simonpj Quite so! That's great to hear; I did not misread this one.
I very much agree. I think we need basically a three-way agreement, where GHC devs know what they can freely change, the CLC knows what they must keep track of, and users know what is stable and what isn't (and they had better not complain if they use it anyway and then it changes!). But also, I worry that users won't heed any "merely documented" description of what is stable and what isn't, in the aggregate. Which I think is what @Kleidukos is getting at with the "Promises only bind those who believe in them" proverb too. Stepping back a bit, I view users of languages in the aggregate like walking drunks, trying things at random until their program works. I don't mean this in a condescending "users are bad programmers" way, but that we all vacillate between doing things intentionally and flailing around when stuck, and the latter leaves its trace in what we might call "rising entropy" in the code. Why does that matter? I feel prioritizing a "what is stable/public" policy over any mechanism is liable to get our hopes dashed, because without any sort of intentional stumbling blocks, code will entropically deviate from just using the stable parts and "bleed" over into using more and more of the rest. That means the GHC devs will still need to self-police somewhat, and the intended goal --- finding which parts can be changed with abandon! --- is not actually met. I don't know about you, but I'd rather have a lot of stuff that is definitely internal and safe to change than a broader amount of stuff that is merely "hopefully" internal, where one must still do impact analyses and other defensive measures in proportion to the scale of the breakage, just with a lower constant of proportionality. (A larger story is that the whole "GHC is breaking all the time" drama can only be resolved by barriers on both sides. Users can use things up to a boundary, and they will notice if they cross it, so we can hold them accountable. Likewise GHC devs can change things freely up to a boundary, and they will notice if they cross it, so we can hold them accountable. Relying on "sheer discipline" on either side (no intentional cautionary speed-bumps) seems to me unlikely to work, and likely to lead to disappointment and more finger-pointing.) @gbaz kindly says:
So when I harp on about split base it sounds like I am rushing to some pie-in-the-sky thing, but I really think this is a case where each side sees itself as the hare and the other as the tortoise. From the "policy first" perspective, the idea is to start low-tech, agree that there will be some split, and then set about enforcing it. Spending time on mechanism before policy seems gauche, as if we hate talking to each other and would rather program away to avoid it :). Or at least it feels like putting the cart before the horse. Conversely, low-tech discussion and consensus-building seems like a prudent first step. Confirming there is consensus over some sort of split allows the priority of implementing some sort of enforcement to be raised. On the other hand, for the "mechanism first" approach, the idea is to carefully make sure policy decisions stay in the realm of what is actually actionable. Taking the time to agree on some internal/external boundary without enforcement isn't useful, because without "speed-bumps" the agreement won't amount to anything, and the process of deciding on a grand split of internal vs external modules that doesn't end up having any impact just demoralizes people and makes them less likely to want to plan collectively the next time. Conversely, starting with some sort of enforcement mechanism (a crude base split) builds out the "institutional" capacity such that we can be sure that what we plan will actually matter. Repeated "let's triage whether this module should be public or private" followed up by immediate enforcement builds confidence in the process and gives us more momentum and enthusiasm. I really think the initial split-base step is < 1 month away --- I would (at least hope to :)) shut up about it if I didn't think that! I would do it myself, but I am (a) overextended in general and (b) in need of help figuring out root causes for remaining test failures. Do that and we can rip a few modules out of `base` right away. If we must wait for deprecated reexports, then yes, perhaps what I am saying is off-base --- it is a trade-off between the cost of waiting to plan vs the benefit of knowing those plans will amount to something, and thus latency is critical. If, on the other hand, there are modules we don't mind moving without a deprecation cycle, we can start immediately.
I'm on board with a split library. I just think there should be an initial design for that library other than "everything is reexported by base". That's a good intermediate step, but it doesn't do anything on its own other than create a maintenance burden. We need an actual vision that will inspire future work. My suggestion is that the vision should be: a declaration is internal if and only if it is in the GHC namespace.
My point is, this vision doesn't define the true public API, but it does give a vision for how to build such a thing.
I'd be fine with that.
Is there a CLC proposal that describes, specifically, what the plan is? Indeed, is there an agreed plan? I'd love an opportunity to review it.
I updated https://gitlab.haskell.org/ghc/ghc/-/issues/20647 with more information on the process I envision. It spells out that the CLC must approve each step that affects `base`. There is no in-flight CLC proposal because (as I recall @Bodigrim once wrote) how the implementation of `base` is organised behind the scenes is not, by itself, a CLC matter.
AFAIU @simonpj asks a very practical question: which changes to `base` require CLC scrutiny, and which don't. This, however, has little direct relation to the stable/unstable axis or a base/ghc-base split. @simonpj I think the README is quite unambiguous with regards to your question: "The primary responsibility of CLC is to manage API changes of `base`", where API is understood as usual, as a sum of all public functions of all exposed modules. I'm strongly opposed to leaving the entire `GHC.*` namespace outside CLC purview.
If GHC developers want to make changes, I suggest we use a three-tiered classification: Tier 1, where any change requires CLC approval; Tier 2, where only breaking changes require CLC approval; and Tier 3, which requires no CLC involvement at all.
Would you (or someone else) like to nominate modules to be moved from Tier 1 to Tier 2?
@Bodigrim I think the nomination was that "all GHC modules get moved to Tier 3". You have mentioned a few modules that you don't approve of, which is great input. So if I'm following correctly, the current nomination is, "All GHC modules except GHC.Generics and GHC.TypeNats get moved to Tier 3". But I assume there may be more GHC modules that should stay under CLC review? And is Tier 3 itself too much to ask for? Those last questions may sound tiresome, and I would agree. Unfortunately, I don't see any way to avoid doing a lot of manual work—on both sides—to approve or disapprove of each of base's 248 modules one by one. GHC.* alone has 123 modules. Perhaps it would be best to just schedule a 1-hour call with representatives from the CLC and GHC to step through the list and make judgements? (I don't represent either of those groups, so this is just a suggestion for you all.) If some modules accidentally end up in the wrong category, it can be fixed over time. The process of moving modules between tiers is inevitable, anyway. Edit 2023-02-07: the 200+-row spreadsheet now exists 🥳
The reason I raised the stable vs unstable vs internal modules discussion is exactly this. My concern is that unstable modules are moved to Tier 3 simply because they're unstable, and then may never move back, because the development practice has already crystallized. But that doesn't seem the right way to me. Instead there should be work to gather requirements, come up with a design, and then stabilize those. Only internal modules should be Tier 3, IMO. So the discussion about tiers is fundamentally linked to what is internal or just unstable.
@chreekat In my opinion the vast majority of `GHC.*` modules deserves to remain under CLC scrutiny.
Bodigrim -- it is not just scrutiny, it is also stability, three-release policies, etc. So suppose the GHC team realizes that a field in RTSStats (https://hackage.haskell.org/package/base-4.17.0.0/docs/GHC-Stats.html#t:RTSStats) is meaningless and can be deleted. Are they required to adhere to the three-release policy, or can they just go ahead and delete it, bound only by the PVP? Or suppose it becomes necessary to extend the internal structure of file descriptors with an additional flag? Is this something that needs to be coordinated with the same stability process as e.g. a change to the Prelude? (https://hackage.haskell.org/package/base-4.17.0.0/docs/GHC-IO-FD.html#t:FD) You write: "which changes to base require CLC scrutiny, and which doesn't. This however has little direct relation to stable/unstable axis or base/ghc-base split". Here I am arguing that both are not exactly true. The CLC scrutinizes base precisely with regard to stability. And if there were a split, modules today that are now under the mandate of the CLC might later not be. In Simon's initial question, an aspect of the connection between the two, which motivates this, is pretty well laid out: "What is the stable API of the base package, which we (the GHC implementers) try to keep stable, and about which we consult the CLC?" If you want to argue that the CLC should make an effort to consider all changes, that's fine -- more work for you, not me :-) That said, I would urge the CLC not to apply the same stability policy to all of base. I also note that modules in base actually do have "stability" fields assigned to them -- the classic modules are marked "stable" -- some other modules (like Data.Functor) that are newer are marked "provisional", and most but not all stuff in the GHC.* namespace is marked "internal". I know that field was never really firmed up in meaning or enforced particularly uniformly, but it is another guidepost to look at.
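To ground the RTSStats example, this is typical of how monitoring code in the wild consumes it; the field names are from the GHC.Stats documentation linked above:

```haskell
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled   -- True only when run with +RTS -T
  if enabled
    then do
      stats <- getRTSStats
      -- Deleting or renaming either field breaks this caller,
      -- whichever namespace the module lives in.
      print (gcs stats, max_live_bytes stats)
    else putStrLn "Run with +RTS -T to enable RTS statistics."
```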
This would be a huge breaking change, affecting lots of core tooling, starting from the profiling and monitoring libraries.
Again, this is a module with plenty of users in the wild.
Agreed.
OK great. Let's start small with categorizing just one module: `GHC.Base`. This, more than any other module perhaps, should be a private implementation detail. Though it does export individual items which are eventually (via reexports) very public, the module as a whole was never designed as a user-facing interface. Worse, it reexports `GHC.Prim`. We should do an impact analysis, and I suspect we might indeed find that some packages do import it directly. How does this sound?
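For reference, a sketch of how user code can avoid the module in question: `build`, for example, is defined in GHC.Base but re-exported by GHC.Exts, the long-documented front door for GHC-specific features (assuming `GHC.Base` is indeed the module meant above):

```haskell
-- 'build' is defined in GHC.Base, but GHC.Exts re-exports it;
-- importing it from there keeps user code off the internal module.
import GHC.Exts (build)

one :: [Int]
one = build (\cons nil -> cons 1 nil)

main :: IO ()
main = print one   -- prints [1]
```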
I don't really understand how the community was supposed to expect that this module is off-limits. Imposing the implications of an unwritten lore, a secret oral tradition, onto the entire Haskell community is not acceptable. If you want to keep your toys for yourself, do not expose them to everyone (use hidden modules). Yes, I think it would be much wiser to keep such modules unexposed in the first place.
@Bodigrim I think we actually agree here? I am very much against unwritten lores! That is why I keep on wanting to split `base`. Heh, I forgot the existing module documentation. Perhaps everything in this grey zone deserves explicit marking. I suppose at the very least we could add a bunch of WARNING pragmas.
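A minimal sketch of that low-tech option, using GHC's existing module-level and declaration-level WARNING pragmas; the module and function names here are made up for illustration:

```haskell
module Some.Internal.Module
  {-# WARNING "Internal to GHC: this API may change without notice or CLC review." #-}
  (innard) where

-- Declaration-level warnings work too, for individual escape hatches.
innard :: Int
innard = 42
{-# WARNING innard "Subject to change between GHC releases." #-}
```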
Basically, if `base` exposes a thing, people will come to rely on it.
To be clear: I'm not against changes in internal modules per se. So I'm not even completely convinced that a split is strictly necessary.
Aha! In my OP for this issue I asked "what is the external API of base"? And you have now replied.
Fine. That is a clear answer. But you say it is "quite unambiguous" and I disagree about that. It is clearly obvious to you, but not to me, nor to other people replying on this thread. Can I ask that you add to the CLC README your clarifying clause "where API is understood as usual, as a sum of all public functions of all exposed modules"? That would be extremely helpful. I think this is probably more than you want. It means we will have to consult you about changes to internal functions. But that's fine -- if you don't like it you can change it. At least it is clear. You go on to suggest a tiered classification.
I'm a bit baffled by the difference between Tier 2 and Tier 3. If a change does not change the API of an exposed module, including its performance characteristics, surely the CLC is not interested in any way? That is, don't we just have two tiers: the API, and everything else?
Simple! That's what the charter (with the above clarification) says.
@Bodigrim Yes, we'll need documentation, and we'll also need things users cannot forget to read, like various sorts of deprecations. I am also confused by Tier 2, but not too worried. I think it basically corresponds to "things we wish were private": the CLC doesn't care what extra junk gets added, but it will care if there are then breaking changes, so beware, GHC devs, before throwing new things in there! That seems sort of a historical artifact of how `base` accreted.
You asked to clarify the status quo -- and that's what I did. I'm not wedded to it; I'm most keen to hear specific examples where you suspect this to be more than we can chew.
The key word is "breaking". If you want to add something to a Tier 2 module, no CLC approval is needed; only breaking changes require sign-off.
Indeed, "things we wish were private; or public, but intentionally unstable". E.g., `GHC.Exts`.
OK. For what it's worth, I am not seeking a distinction between Tier 1 and Tier 2. Maybe someone else is, but I am not. I'm not sure it gains much. But I think I didn't articulate my real question clearly enough. It is this: what, concretely, may GHC developers change without consulting the CLC?
Indeed, in your tier list, none of the "currently" notes covers un-exposed modules; perhaps an inadvertent omission?
Ah, right, an inadvertent omission indeed. Modules marked as other-modules (i.e. not exposed) are of course outside CLC scrutiny.
I wrote haskellfoundation/tech-proposals#47, which aims to address just this issue, among other problems.
Thanks to Simon for starting this discussion. I think sorting out hard-and-fast rules in this area will be quite useful, as my mental model when interacting with the CLC for the last several years has been hazy at best.
A hard rule to clear this up will be quite welcome. In this vein, I agree that splitting an internal package out of `base` is a sensible direction. On the other hand, we should also take care about what we exclude from the stable surface.
There are many places to discuss the split of `base`; this issue is not one of them.
Talking of your specific examples, I strongly believe that it's not enough to scrutinize only breaking changes to `base`.
I also strongly believe that GHC developers underestimate how much the community relies on modules they deem internal. There are plenty of examples in this thread above. From my perspective it does not make sense for the CLC to oversee the evolution of only a small part of `base`. @bgamari please come up (together with other GHC developers) with a specific set of rules to discuss. Until then the current arrangements are to stand.
Indeed, we are planning to discuss this in the coming week. I have no doubt that we can find a solution here which makes sense for users, maintainers, and the CLC.
The question was to clarify the status quo, and I believe this has been sufficiently answered in the first part of #105 (comment). My view is that the answer unambiguously follows from the CLC README. When someone gets a specific proposal with regards to the second part of #105 (comment), please raise a new issue.
I am keen to have a clear criterion for:

- What is the stable API of the `base` package, which we (the GHC implementers) try to keep stable, and about which we consult the CLC?
- What is the unstable API of `base`, which we make no attempt to keep stable, and about which we do not consult the CLC?

Can the CLC give a clear specification? Two possibilities that have been suggested are:

- The stable API of `base` consists of the exports of all modules other than those with a `GHC.` prefix; or
- The stable API of `base` consists of the exports of all modules other than those that have `Internal` in the module name.

The CLC may have other suggestions. But it would be really helpful to have a clear criterion. Thanks!

One example discussion that would be made easier if we knew the answer to this question is #104.