Specification of the stable API of base #105

Closed
simonpj opened this issue Nov 9, 2022 · 48 comments
Labels
meta General questions on CLC rules and policies

Comments

@simonpj

simonpj commented Nov 9, 2022

I am keen to have a clear criterion for

  • What is the stable API of the base package, which we (the GHC implementers) try to keep stable, and about which we consult the CLC?
  • What is the internals of base, which we make no attempt to keep stable, and about which we do not consult the CLC?

Can the CLC give a clear specification? Two possibilities that have been suggested are:

  • The stable API of base consists of the exports of all modules other than those with a GHC. prefix.
  • The stable API of base consists of the exports of all modules other than those that
    • Have Internal in the module name; or
    • Have a Haddock comment at the top (defined precisely somehow) saying "this module is internal".

The CLC may have other suggestions. But it would be really helpful to have a clear criterion. Thanks!

One example discussion that would be made easier if we knew the answer to this question is #104.
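To make the two candidate criteria concrete, here is a minimal sketch of each as a predicate on module names. It assumes that "Have Internal in the module name" means one dot-separated component is exactly `Internal`, it ignores the Haddock-comment half of the second criterion, and `Data.Foo.Internal` is a hypothetical module name used only for illustration.

```haskell
import Data.List (isPrefixOf)

-- | Split a dotted module name into its components,
-- e.g. "Data.List" becomes ["Data", "List"].
components :: String -> [String]
components name = case break (== '.') name of
  (part, [])       -> [part]
  (part, _ : rest) -> part : components rest

-- | Candidate rule 1: everything outside the GHC.* hierarchy is stable.
stableByPrefix :: String -> Bool
stableByPrefix name = not ("GHC." `isPrefixOf` name)

-- | Candidate rule 2 (name-based half only, an assumption of this sketch):
-- a module is stable unless one of its components is exactly "Internal".
stableByInternalName :: String -> Bool
stableByInternalName name = "Internal" `notElem` components name

main :: IO ()
main = mapM_ report ["Data.List", "GHC.Base", "GHC.Stack", "Data.Foo.Internal"]
  where
    -- "Data.Foo.Internal" is a made-up name, not a real base module.
    report m =
      putStrLn (m ++ ": rule 1 = " ++ show (stableByPrefix m)
                  ++ ", rule 2 = " ++ show (stableByInternalName m))
```

The two rules disagree on `GHC.Base` and `GHC.Stack`: rule 1 treats them as internal, while rule 2 would treat them as stable until they are renamed or annotated.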

@adamgundry
Member

Perhaps I'm missing something, but would it make sense to make this distinction formal using a new package?

Suppose we had a package ghc-base for the internals, and a package base that re-exported modules/definitions from it. The CLC would be responsible for base, while the GHC devs would then be responsible for ghc-base (and it would have minimal or no stability guarantees between versions, except for the need to maintain compatibility with whatever base re-exports).

This would have a few advantages:

  • GHC devs can change internals (and make them available to users) without immediately committing to their stability or requiring CLC approval.
  • The CLC can change the base APIs without necessarily changing ghc-base (including adding new definitions or choosing to re-export more or fewer things from ghc-base).
  • Users can depend only on base and avoid accidentally depending on GHC internals, or can depend on ghc-base if they want access to the internals (and consequently reduced stability guarantees).
  • We should be able to make base reinstallable and hence make it easier to release new base versions independently of GHC. Perhaps we could even have base support multiple GHC versions (provided the changes between the respective ghc-base versions were compatible).

Now of course some changes to ghc-base might affect base via re-exports (e.g. a change to Monad), and those would still require the GHC devs to consult the CLC.
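The consultation rule in that last paragraph can be modelled in a few lines. This is only a toy sketch: the export lists below are hypothetical and tiny, and `needsClcReview` is an illustrative name, not part of any existing tool.

```haskell
import Data.List (intersect)

-- | Hypothetical exports of the proposed ghc-base package.
ghcBaseExports :: [String]
ghcBaseExports = ["Monad", "map", "mapFB"]

-- | Hypothetical subset of ghc-base that base chooses to re-export.
baseReexports :: [String]
baseReexports = ["Monad", "map"]

-- | Sanity check: base may only re-export what ghc-base provides.
reexportsAreValid :: Bool
reexportsAreValid = all (`elem` ghcBaseExports) baseReexports

-- | Names touched by a change that are visible through base.
-- A non-empty result means the CLC should be consulted.
needsClcReview :: [String] -> [String]
needsClcReview changed = changed `intersect` baseReexports

main :: IO ()
main = do
  print reexportsAreValid
  print (needsClcReview ["mapFB"])           -- touches internals only
  print (needsClcReview ["Monad", "mapFB"])  -- touches a re-exported name
```

Under this model, a change confined to names that base does not re-export needs no CLC involvement, which is the decoupling the proposal is after.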

I know this has been discussed before (e.g. https://gitlab.haskell.org/ghc/ghc/-/wikis/split-base) and it opens up various complicated design questions in the long term. But perhaps it is worth doing something simple to start decoupling base from the GHC release process, and incrementally approach a "better" design for base (whatever that may be)?

@simonpj
Author

simonpj commented Nov 10, 2022

Perhaps I'm missing something, but would it make sense to make this distinction formal using a new package?

That could be a way forward, but it is one that has repeatedly stalled (mainly due to lack of anyone to push it forward, rather than to any technical problem). I'm seeking an interim solution. Because, like it or not, we have an interim solution right now: some things (e.g. the function GHC.Base.mapFB) are definitely internal, while others (e.g. the class Monad) are definitely external. Yet nothing tells a user which is which. It's all in our minds, and different people have different things in their minds.

I'm just asking us to write down, in one place, our current specification. That doesn't preclude a more ambitious plan later.

Moreover, even after such a split, every package (including these) has internal details and a stable external API. So we still need to explain, for any particular package (including these), what is internal and what is stable. Splitting every package in Hackage into two packages would be a bit of a heavyweight solution -- but I grant that it is a possible choice.

@tomjaguarpaw
Member

I'm just asking us to write down, in one place, our current specification.

Could that perhaps be the Haddocks? Specifically, could everything that is currently considered internal-only be clearly documented as such in the Haddocks? That would stretch your requirement of "one place" a bit but I think it would be a very helpful first step. If we take further, more formal steps later, the Haddocks could serve as the definitive source of information about what exactly should be formally marked as internal.

Put another way, I am suggesting that regardless of the criterion we choose, if we are to formally define what is internal then there are two separate steps to follow:

  1. determine what should be internal
  2. formally specify it as such

I am suggesting that there is a lot of value in step 1. Documenting the result of it in the Haddocks allows us to release its value quickly, whilst we mull over what the best way of doing 2 is.

I did some work on this sort of thing for ghc-prim.

@simonpj
Author

simonpj commented Nov 10, 2022

Could that perhaps be the Haddocks? Specifically, could everything that is currently considered internal-only be clearly documented as such in the Haddocks? That would stretch your requirement of "one place" a bit

Sorry I wasn't clear enough. By "in one place" I meant:

  • Can we agree the specification of what is stable and what is internal for base?

The agreed specification might be:

  1. Look in the Haddocks for each module in base, or
  2. If the name starts GHC.*, it's internal, otherwise stable, or
  3. something else

It is this agreement that I am seeking.

If the answer is (1), the next step is for the CLC to say which modules are designated stable, so we can add the appropriate Haddocks. But specification precedes implementation!

@Bodigrim
Collaborator

@simonpj

I am keen to have a clear criterion for

* What is the **stable API** of the `base` package, which we (the GHC implementers) try to keep stable, and about which we consult the CLC?

* What is the **internals** of `base`, which we make no attempt to keep stable, and about which we do not consult the CLC?

Some parts of base might be less stable than others, but this does not mandate exclusion from CLC process per se.

I'm seeking an interim solution. Because, like it or not, we have an interim solution right now: some things (e.g. the function GHC.Base.mapFB) are definitely internal

I'm extremely confused. In #38 we discussed the current charter, which says "Changes which affect performance or laziness and similar are deemed visible". What makes you think that changes to the implementation of list fusion in base (which presumably affect performance one way or another) are outside of CLC purview?

The primary responsibility of CLC is to manage API changes of `base` package. The ownership of `base` belongs to GHC developers, and they can maintain it freely without CLC involvement as long as changes are invisible to clients. Changes which affect performance or laziness and similar are deemed visible. Documentation changes normally fall under GHC developers purview, except significant ones (e. g., adding or changing type class laws).

@Ericson2314
Contributor

Ericson2314 commented Nov 10, 2022

Suppose we had a package ghc-base for the internals, and a package base that re-exported modules/definitions from it.

@simonpj We're actually so close! I have https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7898 which starts very conservatively by just having ghc-base contain everything, and base reexport everything. There are just a few test failures left!

If we at least get everything to work via reexporting, we can incrementally clean things up over time, moving stuff from ghc-base to base where appropriate, etc.

@simonpj
Author

simonpj commented Nov 11, 2022

I'm extremely confused. In #38 we discussed the current charter, which says "Changes which affect performance or laziness and similar are deemed visible". What makes you think that changes to the implementation of list fusion in base (which presumably affect performance one way or another) are outside of CLC purview?

I'm fine with that. But mapFB is part of the implementation of fusion. I would not expect a user to import it or use it directly. It is not part of the stable API of base. For example, I could change its name to wombat, and no user should notice. Except of course that we have no current way to say what is part of the stable API and what isn't. So someone might be innocently importing and using mapFB thinking that it is as stable as map. But it isn't!
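To illustrate the point, here is a small sketch of what such "innocent" use looks like. It assumes a GHC whose `GHC.Base` module still exports `mapFB` (as it did when this thread was written); `incrementAll` is a made-up name. Nothing in the import tells the author that this function is not meant to be relied on.

```haskell
import GHC.Base (mapFB)

-- mapFB c f = \x ys -> c (f x) ys, so folding with it behaves like map.
-- This compiles like any ordinary import, even though mapFB is only a
-- building block of the list fusion rules.
incrementAll :: [Int] -> [Int]
incrementAll = foldr (mapFB (:) (+ 1)) []

main :: IO ()
main = print (incrementAll [1, 2, 3])
```

If `mapFB` were renamed, this program would stop compiling, which is exactly the kind of breakage a written-down specification is meant to assign responsibility for.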

Anyway that's only an example. My question remains: what is the specification of what is stable and what is not? Once that question is answered, we can debate whether mapFB should be in the stable API or not. (I would argue not, but it's a CLC decision.) But we can't have the debate until we have the specification.

@Bodigrim
Collaborator

Ah, I see. But it underlines my point that "is a stable part of base" /= "falls under purview of CLC". It's fine if the purpose of this discussion is to define the former and, say, annotate haddocks with a proper Stability: field, but this would not mean that unstable parts are automatically out of CLC control (which I assumed you were interested to delineate from the practical viewpoint).

@simonpj
Author

simonpj commented Nov 11, 2022

But it underlines my point that "is a stable part of base" /= "falls under purview of CLC".

I see the purview of the CLC as defined in the readme

The primary responsibility of CLC is to manage API changes of base package. The ownership of base belongs to GHC developers, and they can maintain it freely without CLC involvement as long as changes are invisible to clients. Changes which affect performance or laziness and similar are deemed visible. Documentation changes normally fall under GHC developers purview, except significant ones (e. g., adding or changing type class laws).

My difficulty is that I don't know what the API of base is. That is what I am seeking precision on. That precision could be done in a number of ways, one of which is a Haddock on each module. Others are sketched above. It's up to the CLC to decide.

I would love to hear from other members of the CLC too.

this would not mean that unstable parts are automatically out of CLC control

Out of curiosity, what unstable things would the CLC regard as part of its remit? If the API (once specified) does not change, nor its semantics, nor its performance (all of which I see as part of the API), what else would you like to have on the CLC's radar?

@Bodigrim
Collaborator

Bodigrim commented Nov 11, 2022

The usual (e. g., used by PVP) definition is that an API of a library consists of all its public functions and their behaviour. While GHC.Base is less stable than say Prelude, both are publicly exposed and constitute a part of API. The current charter does not exclude less stable or unstable parts of API from CLC purview.

I think it might be a good idea to mark some bits of base as outside of CLC control. I just don't see this delineation immediately connected to the question of stability. E. g., facilities for type level programming in base are (relatively) unstable, but I think they deserve CLC scrutiny. On the other side of the spectrum, plenty of stuff under GHC.IO.Encoding is extremely stable, but I would not mind to exclude it from CLC control.

@simonpj
Author

simonpj commented Nov 11, 2022

Thanks @Bodigrim.

The usual (e. g., used by PVP) definition is that an API of a library consists of all its public functions and their behaviour.

Indeed. My question is: what are the public functions of base? That is the question on which I seek the guidance of the CLC. It cannot possibly be every function of every module.

I'm entirely happy to change terminology from "stable" to "public". I agree that some stuff that is absolutely intended to be public (you mention type level programming) is not yet very stable. [Indeed the "stability" field of module descriptions is really intended for the latter use, so re-purposing it to mean "public" has some downsides.]

Anyway, we agree "public" = "under the purview of CLC". The question is: what functions are public, and how do we communicate that fact to our users?

@Ericson2314
Contributor

and how do we communicate that fact to our users?

This has to be with a split library. base, by the PVP, has too much public stuff, and therefore has too many version bumps as breaking changes (per that overly lax notion of what is stable deriving from what is public).

We all agree that some things are more stable than others. We should therefore expose narrower interfaces accordingly so that they are subject to less breakage. Do https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7898, which really is just like 1-2 weeks away for an actual expert (not me), and then we have a mechanism to do this. With the mechanism in place, people will actually be motivated to figure out what should go where, and whose responsibility should it be.

Without a mechanism this is all too hypothetical and we're not going to get anywhere. The ability to make things stable doesn't just come from abstractly pondering the exposed interface, it also comes from implementation details --- things which are built atop other things we'd like to be stable can be stable for free. Only by going through base with a fine-toothed comb will we do that untangling, and we're only going to do that if we know it's possible to move things around and reexport them --- which it always should be; https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7898 is uncovering bugs, not missing new features.

@chshersh
Member

In my vision, I would consider all of the GHC.* modules unstable and GHC specific and all other modules are stable.

I prefer to use functions and types not from GHC.* modules unless I have no choice. E.g. only GHC.Stack module exports the HasCallStack constraint so I'd prefer to not have any breaking changes in this module (and all relevant types and functions). However, IIUC, this module is tightly coupled to the ImplicitParams language extension and GHC-specific implementation of callstacks, so I wouldn't be too surprised to see any breaking changes in this module. Annoyed but not surprised.
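For readers who have not used it, here is a minimal sketch of the kind of code that depends on `GHC.Stack`. `HasCallStack`, `callStack` and `getCallStack` are real exports of that module; `callers` is a hypothetical helper introduced only for this example.

```haskell
import GHC.Stack (HasCallStack, callStack, getCallStack)

-- | Names of the functions recorded on the current call stack,
-- innermost first. The HasCallStack constraint asks GHC to pass
-- call-site information to this definition.
callers :: HasCallStack => [String]
callers = map fst (getCallStack callStack)

main :: IO ()
main = print callers
```

Any library exposing a signature like this one has `GHC.Stack` in its public interface, so a breaking change to the module propagates to that library's users as well.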

In total, I use functions from the following GHC.* modules and consider them public API but I also feel they can have a better place if we want to move all GHC-specific stuff away:

  • GHC.Base
  • GHC.Clock
  • GHC.Generics
  • GHC.Fingerprint
  • GHC.IsList
  • GHC.OverloadedLabels
  • GHC.Stack
  • GHC.TypeError
  • GHC.TypeNats/TypeLits

I'd love to follow the community-adopted approach of having Internal modules which can circumvent PVP constraints. Maybe a good next step would be to create Internal modules and try to move as much internal API into them as possible.

@chreekat

Thanks @chshersh ! I think that kind of data is exactly what this issue needs. Clearly a decision about which declarations and modules are part of the Public API of base can happen right now, on paper, without any code changes required anywhere. (I think that's what @simonpj is after?)

I.e., we don't need new libraries, changes to the PVP, nor changes to the CLC mandate to simply name the declarations and modules that are part of base's Public API! That's a management decision by fiat, not a technical one.

What we do with that data is another step entirely.

Having said all that, @simonpj I think the onus might be on the GHC side. The pedantic answer to the original question is "the entire API is Public". I don't think it was the CLC that decided GHC.* are internal—it certainly isn't listed in the README. What decls and modules do you want removed from the Public API of base? Perhaps removing them is a step that GHC devs can initiate as a CLC proposal.

How the removal is accomplished technically is something I don't want to think about. :D Let's at least save that for another day....

@simonpj
Author

simonpj commented Nov 16, 2022

Having said all that, @simonpj I think the onus might be on the GHC side. The pedantic answer to the original question is "the entire API is Public".

Perhaps. But that entails the CLC to take two decisions

  1. The stable API of base is a specific, named set of modules. There is no naming convention. To find out what is stable, you have to look at the list.
  2. Initially at least, the CLC wishes to treat the entirety of base as under its purview, and to be consulted about every change.

If that is the CLC's will, I'm sure we can work with it. But I rather suspect that the CLC would prefer to have a smaller and better behaved API to manage. I suspect that a better starting point is "all modules not starting GHC.* are stable", and then entertain proposals to re-export modules such as @chshersh has outlined above under more civilised names.

@Ericson2314
Contributor

Ericson2314 commented Nov 16, 2022

@simonpj I think there is a desire, not just from me, not to make ad-hoc rules for specific libraries. The PVP says all the public modules count. The PVP + Internal convention makes everything but *.Internal modules count.

Obviously I care about this and keep on bringing it up, but when @Bodigrim leads

The usual (e. g., used by PVP) definition is that an API of a library consists of all its public functions and their behaviour.

I read that as the same sort of thinking.

Right now, there is simply no place for things which aren't deep magic and also aren't standard-library-worthy to live: there is nothing between ghc-prim and base. What I suggest we do is create one or more such places (like this ghc-base), and then we can triage definitions and implementations as you desire, without creating new machinery / new special cases as I think others desire.

Likewise with

I think it might be a good idea to mark some bits of base as outside of CLC control. I just don't see this delineation immediately connected to the question of stability. E. g., facilities for type level programming in base are (relatively) unstable, but I think they deserve CLC scrutiny. On the other side of the spectrum, plenty of stuff under GHC.IO.Encoding is extremely stable, but I would not mind to exclude it from CLC control.

What I read is a sense that getting the division of labor correct (standard libraries as implementation-agnostic interface vs how stuff works with primops, RTS, etc.) is somewhat epistemologically prior to stability. I would also agree with that.

When we slowly separate out the implementation, we don't just purify interfaces, but also make that division of labor clear.


My first step of base reexports all of ghc-base doesn't help with either stability or untangling in and of itself. But I think that is fine. We can then choose to not reexport things (or rather try to do deprecated reexports), which means:

This is an interface the CLC doesn't want to care about, go get it from elsewhere it will leave base soon

We can also move the implementations out of ghc-base back to base which means

this implementation is written entirely in terms of other things the CLC also cares about, and the CLC is therefore solely responsible for the implementation.

We therefore sort all these issues, slowly but surely, in ways that users don't even need to think about! They can treat base and ghc-base like any other library, just with the expectation that ghc-base will have much more frequent breaking changes. For the "stable GHC things" and "unstable non-GHC things" that @Bodigrim mentions, we can introduce yet more libraries so there are more ways to get stable items "liberated" from unstable libraries. This is also easy to do once we've proved that reexporting works.

All this proceeds calmly and incrementally, versus a thread like this where there is surprisingly much confusion!

@gbaz

gbaz commented Nov 16, 2022

I think that bringing in the PVP definition here is drastically confusing things. The question is not "how should base be versioned" or anything at all pvp related. It is just "what changes to base require signoff from the CLC and which don't"

I like the long-term vision of a split base that's being presented. But in the meantime, I think it's totally plausible to say "stuff in the GHC.* hierarchy won't typically require a CLC signoff as it's just about things coupled to internals".

If one needs some sort of grounding to justify this argument -- here's one: The CLC originally was given jurisdiction over prelude and other libraries specified in the libraries section of the Haskell Report. The GHC.* stuff never was part of that, and was never intended to be delegated to the CLC. The fact that currently the GHC.* stuff lives in the same package as the libraries specified in the report (and their descendants, as obviously they have changed over time while the report... has not) is effectively an accident which there are proposals (even in this thread!) to remedy. I think it is fine to let procedure run ahead of implementation of proposals here.

@Ericson2314
Contributor

If the problem is GHC <-> CLC communication being hard, sure.

Stuff in the GHC.* hierarchy won't typically require a CLC signoff as it's just about things coupled to internals

is a fine starting point.

But if I am reading Simon correctly "stable" comes up more than "CLC". #105 (comment) is a list of things that are widely used in GHC.*. Even if the CLC washes its hands of that, the GHC devs still need to "self-police" and make sure they don't break things too quickly.

My guess is the real problem is not that the CLC is hard to work with / an extra hoop to jump through, but that GHC devs have been hounded for there being too much breakage, and want to have clear rules rather than feel like they are operating on a minefield where regular tasks have unclear ramifications with regard to outrage. If anything having the CLC review changes makes things easier --- no need to self-police indefinitely.

That there ought to be some modules that are free to change willy-nilly isn't a condemnation of the CLC / process hoops, but just a hope that at least something is an internal implementation detail where everyone can breathe easy and not worry about breakage.

@simonpj Am I intuiting your motivations correctly? Is knowing where making changes always "safe" primary, and how more public/stable interfaces are managed / who does that secondary?

If so, I connect that to what I am always being a broken record about, but before I annoyingly repeat myself once more, I want to make sure I am not off-base in my interpretations.

@simonpj
Author

simonpj commented Nov 16, 2022

@simonpj Am I intuiting your motivations correctly?

Thanks John. Yes, I think you are. My goal as an implementor is to know:

  • What types and functions are part of GHC's internal implementation, which I can freely change.
  • What types and functions are part of the API of base on which users rely, which I should be cautious about changing. And here "cautious" = "consult the CLC about" since a primary mission of the CLC is to take well-judged decisions about the API of base.

I believe that users would also welcome clarity on this question, lest they inadvertently rely on something that no one intended to be part of the well-designed, carefully curated API of base; and which then changes unexpectedly.

I do not regard the CLC as an extra hoop to jump through. Quite the contrary! I regard it as extremely helpful and supportive to have a group that I can ask questions like those in #104, and receive timely and well-judged guidance from a group that is far closer to the user experience than I am. I don't want to design the exact API of DataToTag; I positively welcome guidance.

On the other hand, I don't want to consult the CLC about changes to GHC.Real.integralEnumFromThen, which is a helper function used to support instances in GHC.Int and GHC.Word. And I don't think the CLC wants to be bothered with that either.

So yes, my goal is having clear rules. I have opinions, but not strongly held views, on what those rules should be. My only strongly held view is that we should have a clear answer to the question: what is the stable API of base; that is, the API that is curated by the CLC, and that implementors strive to hold stable?

Does that line up with what you are thinking, John?

@hasufell
Member

hasufell commented Nov 17, 2022

I want to throw in that I believe there's a subtle but important difference between unstable modules and internal modules.

Unstable modules to me are those that simply change API frequently. There may be various reasons for that, e.g. because of lack of design and the authors want to signal that they can't make PVP guarantees just yet.

Then there are internal modules. These expose internals, which might very well be stable, but not be meant for the average user, because they are hard to use correctly, for example (e.g. the ByteString constructor).

I'm not sure if both types happen to coincide in base, but I always felt that I want this distinction. E.g. in streamly both concepts are conflated heavily, where "Internal" modules are used to ship beta versions of upcoming API.

If GHC has this intention as well (shipping APIs that are not quite ready yet, but not really internal), then this distinction is more interesting.

(of course, both variants are excluded from PVP and if they happen to coincide, then "Internal" seems to be the more appropriate description)

@hasufell
Member

If GHC has this intention as well (shipping APIs that are not quite ready yet, but not really internal), then this distinction is more interesting.

To add to this: it would also allow us to make a distinction in terms of expectations. We don't expect internals to give any guarantees and will not bother GHC team about it.

However, we will bother GHC team about unstable API and say "hi, have you figured out a good design yet?" and occasionally nag them.

@simonpj
Author

simonpj commented Nov 17, 2022

Thanks Julian. Yes, in your vocabulary I think I'm talking about "internal" vs "external", rather than "stable" vs "unstable". Sorry for the confusion.

(Mind you, being 30 yrs old, I hope base doesn't have many unstable-but-external functions. There may be some. For example, in #104 we propose adding a DataToTag class, as part of the external API. The CLC might well designate it "unstable" for a while, thereby signalling that it may change more rapidly than the stable part.)

Of course that opens up the question: given a particular type or class, how can I determine whether it is

  • Internal?
  • External but unstable?
  • External and stable?
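One way to picture an answer is an explicit lookup table, sketched below. The `Status` type, the `registry` and every classification in it are hypothetical and purely illustrative; they are not decisions that anyone in this thread has taken.

```haskell
-- | The three categories from the question above.
data Status = Internal | ExternalUnstable | ExternalStable
  deriving (Eq, Show)

-- | Hypothetical registry. The entries only mirror the examples used in
-- this thread; the real classification would be for the CLC to decide.
registry :: [(String, Status)]
registry =
  [ ("mapFB", Internal)
  , ("DataToTag", ExternalUnstable)
  , ("Monad", ExternalStable)
  ]

-- | Look up a name; Nothing means nobody has classified it yet.
statusOf :: String -> Maybe Status
statusOf name = lookup name registry

main :: IO ()
main = mapM_ (\n -> putStrLn (n ++ ": " ++ show (statusOf n)))
             ["mapFB", "DataToTag", "Monad", "integralEnumFromThen"]
```

The `Nothing` case is the situation described in this thread: the name exists, but no specification says which category it belongs to.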

@Kleidukos
Member

I'm really glad that this discussion is being re-kindled.

This is actually a part of the tragedy that plays out each time tools of the Haskell language should interact with each other but do not, because there is no centralised leadership on ergonomics and features. We have metadata in Haddock that is perfectly able to represent such things at the module level.

Unfortunately the historical tendency to radically avoid any kind of prescriptivism between any of the tools except when coming from GHC has perfectly illustrated the following proverb:

Promises only bind those who believe in them

And as such, people can choose to believe that a module is stable because marked as such, but no tooling would enforce the bump of a major version on the maintainer's side when the API breaks.

Another example of the peculiar detachment from the Haddock metadata is the case of modules like Control.Applicative, which were left marked as experimental from the time Ross created them in 2005 until I took it upon myself to 1) open a ticket on the matter in 2020 (ghc/ghc#18963) and 2) open an MR in 2022 (ghc/ghc!7668) to be done with it.

I did not question the validity of the stability marker (and later change it) because I was sure that the moment was right to do it, but because the moment had been right for more than a decade and nobody prevented me from doing so.


When it comes to an action plan, we must also focus on some concrete "user stories", like "what happens when a library author changes the stability or visibility marker of a module?" and "what happens when a library user depends on a module that is marked as unstable or internal?" What tools (especially those bundled with GHC distributions) can react and how should they react?

Haddock can retrieve the metadata from the code, cabal-install (through cabal check) can alert on the need to bump the major version if there is a change in the markers, HLS can offer a visual indicator that the user is using an unstable / internal module, etc etc.
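For reference, the module-level metadata in question lives in the Haddock module header. The sketch below uses the standard header fields; the value `internal` for `Stability` is a hypothetical marker chosen for illustration, since the field is free-form text and no convention has been agreed.

```haskell
{- |
Module      : Main
Description : Illustrates module-level Haddock metadata.
Stability   : internal
Portability : portable

The Stability field is free-form text. The value "internal" used here
is a hypothetical marker, not an agreed convention.
-}
module Main (main) where

main :: IO ()
main = putStrLn "This module declares Stability: internal in its Haddock header."
```

Because the field is ordinary comment text, nothing stops it from drifting out of date, which is why tool support of the kind described above matters.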

And as to "how do we know a module is under CLC jurisdiction?", @gbaz describes it quite well.

I think it's totally plausible to say "stuff in the GHC.* hierarchy won't typically require a CLC signoff as it's just about things coupled to internals".

The CLC originally was given jurisdiction over prelude and other libraries specified in the libraries section of the Haskell Report. The GHC.* stuff never was part of that, and was never intended to be delegated to the CLC

Maybe I'm wrong but reading this thread (and every other thread before it) felt like the answer was to come in the form of a general theory of visibility and stability, with laws and proofs. I feel like we're stalling yet again without having taken any action that could have helped us refine our initial action plan.

I'm not saying we must rush in a direction like a bull and not stop whatever the consequences, but it's much easier to correct course to adapt for circumstances than coming up with the perfect plan.

@Ericson2314
Contributor

@simonpj Quite so! That's great to hear I did not misread this one.

I believe that users would also welcome clarity on this question, lest they inadvertently rely on something that no one intended to be part of the well-designed, carefully curated API of base; and which then changes unexpectedly.

I very much agree. I think we need basically a 3 way agreement, where GHC devs know what they can freely change, CLC knows what they must keep track of, and users know what is stable and what isn't (and they better not complain if they use it anyways and then it changes!).

But also, I worry that users, in the aggregate, won't pay attention to any "merely documented" description of what is stable and what isn't, which I think is what @Kleidukos is getting at with the "Promises only bind those who believe in them" proverb too.

Stepping back a bit, I view users of languages in the aggregate like walking drunks, trying things at random until their program works. I don't mean this in a condescending "users are bad programmers" way, but that we all vacillate between doing things intentionally and flailing around when stuck, and the latter leaves its trace in what we might call "rising entropy" in the code.

Why does that matter? I feel prioritizing a "what is stable/public" policy over any mechanism is liable to get our hopes dashed because without any sort of intentional stumbling blocks, code will entropically deviate from just using the stable parts and "bleed" over into using more and more of the rest. That means the GHC devs will still need to self-police somewhat and the intended goal --- finding which parts can be changed with abandon! --- is not actually met.

I don't know about you, but I would rather have a lot of stuff that is definitely internal and safe to change, than a broader amount of stuff that is merely "hopefully" internal, and one must still do impact analyses and other defensive measures in proportion to the scale of the breakage, just with a lower constant of proportionality.

(A larger story is that the whole "GHC is breaking all the time" drama can only be resolved by barriers on both sides. Users can use things up to a boundary, and they will notice if they cross it so we can hold them accountable. Likewise GHC devs can change things freely up to a boundary, and they will notice if they cross it so we can hold them accountable. Relying on "sheer discipline" on either side (no intentional cautionary speed-bumps) seems to me unlikely to work, and likely to lead to disappointment and more finger-pointing.)


@gbaz kindly says

I like the long-term vision of a split base that's being presented. But in the meantime...

So when I harp on split base it sounds like I am rushing to some pie-in-the-sky thing, but I really think this is more a case where both sides would see themselves as the hare and the other as the tortoise.

For the "policy first" perspective, the idea is to start low tech, agree to there be some split, and then set about enforcing it. Spending time on mechanism before policy seems gauche, like we hate talking to each other and rather program away to avoid it :). Or at least it feels like putting the cart before the horse. Conversely, low tech discussing and building consensus seems like a prudent first step. Confirming there is consensus over some sort of split allows the priority of implementing some sort of enforcement to be raised.

On the other hand, for the "mechanism first" approach, the idea is to carefully make sure policy decisions stay in the realm of what is actually actionable. Taking the time to agree on some internal/external boundary without enforcement isn't useful, because without "speed-bumps" the agreement won't amount to anything, and the process of deciding on a grand split of internal vs external modules that doesn't end up having any impact just demoralizes people and makes them less likely to want to plan collectively the next time. Conversely, starting with some sort of enforcement mechanism (a crude base split) builds out the "institutional" capacity such that we can be sure that what we plan will actually matter. Repeated "let's triage whether this module should be public or private" followed up by immediate enforcement builds confidence in the process and gives us more momentum and enthusiasm.


I really think the initial split-base step is < 1 month away --- I would (at least hope :)) shut up about it if I didn't think that!

I would do it myself, but I am (a) overextended in general (b) need help figuring out root causes for remaining test failures.

Do that and we can rip a few modules out of base, just leaving them in ghc-base right away. (Hard transition, but creates just the "speed bump" we will need.) Or we wait for deprecated reexports, ghc-proposals/ghc-proposals#489, and then deprecate in ghc-base prior to removing.

If we must wait for deprecated reexports, then yes perhaps what I am saying is off-base --- it is a trade off between the cost of waiting to plan vs the benefit of knowing those plans will amount to something, and thus latency is critical. If, on the other hand, there are modules we don't mind moving to ghc-base immediately (and if we are aiming for 9.8 we can, say, always add-back deprecated reexports after all if they become available) then I think the cost/benefit of going for enforcement first is pretty solid.

@chreekat

I'm on board with a split library. I just think there should be an initial design for that library other than "everything is reexported by base". That's a good intermediate step, but it doesn't do anything on its own other than create a maintenance burden. We need an actual vision that will inspire future work.

My suggestion is that the vision should be: a declaration is internal if and only if it is in the GHC namespace.

  1. Declarations or modules that are in the "wrong" namespace can be discovered and migrated at leisure, whenever appropriate
  2. The whole GHC namespace can be moved to another lib/labeled unstable/whatever is desired
  3. Simon has his rubric, and this issue can be closed as resolved

My point is, this vision doesn't define the true public API, but it does give a vision for how to build such a thing.

@simonpj
Author

simonpj commented Nov 19, 2022

My suggestion is that the vision should be: a declaration is internal if and only if it is in the GHC namespace.

I'd be fine with that.

I'm on board with a split library.

Is there a CLC proposal that describes, specifically, what the plan is? Indeed, is there an agreed plan? I'd love an opportunity to review it.

@Ericson2314
Contributor

Ericson2314 commented Nov 20, 2022

@simonpj

I'm on board with a split library.

Is there a CLC proposal that describes, specifically, what the plan is? Indeed, is there an agreed plan? I'd love an opportunity to review it.

I updated https://gitlab.haskell.org/ghc/ghc/-/issues/20647 with more information on the process I envision.

It spells out that the CLC must approve base no longer reexporting a module in ghc-base. If the CLC approves, then that module is officially and enforceably an internal implementation detail of GHC that the CLC relinquishes any jurisdiction over, and that GHC devs are free to change however they like.

There is no in-flight CLC proposal because (as I recall @Bodigrim once wrote) how the implementation of base is structured, so long as it doesn't reflect behavior / isn't observable, is not a CLC concern. Deciding what modules ought to remain part of base would indeed be this very issue!

@Bodigrim
Collaborator

Bodigrim commented Nov 20, 2022

AFAIU @simonpj asks a very practical question: which changes to base require CLC scrutiny, and which don't. This however has little direct relation to the stable/unstable axis or the base/ghc-base split. Please raise a separate issue if you want to mark some modules of base as stable or unstable. The question of the base/ghc-base split emerges in every other discussion, but it is not something we can act upon immediately or in the short term. Please create a separate thread for it.

@simonpj I think the README is quite unambiguous with regards to your question: "The primary responsibility of CLC is to manage API changes of base package", where API is understood as usual, as a sum of all public functions of all exposed modules. (Last winter, when we discussed this statement, you suggested for CLC to maintain all changes to base at all, while I asked to reduce responsibility only to API changes, so this is actually less, not more than GHC developers asked for)

I'm strongly opposed to leave entire GHC.* namespace outside of CLC consideration, because

  • Some modules (e. g., GHC.Generics) are of fundamental importance for ecosystem, much greater than certain modules under Data.*.
  • Some modules (e. g., GHC.TypeNats) are significantly more stable than modules outside of GHC.* (e. g., System.Posix.Internals).

If GHC developers want to make changes, I suggest we use a three-tiered classification:

  • Tier 1: modules, where any API changes require CLC proposal (currently: all exposed modules, including GHC.*).
  • Tier 2: modules, where only breaking API changes require CLC proposal (currently: none).
  • Tier 3: modules completely excluded from CLC consideration (currently: all unexposed modules as per Cabal file; originally "none").

Would you (or someone else) like to nominate modules to be moved from Tier 1 to Tier 2? GHC.Exts is an obvious candidate.

@Bodigrim Bodigrim added the meta General questions on CLC rules and policies label Nov 20, 2022
@chreekat

chreekat commented Nov 21, 2022

@Bodigrim I think the nomination was that "all GHC modules get moved to Tier 3". You have mentioned a few modules that you don't approve of, which is great input. So if I'm following correctly, the current nomination is, "All GHC modules except GHC.Generics and GHC.TypeNats get moved to Tier 3". But I assume there may be more GHC modules that should stay under CLC review? And is Tier 3 itself too much to ask for?

Those last questions may sound tiresome, and I would agree. Unfortunately, I don't see any way to avoid doing a lot of manual work—on both sides—to approve or disapprove of each of base's 248 modules one-by-one. GHC.* alone has 123 modules.

Perhaps it would be best to just schedule a 1-hour call with representatives from CLC and GHC to step through the list and make judgements? (I don't represent either of those groups, so this is just a suggestion for you all.)

If some modules accidentally end up in the wrong category, it can be fixed over time. The process of moving modules between tiers is inevitable, anyway.

Edit 2023-02-07: the 200+-row spreadsheet now exists 🥳

@hasufell
Member

The process of moving modules between tiers is inevitable, anyway.

The reason I raised the stable vs unstable vs internal modules discussion is exactly this.

My concern is that unstable modules are moved to Tier 3, simply because they're unstable and then may never move back, because the development practice has already crystallized.

But that doesn't seem the right way to me. Instead there should be work to gather requirements, come up with a design and then stabilize those.

Only internal modules should be Tier 3, IMO. So the discussion about Tiers is fundamentally linked to what is internal or just unstable.

@Bodigrim
Collaborator

@chreekat In my opinion the vast majority of base should remain Tier 1, selected modules might be downgraded to Tier 2 and Tier 3 is to remain empty. If someone wants to champion downgrading of certain modules, they are very welcome to motivate such change: why specifically is it tedious/difficult/impractical to adhere to the higher standard of scrutiny?

@gbaz

gbaz commented Nov 21, 2022

Bodigrim -- it is not just scrutiny, it is also stability, 3 release policies, etc.

So suppose the GHC team realizes that a field in RTSStats (https://hackage.haskell.org/package/base-4.17.0.0/docs/GHC-Stats.html#t:RTSStats) is meaningless and can be deleted. Are they required to adhere to the three release policy, or can they just go ahead and delete it, bound only by the PVP?

Or suppose it becomes necessary to extend the internal structure of file descriptors with an additional flag? Is this something that needs to be coordinated with the same stability process as e.g. a change to prelude? (https://hackage.haskell.org/package/base-4.17.0.0/docs/GHC-IO-FD.html#t:FD)

You write: "which changes to base require CLC scrutiny, and which doesn't. This however has little direct relation to stable/unstable axis or base/ghc-base split"

Here I am arguing that both are not exactly true. The CLC scrutinizes base precisely with regards to stability. And if there were a split, modules that are today under the mandate of the CLC might later not be. In Simon's initial question an aspect of the connection between the two, which motivates this, is pretty well laid out: "What is the stable API of the base package, which we (the GHC implementers) try to keep stable, and about which we consult the CLC?"

If you want to argue that the CLC should make an effort to consider all changes, that's fine -- more work for you, not me :-)

That said, I would urge the CLC not apply the same stability policy to all of base.

I also note that modules in base actually do have "stability" fields assigned to them -- the classic modules are marked "stable" -- some other modules (like Data.Functor) that are newer are marked "provisional" and most but not all stuff in the GHC.* namespace is marked "internal". I know that field was never really firmed up in meaning or enforced particularly uniformly, but it is another guidepost to look at.

@Bodigrim
Collaborator

So suppose the GHC team realizes that a field in RTSStats (https://hackage.haskell.org/package/base-4.17.0.0/docs/GHC-Stats.html#t:RTSStats) is meaningless and can be deleted.

This would be a huge breaking change, affecting lots of core tooling, starting from criterion. I think a CLC proposal with an impact analysis would be due.

Or suppose it becomes necessary to extend the internal structure of file descriptors with an additional flag? Is this something that needs to be coordinated with the same stability process as e.g. a change to prelude? (https://hackage.haskell.org/package/base-4.17.0.0/docs/GHC-IO-FD.html#t:FD)

Again, GHC.IO.FD is widely used: unix, ansi-terminal, process, network, haskeline, etc., so breaking changes to it deserve a CLC proposal.

That said, I would urge the CLC not apply the same stability policy to all of base.

Agreed.

@Ericson2314
Contributor

Ericson2314 commented Nov 22, 2022

OK great. Let's start small with categorizing just 1 module, GHC.Base.

This, more than any other module perhaps, should be a private implementation detail. Though it does export individual items which are eventually (via reexports) very public, the GHC.Base interfaces themselves should never need to be used publicly, as they contain far too much stuff messily thrown together.

Worse, it reexports GHC.Prim. This is very bad, as GHC.Prim is directly generated from the primops; in other words GHC.Prim is code that we are unable to paper over. Any attempt to make GHC.Base more stable would require getting rid of the blanket export and hand-curating unlifted functions that may be wrappers around the primops for back compat; in other words, it would require adding another layer of indirection.

We should do an impact analysis, and I suspect we might indeed find that some packages do import GHC.Base. But I think that shouldn't change our answer: we're aiming to reach a prescription, not a description; this module still should be private, and those other packages should import a different module, like GHC.Exts, instead.

How does this sound?

@Bodigrim
Collaborator

We should do an impact analysis, and I suspect we might indeed find that some packages do import GHC.Base.

Hundreds of them!

I don't really understand how the community was supposed to expect that GHC.Base is private. It is marked as exposed-module in Cabal file, there is no label or warning in the description, and it looks a perfectly reasonable, well-documented module. Why would I be shy to import it?

Imposing implications of an unwritten lore, a secret oral tradition onto entire Haskell community is not acceptable. If you want to keep your toys for yourself, do not expose them to everyone (use other-modules, right?) or at the very least slap a big fat warning atop.

Yes, I think it would be much wiser to keep GHC.Base private from the very beginning, but it's too late. Let's stop pretending that no one is using GHC and no one is to be affected, because if we carry on no one will be using GHC indeed.

@Ericson2314
Contributor

Ericson2314 commented Nov 22, 2022

@Bodigrim I think we actually agree here? I am very much against unwritten lores! That is why I keep on wanting to split base!

Hundreds of them!

Heh, I forgot the \ in the Hackage search.

Perhaps everything in base today is in use. Fine. Then we keep exporting those same modules, and add new private ones for the new layer of indirection. But we don't make the implementation of GHC.Base something like module GHC.Prim in the export list, which is action at a distance; we write down exactly what is exported today. We also deprecate that module, even if we can never remove it, because it is still hot garbage.

I suppose at the very least we could add a bunch of other-modules as another first step, sure. We can "retreat" from our current public interface for every leaked private thing, and make sure the leaked public things are just deprecated leaf modules with locked-down interfaces, with the actual implementations in new private other-modules.

@Ericson2314
Contributor

Basically, if base is a "lost case", instead of making a new ghc-base below it, we can make a new standard-library-take-2--no-leaky-guts-this-time-fingers-crossed above it. It is very similar except the name of the packages.

@Bodigrim
Collaborator

Bodigrim commented Nov 23, 2022

To be clear: I'm not against changes in base and I agree that a threshold to break something in GHC.Base could be pretty low, but I'm against sneaky breakage "oh, we thought no one used it". Rest assured that someone did and you owe them a warning and an explanation, which is basically what CLC proposal is for.

So I'm not even completely convinced that base is a lost case (but it's close ;). The very first step would be to document intentions, e. g., educate readers of GHC.Base that it's better to import something else instead. @tomjaguarpaw has recently done this for ghc-prim.

@simonpj
Author

simonpj commented Nov 23, 2022

@simonpj I think the README is quite unambiguous with regards to your question: "The primary responsibility of CLC is to manage API changes of base package", where API is understood as usual, as a sum of all public functions of all exposed modules.

Aha! In my OP for this issue I asked "what is the external API of base"? And you have now replied.

  • The external API of base consists of all the exports of all exposed modules.

Fine. That is a clear answer. But you say it is "quite unambiguous" and I disagree about that. It is clearly obvious to you, but not to me, nor to other people replying on this thread. Can I ask that you add to the CLC your clarifying clause "where API is understood as usual, as a sum of all public functions of all exposed modules."? That would be extremely helpful.

I think this is probably more than you want. It means we will have to consult you about changes to internal functions. But that's fine -- if you don't like it you can change it. At least it is clear.

You go on to suggest

  • Tier 1: modules, where any API changes require CLC proposal (currently: all exposed modules, including GHC.*).
  • Tier 2: modules, where only breaking API changes require CLC proposal (currently: none).
  • Tier 3: modules completely excluded from CLC consideration (currently: none).

I'm a bit baffled by the difference between Tier 2 and Tier 3? If a change does not change the API of an exposed module, including its performance characteristics, surely the CLC is not interested in any way? That is, don't we just have

  • Exposed modules: changes to API need CLC proposal
  • Non-exposed modules: CLC is not interested

Simple! That's what the charter (with the above clarification) says.

@Ericson2314
Contributor

Ericson2314 commented Nov 23, 2022

@Bodigrim Yes we'll need documentation, and we'll also need things users cannot forget to read, like various sorts of deprecations.


I am also confused by tier 2, but not too worried. I think it basically corresponds to "things we wish were private", the CLC doesn't care what extra added junk gets added, but it will care if there are then breaking changes, so beware GHC devs before throwing new things in there!

That seems sort of a historical artifact of base being "overexposed" versus a general principle of library interface design. Fair enough.

@Bodigrim
Collaborator

I think this is probably more than you want. It means we will have to consult you about changes to internal functions. But that's fine -- if you don't like it you can change it. At least it is clear.

You asked to clarify the status quo - and that's what I did. I'm not wedded to it, I'm most keen to hear specific examples, where you suspect this to be more than we can chew.

You go on to suggest

  • Tier 1: modules, where any API changes require CLC proposal (currently: all exposed modules, including GHC.*).
  • Tier 2: modules, where only breaking API changes require CLC proposal (currently: none).
  • Tier 3: modules completely excluded from CLC consideration (currently: none).

I'm a bit baffled by the difference between Tier 2 and Tier 3? If a change does not change the API of an exposed module, including its performance characteristics, surely the CLC is not interested in any way?

The key word is "breaking". If you want to add wombat to a Tier 2 module, or change laziness, or use a different algorithm, or attach a warning, you do not need CLC approval, because all of these are non-breaking changes. Removing something however is breaking and would require an approval. For a Tier 3 module you can do whatever you wish, nuke it in one release and resurrect it in the next one.
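
For illustration only, the tier rules described above can be modelled as a small Haskell function. This is a sketch, not CLC policy text: the names Tier, Change and needsClcProposal are hypothetical, and it takes the "breaking" or "non-breaking" label as given rather than trying to detect it.

```haskell
-- Hypothetical model of the three tiers; all names are illustrative only.
data Tier = Tier1 | Tier2 | Tier3
  deriving (Eq, Show)

-- Whether an API change breaks existing users (e.g. a removal) or not
-- (e.g. an addition, a laziness change, an attached warning).
data Change = NonBreaking | Breaking
  deriving (Eq, Show)

-- Does a change to a module in the given tier need a CLC proposal?
needsClcProposal :: Tier -> Change -> Bool
needsClcProposal Tier1 _           = True   -- any API change
needsClcProposal Tier2 Breaking    = True   -- only breaking changes
needsClcProposal Tier2 NonBreaking = False
needsClcProposal Tier3 _           = False  -- outside CLC consideration
```

Under this reading, adding wombat to a Tier 2 module is needsClcProposal Tier2 NonBreaking, which is False, while removing something from the same module is needsClcProposal Tier2 Breaking, which is True.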

I am also confused by tier 2, but not too worried. I think it basically corresponds to "things we wish were private", the CLC doesn't care what extra added junk gets added, but it will care if there are then breaking changes, so beware GHC devs before throwing new things in there!

Indeed, "things we wish were private; or public, but intentionally unstable". E. g., GHC.Exts belongs to the latter category.

@simonpj
Author

simonpj commented Nov 24, 2022

The key word is "breaking". If you want to add wombat to Tier 2 module, or change laziness, or use a different algorithm, or attach a warning, you do not need CLC approval, because all of this are non-breaking changes.

OK. For what it's worth, I am not seeking a distinction between Tier 1 and Tier 2. Maybe someone else, but I am not. I'm not sure it gains much.

But I think I didn't articulate my real question clearly enough. It is this:

  • Why aren't all un-exposed modules in Tier 3? (You say "currently:none").

Indeed in your tier list, none of the "currently" notes covers un-exposed modules; perhaps an inadvertent omission?

@Bodigrim
Collaborator

Indeed in your tier list, none of the "currently" notes covers un-exposed modules; perhaps an inadvertent omission?

Ah, right, an inadvertent omission indeed. Modules marked as other-modules according to base.cabal are out of CLC scrutiny.
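
As a rough illustration of that distinction, a Cabal library stanza separates the two kinds of module roughly as follows. The module names are hypothetical placeholders and are not taken from base.cabal.

```cabal
library
  -- Importable by users of the package; API changes here are what the
  -- CLC reviews under the reading given in this thread.
  exposed-modules:
    Data.Example
    GHC.Example

  -- Compiled into the package but not importable from outside it,
  -- so changes here are not observable as API changes.
  other-modules:
    GHC.Example.Helper
```

Moving a module from exposed-modules to other-modules is itself a breaking change for anyone who imports it, which is why the thread treats that step as needing approval.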

@Ericson2314
Contributor

I wrote haskellfoundation/tech-proposals#47 which aims to address just this issue, among other problems.

@bgamari

bgamari commented Feb 2, 2023

Thanks to Simon for starting this discussion. I think sorting out hard-and-fast rules in this area will be quite useful, as my mental model when interacting with the CLC for the last several years has been hazy at best:

  • Things specified by the Haskell Report clearly deserve a CLC proposal
  • We try hard to avoid breaking changes to widely-used modules in the GHC.* hierarchy (e.g. GHC.TypeLits, GHC.Generics); changes which are breaking deserve a CLC proposal
  • Most other modules in GHC.* (particularly those marked with Stability: Internal) or .Internal in the name are GHC-internal

A hard rule to clear this up will be quite welcome. In this vein, I agree that splitting a ghc-base out from base is the right path forward. Moreover, I have no reservations about liberally including (and freezing) internal interfaces which are currently widely relied-upon (e.g. GHC.Base) in a stable base (where future changes would require CLC involvement).

On the other hand, we should also take care to exclude from base internal modules which are not relied upon or strongly dependent upon internal implementation. For instance, modules like GHC.TopHandler, GHC.IO.StdHandles, GHC.Fingerprint.Type only exist to serve internal implementation (and in the last case, only exists to avoid module import cycles). Sorting this out will take time.

@Bodigrim
Collaborator

Bodigrim commented Feb 2, 2023

There are many places to discuss the split of ghc-base from base whatever it means; please let's not discuss it here.

We try hard to avoid breaking changes to widely-used modules in the GHC.* hierarchy (e.g. GHC.TypeLits, GHC.Generics); changes which are breaking deserve a CLC proposal

Talking of your specific examples, I strongly believe that it's not enough to scrutinize only breaking changes to GHC.TypeLits and GHC.Generics. These modules are widely used and typically imported unqualified.

Most other modules in GHC.* (particularly those marked with Stability: Internal) or .Internal in the name are GHC-internal

I also strongly believe that GHC developers underestimate how much the community relies on modules they deem internal. There are plenty examples in this thread above.

From my perspective it does not make sense for CLC to oversee the evolution only of a small part of base: the observable effects for consumers, uncontrollably broken by each and every release, would be the same as if CLC never existed. I am happy to resign if this position is not shared by GHC developers.


@bgamari please come up (together with other GHC developers) with a specific set of rules to discuss. Until then the current arrangements are to stand.

@bgamari

bgamari commented Feb 2, 2023

@bgamari please come up (together with other GHC developers) with a specific set of rules to discuss. Until then the current arrangements are to stand.

Indeed, we are planning to discuss this in the coming week. I have no doubt that we can find a solution here which makes sense for users, maintainers, and the CLC.

@Bodigrim
Collaborator

The question was to clarify status quo, and I believe this has been sufficiently answered in the first part of #105 (comment). My view is that the answer unambiguously follows from README, but feel free to raise a PR to elaborate and extend.

When someone gets a specific proposal with regards to the second part of #105 (comment), please raise a new issue.
