Buy-in for Technical Proposal #47 #145

Closed
Kleidukos opened this issue Mar 16, 2023 · 70 comments
Labels
meta General questions on CLC rules and policies

Comments

@Kleidukos
Member

Dear CLC members,

In recent weeks, John Ericson has fine-tuned a Haskell Foundation Technical Proposal to split base into two libraries: ghc-base and base, the latter simply re-exporting everything from ghc-base (for now). You can read about the rationale and specifics in more detail in the proposal itself: haskellfoundation/tech-proposals#47

Note that this proposal has recently been streamlined into a form which is more focused than its initial state, and might be worth a re-read.

The Haskell Foundation Technical Working Group has reached a consensus that this work will benefit the Haskell community. Moreover, the Haskell Foundation has agreed to spend some of its resources to implement this proposal, which would start by ensuring the completion of ghc/ghc!7898.

This work will affect the decisions taken by the CLC. Therefore, the Technical Working Group would like to get buy-in from the committee before formally accepting this proposal. Since you are on the front lines of this matter, it is extremely important for us to know about any reservations or enthusiastic approval on your part. :)

@hasufell
Member

Hi,

what does "buy-in" mean?

I'm a bit confused how this can be accepted by any body other than CLC.

Scope and API of base is a CLC matter. Is the working group expressing their opinion on a potential upcoming vote?

Cheers,
Julian

@Kleidukos
Member Author

what does "buy-in" mean?

We would like to humbly request your judgement and determine how we can get your approval for such a plan

I'm a bit confused how this can be accepted by any body other than CLC.

We have also petitioned the GHC core team and they are discussing their position on this proposal, as this greatly impacts their work as well.

@hasufell
Member

hasufell commented Mar 16, 2023

We would like to humbly request your judgement and determine how we can get your approval for such a plan

My opinion is that this can't be decided unless CLC figures out what base is really for (purpose, scope, etc.) and updates the mission statement accordingly.

CLC is not aligned on this as far as I can tell and doesn't have a joint opinion on whether we would like base to be minimal, etc.

My last attempt at getting clarity in that area didn't go anywhere: #141

Why do I think this is important? Well, because we're laying the foundation for a new approach to base here and I'd find it problematic to vote on it ad-hoc without everyone very clearly understanding that this impacts how we think of base and making sure that thinking becomes a policy and doesn't regress in 5 years.

@chshersh
Member

I appreciate the amount of work spent on refining the proposal but I have trouble seeing the reasoning behind it. Maybe all is good, and the proposal text just needs to be slightly changed to make currently implicit things explicit, but for now I have a few questions.

Problem and suboptimal solution

Currently, the proposal defines only the following problem:

No clear boundary between private/unstable and public/stable interfaces in the standard library.

The proposed solution to this problem is (in my understanding of the proposal text):

  1. Create a separate package ghc-base.
  2. Move everything from base to ghc-base.
  3. Reexport entire ghc-base from base.
  4. Hope that in the future someone will work on CLC proposals to maybe deprecate/remove/decouple.

This sounds like a massive waste of time and money for solving the problem as stated.

A much-much simpler solution in my vision is to put a single documentation line e.g. "This module is internal for GHC and its API may change in a backwards-incompatible way" to all GHC-specific modules. I don't see this option in the section of alternative solutions with the corresponding comparison.
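
As a rough sketch of what such a documentation line could look like in practice, here is a Haddock module header carrying the stability note. The module (`Main`, so that the file runs on its own) and the helper `internalHelper` are hypothetical stand-ins, not real base modules, and the wording of the note is simply the one suggested above.

```haskell
-- |
-- Module      :  Main
-- Stability   :  internal
-- Portability :  non-portable (GHC-specific)
--
-- This module is internal for GHC and its API may change in a
-- backwards-incompatible way.
--
-- (Hypothetical example: a real GHC-specific module in base would carry
-- the same kind of note in its own header.)
module Main (main, internalHelper) where

-- | Illustrative stand-in for a GHC-internal function.
internalHelper :: Int -> Int
internalHelper n = n + 1

main :: IO ()
main = print (internalHelper 41)
```

The note only reaches readers of the rendered Haddock page or the source; nothing here is enforced by the compiler.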

base-compat

Again, this is not written in the proposal explicitly, but from the discussion I have an impression that one of the final goals is to decouple base from GHC, so Haskell users can upgrade to a newer version of the compiler without upgrading to a newer version of the standard library.

I find this goal admirable but it's not written explicitly. So if it's actually one of the goals for the proposal, I would recommend the following:

  • Write this problem explicitly
  • Compare the ghc-base split approach to base-compat, which has existed for 11 years, already has 161 direct and 7394 indirect dependencies, and doesn't require massive upfront work (base-compat is currently not mentioned in the proposal at all).

I would like to see the above two points addressed before I can vote on this proposal as a CLC member.

@Ericson2314
Contributor

@hasufell my proposal does not change the interface of base. It is just about changing the implementation. On prior occasions @Bodigrim has in fact said the opposite: that the CLC should not weigh in on this sort of thing at all.

So consider this thread not an attempt to bypass the CLC, but an attempt to figure out which parts of this the CLC is to sign off on, and to formally get the CLC to sign off on those parts.

There still needs to be an HF proposal because the proposal explicitly has HF paying for things, but that doesn't mean the CLC needs to be bypassed in the slightest.

@tomjaguarpaw
Member

A much-much simpler solution in my vision is to put a single documentation line e.g. "This module is internal for GHC and its API may change in a backwards-incompatible way" to all GHC-specific modules.

Strong agree. This is the first step we should take before embarking on anything more elaborate. Hopefully it's quick and easy! But we should do it.

@hasufell
Member

hasufell commented Mar 16, 2023

@Ericson2314 the MR in its current form changes neither the scope nor the API of base. However, that is the ultimate goal: splitting base. Or is it not? (See point 4. @chshersh raised)

So my opinion is this needs CLC approval regardless of whether any API is currently affected or not.

This is about deciding on the future of base. CLC doesn't have a joint opinion on the future. So my concern is that we may (e.g. accidentally) regress if we decide on all matters of base on an ad-hoc basis.

@tomjaguarpaw
Member

A much-much simpler solution in my vision is to put a single documentation line e.g. "This module is internal for GHC and its API may change in a backwards-incompatible way" to all GHC-specific modules.

Strong agree. This is the first step we should take before embarking on anything more elaborate. Hopefully it's quick and easy! But we should do it.

By the way I already did the equivalent for ghc-prim. I'm afraid I don't have the bandwidth to take on the same job for base.

@Ericson2314
Contributor

@hasufell

Why do I think this is important? Well, because we're laying the foundation for a new approach to base here and I'd find it problematic to vote on it ad-hoc without everyone very clearly understanding that this impacts how we think of base and making sure that thinking becomes a policy and doesn't regress in 5 years.

I think the "new approach" part is actually pretty base-agnostic --- it is more about decoupling (stable, intended for public consumption) library major bumps and compiler major bumps in general.

I have already started writing something on that with much the same thought of it being good to nail down policy and vision separate from implementation.

@Ericson2314
Contributor

@chshersh

A much-much simpler solution in my vision is to put a single documentation line e.g. "This module is internal for GHC and its API may change in a backwards-incompatible way" to all GHC-specific modules. I don't see this option in the section of alternative solutions with the corresponding comparison.

@tomjaguarpaw

Strong agree. This is the first step we should take before embarking on anything more elaborate. Hopefully it's quick and easy! But we should do it.

So let's leave aside existing modules for the moment.

Why should new GHC-specific implementation details go into base at all? Why is it good to have a "divided Berlin" library serving two different purposes? (Stable standard library for regular users, misc exposed implementation details for adventurous users.)

I think the GHC devs need some place to put random junk that is out of the way, rather than constantly be butting heads with the CLC on this.

Existing modules still need to be triaged as to who is using them, where they belong, etc. etc. but this has proven to be contentious and even without contention I would submit it is actually a lot of work.

So I do think the policy parts @hasufell mentions are important to handle up front --- do we actually want separate homes for unstable implementation-specific stuff vs stable implementation-agnostic stuff? But once that is agreed upon the triaging and dividing can happen over time with a clear goalpost in sight.


@chshersh

base-compat

Again, this is not written in the proposal explicitly, but from the discussion I have an impression that one of the final goals is to decouple base from GHC, so Haskell users can upgrade to a newer version of the compiler without upgrading to a newer version of the standard library.

I find this goal admirable but it's not written explicitly. So if it's actually one of the goals for the proposal, I would recommend the following:

  • Write this problem explicitly
  • Compare the ghc-base split approach to base-compat (https://hackage.haskell.org/package/base-compat), which has existed for 11 years, already has 161 direct and 7394 indirect dependencies, and doesn't require massive upfront work (base-compat is currently not mentioned in the proposal at all).

Yes great points. Sorry this motivation got lost when the proposal was split. base-compat is very much worth mentioning and I am sorry I forgot to do so!

@tomjaguarpaw
Member

I think the GHC devs need some place to put random junk that is out of the way, rather than constantly be butting heads with the CLC on this.

Strong agree with this too.

@chshersh
Member

Why should new GHC-specific implementation details go into base at all?

It's a mystery to me why modules with GHC-specific implementation details happened to be in base in the first place. Was there a particular technical limitation why they couldn't be in the ghc package?

do we actually want separate homes for unstable implementation-specific stuff vs stable implementation-agnostic stuff?

Since the amount of work to split GHC internals from base to a separate package is huge, there should be strong justification to do so. I find the existing justification in the form of "bigger packages are worse than smaller packages" and "it feels like GHC internals should be in a separate package" too weak to justify the amount of work.

I think the GHC devs need some place to put random junk that is out of the way, rather than constantly be butting heads with the CLC on this.

Strong agree with this as well. But is it a technical problem or a communication one? If GHC devs want to exempt some base modules from the CLC supervision, they can just create an issue here with the proposed list of modules to be excluded. We're all reasonable people here and I don't think there'll be any objection to this.

But if there's a technical limitation (e.g. having GHC-internals in base prevents decoupling of base from GHC), then it would be great to have this written in the proposal explicitly.

@hasufell
Member

We're all reasonable people here and I don't think there'll be any objection to this.

Keep in mind that this effectively means that even Haskell's standard library can't adhere to PVP.

Unless you mean "exempt from CLC supervision, but still bump version according to PVP". At which point I'm not sure anymore what's worse.

@chshersh
Member

Keep in mind that this effectively means that even Haskell's standard library can't adhere to PVP

base currently bumps up a major version with every GHC release. I don't recall any API breaking changes between minor releases of base. So technically, it adheres to PVP.

Moreover, even if it's not written in PVP explicitly, it's a common practice to have Internal.* modules which don't follow PVP. This practice is rather new compared to the entire life of the Haskell community, and we have to deal with legacy and decisions of previous people here. So I personally would be okay if there was just a documentation note in each internal module saying that this module is internal.

Of course, this depends on specific modules. I wouldn't want sudden breaking changes to GHC.Generics but I'm absolutely okay with breaking changes in GHC.InfoProv or GHC.Stack.CloneStack. Hence my suggestion to propose a list of GHC modules to be exempted.

@hasufell
Member

base currently bumps up a major version with every GHC release

You mean it does so for every major GHC release: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/libraries/version-history

Moreover, even if it's not written in PVP explicitly, it's a common practice to have Internal.* modules which don't follow PVP.

I find that this is not good enough and the wrong approach.

PVP is governed by CLC and hackage trustees. Please raise a proposal for PVP before we silently bypass that channel as de-facto standard in base.

It breaks tools, it breaks user expectations. Especially if it's a line in some haddock documentation probably no one ever reads.

Making GHC aware of internal modules and being able to emit a compile time warning sounds much more exciting.
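
For reference, GHC already supports attaching a WARNING pragma to a module header, which makes the compiler print the message when another module imports it. The following is a minimal sketch under stated assumptions: the module is simply `Main` so the file runs on its own, `unstableHelper` is a hypothetical stand-in for an internal function, and the message text is illustrative. Whether this mechanism would suit base is not settled in this thread.

```haskell
-- Hypothetical internal module. The pragma sits between the module name
-- and the export list; the message is shown to importers at compile time.
module Main
  {-# WARNING "GHC-internal module: its API may change without a major version bump." #-}
  (main, unstableHelper) where

-- | Illustrative stand-in for an unstable, GHC-internal function.
unstableHelper :: String -> String
unstableHelper s = "internal:" ++ s

main :: IO ()
main = putStrLn (unstableHelper "demo")
```

Unlike a Haddock note, this surfaces at the point of use, though it still leaves the module inside the same package.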

But as of today, I'm neither excited about the .Internal, nor the haddock suggestion. At least for base.

@hasufell
Member

Almost forgot the proposal already exists: haskell/pvp#46

@Ericson2314
Contributor

Ericson2314 commented Mar 17, 2023

@chshersh

It's a mystery to me why modules with GHC-specific implementation details happened to be in base in the first place.

Agreed. The history here is complex in that I think at one point base did have CPP for e.g. GHC vs Hugs. But still odd!

Was there a particular technical limitation why they couldn't be in the ghc package?

Yes. The ghc package is the compiler. Regular programs need to depend on these GHC-specific implementation details underneath the hood but should not have to link the entire compiler.

Since the amount of work to split GHC internals from base to a separate package is huge

What is being proposed at this time is not huge. It is a first step to unblock a long incremental process of untangling. It is OK to get partway through that process and then stop --- e.g. because more new language features are needed to "liberate" the remaining code in ghc-base which ought to be implementation agnostic and live in base.

But if there's a technical limitation (e.g. having GHC-internals in base prevents decoupling of base from GHC), then it would be great to have this written in the proposal explicitly.

I am confused, this sounds like a tautology to me?


I think the cost issues are the most important thing above for us to work out @chshersh.

I agree we can't just pay for untangling everything up front, but that doesn't mean there is no value in kicking off a smooth incremental process. This is the sort of thing where we just need to learn more to understand what is easy and what is hard, but we can't do that very well today.

I am quite confident that at least some things can easily be separated from ghc-base, made implementation agnostic, and moved back into base. And that's the milestone I am eager to reach. Getting there will "broaden our horizons" and make it easier to imagine and scope out what untangling the rest looks like.

@Ericson2314
Contributor

And yes I agree with @hasufell --- documentation is not good enough, because people generally only read docs when they get stuck. And if the modules are in base and public, there is little "speed bump" (imagine HLS automatically adding import statements!) and no reason to read the stability docs.

Arguably one could also imagine tooling that automatically extends build-depends too, and it is the same problem, but I think for social reasons people will find inter-package vs intra-package boundaries much more salient.

@hasufell
Member

@Ericson2314 yes. But... my major concern is that we're drawing API boundaries based on projects (or: people) and not based on what makes sense for the end user.

How do we envision this working long-term? Will API flow back into base? Under what circumstances?

Forgive me for being really frank here, but I don't see GHC HQ pushing for stable API anytime soon. Once we have a split and an "internal" playground, I expect very frequent breakage in the split out base part. And yet people will likely use it too.

And here's my question: will this serve GHC development, will it serve base, or will it even serve the end user?

I feel this is really hard to answer.

@tomjaguarpaw
Member

And yes I agree with @hasufell --- documentation is not good enough

I also agree it's not enough. I don't think @chshersh was saying that either. But if we know certain APIs are unstable and we don't take the most simple and straightforward means of communicating that to the user, then I think we're derelict in our duty. Let's not let perfect be the enemy of good here: we should document what we know to be unstable whilst also looking for a good, general, technical solution.

@gbaz

gbaz commented Mar 17, 2023

Strong agree with this as well. But is it a technical problem or a communication one? If GHC devs want to exempt some base modules from the CLC supervision, they can just create an issue here with the proposed list of modules to be excluded. We're all reasonable people here and I don't think there'll be any objection to this.

Some backstory which helps underline the motivation here: #105

In that lengthy discussion the idea that some modules be exempted was floated, and the conclusion from the CLC was that this is not a good idea. A lot of the motivations in all directions are discussed there, along with a fair amount of useful empirical investigation. While the approach of denoting some modules as internal, or some functions as such, was discussed there, and while I was indeed very sympathetic to it, I found the discussion pretty convincing and I understand why it was not approved of.

On to @hasufell's question, I think the proposal will serve all three arenas. Here are some (non-exhaustive) reasons. (Though I find it a bit odd to speak of serving ghc devs and end users -- both groups of people -- and also of serving "base" -- which is a body of code, and for now, last I checked, not sentient).

  1. It will serve GHC developers by giving them a place where they can put functions they need to provide but do not intend to be used widely.

  2. It will serve the base library by disentangling "intended base" (a potentially compiler-independent set of primitive and fundamental functionality) from "accidents of ghc implementation".

  3. It will serve end users by allowing them to reinstall the base library and hence decouple upgrades to make use of new base functionality from upgrades to make use of a new compiler.

I imagine it would also serve the CLC by making it possible for it to focus on the exposed functionality of "genuine base" without also having to directly worry about additional things in ghc-base.

@Ericson2314
Contributor

@hasufell

But... my major concern is that we're drawing API boundaries based on projects (or: people) and not based on what makes sense for the end user.

I agree that most package divisions are done for Conway's law reasons, and that's not good. Long ago, the same people worked on the GHC agnostic and specific parts of base; it was one repo. Now there are separate groups; it can be two repos.

This is indeed not reason enough!

The actual goal is:

  • base is:
    • stable, GHC version agnostic
      • and if we ever have a serious living non-GHC implementation, also portable to that
    • users are encouraged to use it
  • ghc-base is:
    • unstable, GHC-version specific
    • users are discouraged from using it unless they wish to heavily take advantage of specific bleeding-edge experiments.

This is reasoning separate from who is working on what. I think it's good! :)

How do we envision this working long-term? Will API flow back into base? Under what circumstances?

Yes I hope it will. But to be clear we don't have to wait for that before the benefits kick in.

Forgive me for being really frank here, but I don't see GHC HQ pushing for stable API anytime soon.

Per the above, I think that is perfectly fine! ghc-base is unstable with major version churn. base is stable and as soon as possible (once bin dist vs source dist versioning is untangled) does not have major version churn.

Once we have a split and an "internal" playground, I expect very frequent breakage in the split out base part.

That is fine!

And yet people will likely use it too.

That's on them.


Here is the crucial bit: Anything reexported from ghc-base in base is not part of the playground. Those things are squarely in CLC review, and ghc-base authors must not break those things.

Clearly, having "delicate porcelain" in the "playground" is no fun, and having to remember which parts are reexported is also no fun. So for the sake of having a proper sandbox where one doesn't have to be so careful, I think they will be likewise incentivized to try to move that functionality back to base. Which is just the incentive we want!

They also are incentivized to remove "grandfathered in" unstable stuff from base as soon as possible so it only lives in ghc-base. The CLC however is naturally worried about breakage from uses in the wild. So this will be more of a tug-of-war. I am happy to sit back and watch that no matter what the outcome is :D.

@Ericson2314
Contributor

@tomjaguarpaw Definitely base should be triaged, and if something cannot be moved somewhere else, adding a note in the docs is better than nothing. No disagreement! I don't mean for this process to slow that down. This can proceed in parallel until the initial split is merged, and then it simply opens up more possibilities for that triage process which will hopefully help it go smoothly.

As @gbaz says, it seems like that is the lowest hanging fruit, but after the conversation in #105 it isn't clear that it actually is! I get that doing a technical project to "hack around" not being able to do a simple documenting task might seem like a "cop out", but in this case I think it is fair to say at least establishing the long-term goal and direction (namely, base should not contain unstable implementation-specific bonus functionality) is a good thing to settle before the triaging process.

@bgamari

bgamari commented Mar 17, 2023

@chshersh,

A much-much simpler solution in my vision is to put a single documentation line e.g. "This module is internal for GHC and its API may change in a backwards-incompatible way" to all GHC-specific modules. I don't see this option in the section of alternative solutions with the corresponding comparison.

Indeed this is a useful first step which we have proposed in #146. This is something that I have been quietly working on in the background for some weeks now; I do wish I could have opened #146 before this issue arose to avoid much of the confusion in this ticket, but sadly things did not work out that way.

However, to be clear, I personally think that splitting internal modules into a new ghc-base is a very good idea. As others have pointed out, the PVP currently has no mechanism (see haskell/pvp#8) for designating modules as excluded from versioning; this means that under the status quo it is impossible for us to version base properly.

To make matters worse, experience has shown that users tend to treat exposed modules as "stable" regardless of what documentation says. Consider, for instance, the fact that a significant fraction of the modules which we are proposing to be internal are already documented as being "internal" via their Haddock documentation. Nevertheless, many have significant numbers of users despite, in many cases, there being more stable places to import the provided functionality from.

Drawing a clear package boundary between base and those things provided by GHC would make it far more obvious to users when they are treading into less stable territory.

@tomjaguarpaw
Member

I should probably clarify because I probably wasn't clear: I believe that modules that we know to be internal/unstable should be clearly documented as such whilst we pursue other options. I don't believe that documentation makes the other options unnecessary, and I'm in support of the direction of travel of @Ericson2314 @bgamari and others.

@cartazio

@Ericson2314 would this also remove the no haddock pragmas from those internal modules?

@bgamari

bgamari commented Mar 17, 2023

base currently bumps up a major version with every GHC release. I don't recall any API breaking changes between minor releases of base. So technically, it adheres to PVP.

This is really a matter of definition. I would have to go digging for examples but I am certain that there are cases where we have broken the PVP in minor releases due to changes in implementation. I think in practice we do a reasonably good job at versioning public interfaces but technically, when one includes the GHC.* modules, I think it is hard to argue that base is adhering to the PVP.

Since the amount of work to split GHC internals from base to a separate package is huge, there should be strong justification to do so. I find the existing justification in a form of "bigger packages are worse than smaller packages" and "it feels like GHC internals should be in a separate package" too weak to justify the amount of work.

To be clear, the amount of work needed to break out ghc-base from base is not massive. It's mostly moving files, some build system changes, and a bit of work in the compiler to change wired-in module names.

The only open question that comes to mind is how we ensure that error messages continue to refer to base and not ghc-base. I haven't yet given much thought to this but this seems like a manageable problem.

Strong agree with this as well. But is it a technical problem or a communication one? If GHC devs want to exempt some base modules from the CLC supervision, they can just create an issue here with the proposed list of modules to be excluded. We're all reasonable people here and I don't think there'll be any objection to this.

Indeed, see #146. However, I do still believe there is value in having a package distinction. As noted elsewhere, documentation tends not to be seen, is not easily auditable (mechanically or otherwise), and can fall out of date (e.g. there are a significant number of quite stable modules in base whose "stability" documentation still says "provisional" or similar).

Forgive me for being really frank here, but I don't see GHC HQ pushing for stable API anytime soon. Once we have a split and an "internal" playground, I expect very frequent breakage in the split out base part. And yet people will likely use it too.

I'm not sure I understand what you mean, @hasufell. Until very recently we have had precisely the split that you describe (where our playground was largely inside of GHC.*) and yet we have not broken things with the abandon that you suggest, particularly in interfaces where we expect widespread usage (intended or not).

How do we envision this working long-term? Will API flow back into base? Under what circumstances?

If an interface is intended to be public (note that many are not) and it is possible to provide the degree of stability which GHC's users expect, then I would indeed expect that the interface will be propagated back to base.

@phadej

phadej commented Mar 17, 2023

@bgamari

I would have to go digging for examples

Mention one or two. Otherwise that point is easily dismissed.

@bgamari

bgamari commented Mar 17, 2023

@bgamari

I would have to go digging for examples

Mention one or two. Otherwise that point is easily dismissed.

Happily, I was unable to find even a single example of this in the last few releases.

However, the principle stands: if we need to change, e.g., the type of GHC.Desugar.toAnnotationWrapper in a minor GHC release to fix a soundness issue then we should do so without putting the rest of the ecosystem through the pain of a major bump in base version. An unfortunate situation where something of this nature would be necessary is not hard to imagine; we should be anticipating it in our library design, regardless of whether or not it has happened in the recent past.

@adamgundry
Member

There is no reason why GHC developers cannot put their new internal, unstable modules into a new package, say, ghc-junk (or ghc-prim), and just leave base as is. They can literally start doing it today. This did not happen before just because the control environment was derelict and there was no resistance against piling everything in base.

What happens when GHC devs need to add an internal definition that (a) depends on parts of base and (b) is depended upon by other parts of base? One could transitively move more things into ghc-junk, but then it ends up looking very much like ghc-base...

The ghc-junk plan is magnitudes easier than the proposed split of base.

I'm not convinced this is true. You seem to think the proposed split of base will be extremely expensive, but I don't see why?

@Bodigrim so you are ok with violating PVP in base through Internal module exemption?

What in the statement above makes you think that I am? I'm reluctant to violate PVP anywhere, and base seems to have quite a good track record?..

It has a good record of following PVP because the major version of base is bumped with every GHC release. This proposal would allow successive GHC releases to change internals but maintain the same base API (or only a minor bump), thereby reducing churn in the ecosystem.
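
To make the versioning argument concrete, here is a small sketch of how the PVP classifies changes, assuming a four-component version A.B.C.D where A.B is the major version. The `Change` type, the `bump` function and the starting version are illustrative inventions for this example, not part of any real tooling, and bumping B is only one of the compliant choices for a breaking change.

```haskell
module Main where

-- | The three kinds of change the PVP distinguishes between releases.
data Change
  = Breaking     -- ^ something was removed, or a type or behaviour changed
  | Addition     -- ^ only new entities were added
  | NoApiChange  -- ^ the exposed API is unchanged
  deriving (Eq, Show)

-- | A four-component version A.B.C.D, where A.B is the PVP major version.
type Version = (Int, Int, Int, Int)

-- | One PVP-compliant way to pick the next version number.
bump :: Change -> Version -> Version
bump Breaking    (a, b, _, _) = (a, b + 1, 0, 0)
bump Addition    (a, b, c, _) = (a, b, c + 1, 0)
bump NoApiChange (a, b, c, d) = (a, b, c, d + 1)

main :: IO ()
main = do
  let base = (4, 18, 0, 0)  -- illustrative starting version
  -- Status quo: a changed type in an exposed GHC.* module breaks base's API.
  print (bump Breaking base)     -- (4,19,0,0)
  -- Assumed post-split world: the change lands in ghc-base, base's API is untouched.
  print (bump NoApiChange base)  -- (4,18,0,1)
```

Under that assumption, the breaking bump would fall on ghc-base alone, which is the reduction in churn described above.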

This section omits the discussion of governance of ghc-base. The default assumption is that CLC remains in charge of both, is it the case? Otherwise if base re-exports ghc-base, but ghc-base is out of CLC control, it's impossible for CLC to fulfill its duty.

I would assume that ghc-base would be under the control of the GHC devs, subject to the proviso that changes affecting base re-exports are still the province of the CLC.

@Bodigrim
Collaborator

As a point of order, this thread is not the best place to discuss ideas around or possible future developments of the HF proposal. It is a request to buy into a specific written text, and I am unwilling to get behind uncertain speculations.

What happens when GHC devs need to add an internal definition that (a) depends on parts of base and (b) is depended upon by other parts of base?

If a new definition is depended upon by other parts of base then a CLC proposal is due anyway and we can use it to stabilise this definition and put it into base. Because otherwise each change to this entity will require a CLC proposal to change affected parts of base.

I do not quite enjoy discussing theoretical possibilities, there are over 9000 of them. Please come up with a realistic scenario and explain it in the proposal.

I'm not convinced this is true. You seem to think the proposed split of base will be extremely expensive, but I don't see why?

Maintaining one library is hard, maintaining two libraries in sync is hard squared.

The burden of proof lies with the proposer. The proposal does not discuss maintenance costs and makes no attempts to convince readers that they are somehow negligible.

This proposal would allow successive GHC releases to change internals but maintain the same base API (or only a minor bump), thereby reducing churn in the ecosystem.

As a Hackage trustee and a maintainer of multiple core packages, I find this argument very weak, but it does not matter because the current proposal does not seem to reference it.

I would assume that ghc-base would be under the control of the GHC devs, subject to the proviso that changes affecting base re-exports are still the province of the CLC.

In my opinion this is not a workable arrangement. It makes it too easy for GHC contributors to break things inadvertently, to the degree that I would find the CLC's position untenable.

It also makes it too tempting to put something generally useful in ghc-base only, just to avoid CLC scrutiny. Which will in turn make it tempting for external users to depend on ghc-base, eliminating any potential benefit of the proposal. Proof? We've already seen it with ghc-prim.

@bgamari

bgamari commented Mar 29, 2023

We've already seen it with ghc-prim.

I am curious to know what concretely this is referring to. The GHC maintainers have never, as far as I know, placed something in ghc-prim to avoid CLC scrutiny. I find it a bit surprising that this would be suggested.

@Bodigrim
Collaborator

I am deeply sorry, my rhetoric got in the way of clarity. The quoted passage relates to the previous sentence only, not the entire paragraph.

I was referring to the fate of ghc-prim, which was meant to be an internal package only (even more internal than the proposed ghc-base) and yet ended up being widely used in the wild because certain useful entities from it were (inadvertently) not re-exported from base; getSolo is the latest instance of this saga. Controlled co-evolution of two packages requires a huge amount of dedication and effort, even more so for the hypothetical split of base.

@hasufell
Member

I was referring to the fate of ghc-prim, which was meant to be an internal package only (even more internal than the proposed ghc-base) and yet ended up being widely used in the wild because certain useful entities from it were (inadvertently) not re-exported from base;

Was ghc-prim a coordinated effort?

The entire point of the proposal is to make it coordinated and formalize the approach, so that things don't randomly evolve.

I find your stance a bit hard to follow... somehow you seem to point out that random evolution is problematic, but also suggest that coordinated effort is too much effort.

I think we'll be doomed to pick one.

Does the current proposal sufficiently describe what that coordination looks like? No. But it could.

@Ericson2314
Contributor

I would also say in general, take the non-normative parts with a grain of salt. They perhaps still reflect my own opinions more than a general consensus. I hope we can reach an agreement on the normative parts, and then I am fine with any outcome for the rest.

@bgamari

bgamari commented Mar 30, 2023

I am deeply sorry, my rhetoric got in the way of clarity. The quoted passage relates to the previous sentence only, not the entire paragraph.

No worries. Indeed communications can be tricky.

Controlled co-evolution of two packages requires a huge amount of dedication and effort, even more so for the hypothetical split of base.

Yes, this is true and admittedly GHC has not been as careful in this regard as we could have been. However, I do think it is reasonable and feasible to draw a clear line between the interfaces provided by Haskell-the-language and GHC-the-implementation.

For instance, the Haskell report specifies that an implementation should provide a type named Handle. However, it says nothing about the representation or operations supported by that type beyond a select few. While the operations which are specified by the Report are sufficient for most programs, there are cases where the user must rely on GHC's particular implementation (e.g. the bytestring and network libraries). By separating these two interfaces with a package boundary we will reduce the need for major version bumps in base and consequently reduce the impact of GHC releases on the majority of users.
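To make the distinction concrete, here is a minimal sketch of a program that stays entirely on the Report side of that line: it treats Handle as an abstract type and uses only operations exported by System.IO. The helper names countLines and report are made up for illustration. Libraries such as bytestring, by contrast, need access to GHC's concrete handle representation, which is assumed here to be the kind of interface that would sit behind the package boundary.

```haskell
module Main (main) where

import System.IO (Handle, hGetLine, hIsEOF, hPutStrLn, stderr, stdin)

-- Count the lines available on a handle using only Report-level operations.
-- Nothing here depends on how GHC represents a Handle internally.
countLines :: Handle -> IO Int
countLines h = go 0
  where
    go :: Int -> IO Int
    go n = do
      eof <- hIsEOF h
      if eof
        then pure n
        else hGetLine h >> (go $! n + 1)

-- Hypothetical helper: format the result for display.
report :: Int -> String
report n = "lines read: " ++ show n

main :: IO ()
main = do
  n <- countLines stdin
  hPutStrLn stderr (report n)
```

A program like this would, in principle, be unaffected by changes to GHC's handle internals, which is the class of user the reduced version churn is meant to benefit.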

@bgamari

bgamari commented May 9, 2023

For what it's worth, I have recently stumbled across a use-case where ghc-base would be extremely useful. Specifically, the exception backtrace proposal proposes that exceptions carry stack backtrace information represented by GHC's new StackSnapshot# type, among other representations. It further proposes that backtraces can be decoded and formatted for display to the user. The logic for decoding such stack snapshots currently lives in the ghc-heap package, which cannot be depended upon by base (since this would induce a dependency cycle).

One way to address this would be to move the implementation of ghc-heap into ghc-base and expose it as a set of internal modules. These could then be used by the exception machinery in ghc-base/base and re-exported from ghc-heap (which would continue to be their canonical home).

@phadej

phadej commented May 9, 2023

I commented on the same @bgamari comment elsewhere.

#146 (comment) and next few comments.

I think that will be bad. I probably would just use ghc-heap stuff directly from base if I'm not prevented by tools (and HLS helpfully adding imports for me).

TL;DR trying to solve short term problem making more long term ones. Not great.

@bgamari

bgamari commented May 9, 2023

I think that will be bad. I probably would just use ghc-heap stuff directly from base if I'm not prevented by tools (and HLS helpfully adding imports for me).

I think you may have misread the comment, @phadej. I did not suggest that ghc-heap would be available from base. I rather suggested that it would be exported from ghc-base, imported by base (for use in exception backtrace formatting), and re-exported by ghc-heap.

@phadej

phadej commented May 9, 2023

I think you may have misread the comment

You're right. Your comment looked too similar to the other one, and it confused me.

@Ericson2314
Contributor

Ahead of the next HF technical working group meeting (Thursday) I am trying to figure out what's going on in these mad related threads, and it is confusing. I hope we can draw this to a close soon.

@hasufell
Member

Ahead of the next HF technical working group meeting (Thursday) I am trying to figure out what's going on in these mad related threads, and it is confusing. I hope we can draw this to a close soon.

I think the main proposed alternative in terms of "base split" is to do a gradual approach: instead of moving everything out and making base a reexport shim, we just let GHC devs create a boot package and do innocent stuff there without touching base (doesn't need CLC approval at that stage). The next stage could be:

  • move "portable" internals out of base (stuff that other base API doesn't depend on)

The last stage could be:

  • move both base code and internals into that package (in case they are intertwined)

All that would be negotiated in isolation, when it is actually proposed. And it could start without CLC involvement.

The upside of this is that it won't burn out CLC and is a conservative approach, that only requires CLC involvement at certain stages (that may or may not happen). The downside is that it's unclear whether we'll ever get a proper "split" and achieve the "base is reinstallable" goal.

@simonpj

simonpj commented May 10, 2023

I think the main proposed alternative in terms of "base split" is to do a gradual approach:

I like the gradual approach too.

But I have a technical question. Provided the modules, exports, and API of base are maintained, isn't it immaterial to clients (and hence to CLC) exactly how they are maintained? It could be:

  1. All of the code is in base and ghc-base starts empty, or
  2. All of the code is in ghc-base and base is a bunch of shims

This decision is just an implementation matter, isn't it?

Of course, under (2) any changes in ghc-base that affected the API of base would require CLC agreement. It's an implementation matter where the function is actually defined, though.

Maybe there are issues I'm not aware of. It'd be great to lay them out.

Incidentally in another thread you drew attention to https://nikita-volkov.github.io/internal-convention-is-a-mistake/, describing it as the proper pattern for exposing internal modules. It's great that this ghc-base proposal precisely follows this advice. Thanks for pointing to it.

@Ericson2314
Contributor

Thanks, both of you.

@hasufell

move "portable" internals out of base (stuff that other base API doesn't depend on)

Do note that if we move things up and out of base, then we are exposing our refactorings to end users much more noticeably.

The only reason I proposed moving things "down" to ghc-base first was to be able to move them back "up" afterwards while keeping everything exported by base, so all the reshuffling can happen behind the scenes.

The upside of this is that it won't burn out CLC and is a conservative approach, that only requires CLC involvement at certain stages (that may or may not happen).

Absolutely this is a goal. I didn't go into this expecting to drag in the CLC much at all, and I think something got lost in translation.


Based on @hasufell's response, I think I might have spotted a source of what (to me) is causing unexpected acrimony. I think it is the fact that ghc-base is sort of playing double-duty in the proposal as written:

  • ghc-base is the temporary holder of code that we have yet to work through (the mass move and reexport)
  • ghc-base is the final holder of code that we think is more the GHC team's responsibility.

I've tried to make clear that anything reexported as-is by base is something that CLC still has full say over, but perhaps that is too fine a point. I concede that at a glance the proposal looks like this:

  1. We give a bunch of code to GHC devs, putting CLC in a weird position
  2. CLC later gets a chance to take that code back.

To make it absolutely clear that is not the intention, perhaps we shouldn't use ghc-base for both purposes. We could instead have 3 layers:

  1. base: What regular users use, CLC's domain
  2. base-untriaged: everything is moved here to start.
  3. ghc-base, ghc-prim, etc. GHC's domain

This makes clear that code that has yet to be untangled is not biased towards either one of GHC devs or the CLC: it is still in the middle and has yet to be worked through, and everyone must be cautious when changing it.

Additionally, this allows the "peel off the bottom" and "peel off the top" approaches to proceed concurrently. While I suspect "peel off the top" to work much better in practice (and hence I didn't propose an empty ghc-base from the get-go), it is nice to not have to guess ahead of time which we expect to be more fruitful, and instead allow both.

@mixphix
Collaborator

mixphix commented May 10, 2023

Adding another layer doesn't solve the problem, it only makes it more confusing. ghc-base doesn't even exist yet, and you're already suggesting the same solution (split off another module) in order to deal with the problem we haven't even finished making. Is that really the path we would like to tread?

@Ericson2314
Contributor

Then think about just a ghc-base -> base-untriaged rename.

Whatever ghc-base is is TBD, and not the CLC's problem ahead of time. If and when something looks like it should be peeled off base-untriaged and given a new ghc-* home, the GHC devs can create a CLC proposal (since code is leaving base-untriaged) and argue the merits of that specific item.

It is perfectly possible that won't happen, and we'll just get stuff peeled off the top of base-untriaged and put back into base.

@gbaz

gbaz commented May 10, 2023

I understand the conceptual desire for three layers, but as a practical matter I agree two is plenty. I think you're right to note that ghc-base will be doing double duty -- but moving stuff for conceptual purposes will I think confuse both public and intra-committee messaging further.

Let me point out a third component as well on top of the other two, which I think would make a further split even more of a pain. You said

  • ghc-base is the temporary holder of code that we have yet to work through (the mass move and reexport)
  • ghc-base is the final holder of code that we think is more the GHC team's responsibility.

On top of that, we have

  • ghc-base is the final holder of code that necessarily needs to be non-reinstallable and coupled to ghc (even should some of it be re-exported).

That said, between the "move and reexport" (ghc-base is initially "overpopulated") and the "create without moving" (ghc-base is initially sparsely populated) approaches, I think both could work with ultimately similar end results, and it's just a matter of which people think creates the nicest collaborative process going forward.

@Ericson2314
Contributor

ghc-base is the final holder of code that necessarily needs to be non-reinstallable and coupled to ghc (even should some of it be re-exported).

Ah yes. That is what I meant; to me that is just a more precise statement of the same idea: it's the GHC team's responsibility precisely because of that coupling.

I think both could work with ultimately similar end results, and it's just a matter of which people think creates the nicest collaborative process going forward.

Right, and that's where the rubber meets the road. I am very pessimistic about peeling stuff from the bottom; I am very optimistic about peeling stuff from the top; to do the latter (without moving things out of base, creating needless churn for users) requires some "move everything down below" prep step.

@bgamari

bgamari commented May 10, 2023

@Ericson2314 if I understand you correctly, you are leaning towards keeping most implementation in base and moving things selectively into ghc-base.

I remain rather skeptical that this is feasible. Many parts of base are quite cyclic and lean heavily on hs-boot files. No doubt some of this circularity could be eliminated with reorganization, but I believe a good amount of it is intrinsic. Consequently, I suspect you would find that moving a small number of declarations out of base would quickly snowball into a much larger transitive closure (since we have no ability to break cycles across packages).
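The snowball effect can be sketched with a toy import graph. The module names and edges below are invented for illustration and do not reflect the real structure of base; the only assumption carried over from the discussion is that a lower package cannot import from a higher one, so a module moved down must bring everything it imports with it.

```haskell
module Main (main) where

import Data.List (sort)

type Module = String

-- Toy import graph (hypothetical, NOT the real layout of base).
-- A -> B -> C -> A is a cycle of the kind that hs-boot files can break
-- inside a single package but not across a package boundary.
imports :: [(Module, [Module])]
imports =
  [ ("A", ["B"])
  , ("B", ["C"])
  , ("C", ["A", "D"])
  , ("D", [])
  , ("E", ["D"])
  ]

-- Direct imports of a module (empty if the module is unknown).
depsOf :: Module -> [Module]
depsOf m = maybe [] id (lookup m imports)

-- Every module that has to move along with the given seed modules:
-- the seeds plus everything reachable from them through imports.
closure :: [Module] -> [Module]
closure = go []
  where
    go seen [] = sort seen
    go seen (m : rest)
      | m `elem` seen = go seen rest
      | otherwise     = go (m : seen) (depsOf m ++ rest)

main :: IO ()
main = print (closure ["A"])
```

In this toy graph, asking to move only A ends up moving A, B, C and D, because the cycle pulls in C and C pulls in D.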

@Ericson2314
Contributor

Other way around: I share those exact same concerns.

@Ericson2314
Contributor

More broadly, the focus of the proposal is not what division we want in the long term but what untanglings make sense in the short term. That is entirely where the "move and reexport" comes from. This is supposed to be the beginning of a process, an experiment, not the end of it.

In the other thread @Bodigrim writes

But base split is a one-way road, we would not be able to revert it once enacted,

And that doesn't seem to me to be true at all: it is incredibly easy to recombine libraries that live in the same repo. And that is good, because in the spirit of trying things and experimenting, everything should be reversible.

@simonpj

simonpj commented May 10, 2023

There is some debate above about what code might live in which library. But I'm puzzled. Let me ask my question again:
provided the modules, exports, and API of base are maintained, isn't it immaterial to clients (and hence to CLC) exactly how they are maintained? It could be:

  1. All of the code is in base and ghc-base starts empty, or
  2. All of the code is in ghc-base and base is a bunch of shims

This decision is just an implementation matter, isn't it?

Of course, under (2) any changes in ghc-base that affected the API of base would require CLC agreement. For example, if reverse is defined in ghc-base and re-exported by base, then any change to the type signature, semantics, or (a fuzzier criterion) performance of reverse should be discussed with CLC; it is part of the base API.

But it's an implementation matter where the function is actually defined. So I'm puzzled by the above debate.
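As a small illustration of this point, base already works this way at the module level today: reverse is defined in GHC.List and re-exported through Data.List, and clients cannot tell the difference. The sketch below only demonstrates that existing module-level arrangement; the assumption is that a package-level split would behave analogously from a client's point of view. The helper name sameResult is made up for the example.

```haskell
module Main (main) where

import qualified Data.List as Public   -- the interface most users import
import qualified GHC.List  as Internal -- GHC-specific module where reverse is defined

-- Both names refer to the same function, so results always agree.
sameResult :: [Int] -> Bool
sameResult xs = Public.reverse xs == Internal.reverse xs

main :: IO ()
main = print (sameResult [1, 2, 3])
```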

Maybe there are issues I'm not aware of. It'd be great to lay them out.

@Ericson2314
Contributor

Ericson2314 commented May 10, 2023

@simonpj I agree, it is puzzling.

My guess that the overloaded function of ghc-base (yet to be triaged code + triaged "internal" code) was a source of tension was in response to your question --- a theory that it would be too easy to confuse code in the former category with the latter, and that this potential for misunderstanding is what has made the discussion of this topic so surprisingly contentious.

That's why I hoped renaming ghc-base to base-untriaged, so it is just about the former purpose, explicitly, would help.

@mixphix and @gbaz said 3 layers was too much, and because I confused @bgamari let me reiterate: yes, I don't actually think we can separate GHC internals beneath everything else well at all. That means we wouldn't get 3 layers. I suspect what would happen instead is this:

  1. base (today)
  2. base-untriaged is created and reexported by base, which is otherwise empty. base is now trivially reinstallable.
  3. Some things moved from base-untriaged to base. Yay!
  4. ...
  5. Only weird internal things left in base-untriaged; CLC <-> GHC dev trust is increased, rename base-untriaged to ghc-base now that its role is clear from sifting nice stuff out prior.

No 3 layers in that plan, because of the great difficulty in doing any refactor to base-untriaged other than floating stuff up back to base.

If avoiding moving untriaged code to a library with ghc in its name is not the source of tension, then yes, I am just as confused as @simonpj. Please enlighten me someone!

@Bodigrim
Collaborator

We had a very productive meeting with @Ericson2314 at Zurihac, and worked on a new proposal based on haskellfoundation/tech-proposals#47 (comment) suggestions, soon to be made public. Thanks to all who participated here, helping us to understand the underlying problem and perspectives better.

Let me close this discussion, it's been quiet for the past month and it will be rendered obsolete by the new proposal soon.

@Bodigrim Bodigrim added the meta General questions on CLC rules and policies label Jun 14, 2023