Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Move dependency solver to library #2768

Closed
wants to merge 6 commits into from

Conversation

bgamari
Copy link
Contributor

@bgamari bgamari commented Aug 16, 2015

One of the major challenges when working within cabal-install is the monolithic nature of the codebase. Moreover, since the package provides only an executable component, features like sandbox support and the solver are inaccessible to third-party tooling (a problem which I am current feeling with cabal-ghc-dynflags). This takes the first step towards breaking up cabal-install into a set of reusable libraries by taking the largest largely-independent chunk of code, the constraint solvers, and moving them into a separate library, cabal-install-solver.

Of course, the solvers do have some dependencies on the rest of cabal-install. I place these in a third package, cabal-install-lib.

This consists of two commits,

  1. Split out the solvers themselves (Distribution.Client.Dependency.*) into cabal-install-solver, moving the bare minimum modules necessary to cabal-install-lib. This exclusively involves file renames and cabal file mangling.
  2. Move Distribution.Client.Dependency itself out to cabal-install-solver. This involves bringing over a few more modules to cabal-install-solver and a bit of surgery to remove the policies

Looking forward, I would like to continue to move functionality into more fine-grained libraries. The package index (e.g. PackageIndex currently in cabal-install-lib and perhaps a few others still in cabal-install) is probably the next logical step here (moving it to cabal-install-index). I would especially like to expose some of the sandbox interfaces as sandboxes are currently entirely opaque to third-party tools. This, however, may require substantial effort as much of cabal-install is written specifically as the backend to a command-line application.

More on this later.

@bgamari bgamari force-pushed the split-up branch 3 times, most recently from 33a5311 to 2ddd877 Compare August 16, 2015 16:52
@23Skidoo
Copy link
Member

/cc @dcoutts, @kosmikus

@bgamari
Copy link
Contributor Author

bgamari commented Aug 17, 2015

Up to merge conflicts this should now pass the testsuite. However, I'm going to hold off on addressing the conflicts until there is some positive feedback on the idea and approach.

@phadej
Copy link
Collaborator

phadej commented Aug 17, 2015

I personally very like the idea! 👍

Other thing worth splitting is a code reading package database from tar file (I guess it's duplicated in a way or another in cabal-db and various other tools).

@mietek
Copy link
Contributor

mietek commented Aug 18, 2015

I like the idea. I would like to understand how this PR would bring Cabal closer to OPAM, with the ability to use CUDF-compatible solvers:

/cc @mboes @tomjaguarpaw

@mboes
Copy link
Contributor

mboes commented Aug 18, 2015

👍

@mietek this is a first step. The major difference with CUDF framework is that there package managers (such as OPAM) have a textual interface with the solvers, rather than interacting via a library API. So CUDF decouples components more strongly. But it could well be pretty easy to slap a CUDF frontend in front of the existing solvers, now that they are standalone packages.

@hvr
Copy link
Member

hvr commented Aug 18, 2015

One thing I couldn't figure out from the CUDF primer and skimming through the CUDF paper: Can .cabal specifications actually be fully expressed in the CUDF model? For instance, I haven't seen an obvious way to express cabal flags & conditionals.

@erikd
Copy link
Member

erikd commented Aug 19, 2015

I'm very much 👍 in this.

@bgamari
Copy link
Contributor Author

bgamari commented Aug 19, 2015

Alright, so it sounds as though there is fairly wide support for this idea. It would be great if I could get a more thorough code review. This will be a very difficult patch to keep up-to-date and will be rather annoying for anyone with outstanding patches, so I'd like to ensure that we plan this carefully.

@dcoutts, do you have any objections here?

@tomjaguarpaw
Copy link
Member

since the package provides only an executable component, features like sandbox support and the solver are inaccessible to third-party tooling

Isn't the correct solution to this to just turn the whole package into a library? One could literally make everything a library apart from the Main module. I think this would actually make a lot of sense.

@bgamari
Copy link
Contributor Author

bgamari commented Aug 19, 2015

@tomjaguarpaw personally I would rather see it split up a bit. Currently cabal-install is a bit of a monolith and unnecessarily so. A more modular layout would improve readability substantially IMHO.

@mietek
Copy link
Contributor

mietek commented Aug 19, 2015

Seems like turning cabal-install into a library would be a good first step towards further modularisation.

@BardurArantsson
Copy link
Collaborator

@mietek and @tomjaguarpaw see #1597 for reasons why that would not be a good idea. (But certainly general-enough and disconnected parts could perhaps.)

@bgamari I'll have a go at reviewing the code; I'm a noob round these parts (even more so in the solver), so I'm not sure what it's worth -- perhaps I can spot style issues and such.

@ttuegel
Copy link
Member

ttuegel commented Sep 15, 2015

I would prefer if we separated the network-aware components of the solver from the agnostic components and move the latter into the Cabal library itself. I think separating the solver from the transport is an important step toward enabling different package distribution schemes. Also, from a purely intellectual standpoint, I think the solver should care about the contents of the package index, not where it came from. Having the transport-agnostic parts of the solver in Cabal itself would help solve #2777. I still think it's a good idea to separate the transport-sensitive parts of the solver from cabal-install, too.

@kosmikus
Copy link
Contributor

Sorry for being late to join this discussion. Could we please make it explicit what exactly it is we hope to achieve by doing this? I'm not opposed to it, but I'm not sure what the goals are. If the goals are clearly stated, it would make it easier for me to evaluate if the currently proposed split is a good way to achieve them. Note that if the goal is mostly to be able to plug in external solvers (such as SMT solvers or CUDF solvers), this can in principle be done already, without separating out the solvers into extra packages. But I think there's more involved here, such as having library interfaces for cabal-install itself and/or the solvers so that functions can be called easily from other tools/libraries. It's here where I'm not quite sure what exactly it is we want or are hoping for.

@BardurArantsson
Copy link
Collaborator

So, I think I think I'd like to take a stab at this -- if not thing else then for my own edification. However, I want to try do it in a somewhat more... "disciplined" way than this PR. (That is, to not just create a 3rd "commons"-type library which both Cabal and cabal-install depend on.)

At the very least I can hopefully see what some of the difficulties with the "disciplined" approach are and provide a summary here.

There seems to be some disagreement on whether the solver should be moved to its own entirely-separate-from-Cabal library or "just" moved into Cabal-the-library. Would moving it into Cabal be an acceptable first step for the "solver" portion of your use case @bgamari?

@ttuegel If it turns out that it's feasible to separate the solver entirely from Cabal (and to provide an API which would remain long-term stable), would you object to that? I'm thinking of moving only the solver, no transports, file access, or anything like that.

@bgamari
Copy link
Contributor Author

bgamari commented Nov 8, 2015

Bardur Arantsson [email protected] writes:

So, I think I think I'd like to take a stab at this -- if not thing else then for my own edification. However, I want to try do it in a somewhat more... "disciplined" way than this PR. (That is, to not just create a 3rd "commons"-type library which both Cabal and cabal-install depend on.)

I'm very glad to hear this. While I would love to see things start move
in this direction, I'm afraid I just don't have the time to perform the
sort of restructuring that a more principled approach might require.
Good luck!

@BardurArantsson
Copy link
Collaborator

Somewhat leading question for everyone in here: Is anybody actually still using the TopDown solver? Given that it's not part of any stable API, does it make sense to keep it around?

After having fiddled about with the code a bit, here's my tentative plan:

  1. Move the modular solver into the Cabal library with absolutely minimal changes. It will initially remain in an .Internal. module namespace which will be exposed such that cabal-install can use it, but as such will be clearly marked as "unstable API" which does not adhere to the PVP (or any stability guarantees, really.) If anyone starts to use it from non-Cabal packages, then any breakage is on them and will not be considered when moving forward.
  2. The TopDown solver will remain in cabal-install and no effort will be expended on trrying to move it. This is because it has quite a few (rather larger) dependencies on bits of cabal-install; bits which I'm not sure it makes sense to move.
  3. Splitting TopDown from Modular will require splitting a few definitions cabal-install/Distribution/Client/Dependency/Types.hs in some sensible way -- here's where the leading question comes in. It would be much easier if we could just outright remove the TopDown solver. I'm pretty confident the split could be achieved without removing TopDown, but it would make my life a lot easier :).
  4. Once this works and all tests still pass, etc. this becomes a "checkpoint" for merging. This is mainly to avoid stalling other work on the solver in the mean time.
  5. At this point we can start refining the solver's "exposed & stable" API. We can also start thinking about how/if the solver can be separated from Cabal itself and whether that makes sense.

Does this plan make sense to everyone? Objections? Comments?

@23Skidoo
Copy link
Member

23Skidoo commented Nov 9, 2015

@BardurArantsson

Somewhat leading question for everyone in here: Is anybody actually still using the TopDown solver?

See #2531.

@BardurArantsson
Copy link
Collaborator

So that would be... no? ;)

@23Skidoo
Copy link
Member

23Skidoo commented Nov 9, 2015

@BardurArantsson The old solver is deprecated, but the plan is to keep it for a while (1-2 releases, I think).

@hvr
Copy link
Member

hvr commented Nov 9, 2015

@BardurArantsson re

Move the modular solver into the Cabal library with absolutely minimal changes.

Tbh, I'm not sure this belongs into the Cabal library, but I think this rather belongs in a new package/library, especially if this incurs additional build-depends which we'd have to integrate into GHC's build-system, which is quite a pain that causes problems on several levels. I'd rather see Cabal remain as minimal as possible (in order to satisfy the CABAL specification) with as few external dependencies as possible.

PS: another problem I see is that tying this into Cabal makes cabal-install less flexible if we want to extend/improve/fix the solver inbetween major releases of Cabal. cabal-install is tied to whatever solver is available in the linked against Cabal library.

@kosmikus
Copy link
Contributor

kosmikus commented Nov 9, 2015

Yes, I agree with @hvr. I said a while ago I'd think about whether to include the solvers in the library, and after doing so, I don't think it's a good idea. The Cabal lib is not tied up to the solver, and won't need to be, and putting the solver(s) in there is only going to make changes to the solvers unnecessarily harder. So I'd also go for a new package.

@kosmikus
Copy link
Contributor

kosmikus commented Nov 9, 2015

@BardurArantsson Regarding topdown: I don't think it's needed anymore, as also @23Skidoo says. I would very much like to keep it though. Even though it's mostly inferior to the modular solver, it's always good to have two implementations of an interface, and it's also worthwhile to have two independent solvers in the first place. I sometimes still use it for testing purposes.

@BardurArantsson
Copy link
Collaborator

@hvr : Right, for some reason it didn't cross my mind that I could just create a separate package which depends on Cabal and which cabal-install then depends on in turn. That's what I'll do. (Btw, the only added dependency I stumbled upon is "mtl".)

@kosmikus I definitely agree that it's better to have two implementations of the same interface in principle, but given how the code looks, I'm not really sure that that's actually what we have at the moment. There's also a non-trivial cost to having the two implementations. In particular the top-down implementation looks as though is quite tied in to internal cabal-install data structures. (Though, keep in mind I haven't spent that much time analyzing the code yet, so I may be mistaken.) I'll see what can be done about preserving the top-down solver as an option in cabal-install, but I am pretty sure that it won't be possible to split it from cabal-install in any sensible way -- it looks like quite a lot of (ultimately pointless since it's already deprecated) work to do so.

@BardurArantsson
Copy link
Collaborator

(Not sure how much the following summary will make for people not intimately familiar with the solver code, but here it is anyway: :) )

OK, so I've got a basic split-up implemented, but it needs a little bit of cleanup. Basically, it turned out that it wasn't that hard to move the two solvers to a new cabal-solver package which goes "between" Cabal and cabal-install. The biggest sticking point remaining right now is that the new cabal-solver package depends on network because the PackageLocation data type transitively contains a URI. Since no details of the PackageLocation type are actually used in the solvers, I'm pretty confident that the dependency can be broken by adding a new type parameter to SourcePackage. Thus it should be possible to keep PackageLocation in cabal-install -- I just haven't done the work quite yet.

I'll probably open an RFC Pull Request sometime during the coming week.

@phadej
Copy link
Collaborator

phadej commented Nov 16, 2015

@BardurArantsson sounds great! 👍

@BardurArantsson
Copy link
Collaborator

See #2928.

@ezyang
Copy link
Contributor

ezyang commented Dec 23, 2015

The discussion here seems to have glossed over a point @ttuegel, which is that currently the Cabal library has a very dumb solver for dependencies and flags which occasionally can use up a lot of memory (#2777).

It seems to me that if cabal-installs is having its solver factored out, then so should Cabal. Specifically, whatever interface the separated modular solver ends up having, Cabal's solver should also be factored to follow the same interface. Though as a type this, I'm not sure how possible this is, since Cabal's solver is really dumb (it does not even attempt to make sure that the dependencies are globally consistent; it just checks it afterwards.) It is too bad we cannot just retire the solver!

@BardurArantsson
Copy link
Collaborator

It seems to me that if cabal-installs is having its solver factored out, then so should Cabal.

How does this follow? cabal-install is (naturally) higher up in the software stack.

It is too bad we cannot just retire the solver!

I'm guessing you're thinking of the relatively hard requirement that "./Setup configure" should work? I could imagine splitting Cabal further such that the really core bits (reading the cabal files, etc.) were split up, but it's more work than I'm will to go through, at least. (Given the compatibility requirements it'd probably be a multi-year undertaking, I'm guessing. Perhaps the hassle could be alleviated by making Cabal into an "umbrella" project for compatibility, but still...) Would be nice, though...

@hvr
Copy link
Member

hvr commented Jan 17, 2016

@BardurArantsson Taking a step back and considering this from a meta-perspective: Sadly there's also an associated cost with splitting up Cabal into smaller pieces. Even though I'd like myself to be able to tap directly into the solver as a library conveniently, there needs to be a significant benefit for the Cabal project to justify the incurred cost of management overhead of managing and coordinating multiple interdependent packages, including diligent PVP conformance.

Then there's the decision to make whether to keep all sub-packages in a single Git repo (Git branching and tagging becomes confusing and unnatural, cabal get -s doesn't properly support this) or split up into different Git repos (Git repo inter-dependencies, Git submodule mess). And also how the issue-tracker is affected by this.

At this point I'd argue that we desperately need all resources to move forward Cabal+cabal-install development, and consequently postpone such a package refactoring to a time when the big features that are in the pipeline right now are closer to completeness, and we also have a better picture how the various moving parts of the finished thing depend on each other.

@BardurArantsson
Copy link
Collaborator

@hvr What I'm trying to do is actually the much simpler version (which I think you actually suggested?) where we have Cabal -> cabal-solver -> cabal-install as a simple dependency chain and all in the same repo. That will at least allow others to depend on just Cabal + cabal-solver. It is the 80% solution that I personally can live with and which I think adds a lot of value -- enough to overcome the downsides. Besides, it's just good software development practice to separate modules as strictly as possible. (See #2928 for a little more detail on what I've done to this end. It needs a refresh, but I'm waiting until the solver fixes/updates from @grayjay hopefully make it in at some point.)

(What I wrote in the previous comment is merely "wishlist", it's not something I'm actively pursuing.)

EDIT: The higher-level "end game" here is to get to a place where we can experiment with completely new front ends which aren't cabal-install relatively easily. At least that's my ultimate goal -- AFAICT the code base of cabal-install has become so crufty over time that getting any significant new functionality (and removing old) is nigh impossible unless it's in some niche/specialized area. (That and people have actually started to rely on the command line interface, partly because of the solver, etc.) But let's take things one step at a time...

At this point I'd argue that we desperately need all resources to move forward Cabal+cabal-install development, and consequently postpone such a package refactoring to a time when the big features that are in the pipeline right now are closer to completeness, and we also have a better picture how the various moving parts of the finished thing depend on each other.

Unless you're going to just actively turn away contributors then this is what's called "herding cats". Welcome to project management! :). I don't know if anybody around here is getting paid to do Cabal/cabal-install development, but I'm certainly not... which means I'm going to work primarily work on things that interest me... or which impede my progress on things that interest me (see e.g. the new test framework for cabal-install).

If the conclusion of "the management" is that no separation of the solver will be accepted, then I'll just find a more productive (and, frankly, fun) use of my time. (I'm not suggesting that this is about to actually happen right now, but if the decision on my approach is "no", then that's what's probably going to happen.)

@hvr
Copy link
Member

hvr commented Jan 17, 2016

@BardurArantsson

(btw, I'm not the project lead and I'm just yet another of those unpaid contributors voicing his opinion)

Don't get me wrong, I still think that separating out the solver is a good thing long-term, I'm just arguing that maybe now is not the best time to do that just yet until cabal has stabilized again, unless it's apparent that factoring out the solver is orthogonal and won't hinder progress in the big new features.

Btw, do you have a ticket/wiki-page for the new test framework you're working on? I wasn't aware of that

@BardurArantsson
Copy link
Collaborator

Sure, no problem with having opinions :). I'm also just providing counterpoints.

EDIT: This problem with not doing long-term work (even when said long-term work may impede short-term work temporarily) is that it leaves completely stuck in a local minimum with no way to get out. Progress on big new features is already slow enough (because the cabal-install code base is so crufty) that a little bit of extra slowness won't matter. That said, I believe that my proposal is basically orthogonal to everything other than the solver. The first step at least is a pretty simple refactor that basically just adds a type parameter here and there.

Re: test framework: Already merged a few months ago. It was most of my work was in #2864 and @phadej took it from there: #2868

@grayjay
Copy link
Collaborator

grayjay commented Jan 17, 2016

@BardurArantsson I don't want to hold up this change, so I'll just resolve any conflicts with my branch.

@ezyang
Copy link
Contributor

ezyang commented Jan 18, 2016

(The conversation has moved on, but I mentioned Cabal the library because there is a solver, it's kind of a delicate and not very performant piece of code (#2777), and it is fully subsumed by a better solver, like the one cabal-install has. I just think it's something to keep in mind, let's do small refactors when we can!)

@BardurArantsson
Copy link
Collaborator

@grayjay Thank you, but I don't think there's necessarily the necessary consensus that my approach is actually a good (enough) idea yet... though, given your offer I may try to actually redo my RFC PR just so people have something concrete to look at. We'll see. Thanks regardless.

...

All: Maybe this is all getting too messy at this point what with two RFC pull requests and descriptions scattered over tens of comments, etc. I'll try to make a new GH Issue for this where I try to lay out my suggestion as clearly as I possibly can in the issue description and we can try to take it from here. I'll try to get to that over the next few days.

Once that's done, I'll also close the two pending RFC PRs for this (mine and this one) and possibly related issues. @bgamari From what you've said so far, I take it you have no objection to this, right?

@BardurArantsson BardurArantsson self-assigned this Jan 29, 2016
@BardurArantsson
Copy link
Collaborator

(Assigned to self just so that I remember once I get a breather from work duties.)

@BardurArantsson
Copy link
Collaborator

This is continued in #3222.

@bgamari : If you think this should remain open, please just let me know and I'll reopen. (I'm not sure if you can re-open, but if so, feel free to do it yourself if it's relevant.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.