Make the modular solver the default always, and deprecate the top-down solver #2531

Merged
merged 34 commits into from
May 21, 2015

edsko
Contributor

@edsko edsko commented Apr 9, 2015

This PR is a single commit. It looks big because it depends on all the previous PRs.

This PR makes the modular solver the default solver even for GHC < 7, because we can now deal with the base shim package (PR #2530). To make sure that this is a sensible thing to do, I ran both the top-down solver and the modular solver against every package on Hackage using GHC 6.12.3. Here are some statistics:

  • 7936 packages on Hackage
  • Out of those 7936, the top-down solver found solutions for 4545 packages.
  • Out of those 4545, the modular solver gave precisely the same solution for 2893 packages, a different solution for 1485 packages, and failed to give a solution (with default flags) for 166 packages.
  • Conversely, the modular solver found solutions for 4954 packages. This means that there are (4954-4545+166=) 575 packages for which the modular solver found a solution where the top-down solver did not.
  • The top-down solver failed to terminate (at least, failed to terminate within half an hour) on two packages: leaky and seqaid.
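The 575 figure can be re-derived from the counts in the list above. A small check, with names chosen here purely for illustration:

```haskell
-- Counts copied from the statistics above; the names are illustrative only.
topdownSolved, modularSolved, modularMissed :: Int
topdownSolved = 4545  -- packages the top-down solver found a solution for
modularSolved = 4954  -- packages the modular solver found a solution for
modularMissed = 166   -- solved by top-down but not by modular (default flags)

-- Packages for which both solvers found a solution.
solvedByBoth :: Int
solvedByBoth = topdownSolved - modularMissed

-- Packages for which only the modular solver found a solution.
modularOnly :: Int
modularOnly = modularSolved - solvedByBoth
```

Here `modularOnly` evaluates to 4954 - (4545 - 166) = 575, matching the number quoted above.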

Performance-wise, the top-down solver usually took around 11 seconds to solve these goals (and rarely faster than that), while the modular solver was an order of magnitude faster:

[Plot: "absolute"]

I took a look at the cases where the modular solver failed to find a solution. Adding --reorder-goals did not help very much: it only solved an additional 28 cases, leaving 138 packages where the top-down solver found a solution but the modular solver did not. This also slowed down the modular solver quite a bit:

[Plot: "absolute"]

(Time in seconds is on the x-axis in all these plots.) I also tried running with --max-backjumps=-1 and a timeout of 5 minutes; on the packages I tried, this still did not enable the modular solver to find a solution within the allotted time.

The conclusion from all this is, in my opinion, that it's okay to switch the default now: we find solutions for more packages than we did before, and find them much faster. I guess it's not yet okay to remove the top-down solver completely, because of the remaining 138 packages, but we can deprecate the top-down solver and see who complains (which is what this PR does). Meanwhile, we could look at these failing packages to see if we can add any heuristics to the modular solver that would enable it to find solutions. For completeness' sake, I added these failures to the Wiki.

dcoutts and others added 30 commits March 27, 2015 15:38
It turns out not to be the right solution for general private
dependencies and is just complicated. However we keep qualified
goals, just much simpler. Now dependencies simply inherit the
qualification of their parent goal. This gets us closer to the
intended behaviour for the --independent-goals feature, and for
the simpler case of private dependencies for setup scripts.

When not using --independent-goals, the solver behaves exactly as
before (tested by comparing solver logs for a hard hackage goal).
When using --independent-goals, now every dep of each independent
goal is qualified, so the dependencies are solved completely
independently (which is actually too much still).
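The inheritance rule described in this commit message (a dependency takes the qualification of its parent goal) can be sketched as follows. The types and the function name are hypothetical and are not the solver's actual definitions:

```haskell
-- Hypothetical types for illustration; not the solver's actual definitions.
type PackageName = String

-- | Either the shared top-level namespace or one belonging to a goal.
data Qualifier = Unqualified | QualifiedBy PackageName
  deriving (Eq, Show)

-- | A package name together with its qualification.
data Qualified = Qualified Qualifier PackageName
  deriving (Eq, Show)

-- | Dependencies take on the qualifier of the goal that introduced them.
inheritQualifier :: Qualified -> [PackageName] -> [Qualified]
inheritQualifier (Qualified q _) deps = [ Qualified q dep | dep <- deps ]
```

Under this sketch, every dependency below an independent goal carries that goal's qualifier, while unqualified goals pass `Unqualified` down unchanged.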
POption annotates a package choice with a "linked to" field. This commit
just introduces the datatype and deals with the immediate fallout, it doesn't
actually use the field for anything.
This is implemented as a separate pass so that it can be understood
independently of the rest of the solver.
In particular, in the definition of dependencyInconsistencies.

One slightly annoying thing is that in order to validate an install plan, we
need to know if the goals are to be considered independent. This means we need
to pass an additional Bool to a few functions; to limit the number of functions
where this is necessary, we also record whether or not goals are independent as
part of the InstallPlan itself.
Since we didn't really have a unit test setup for the solver yet, this
introduces some basic tests for solver, as well as tests for independent goals
specifically.
I don't know why we constructed this graph manually here rather than calling
`graphFromEdges`; it doesn't really matter except that we will want to change
the structure of this graph somewhat once we have more fine-grained
dependencies, and then the manual construction becomes a bit more painful;
easier to use the standard construction.
This commit does nothing but rearrange the Modular.Dependency module into a
number of separate sections, so that it's a bit clearer to see what's what. No
actual code changes here whatsoever.
The ComponentDeps datatype will give us fine-grained information about the
dependencies of a package's components.  This commit just introduces the
datatype, we don't use it anywhere yet.
The modular solver has its own representation for a package (PInfo). In this
commit we modify PInfo to keep track of the different kinds of dependencies.

This is a bit intricate because the solver also regards top-level goals as
dependencies, but of course those dependencies are not part of any 'component'
as such, unlike "real" dependencies. We model this by adding a type parameter
to FlaggedDeps and go which indicates whether or not we have component
information; crucially, underneath flag choices we _always_ have component
information available.

Consequently, the modular solver itself will not make use of the ComponentDeps
datatype (but only using the Component type, classifying components); we will
use ComponentDeps when we translate out of the results from the modular solver
into cabal-install's main datatypes.

We don't yet _return_ fine-grained dependencies from the solver; this will be
the subject of the next commit.
In this commit we modify the _output_ of the modular solver (CP, the modular
solver's internal version of ConfiguredPackage) to have fine-grained dependencies.
This doesn't yet modify the rest of cabal-install, so once we translate from CP
to ConfiguredPackage we still lose the distinctions between different kinds of
dependencies; this will be the topic of the next commit.

In the modular solver (and elsewhere) we use Data.Graph to represent the
dependency graph (and the reverse dependency graph). However, now that we have
more fine-grained dependencies, we really want an _edge-labeled_ graph, which
unfortunately is not available in the `containers` package. Therefore I've
written a very simple wrapper around Data.Graph that supports edge labels; we
don't need many fancy graph algorithms, and can still use Data.Graph on these
edged graphs when we want (by calling them on the underlying unlabeled graph),
so adding a dependency on `fgl` does not seem worth it.
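A minimal sketch of what such a wrapper could look like, assuming at most one edge between any pair of vertices. The names `LGraph`, `buildL`, `successors`, and `topOrder` are invented here and are not taken from the commit:

```haskell
import Data.Array ((!))
import qualified Data.Graph as G
import qualified Data.Map as M

-- | A Data.Graph graph plus one label per edge.
-- Assumption of this sketch: at most one edge between two vertices.
data LGraph e = LGraph
  { lgGraph  :: G.Graph                       -- underlying unlabeled graph
  , lgLabels :: M.Map (G.Vertex, G.Vertex) e  -- label of each edge
  }

-- | Build a labeled graph from vertex bounds and labeled edges.
buildL :: G.Bounds -> [(G.Vertex, e, G.Vertex)] -> LGraph e
buildL bnds es = LGraph
  { lgGraph  = G.buildG bnds [ (v, w) | (v, _, w) <- es ]
  , lgLabels = M.fromList [ ((v, w), l) | (v, l, w) <- es ]
  }

-- | Successors of a vertex together with the label of the connecting edge.
successors :: LGraph e -> G.Vertex -> [(e, G.Vertex)]
successors g v =
  [ (l, w) | w <- lgGraph g ! v, Just l <- [M.lookup (v, w) (lgLabels g)] ]

-- | Unlabeled algorithms run on the underlying graph.
topOrder :: LGraph e -> [G.Vertex]
topOrder = G.topSort . lgGraph
```

Keeping the labels in a side map means every existing `Data.Graph` function remains usable via `lgGraph`, which is the trade-off the commit message describes.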
The crucial change in this commit is the change to PackageFixedDeps to return a
ComponentDeps structure, rather than a flat list of dependencies, along with
corresponding changes in ConfiguredPackage and ReadyPackage to accommodate this.

We don't actually take _advantage_ of these more fine-grained dependencies yet;
any use of

    depends

is now a use of

    CD.flatDeps . depends

but we will :)

Note that I have not updated the top-down solver, so in the output of the
top-down solver we cheat and pretend that all dependencies are library
dependencies.
Although we don't use the new setup dependency component anywhere yet, I've
replaced all uses of CD.flatDeps with CD.nonSetupDeps. This means that when we
do introduce the setup dependencies, all code in Cabal will still use all
dependencies except the setup dependencies, just like now. In other words,
using the setup dependencies in some places would be a conscious decision; the
default is that we leave the behaviour unchanged.
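To illustrate the difference between `flatDeps` and `nonSetupDeps`, here is a simplified sketch. The constructor names and the `fromList` helper are assumptions made for this example, and the real ComponentDeps module may differ:

```haskell
import qualified Data.Map as M

-- | Simplified component classification (an assumption of this sketch).
data Component
  = ComponentLib
  | ComponentExe String
  | ComponentTest String
  | ComponentSetup
  deriving (Eq, Ord, Show)

-- | Dependencies grouped by the component that needs them.
newtype ComponentDeps a = ComponentDeps (M.Map Component a)

-- | Group dependencies by component, merging repeated components in order.
fromList :: Monoid a => [(Component, a)] -> ComponentDeps a
fromList = ComponentDeps . M.fromListWith (flip mappend)

-- | All dependencies, regardless of component.
flatDeps :: Monoid a => ComponentDeps a -> a
flatDeps (ComponentDeps m) = mconcat (M.elems m)

-- | All dependencies except those of the setup script.
nonSetupDeps :: Monoid a => ComponentDeps a -> a
nonSetupDeps (ComponentDeps m) = mconcat (M.elems (M.delete ComponentSetup m))
```

As long as no package declares setup dependencies the two functions agree, which is why swapping one for the other leaves current behaviour unchanged.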
This patch adds it to the package description types and to the parser.
There is a new custom setup section which contains the setup script's
dependencies. Also add some sanity checks.
(and, therefore, also to the modular solver's output)
By choosing setup dependencies after regular dependencies we get more
opportunities for linking setup dependencies against regular dependencies.
The only problematic thing is that when we call `cabal clean` or `cabal
haddock` (and possibly others), _without_ first having called `configure`, we
attempt to build the setup script without calling the solver at all. This means
that if you do, say,

    cabal configure
    cabal clean
    cabal clean

for a package with a custom setup script that really needs setup dependencies
(for instance, because there are two versions of Cabal in the global package DB
and the setup script needs the _older_ one), then the first call to `clean` will
succeed, but the second call will fail because we will try to build the setup
script without the solver and that will fail.
This happened independently in a number of places, which was bad; and was about
to get worse with the base 3/4 thing.
Never consider flag choices as independent from their package.
edsko added 4 commits April 7, 2015 15:53
(previously the default was the topdown solver for GHC < 7). Also adds a
deprecation warning when the topdown solver is selected.
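The behaviour described here (modular by default regardless of GHC version, with a warning when the top-down solver is requested explicitly) could be modelled roughly as below. The type names, the function, and the warning text are hypothetical and are not copied from the patch:

```haskell
-- Hypothetical model of solver selection; not the code from this PR.
data Solver = TopDown | Modular
  deriving (Eq, Show)

-- | What the user asked for; 'Choose' means "use the default".
data PreSolver = AlwaysTopDown | AlwaysModular | Choose
  deriving (Eq, Show)

-- | Resolve the request to a solver plus any warnings to display.
-- The compiler version no longer influences the default.
chooseSolver :: PreSolver -> (Solver, [String])
chooseSolver AlwaysModular = (Modular, [])
chooseSolver Choose        = (Modular, [])
chooseSolver AlwaysTopDown = (TopDown, ["the top-down solver is deprecated"])
```

Returning the warning as data rather than printing it keeps the sketch pure; the actual patch may well emit the warning directly.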
@23Skidoo
Member

@edsko

So we're waiting for @kosmikus to review this, since the solver is his domain. Meanwhile, maybe you could take a look at #1575 and leave a comment there - @kosmikus mentioned that your work on independent goals provides us with a way forward on that issue.

@kosmikus
Contributor

Sorry for taking so long ...

@23Skidoo
Member

For some reason GitHub didn't notify me about review comments that @kosmikus made here. /cc @edsko in case he also missed them.

@dcoutts
Contributor

dcoutts commented May 21, 2015

So I'm happy switching now that the new solver works for 6.12 and older. And of course it'll help if ghc ships with multiple base versions again in future.

dcoutts added a commit that referenced this pull request May 21, 2015
Make the modular solver the default always, and deprecate the top-down solver
@dcoutts dcoutts merged commit 4083456 into haskell:master May 21, 2015
@edsko edsko deleted the pr/switch-default branch June 1, 2015 13:12