Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow dependency resolution using solver option --reorder-goals #7784

Open
newhoggy opened this issue Oct 30, 2021 · 19 comments
Open

Slow dependency resolution using solver option --reorder-goals #7784

newhoggy opened this issue Oct 30, 2021 · 19 comments

Comments

@newhoggy
Copy link

newhoggy commented Oct 30, 2021

Describe the bug
Dependency resolution is very slow on Windows. For example here, it is taking 37 minutes:

image

To Reproduce
Steps to reproduce the behavior:

On Windows, install msys2. Then in a bash shell:

git clone https://github.com/input-output-hk/libsodium
cd libsodium
git checkout 66f017f16633f2060db25e17c170c2afa0f2a8a1
./configure
make
make install
cd -

Then:

git clone [email protected]:input-output-hk/cardano-node.git
cd cardano-node
cabal build all --dry-run

Expected behavior
A clear and concise description of what you expected to happen.

System information

  • Operating system: Windows
  • cabal-3.6.2.0, ghc-8.10.4

Additional context
N/A

@Mikolaj
Copy link
Member

Mikolaj commented Oct 30, 2021

Do you know how that compares to cabal 3.4 and to 3.6.2 on Linux on the same machine?

@gbaz
Copy link
Collaborator

gbaz commented Oct 30, 2021

We put a lot of work into optimizing the solver in the last release, and we know it works reasonably on other platforms on this exact case. I'd be surprised if it was the solver itself taking more time on windows than other platforms -- there's nothing machine dependent in it. On the other hand, the linked git repos could be expensive to manage and checkout compared to other systems, or in fact traversing the entire directory tree to scan files for updates could be expensive, as windows has very different filesystem ops with different performance characteristics. A CI trace such as above gives me very little indication of where time is actually spent in such a setting, and unfortunately I don't have a windows box handy, so can't dig in further directly.

@newhoggy
Copy link
Author

newhoggy commented Oct 31, 2021

Do you know how that compares to cabal 3.4 and to 3.6.2 on Linux on the same machine?

I don't have a "same machine" comparison.

For comparison in Github Actions, the same build step with cabal-3.6.2 takes:

  • 47m 44s on Windows
  • 47s on MacOS
  • 35s on Linux

https://github.com/input-output-hk/cardano-node/runs/4057378590?check_suite_focus=true

For comparison in Github Actions, the same build step with cabal-3.4 takes:

  • 1h 11m 41s on Windows
  • 4m 33s on MacOS
  • 4m 12s on Linux

https://github.com/input-output-hk/cardano-node/actions/runs/1399416440

@newhoggy
Copy link
Author

newhoggy commented Oct 31, 2021

I'd be surprised if it was the solver itself taking more time on windows than other platforms

It's possible it's not the solver, but it would at least be something that runs adjacent to the solver in that cabal sits on the print out Resolving dependencies... for a very long time.

unfortunately I don't have a windows box handy

Would a Virtual Box VM with everything setup help with this?

@jneira
Copy link
Member

jneira commented Nov 1, 2021

It's possible it's not the solver, but it would at least be something that runs adjacent to the solver in that cabal sits on the print out Resolving dependencies... for a very long time.

maybe setting a verbose output with -v3 could help to see what cabal is doing in that step with more detail

I have not get such ill behaviour building any package in my windows machines, will try to reproduce

@newhoggy
Copy link
Author

newhoggy commented Nov 1, 2021

I've attached timed v3 logs which should help identify where the slowness is coming from.

log.out.timed.txt

@fgaz
Copy link
Member

fgaz commented Nov 1, 2021

Interesting... there's zero backtracking, and while 6232 steps is a lot, it shouldn't take a literal hour, so there must be some problem in the step themselves: some take over a second!

10:01:52 Environment: ...
...
10:05:22 [__0] trying: Win32-network-0.1.0.0 (user goal)
...
10:07:30 [797] trying: cardano-node-chairman:cardano-node:exe.goblins:-development
10:07:31 [798] trying: cardano-node-chairman:cardano-node:exe.cardano-prelude-test~>cardano-prelude-test-0.1.0.0 (dependency of cardano-node-chairman:cardano-node:exe.cardano-ledger-shelley-test)
10:07:32 [799] trying: cardano-node-chairman:cardano-node:exe.cardano-prelude-test:-development
10:07:33 [800] trying: cardano-node-chairman:cardano-node:exe.cardano-ledger-pretty~>cardano-ledger-pretty-0.1.0.0 (dependency of cardano-node-chairman:cardano-node:exe.cardano-ledger-shelley-test)
10:07:34 [801] trying: cardano-node-chairman:cardano-node:exe.cardano-crypto-tests~>cardano-crypto-tests-2.0.0 (dependency of cardano-node-chairman:cardano-node:exe.cardano-api *test)
...
10:45:04 [6231] trying: plutus-core:alex:exe.containers~>containers-0.6.2.1/installed-0.6.2.1
10:45:04 [6232] done
10:45:04 Elaborating the install plan...
...
10:56:29 In order, the following would be built:

@coot
Copy link
Collaborator

coot commented Nov 8, 2021

I am hit by this problem on linux with cabal-3.6.2.0 (ghc-8.10.7) on the same repo that @newhoggy reported.

@coot
Copy link
Collaborator

coot commented Nov 8, 2021

oh, I noticed that my cabal.project.local file was a source of the delay, specifically:

reorder-goals: True

(if included it takes a terrible hit on the dependency resolution time)

@fgaz
Copy link
Member

fgaz commented Nov 8, 2021

@coot
Copy link
Collaborator

coot commented Nov 8, 2021

Thanks @fgaz, I asked @newhoggy exactly that on our internal slack :).

@coot
Copy link
Collaborator

coot commented Nov 8, 2021

I am not sure if there is a regression here. Although, a few months ago this used to work cross platform, since that time cardano-node included another large component which for sure had an impact on the complexity of the problem.

@Mikolaj
Copy link
Member

Mikolaj commented Nov 8, 2021

What's the time without reorder-goals? Is there any hint in the docs (inside cabal help and/or in the external manuals) that a slowdown should be expected?

@jneira
Copy link
Member

jneira commented Nov 8, 2021

time to change issue title and os label then?

@coot
Copy link
Collaborator

coot commented Nov 8, 2021

What's the time without reorder-goals?

Something in the order of 10-20s.

Is there any hint in the docs (inside cabal help and/or in the external manuals) that a slowdown should be expected?

Yes it does, and a slowdown is expected, ref, though a 2 order of magnitude slowdown is quite large (15s -> 1200s).

For me this sounds like a low priority, nice to have optimisation issue - but I don't know implantation details, maybe it's actually a reasonable tread-of between nice error message and performance. Regardless of this, the docs should be worded more carefully.

@fgaz
Copy link
Member

fgaz commented Nov 8, 2021

The commentary ticket linked in the docs explains the behavior noted in #7784 (comment)

#1780 (comment):

Let's clarify the situation a bit: --reorder-goals applies a heuristic that will typically mean that a valid solution (if one exist) appears significantly "earlier" in the search space. However, in doing so, it'll make traversing the search space more costly, because additional analysis is being performed at each step in determining which goals are more or less costly.
[...]
The extra analysis performed by --reorder-goals can potentially be sped by memoization.

#1780 (comment):

--reorder-goals still makes each step significantly slower

@jneira jneira changed the title Slow dependency resolution on Windows Slow dependency resolution using solver option --reorder-goals Nov 8, 2021
@Mikolaj
Copy link
Member

Mikolaj commented Nov 8, 2021

Wow. I've added a warning to the ticket that is linked to from the user guide.

@grayjay
Copy link
Collaborator

grayjay commented Nov 8, 2021

--reorder-goals tries to make decisions (explore goals) with fewer choices earlier. In the worst case, it needs to consider every goal at every level, which is N^2 in the number of steps.

I think that --reorder-goals became a lot less useful after #5918 added --fine-grained-conflicts, which is an independent optimization that seems to help in a lot of cases where --reorder-goals was needed before. It doesn't have a significant per-step performance penalty, so it is enabled by default. Since the options are different optimizations, I assume that there are still some cases where --reorder-goals makes a solution possible, and we shouldn't remove it. We should update the documentation, especially removing the part about --reorder-goals working well for big install plans.

@gbaz
Copy link
Collaborator

gbaz commented Nov 8, 2021

Skimming through the related code, I think reorderGoals is a pretty expensive heuristic for what it gives, and I wonder if there's not a faster and more effective one lurking around.

Also the expensive function could be made much faster either by using a cruder approximation or just giving it a lazier search:

-- Note that we count goal choices as having one choice, always.

(edit: this is to say that at at minimum the reorderGoals heuristic can be left as is and made universally faster, but also that other better and even faster heuristics seem very possible!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants