solver: tune heuristics for choosing the next dependency to resolve … #8255

Closed


Member

@radoering radoering commented Jul 30, 2023

… in a way that dependencies that are not required by another unsatisfied dependency are resolved first

Alternative: #8256

Pull Request Check List

Resolves: (partially, see footnote 1) boto3/botocore vs. urllib3 issue
Closes: #8256

  • Added tests for changed code.
  • Updated documentation for changed code.

Currently, we choose to resolve dependencies with fewer possible versions first. We can improve the heuristic for picking which dependency to resolve next by considering relations between dependencies:

If dependency A depends on dependency B, we should resolve A first, no matter which of the two has more versions. Of course, different versions of a package can have different dependencies, but most often dependencies are at least similar and don't change much between versions. So I think the new heuristic will not worsen performance in most cases (and will of course improve it in some), at least if the heuristic itself can be computed fast enough.

Example: If the latest A requires B<2, then older versions of A will typically not allow newer versions of B either, and if they do, it's only because the incompatibility was not known at the time they were released.
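
To make the ordering concrete, here is a minimal sketch of the idea, not the code from this PR. The function name `choose_next`, the `direct_deps` mapping, and the version counts are all hypothetical; the sketch only assumes we know the direct dependencies of each unsatisfied dependency and how many candidate versions each one has.

```python
def choose_next(unsatisfied, direct_deps, version_count):
    """Pick the next dependency to resolve (illustrative sketch only).

    unsatisfied:   set of package names still to be resolved
    direct_deps:   package name -> set of names it requires directly
    version_count: package name -> number of candidate versions
    """

    def required_by_other(name):
        # True if some *other* unsatisfied dependency directly requires `name`.
        return any(
            name in direct_deps.get(other, set())
            for other in unsatisfied
            if other != name
        )

    # Sort key: dependencies nobody else needs come first (False < True),
    # then fewer versions, then the name as a deterministic tie-breaker.
    return min(
        unsatisfied,
        key=lambda n: (required_by_other(n), version_count[n], n),
    )


# Hypothetical numbers: botocore has far more versions than urllib3.
unsatisfied = {"botocore", "urllib3"}
direct_deps = {"botocore": {"urllib3"}}
version_count = {"botocore": 2000, "urllib3": 80}

print(choose_next(unsatisfied, direct_deps, version_count))  # botocore
```

With "fewer versions first" alone, urllib3 would be picked here; the extra first component of the sort key is what moves botocore ahead of it.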

Some measurements (with warm cache):

| pyproject.toml from ... | time without PR | time with PR |
| --- | --- | --- |
| shootout example | 225 s | 8 s |
| shootout example with additional urllib3<2 constraint | 8 s | 8 s |
| #4670 | 1 s | 1 s |
| #4870 | 70 s | 118 s |
| large internal project | 64 s | 52 s |

As can be seen, the performance of the shootout example improves dramatically. The performance of the #4870 example gets worse (70 s to 118 s); however, we also get some different (probably better) results:

| #4870 without PR | #4870 with PR |
| --- | --- |
| Sphinx 3.5.3 (requires docutils>=0.12) | Sphinx 4.3.2 (requires docutils>=0.14,<0.18) |
| docutils 0.18.1 | docutils 0.17.1 |

The solution with the PR is probably better because an older Sphinx version probably does not work with a newer docutils version. The constraint is just missing because the incompatibility was not known when the older Sphinx version was released. Since that's a typical issue, the new heuristics may even lead to less surprising results for inexperienced users if there are several possible solutions.

1 This PR can solve the boto3/botocore vs. urllib3 issue in some cases (like the shootout example), but not in all cases. That's because boto3 does not depend directly on urllib3 but on botocore, which in turn depends on urllib3, and with this PR we only look ahead one level. So it doesn't help if we have to choose between boto3 and urllib3 while no other dependency that depends on urllib3 is in the list of unsatisfied dependencies yet, as in the example in #7950 with boto3 and urllib3 as the only top-level dependencies.
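
The one-level limitation from the footnote can be illustrated with a small sketch. The helper `requires` and its `max_depth` parameter are hypothetical and only serve to show the difference between a direct and a transitive check; this PR does not implement a deeper lookahead.

```python
def requires(src, target, direct_deps, max_depth):
    """Return True if `src` reaches `target` within `max_depth` dependency hops."""
    frontier = {src}
    seen = {src}
    for _ in range(max_depth):
        # Expand one level: everything the current frontier requires directly.
        frontier = {d for p in frontier for d in direct_deps.get(p, ())} - seen
        if target in frontier:
            return True
        seen |= frontier
    return False


# Dependency chain from the footnote: boto3 -> botocore -> urllib3.
direct_deps = {"boto3": {"botocore"}, "botocore": {"urllib3"}}

# Looking ahead one level, boto3 and urllib3 appear unrelated ...
print(requires("boto3", "urllib3", direct_deps, max_depth=1))  # False
# ... the relation only shows up once botocore itself is in view.
print(requires("boto3", "urllib3", direct_deps, max_depth=2))  # True
print(requires("botocore", "urllib3", direct_deps, max_depth=1))  # True
```

With only boto3 and urllib3 in the list of unsatisfied dependencies, a one-level check therefore falls back to the version-count ordering.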

@dimbleby
Contributor

Have you thought about the approach being explored in #8191, which simply reverses the preference for "packages with fewer versions"?

That is tempting to me in that it is a single character fix - just add a minus sign. And presumably it will also handle the cases described in your footnote.

I suppose all heuristics are in the end subject to bad cases - I'm pretty sure that the problem ought to be NP-hard. While I see the logic in trying to force conflicts sooner rather than later, perhaps that heuristic is optimising for a case that turns out to be not very important in the Python ecosystem, whereas "packages with most versions" would be helpful in the world as it actually is.

My experience is that in most solutions most packages are at the latest allowed version anyway. So simply eliminating the known pathological case might be a pretty sensible thing to try.
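
As a rough illustration of the "single minus sign" idea (a sketch, not the actual change in #8191): if the next package is picked with a `min` over the number of candidate versions, negating that key reverses the preference. The function `pick_next`, its `prefer_more_versions` flag, and the version counts are hypothetical.

```python
def pick_next(version_counts, prefer_more_versions=False):
    """Pick the next package by number of candidate versions.

    version_counts: package name -> number of candidate versions
    """
    # Flipping the sign of the key turns "fewest first" into "most first".
    sign = -1 if prefer_more_versions else 1
    return min(
        version_counts,
        key=lambda name: (sign * version_counts[name], name),
    )


counts = {"boto3": 1500, "urllib3": 90}  # hypothetical numbers

print(pick_next(counts))                             # urllib3
print(pick_next(counts, prefer_more_versions=True))  # boto3
```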

@radoering
Member Author

No, I haven't really thought about it. I just assumed that there is a good reason for choosing dependencies with fewer versions first - there's even a test (test_traverse_into_package_with_fewer_versions_first). On the other hand, the code is quite old, so we probably won't find out whether there were real-world examples that benefit from that decision.

Choosing packages with more versions first, of course, solves the boto3/urllib3 issue completely and also the Sphinx/docutils example. Testing the projects from the description, I can measure a slight performance regression for the shootout example with the additional urllib3 constraint: when choosing packages with fewer versions first it takes 8 s, when choosing packages with more versions first it takes 10 s. Since we have no example yet where it's worse, maybe we should risk it...

@dimbleby
Contributor

The original algorithm describes "fewest versions first" - https://github.com/dart-lang/pub/blob/master/doc/solver.md#decision-making

> Pub chooses the latest matching version of the package with the fewest versions that match the outstanding constraint. This tends to find conflicts earlier if any exist, since these packages will run out of versions to try more quickly. But there's likely room for improvement in these heuristics.

I suspect the testcase was written just because that's how the algorithm is described. While that text acknowledges room for improvement, it would be amusing if that improvement was doing the exact opposite!

The way I'm thinking about it is that - in cases where there are real conflicts to resolve - reversing this heuristic probably loses some average performance, but helps with worst-case performance:

  • most of the time it makes sense to try and find dead-ends sooner rather than later, per the original heuristic
  • but we have this known expensive case where Poetry reaches a dead-end (e.g. with urllib3==2.0) but can't tell that it's a dead end without working its way through a zillion versions of e.g. botocore, and there it helps a lot to do things the other way round
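
The expensive case in the second bullet can be reproduced in a toy model. The sketch below is a naive depth-first backtracker, not Poetry's actual solver (which learns from conflicts), and the package index is made up: 50 hypothetical botocore versions that all require urllib3<2, plus urllib3 versions 1 and 2. It only counts how many candidates each ordering has to try.

```python
# Hypothetical index: name -> {version: {dependency: exclusive upper bound}}
INDEX = {
    "urllib3": {2: {}, 1: {}},
    "botocore": {v: {"urllib3": 2} for v in range(1, 51)},
}


def solve(index, order_key):
    """Naive backtracking search; returns (solution, candidates tried)."""
    tried = 0

    def compatible(assignment):
        # Every chosen version must satisfy the upper bounds of the others.
        for name, version in assignment.items():
            for dep, upper in index[name][version].items():
                if dep in assignment and assignment[dep] >= upper:
                    return False
        return True

    def search(assignment, remaining):
        nonlocal tried
        if not remaining:
            return dict(assignment)
        # Pick the next package according to the ordering heuristic.
        name = min(remaining, key=lambda n: order_key(len(index[n])))
        rest = [n for n in remaining if n != name]
        for version in sorted(index[name], reverse=True):  # latest first
            tried += 1
            assignment[name] = version
            if compatible(assignment):
                result = search(assignment, rest)
                if result is not None:
                    return result
            del assignment[name]
        return None

    return search({}, list(index)), tried


fewest_first = solve(INDEX, lambda count: count)
most_first = solve(INDEX, lambda count: -count)
print(fewest_first[1], most_first[1])  # 53 3
```

Both orderings end at the same solution (botocore 50, urllib3 1), but "fewest versions first" commits to urllib3 2 and has to walk through every botocore version before it can back out.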

But also most of the time there just aren't that many conflicts anyway - as I say, mostly most things end up at the latest version - so my hope is that the damage this does to the average case isn't much to worry about.

However I've a feeling that I could tell myself a story justifying almost any heuristic! Perhaps there's no way really to know except to ship it and see what the new bad cases are...


github-actions bot commented Mar 3, 2024

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2024
@radoering radoering deleted the next-package-heuristics branch November 24, 2024 12:47