Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert backjumping #142

Closed
wants to merge 1 commit into from

Conversation

notatallshaw
Copy link
Collaborator

@notatallshaw notatallshaw commented Nov 12, 2023

Fixes #134

I would prefer if backjumping were fixed, but correctness is more important than performance.

@frostming
Copy link
Member

Why not fallback to no jumping only when a valid state isn't found?

@notatallshaw
Copy link
Collaborator Author

notatallshaw commented Nov 13, 2023

Why not fallback to no jumping only when a valid state isn't found?

In the examples I've tested backjumping doesn't always directly cause a valid state not to be found, but rather can skip over a valid state that was critical to resolution earlier in the backtracking process.

Therefore a fallback would have to start over from the begining, causing significant performance issues for genuine resolution impossible errors.

@notatallshaw
Copy link
Collaborator Author

notatallshaw commented Nov 15, 2023

To be clearer, here are two sets of requirements, tested on Linux Python 3.10 across three versions of Pip, 23.3.1, 23.3.1 with speculative fix, and 23.3.1 with backjumping removed:

Requirements 23.3.1 23.3.1 with Speculative Fix 23.3.1 with Backjump Removed
"pandas>=1.3.5,<=1.4.0" "pystac>=1.8.2,<=1.8.3" "pystac-client>=0.3.2,<=0.3.3" "sat-stac<=0.1.1" ResolutionImpossible Installs Installs
kedro[test]==0.18.13 ResolutionImpossible ResolutionImpossible Installs

I'm fairly confident also I can come up with at least one more example of each behavior above (I could in the past but as PyPi is not a static target I need to play around adding upper bounds).

This is the diff of my without backjump branch: notatallshaw/pip@23.3.1...23.2.1-without-backjump

And this is the diff of my speculative fix backjump, it is refactored to be a simpler version of #134 (comment): notatallshaw/pip@23.3.1...23.3.1-with-specultive-fix

@frostming
Copy link
Member

I am not sure if we can come out with a better solution on backjumping or we are going to revert this.

/cc @pradyunsg @uranusjr @bennati

@notatallshaw
Copy link
Collaborator Author

notatallshaw commented Nov 29, 2023

I woke up this morning with an idea of how to implement a backjump fallback that falls back as minimally as possible: #144

It could still make valid resolution impossibles slower to find, but I think it's a good compromise on keeping the significant real world benefits of backjumping but still falling back to a correct solution.

@bennati
Copy link
Contributor

bennati commented Nov 29, 2023

Please keep in mind that the code without backjumping not only fails to resolve dependencies in certain cases but also takes hours (if not days on larger projects) to fail, as it scales quadratically with the number of dependencies and their available versions.

How about making backjumping optional via configuration, and let users decide if they prefer performance over correctness?

@frostming
Copy link
Member

Please keep in mind that the code without backjumping not only fails to resolve dependencies in certain cases

Can you provide the concrete examples for @notatallshaw to evaluate in the new patch?

@bennati
Copy link
Contributor

bennati commented Nov 29, 2023

Hi @frostming , I misunderstood a comment above. I guess the old code does in principle resolve correctly every time, but in practice it doesn't manage to complete as we have timeouts for its execution.

@notatallshaw
Copy link
Collaborator Author

I agree that there are requirements that are currently impractical without backjumping.

However, these could potentially resolved by different optimizations. But it's tricky to assess their effectiveness when resolvelib can currently produce an incorrect result.

I'm not sure if you saw already but I produced an alternative PR that falls back to backtracking under certain circumstances. If there isn't an outright fix this a possible way forward if resolvelib maintainers agree.

@notatallshaw
Copy link
Collaborator Author

notatallshaw commented Dec 12, 2023

@bennati I would still appreciate if you have real world examples of requirements that don't resolve in a reasonable amount of time or reach ResolutionTooDeep without backjumping.

I have created a new optimization for resolvelib / Pip called "prefer conflicting causes", and it would be good to know if there are cases where backjumping is needed even this optimization. I have created two branches, one with backjump and one without, to test these scenarios:

@bennati
Copy link
Contributor

bennati commented Dec 13, 2023

Hi @notatallshaw , it's easy to get such requirement by starting with these dependencies

boto3==1.10.16
s3fs
seaborn

These took on my machine around 7 mins to resolve (using python-inspector 0.9.5).

Then add as many dependencies as you like after those, for example these dependencies took 15 minutes to resolve;

boto3==1.10.16
s3fs
seaborn
pandas
scipy

For comparison, installing these dependencies with pip took 1 min.

@notatallshaw
Copy link
Collaborator Author

notatallshaw commented Dec 14, 2023

My initial results are showing that adding prefering causing conflicts makes backjumping optimizations small but measurable:

Requirements: boto3==1.10.16 s3fs seaborn
Results:
pip main (backjumping): 11.6s
prefer causing conflicts: 10.5s
prefer causing conflicts without backjumping: 10.8s

Requirements: boto3==1.10.16 s3fs seaborn pandas scipy
Results:
pip main (backjumping): 12s
prefer causing conflicts: 10.9s
prefer causing conflicts without backjumping: 10.9s

Requirements: sphinx sphinx-rtd-theme sphinx-toolbox myst-parser sphinxcontrib-bibtex nbsphinx
Results:
pip main (backjumping): ResolutionTooDeep
prefer causing conflicts: 6.2s
prefer causing conflicts without backjumping: 8.7s

But also that the backjumping fallback can, at least in some cases, add a small overhead compared to removing backjumping altogether, I will take a look if there's a way to improve this:

Requirements: kedro[test]==0.18.13
Results:
pip main (backjumping): ResolutionImpossible (incorrect)
backjumping fallback: 154.8s
prefer causing conflicts: ResolutionImpossible (incorrect)
prefer causing conflicts without backjumping: 142.8s
prefer causing conflicts with backjumping fallback: 144.6s

My methodology is running time python -m pip install --dry-run <requirements> three times and taking the lowest result.

I plan to test more requirements, especially ones that use https://github.com/pradyunsg/pip-resolver-benchmarks to confirm these

@notatallshaw
Copy link
Collaborator Author

notatallshaw commented Jan 7, 2024

In the last few days I have been working more seriously on making sure my "Prefer conflicting causes" optimization is working as expected: pypa/pip#12459

With the branch in that PR the incorrect behavior of ResolutionImpossible is no longer produced by either example kedro[test]==0.18.13 nor "pandas>=1.3.5,<=1.4.0" "pystac>=1.8.2,<=1.8.3" "pystac-client>=0.3.2,<=0.3.3" "sat-stac<=0.1.1" (and this incorrect behavior still is reproducible on the main branch).

I have an intuitive explanation of why I think this PR would help avoid this backjumping failure, basically the preference of causes with conflicts aligns well with the backjumping assumption that you can backjump to a state where the causes are no longer conflicting, as if you don't have that preference than there's no guarantee on the order in which conflicting causes were added. But it is hardly a formal expanation.

I think this continues to make the case for removing or adding a fallback (#144) to backjumping, as the optimization demonstrably makes an implicit assumption about how the provider prefers causes.

@notatallshaw
Copy link
Collaborator Author

I am unhappy with this solution, I am working further on understanding the root cause of the problem: #134 (comment)

I may recreate this PR once I have a working test.

@notatallshaw notatallshaw deleted the revert-backjumping branch July 21, 2024 01:17
@notatallshaw
Copy link
Collaborator Author

FYI, I beleive I found a proper logical fix to backjumping: #155

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

An example of resolution failure while it does have a valid resolution.
3 participants