Different resolution failure messages on different verbosity #9823

Open
potiuk opened this issue Apr 23, 2021 · 12 comments
Labels: C: dependency resolution (About choosing which dependencies to install), type: bug (A confirmed bug or unintended behavior)

@potiuk
Contributor

potiuk commented Apr 23, 2021

  • pip version: 21.0.1
  • Python version: 3.6 (but 3.7 and 3.8 have the same problem)
  • Operating system: Linux (Debian buster docker image)

PIP 21.0.1 sometimes produces a wrong error about conflicts, and it produces a different (correct!) error when the -vvvv option is added.

This problem originated with apache/airflow#15463 (you can see its history there). We have quite complex dependencies in Airflow and we still recommend that people install Airflow with PIP 20.2.4, but we are hoping to get rid of that limitation. One problem, however, was a very strange one and we did not have time to look at it. When I looked today I realized that the error printed by PIP was misleading (I could not see the reason for the original error).

I believe PIP reports google-cloud-bigquery-storage as having a problem instead of pyarrow. It looks like instead of printing the actual dependency that has a problem, it prints the "sibling" of that dependency (or something like that).

It is very easily reproducible:

  1. Run: pip install apache-airflow[google]==2.0.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt

You should get an error:

ERROR: Could not find a version that satisfies the requirement google-cloud-bigquery-storage<2.0.0dev,>=1.0.0; extra == "bqstorage" (from google-cloud-bigquery[bqstorage,pandas])

  2. Run: pip install -vvvv apache-airflow[google]==2.0.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt

You should get an error:

ERROR: Could not find a version that satisfies the requirement pyarrow<2.0dev,>=1.0.0; extra == "bqstorage" (from google-cloud-bigquery[bqstorage,pandas])
ERROR: No matching distribution found for pyarrow<2.0dev,>=1.0.0; extra == "bqstorage"

I believe in both cases we ONLY have a problem with pyarrow, and it is misreported without the -vvvv flag. It looks like instead of the actual dependency that is wrong (pyarrow), the sibling of that dependency (google-cloud-bigquery-storage) is printed by PIP. Note that, apart from the dependency name, the problematic limits are exactly the same (<2.0.0dev,>=1.0.0; extra == "bqstorage").

I also could not find any other packages from those being installed where google-cloud-bigquery-storage would be limited to <2.0.0dev,>=1.0.0 - that's why I think this is a bug in PIP.

Gists with the outputs to compare

pip install apache-airflow[google]==2.0.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt :

Here: https://gist.github.com/potiuk/04f6127469a709e3e47be7585c9a863c

pip install -vvvv apache-airflow[google]==2.0.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt:

https://gist.github.com/potiuk/17a3d591fb091bdd8a0e213f49b6b0af

I might be wrong, of course, but it looks like this.

UPDATE:

I ran it with -vv and it fails with the `google-cloud-bigquery-storage` error:
https://gist.github.com/potiuk/2f9af6a8eaac7ea393fd1f9fe64361c7

The -v and -vvv both fail with pyarrow error.

In neither of those can I find where the google-cloud-bigquery-storage<2.0.0dev,>=1.0.0; extra == "bqstorage" (from google-cloud-bigquery[bqstorage,pandas]) comes from :(.
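
For anyone who wants to compare the verbosity levels side by side, here is a minimal sketch of how the runs could be scripted. It is not part of the original report: `error_lines` and `pip_errors` are hypothetical helper names, and the sketch assumes a throwaway virtualenv with network access, since the command really does try to install. The constraints file at that URL may also have changed since this was reported.

```python
import subprocess
import sys

# URL taken from the report above; its contents may have changed since.
CONSTRAINTS = (
    "https://raw.githubusercontent.com/apache/airflow/"
    "constraints-2.0.2/constraints-3.6.txt"
)


def error_lines(output):
    """Return the lines of pip output that start with 'ERROR:'."""
    return [
        line.strip()
        for line in output.splitlines()
        if line.lstrip().startswith("ERROR:")
    ]


def pip_errors(verbosity):
    """Run the install from the report with `verbosity` -v flags.

    Hypothetical helper: returns only the ERROR lines of the combined output.
    """
    cmd = [
        sys.executable, "-m", "pip", "install",
        "apache-airflow[google]==2.0.2",
        "--constraint", CONSTRAINTS,
    ]
    if verbosity:
        cmd.append("-" + "v" * verbosity)
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # Python 3.6 compatible spelling of text=True
    )
    return error_lines(result.stdout)


# Example (run manually in a disposable environment):
#   for level in range(5):
#       print(level, pip_errors(level))
```

Only the ERROR lines are kept, so the first reported requirement (pyarrow or google-cloud-bigquery-storage) is easy to compare per level.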

@potiuk potiuk changed the title from "Wrong (and terribly misleading!) error printed by PIP sometimes when requirements are conflicting" to "Likely wrong (and misleading!) error printed by PIP sometimes when requirements are conflicting" Apr 23, 2021
@uranusjr
Member

uranusjr commented Apr 23, 2021

This prompts me to try the same installation on 3.8, which uncovers another conflict(!)

ERROR: Cannot install apache-airflow[google]==2.0.2 because these package versions have conflicting dependencies.

The conflict is caused by:
    apache-airflow[google] 2.0.2 depends on cattrs~=1.1; python_version > "3.6"
    The user requested (constraint) cattrs==1.0.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

And 3.6 (against unreleased pip) also presents a different conflict, on connexion. So the issue may be that there are actually multiple conflicts in the current Airflow constraints, and each error shows only part of them. google-cloud-bigquery-storage could still be conflicting somewhere. Still, it’s surprising that -vvvv would affect the dependency resolution logic; I’ll need to look deeper into this.

@uranusjr uranusjr changed the title from "Likely wrong (and misleading!) error printed by PIP sometimes when requirements are conflicting" to "Confusing resolution failure message on 21.0.1" Apr 23, 2021
@uranusjr
Member

uranusjr commented Apr 23, 2021

Edit: I successfully reproduced these different behaviours on Linux. This is really weird.

@potiuk
Contributor Author

potiuk commented Apr 23, 2021

EDIT (apologies for not seeing it first). I dug a bit deeper, and I think I know where the -storage limitations are coming from (they are in fact in the 1.28.0 version of the bigquery library). Same for pyarrow. This is my bad: I looked at the latest < 3.0.0 version of -bigquery, not the one from constraints.

The -v behaviour is a strange one, in that it alternates between those. But they are actually right... I still do not know where the old 1.28.0 limitation comes from (but this is a different story).

I will close that one and look further to where it is coming from.

Apologies for the trouble (but it would be nice to find out the reason for the -v behaviour :).

@potiuk potiuk closed this as completed Apr 23, 2021
@uranusjr
Member

uranusjr commented Apr 23, 2021

Ah, I think I found the real root of conflict. I get this against the main branch:

$ python src/pip install 'apache-airflow[google]==2.0.2' --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt
...
ERROR: Cannot install google-cloud-bigquery[bqstorage,pandas]==1.28.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    google-cloud-bigquery[bqstorage,pandas] 1.28.0 depends on pyarrow<2.0dev and >=1.0.0; extra == "bqstorage"
    The user requested (constraint) pyarrow==2.0.0
...

So the issue was pyarrow all along, but pip 21.0.1 misidentified the cause as google-cloud-bigquery since it failed to consider version ranges introduced via constraints. I'm going to write this up as another case fixed by #9300. Thanks for the report, it's a really interesting rabbit hole to dig into!

(p.s. This still does not explain the -vvvv thing.)

@uranusjr
Member

And the issue got closed before I could submit 🤣 Issue tracker race condition.

@potiuk
Contributor Author

potiuk commented Apr 23, 2021

But yeah .. the "main" message is much CLEARER. So I re-open it.

@potiuk potiuk reopened this Apr 23, 2021
@potiuk
Contributor Author

potiuk commented Apr 23, 2021

OK, let me try to do all my checks with the master version of PIP then.

@uranusjr uranusjr changed the title from "Confusing resolution failure message on 21.0.1" to "Different resolution failure messages on different verbosity" Apr 23, 2021
@uranusjr uranusjr added the type: bug A confirmed bug or unintended behavior label Apr 23, 2021
@potiuk
Contributor Author

potiuk commented Apr 23, 2021

I hope we will soon be able to close all those and successfully move to the 21.x line in Airflow :)

@uranusjr
Member

I'll keep this open regardless of the outcome because the -vvvv thing is still unexplained and probably needs to be looked into. It might not be a bug, but someone needs to look into it.

@uranusjr
Member

constraints-3.6.txt

Above is a snapshot of the constraints-3.6.txt file that caused the issue, for future reproduction. I'm assuming the file hosted in Airflow's repo will be overwritten once you sort out the conflicts.

@potiuk
Contributor Author

potiuk commented Apr 24, 2021

FYI: it seems I found the root cause of the conflict: apache/airflow#15513 🤞

@uranusjr
Member

uranusjr commented Jul 8, 2021

So it turns out the different error messages are due to pkg_resources returning dependencies in a nondeterministic order (because internally it uses a set to store them). When the ordering is different, the resolver can be sent down subtrees in different orders, and report different errors if you have multiple conflicts in the dependency graph.

I think we should sort the dependencies somehow (maybe just alphabetically); this would be good for debuggability, if nothing else.
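
To make the ordering effect concrete, here is a toy sketch. It is not pip's resolver and makes no claim about how resolvelib works internally: `first_conflict` is a hypothetical function that reports the first unsatisfiable requirement it meets, which is enough to show why visiting order changes the message. The pyarrow pin comes from this thread; the google-cloud-bigquery-storage pin is an assumption made up for illustration.

```python
from itertools import permutations

# Allowed version ranges as (inclusive lower bound, exclusive upper bound),
# mirroring ">=1.0.0,<2.0dev" from the error messages in this thread.
RANGES = {
    "pyarrow": ((1, 0, 0), (2, 0, 0)),
    "google-cloud-bigquery-storage": ((1, 0, 0), (2, 0, 0)),
}

# Versions pinned by a constraints file.
PINNED = {
    "pyarrow": (2, 0, 0),  # from the thread: constraint pyarrow==2.0.0
    "google-cloud-bigquery-storage": (2, 0, 0),  # ASSUMED, illustration only
}


def first_conflict(dependency_order):
    """Return the first dependency whose pinned version is out of range."""
    for name in dependency_order:
        low, high = RANGES[name]
        if not (low <= PINNED[name] < high):
            return name
    return None


# Visiting the same dependencies in every possible order: the reported
# culprit depends on which one happens to be looked at first.
reported = {first_conflict(order) for order in permutations(RANGES)}

# Sorting before visiting makes the report independent of the input order.
reported_sorted = {first_conflict(sorted(order)) for order in permutations(RANGES)}

print(sorted(reported))
print(sorted(reported_sorted))
```

With two conflicts in the graph, the unsorted runs name both packages across orderings, while the sorted runs always name the same one. Sorting does not make the message more correct, only reproducible.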

@uranusjr uranusjr self-assigned this Jul 8, 2021
@pradyunsg pradyunsg added C: dependency resolution About choosing which dependencies to install and removed C: new resolver labels Oct 12, 2021