Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pip install nondeterministic order even for explicitly provided wheels #11572

Open
1 task done
vanschelven opened this issue Nov 3, 2022 · 14 comments
Open
1 task done
Labels
C: dependency resolution About choosing which dependencies to install type: bug A confirmed bug or unintended behavior

Comments

@vanschelven
Copy link
Contributor

vanschelven commented Nov 3, 2022

Description

Subsequent calls to pip install do not execute in the same order, even when wheels are vendored, and no index is used.

This seems unnecessarily nondeterministic (to me), and makes it harder than necessary to reproduce bugs (including bugs in pip).

Expected behavior

No response

pip version

22.0.2

Python version

Ptyhon 3.10

OS

any (presumably); in practice: ubuntu

How to Reproduce

(areyoudeterministic) me:/tmp/areyoudeterministic$ cat requirements.txt 
Django==4.0.*
pytz==2022.1
whitenoise==6.0.0
sqlparse==0.4.2
cfenv==0.5.3
Authlib==1.0.1
requests==2.27.1
djangorestframework==3.13.1
djangorestframework-datatables==0.7.0
cryptography
pandas==1.5.0
reportlab==3.6.11
gunicorn==20.1.0
mysqlclient==2.1.0
django-migration-linter
cfenv==0.5.3
django-auth-ldap==4.0.0

(areyoudeterministic) me:/tmp/areyoudeterministic$ pip wheel --no-binary=:none: -r requirements.txt -w vendor
Collecting Django==4.0.*
  Using cached Django-4.0.8-py3-none-any.whl (8.0 MB)
[..]
Saved ./vendor/setuptools-65.5.0-py3-none-any.whl

(areyoudeterministic) me:/tmp/areyoudeterministic$ python -m venv .

(areyoudeterministic) me:/tmp/areyoudeterministic$ pip install -r requirements.txt --find-links=vendor --no-index
Looking in links: vendor
Processing ./vendor/Django-4.0.8-py3-none-any.whl
Processing ./vendor/pytz-2022.1-py2.py3-none-any.whl
Processing ./vendor/whitenoise-6.0.0-py3-none-any.whl
Processing ./vendor/sqlparse-0.4.2-py3-none-any.whl
Processing ./vendor/cfenv-0.5.3-py2.py3-none-any.whl
Processing ./vendor/Authlib-1.0.1-py2.py3-none-any.whl
Processing ./vendor/requests-2.27.1-py2.py3-none-any.whl
Processing ./vendor/djangorestframework-3.13.1-py3-none-any.whl
Processing ./vendor/djangorestframework_datatables-0.7.0-py2.py3-none-any.whl
Processing ./vendor/cryptography-38.0.3-cp36-abi3-manylinux_2_28_x86_64.whl
Processing ./vendor/pandas-1.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/reportlab-3.6.11-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/gunicorn-20.1.0-py3-none-any.whl
Processing ./vendor/mysqlclient-2.1.0-cp310-cp310-linux_x86_64.whl
Processing ./vendor/django_migration_linter-4.1.0-py3-none-any.whl
Processing ./vendor/django_auth_ldap-4.0.0-py3-none-any.whl
Processing ./vendor/asgiref-3.5.2-py3-none-any.whl
Processing ./vendor/furl-2.1.3-py2.py3-none-any.whl
Processing ./vendor/idna-3.4-py3-none-any.whl
Processing ./vendor/urllib3-1.26.12-py2.py3-none-any.whl
Processing ./vendor/certifi-2022.9.24-py3-none-any.whl
Processing ./vendor/charset_normalizer-2.0.12-py3-none-any.whl
Processing ./vendor/python_dateutil-2.8.2-py2.py3-none-any.whl
Processing ./vendor/numpy-1.23.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/Pillow-9.3.0-cp310-cp310-manylinux_2_28_x86_64.whl
Requirement already satisfied: setuptools>=3.0 in ./lib/python3.10/site-packages (from gunicorn==20.1.0->-r requirements.txt (line 13)) (59.6.0)
Processing ./vendor/python_ldap-3.4.3-cp310-cp310-linux_x86_64.whl
Processing ./vendor/cffi-1.15.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/toml-0.10.2-py2.py3-none-any.whl
Processing ./vendor/appdirs-1.4.4-py2.py3-none-any.whl
Processing ./vendor/pycparser-2.21-py2.py3-none-any.whl
Processing ./vendor/six-1.16.0-py2.py3-none-any.whl
Processing ./vendor/orderedmultidict-1.0.1-py2.py3-none-any.whl
Processing ./vendor/pyasn1_modules-0.2.8-py2.py3-none-any.whl
Processing ./vendor/pyasn1-0.4.8-py2.py3-none-any.whl
Installing collected packages: pytz, pyasn1, appdirs, whitenoise, urllib3, toml, sqlparse, six, pycparser, pyasn1-modules, pillow, numpy, mysqlclient, idna, gunicorn, charset-normalizer, certifi, asgiref, requests, reportlab, python-ldap, python-dateutil, orderedmultidict, Django, cffi, pandas, furl, djangorestframework, django-migration-linter, django-auth-ldap, cryptography, djangorestframework-datatables, cfenv, Authlib
Successfully installed Authlib-1.0.1 Django-4.0.8 appdirs-1.4.4 asgiref-3.5.2 certifi-2022.9.24 cfenv-0.5.3 cffi-1.15.1 charset-normalizer-2.0.12 cryptography-38.0.3 django-auth-ldap-4.0.0 django-migration-linter-4.1.0 djangorestframework-3.13.1 djangorestframework-datatables-0.7.0 furl-2.1.3 gunicorn-20.1.0 idna-3.4 mysqlclient-2.1.0 numpy-1.23.4 orderedmultidict-1.0.1 pandas-1.5.0 pillow-9.3.0 pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.21 python-dateutil-2.8.2 python-ldap-3.4.3 pytz-2022.1 reportlab-3.6.11 requests-2.27.1 six-1.16.0 sqlparse-0.4.2 toml-0.10.2 urllib3-1.26.12 whitenoise-6.0.0

(areyoudeterministic) me:/tmp/areyoudeterministic$ rm bin/ include/ lib lib64/ -rf
(areyoudeterministic) me:/tmp/areyoudeterministic$ python -m venv .
(areyoudeterministic) me:/tmp/areyoudeterministic$ pip install -r requirements.txt --find-links=vendor --no-index
Looking in links: vendor
Processing ./vendor/Django-4.0.8-py3-none-any.whl
Processing ./vendor/pytz-2022.1-py2.py3-none-any.whl
Processing ./vendor/whitenoise-6.0.0-py3-none-any.whl
Processing ./vendor/sqlparse-0.4.2-py3-none-any.whl
Processing ./vendor/cfenv-0.5.3-py2.py3-none-any.whl
Processing ./vendor/Authlib-1.0.1-py2.py3-none-any.whl
Processing ./vendor/requests-2.27.1-py2.py3-none-any.whl
Processing ./vendor/djangorestframework-3.13.1-py3-none-any.whl
Processing ./vendor/djangorestframework_datatables-0.7.0-py2.py3-none-any.whl
Processing ./vendor/cryptography-38.0.3-cp36-abi3-manylinux_2_28_x86_64.whl
Processing ./vendor/pandas-1.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/reportlab-3.6.11-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/gunicorn-20.1.0-py3-none-any.whl
Processing ./vendor/mysqlclient-2.1.0-cp310-cp310-linux_x86_64.whl
Processing ./vendor/django_migration_linter-4.1.0-py3-none-any.whl
Processing ./vendor/django_auth_ldap-4.0.0-py3-none-any.whl
Processing ./vendor/asgiref-3.5.2-py3-none-any.whl
Processing ./vendor/furl-2.1.3-py2.py3-none-any.whl
Processing ./vendor/charset_normalizer-2.0.12-py3-none-any.whl
Processing ./vendor/idna-3.4-py3-none-any.whl
Processing ./vendor/urllib3-1.26.12-py2.py3-none-any.whl
Processing ./vendor/certifi-2022.9.24-py3-none-any.whl
Processing ./vendor/numpy-1.23.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/python_dateutil-2.8.2-py2.py3-none-any.whl
Processing ./vendor/Pillow-9.3.0-cp310-cp310-manylinux_2_28_x86_64.whl
Requirement already satisfied: setuptools>=3.0 in ./lib/python3.10/site-packages (from gunicorn==20.1.0->-r requirements.txt (line 13)) (59.6.0)
Processing ./vendor/python_ldap-3.4.3-cp310-cp310-linux_x86_64.whl
Processing ./vendor/cffi-1.15.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Processing ./vendor/appdirs-1.4.4-py2.py3-none-any.whl
Processing ./vendor/toml-0.10.2-py2.py3-none-any.whl
Processing ./vendor/pycparser-2.21-py2.py3-none-any.whl
Processing ./vendor/six-1.16.0-py2.py3-none-any.whl
Processing ./vendor/orderedmultidict-1.0.1-py2.py3-none-any.whl
Processing ./vendor/pyasn1-0.4.8-py2.py3-none-any.whl
Processing ./vendor/pyasn1_modules-0.2.8-py2.py3-none-any.whl
Installing collected packages: pytz, pyasn1, appdirs, whitenoise, urllib3, toml, sqlparse, six, pycparser, pyasn1-modules, pillow, numpy, mysqlclient, idna, gunicorn, charset-normalizer, certifi, asgiref, requests, reportlab, python-ldap, python-dateutil, orderedmultidict, Django, cffi, pandas, furl, djangorestframework, django-migration-linter, django-auth-ldap, cryptography, djangorestframework-datatables, cfenv, Authlib
Successfully installed Authlib-1.0.1 Django-4.0.8 appdirs-1.4.4 asgiref-3.5.2 certifi-2022.9.24 cfenv-0.5.3 cffi-1.15.1 charset-normalizer-2.0.12 cryptography-38.0.3 django-auth-ldap-4.0.0 django-migration-linter-4.1.0 djangorestframework-3.13.1 djangorestframework-datatables-0.7.0 furl-2.1.3 gunicorn-20.1.0 idna-3.4 mysqlclient-2.1.0 numpy-1.23.4 orderedmultidict-1.0.1 pandas-1.5.0 pillow-9.3.0 pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.21 python-dateutil-2.8.2 python-ldap-3.4.3 pytz-2022.1 reportlab-3.6.11 requests-2.27.1 six-1.16.0 sqlparse-0.4.2 toml-0.10.2 urllib3-1.26.12 whitenoise-6.0.0

(areyoudeterministic) me:/tmp/areyoudeterministic$ pip --version
pip 22.0.2 from /tmp/areyoudeterministic/lib/python3.10/site-packages/pip (python 3.10)

Note the positioning of furl-2.1.3-py2.py3-none-any.whl and idna-3.4-py3-none-any.whl in the 2 subsequent runs.

Output

No response

Code of Conduct

@vanschelven vanschelven added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Nov 3, 2022
@uranusjr
Copy link
Member

uranusjr commented Nov 3, 2022

Before thinking about what may cause this, can I ask first why this is considered a problem? Do you have concrete examples this cause difficulties?

@vanschelven
Copy link
Contributor Author

As per the linked issue (about INFO spam): to reproduce that particular issue it would be nice if reproducing it once meant reproducing it always.

The more important case (for me) was the thing I was actually working on while running into the mentioned issue: when debugging a problem with a pip install invocation in some CI/CD pipeline, I was trying to zoom in on differences as one often does while debugging. That often means having a "known good" and "known bad" situation, comparing them, and trying to step-wise bring them closer and closer together while comparing outputs. In such a scenario it is very unhelpful if the outputs change all the time.

NB in the example above the differences are trivial (ordering of successful operations) but in the interesting case (failures) the differences may be more pronounced (i.e. more confusing)

@vanschelven
Copy link
Contributor Author

what may cause this

The liberal use of set() and frozenset() throughout the codebase come to mind... especially as part of the dependency resolution.

@pfmoore
Copy link
Member

pfmoore commented Nov 4, 2022

We’re not going to prohibit the use of sets/frozensets…

@vanschelven
Copy link
Contributor Author

We’re not going to prohibit the use of sets/frozensets…

Why not? Drop-in replacements which preserve order-of-adding would seem to be easy enough to add?

@pfmoore
Copy link
Member

pfmoore commented Nov 4, 2022

We don't have the interest or bandwidth to maintain (or vendor) an ordered set implementation when the stdlib supplies data structures that do what we need. IMO this issue isn't significant enough to justify that sort of maintenance overhead.

@vanschelven
Copy link
Contributor Author

vanschelven commented Nov 4, 2022 via email

@pfmoore
Copy link
Member

pfmoore commented Nov 4, 2022

It's my personal opinion, feel free to wait to see what the other pip maintainers think if you want. But it's not so much the initial contribution of the code that matters to me, it's the ongoing issue that we'd need to remember to not use sets anywhere in the codebase in future, just in case it results in nondeterministic behaviour. Also, how would we add a test to ensure that pip continues to behave deterministically? Without a test we couldn't be sure we wouldn't go back to being nondeterministic.

I just don't think the issue is serious enough to be worth working through all the implications involved in fixing it.

@notatallshaw
Copy link
Member

notatallshaw commented Nov 4, 2022

Wouldn't the easier fix be that when iterating through a collection that affects pips user presented ordering to use sorted with a key?

This already happens when choosing what package to backtrack on.

I'm sure then it would be possible to add a test that confirms this order is followed?

@pfmoore
Copy link
Member

pfmoore commented Nov 4, 2022

Agreed, something like that sounds far more plausible (although given that the reported issue was with the ordering of the "processing..." lines, I don't think that would help in the OP's case, as that would require changing the order of processing, not just the order of reporting).

@notatallshaw
Copy link
Member

notatallshaw commented Nov 4, 2022

Agreed my wording wasn't quite right, I meant whenever iterating through any collection that would affected the user.

With regards to the symptoms of this issue I have reproduced it myself just by running the same command several times:

python -m pip download -r requirements.txt -d downloads

It seems to me the top level packages are resolved in user order and the order of transitive dependencies is what can change. I'm trying to think of hypothetical situations where this could have significant user impact:

  • If 2 transitive dependencies that came from the same top level requirement had conflicting requirements, then depending on the order of processing a different backtracking path could be followed, which could in turn lead to one backtracking path resolving quickly and the other exploding in complexity

But I've never seen a user report of this so it seems like it isn't easily triggered if it is possible.

@vanschelven
Copy link
Contributor Author

vanschelven commented Nov 4, 2022

an ordered set implementation

In my naive mind any ordered dict could be used (e.g. storing None for all keys), i.e. in Python versions relevant to pip: a dict. but perhaps I'm missing something specific to pip that makes this hard.

Also, how would we add a test to ensure that pip continues to behave deterministically? Without a test we couldn't be sure we wouldn't go back to being nondeterministic.

In my experience testing deterministic code is much simpler than non-deterministic code (precisely because tests can rely on various orderings) both for happy paths and for pinpointing bugs (which may disappear on the next run in non-deterministic code). And testing that the code behaves deterministically basically comes for free, because you're very likely to start relying on the deterministic behaviors in your tests expectations.

In fact, such automatic reliance on deterministic behavior is so automatic, that if there's any arguments to be made against deterministic behavior in the context of testing, they would be quite opposite to the one you just made. Namely: [1] that in practice tests will start relying so much on deterministic behaviors that are in fact implementation details, that this makes refactoring of such implementation details harder and [2] that the fact that your code behaves more uniformly may obscure some cases of failures that would be easier to detect in a randomly behaving code-base. Still, in my mind these disadvantages are minor in comparison to the advantage of your code always behaving in the same way. If these were strong arguments, one would introduce randomness in as many locations as possible, and that's not something we tend to do for obvious reasons.

Wouldn't the easier fix be that when iterating through a collection that affects pips user presented ordering to use sorted with a key?

If the iterating itself does not have side-effects, this would indeed be simpler. I do seem to remember, however, that in some cases iterators were specifically introduced in the pip code-base to defer side-effects, so I'm not 100% sure this is possible (or at least: easier)

@pradyunsg
Copy link
Member

Instead of getting back into an extensive discussion about the nature of determistic vs non-deterministic order in data structures... I reckon it'd be a better investment of effort to investigate the specific cause of this issue.

@pradyunsg pradyunsg added C: dependency resolution About choosing which dependencies to install and removed S: needs triage Issues/PRs that need to be triaged labels Nov 4, 2022
@vanschelven
Copy link
Contributor Author

investigate the specific cause of this issue

If curiosity and spare time come together I might just do that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: dependency resolution About choosing which dependencies to install type: bug A confirmed bug or unintended behavior
Projects
None yet
Development

No branches or pull requests

5 participants