Wheel 0.34 broke installing from sdist in docker (UnicodeEncodeError) #331
This error occurs when downloading and unpacking the wheel sdist file. It appears to be caused by a file in the sdist that has non-ASCII characters in its path. wheel-0.34.0.tar.gz does contain files with non-ASCII characters:
As far as I can see, though, so did wheel 0.33.6 and 0.33.5.
Wheel 0.33.4, however, does not have such a file. Can you tell us which previous versions of wheel you've used successfully in this way?
In what environment are you installing wheel that does not support Unicode file names?
@agronholm We saw this in an
Also seeing this in a
I was able to repro this with the
FWIW, I can confirm this issue is still present in 0.34.1. In the past, I've had to remove unicode type things to make
v0.34.1 was not intended to fix this issue. It was an emergency fix for sdist installs that were failing across the board. This particular issue existed before v0.34 and was only reported now, so I didn't deem it critical. I mistakenly tagged this issue in the commit because it was reported at the same time as the general sdist problem. The fix will be to either generate
Sorry, I wasn't trying to imply it should have been fixed in 0.34.1. I just wanted to add that I double-checked it still existed given.
It's strange that it only showed up for us when 0.34 was released. We have CI that runs every day, so it seems a strange coincidence that it started showing up for us 4 hours after 0.34. Perhaps some other change to
It's strange indeed, but I just tested it. The v0.33.6 sdist installs cleanly without Unicode support but v0.34.1 doesn't, and both contain the same problematic file.
One time, I found that a README file actually had a "unicode space" and that was being read into the
The contents don't matter. The path names do. See how the traceback ends?
It's failing to create a file because it can't encode the path name to bytes using ASCII.
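The failing step can be reproduced in isolation with a minimal Python 3 sketch. On POSIX, os.fsencode() encodes a str path with the filesystem encoding and the "surrogateescape" error handler. The hypothetical helpers below do the same thing but take the encoding as an explicit argument, which lets you simulate an ASCII-only locale on any machine. The file name is borrowed from the test data identified later in this thread.

```python
def encode_path(path: str, fs_encoding: str) -> bytes:
    """Encode a str path the way os.fsencode() does on POSIX, but with an
    explicit filesystem encoding (hypothetical helper for illustration)."""
    # surrogateescape lets undecodable bytes round-trip, but it cannot rescue
    # genuine non-ASCII characters such as 'å' or '日'.
    return path.encode(fs_encoding, "surrogateescape")


def can_create(path: str, fs_encoding: str) -> bool:
    """Return True if the path could be converted to bytes for os.open()."""
    try:
        encode_path(path, fs_encoding)
    except UnicodeEncodeError:
        return False
    return True


name = "unicodedist/åäö_日本語.py"
print(can_create(name, "utf-8"))  # True: a UTF-8 locale can represent it
print(can_create(name, "ascii"))  # False: an ASCII-only locale cannot
```

The ASCII case raises the same kind of UnicodeEncodeError that ends the traceback. Plain ASCII names are unaffected.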
As expected, the problematic file was
I can't find any relevant changes in pip's unpacking code since at least 19.2.3.
Pip is not the problem here, since unpacking fails for v0.34.1 whereas v0.33.6 succeeds with the exact same pip version.
The two tar files appear to have been created differently:

    $ file wheel-0.33.6.tar wheel-0.34.0.tar
    wheel-0.33.6.tar: POSIX tar archive (GNU)
    wheel-0.34.0.tar: POSIX tar archive
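The difference that file reports comes from the 8-byte magic field at offset 257 of the first 512-byte tar header. GNU tar writes "ustar  \0" (two spaces and a NUL). POSIX ustar/pax archives write "ustar\0" followed by the version "00". The minimal sketch below performs the same check. tar_flavor and tar_flavor_of_file are hypothetical helper names. The sketch assumes the archive is either uncompressed or gzip-compressed and contains at least one header block.

```python
import gzip
import io
import tarfile

GNU_MAGIC = b"ustar  \x00"    # "ustar" + two spaces + NUL (GNU format)
POSIX_MAGIC = b"ustar\x0000"  # "ustar" + NUL + version "00" (ustar/pax)


def tar_flavor(first_block: bytes) -> str:
    """Classify a tar archive by the 8 bytes at offset 257 of its first header."""
    magic = first_block[257:265]
    if magic == GNU_MAGIC:
        return "GNU"
    if magic == POSIX_MAGIC:
        return "POSIX"
    return "unknown"


def tar_flavor_of_file(path: str) -> str:
    """Read the first header block of a .tar or .tar.gz file and classify it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        return tar_flavor(fh.read(512))


def build_archive(fmt: int) -> bytes:
    """Write a tiny in-memory archive with one empty member in the given format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tf:
        tf.addfile(tarfile.TarInfo("PKG-INFO"))  # zero-length member
    return buf.getvalue()


print(tar_flavor(build_archive(tarfile.GNU_FORMAT)))  # GNU
print(tar_flavor(build_archive(tarfile.PAX_FORMAT)))  # POSIX
```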
Curious,
When I read wheel-0.34.0 with BSD tar and with GNU tar, I get very different results for the file path in question. For wheel-0.33.6, both give the same result. I expect that non-ASCII file names are encoded differently in the non-GNU-format archive, which causes the path bytes to be interpreted differently.
This probably explains it: https://docs.python.org/3/library/tarfile.html#tarfile.DEFAULT_FORMAT (as of Python 3.8, the default format for new archives is PAX_FORMAT rather than GNU_FORMAT).
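The minimal sketch below shows why the archive format matters. It assumes the sdist was written by tarfile. In GNU format, tarfile stores the member name as raw bytes. A reader in an ASCII locale decodes those bytes with "surrogateescape", so they round-trip back to the original bytes when the file is created. In PAX format, a non-ASCII name is instead stored as UTF-8 in a pax extended header, which the reader decodes into real Unicode characters. Those characters cannot then be encoded to ASCII. The member name and helper names are illustrative only.

```python
import io
import tarfile

# Illustrative member name, modeled on the test-data file discussed in this thread.
NAME = "unicodedist/åäö_日本語.py"


def build_archive(fmt: int) -> bytes:
    """Write a one-member tar archive in memory using the given tarfile format."""
    buf = io.BytesIO()
    payload = b"# test data\n"
    with tarfile.open(fileobj=buf, mode="w", format=fmt, encoding="utf-8") as tf:
        info = tarfile.TarInfo(NAME)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def name_survives_ascii_locale(data: bytes) -> bool:
    """Read the archive as an ASCII-locale Python would, then check whether the
    member name can be turned back into path bytes, as extraction must do."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:",
                      encoding="ascii", errors="surrogateescape") as tf:
        member_name = tf.getnames()[0]
    try:
        member_name.encode("ascii", "surrogateescape")
    except UnicodeEncodeError:
        return False
    return True


for label, fmt in (("GNU", tarfile.GNU_FORMAT), ("PAX", tarfile.PAX_FORMAT)):
    data = build_archive(fmt)
    # Columns: format, has a pax "path=" record, name survives an ASCII locale.
    print(label, b"path=" in data, name_survives_ascii_locale(data))
```

The GNU archive has no pax record, and its name survives. The PAX archive has a "path=" record, and its name does not survive. This is consistent with the observation that a 3.7-built sdist extracts fine and a 3.8-built one does not.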
It's the former. If I package the sdist using Python 3.8, extraction fails in the Ubuntu container. If I package it with 3.7, extraction works. Next, it fails because the ancient version of setuptools is trying to read
The following quote from PEP 517 is tangentially relevant:
Basically, the sdist format was traditionally woefully underspecified, and one thing that wasn't defined was how file names are encoded. As a result, non-ASCII file names in sdists were traditionally non-portable. That section in PEP 517 makes the file format explicit, although obviously there are a lot of legacy sdists out there. Getting encodings right is an ongoing issue in packaging standards. We're getting there, but unless you can ignore older files/systems, "anything other than ASCII is risky" remains the case 🙁
I worked around this by using Python 3.7 to create the packages and by replacing non-ASCII characters in setup.cfg.
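Since "anything other than ASCII is risky", one option is to scan the built sdist for non-ASCII member names before uploading it. This is a hedged sketch that uses only tarfile (Python 3.7+ for str.isascii()). The function names are hypothetical, and whether such a check should fail the build or just warn is a policy choice.

```python
import io
import tarfile
from typing import List


def non_ascii_members(archive: tarfile.TarFile) -> List[str]:
    """Return member names containing any non-ASCII character."""
    # str.isascii() is False for characters such as 'å', and also for the
    # surrogates tarfile uses to represent undecodable bytes.
    return [name for name in archive.getnames() if not name.isascii()]


def check_sdist(path: str) -> List[str]:
    """Hypothetical pre-upload check: list risky names in a .tar.gz sdist."""
    with tarfile.open(path, mode="r:gz") as archive:
        return non_ascii_members(archive)


# Demo on an in-memory archive instead of a real sdist.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", encoding="utf-8") as tf:
    for member in ("pkg-1.0/setup.cfg", "pkg-1.0/tests/åäö.py"):
        tf.addfile(tarfile.TarInfo(member))  # empty members are enough here
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:", encoding="utf-8") as tf:
    print(non_ascii_members(tf))  # ['pkg-1.0/tests/åäö.py']
```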
In that Ubuntu container, the newly created 0.34.1 sdist installed fine for Python 3, but not for Python 2:
I'm sad that this means you can't enter your own name without workarounds. But thanks for digging into this and working out what was going on :-)
@johnthagen, does the fix committed to master solve the problem for you?
The pipeline where this fails pulls the pre-packaged sdist from PyPI using … Worst case, I could test immediately after the next wheel version is uploaded to PyPI.
Will the fixed version be released on PyPI?
I was hoping to get confirmation that the fix works as expected. If that's not feasible, I'll just have to make a release and hope it works. @johnthagen How about
Just noting that I filed pypa/pip#7667 for the underlying issue in pip that the wheel tarball format change revealed.
Seconding @vlanse: it would be great to have a release with this fix.
I was hoping to get confirmation of the fix first, but I guess that's not coming. I'll make a new release.
@agronholm, if you run an install under "LANG=C" with Python 3.6, that should let you test the failure locally. Later versions will also exhibit the same failure if you turn off locale coercion (I don't recall the exact env var for that; something like "PYTHONCOERCECLOCALE=0").
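The sketch below reports what filesystem encoding a child Python process sees under a legacy C locale. It sets PYTHONCOERCECLOCALE=0 to disable locale coercion (PEP 538). It also sets PYTHONUTF8=0, because on Python 3.7+ the docs say UTF-8 Mode is enabled by default when coercion is disabled in such a locale, and that would hide the failure. Both variables are simply ignored by older interpreters. The result is platform-dependent: on Linux it is typically an ASCII codec, while macOS always uses UTF-8 for file names. The function name is hypothetical.

```python
import os
import subprocess
import sys


def fs_encoding_under_c_locale() -> str:
    """Return sys.getfilesystemencoding() as seen by a child Python that runs
    with a legacy C locale, no locale coercion, and no UTF-8 Mode."""
    env = dict(os.environ, LANG="C", LC_ALL="C",
               PYTHONCOERCECLOCALE="0", PYTHONUTF8="0")
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.getfilesystemencoding())"],
        env=env, stdout=subprocess.PIPE, check=True, universal_newlines=True,
    )
    return result.stdout.strip()


print(fs_encoding_under_c_locale())
```

If this reports an ASCII codec, running the sdist install with the same environment variables should reproduce the extraction failure.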
@agronholm Confirmed the issue is resolved for me with … Thanks!! 👍
@agronholm Can confirm the issue is fixed in … 🎉
Thanks @agronholm!
This doesn't seem to be fully fixed. On Python 2:
How can I reproduce this, then?
I'm using Python 2.7.5 (on an HPC system; this is only used for bootstrapping and hopefully won't be needed otherwise) to install setuptools 44.0.0, then wheel 0.34.2. Both are installed by downloading the tarball from PyPI and running:
Edit, some more info: printf-debugging the installation process shows that the problem seems to be the file tests/testdata/unicode.dist/unicodedist/åäö_日本語.py. 0.33.6 works and 0.34.0 fails, although both contain that file. Oh, and it seems you need a "good" locale. With LC_ALL=C, the file is skipped and never reaches the problematic code, because its name can't be encoded in ANSI_X3.4-1968.
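ANSI_X3.4-1968 is the name the C library reports for the C locale's codeset, and Python treats it as an alias for ASCII. The sketch below illustrates the observed effect: names that cannot be encoded in the locale's codec drop out before any later code sees them. split_by_encodability is a hypothetical helper that mimics this behaviour. It is not a claim about which setuptools code path does the skipping.

```python
import codecs
from typing import Iterable, List, Tuple


def split_by_encodability(names: Iterable[str],
                          encoding: str) -> Tuple[List[str], List[str]]:
    """Partition file names into (encodable, unencodable) for the given codec."""
    ok: List[str] = []
    skipped: List[str] = []
    for name in names:
        try:
            name.encode(encoding)
        except UnicodeEncodeError:
            skipped.append(name)  # such a file would never reach later steps
        else:
            ok.append(name)
    return ok, skipped


print(codecs.lookup("ANSI_X3.4-1968").name)  # ascii

files = ["setup.py", "tests/testdata/unicode.dist/unicodedist/åäö_日本語.py"]
ok, skipped = split_by_encodability(files, "ANSI_X3.4-1968")
print(skipped)  # ['tests/testdata/unicode.dist/unicodedist/åäö_日本語.py']
```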
Is there a reason why you can't use wheels or pip?
Because this is used to install pip. As mentioned, this is part of a bootstrapping process, so I'm installing setuptools, then wheel, then pip.
Also, you said that 0.34.0 fails. Sure, it fails, because that tarball was packaged in a way that causes installation issues on systems without Unicode file name support, but 0.34.2 fixes that problem. So which version did you actually try?
I tried 0.33.6, 0.34.0 and 0.34.2. FWIW, I found the change that led to the issue: e29a5bd. It adds … This leads to a mix of str and unicode paths at https://github.com/pypa/setuptools/blob/v44.0.0/setuptools/command/bdist_egg.py#L321 (path is str, prefix is unicode). That mix then triggers this error when one is converted to the other, which didn't happen before. It might be more of a setuptools issue with Python 2 :/
We download wheel as an sdist and install it into a venv inside a docker container. As of 0.34, something changed so that we get a UnicodeDecodeError. Perhaps a unicode character slipped into a README or CHANGELOG? I'll try to get more details as I track it down further, but since wheel 0.34 was just released 4 hours ago, I wanted to get this report in as soon as I could.