Wheel 0.34 broke installing from sdist in docker (UnicodeEncodeError) #331

Closed
johnthagen opened this issue Jan 27, 2020 · 42 comments

Comments

@johnthagen

johnthagen commented Jan 27, 2020

We download wheel as an sdist and install it into a venv inside a Docker container. As of 0.34, something changed so that we get a UnicodeEncodeError. Perhaps a Unicode character slipped into a README or CHANGELOG? I'll try to get more details as I track it down further, but since wheel 0.34 was released just 4 hours ago, I wanted to get this report in as soon as I could.

pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-6gct78aj/overlay --no-warn-script-location --no-binary :all: -i https://pypi.org/simple -- setuptools wheel
     cwd: None
Complete output (27 lines):
Collecting setuptools
  Downloading setuptools-45.1.0.zip (859 kB)
Collecting wheel
  Downloading wheel-0.34.0.tar.gz (60 kB)
ERROR: Exception:
Traceback (most recent call last):
  File ".tox/bundle/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 186, in _main
    status = self.run(options, args)
  File ".tox/bundle/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 331, in run
    resolver.resolve(requirement_set)
  File ".tox/bundle/lib/python3.6/site-packages/pip/_internal/legacy_resolve.py", line 177, in resolve
    discovered_reqs.extend(self._resolve_one(requirement_set, req))
  File ".tox/bundle/lib/python3.6/site-packages/pip/_internal/legacy_resolve.py", line 333, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File ".tox/bundle/lib/python3.6/site-packages/pip/_internal/legacy_resolve.py", line 282, in _get_abstract_dist_for
    abstract_dist = self.preparer.prepare_linked_requirement(req)
  File ".tox/bundle/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 482, in prepare_linked_requirement
    hashes=hashes,
  File ".tox/bundle/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 287, in unpack_url
    hashes=hashes,
  File ".tox/bundle/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 164, in unpack_http_url
    unpack_file(from_path, location, content_type)
  File ".tox/bundle/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 261, in unpack_file
    untar_file(filename, location)
  File ".tox/bundle/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 222, in untar_file
    with open(path, 'wb') as destfp:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 72-74: ordinal not in range(128)
johnthagen changed the title from "Wheel 0.34 broke installing from sdist in docker" to "Wheel 0.34 broke installing from sdist in docker (UnicodeEncodeError)" on Jan 27, 2020
agronholm added a commit that referenced this issue Jan 27, 2020
@jamadden
Member

This error occurs while pip is downloading and unpacking the wheel sdist. It appears to be caused by a file in the sdist whose path contains non-ASCII characters. (open(path) does complex things to a path to get it into a form the filesystem accepts, depending on the operating system and on things like sys.getfilesystemencoding(), which in turn depends on the locale setting on Unix.)

wheel-0.34.0.tar.gz does contain files with non-ASCII characters:

  • wheel-0.34.0/tests/testdata/unicode.dist/unicodedist/åäö_日本語.py

As far as I can see, though, so did wheel 0.33.6 and 0.33.5:

  • wheel-0.33.6/tests/testdata/unicode.dist/unicodedist/åäö_日本語.py

Wheel 0.33.4, however, does not have such a file. Can you let us know what previous versions of wheel you've used successfully in this way?
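The failure mode described in this comment can be sketched in a few lines. This is a simplified illustration, not pip's code: `try_encode_path` is a hypothetical helper, and passing `"ascii"` explicitly stands in for what `sys.getfilesystemencoding()` would typically report under a C/POSIX locale on Python 3.6.

```python
import sys

# File name taken from the sdist listing above.
SDIST_MEMBER = "wheel-0.34.0/tests/testdata/unicode.dist/unicodedist/åäö_日本語.py"


def try_encode_path(path, encoding):
    """Return the encoded path, or None if `encoding` cannot represent it.

    Simplification: the real conversion done by open() also applies the
    surrogateescape error handler, which does not help with genuine
    non-ASCII characters such as the ones in this file name.
    """
    try:
        return path.encode(encoding)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    print("filesystem encoding here:", sys.getfilesystemencoding())
    for encoding in ("ascii", "utf-8"):
        ok = try_encode_path(SDIST_MEMBER, encoding) is not None
        print(encoding, "can represent the path:", ok)
```

Running this on the affected machine shows which encoding the interpreter actually picked, which is usually the first thing worth checking.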

@agronholm
Contributor

In what environment are you installing wheel that does not support unicode file names?

@johnthagen
Author

@agronholm We saw this in an ubuntu:18.04 docker image running on GitLab CI.

@maafk

maafk commented Jan 27, 2020

Also seeing this in a lambci/lambda:build-python3.6 Docker image running in AWS CodeBuild:

Collecting python-dateutil<3.0.0,>=2.1
Downloading python-dateutil-2.8.1.tar.gz (331 kB)
Installing build dependencies: started
Installing build dependencies: finished with status 'error'
ERROR: Command errored out with exit status 2:
command: /codebuild/output/src308665397/src/build/out/pyenv/versions/3.6.1/bin/python /codebuild/output/src308665397/src/build/out/pyenv/versions/3.6.1/lib/python3.6/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-wpdzby1b/overlay --no-warn-script-location --no-binary :all: --only-binary :none: -i https://pypi.org/simple -- 'setuptools; python_version != '"'"'3.3'"'"'' 'setuptools<40.0; python_version == '"'"'3.3'"'"'' wheel setuptools_scm
cwd: None
Complete output (28 lines):
Ignoring setuptools: markers 'python_version == "3.3"' don't match your environment
Collecting setuptools
Downloading setuptools-45.1.0.zip (859 kB)
Collecting wheel
Downloading wheel-0.34.1.tar.gz (55 kB)
ERROR: Exception:
Traceback (most recent call last):
File "/codebuild/output/src308665397/src/build/out/pyenv/versions/3.6.1/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 186, in _main
status = self.run(options, args)
File "/codebuild/output/src308665397/src/build/out/pyenv/versions/3.6.1/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 331, in run
resolver.resolve(requirement_set)
File "/codebuild/output/src308665397/src/build/out/pyenv/versions/3.6.1/lib/python3.6/site-packages/pip/_internal/legacy_resolve.py", line 177, in resolve
discovered_reqs.extend(self._resolve_one(requirement_set, req))
File "/codebuild/output/src308665397/src/build/out/pyenv/versions/3.6.1/lib/python3.6/site-packages/pip/_internal/legacy_resolve.py", line 333, in _resolve_one
abstract_dist = self._get_abstract_dist_for(req_to_install)
File "/codebuild/output/src308665397/src/build/out/pyenv/versions/3.6.1/lib/python3.6/site-packages/pip/_internal/legacy_resolve.py", line 282, in _get_abstract_dist_for
abstract_dist = self.preparer.prepare_linked_requirement(req)
File "/codebuild/output/src308665397/src/build/out/pyenv/versions/3.6.1/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 482, in prepare_linked_requirement
hashes=hashes,
File "/codebuild/output/src308665397/src/build/out/pyenv/versions/3.6.1/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 287, in unpack_url
hashes=hashes,
File "/codebuild/output/src308665397/src/build/out/pyenv/versions/3.6.1/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 164, in unpack_http_url
unpack_file(from_path, location, content_type)
File "/codebuild/output/src308665397/src/build/out/pyenv/versions/3.6.1/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 261, in unpack_file
untar_file(filename, location)
File "/codebuild/output/src308665397/src/build/out/pyenv/versions/3.6.1/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 222, in untar_file
with open(path, 'wb') as destfp:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 72-74: ordinal not in range(128)
----------------------------------------

@agronholm
Contributor

I was able to repro this with the ubuntu:18.04 image. Is the only recourse to remove the module with the non-ASCII name from the project then?

@johnthagen
Author

johnthagen commented Jan 28, 2020

FWIW, I can confirm this issue is still present in 0.34.1.

In the past, I've had to remove Unicode-related things to make sdists install reliably on all platforms. I've seen a similar issue (with other projects, not wheel) when installing on Windows. In that case it was reading from a README file that had Unicode characters in it, but the locale or something was wrong on Windows. I resolved it by keeping the sdist free of Unicode.

@agronholm
Contributor

v0.34.1 was not intended to fix this issue. It was an emergency fix for sdist installs that were failing across the board, and this particular issue existed before v0.34, so I didn't deem it critical since it was only reported now. I mistakenly tagged this issue in the commit because it was reported at the same time as the general sdist problem.

The fix will be to either generate unicode.dist on the fly, or leave it out entirely.

@johnthagen
Author

"v0.34.1 was not intended to fix this issue."

Sorry, I wasn't trying to imply it should have been fixed in 0.34.1. I just wanted to add that I double-checked and the issue still exists there.

"and this particular issue existed before v0.34"

It's strange that it only showed up for us when 0.34 was released. We have CI that runs every day so it seems a strange coincidence that it started showing up for us 4 hours after 0.34. Perhaps some other change to pip, setuptools, etc. contributed to it just showing up now.

@agronholm
Contributor

It's strange indeed, but I just tested and the v0.33.6 sdist installs cleanly without unicode support but v0.34.1 doesn't. And they both have the same problematic file.

@johnthagen
Author

"It's strange indeed, but I just tested and the v0.33.6 sdist installs cleanly without unicode support but v0.34.1 doesn't. And they both have the same problematic file."

One time, I found that a README file actually contained a "Unicode space", and that was being read into long_description by setuptools. I doubt that is the issue here, but I'm sharing it as an example of how subtle these things can be to track down. Perhaps we should look at all the diffs between 0.33.6 and 0.34.0 and see which text files changed (and which might be touched by sdist).

@agronholm
Contributor

The contents don't matter. The path names do. See how the traceback ends?

  File "/usr/lib/python3/dist-packages/pip/utils/__init__.py", line 595, in untar_file
    with open(path, 'wb') as destfp:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 64-66: ordinal not in range(128)

It's failing to create a file because it can't encode the path name to bytes using ASCII.

@agronholm
Contributor

As expected, the problematic file was åäö_日本語.py. Next I'll figure out why it doesn't fail for 0.33.6.

@jamadden
Member

I don't find any relevant changes in pip's unpacking code since at least 19.2.3.

@agronholm
Contributor

Pip is not the problem here since unpacking fails for v0.34.1 whereas v0.33.6 succeeds using the exact same pip version.

@jamadden
Member

The two tar files appear to have been created differently:

$ file wheel-0.33.6.tar wheel-0.34.0.tar
wheel-0.33.6.tar: POSIX tar archive (GNU)
wheel-0.34.0.tar: POSIX tar archive
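The distinction that `file` reports comes down to the magic field in the first 512-byte header block. A rough sketch, assuming Python's `tarfile` module is used to create both variants (`build_archive` and `classify_tar` are hypothetical helper names, and the member name is made up):

```python
import io
import tarfile

GNU_MAGIC = b"ustar  \x00"    # "ustar", two spaces, NUL (old GNU format)
POSIX_MAGIC = b"ustar\x0000"  # "ustar", NUL, version "00" (ustar and pax)


def build_archive(fmt):
    """Create a one-file tar archive in memory using the given format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        data = b"print('hello')\n"
        info = tarfile.TarInfo("example/hello.py")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def classify_tar(raw):
    """Classify an archive by the magic bytes at offset 257 of its first header."""
    magic = raw[257:265]
    if magic == GNU_MAGIC:
        return "POSIX tar archive (GNU)"
    if magic == POSIX_MAGIC:
        return "POSIX tar archive"
    return "unknown"


if __name__ == "__main__":
    for fmt in (tarfile.GNU_FORMAT, tarfile.PAX_FORMAT):
        print(classify_tar(build_archive(fmt)))
```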

@agronholm
Contributor

Curious, len(path) reports 83 for that file in 0.33.6 whereas for 0.34.1 it reports 60. What's going on?

@jamadden
Member

I get very different results for the file path in question when reading wheel-0.34.0 with BSD tar versus GNU tar, whereas for wheel-0.33.6 both give the same result. I suspect that non-ASCII file names are encoded differently in the non-GNU-format archive, so the path bytes end up being interpreted differently.
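One way to see how the two formats could behave differently is to read the same non-ASCII member name back from a GNU-format and a pax-format archive while forcing an ASCII default encoding. The sketch below is an illustration under assumptions, not a diagnosis confirmed in this thread: `encoding="ascii"` stands in for an ASCII-locale environment, the archives are built in memory with `tarfile`, and `build_archive`, `read_name`, and `ascii_filesystem_can_store` are hypothetical helpers.

```python
import io
import tarfile

NAME = "unicodedist/åäö_日本語.py"


def build_archive(fmt):
    """Write a single member called NAME using the requested tar format."""
    buf = io.BytesIO()
    # The writer's encoding is pinned to UTF-8 so the stored bytes are the
    # same regardless of the locale this sketch runs under.
    with tarfile.open(fileobj=buf, mode="w", format=fmt, encoding="utf-8") as tar:
        data = b"# placeholder\n"
        info = tarfile.TarInfo(NAME)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_name(raw):
    """Read the member name back the way an ASCII-locale Python might."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r", encoding="ascii") as tar:
        return tar.getnames()[0]


def ascii_filesystem_can_store(name):
    """Check whether `name` converts to bytes with ASCII plus surrogateescape."""
    try:
        name.encode("ascii", "surrogateescape")
        return True
    except UnicodeEncodeError:
        return False


gnu_name = read_name(build_archive(tarfile.GNU_FORMAT))
pax_name = read_name(build_archive(tarfile.PAX_FORMAT))

# GNU: the raw name bytes are decoded with surrogateescape, one character per byte.
print(len(gnu_name), ascii_filesystem_can_store(gnu_name))
# pax: the name lives in a UTF-8 extended header and is decoded as real Unicode.
print(len(pax_name), ascii_filesystem_can_store(pax_name))
```

In this sketch the two reads return names of different lengths, and only the surrogate-escaped GNU name can be turned back into bytes under ASCII.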

@agronholm
Contributor

@agronholm
Contributor

@agronholm
Contributor

It's the former. If I package the sdist using Python 3.8, extraction fails in the Ubuntu container. If I package it with 3.7, extraction works. Next it fails because the ancient version of setuptools is trying to read setup.cfg as ASCII (# coding: utf-8 does not help).
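A likely explanation for the 3.7 versus 3.8 difference is that Python 3.8 changed the `tarfile` default for new archives from `GNU_FORMAT` to `PAX_FORMAT`. That link is not stated in the thread, so treat it as an assumption. The check below only reports what the running interpreter uses, and `default_format_name` is a hypothetical helper.

```python
import sys
import tarfile

_FORMAT_NAMES = {
    tarfile.USTAR_FORMAT: "USTAR_FORMAT",
    tarfile.GNU_FORMAT: "GNU_FORMAT",
    tarfile.PAX_FORMAT: "PAX_FORMAT",
}


def default_format_name():
    """Return the name of the tar format this interpreter uses for new archives."""
    return _FORMAT_NAMES[tarfile.DEFAULT_FORMAT]


print(sys.version_info[:2], default_format_name())
```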

@pfmoore
Member

pfmoore commented Jan 28, 2020

The following quote from PEP 517 is tangentially relevant:

The generated tarball should use the modern POSIX.1-2001 pax tar format, which specifies UTF-8 based file names. This is not yet the default for the tarfile module shipped with Python 3.6, so backends using the tarfile module need to explicitly pass format=tarfile.PAX_FORMAT.

The sdist format was traditionally woefully underspecified, and one area that wasn't defined was how filenames are encoded. As a result, non-ASCII filenames in sdists were traditionally non-portable. That section in PEP 517 makes the file format explicit, although obviously there are a lot of legacy sdists out there.

Getting encodings right is an ongoing issue in packaging standards - we're getting there but unless you can ignore older files/systems, "anything other than ASCII is risky" remains the case 🙁
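The PEP 517 recommendation quoted above can be followed with the standard library alone. A minimal sketch, assuming an in-memory archive and a made-up member name (`write_pax_sdist` is a hypothetical helper, not part of any build backend):

```python
import io
import tarfile


def write_pax_sdist(members):
    """Pack {name: bytes} into a gzip-compressed tarball using the pax format."""
    buf = io.BytesIO()
    # format= is passed explicitly so the result does not depend on the
    # interpreter's default tar format.
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.PAX_FORMAT) as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


raw = write_pax_sdist({"demo-1.0/åäö.py": b"# demo\n"})
with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tar:
    member = tar.getmembers()[0]

# Non-ASCII names are recorded as UTF-8 in a pax extended header; list the
# header keys that tarfile found for this member.
print(sorted(member.pax_headers))
```

Pinning the format this way makes the archive layout independent of the Python version used for packaging, at the cost of producing names that an ASCII-only consumer may not be able to write to disk.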

@agronholm
Contributor

I worked around this by using Python 3.7 for creating the packages and replacing non-ASCII characters in setup.cfg.

@agronholm
Contributor

On that Ubuntu container, the newly created 0.34.1 sdist installed fine for Python 3, but not for Python 2:

  copying build/lib.linux-x86_64-2.7/wheel/__init__.py -> build/bdist.linux-x86_64/wheel/wheel
  copying build/lib.linux-x86_64-2.7/wheel/pkginfo.py -> build/bdist.linux-x86_64/wheel/wheel
  copying build/lib.linux-x86_64-2.7/wheel/_version.py -> build/bdist.linux-x86_64/wheel/wheel
  copying build/lib.linux-x86_64-2.7/wheel/__main__.py -> build/bdist.linux-x86_64/wheel/wheel
  copying build/lib.linux-x86_64-2.7/wheel/pep425tags.py -> build/bdist.linux-x86_64/wheel/wheel
  copying build/lib.linux-x86_64-2.7/wheel/bdist_wheel.py -> build/bdist.linux-x86_64/wheel/wheel
  copying build/lib.linux-x86_64-2.7/wheel/wheelfile.py -> build/bdist.linux-x86_64/wheel/wheel
  copying build/lib.linux-x86_64-2.7/wheel/macosx_libfile.py -> build/bdist.linux-x86_64/wheel/wheel
  running install_egg_info
  error: 'egg_base' must be a directory name (got `src`)

@pfmoore
Member

pfmoore commented Jan 28, 2020

I'm sad that this means you can't enter your own name without workarounds. But thanks for digging into this and working out what was going on :-)

@agronholm
Contributor

@johnthagen does the fix committed to master solve the problem for you?

@johnthagen
Author

The pipeline where this fails pulls the pre-packaged sdist from PyPI using pip download. I'm not sure exactly how to emulate this with a git repo.

Worst case I could test immediately after the next wheel version is uploaded to PyPI.

@vlanse

vlanse commented Jan 28, 2020

Will the fixed version be released on PyPI?

@agronholm
Contributor

I was hoping to get confirmation that the fix works as expected. If this is not feasible, I'll just have to make a release and hope it works.

@johnthagen How about pip download https://github.com/pypa/wheel/archive/master.zip?

@ncoghlan
Member

Just noting that I filed pypa/pip#7667 for the underlying issue in pip that the wheel tarball format change revealed.

@zenmonkeykstop

Seconding @vlanse - it would be great to have a release with this fix.

@agronholm
Contributor

I was hoping to first get a confirmation for the fix but I guess that's not coming. I'll make a new release.

@ncoghlan
Member

@agronholm if you run an install under "LANG=C" with Python 3.6, that should let you test the failure locally. Later versions will also exhibit the same failure if you turn off locale coercion (I don't recall the exact env var for that - something like "PYTHONCOERCECLOCALE=0")
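A quick way to check what such an environment looks like to Python is to start a child interpreter with those variables set and ask it for its filesystem encoding. This is only a sketch: `filesystem_encoding_under_c_locale` is a hypothetical helper, `PYTHONUTF8=0` is an extra setting not mentioned in the thread (it turns off the UTF-8 mode that Python 3.7+ can otherwise enable in the C locale), and the answer depends on the platform and Python version.

```python
import os
import subprocess
import sys


def filesystem_encoding_under_c_locale():
    """Ask a child interpreter which filesystem encoding it picks in the C locale."""
    env = dict(os.environ)
    env.update({
        "LANG": "C",
        "LC_ALL": "C",
        "PYTHONCOERCECLOCALE": "0",  # PEP 538: turn off C locale coercion
        "PYTHONUTF8": "0",           # PEP 540: turn off UTF-8 mode
    })
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.getfilesystemencoding())"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


print(filesystem_encoding_under_c_locale())
```

If the child reports an ASCII encoding, the same environment variables can be used around a pip invocation to try to reproduce the original failure locally.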

@maafk

maafk commented Jan 31, 2020

@agronholm Confirmed the issue is resolved for me with the 0.34.2 release

Thanks!! 👍

@johnthagen
Author

@agronholm Can confirm the issue is fixed in 0.34.2 as well!

🎉

@zenmonkeykstop

Thanks @agronholm!

@Flamefire

This doesn't seem to be fully fixed. On Python2:

Traceback (most recent call last):
  File "setup.py", line 4, in <module>
    setup(maintainer=u'Alex Grnholm')
  File "build/bdist.linux-x86_64/egg/setuptools/__init__.py", line 145, in setup
  File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup
    dist.run_commands()
  File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "build/bdist.linux-x86_64/egg/setuptools/command/install.py", line 67, in run
  File "build/bdist.linux-x86_64/egg/setuptools/command/install.py", line 109, in do_egg_install
  File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "build/bdist.linux-x86_64/egg/setuptools/command/bdist_egg.py", line 204, in run
  File "build/bdist.linux-x86_64/egg/setuptools/command/bdist_egg.py", line 321, in copy_metadata_to
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 40: ordinal not in range(128)

@agronholm
Contributor

How can I reproduce this then?

@Flamefire

Flamefire commented Feb 17, 2020

I'm using Python 2.7.5 (on an HPC system; this is only used for bootstrapping and hopefully won't be needed otherwise) to install setuptools 44.0.0, then wheel 0.34.2. Both are installed by downloading the tarball from PyPI and running:

  • python setup.py build
  • python setup.py install --prefix=<somedir>

Edit: some more info. The problem seems to be the file tests/testdata/unicode.dist/unicodedist/åäö_日本語.py, which I found by printf-debugging the installation process. 0.33.6 works and 0.34.0 fails, although both contain that file. It also seems you need a "good" locale: with LC_ALL=C the file is skipped and never reaches the problematic code, because its name can't be encoded in ANSI_X3.4-1968.

@agronholm
Contributor

Is there a reason why you can't use wheels or pip?

@Flamefire

Flamefire commented Feb 17, 2020

Because this is used to install pip. As mentioned, this is part of some bootstrapping, so I'm installing setuptools, wheel, then pip.

@agronholm
Contributor

Also, you said that 0.34.0 fails. Sure it fails because the tarball was packaged in a manner which causes installation issues on non-unicode capable systems, but 0.34.2 fixes that problem. So which version did you actually try with?

@Flamefire

I tried 0.33.6, 0.34.0 and 0.34.2

FWIW I found the change which led to the issue: e29a5bd

It adds package_dir={'': 'src'} which is picked up by setuptools as type unicode which leads to an egg-info folder name of type unicode while it was str before. See https://github.com/pypa/setuptools/blob/v44.0.0/setuptools/command/egg_info.py#L218 and https://github.com/pypa/setuptools/blob/v44.0.0/setuptools/command/egg_info.py#L223

This leads to a mix of str and unicode paths at https://github.com/pypa/setuptools/blob/v44.0.0/setuptools/command/bdist_egg.py#L321 (path is str, prefix is unicode), and then to this error when Python tries to convert one to the other, which it did not need to do before. This might be more of a setuptools issue with Python 2 :/
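The implicit conversion can be imitated in Python 3, where it has to be spelled out. The sketch below is an emulation with made-up values, not the setuptools code: in Python 2, an operation that mixes a byte string (str) with a unicode object first decodes the byte string as ASCII, which `py2_style_startswith` (a hypothetical helper) does explicitly.

```python
def py2_style_startswith(path_bytes, prefix_text):
    """Emulate Python 2's str.startswith(unicode).

    Python 2 decodes the byte string as ASCII before comparing, which
    fails as soon as the bytes contain anything outside the ASCII range.
    """
    return path_bytes.decode("ascii").startswith(prefix_text)


# Hypothetical values, loosely modelled on the paths discussed above.
prefix = "src/wheel.egg-info"
ascii_path = b"src/wheel.egg-info/PKG-INFO"
unicode_path = "tests/testdata/unicode.dist/unicodedist/åäö_日本語.py".encode("utf-8")

print(py2_style_startswith(ascii_path, prefix))
try:
    py2_style_startswith(unicode_path, prefix)
except UnicodeDecodeError as exc:
    # Same family of error as in the traceback: a non-ASCII byte cannot be decoded.
    print(exc)
```

When both operands were byte strings, no decoding step was involved, which fits the observation that the same file caused no trouble before the prefix became unicode.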

9 participants