'pip inspect' crashes with python-3.10.10 and pip-23.0.0 on Windows 11 (an utf-8 problem) #11798

Closed
1 task done
stonebig opened this issue Feb 12, 2023 · 18 comments
Labels
C: output Related to what pip prints kind: crash For situations where pip crashes

Comments

@stonebig
Contributor

stonebig commented Feb 12, 2023

Description

Running pip inspect >> test.txt on my big list of wheels, I get the following crash:

pip install yarl==1.7.2
pip inspect>>test.txt
exit_buffer
    self._check_buffer()
  File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\site-packages\pip\_vendor\rich\console.py", line 2024, in _check_buffer
    legacy_windows_render(buffer, LegacyWindowsTerm(self.file))
  File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\site-packages\pip\_vendor\rich\_windows_renderer.py", line 17, in legacy_windows_render
    term.write_styled(text, style)
  File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\site-packages\pip\_vendor\rich\_win32_console.py", line 442, in write_styled
    self.write_text(text)
  File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\site-packages\pip\_vendor\rich\_win32_console.py", line 403, in write_text
    self.write(text)
  File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1843-1846: character maps to <undefined>

Expected behavior

no crash

pip version

pip-23.0.0

Python version

cpython-3.10.10

OS

Windows11

How to Reproduce

you may try on windows

pip install yarl==1.7.2
pip inspect>>test.txt

I was using this list of packages:

pip list
Package Version


idna 3.1
msvc-runtime 14.32.31326
multidict 6.0.2
pip 23.0
setuptools 67.2.0
simpy 4.0.1
sqlite-bro 0.12.2
wheel 0.38.4
winpython 6.0.20230212
yarl 1.7.2

Output

Code of Conduct

@stonebig stonebig added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Feb 12, 2023
@stonebig stonebig changed the title 'pip inspect' crashes with ptyhon-3.10.10 and pip-23.0.0 on Windows 11 (a rich problem ?) 'pip inspect' crashes with python-3.10.10 and pip-23.0.0 on Windows 11 (a rich problem ?) Feb 12, 2023
@stonebig
Contributor Author

stonebig commented Feb 12, 2023

Hmm, with this exception handler added to the code, I can find the package causing the crash: yarl

image

    {
      "metadata": {
        "metadata_version": "2.1",
        "name": "zarr",
        "version": "2.11.3",
        "platform": [
          "UNKNOWN"
        ],
        "summary": "An implementation of chunked, compressed, N-dimensional arrays for Python.",
        "description_content_type": "text/markdown",
        "home_page": "https://github.com/zarr-developers/zarr-python",
        "maintainer": "Alistair Miles",
        "maintainer_email": "[email protected]",
        "license": "MIT",
        "classifier": [
          "Development Status :: 6 - Mature",
          "Intended Audience :: Developers",
          "Intended Audience :: Information Technology",
          "Intended Audience :: Science/Research",
          "License :: OSI Approved :: MIT License",
          "Programming Language :: Python",
          "Topic :: Software Development :: Libraries :: Python Modules",
          "Operating System :: Unix",
          "Programming Language :: Python :: 3",
          "Programming Language :: Python :: 3.7",
          "Programming Language :: Python :: 3.8",
          "Programming Language :: Python :: 3.9"
        ],
        "requires_dist": [
          "asciitree",
          "numpy (>=1.7)",
          "fasteners",
          "numcodecs (>=0.6.4)",
          "notebook ; extra == 'jupyter'",
          "ipytree ; extra == 'jupyter'"
        ],
        "requires_python": ">=3.7, <4",
        "provides_extra": [
          "jupyter"
        ],
        "description": "<div align=\"center\">\n  <img src=\"https://raw.githubusercontent.com/zarr-developers/community/master/logos/logo2.png\"><br>\n</div>\n\n# Zarr\n\n<table>\n<tr>\n  <td>Latest Release</td>\n  <td>\n    <a href=\"https://pypi.org/project/zarr/\">\n    <img src=\"https://badge.fury.io/py/zarr.svg\" alt=\"latest release\" />\n    </a>\n  </td>\n</tr>\n  <td></td>\n  <td>\n    <a href=\"https://anaconda.org/anaconda/zarr/\">\n    <img src=\"https://anaconda.org/conda-forge/zarr/badges/version.svg\" alt=\"latest release\" />\n    </a>\n</td>\n</tr>\n<tr>\n  <td>Package Status</td>\n  <td>\n\t\t<a href=\"https://pypi.org/project/zarr/\">\n\t\t<img src=\"https://img.shields.io/pypi/status/zarr.svg\" alt=\"status\" />\n\t\t</a>\n  </td>\n</tr>\n<tr>\n  <td>License</td>\n  <td>\n    <a href=\"https://github.com/zarr-developers/zarr-python/blob/master/LICENSE\">\n    <img src=\"https://img.shields.io/pypi/l/zarr.svg\" alt=\"license\" />\n    </a>\n</td>\n</tr>\n<tr>\n  <td>Build Status</td>\n  <td>\n    <a href=\"https://travis-ci.org/zarr-developers/zarr-python\">\n    <img src=\"https://travis-ci.org/zarr-developers/zarr-python.svg?branch=master\" alt=\"travis build status\" />\n    </a>\n  </td>\n</tr>\n<tr>\n  <td>Coverage</td>\n  <td>\n    <a href=\"https://codecov.io/gh/zarr-developers/zarr-python\">\n    <img src=\"https://codecov.io/gh/zarr-developers/zarr-python/branch/master/graph/badge.svg\"/ alt=\"coverage\">\n    </a>\n  </td>\n</tr>\n<tr>\n  <td>Downloads</td>\n  <td>\n    <a href=\"https://zarr.readthedocs.io\">\n    <img src=\"https://pepy.tech/badge/zarr\" alt=\"pypi downloads\" />\n    </a>\n  </td>\n</tr>\n<tr>\n\t<td>Gitter</td>\n\t<td>\n\t\t<a href=\"https://gitter.im/zarr-developers/community\">\n\t\t<img src=\"https://badges.gitter.im/zarr-developers/community.svg\" />\n\t\t</a>\n\t</td>\n</tr>\n<tr>\n\t<td>Citation</td>\n\t<td>\n\t\t<a href=\"https://doi.org/10.5281/zenodo.3773450\">\n\t\t\t<img 
src=\"https://zenodo.org/badge/DOI/10.5281/zenodo.3773450.svg\" alt=\"DOI\">\n\t\t</a>\n\t</td>\n</tr>\n\n</table>\n\n## What is it?\n\nZarr is a Python package providing an implementation of compressed, chunked, N-dimensional arrays, designed for use in parallel computing. See the [documentation](https://zarr.readthedocs.io) for more information.\n\n## Main Features\n\n- [**Create**](https://zarr.readthedocs.io/en/stable/tutorial.html#creating-an-array) N-dimensional arrays with any NumPy `dtype`.\n- [**Chunk arrays**](https://zarr.readthedocs.io/en/stable/tutorial.html#chunk-optimizations) along any dimension.\n- [**Compress**](https://zarr.readthedocs.io/en/stable/tutorial.html#compressors) and/or filter chunks using any NumCodecs codec.\n- [**Store arrays**](https://zarr.readthedocs.io/en/stable/tutorial.html#tutorial-storage) in memory, on disk, inside a zip file, on S3, etc...\n- [**Read**](https://zarr.readthedocs.io/en/stable/tutorial.html#reading-and-writing-data) an array [**concurrently**](https://zarr.readthedocs.io/en/stable/tutorial.html#parallel-computing-and-synchronization) from multiple threads or processes.\n- Write to an array concurrently from multiple threads or processes.\n- Organize arrays into hierarchies via [**groups**](https://zarr.readthedocs.io/en/stable/tutorial.html#groups).\n\n## Where to get it\n\nZarr can be installed from PyPI using `pip`:\n\n```bash\npip install zarr\n```\n\nor via `conda`:\n\n```bash\nconda install -c conda-forge zarr\n```\n\nFor more details, including how to install from source, see the [installation documentation](https://zarr.readthedocs.io/en/stable/#installation).\n\n\n"
      },
      "metadata_location": "c:\\winp\\bd310\\bu\\wpy64-310100b1\\python-3.10.10.amd64\\lib\\site-packages\\zarr-2.11.3.dist-info",
      "installer": "pip",
      "requested": true
    },
    {
      "metadata": {
        "metadata_version": "2.1",
        "name": "yarl",
        "version": "1.7.2",
        "platform": [
          "UNKNOWN"
        ],
        "summary": "Yet another URL library",
        "description_content_type": "text/x-rst",
        "home_page": "https://github.com/aio-libs/yarl/",
        "author": "Andrew Svetlov",
        "author_email": "[email protected]",
        "license": "Apache 2",
        "classifier": [
          "License :: OSI Approved :: Apache Software License",
          "Intended Audience :: Developers",
          "Programming Language :: Python",
          "Programming Language :: Python :: 3",
          "Programming Language :: Python :: 3.6",
          "Programming Language :: Python :: 3.7",
          "Programming Language :: Python :: 3.8",
          "Programming Language :: Python :: 3.9",
          "Programming Language :: Python :: 3.10",
          "Topic :: Internet :: WWW/HTTP"
        ],
        "requires_dist": [
          "multidict (>=4.0)",
          "idna (>=2.0)",
          "typing-extensions (>=3.7.4) ; python_version < \"3.8\""
        ],
        "requires_python": ">=3.6",
        "description": zzzzzzzzzzzzzzzzzzzzzz
      },
      "metadata_location": "c:\\winp\\bd310\\bu\\wpy64-310100b1\\python-3.10.10.amd64\\lib\\site-packages\\yarl-1.7.2.dist-info",
      "installer": "pip",
      "requested": false
    },

@stonebig
Contributor Author

stonebig commented Feb 12, 2023

Well, in fact I have 61 'zzzzzzzzzzzzzzzzzzzzzz' lines, so 61 packages have a description that crashes pip inspect when I use my full list of about 600 packages.

@stonebig
Contributor Author

stonebig commented Feb 12, 2023

A simple reproducer should be: open a Windows console (DOS or PowerShell) and run:

pip install yarl==1.7.2
pip inspect>>test.txt

@pfmoore
Member

pfmoore commented Feb 12, 2023

It doesn't fail for me. I imagine it's Unicode characters in the description, and when you redirect output in Windows to a file, the encoding is not UTF-8. Probably something to do with your environment that I can't reproduce.

@stonebig
Contributor Author

I'm using "pip inspect" because it is the only API that pip provides, so it should not crash; at the very least it could put a 'zzzzzzzzz' placeholder where the output cannot be encoded.

@pfmoore
Member

pfmoore commented Feb 12, 2023

All I can suggest at this point:

  1. Try to establish how your environment is different from mine, as unless a pip maintainer can reproduce the issue, there's not much we can do. For example, where did you get your copy of Python from? Is it the standard python.org build?
  2. As the problem is an exception in rich, report it to rich, with a reproducer that doesn't involve pip (so that they can see the issue clearly)
  3. Provide enough of a traceback to include a line of code from pip, rather than from its vendored dependencies. With that, I'm willing to try again to see if I can get any further.

But please understand, that encoding errors are almost always a problem with your configuration. Pip simply prints the metadata, and if that data includes characters which are unprintable in your environment, doing so will always fail (or produce mojibake, which is not something we would want to do).

@willmcgugan

This is a recurring class of problem on Windows, not exclusively a Rich issue. Essentially sys.stdout.encoding is reporting "cp1252", apparently incorrectly (and I don't know why). You would likely get the same error with a plain old print.

Rich dutifully uses the reported encoding when writing output, and you get an encoding error. Setting the env var PYTHONIOENCODING="utf-8" may fix it.
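
The claim that a plain print would fail the same way can be checked without pip or Rich. Below is a minimal sketch, assuming a stream that reports cp1252; the stream is simulated with io.TextIOWrapper over an in-memory buffer, and both the helper name write_with_encoding and the sample text are made up for illustration.

```python
import io


def write_with_encoding(text: str, encoding: str) -> bytes:
    """Write text through a text stream using the given encoding, as sys.stdout would."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)  # errors defaults to "strict"
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# Hypothetical sample: U+2713 (check mark) has no mapping in cp1252.
sample = "Yet another URL library \u2713"

try:
    write_with_encoding(sample, "cp1252")
    failed = False
except UnicodeEncodeError as error:
    failed = True
    print("cp1252 stream failed:", error.reason)

# The same text goes through a UTF-8 stream without trouble.
utf8_bytes = write_with_encoding(sample, "utf-8")
print("utf-8 stream wrote", len(utf8_bytes), "bytes")
```

The failure depends only on the encoding the stream reports, which is why setting PYTHONIOENCODING changes the outcome.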

Frustratingly, Rich has done the right thing here by respecting the encoding. But it's not great for devs or users.

Rich could set errors="replace" which means that you would get question marks in place of some characters. Rich could also force encoding to "utf-8" which may work around any misconfiguration on the environment. Both changes risk causing issues for others.
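
To make the trade-off concrete, here is a small sketch of what each option does to the bytes. It uses plain str.encode rather than Rich itself, and the sample string is hypothetical: é exists in cp1252, while the arrow and the hiragana character do not.

```python
# Hypothetical sample text: "café", a rightwards arrow, and a hiragana character.
text = "caf\u00e9 \u2192 \u3072"

# Option 0 (current behaviour): strict encoding fails on the first unmappable character.
strict_ok = True
try:
    text.encode("cp1252")  # errors="strict" is the default
except UnicodeEncodeError:
    strict_ok = False

# Option 1: errors="replace" never fails, but unmappable characters become "?".
replaced = text.encode("cp1252", errors="replace")

# A related handler, "backslashreplace", keeps the information as escape sequences.
escaped = text.encode("cp1252", errors="backslashreplace")

# Option 2: force UTF-8. The bytes are correct, but a consumer that still
# decodes them as cp1252 will see mojibake.
forced_utf8 = text.encode("utf-8")
mojibake = forced_utf8.decode("cp1252", errors="replace")

print(strict_ok)
print(replaced)
print(escaped)
print(ascii(mojibake))  # ascii() so this demo itself cannot hit an encoding error
```

Neither option is free: the first loses characters, and the second depends on whatever reads the output agreeing that it is UTF-8.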

@pfmoore Do you have any insight on the best way for Rich to tackle this?

@stonebig
Contributor Author

stonebig commented Feb 12, 2023

In my example, set PYTHONIOENCODING=utf-8 does work around the issue (tested on python-3.11.2), but I don't know the possible side effects; I need to explore.

Windows used utf-8-SIG until recently, so as not to blindly guess the code page.

I see in PEP 686 that UTF-8 mode is becoming the intended default (the PEP targets Python 3.15, not 3.12): https://peps.python.org/pep-0686/

https://discuss.python.org/t/pep-686-make-utf-8-mode-default-round-2/14737/15

@pfmoore
Member

pfmoore commented Feb 12, 2023

Personally, I think that I'm happy with the current approach. I'd rather display things accurately, or fail clearly, than potentially mangle output. As @stonebig said, longer term the issue will be fixed in core Python.

I'm still curious what it is about @stonebig's environment that makes it not use UTF-8 by default. I tried to reproduce it, on Windows 11, with Python 3.10.6 and 3.11.0 from python.org, in PowerShell in Windows Terminal, and it worked fine. But given that it seems to be caused by something specific in the user's environment, I'm inclined not to worry about it from pip's point of view.

@stonebig
Contributor Author

stonebig commented Feb 12, 2023

I'm using launcher icons built with an old nsis.exe to launch my environment, by way of some obscure old Windows scripting tech...
... a long time ago, I briefly hoped that an even more obscure project, https://github.com/pfmoore/shimmy, would give me a simpler way.

I guess at some point everything will be a Progressive Web App and these futile prehistoric questions will go away https://stonebig.github.io/my-demo2/lab/index.html .... replaced by new problems

@pfmoore
Member

pfmoore commented Feb 12, 2023

Sorry, I don't follow what you're saying here. Are you not running a copy of pip installed in a standard Python virtual environment? You gave no indication of that in your bug report 🙁

@stonebig
Contributor Author

It's pip launched in a WinPython terminal, via launcher icons.

@pfmoore
Member

pfmoore commented Feb 13, 2023

If you can’t reproduce in a standard terminal, I’d suspect WinPython, then.

@uranusjr
Member

I wonder if it would make sense to provide a way to configure Rich to use errors="replace" (and maybe other modes). For this case specifically correctness isn’t that necessary and pip could simply skip the unencodable characters, so such flags could be useful, but this may not be the universal preference.

@pradyunsg pradyunsg added kind: crash For situations where pip crashes C: output Related to what pip prints and removed type: bug A confirmed bug or unintended behavior S: needs triage Issues/PRs that need to be triaged labels Feb 18, 2023
@gdlmx

gdlmx commented Mar 18, 2023

I can reproduce this issue with the official Python (from the MS Store) on Windows 11, and I used pdb to locate the problem.
In the function pip._vendor.rich.console.Console._check_buffer(), it tries to detect

  • whether it is "legacy windows"
  • whether the output stream is stdout.

If both are true, it calls legacy_windows_render(). This function fails to detect the correct encoding and raises an error on any Unicode character that the reported encoding cannot represent.

In the code section just below this line, the UnicodeEncodeError is caught and handled with a user friendly message:

      if use_legacy_windows_render:
          ...
          legacy_windows_render(buffer, LegacyWindowsTerm(self.file))
      else:
          # Either a non-std stream on legacy Windows, or modern Windows.
          text = self._render_buffer(self._buffer[:])
          # https://bugs.python.org/issue37871
          write = self.file.write
          for line in text.splitlines(True):
              try:
                  write(line)
              except UnicodeEncodeError as error:
                  error.reason = f"{error.reason}\n*** You may need to add PYTHONIOENCODING=utf-8 to your environment ***"
                  raise

It would be nice to use the same error message here to ask the user to set PYTHONIOENCODING=utf-8.

This bug has already been reported upstream in rich/issues/212

@gdlmx

gdlmx commented Mar 18, 2023

The root problem is that Python uses the _io.FileIO wrapper for I/O streams (stdout) that are piped to a file or to other commands. By default, such a stream uses an "ASCII compatible" encoding (defined by the system's ANSI code page). When stdout is connected to a console, the newer wrapper _io._WindowsConsoleIO is used, with utf-8 as the default. Please read PEP 528 for an explanation.

Testing in Powershell 5.1 of Windows 11:

PS C:\Users> python -c "import sys;print(sys.stdout.encoding,type(sys.stdout.buffer.raw))"
# Output: utf-8 <class '_io._WindowsConsoleIO'>

# Redirect stdout to file or another cmdlet
PS C:\Users> python -c "import sys;print(sys.stdout.encoding,type(sys.stdout.buffer.raw))"  | %{echo $_}
# Output: cp1252 <class '_io.FileIO'>

# Assign to PS variable
PS C:\Users> $msg = python -c "import sys;print(sys.stdout.encoding)" ; echo $msg
# Output: cp1252

Testing in CMD of Windows 11:

C:\Users>  python3.10 -c "import sys;print(sys.stdout.encoding)" | findstr /R /C:".*"
cp1252

Even the most innocent pip command will fail on a Unicode path due to this bug:

(new_venv) PS D:\tmp\ひらがな> $pipinfo = pip show pip
--- Logging error ---
Traceback (most recent call last):
  ...
  File "D:\tmp\ひらがな\new_venv\lib\site-packages\pip\_vendor\rich\console.py", line 1999, in _check_buffer
    legacy_windows_render(

You will notice that setting PYTHONIOENCODING="utf8" is not enough:

(new_venv) PS D:\tmp\ひらがな> $env:PYTHONIOENCODING="utf8"
(new_venv) PS D:\tmp\ひらがな> echo (pip show pip)
...
Location: d:\tmp\ひらがな\new_venv\lib\site-packages

That's because the Windows console uses UTF-16-LE by default.

The tested python distribution comes from MS APP Store, although I don't think it matters.

The Python team has known about this issue for a long time and introduced "UTF-8 mode" in PEP 540 in 2016. However, this mode is not the default, and will not become the default in the near future, for the reasons given in the PEP. Thus, it is the users' responsibility to set the correct encoding and the libraries' responsibility to remind the users to do so.

So, it would be enough for pip to generate a warning/error message asking the user to set PYTHONIOENCODING=utf16 or use -X utf8=1 or set PYTHONUTF8=1 to enforce UTF8 mode.
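
As a rough sketch of what such a reminder could look like (this is not pip's or Rich's actual code; the function name encoding_hint and the message wording are invented for illustration), a check on the stream's reported encoding is enough to decide whether to show the hint:

```python
import codecs
import io
import sys


def encoding_hint(stream=None):
    """Return a reminder string if the stream cannot encode all of Unicode, else None."""
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        return None  # not a text stream, nothing to check
    try:
        name = codecs.lookup(encoding).name  # normalises aliases such as "UTF8"
    except LookupError:
        name = encoding.lower()
    if name.startswith("utf-"):
        return None  # any UTF encoding can represent every character
    return (
        f"Output encoding is {name!r}; some characters may not be printable. "
        "Consider setting PYTHONUTF8=1 or PYTHONIOENCODING=utf-8, "
        "or running python with -X utf8."
    )


# Simulate a redirected stream that reports the ANSI code page, and a UTF-8 one.
legacy_stream = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="UTF8")
print(encoding_hint(legacy_stream))
print(encoding_hint(utf8_stream))
```

Checking the encoding up front lets the hint appear before any output is attempted, instead of only after a UnicodeEncodeError.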

@stonebig
Contributor Author

stonebig commented Mar 10, 2024

I was recently hit by the same problem with the about-time package, so the rich package is not the only one affected.
So far, this method seems to survive every situation:

  • launch a subprocess with:
    • set pythonutf8=1
    • python -X utf8 -m pip inspect
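import sys
import subprocess
import json

# choose your Python executable
path = sys.prefix
python = sys.executable

# force UTF-8 mode through both the environment variable and the -X option
my_wish = f'set pythonutf8=1 & "{python}"  -X utf8=1 -m pip inspect'
process = subprocess.Popen(
        my_wish ,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        cwd=path,
        shell=True
    )
# communicate() drains both pipes, so a full stderr pipe cannot block the read
stdout, _stderr = process.communicate()
pip_json = json.loads(stdout)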
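
A variant of the same idea that avoids the shell may be easier to reuse. This is a sketch under a few assumptions: the helper name run_python_utf8 is invented here, the variable is passed through env= rather than with set, and PYTHONIOENCODING is dropped from the child environment so it cannot override UTF-8 mode.

```python
import os
import subprocess
import sys


def run_python_utf8(args):
    """Run the current interpreter in UTF-8 mode and return its decoded stdout."""
    env = {**os.environ, "PYTHONUTF8": "1"}
    # PYTHONIOENCODING would take precedence over UTF-8 mode for stdout, so remove it.
    env.pop("PYTHONIOENCODING", None)
    completed = subprocess.run(
        [sys.executable, "-X", "utf8", *args],
        capture_output=True,  # collects stdout and stderr without deadlocking
        env=env,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.decode("utf-8")


# Quick check that the child process really writes UTF-8 to a pipe.
print(run_python_utf8(["-c", "import sys; print(sys.stdout.encoding)"]).strip())

# For the use case in this thread (requires pip in this interpreter):
#   import json
#   pip_json = json.loads(run_python_utf8(["-m", "pip", "inspect"]))
```

Passing the command as a list also sidesteps quoting problems when the interpreter path contains spaces or non-ASCII characters.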

@stonebig stonebig changed the title 'pip inspect' crashes with python-3.10.10 and pip-23.0.0 on Windows 11 (a rich problem ?) 'pip inspect' crashes with python-3.10.10 and pip-23.0.0 on Windows 11 (an utf-8 problem) Mar 10, 2024
@stonebig
Contributor Author

stonebig commented May 1, 2024

I have worked out how to use importlib.metadata, so the pip inspect command line and the Windows UTF-8 tricks are no longer a problem.
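
For readers who land here with the same problem, a minimal sketch of that approach follows. It is not the code stonebig actually used (which is not shown in the thread); the function name installed_packages and the choice of fields are assumptions, and it covers only a small part of what pip inspect reports.

```python
import json
from importlib import metadata


def installed_packages():
    """Collect name, version and summary for every installed distribution."""
    packages = []
    for dist in metadata.distributions():
        meta = dist.metadata
        packages.append(
            {
                "name": meta.get("Name"),
                "version": meta.get("Version"),
                "summary": meta.get("Summary"),
            }
        )
    # Sort case-insensitively; tolerate distributions with missing names.
    return sorted(packages, key=lambda pkg: (pkg["name"] or "").lower())


# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# so the report can be printed whatever encoding the console reports.
report = json.dumps(installed_packages(), indent=2)
print(len(report), "characters of ASCII-only JSON")
```

Because the metadata is read in-process, no subprocess output has to be decoded, which removes the stdout encoding from the picture entirely.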

@stonebig stonebig closed this as completed May 1, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 1, 2024
6 participants