.pth files cannot contain folders with utf-8 names #77102
Comments
Add "G:\русский язык" to a .pth file and start Python. It fails with:
Failed to import the site module
Traceback (most recent call last):
  File "C:\Program Files\ROXAR\RMS dev_release\windows-amd64-vc_14_0-release\bin\lib\site.py", line 546, in <module>
    main()
  File "C:\Program Files\ROXAR\RMS dev_release\windows-amd64-vc_14_0-release\bin\lib\site.py", line 532, in main
    known_paths = addusersitepackages(known_paths)
  File "C:\Program Files\ROXAR\RMS dev_release\windows-amd64-vc_14_0-release\bin\lib\site.py", line 287, in addusersitepackages
    addsitedir(user_site, known_paths)
  File "C:\Program Files\ROXAR\RMS dev_release\windows-amd64-vc_14_0-release\bin\lib\site.py", line 209, in addsitedir
    addpackage(sitedir, name, known_paths)
  File "C:\Program Files\ROXAR\RMS dev_release\windows-amd64-vc_14_0-release\bin\lib\site.py", line 165, in addpackage
    for n, line in enumerate(f):
  File "C:\Program Files\ROXAR\RMS dev_release\windows-amd64-vc_14_0-release\bin\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 8: character maps to <undefined>
This might very well have side effects, but adding "encoding='utf-8'" to the open() call in addpackage() in site.py seems to fix the issue for me. |
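For reference, the failure can be reproduced without touching site.py. The following is a minimal sketch, assuming the .pth file was saved as UTF-8 and the process code page is cp1252 as in the traceback; the variable names are illustrative only.

```python
# Bytes as a UTF-8 editor would store the path line in the .pth file
# (assumption: the file was saved as UTF-8).
raw = "G:\\русский язык\n".encode("utf-8")

try:
    # Roughly what reading the file under the cp1252 code page does.
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    failure = exc
    print(f"byte {raw[exc.start]:#x} at position {exc.start}: {exc.reason}")
else:
    failure = None

# Decoding the same bytes as UTF-8 succeeds.
print(raw.decode("utf-8").rstrip())
```

The offending byte is the second byte of the UTF-8 encoding of "с", which has no mapping in Python's cp1252 codec.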
Yes, it'll have significant side effects. The default file encoding on Windows is your configured code page (1252, in your case), and there's no good way around that default. The easiest immediate fix is to re-encode that file yourself. Perhaps what we could do instead is allow the first line of a .pth file to be a coding comment? Then site.py can reopen the file with the specified encoding. (FWIW, when I added the ._pth file, I explicitly made it UTF-8. But it had no history at that time so it was safe to do so.) |
This seems to be related to the problem in pypa/setuptools#3937. I wonder what is the position of the core maintainers about it. Is this a valid bug, or is it considered a Considering the Python uses UTF-8 by default for source files, and that One important point is that most of the times the While the packaging system could use locale.getpreferredencoding() to create the @zooba do you have any advice on how to handle this? Footnotes
|
It's valid, but unfortunately not fixable outside of changing the default encoding for the entire process.
Being able to include code in the files is the weird bit, and is likely to be removed well before the default encoding is changed. If it were meant to be a source file, it'd have a
Yep, which is why users have to opt-in to these options, and if it breaks then they need to opt-out or fix it themselves. Other tools don't need to be concerned about this scenario - just use the default encoding.
Use a regular Unfortunately, it seems it currently requests locale encoding explicitly to read, when it ought to be using the default. If it were the default, everyone would get onto UTF-8 when that default changes, but as it stands it'll be stuck on locale encoding forever. So I guess the advice is to explicitly use locale encoding until we fix that. |
Thank you very much @zooba, I am working on pypa/setuptools#4009 trying to follow this recommendation. |
Sorry for the delay. I believe .pth file encoding should follow UTF-8 mode. When I implemented 4827483, I used "locale" encoding because setuptools used locale encoding. What do you think about this, @abravalheri?
|
Can we try to decode from UTF-8, and fall back to the locale encoding on a decode error? |
Yes. .pth files are small in most cases. |
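The "try UTF-8, then fall back" idea can be sketched as below. This is only an illustration under stated assumptions, not the code that ended up in site.py: `decode_pth` and its `fallback_encoding` parameter are hypothetical names, and the whole file is assumed to be read as bytes first, which is reasonable because .pth files are small.

```python
import locale
from typing import Optional


def decode_pth(data: bytes, fallback_encoding: Optional[str] = None) -> str:
    """Decode .pth content as UTF-8, falling back to a legacy encoding."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: a file that is not valid UTF-8 was written in the
        # locale encoding by an older tool.
        if fallback_encoding is None:
            fallback_encoding = locale.getpreferredencoding(False)
        return data.decode(fallback_encoding)


line = "G:\\русский язык\n"
# UTF-8 input is decoded directly.
assert decode_pth(line.encode("utf-8")) == line
# cp1251 input is not valid UTF-8, so the fallback is used.
assert decode_pth(line.encode("cp1251"), fallback_encoding="cp1251") == line
```

One caveat of this approach: some byte sequences written in a legacy encoding also happen to be valid UTF-8, so the detection is a heuristic rather than a guarantee.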
I think this is a good idea, for the following reason: the creation of the
Assuming that we are talking about editable installations, my comments on the suggestions are the following:
|
UTF-8 mode is a runtime option rather than a build option, so I don't know that you'd want to use it to create a persistent config file like this. It's not really reasonable to expect UTF-8 mode to be consistently on between install processes and application runtime. You're probably just as well defaulting to UTF-8 all the time, and offer an obscure environment variable setting to switch back to Maybe on 3.12 and earlier you could even try encoding to ASCII and print a warning that Python may not be able to load it? The sooner we make it start reading UTF-8 and then fall back to locale, the better. |
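From the writer's side, the suggestion of always writing UTF-8 and warning when the content is not pure ASCII might look like the sketch below. `write_pth` is a hypothetical helper, not an API of setuptools, uv, or any other tool, and the warning text is made up for illustration.

```python
import warnings


def write_pth(path, entries):
    """Write path entries to a .pth file as UTF-8, warning on non-ASCII."""
    text = "".join(f"{entry}\n" for entry in entries)
    if not text.isascii():
        # Assumption: older interpreters read .pth files with the locale
        # encoding, so non-ASCII paths may not load there.
        warnings.warn(
            f"{path} contains non-ASCII paths; Python versions that read "
            ".pth files with the locale encoding may fail to load it",
            stacklevel=2,
        )
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)
```

Pure-ASCII content is byte-identical in UTF-8 and in the common locale encodings, which is why only the non-ASCII case needs a warning.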
It's worth noting that tools which use the From the perspective of a tool (or user) writing a From the perspective of the core reading I don't think we should bring UTF-8 mode into it, as that's a runtime choice (and |
Locale encoding is also a runtime configuration. It changes when you switch languages on Windows or change environment variables on Linux. UTF-8 Mode helps ensure that a virtual environment is not broken by changes in language or environment variables.

When I proposed PEP 597, I had no plans to make UTF-8 Mode the default. site.py supported UTF-8 Mode up to Python 3.9, but when we implemented PEP 597, I stopped supporting it in order to eliminate the EncodingWarning. I chose locale encoding because setuptools used it. However, I later realized that a migration process based on EncodingWarning is not very feasible. I proposed PEP 686, which is a migration process based on UTF-8 Mode, and it was approved.

In UTF-8 Mode, we should use UTF-8 everywhere unless there is a special reason not to. So I think supporting UTF-8 Mode again in 3.11/3.12 is a good idea. It is compatible with 3.9, and it gives users a reason to opt in to UTF-8 Mode before Python 3.15.

Another way to support UTF-8 .pth files is #117802. It tries UTF-8 and falls back to locale encoding. I'm OK with that too.

Anyway, Python is a glue language, and interoperability with other languages is important. I don't want to force the uv developers to use a huge number of codecs that behave exactly like Python's codecs just to handle locale encoding. I want to provide a way to use UTF-8 not only in Python 3.13 but also in Python 3.11 and 3.12. |
Maybe #117802 would be the better idea for now, since setuptools uses locale encoding and uv uses UTF-8. |
(possibly dumb question) uv doesn't provide a build backend, so when is it writing pth files? edit: I guess uv venv writes to |
The only difficulty from There are other problem cases:
For (1), we need to document the rules properly, and then it's up to the user to follow them - if the user wants to add a site-packages directory (via For (2), things are similar, as the same package could be installed in different Python versions. I doubt a project is going to want to publish different wheels for different Python versions, just to handle Footnotes
|
You are right. I thought pth files are created by
Poetry uses
How about this plan?
|
LGTM. Would the check |
Yes. UTF-8 support will be backported to 3.11 and 3.12 only for tools that already produce UTF-8
…g when reading .pth file (pythonGH-117802) (cherry picked from commit 6dc661b) Co-authored-by: Inada Naoki
3.11 is in security-fix-only mode, so I backported this only to 3.12. |
Thanks @methane, using UTF-8 or falling back on locale encoding is a nice tradeoff. |
…reading .pth file (python#117802)
GH-119503 made a small tweak to ignore UTF-8 BOMs when reading .pth files (this aligns with the way source files are decoded) |
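The effect of ignoring a BOM can be illustrated with Python's `utf-8-sig` codec. This is a small sketch of the codec behaviour only; whether site.py uses exactly this codec is an assumption here, not something stated in this thread.

```python
import codecs

# A .pth line saved by an editor that prepends a UTF-8 BOM.
raw = codecs.BOM_UTF8 + "G:\\русский язык\n".encode("utf-8")

with_bom = raw.decode("utf-8")         # BOM survives as U+FEFF
without_bom = raw.decode("utf-8-sig")  # BOM is dropped

print(with_bom.startswith("\ufeff"))
print(without_bom.startswith("G:"))
```

With plain `utf-8`, the leading U+FEFF would become part of the first path entry, so that entry would not match the intended directory.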