-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LookupError: unknown encoding: utf16-le #6054
Comments
Still happens on 19.x. |
It looks like that code was added in PR #3485. @xavfernandez, can you take a look? |
It looks like the fix might be as simple as changing There should be a regression test to iterate over the |
Indeed, |
I'll submit a PR with the fix and regression test. |
The table of aliases here would seem to confirm that |
utils.encoding.auto_decode() was broken when decoding Big Endian BOM byte-strings on Little Endian or vice versa. The TestEncoding.test_auto_decode_utf_16_le test was failing on Big Endian systems, such as Fedora's s390x builders. A similar test, but with BE BOM test_auto_decode_utf_16_be was added in order to reproduce this on a Little Endian system (which is much easier to come by). A regression test was added to check that all listed encodings in utils.encoding.BOMS are valid. Fixes pypa#6054
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Environment
This is a bug that manifests itself on a Big Endian architecture, when the tests are run.
However it can be examined on Little Endian as well.
Description
This is the test failure on s390x:
Expected behavior
The tests should pass on all architectures alike.
How to Reproduce
More info
I've checked and pip has:
pip/src/pip/_internal/utils/encoding.py
Lines 6 to 14 in e5ab7f6
And:
pip/src/pip/_internal/utils/encoding.py
Lines 23 to 25 in e5ab7f6
So this has 2 problems:
I have a small reproducer here (run on my machine, x86_64):
This is the output on s390x:
Clearly we see that
utf16-be
,utf16-le
,utf32-be
andutf32-le
encoding are not even possible to use.Is that expected? The code should not reach those anyway?
The testing bytestring is:
b'\xff\xfeD\x00j\x00a\x00n\x00g\x00o\x00=\x00=\x001\x00.\x004\x00.\x002\x00'
It starts with
\xff\xfe
and hence should be decoded by first encoding that has this bom. On little endian, that isutf16
: Everything works, we haven't reached the nonexisiting encodings.However on big endian system, the
utf16
bom is big endian and hence the first item with the\xff\xfe
bom isutf16-le
- it blows up.To reproduce this problem on little endian architectures, add a
test_auto_decode_utf16_be
tests with:The text was updated successfully, but these errors were encountered: