-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can not parse utf-8 strings #218
Comments
Can you provide a way to reproduce your problem, especially an input file that triggers the error + a yamllint version? |
test.yaml
|
PYTHONIOENCODING=UTF-8 can fix this for stdin
But file inupt still wrong.
|
On Linux, your example file works perfectly. It looks like Windows default encoding is not Unicode. yamllint uses PyYAML to parse YAML, could you try the following command, to see if PyYAML is able to load the file?
|
|
Might be related to yaml/pyyaml#123 (comment). Probably the following would work.
|
I confirmed @rhysd 's code work. |
- YAMLを標準入力から読ませることでWindowsに対応したyamllintフックを作成 - cf. adrienverge/yamllint#218
I'm doing some issue gardening 🌱🌿 🌷 and came upon this issue. Since it's quite old I just wanted to ask if this is still relevant? If it isn't, maybe we can close this issue? By closing some old issues we reduce the list of open issues to a more manageable set. |
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
I know #20 and #2. But it's on non-Windows. On Windows, LANG, LC_CTYPE does not set in generally. I think yamllint should provide way to read utf-8 string even if LANG/LC_CTYPE is not set.
The text was updated successfully, but these errors were encountered: