windows-1255 encoding: add mapping for 0xCA #73
Per https://www.w3.org/International/tests/repo/results/encoding-sb-dec#windows-1255, it's indeed only Microsoft that has failures here. However, I can't seem to run the test in Edge, and the note indicates it's mostly about PUA code points. @r12a? (Note that to implement this change we'd update the JSON resource and run tools-index.py, but it's not entirely clear to me that we want to, given that the majority of implementations are aligned.)
Yes, usually I follow this "majority of implementations" argument. But here, given that the main use of windows-1255 is as "a code page used under Microsoft Windows" [see https://en.wikipedia.org/wiki/Windows-1255], I would follow what the implementation of MultiByteToWideChar under Windows does: it maps 0xCA to U+05BA.
@annevk I had no problem running the test. If you continue to have a problem, let me know. Here's a snap of the results: 0xCA is mapped to U+05BA and called out as an error.
The "best fit" mappings for windows-1255 (http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1255.txt) have the 0xCA to U+05BA mapping, by the way. I'm OK with adding the mapping to Gecko's implementation.
This is an archaic archive and should not be considered a reference these days. For example, it does not contain a mapping for the euro sign. Microsoft recently removed the former reference site and put a link to the "best fit" mappings on unicode.org, so the "best fit" mappings should now be considered the latest reference.
Thank you for the pointer to these tables. I've updated the mapping table comparison in http://haible.de/bruno/charsets/conversion-tables/CP1255.html. FWIW, I made the corresponding change in GNU libiconv: http://git.savannah.gnu.org/gitweb/?p=libiconv.git;a=commitdiff;h=500b967b8f4bcb2bd656c293c5412dc611c5720b
I'm OK with adding this mapping.
Unless @jungshik objects, it seems this is ready to be merged.
Microsoft Windows has had this mapping for over fifteen years. Despite it not being universally adopted, it seems best to align with Windows here. Fixes #73.
I created a PR; let me know if you see any problems. I plan on merging by end of day.
I don't have any objection. I'll add that to Blink's mapping.
Microsoft Windows has had this mapping for over fifteen years. Despite it not being universally adopted, the Encoding Standard aligned with Windows here. whatwg/encoding#73 whatwg/encoding#77
The windows-1255 encoding as specified in the Encoding Standard does NOT map the byte 0xCA.
However, the main use of windows-1255 is as a codepage on Windows, and the native Windows converter (the function MultiByteToWideChar) has mapped 0xCA to U+05BA since Windows 2000, i.e. for 15 years.
On the other hand, the codepage chart at Microsoft https://msdn.microsoft.com/en-us/library/cc195057.aspx marks this position as "not used", and the majority of non-Windows conversion software does not map the byte 0xCA.
For details of these mapping tables, see
http://haible.de/bruno/charsets/conversion-tables/index.html
http://haible.de/bruno/charsets/conversion-tables/CP1255.html
Implementing the change would mean editing index-windows-1255.txt to add the line
74 0x05BA (HEBREW POINT HOLAM HASER FOR VAV)
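Why pointer 74 corresponds to byte 0xCA: single-byte indexes in the Encoding Standard start at byte 0x80, so the pointer is the byte minus 0x80 (0xCA - 0x80 = 74). Below is a minimal sketch of the single-byte decoder applied to that one index line. Assumptions: the excerpt contains only the proposed line (a real index has 128 entries); `parse_index` and `decode_single_byte` are hypothetical helper names; unmapped bytes are decoded in replacement mode (U+FFFD) rather than raising an error.

```python
from __future__ import annotations

# Hypothetical excerpt of index-windows-1255.txt containing only the proposed line.
INDEX_EXCERPT = """\
# pointer  code point  (name)
74\t0x05BA\t(HEBREW POINT HOLAM HASER FOR VAV)
"""


def parse_index(text: str) -> dict[int, int]:
    """Map pointer -> code point, skipping blank lines and '#' comments."""
    index = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # int(..., 16) accepts the "0x" prefix used in the index files.
        index[int(fields[0])] = int(fields[1], 16)
    return index


def decode_single_byte(data: bytes, index: dict[int, int]) -> str:
    """Single-byte decoding sketch, using U+FFFD for unmapped bytes."""
    out = []
    for byte in data:
        if byte < 0x80:
            out.append(chr(byte))  # ASCII bytes decode to themselves
            continue
        pointer = byte - 0x80  # single-byte indexes start at byte 0x80
        code_point = index.get(pointer)
        out.append(chr(code_point) if code_point is not None else "\uFFFD")
    return "".join(out)


if __name__ == "__main__":
    idx = parse_index(INDEX_EXCERPT)
    print(hex(0xCA - 0x80), idx)  # the pointer for byte 0xCA, then the parsed index
    print(repr(decode_single_byte(b"A\xca", idx)))
```

With the new line present, byte 0xCA decodes to U+05BA. Without it, the pointer has no entry and the decoder emits U+FFFD, which is the current behavior that Microsoft's converter differs from.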