windows-1255 encoding: add mapping for 0xCA #73

bhaible · 2016-10-03T20:41:25Z

The windows-1255 encoding, as currently specified, does NOT map the byte 0xCA.

However, the main use of windows-1255 is as a codepage on Windows, and the native Windows converter (function MultiByteToWideChar) has mapped 0xCA to U+05BA since Windows 2000, i.e. for 15 years.

On the other hand, the codepage chart at Microsoft https://msdn.microsoft.com/en-us/library/cc195057.aspx marks this position as "not used", and the majority of non-Windows conversion software does not map the byte 0xCA.

For details of these mapping tables, see
http://haible.de/bruno/charsets/conversion-tables/index.html
http://haible.de/bruno/charsets/conversion-tables/CP1255.html

To implement the change, edit index-windows-1255.txt and add the line:
74 0x05BA (HEBREW POINT HOLAM HASER FOR VAV)
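As a minimal sketch of where the 74 comes from: assuming the single-byte index is keyed by a pointer equal to the byte value minus 0x80, the proposed line can be derived as below. The helper name `index_line` is made up for illustration, and the output only mimics the line quoted above, not necessarily the exact whitespace of the real index file.

```python
import unicodedata


def index_line(byte: int, code_point: int) -> str:
    """Build an index line in the style quoted above (hypothetical helper)."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("single-byte indexes only cover bytes 0x80..0xFF")
    # Assumption: pointer = byte - 0x80, so 0xCA becomes 0x4A = 74.
    pointer = byte - 0x80
    # Look up the character name in Python's Unicode database.
    name = unicodedata.name(chr(code_point))
    return f"{pointer} 0x{code_point:04X} ({name})"


print(index_line(0xCA, 0x05BA))
```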

annevk · 2016-10-04T07:35:15Z

Per https://www.w3.org/International/tests/repo/results/encoding-sb-dec#windows-1255 it's indeed only Microsoft that has failures here. I can't seem to run the test however in Edge and the note indicates it's mostly about PUA code points. @r12a?

(Note that to implement this change we'd update the JSON resource and run tools-index.py, but it's not entirely clear to me that we want to, given that the majority of implementations are aligned.)
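For illustration only, here is a rough sketch of what "updating the JSON resource" could amount to. It assumes the resource stores each single-byte index as a 128-entry array with `null` for unmapped pointers; the `index` list below is stand-in data, not the real windows-1255 table, and `set_index_entry` is a hypothetical helper, not part of tools-index.py.

```python
import json


def set_index_entry(index, byte, code_point):
    """Return a copy of index with byte mapped to code_point (hypothetical helper)."""
    pointer = byte - 0x80  # assumed pointer convention for single-byte indexes
    if index[pointer] is not None:
        raise ValueError(f"pointer {pointer} is already mapped")
    updated = list(index)
    updated[pointer] = code_point
    return updated


# Stand-in data: 128 unmapped slots plus one known entry (0xE0 -> U+05D0).
index = [None] * 128
index[0xE0 - 0x80] = 0x05D0

updated = set_index_entry(index, 0xCA, 0x05BA)

# JSON serializes None as null, so the new entry sits between unmapped slots.
print(json.dumps(updated[72:76]))
```

Returning a copy rather than mutating in place keeps the old table around, which makes it easy to diff the two before regenerating any derived files.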

bhaible · 2016-10-04T09:56:11Z

> it's not entirely clear to me that we want to, given that the majority of implementations are aligned

Yes, usually I follow this "majority of implementations" argument. But here, given that the main use of windows-1255 is as "a code page used under Microsoft Windows" [see https://en.wikipedia.org/wiki/Windows-1255], I would follow what the implementation of MultiByteToWideChar under Windows does: it maps 0xCA to U+05BA.
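To make the proposed behaviour concrete, here is a small sketch of a decoder that follows the Windows mapping for 0xCA. It is an assumption-laden illustration: `decode_windows_1255` and `OVERRIDES` are made-up names, everything except 0xCA is delegated to Python's built-in `cp1255` codec (which may differ from both Windows and the spec for other bytes), and unmapped bytes become U+FFFD.

```python
# Byte-to-character overrides applied before falling back to Python's codec.
OVERRIDES = {0xCA: "\u05BA"}  # HEBREW POINT HOLAM HASER FOR VAV


def decode_windows_1255(data: bytes) -> str:
    """Decode bytes with the 0xCA -> U+05BA mapping applied (sketch only)."""
    out = []
    for b in data:
        if b in OVERRIDES:
            out.append(OVERRIDES[b])
        else:
            # Delegate to the stock codec; undefined bytes turn into U+FFFD.
            out.append(bytes([b]).decode("cp1255", errors="replace"))
    return "".join(out)


decoded = decode_windows_1255(b"\xe0\xca")
print([f"U+{ord(ch):04X}" for ch in decoded])
```

Decoding byte by byte is slow but keeps the override logic obvious; a real converter would patch its lookup table instead.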

r12a · 2016-10-04T10:34:33Z

@annevk I had no problem running the test. If you continue to have a problem, let me know.

Here's a snap of the results.

0xCA is mapped to U+05BA and called out as an error.

vyv03354 · 2016-10-04T11:21:58Z

The "best fit" mappings for windows-1255 (http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1255.txt) have the 0xCA to U+05BA mapping, by the way.

I'm OK with adding the mapping to Gecko's implementation.

vyv03354 · 2016-10-05T10:49:43Z

> On the other hand, the codepage chart at Microsoft https://msdn.microsoft.com/en-us/library/cc195057.aspx marks this position as "not used"

This is an archaic archive and should not be considered a reference these days. For example, it does not contain a mapping for the euro sign.

Recently Microsoft removed the former reference site and put a link to the "best fit" mappings on unicode.org. So the "best fit" mappings should be considered as the latest reference now.

annevk · 2016-10-05T11:03:20Z

@jungshik @hsivonen okay with you too?

bhaible · 2016-10-05T19:22:15Z

> The "best fit" mappings for windows-1255 (http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1255.txt) have the 0xCA to U+05BA mapping, by the way.

Thank you for the pointer to these tables. I've updated the mapping table comparison in http://haible.de/bruno/charsets/conversion-tables/CP1255.html.

FWIW, I made the corresponding change in GNU libiconv: http://git.savannah.gnu.org/gitweb/?p=libiconv.git;a=commitdiff;h=500b967b8f4bcb2bd656c293c5412dc611c5720b

hsivonen · 2016-10-10T07:29:11Z

I'm OK with adding this mapping.

mathiasbynens · 2016-10-23T17:21:38Z

Unless @jungshik objects, it seems this is ready to be merged.

Microsoft Windows has had this mapping for over fifteen years. Despite it not being universally adopted, it seems best to align with Windows here. Fixes #73.

annevk · 2016-10-24T08:12:42Z

I created a PR, let me know if you see any problems. I plan on merging by end-of-day.

jungshik · 2016-10-24T17:00:21Z

I don't have any objection. I'll add that to Blink's mapping.

Microsoft Windows has had this mapping for over fifteen years. Despite it not being universally adopted, the Encoding Standard aligned with Windows here. whatwg/encoding#73 whatwg/encoding#77


annevk added a commit that referenced this issue Oct 24, 2016

windows-1255 map 0xCA to U+05BA

fd57296

Microsoft Windows has had this mapping for over fifteen years. Despite it not being universally adopted, it seems best to align with Windows here. Fixes #73.

annevk mentioned this issue Oct 24, 2016

windows-1255 map 0xCA to U+05BA #77

Merged

annevk closed this as completed in #77 Oct 24, 2016

annevk added a commit that referenced this issue Oct 24, 2016

windows-1255 map 0xCA to U+05BA

e32a57b

Microsoft Windows has had this mapping for over fifteen years. Despite it not being universally adopted, it seems best to align with Windows here. Fixes #73.

inexorabletash mentioned this issue Oct 24, 2016

windows-1255 map 0xCA to U+05BA inexorabletash/text-encoding#59

Closed

jungshik added a commit to jungshik/web-platform-tests that referenced this issue Oct 27, 2016

Add "0xCA <=> U+05BA" mapping to windows-1255

e45b511

Per whatwg/encoding#73

jungshik mentioned this issue Oct 27, 2016

Add "0xCA <=> U+05BA" mapping to windows-1255 (spec change) web-platform-tests/wpt#4090

Merged

hsivonen added a commit to hsivonen/encoding_rs that referenced this issue Oct 31, 2016

Map 0xCA to U+05BA in windows-1255 per whatwg/encoding#73.

a8e2a7a

annevk pushed a commit to web-platform-tests/wpt that referenced this issue Nov 7, 2016

Add "0xCA <=> U+05BA" mapping to windows-1255

2a4e5cf

Per whatwg/encoding#73.

Mr0grog mentioned this issue Oct 17, 2023

Encodings reported by chardetng-py don't always match up to python's decoding john-parton/chardetng-py#11

Open
