ISO-2022-JP encoder: convert halfwidth katakana to fullwidth
Fixes #105.
annevk authored May 8, 2017
1 parent 717d435 commit 5a09856
Showing 36 changed files with 127 additions and 44 deletions.
32 changes: 21 additions & 11 deletions encoding.bs
@@ -655,10 +655,11 @@ changed, so has the <a>index</a>.
<var>code point</var> is not in <var>index</var>.

<div class=note id=visualization>
-<p>There is a non-normative visualization for each index other than <a>index gb18030 ranges</a>.
-<a>index jis0208</a> also has an alternative <a>Shift_JIS</a> visualization. Additionally, there is
-visualization of the Basic Multilingual Plane coverage of each index other than
-<a>index gb18030 ranges</a>.
+<p>There is a non-normative visualization for each <a>index</a> other than
+<a>index gb18030 ranges</a> and <a>index ISO-2022-JP katakana</a>. <a>index jis0208</a> also has an
+alternative <a>Shift_JIS</a> visualization. Additionally, there is visualization of the Basic
+Multilingual Plane coverage of each index other than <a>index gb18030 ranges</a> and
+<a>index ISO-2022-JP katakana</a>.

<p>The legend for the visualizations is:

@@ -748,6 +749,12 @@ specification, excluding <a>index single-byte</a>, which have their own table:
No JIX X 0212 ISO-2022-JP support:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=26885
-->
+<tr>
+<td><dfn export>index ISO-2022-JP katakana</dfn>
+<td colspan=3><a href=index-iso-2022-jp-katakana.txt>index-iso-2022-jp-katakana.txt</a>
+<td>This maps halfwidth to fullwidth katakana as per Unicode Normalization Form KC, except that
+U+FF9E and U+FF9F map to U+309B and U+309C rather than U+3099 and U+309A. It is only used by the
+<a>ISO-2022-JP encoder</a>. [[UNICODE]]
</table>

<p>The <dfn>index gb18030 ranges code point</dfn> for <var>pointer</var> is
@@ -826,10 +833,9 @@ these steps:

<hr>

-<p class="note no-backref">All <a lt=index>indexes</a> are also available as
-non-normative <a href=indexes.json>indexes.json</a> resource.
-(<a>index gb18030 ranges</a> has a slightly different format here, to be able
-to represent ranges.)
+<p class="note no-backref">All <a lt=index>indexes</a> are also available as a non-normative
+<a href=indexes.json>indexes.json</a> resource. (<a>Index gb18030 ranges</a> has a slightly
+different format here, to be able to represent ranges.)



@@ -1898,7 +1904,7 @@ consumers of content generated with <a>GBK</a>'s <a for=/>encoder</a>.
<li><p>If <a>EUC-JP lead</a> is 0x8E and <var>byte</var> is
in the range 0xA1 to 0xDF, inclusive, set <a>EUC-JP lead</a> to 0x00 and return
a code point whose value is 0xFF61 &minus; 0xA1 + <var>byte</var>.
-<!-- katakana; subtraction is done first to avoid upsetting compilers -->
+<!-- Katakana; subtraction is done first to avoid upsetting compilers -->

<li><p>If <a>EUC-JP lead</a> is 0x8F and <var>byte</var> is in the range
0xA1 to 0xFE, inclusive, set the <a>EUC-JP jis0212 flag</a>, set
@@ -2050,7 +2056,7 @@ consumers of content generated with <a>GBK</a>'s <a for=/>encoder</a>.
<dd><p>Unset the <a>ISO-2022-JP output flag</a> and return <a>error</a>.
</dl>

-<dt><dfn lt="ISO-2022-JP decoder Katakana">Katakana</dfn>
+<dt><dfn lt="ISO-2022-JP decoder katakana">katakana</dfn>
<dd>
<p>Based on <var>byte</var>:
<dl class=switch>
@@ -2166,7 +2172,7 @@ consumers of content generated with <a>GBK</a>'s <a for=/>encoder</a>.
<var>state</var> to <a lt="ISO-2022-JP decoder Roman">Roman</a>.

<li><p>If <var>lead</var> is 0x28 and <var>byte</var> is 0x49<!--I-->, set
-<var>state</var> to <a lt="ISO-2022-JP decoder Katakana">Katakana</a>.
+<var>state</var> to <a lt="ISO-2022-JP decoder katakana">katakana</a>.

<li><p>If <var>lead</var> is 0x24 and <var>byte</var> is either
0x40<!--@--> or 0x42<!--B-->, set <var>state</var> to
@@ -2266,6 +2272,10 @@ consumers of content generated with <a>GBK</a>'s <a for=/>encoder</a>.

<li><p>If <var>code point</var> is U+2212, set it to U+FF0D.

+<li><p>If <var>code point</var> is in the range U+FF61 to U+FF9F, inclusive, set it to the
+<a>index code point</a> for <var>code point</var> &minus; 0xFF61 in
+<a>index ISO-2022-JP katakana</a>.
+
<li>
<p>Let <var>pointer</var> be the <a>index pointer</a> for <var>code point</var> in
<a>index jis0208</a>.
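
The encoder step added in the last hunk can be sketched in Python as follows. This is an illustration under stated assumptions, not the standard's normative text: the table is rebuilt from NFKC plus the two exceptions that the new index table row describes, rather than loaded from index-iso-2022-jp-katakana.txt, and the names `OVERRIDES`, `KATAKANA_INDEX`, and `halfwidth_to_fullwidth` are hypothetical.

```python
import unicodedata

# Assumption: rebuild the index from the description in the diff. Each
# halfwidth code point maps to its NFKC form, except that the two sound
# marks map to the spacing forms U+309B and U+309C rather than the
# combining forms U+3099 and U+309A.
OVERRIDES = {0xFF9E: 0x309B, 0xFF9F: 0x309C}
KATAKANA_INDEX = [
    OVERRIDES.get(cp, ord(unicodedata.normalize("NFKC", chr(cp))))
    for cp in range(0xFF61, 0xFFA0)
]


def halfwidth_to_fullwidth(code_point: int) -> int:
    """Apply the halfwidth-to-fullwidth step; other code points pass through."""
    if 0xFF61 <= code_point <= 0xFF9F:
        # The pointer into the index is the offset from U+FF61.
        return KATAKANA_INDEX[code_point - 0xFF61]
    return code_point
```

Because the step works on one code point at a time, a halfwidth KA followed by a halfwidth voiced sound mark (U+FF76 U+FF9E) would come out as U+30AB U+309B in this sketch, not as the single precomposed code point U+30AC.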
2 changes: 1 addition & 1 deletion index-big5.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 8dfc771062e7be0810919082c2c06baa2236147909e0ecc235b1cb9ad782ac82
-# Date: 2016-10-24
+# Date: 2017-05-06

942 0x43F0 䏰 (<CJK Ideograph Extension A>)
943 0x4C32 䰲 (<CJK Ideograph Extension A>)
2 changes: 1 addition & 1 deletion index-euc-kr.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 1d97134cbf187263585bc8f593ca4196654ed4c7a673f5672eaad4f5d9fdc4ba
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0xAC02 갂 (HANGUL SYLLABLE GAGG)
1 0xAC03 갃 (HANGUL SYLLABLE GAGS)
2 changes: 1 addition & 1 deletion index-gb18030-ranges.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: f963aaa1653f630c523e7b04729fb4e4458f35806c45eb5c179445623138f0c0
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0080
36 0x00A5
2 changes: 1 addition & 1 deletion index-gb18030.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 715f084846f5c6fc9dd31046d0a4d604bd2d88bfe3a22833cea048415e413c70
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x4E02 丂 (<CJK Ideograph>)
1 0x4E04 丄 (<CJK Ideograph>)
2 changes: 1 addition & 1 deletion index-ibm866.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: db6fe14a559d1601a7667338d83704773d5708dbc641e1ad3c5e21405770f05e
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0410 А (CYRILLIC CAPITAL LETTER A)
1 0x0411 Б (CYRILLIC CAPITAL LETTER BE)
72 changes: 72 additions & 0 deletions index-iso-2022-jp-katakana.txt
@@ -0,0 +1,72 @@
# Any copyright is dedicated to the Public Domain.
# https://creativecommons.org/publicdomain/zero/1.0/
#
# For details on index index-iso-2022-jp-katakana.txt see the Encoding Standard
# https://encoding.spec.whatwg.org/
#
# Identifier: 6ffc12c11f6eab1ccb3dada740d9b0db096ef0b0783c3bd5ec951dcb4a44b95e
# Date: 2017-05-06

0 0x3002 。 (IDEOGRAPHIC FULL STOP)
1 0x300C 「 (LEFT CORNER BRACKET)
2 0x300D 」 (RIGHT CORNER BRACKET)
3 0x3001 、 (IDEOGRAPHIC COMMA)
4 0x30FB ・ (KATAKANA MIDDLE DOT)
5 0x30F2 ヲ (KATAKANA LETTER WO)
6 0x30A1 ァ (KATAKANA LETTER SMALL A)
7 0x30A3 ィ (KATAKANA LETTER SMALL I)
8 0x30A5 ゥ (KATAKANA LETTER SMALL U)
9 0x30A7 ェ (KATAKANA LETTER SMALL E)
10 0x30A9 ォ (KATAKANA LETTER SMALL O)
11 0x30E3 ャ (KATAKANA LETTER SMALL YA)
12 0x30E5 ュ (KATAKANA LETTER SMALL YU)
13 0x30E7 ョ (KATAKANA LETTER SMALL YO)
14 0x30C3 ッ (KATAKANA LETTER SMALL TU)
15 0x30FC ー (KATAKANA-HIRAGANA PROLONGED SOUND MARK)
16 0x30A2 ア (KATAKANA LETTER A)
17 0x30A4 イ (KATAKANA LETTER I)
18 0x30A6 ウ (KATAKANA LETTER U)
19 0x30A8 エ (KATAKANA LETTER E)
20 0x30AA オ (KATAKANA LETTER O)
21 0x30AB カ (KATAKANA LETTER KA)
22 0x30AD キ (KATAKANA LETTER KI)
23 0x30AF ク (KATAKANA LETTER KU)
24 0x30B1 ケ (KATAKANA LETTER KE)
25 0x30B3 コ (KATAKANA LETTER KO)
26 0x30B5 サ (KATAKANA LETTER SA)
27 0x30B7 シ (KATAKANA LETTER SI)
28 0x30B9 ス (KATAKANA LETTER SU)
29 0x30BB セ (KATAKANA LETTER SE)
30 0x30BD ソ (KATAKANA LETTER SO)
31 0x30BF タ (KATAKANA LETTER TA)
32 0x30C1 チ (KATAKANA LETTER TI)
33 0x30C4 ツ (KATAKANA LETTER TU)
34 0x30C6 テ (KATAKANA LETTER TE)
35 0x30C8 ト (KATAKANA LETTER TO)
36 0x30CA ナ (KATAKANA LETTER NA)
37 0x30CB ニ (KATAKANA LETTER NI)
38 0x30CC ヌ (KATAKANA LETTER NU)
39 0x30CD ネ (KATAKANA LETTER NE)
40 0x30CE ノ (KATAKANA LETTER NO)
41 0x30CF ハ (KATAKANA LETTER HA)
42 0x30D2 ヒ (KATAKANA LETTER HI)
43 0x30D5 フ (KATAKANA LETTER HU)
44 0x30D8 ヘ (KATAKANA LETTER HE)
45 0x30DB ホ (KATAKANA LETTER HO)
46 0x30DE マ (KATAKANA LETTER MA)
47 0x30DF ミ (KATAKANA LETTER MI)
48 0x30E0 ム (KATAKANA LETTER MU)
49 0x30E1 メ (KATAKANA LETTER ME)
50 0x30E2 モ (KATAKANA LETTER MO)
51 0x30E4 ヤ (KATAKANA LETTER YA)
52 0x30E6 ユ (KATAKANA LETTER YU)
53 0x30E8 ヨ (KATAKANA LETTER YO)
54 0x30E9 ラ (KATAKANA LETTER RA)
55 0x30EA リ (KATAKANA LETTER RI)
56 0x30EB ル (KATAKANA LETTER RU)
57 0x30EC レ (KATAKANA LETTER RE)
58 0x30ED ロ (KATAKANA LETTER RO)
59 0x30EF ワ (KATAKANA LETTER WA)
60 0x30F3 ン (KATAKANA LETTER N)
61 0x309B ゛ (KATAKANA-HIRAGANA VOICED SOUND MARK)
62 0x309C ゜ (KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK)
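
The index row in encoding.bs says this table follows Unicode Normalization Form KC except for the last two entries. A rough way to check that claim against lines in the format listed above is sketched below. `SAMPLE` holds only a few lines copied from the listing, the fields are assumed to be whitespace-separated, and `parse_index` and `differs_from_nfkc` are hypothetical helper names.

```python
import unicodedata

# A few lines copied from the listing above; the full file has 63 entries.
SAMPLE = """\
# Date: 2017-05-06

0 0x3002 。 (IDEOGRAPHIC FULL STOP)
21 0x30AB カ (KATAKANA LETTER KA)
61 0x309B ゛ (KATAKANA-HIRAGANA VOICED SOUND MARK)
62 0x309C ゜ (KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK)
"""


def parse_index(text):
    """Map pointer -> code point, skipping comment and blank lines."""
    index = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Only the first two fields matter; the glyph and name are informative.
        pointer, code_point = line.split()[:2]
        index[int(pointer)] = int(code_point, 16)
    return index


def differs_from_nfkc(index):
    """Return the pointers whose entry is not the NFKC form of U+FF61 + pointer."""
    return [
        pointer
        for pointer, code_point in sorted(index.items())
        if unicodedata.normalize("NFKC", chr(0xFF61 + pointer)) != chr(code_point)
    ]
```

For the sample lines, `differs_from_nfkc(parse_index(SAMPLE))` returns `[61, 62]`, the two sound marks that the index deliberately maps to U+309B and U+309C.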
2 changes: 1 addition & 1 deletion index-iso-8859-10.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 02c2b5590d8ccda9931008c471f6ee2c590b2c8fe5e6ccb3b08638115d778507
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-13.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 40736338e964ab520407cebcb01329f8d450abf6ce12bf88b74b655b60e43300
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-14.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 2c8651cfc08b1f35b17919ee5379f2fa006af3ec809f11b3b7f470785580542b
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-15.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: a560aba47bccd7510a6ac77f671fe75dca3800f05cf6d676910c311a8f8ff079
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-16.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 55676320d2d1b6e6909f5b3d741a7cf0cefc84e920aa4474afc091459111c2e3
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-2.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 9569c67f22d0b57790e1c407c6eecf227e4562322dc296de43cdab7a0152ec73
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-3.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: af8f1e12df79b768322b5e83613698cdc619438270a2fc359554331c805054a3
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-4.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 72f29c92344d351fe9e74a946e7e0468d76d542c6894ff82982cb652ebe0feb7
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-5.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: fa9b1f3f5242df43e2e7bca80e9b6997c67944f20a4af91ee06bacc4e132d9c9
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-6.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 85bb7b5c2dc75975afebe5743935ba4ed5a09c1e9e34e9bfb2ff80293f5d8bbc
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-7.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: f53d8aeba36314ef950eef02ffcf11dff540638ce27dfe7a86b6ccc6875afb24
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-8.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 7657a9ca3fa875990da960d3f812eea28dcd0ae6ed55a18d5394303c86f5484b
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-jis0208.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: cbaa91f3deb7d0841faf5c33041fc15a285da0e87e64ab802c4bf04b7c4da861
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x3000   (IDEOGRAPHIC SPACE)
1 0x3001 、 (IDEOGRAPHIC COMMA)
2 changes: 1 addition & 1 deletion index-jis0212.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 83bf90dd1c591a4355730d8c4567efc499d74da7490531019ef22a879991cfb7
-# Date: 2016-10-24
+# Date: 2017-05-06

108 0x02D8 ˘ (BREVE)
109 0x02C7 ˇ (CARON)
2 changes: 1 addition & 1 deletion index-koi8-r.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: c5497cd9071cb352c0e56b219154e539badf63de40b71578f09e2e11fe7d50ae
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x2500 ─ (BOX DRAWINGS LIGHT HORIZONTAL)
1 0x2502 │ (BOX DRAWINGS LIGHT VERTICAL)
2 changes: 1 addition & 1 deletion index-koi8-u.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 19a4da2c3f245118bbc8019326f45a07832949938ff903f03d62ac4da1f61f40
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x2500 ─ (BOX DRAWINGS LIGHT HORIZONTAL)
1 0x2502 │ (BOX DRAWINGS LIGHT VERTICAL)
2 changes: 1 addition & 1 deletion index-macintosh.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: f2c6a4f6406b3e86a50a5dba4d2b7dd48e2e33c0d82aefe764535c934ec11764
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x00C4 Ä (LATIN CAPITAL LETTER A WITH DIAERESIS)
1 0x00C5 Å (LATIN CAPITAL LETTER A WITH RING ABOVE)
2 changes: 1 addition & 1 deletion index-windows-1250.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 0669455a7a1c70ba6003ea737991e8ee9adc455125c13cfe6705a361358de5fa
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-windows-1251.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 7592ef921679ba168b00a9e9afa3b4eebd67bf13dc7e84c4b6e120de856826e0
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x0402 Ђ (CYRILLIC CAPITAL LETTER DJE)
1 0x0403 Ѓ (CYRILLIC CAPITAL LETTER GJE)
2 changes: 1 addition & 1 deletion index-windows-1252.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: e56d49d9176e9a412283cf29ac9bd613f5620462f2a080a84eceaf974cfa18b7
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-windows-1253.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 49fdc881a3488904dd1e8dfba9aef3258454249958b611bcded1d4c981ab5561
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-windows-1254.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: e80a27adf377438be8ba5bd223875ea56d6a4d47f958cce1c957a2c446825caa
-# Date: 2016-10-24
+# Date: 2017-05-06

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)