[uppltotf] Bug in reading code points (U notation) #48

aminophen · 2018-01-29T15:59:04Z

Part of the code added in #8, which puts code points at U+10000 and above into JFM TYPE > 0, turns out to contain a regression.

(CHARSINTYPE O 1 U FF08)

If U xxxx is closed with ) without any whitespace in between, as in the line above, the following errors appear:

$ uppltotf ucodeparse.pl
This expression is out of JIS-code encoding. (line 1).
...   U FF08) 
             ...
This expression is out of JIS-code encoding. (line 1).
...( 
    CHARSINTYPE O 2...
Input file is in kanji YOKO-kumi format.

These errors should not occur. I already suspect that the problem lies in the code that allows 5- and 6-digit U notation, so I will write a patch tomorrow or later.

aminophen · 2018-01-30T13:22:50Z

pPLtoTF is affected too: J xxxx has always been parsed as a fixed 4-digit value, so a code of 3 digits or fewer produces:

$ ppltotf ucodeparse
This expression is out of JIS-code encoding. (line 1).
...(CHARSINTYPE O 1 J 214) 
                          ...
Illegal characters. I was expecting a jis code or character (line 1).
...( 
    CHARSINTYPE O 2...
Illegal characters. I was expecting a jis code or character (line 1).
...(C 
     HARSINTYPE O 2...
Illegal characters. I was expecting a jis code or character (line 1).
...(CH 
      ARSINTYPE O 2...

The error messages went on and on like this. That is not very welcome, so I made the code somewhat safer. → bfbe64d

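With this change, both J notation and U notation now accept a hexadecimal number with any number of digits (and an error is raised at that point if the accepted code exceeds the maximum JIS code or UCS code).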
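
The behaviour described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual WEB change-file code in bfbe64d: the function name parse_hex_code and the limits JIS_MAX and UCS_MAX are assumptions made for the example, and the sketch only models the "any number of digits, then range check" idea.

```python
import string
from typing import Tuple

# Assumed upper bounds, for illustration only.
JIS_MAX = 0x7E7E      # hypothetical limit for a two-byte JIS code
UCS_MAX = 0x10FFFF    # highest Unicode code point

HEX_DIGITS = set(string.hexdigits)


def parse_hex_code(text: str, pos: int, max_value: int) -> Tuple[int, int]:
    """Read consecutive hex digits starting at text[pos].

    Returns (value, next_pos), where next_pos is the index of the first
    character that is not a hex digit (for example a space or ')').
    """
    start = pos
    value = 0
    # Accept any number of hex digits instead of a fixed count.
    while pos < len(text) and text[pos] in HEX_DIGITS:
        value = value * 16 + int(text[pos], 16)
        pos += 1
    if pos == start:
        raise ValueError("expected at least one hex digit")
    # Range check happens only after the whole number has been read.
    if value > max_value:
        raise ValueError("code %X exceeds maximum %X" % (value, max_value))
    return value, pos
```

Because scanning stops at the first non-hex character, the closing ) in U FF08) or J 214) is left for the caller instead of being consumed as part of the code. Whether a value such as 214 is a valid JIS code is a separate check that this sketch does not model.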

aminophen · 2018-01-30T13:53:31Z

I also noticed another problem. Take

(CHARSINTYPE O 1 U 0021)

A PL source containing the line above, run through upPLtoTF → upTFtoPL → upPLtoTF, fails with an error:

$ uppltotf utorig
$ uptftopl utorig utnew
$ uppltotf utnew
Illegal characters. I was expecting a jis code or character (line 1).
...   ! 
       ...

The cause is probably that upPLtoTF accepts only Japanese characters as literally written characters, whereas upTFtoPL runs the following routine:

  if BYTE1(cx)<>0 then out(xchr[BYTE1(cx)]);
  if BYTE2(cx)<>0 then out(xchr[BYTE2(cx)]);
  if BYTE3(cx)<>0 then out(xchr[BYTE3(cx)]);
                       out(xchr[BYTE4(cx)]);

This routine writes even ASCII codes out directly as characters. To prevent this, I plan to add a special case to upTFtoPL.

aminophen · 2018-01-30T14:10:13Z

In 4085a02 I added a special case so that character codes of 127 or below are always written out as hex codes.
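
As a rough illustration of that special case, here is a small Python sketch. It is not the code from 4085a02: the helper name format_char_token is hypothetical, the "U XXXX" token layout is modeled on the examples in this thread, and the real tool's encoding handling for non-ASCII characters is simplified to writing the character itself.

```python
def format_char_token(code: int) -> str:
    """Return the token written to the PL file for one character code.

    Hypothetical helper: codes of 127 or below are always written in hex,
    everything else is written as the literal character.
    """
    if code <= 0x7F:
        # ASCII range: never emit the bare character, because the
        # PL-to-TFM converter only accepts Japanese characters literally.
        return "U %04X" % code
    # Non-ASCII: write the character directly (encoding details omitted).
    return chr(code)
```

With this rule, code 0x21 is written back as U 0021 rather than as a bare !, so the output can be read again by the PL-to-TFM step.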

aminophen · 2018-02-02T13:24:39Z

Committed in r46518 together with #47.

aminophen added the bug label Jan 29, 2018

aminophen added a commit that referenced this issue Jan 30, 2018

ppltotf.ch: safer kanji hex code parsing (#48)

bfbe64d

aminophen added a commit that referenced this issue Jan 30, 2018

ptftopl.ch: char<0x80 is always printed in hex code (#48)

4085a02

aminophen mentioned this issue Jan 30, 2018

ppltotf, makejvf, ptex: (GLUEKERN) support SKIP property and rearrangement #47

Merged

aminophen added a commit that referenced this issue Feb 2, 2018

uptexdir: add tests for #48

57d6f6f

aminophen closed this as completed Feb 2, 2018
