[zhmetrics-uptex] Enhancement of upTeX metrics #292
〈:〉, 〈;〉, 〈!〉 and 〈?〉 should be treated differently in horizontal and vertical writing mode. Each of them has a "width" of 0.5zw in horizontal writing mode but 1zw in vertical writing mode. 〈:〉 and 〈;〉 may also have a "width" of 0.5zw in vertical writing mode in some fonts, but if we assume that they have a "width" of 1zw, our font metrics can work for both situations. |
〈:〉, 〈;〉, 〈!〉 and 〈?〉 are all left-aligned in simplified Chinese, cf. GBT 15834-2011. Fandol fonts (and some other simplified Chinese fonts) are wrong. SourceHanSerif / SourceHanSans fonts are right. |
Thanks for your comments.
Actually I don't know what the desired output for Chinese vertical writing is; could you show me a sample page from a Chinese book printed in vertical writing? I'd like to see how these punctuation marks appear in vertical writing. |
This info is also helpful; Thank you so much! |
Many photos of old books can be found on this website: For example, there are some photos of 《三國演義》; these books were printed in the early 1950s: http://book.kongfz.com/item_pic_191860_617117234/
In fact, the output of Chinese punctuation depends on two "style"s: the font design style and the punctuation compression style. The font design style decides how a punctuation mark is positioned within its square (character box), and the punctuation compression style decides the spacing between the squares of punctuation marks and/or Han characters. Each of them has several possibilities, and they can be combined in several ways. (This is true for both traditional and simplified Chinese; the centre-aligned style of traditional Chinese in Taiwan is merely one case.)
Unfortunately, as far as I know, there is still no package in the TeX world that takes care of both styles; only the punctuation compression style is taken into consideration. Clerk Ma said that he was working on this problem (with a new method), but his work has not been published yet. I'm writing a package for (e-)upTeX and ApTeX and trying to resolve this problem with the traditional method (JFM and JVF). But on the other hand, I think there is no necessity to support all those possibilities in
Great! Now I see clearly how punctuations appear in vertical writing.
Thanks for clarification.
That'll be great. Please let me know if I can help you ;-)
Yes, our plan is similar, except that we may include two "style"s for traditional Chinese to support two font design styles. This is because centre-aligned style (which is going to be named "uptchc*" series) is the most suitable for traditional Chinese but we have distributed the left-aligned style (existing "uptch*" series) for almost 10 years. As a result, there will be
The source files for "upsch*" and "uptch*" will be shared between the two, so all we have to do is design two JFM/JVF sets. |
Note of quotation marks of Chinese: But in practice, the Taiwan style quotation marks may also be used in vertical typesetting in Mainland. For example, this is 《十三經注疏・論語注疏》 (北京大學出版社, 2000): Even 「…『…』…」 is used in horizontal simplified Chinese on the Internet and in a few published books in Mainland (sorry, I can't remember the name of the book).
Note of middle dot of Chinese: Five characters may be used as a middle dot: U+00B7, U+2022, U+2027, U+30FB and U+FF0E, and there is no official standard. U+00B7 is mainly used in Mainland; U+FF0E was mainly used in Taiwan (note that U+FF0E is also centre-aligned in Taiwan, so it looks like a middle dot). But I don't know which character most people use in Hong Kong and Macau, or in Taiwan nowadays.
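To compare how a given font renders the five candidates listed above, a minimal upLaTeX sketch like the following may help. It is not from the thread: it assumes the default `ujarticle` font setup, and whether each code point has a glyph at all (and at what width) depends on the font mapped at the driver stage, so some lines may come out empty.

```latex
% Minimal sketch (assumption: compiled with uplatex + dvipdfmx, default fonts).
\documentclass{ujarticle}
\begin{document}
\parindent0pt
% \kchar forces each code point to be treated as a CJK token,
% so the glyph and its spacing come from the CJK font and its JFM.
字\kchar"00B7 字 (U+00B7)\par
字\kchar"2022 字 (U+2022)\par
字\kchar"2027 字 (U+2027)\par
字\kchar"30FB 字 (U+30FB)\par
字\kchar"FF0E 字 (U+FF0E)\par
\end{document}
```

The space after each hexadecimal number only terminates the number and is not typeset, so any visible gap around the dot comes from the font metrics.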
This is only the clreq team's opinion. |
Thank you very much! |
With regard to middle dot, current upjisr-h.pl (meant for left-aligned punctuations) has only U+30FB in TYPE 3
and uptchcr-h.pl (meant for centre-aligned punctuations) has U+30FB and U+FF0E in TYPE 3
Then, I'll add U+00B7 to TYPE 3. I'm not sure about U+2022 and U+2027; these are marked as "bullet" and "hyphenation point", so truncating widths and applying JFM glue around these characters like TYPE 3 might be odd ... |
Done in texjporg/uptex-fonts@0f919e2. (The .tfm and .vf files have not been regenerated yet.) |
Yes, my replies are only notes, so I titled them "Note of...". Actually, there are some even odder usages: two consecutive U+2500 (U+2502) used as a horizontal (vertical) long dash (破折號), and two consecutive U+22EF (U+22EE) used as a horizontal (vertical) ellipsis... Certainly, this is also a note, not a suggestion. |
Very helpful! Then I will not add them as a standard.
Based on this comment and this comment, I moved 〈:〉 and 〈;〉 to TYPE 2, and added 〈!〉 and 〈?〉 to TYPE 4, both of which are restricted to horizontal writing of simplified Chinese (texjporg/uptex-fonts@daa2e3c). I regenerated JFM/JVF based on the above commits; if you are interested, please test them. I appreciate any comments. If you want to build these .vf files yourself, makejvf |
It seems that this line should be added to ukinsoku.tex (plain TeX) and ukinsoku.dtx (LaTeX):
Typos:
The last paragraph of README_ASCII_Corp.txt: dvip -> dvips
README_uptex_font.txt: ambiguos -> ambiguous, lator -> later |
That change has already been planned, but I'm wondering about possible side effects. ukinsoku.tex also has \xspcode"b7=3, which is meant for 8-bit encodings (like T1 encoding), so first we have to examine what happens when these kinsoku parameters become effective.
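One way to examine the parameters currently in effect for code point 0xB7 is to write them to the log. The following is a minimal sketch, not taken from the thread; it assumes that `\the` can read these (u)pTeX parameters back in the same way it reads `\catcode`, and it makes no claim about the values a particular ukinsoku.tex version will report.

```latex
% Minimal sketch (assumption: compiled with uplatex; output goes to the log).
\documentclass{ujarticle}
\begin{document}
% Each \the is followed by the character code whose setting is queried.
\typeout{xspcode for "B7 = \the\xspcode"B7}
\typeout{prebreakpenalty for "B7 = \the\prebreakpenalty"B7}
\typeout{postbreakpenalty for "B7 = \the\postbreakpenalty"B7}
\end{document}
```

Running it before and after adding a new kinsoku line shows whether that line actually changed the setting.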
Thanks. I'll fix them soon in uptex-base repo. |
I think I found a side effect. Consider the following example, where the code point 0xB7 appears twice, from two different encodings (T1 and Unicode):
\documentclass{ujarticle}
\usepackage{lmodern}
\pagestyle{empty}
\AtBeginDvi{\special{pdf:mapline uprml-h UniJIS-UTF16-H :0:simsun.ttc}}
% \prebreakpenalty is applied to both CJK and non-CJK tokens
\prebreakpenalty"B7=10000\relax
\parindent0pt
\textwidth7zw
\begin{document}
% By default, U+00B7 is treated as non-CJK token
字字字字字字字\char"B7 abc\par % unexpected
% By using \kchar, U+00B7 is treated as CJK token
字字字字字字字\kchar"B7 abc\par
\end{document}
Note about upTeX spec: When the \char primitive is used, the character is treated as a CJK token or a non-CJK token, depending on the character code. When the \kchar primitive is used, the character is always treated as a CJK token. In this example, both line breaks before 0xB7 ("ů" and "·") are disabled, but the first one (before "ů") seems to be unexpected (though the character "ů" rarely comes at the beginning of any word). |
This is really a problem... If we use Incidentally, U+00B7 cannot be typeset correctly when using some fonts:
\special{pdf:mapline upstsl-h unicode SourceHanSerifSC-Regular.otf}
\font\schrmh=upschrm-h
\def\test{中文·中文·English·English·中文}
% IPAexMincho
\test
% SourceHanSerifSC-Regular
\schrmh\test
\bye |
Unfortunately, if we add
That is not something we can handle; (u)pTeX assumes fixed-width fonts, but the real fonts (IPAexMincho and SourceHanSerifSC-Regular) have U+00B7 in proportional width. If we are going to use such a proportional-width font, we have to prepare a specific JFM for it. |
I forgot to note that some other Latin encodings might have a character other than ů in 0xB7 ... |
Sorry, my expression was not clear. What I meant is that the problem is to choose the lesser of two evils: Thank you for your explanation! |
That's true... but the problem is not only about 0xB7. Some people use Latin double quotes
\documentclass{ujarticle}
\textwidth10zw
% penalties for OT1 double quotes
\postbreakpenalty92=10000
\prebreakpenalty34=10000
\begin{document}
\parindent0zw
字字字字字字字字字``字字字字''字字字。
\end{document}
However, we rejected the request, since these penalties are invalid for T1 and others. I think the current problem for middle dot (0xB7) is similar. |
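To see why penalties keyed to slots 92 and 34 are tied to one font encoding, the following minimal sketch (not from the thread) prints both slots under OT1 and under T1. In OT1 these slots hold the opening and closing double quotes; in T1 they hold the backslash and the straight double quote, so the same penalties would attach to unrelated characters. It assumes the standard Computer Modern and EC fonts are installed.

```latex
% Minimal sketch (assumption: compiled with uplatex, default Latin fonts).
\documentclass{ujarticle}
\usepackage[T1]{fontenc} % makes T1 available; OT1 stays declared by the kernel
\begin{document}
\parindent0pt
% The same slot numbers select different glyphs in each encoding.
{\fontencoding{OT1}\selectfont OT1: \char92\ \char34}\par
{\fontencoding{T1}\selectfont T1: \char92\ \char34}
\end{document}
```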
I decided to add Here is the reason; we already have the following lines in ukinsoku.tex:
\postbreakpenalty`«=10000
\prebreakpenalty`»=10000
Both characters are defined in a recent Japanese standard (JIS X 0213), and they are assigned to U+00AB and U+00BB (Latin-1 block) in Unicode; the situation for U+00B7 is quite similar to that of U+00AB and U+00BB. I guess the only reason why we don't have U+00B7 in ukinsoku.tex is that U+00B7 is rarely used in Japan compared to U+30FB. When Chinese and Korean are taken into consideration, it seems natural to add a penalty to U+00B7 for consistency. |
Good news! |
A personal question: What is the reason for the following lines? I think
\postbreakpenalty`\%=500
\postbreakpenalty`\&=500
\postbreakpenalty`%=200
\postbreakpenalty`&=200 |
I don't know either ;-) This code was originally written by ASCII Corporation in 1995 or earlier (see kinsoku.dtx in texjporg/platex) and has remained unchanged. I think we should fix it ... |
The following lines are copied from ukinsoku, and I guess there is something wrong:
%%
%% inhibitxspcode JIS X 0213
%%
\inhibitxspcode`¡=2
\inhibitxspcode`¿=2
%%
%% inhibitxspcode JIS X 0212
%%
%%\inhibitxspcode`¡=1
%%\inhibitxspcode`¿=1 |
\inhibitxspcode for JIS X 0212 (commented out) seems wrong. Fixed in uptex-base and uplatex. Please wait for \postbreakpenalty -> \prebreakpenalty fixes for |
Done in texjporg/ptex-base#5 (see commit references). |
Now I hope upschr-h.pl and upschr-v.pl are ready for Chinese; if anyone is interested, please incorporate them as upzhm-h.pl and upzhm-v.pl. When the new zhmetrics-uptex is released, I recommend building new .vf files using
|
I noticed that #369 is already merged; thanks! |
Hello, this is the Japanese TeX Development Community.
We are considering improving the standard upTeX metrics for simplified Chinese, bundled in https://github.com/texjporg/uptex-fonts . We found that CTeX-org has a derivative called zhmetrics-uptex, so I decided to ask you what would be most desirable.
When revisiting the currently available metrics (upschrm-{h,v}, upschgt-{h,v}), the following questions were raised:
All these changes have been impossible due to the old implementation of makejvf (Japanese virtual font generator for pTeX/upTeX), whose hard-coded routine is optimized for Japanese fonts. Yesterday I added “enhanced mode” to makejvf (texlive svn r44817), which can output more suitable VF for properly-classified JFM types.