Adding to kanbun discussion #242

JPRidgeway · 2019-07-19T07:28:22Z

To the helpful (though unfortunately heated) discussion of kanbun that started recently, I hope to add a bit of research, as my own need to typeset kanbun (in LaTeX) led me to similar considerations.

How should kanbun look in text? There are two possible strategies a typesetter may choose.

a) Forsake the em-box sanctity and make kanbun marks proportional; this can be exemplified by a scan from Morohashi's Dai Kanwa Jiten below:

This also shows that Morohashi's typesetter chose to set a kanbun mark and a punctuation mark side-by-side.

b) Demand strict proportionality but insert extended kerning between any two kanji and use the resulting empty space to typeset the marks, as done here in Komai and Rohlich's textbook:

Note another salient feature not, I believe, imitated by any current font: when ㆒ and ㆑ are in contact, they combine into a joint glyph (same for ni- and san-marks).

The opposite question follows. How can one even typeset kanbun? I can imagine two possible ways (and there may be more).

1. Actually insert kanbun gloss-marks (U+3190..U+319F) within the main text. This requires additional work from the font maker since, whichever appearance, a) or b), is chosen, the kerning complexities multiply. Also, as the marks are smaller than the rest of the text, the same argument that led OpenType to register the 'ruby' feature applies: a mere 50% downscale cannot guarantee optimal-looking glyphs. In any case, if one wishes to use this method, having glyphs merely mapped from the original ones, without scaling and/or moving, is unacceptable.

The fonts that employ this method include Hanazono Mincho A, which is even aware that U+3190 ㆐ is the only centered glyph among these. It places the glyphs inside the em-box instead of making them proportional, but perhaps produces the best overall results. It is closely followed by Microsoft’s gothics, TC JhengHei and Simplified YaHei (somehow good at rendering Japanese), as well as Seto Nozomi’s cute SetoFont, the Taiwanese kaishu TW-Kai and mincho TW-Sung, and the bitmap Unifont, all of which make the mistake of treating ㆑ as if it hung in the middle of the text. The opposite mistake, flushing everything to the upper left, is made by Linux’s YOzFont.

Code2000 does a very peculiar thing: it treats them all as proportional but sets every one in the middle (and is missing many kanji). Were they moved to the left, it could have been a close-to-perfect match.

Two more fonts set every single mark centered, which is a more serious problem: BabelStone Han and Hong Kong’s Free HK Kai 4700, which is also missing kanji.

2. Typeset kanbun as ruby. As kanbun is annotation, after all, this might be the better semantic option. Yet it probably requires much more complexity: it basically tells end users to solve all the problems themselves. Indeed, if one wishes to typeset kanbun as furigana, one may have to: overcome an app's limitation of one furigana sequence per kanji (what if the kanji itself requires proper furigana as well - see Komai and Rohlich!); work out how to flush the marks to the left side of the sequence; somehow obtain the correct positions, including ㆐ as a connector (how? In some apps there is no way to tell ruby to appear in the middle and push the next kanji away); and deal with the very fact that the marks follow the kanji (which means padding with spaces and accepting sub-par rendering). Furthermore, the problem of method 1 arises here as well: scaled-down kanji just don't look satisfactory in the main text.

The fonts choosing this include the Kozuka family by Adobe, Microsoft’s Meiryo, the Source Han project and another family from Microsoft, Yu. Is this a kind of corporate standard for Microsoft/Adobe Japanese fonts?

(Note: two more fonts were found that simply skip U+3190 and U+3191 while providing the rest. Curiously, while Taiwan’s official MOESongUN just maps them to the full-size ones, the open-source WenQuanYi flushes them all to the correct side! If they were able to obtain the correct behaviour, why was it impossible to draw two more glyphs?)

Anyway, for the reasons detailed above, the Source Han family is currently not well suited to kanbun typesetting in the ruby style either.

The mere fact that kanbun was encoded in Unicode (and, say, ruby wasn't) strongly implies that method 1 was intended: otherwise plain text would have no use for the U+3190..U+319F span at all, which rather defeats the purpose of encoding it.
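For reference, the span in question can be enumerated straight from plain text. The snippet below is only an illustrative sketch using Python's standard `unicodedata` module; the helper name `kanbun_marks` is made up for this example and is not part of any font tooling discussed here.

```python
import unicodedata

# The Kanbun block: U+3190..U+319F inclusive (16 code points).
KANBUN_BLOCK = range(0x3190, 0x31A0)


def kanbun_marks():
    """Return (code point, character, Unicode name) for every mark in the block."""
    return [(cp, chr(cp), unicodedata.name(chr(cp))) for cp in KANBUN_BLOCK]


# Print one line per mark, e.g. the code point, the glyph and its formal name.
for cp, ch, name in kanbun_marks():
    print(f"U+{cp:04X} {ch} {name}")
```

Whether the printed marks look acceptable next to ordinary kanji depends entirely on the font, which is the point of the comparison above.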

To conclude, my opinion is that there are two ways to typeset kanbun. The preferred one is probably setting the marks in the main text, in which case the glyphs should be proportional (half-width?), scaled down by 50% and flushed to the center top (U+3190) or left top (all others). The alternative is treating them as ruby, which demands complexity from the typesetting software, defies the idea of interoperability and would probably still require forms adapted for forced scaling. Neither can be done with ease in the current Source Han fonts, which makes one wonder whether the glyphs should simply be taken out to save some space for other required extensions.

(Another note: there is also the question of what to do with the span 16283-98 in Adobe-Japan1 4th Supplement. The fact that they are given in the standard in full-size forms forces some kind of backward compatibility, but apparently the way they are encoded forces font makers into creating fonts that are unusable for, well, typesetting kanbun.)

My preferred solution would be to follow option 1 and imitate the HanaMinA solution of glyphs adapted and flushed to the side (it doesn't matter that much whether proportional or full-width).

Of course, other opinions may exist, and it could be an enlightening discussion to watch how it works out.

kenlunde · 2019-07-19T12:11:41Z

Why did you open this issue when you are fully aware that Noto CJK Issue 159 is still open?

The feedback that I submitted for discussion during next week's UTC meeting, which includes a link to the Noto CJK issue, can be found in L2/19-272, and is dated 2019-07-15.

The mere fact that the JIS X 4051 standard uses a full ten pages to explain Kanbun typesetting can be translated into stating that any representation in "plain text" is a pipe dream, and requires layout gymnastics, to include scaling. The characters were encoded so long ago that no one likely knew all of the ramifications.

Should U+3192 through U+319F require separate glyphs from their corresponding CJK Unified Ideographs? In my opinion yes, because it seems that the proper approach is to treat them generically, specifically not to vary by weight. All of the fonts of which I am aware do use separate glyphs, but the actual glyphs are sometimes the same as the corresponding CJK Unified Ideographs. This, contrary to everything else in this discussion, is an actual issue that will be addressed.
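As an aside on the "corresponding CJK Unified Ideographs": the correspondence is recorded in the Unicode Character Database as a `<super>` compatibility decomposition, which is also why U+3190 and U+3191 are left out of the question above. The following is a rough sketch using Python's standard `unicodedata` module; the function name `base_ideograph` is hypothetical and chosen only for illustration.

```python
import unicodedata


def base_ideograph(mark):
    """Return the CJK Unified Ideograph a kanbun mark decomposes to, or None.

    Relies on the <super> compatibility decomposition in the Unicode data;
    marks without such a decomposition yield None.
    """
    decomp = unicodedata.decomposition(mark)
    if not decomp.startswith("<super>"):
        return None
    # The decomposition looks like "<super> 4E00": tag, then a hex code point.
    return chr(int(decomp.split()[1], 16))


for cp in range(0x3190, 0x31A0):
    mark = chr(cp)
    print(f"U+{cp:04X} {mark} -> {base_ideograph(mark)}")
```

A font that simply reuses the full-size ideograph glyph for a mark is, in effect, rendering this decomposition rather than a dedicated annotation form.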

kenlunde closed this as completed Jul 19, 2019
