Kanbun marks (U+3190...U+319F) not differentiated from normal CJK characters #159
Comments
Um, no. Hundreds and possibly thousands of existing Japanese fonts that are based on Adobe-Japan1-5 or higher implement these characters using glyphs that may appear to be full-size, but are expected to be reduced in size when used for their intended purpose. This is no different than ruby whose glyphs are provided at full-size, but are expected to be shrunk to an appropriate size when used for their intended purpose. If you've read JIS X 4051, which is referenced in Section 18.1 of the Core Specification, you'd understand this. |
Ruby is a markup system, a style. But these characters are encoded in Unicode as plain-text where they are described as being superscript. Their reference chart is also showing them only as superscript. |
Still no. Unicode explicitly refers to JIS X 4051, which states that the glyphs for Kanbun are to be scaled to one-half size when used. You're conflating Ruby as a markup system and the additional need to scale its glyphs to one-half size, and it is the latter requirement that I am comparing to Kanbun. Whether the glyphs for Kanbun are exactly the same as their corresponding ideographs depends entirely on the font. |
I would also like to add that U+FE45 ﹅ SESAME DOT and U+FE46 ﹆ WHITE SESAME DOT in the CJK Compatibility Forms block require similar treatment according to JIS X 4051, meaning that they are reduced to half size. These are among a small number of 圏点 (kenten) characters. EDITED TO STATE TO IGNORE: I am planning to submit feedback for UTC #160 that the "<super>" portion of their annotation be removed, along with providing an alternate font that more accurately reflects how these characters are implemented in virtually all fonts. |
Sorry, but only YOU are making this conflation. I've never spoken about Ruby. The glyphs are to be scaled to 1/2 when used, INCLUDING in plain text, for which they were encoded in Unicode. These Kanbun are not much different from other encoded superscripts like "²", which is pre-reduced and where the superscript is also part of the identity, just as the superscript is part of the Kanbun identity. Unicode would never have encoded these Kanbun characters if they were simple duplicates of the existing full-size characters. And the Unicode charts correctly show them PRE-REDUCED, so the "<super>" portion in the UCD is accurate (and it is normative! You cannot remove it, because that would violate the stability of normalization!) |
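For readers who want to check the "<super>" tag under discussion, here is a minimal sketch using Python's standard `unicodedata` module. It lists the decomposition field recorded in the UCD for each code point in the Kanbun block. The helper name `kanbun_decompositions` is made up for this illustration, and the output depends on the Unicode version bundled with the Python build.

```python
import unicodedata

def kanbun_decompositions():
    """Map each code point in U+3190..U+319F to its UCD decomposition field."""
    return {cp: unicodedata.decomposition(chr(cp)) for cp in range(0x3190, 0x31A0)}

for cp, decomp in kanbun_decompositions().items():
    # An empty decomposition field means the character has no decomposition mapping.
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {decomp or '(none)'}")

# For comparison, the pre-encoded superscript digit mentioned in the comment above.
print("U+00B2:", unicodedata.decomposition("\u00B2"))
```

U+3192 through U+319F carry a `<super>` compatibility decomposition to the corresponding ideograph, in the same way that U+00B2 does to the digit 2, while U+3190 and U+3191 have no decomposition at all.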
The characters in question date back to the very beginning of Unicode, so they were not really added, but were there from the beginning. There are plenty of characters in Unicode that were there from the beginning and for which not everything was known about how they should be implemented. Consider U+3031 〱 VERTICAL KANA REPEAT MARK and U+3032 〲 VERTICAL KANA REPEAT WITH VOICED SOUND MARK whose "implemented as glyphs that are two-em tall" annotation was only recently added to the code charts (at my suggestion). These characters were in Unicode from the very beginning. Anyway, good luck in convincing all Japanese type foundries to change the glyphs for these characters, which affects several hundred Japanese fonts. |
Good luck then with your intent to "remove" the "<super>" portion of the UCD; it will be consistently rejected, as this property is frozen. And the charts published since the beginning by Unicode and ISO are also accurate, but have not been correctly applied when creating Japanese fonts with Unicode mappings. |
After looking at the UCD, I agree that changing the property is a non-starter. I should have checked before making such a suggestion. BTW, these characters are not included in any legacy JIS standard, except for JIS X 0221 that is a clone of ISO/IEC 10646. Also, while the glyphs may look the same as their corresponding ideographs, their GIDs are unique. Some fonts treat them as generic, meaning that they do not vary in weight like their corresponding ideographs. Some fonts give them weight-specific treatment. Kozuka Mincho and Hiragino are examples of the former treatment, and Kozuka Gothic, Source Han, and Noto CJK are examples of the latter treatment. I feel that the former treatment is correct. I will be at UTC #160 later this month, and will raise this as an issue. Fixing hundreds of Japanese fonts is a non-starter. |
From a renderer’s perspective, pre-sizing the glyphs differently so they occupy some fraction of the full embox would make it more difficult to size them exactly as the typographer specs it, I would think. Superscript and subscript sizes are customizable and so choosing one (in the font) to fit all cases of kanbun notation could be problematic. The renderer is responsible for size and placement relative to the font embox, not relative to the body text embox (as it would be if ruby and kenten and kanbun and other odd-sized glyphs were pre-shrunk by the font designer). |
This means that there are only proprietary implementations by some font foundries (but outside Japan, the usage of these font positions is really rare, as the Japanese language is almost unknown elsewhere, because of the difficulty of the language and of its complex script whose phonographic part is ignored, and whose glyphs are also modified in many places, making the Kanji very difficult to use and learn, even for Japanese people themselves, much more than simplified or traditional Hanzi characters for Chinese speakers; the Han characters were so difficult to use for Vietnamese, and so poorly adapted to their language, that they have abandoned the old "Chữ Nôm" style). But the real standard is what is set in Unicode and ISO/IEC 10646 (and their charts give them a strong identity). Some proprietary font vendors simply made errors when they started to adapt their fonts to the Unicode mapping: maybe this was good for their CID-keyed mapping, but the conversion of old CID keys (or private-use proprietary extensions in JIS) to Unicode, when these characters were finally encoded in the UCS, was really incorrect (and this was not so many years ago: fonts with Unicode mappings for these Kanbun are still very young). This is no different from other pre-encoded superscripts like "²": fonts have to provide a superscript glyph for them (they can adopt the metrics they want, not necessarily "half-width" or fitting exactly in the quarter ideographic cell), but it must respect the identity. No additional styling is needed: in plain text these Kanbun should still be superscripts. For rich-text formats, you would not even use these characters; you would use superscript or ruby styling over the normal characters.
|
Incorrect! The font designer still defines the metrics to adopt for the Kanbun. It's not up to the renderer to define them, except to synthesize the Kanbun if the font does not provide glyphs for them. And this will never happen when styling rich text with superscripts or ruby: these texts will use the standard base characters, and then styles will apply to them (superscript/subscript variants of these base glyphs may be looked up in "OpenType features" of the same font, but if the feature is unimplemented or does not map these variants, the renderer will synthesize these styles using the glyphs mapped for the base characters in the "default feature" mapping, and will never need to use the Kanbun in the base mapping of the same font). So there's no metric issue at all. |
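The lookup-then-synthesize fallback described in the comment above can be sketched roughly as follows. This is only an illustration of that description, not how any particular renderer works: `ToyFont`, `superscript_glyph`, the glyph names, and the 0.6 scale factor are all hypothetical, and the only real identifier is the OpenType feature tag `sups`.

```python
from dataclasses import dataclass, field

@dataclass
class ToyFont:
    """Hypothetical stand-in for a font: a default cmap plus per-feature substitutions."""
    cmap: dict                                    # character -> default glyph name
    features: dict = field(default_factory=dict)  # feature tag -> {glyph name: variant glyph name}

def superscript_glyph(font, ch, fallback_scale=0.6):
    """Return (glyph name, scale) for `ch` styled as superscript.

    Prefer a designer-drawn variant from the 'sups' feature; otherwise
    synthesize by scaling the default glyph. The 0.6 factor is an arbitrary
    illustrative value, not taken from any specification.
    """
    base = font.cmap[ch]
    variant = font.features.get("sups", {}).get(base)
    if variant is not None:
        return variant, 1.0   # the font supplies its own superscript form
    return base, fallback_scale  # the renderer shrinks the full-size glyph

font = ToyFont(
    cmap={"一": "uni4E00", "二": "uni4E8C"},
    features={"sups": {"uni4E00": "uni4E00.sups"}},
)
print(superscript_glyph(font, "一"))
print(superscript_glyph(font, "二"))
```

In this toy model the Kanbun code points are never consulted when styling base characters, which is the point the comment makes; whether real fonts and renderers should behave this way is exactly what the thread is debating.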
The point that @macnmm is making is that Kanbun layout is complex, which explains why Section 5 of JIS X 4051:2004 (reconfirmed on 2018-10-22) spans 10 pages (pp 35 through 44), and describes special layout requirements that go above and beyond plain text. Its use is also relatively rare, which probably explains why JLREQ didn't take it on. In any case, I submitted public feedback via Unicode's Contact Form, which included a link to this discussion, so that it can be discussed during UTC #160 later this month. |
I don't have an opinion on how to define them in TUS/UCD, but I agree with @macnmm on the renderer perspective. Pre-sizing in fonts will make things more difficult for renderers, at least until we have a good Kanbun spec for OpenType. |
@JPRidgeway Note the link directly above to the Source Han Sans issue that you needlessly opened, which means that there is no need to add verbatim what you wrote there. |
@marekjez86 or @davelab6: You can safely close this issue. The only action, which I have recorded in Source Han Sans Issue #205 and Source Han Serif Issue #36, is to make the glyphs for U+3191 ㆑ through U+319F ㆟, uni3191 through uni319F, generic in terms of weight. The feedback that I submitted on 2019-07-15, which was included in L2/19-272, was discussed at length during last week's UTC meeting. The conclusion was that the superscript property does not require the glyphs to be preshrunk or positioned in a particular way. In retrospect, a property unique to these characters, perhaps called kanbun, would have been better. The property assignment was made before the implementation details were fully understood. Also, as a member of the Unicode Editorial Committee, I was given an Action Item to clarify this in the section of the Core Specification that describes the Kanbun block. This will be reflected in Unicode Version 13.0. |
@kenlunde We're still waiting for an opinion other than yours. And there's NOTHING in the current Unicode 13.0 alpha, so this has most probably not been discussed or decided. Closing this issue therefore seems VERY premature, as the "conclusion" you state is invisible for now. But maybe this issue can be tracked in the two alternate bugs now open, "adobe-fonts/source-han-serif#36" and "adobe-fonts/source-han-sans#205". |
Sorry, but the issue was discussed at UTC #160, and the result was that I was assigned Action Item 160-A54 to clarify this in the "Kanbun" section of the Core Specification for Unicode 13.0, which will be published next March. The opinion other than mine that disagrees with you is the UTC's. The attendee list at the end of the meeting minutes shows who was in attendance, which included several "property" experts, such as Roozbeh Pournader, Mark Davis, and Ken Whistler. There are also opinions expressed earlier in this issue that disagree with you. What we have yet to see is an opinion other than yours that agrees with you. |
And "Action Item 160-A54", under which you committed to work on a proposal, is still waiting for a text that the UTC will need to approve (with other experts expressing their opinions before deciding on it) before including it in one of the "TBD" sections of the later Unicode 13.0 alpha (or some later version). For now this is still an open issue in Unicode, but thanks for reporting it to them and including it in their working schedule. |
As soon as I have drafted the additional text for the "Kanbun" section, and had it reviewed by the Unicode Editorial Committee, I will share it here. Property experts are also on that committee. |
For those interested in this issue, the following is the text for the Kanbun section of the Core Specification for Unicode Version 13.0 that the Unicode Editorial Committee finalized today:
|
This text still does not indicate how the characters will render in plain text (without any external layout). As the "Kanbun" behavior depends entirely on the presentation by the external styling engine and not at all on the glyphs themselves, their default positioning and sizing (in plain text) should still differ from the normal CJK ideographs. Otherwise, these characters are in fact only complete duplicates, TOTALLY equivalent to the default CJK characters, and the compatibility decomposition (which is normative) makes no sense at all. So yes, I maintain that fonts should map the Kanbun characters in superscript size and positioning, and it will still be up to the specific rendering engine to resize/reposition them in a Kanbun layout where available (with two variants, one for the vertical presentation, another for the horizontal presentation). If needed, fonts may supply OpenType features to include metrics/sizing/positioning substitution for Kanbun, but these features won't be enabled for plain-text rendering, which will continue to use the superscript default style. I don't like this new text at all, but even if it is adopted, it does not prohibit fonts from implementing OpenType features for resizing/repositioning these superscript glyphs for vertical (or horizontal?) Kanbun interlinear layout (which, IMHO, is just equivalent to ruby text). And I have not yet seen any working implementation of Kanbun layout, except by using the standard CJK characters (not these Kanbun characters) along with standard ruby styling (e.g. in HTML/CSS). |
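The "duplicate" concern in the comment above turns on which normalization form is applied. As a small sketch with Python's standard `unicodedata` module (the sample string is made up for illustration), canonical normalization keeps a Kanbun mark distinct from its ideograph, while compatibility normalization folds the two together:

```python
import unicodedata

# Full-size ideograph 一 followed by U+3192 IDEOGRAPHIC ANNOTATION ONE MARK.
text = "\u4E00\u3192"

nfc = unicodedata.normalize("NFC", text)    # canonical: the mark is left untouched
nfkc = unicodedata.normalize("NFKC", text)  # compatibility: the <super> mapping is applied

print([f"U+{ord(c):04X}" for c in nfc])
print([f"U+{ord(c):04X}" for c in nfkc])
```

So in plain text the two characters stay distinct unless a process deliberately applies NFKC or NFKD; how the distinct character should look by default is the part the thread disagrees on.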
What all this means is that the "compatibility decomposition" was not based at all on compatibility. And the choice of the And when rendering texts containing these "Kanbun", I don't see at all how layout engines can do anything without adding explicit styling in the document (i.e. using I don't see at all what is specific to the "Kanbun" layout. For me it's just a precomposition of standard ruby text, and if so, the correct rendering in plain text should also be interlinear, as ruby text: if there's no way to fit the ruby in the current line height, e.g. in monospaced texts, then the proper way to render it inline, because it cannot be placed elsewhere, is still as superscript, which is then the appropriate style to use for the default mapping in fonts in the absence of any other styling/layout engine. |
@verdy-p At this point, you need to take this up with the UTC, not by posting in this now-closed issue. The UTC already discussed this topic at length during UTC 160, so you're not likely to convince many people. Still, if you care to do so, please submit your feedback so that it can be discussed at UTC 161 in mid-October. Or, if you prefer to wait until the text shown above is published in the Unicode Version 13.0 Core Specification to submit feedback, that works, too. |
I cannot report to Unicode something they still have not published at all, and that has not even entered any beta review. What they discussed came from this bug that I submitted here, and I think that this discussion is still part of their ongoing process (I just hope they still have a link to this bug, which may continue to be useful for the future beta to come later). |
The "Kanbun" sinographic marks (U+3190...U+319F) are rendered the same as plain CJK characters.
They should be rendered as superscript (centered in the top half for the first one, "tateten", or occupying the top-left quarter of the ideographic box for the 15 other "kaeriten"), instead of remapping the standard CJK glyphs (which should then be rescaled to 50% to 60% of width or height, and repositioned, with slightly simplified strokes to improve readability).
But it seems that you have just mapped the same glyph to both the full-square ideographs and the Kanbun marks.
Note: these are NOT "compatibility" characters but really they are annotation marks. Rendering them as standard characters just creates ambiguity/confusion.
This affects ALL existing CJK font families (Noto Sans CJK, Noto Serif CJK), in the four supported language groups (JP, KR, TC, SC), and all styles and weights.
For reference, look at this chart:
https://www.unicode.org/charts/PDF/U3190.pdf