[css-text][text-spacing] Extra spacing between ideographs and non-fullwidth punctuation/symbols #9479
Comments
Hmm, interesting. Many of the use cases you showed above look like things that belong in a
and doing something like this:
code {
  text-autospace: no-autospace;
  padding: 0 0.125em;
}
But not all fit in that pattern.
This suggests that:
For the rest:
Maybe? That could be a solution.
Is this a case of symbols that must always be autospaced (when autospacing is on)? If so, we should probably just do it.
Does it depend on something which the author is aware of, but that the user agent cannot easily infer? If so, a new value
Does it depend on whether they're next to a string of non-ideographic letters/numbers? If so, it might suggest we need to treat the as some kind of ambiguous/neutral group, that gets grouped together with a string of non-ideographic letters/numbers if any is there, but doesn't introduce spacing by itself if found without non-ideographic letters/numbers.
For example, If
Also, regardless of how we handle that category, as you mentioned that not all symbols would fit into that category, I am a little unsure about how we'd go about maintaining the list of those that do and those that don't. |
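The "ambiguous/neutral group" idea above can be sketched roughly as follows. This is only an illustration under loud assumptions, not the css-text algorithm: the class names `I`/`A`/`N`/`O`, the function names, and the code point ranges used for "ideograph" are all hypothetical simplifications, and "neutral" is approximated as any non-fullwidth character whose General Category is P* or S*.

```python
import unicodedata
from typing import List


def classify(ch: str) -> str:
    """Return a hypothetical class: I (ideograph), A (letter/numeral),
    N (neutral symbol/punctuation), or O (everything else)."""
    cp = ord(ch)
    # Assumption: only the main CJK Unified Ideographs block plus kana.
    if 0x4E00 <= cp <= 0x9FFF or 0x3040 <= cp <= 0x30FF:
        return "I"
    # Fullwidth/wide punctuation and symbols carry their own spacing.
    if unicodedata.east_asian_width(ch) in ("F", "W"):
        return "O"
    cat = unicodedata.category(ch)
    if cat[0] in ("L", "M") or cat == "Nd":
        return "A"
    if cat[0] in ("P", "S"):
        return "N"
    return "O"


def grouped_gaps(text: str) -> List[int]:
    """Indices i where extra spacing would go between text[i-1] and text[i].

    A run of A/N characters counts as one group only if it contains at
    least one A; a run made of neutrals alone introduces no spacing.
    """
    classes = [classify(c) for c in text]
    gaps: List[int] = []
    i, n = 0, len(text)
    while i < n:
        if classes[i] not in ("A", "N"):
            i += 1
            continue
        j = i
        while j < n and classes[j] in ("A", "N"):
            j += 1
        if "A" in classes[i:j]:
            if i > 0 and classes[i - 1] == "I":
                gaps.append(i)  # gap before the group
            if j < n and classes[j] == "I":
                gaps.append(j)  # gap after the group
        i = j
    return gaps
```

With this sketch, `grouped_gaps("永永永12%永永永")` gives a gap on both sides of "12%", while `grouped_gaps("永永*永")` gives none, because the asterisk is not attached to any letters or numerals. Which symbols belong in the neutral class is exactly the list-maintenance question raised above.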
Note this was raised to JLTF a while ago but it didn't get much attention there. I'll ping again. |
Why is it rendered within the innermost element that contains the boundary (i.e.
Maybe. I don't have a counterexample now.
Maybe, but I'm not quite sure about code points like U+2122 (Trade Mark Sign). I personally don't think the extra spacing is needed for it, but I would like to discuss it with the clreq group.
I agree that sometimes there is ambiguity, and I'll discuss it with the clreq group. |
No particular reason, authors could do what they prefer. I guess my choice here was influenced by the default GitHub style which includes some inline padding in |
Is it possible to somewhat involve UNICODE TEXT SEGMENTATION? Fortunately they are usually consecutive, so I guess SEGMENTATION, or the grouping logic mentioned by #9479 (comment) , should improve the situation. |
Could you provide an example of how to use |
@frivoal Yes! Considering that Chrome is in the process of implementing
It looks like Apple's OS takes a similar approach, for example:
@fantasai Do you know the exact rules for adding space in iOS? In the absence of a suitable algorithm, in the future it might be worth considering using a @kojiishi If the specification defines a rule for this, would you prioritize implementing it? |
Including symbols makes sense to me, but we probably don't want to include all gc= Seeing multiple pieces of feedback on the character classes coming up, I'm leaning towards moving this definition to Unicode, as I commented on PR#9503. Doing so should make discussing with Unicode experts easier, and maintaining the list should be easier too. Regarding the syntax, as several issues are coming up and there is some uncertainty, I think it's better to step back rather than add more. One idea is including them in both sets without adding a new value. Another idea is to defer detailed classifications of letters and numerals to future versions and start with |
I surely believe I am missing some important points, but what is the cause of this oddity?
With the text-autospace: normal property, I thought a small space would be generated between 'ideographs' and characters that are not. This two-state machine should prevent the imbalance that was mentioned. I apologize for the interruption, but I would greatly appreciate it if you could clarify where my misunderstanding lies. |
@kidayasuo The current default behaviour is For example, there's no extra spacing for the colon (:), parentheses, "hash sign" (#) and plus signs (+) and the ideograph next to them in the picture below: |
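To make the imbalance concrete, here is a minimal sketch of the behaviour described in this comment: spacing is added only at boundaries between an ideograph and a non-ideographic letter or numeral, so punctuation and symbols get nothing. The function names and the code point ranges are hypothetical simplifications, not what any browser actually implements.

```python
import unicodedata
from typing import List


def is_ideograph(ch: str) -> bool:
    # Assumption: only the main CJK Unified Ideographs block plus kana.
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3040 <= cp <= 0x30FF


def is_letter_or_numeral(ch: str) -> bool:
    # Non-ideographic, non-fullwidth letters, marks and decimal digits.
    if is_ideograph(ch):
        return False
    if unicodedata.east_asian_width(ch) in ("F", "W"):
        return False
    cat = unicodedata.category(ch)
    return cat[0] in ("L", "M") or cat == "Nd"


def autospace_gaps(text: str) -> List[int]:
    """Indices i where spacing goes between text[i-1] and text[i]."""
    gaps: List[int] = []
    for i in range(1, len(text)):
        a, b = text[i - 1], text[i]
        if (is_ideograph(a) and is_letter_or_numeral(b)) or (
            is_letter_or_numeral(a) and is_ideograph(b)
        ):
            gaps.append(i)
    return gaps


def mark(text: str, gaps: List[int], marker: str = "|") -> str:
    """Render the gaps visibly, for inspection only."""
    return "".join((marker if i in gaps else "") + ch for i, ch in enumerate(text))
```

For example, `mark("永#9479永", autospace_gaps("永#9479永"))` yields `永#9479|永`: the gap appears after the digits but not before the hash sign, which is the one-sided spacing reported in this issue.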
@kojiishi Apple's |
@xfq Thank you. Got it. Do you know why If they are truly useful and needed despite added complexities, I agree |
Right, thanks. Yes, I mean if Apple can disclose it. Sorry if my comment didn't read that way. |
I agree that adding extra spacing only between ideographs and non-ideographic letters, or only between ideographs and non-ideographic numerals is not useful. However, there are some characters that should not have extra spacing between ideographs and them, such as:
There are also some characters that I'm not sure about, such as Taixuanjing symbols (like U+1D300), mahjong tiles (like U+1F000), Xiangqi symbols (like U+1FA60), copyright/copyleft signs, and so on. |
I agree. So, it seems we need 'neutral'? Do we need right/left directionality? Such neutrals create unbalanced spacing if/when they are used as a part of a word or a phrase. So, we might want to limit them to some small number. If the amount of space is small like 1/8 of a fullwidth like Apple does, we might be able to say ok to create a space for some edge cases. |
Based on our discussion in yesterday's clreq teleconference, we think it would be useful to make this behaviour language-dependent because of the difference in conventions between Chinese and Japanese. For example, in Japanese, it's normal to have extra spacing before "12" but not after "%" in the phrase "永永永12%永永永". However, in Chinese there's extra spacing after "%". |
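The difference reported here can be sketched as a single switch on whether a symbol that trails a letter or numeral gets spacing before a following ideograph. The `CONVENTIONS` table, the function names, and the class letters are hypothetical; the "ja"/"zh" values only restate the claim in this comment, and whether this is really a matter of language rather than style is an open question in this thread.

```python
import unicodedata
from typing import List


def kind(ch: str) -> str:
    """Hypothetical classes: I ideograph, A letter/numeral, N symbol, O other."""
    cp = ord(ch)
    # Assumption: only the main CJK Unified Ideographs block plus kana.
    if 0x4E00 <= cp <= 0x9FFF or 0x3040 <= cp <= 0x30FF:
        return "I"
    if unicodedata.east_asian_width(ch) in ("F", "W"):
        return "O"
    cat = unicodedata.category(ch)
    if cat[0] in ("L", "M") or cat == "Nd":
        return "A"
    return "N" if cat[0] in ("P", "S") else "O"


# Assumption: one flag per convention, taken from the comment above.
CONVENTIONS = {"ja": False, "zh": True}


def gaps(text: str, space_after_trailing_symbol: bool) -> List[int]:
    """Indices i where spacing goes between text[i-1] and text[i]."""
    kinds = [kind(c) for c in text]
    out: List[int] = []
    for i in range(1, len(text)):
        a, b = kinds[i - 1], kinds[i]
        if {a, b} == {"I", "A"}:
            out.append(i)
        elif a == "N" and b == "I" and space_after_trailing_symbol:
            # Only a symbol directly after a letter/numeral, as in "12%".
            if i >= 2 and kinds[i - 2] == "A":
                out.append(i)
    return out
```

Calling `gaps("永永永12%永永永", CONVENTIONS["ja"])` returns `[3]` (spacing before "12" only), while `CONVENTIONS["zh"]` returns `[3, 6]` (spacing after "%" as well).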
@xfq Thanks for the info. I haven't checked with JLREQ folks, but I don't think this is language dependent. If the text is "永永永12%永永永" then I believe Japanese expects spacing after "%" too. The complexity of handling punctuation and symbols is that it depends on the context, but supporting longer context slows down the layout engine quite severely. Imagine "永永永12%永永永" and "永永永X%永永永" with the CSS It should be a bit simpler if CSS doesn't distinguish The discussion should move to Unicode once the proposal is accepted, and I hope we can find a good balance of desired results, complexity, and performance there. |
I got this information from w3c/jlreq#387 :
Although I'm not sure which behaviour is more common / expected.
Indeed.
If this is language-dependent, it may be difficult to solve the problem at the Unicode level only. Also, if the rule is defined in a Unicode character property, it's very difficult to change. IIRC it's on the agenda of UTC 178 this week, so let's see what the Unicode experts think about it. |
Thanks for the link; I missed it. I think it's more about style, not language. Probably a difference between traditional print style and online text style. |
According to Bin-sensei, the spacing is intended to prevent characters from being too close together, not to highlight words like parentheses do. Such 'unbalanced' situations are actually common practice in publications. Adding the following comment on behalf of Bin-sensei: |
I disagree with not applying auto-spacing between a Japanese character and a Western punctuation mark. I believe it's not a matter of visual "balance" but a matter of consistency. In fact, as far as I remember, Morisawa-Linotype's CORA5-E text composition language, designed for the Linotype CRT/laser typesetters used by Japanese professional typographers, allowed auto-spacing between a Japanese character and a Western punctuation mark. I don't mean you "must" always do it, but it is one of the widely accepted conventions in Japanese typography. |
Talking to Ned from Apple, he says their algorithm is quite involved, and allows for both compression and expansion of the default spacing, and some spacing will take the glyph ink into account as part of the logic. So, this leads me to believe that we need to approach this problem with a bit more rigor and nuance. For example:
All this is to say that the proposal from Koji may not be sufficient to solve the Latin-to-J or Latin-to-CJK spacing issue, and that that issue is merely a single case of the generic spacing rules issue defined in JLReq or JIS X 4051 and so it should try for a higher bar from the beginning. |
@macnmm as repeated in the document, the proposal is not an effort to make a definite rule. It is intended to serve as a fallback default when no other information is specified by the higher level protocol. By having a reliable base, customizations become much easier because your description can be only the diff from the default. It is a benefit of having a stable default. |
@macnmm I like the idea of using the variation selectors as a potential solution to the challenges posed by the unification of code points for characters that are used differently in Western texts and in Japanese, despite their inherent distinctions. My understanding, however, is that since they are all proposed to be class "O" regardless of whether they are fullwidth or proportional, it is an orthogonal issue. Maybe I am missing something... |
It may be the proposal strayed from where I hoped it would land, but my hope is if you can define a VS with the missing SJIS width, spacing class, and vertical writing posture, you solve the Unicode unification issues for Japanese character behavior in line layout. So, I would say we push for this. |
In the process of trying Chromium's implementation of text-autospace, some interested Chinese developers found an issue: there is no extra spacing between ideographs and non-fullwidth punctuation/symbols. In many cases, this results in unbalanced spacing around embedded non-CJK text in CJK languages.
Examples:
However, not all non-fullwidth punctuation/symbols require extra spacing. For example, footnote marks like *, †, ‡, and ◊ should not have the extra spacing.
Should we add a new value ideograph-symbol (the name and specific design can be discussed later) to cover this situation? This value may not cover all situations, but it can cover some common ones. For uncommon cases, it would be nice to have a mechanism for author customization.