Update the "backwards deletion" Q+A article #520
- Clean up javascript
- Fix examples [NOT YET READY FOR REVIEW]
- Thanks to @r12a for the page: https://r12a.github.io/scripts/thai/th.html#webSegmentation
- I used the word 'toilet' from this example because it has a sara am (U+0E33)
- I added mention of the sara am
I updated the article to match the latest template. We also need to:
<div class="try">
<h4>Try it in your browser</h4>
<p>Try selecting, cursoring, deleting, and backspacing with this word in Hindi (in the Devanagari script). The word means "Unicode" and contains <em>four</em> graphemes and <em>seven</em> Unicode code points.</p>
<p><input id="tryHindi" type="text" name="tryHindi" lang="hi" class="try" value="यूनिकोड"></input>
Note that `input` elements don't need a close tag. (Same for other instances.)
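The snippet above says यूनिकोड has four graphemes and seven code points. A quick way to check counts like this is `Intl.Segmenter`, which implements the extended grapheme clusters of UAX #29. This is a sketch assuming a runtime that supports `Intl.Segmenter` (current browsers, Node.js 16 and later); `describeText` is a hypothetical helper name, and extended grapheme clusters are not necessarily the unit a given browser uses for every editing operation.

```javascript
// Hypothetical helper: measure one string three different ways.
// Assumes Intl.Segmenter is available (current browsers, Node.js 16+).
function describeText(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return {
    codeUnits: text.length,              // UTF-16 code units
    codePoints: Array.from(text).length, // Unicode code points
    // Extended grapheme clusters (UAX #29), as strings
    graphemes: Array.from(segmenter.segment(text), (s) => s.segment),
  };
}

const hindi = describeText("यूनिकोड", "hi");
console.log(hindi.codePoints);            // 7
console.log(hindi.graphemes.length);      // 4
console.log(hindi.graphemes.join(" | ")); // यू | नि | को | ड
```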
Co-authored-by: Fuqiao Xue <[email protected]>
- add a note about vertical text
- fix tamil tag
- reword about thai's relationship to indic
I took it to the logical conclusion and just made a small section about vertical text at the end. It makes the document more attractive.
<p>In vertical text, the left arrow moves left one row and the right arrow right one row, while up and down arrows move one visual character up or down in the row of text.</p>
<p>In vertical text, the left arrow moves left one row and the right arrow right one row, while up and down arrows move one user-perceived character (grapheme) up or down in the row of text.</p>
<h3 id="text_selection_desc">Selecting text via the keyboard</h3>
<p>Text selection begins much like cursoring, by positioning the cursor at the start (or end) of the desired text and then selecting to the other end of the desired text. This can be done using a pointing instrument, such as a mouse, or using keyboard gestures such as holding "shift" and cursoring through the text. Unlike cursoring, text selection is constrained by the need to select logical characters, so a different number of keystrokes or gestures may be required compared to simple cursoring. This is particularly true for bidirectional text.</p>
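The paragraph above says selection must respect logical character boundaries. As a minimal sketch of one way an editor might enforce that, the hypothetical helper below widens a selection, given as UTF-16 offsets, so that neither end splits an extended grapheme cluster. Treating the grapheme cluster as the selection unit is an assumption here: as the review comments note, implementations may use code points or orthographic syllables instead, and this sketch ignores bidirectional text entirely.

```javascript
// Hypothetical helper: widen [start, end) (UTF-16 offsets) so that neither
// end falls inside an extended grapheme cluster.
// Assumes Intl.Segmenter is available; ignores bidi reordering.
function snapSelectionToGraphemes(text, start, end, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let snappedStart = start;
  let snappedEnd = end;
  for (const { index, segment } of segmenter.segment(text)) {
    const segEnd = index + segment.length;
    // Start lands inside this cluster: move back to the cluster's beginning.
    if (index < start && start < segEnd) snappedStart = index;
    // End lands inside this cluster: move forward to the cluster's end.
    if (index < end && end < segEnd) snappedEnd = segEnd;
  }
  return [snappedStart, snappedEnd];
}

// Offsets 1..3 cut through the clusters "यू" and "नि".
console.log(snapSelectionToGraphemes("यूनिकोड", 1, 3, "hi")); // → [0, 4]
```

Widening (rather than shrinking) keeps every code point the user touched inside the selection; an editor could equally choose to shrink, which is a policy decision this sketch does not settle.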
<p>Selection using a pointing device, such as a mouse, is subtly different in most implementations than using the cursor keys to extend a selection. When using a pointing device, the selection is entirely logical, between the start and end point of the selection. At least on most physical keyboards, the user can access text selection, usually by holding down the "shift" key while cursoring in the text. As noted before, the cursor keys always move visually and in the indicated direction of the key. For certain bidirectional texts this can mean that the entire text cannot be selected via the cursor keys alone!</p>
I believe this is incorrect. I just checked, using some bidi text in my Arabic picker, and when the shift key is held down and the cursor keys used to extend the highlight, the left and right cursor keys extend the highlight in the opposite direction to that observed when simply moving the cursor.
Try it on Firefox. Chrome does things differently.
This and other parts of the document strike me as over-simplified and in places incorrect, but there are terminological problems (which we are already familiar with) that cloud the issue.

My experience in working with these things has led me to view the world in terms of code points, which are grouped into grapheme clusters, which are in turn grouped into orthographic syllables. (I'm in the process of writing that up more clearly, elsewhere...) I'm inclined to agree with Norbert that this idea of user-perceived character boundaries is too vague and not clearly substantiated enough to be used as the name of a unit of segmentation. Rather, it's merely a way of helping people imagine why code point units are not sufficient in some cases. The distinction between grapheme clusters and orthographic syllables is not informed by its use, but is crucial to the information provided by this article.

My experience has shown that browsers use these 3 different units for text operations such as cursor movement and deletion, depending on the language, sometimes inconsistently within a single language, and also from browser to browser. I've been investigating this and writing up results for the various browsers in my orthography notes, under the section "Graphemes". It may be worth going to https://r12a.github.io/scripts/switch.html and selecting the 'graphemes' segment id, then cycling through the orthographies using the control "Select an orthography". You should especially look for the subheading "Browser behaviour", where it exists, to find the results per browser. (I was wondering whether it would be useful to list behaviour against orthography in a table of some sort – not necessarily in this article, but somewhere.)

That said, it's not clear to me what your source of authoritative information is about how cursoring and deletion should work. I don't think the UAX makes clear how things should work; rather, it leaves the exact mechanism up to the application.(?) Or do you mean to describe what browsers currently do? I think it would be good to make that much clearer.

I also think that the article should make it much clearer (actually, I think it's hardly mentioned at all other than in one Thai example) that very different segmentation rules may apply for other operations on the text, such as line breaking, justification, text spacing, and the like – and that this is not an issue, but is useful. The exceptions section alludes to the importance of orthographic syllables, but this isn't really an exception, even in terms of current browser support. Again, it varies by browser and by orthography, but it's something that needs to be mentioned either together with, or given equal importance to, the section entitled "Combining characters".
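The comment above observes that browsers use code points, grapheme clusters, or orthographic syllables for cursoring and deletion, depending on language and browser. To make the difference between the first two units concrete, here is a sketch of two hypothetical deletion policies: a backspace that removes one code point before the caret, and a forward delete that removes the whole grapheme cluster after it. It does not describe what any particular browser does, and it does not model orthographic syllables, for which `Intl.Segmenter` has no granularity.

```javascript
// Two hypothetical deletion policies over a string and a caret position
// given as a UTF-16 offset. Assumes Intl.Segmenter is available.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Backspace by code point: removes only the last code point before the
// caret, which may be a combining mark or vowel sign.
function backspaceCodePoint(text, caret) {
  const before = Array.from(text.slice(0, caret)); // split into code points
  before.pop();
  const head = before.join("");
  return { text: head + text.slice(caret), caret: head.length };
}

// Forward delete by grapheme: removes the whole extended grapheme
// cluster that starts at the caret.
function deleteForwardGrapheme(text, caret) {
  for (const { index, segment } of graphemeSegmenter.segment(text)) {
    if (index === caret) {
      return { text: text.slice(0, caret) + text.slice(caret + segment.length), caret };
    }
  }
  return { text, caret }; // caret at the end, or not on a cluster boundary
}

const word = "यूनिकोड";
console.log(backspaceCodePoint(word, 2).text);    // यनिकोड (only the vowel sign ू is removed)
console.log(deleteForwardGrapheme(word, 0).text); // निकोड (the whole cluster यू is removed)
```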
<p>Indic scripts, such as the Devanagari and Tamil examples above, are not the only scripts affected by this. The same can be found for combining marks in many languages. For example, the first cluster in this Thai word: <q>คืออะไร</q>. [get better example; demonstrate middle cursor deletion effects in Thai]</p>
<p>South-Asian scripts, such as the Devanagari and Tamil examples above, are not the only ones affected by this; similar behavior can be found in any script that employs combining marks. For example, the first cluster in this Thai word <q lang="th">ห้องน้ำ</q> has similar behavior. The end of this word shows additional complexity: the <span class="codepoint" translate="no"><bdi lang="th">ำ</bdi><code class="uname">U+0E33 THAI CHARACTER SARA AM</code></span> appears as a separate typographical unit for effects such as inter-character spacing, but behaves as a single grapheme for the purposes of selection, cursoring, and forward deletion.</p>
The description of the SARA AM at the end of the word is not quite correct. It doesn't appear as a single typographic unit for text spacing: rather, it is split into a combining mark and a letter, and the space is introduced before the latter. So it is actually split into 2 typographic units, the first of which includes the preceding letter and its combining tone mark.
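The comment above distinguishes the grapheme cluster that contains SARA AM from the two pieces SARA AM splits into for text spacing. The sketch below shows both views of ห้องน้ำ, assuming `Intl.Segmenter` is available: segmentation into extended grapheme clusters, where SARA AM (classed as SpacingMark in UAX #29) joins the preceding น and its tone mark; and the compatibility decomposition of U+0E33 into U+0E4D THAI CHARACTER NIKHAHIT and U+0E32 THAI CHARACTER SARA AA. The decomposition only mirrors the mark-plus-letter split; it is not a description of how any browser applies letter-spacing. `toHex` and `listClusters` are hypothetical helper names.

```javascript
// Format one code point as "U+XXXX".
function toHex(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical helper: list the code points in each extended grapheme cluster.
// Assumes Intl.Segmenter is available.
function listClusters(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), ({ segment }) =>
    Array.from(segment, toHex).join(" "));
}

console.log(listClusters("ห้องน้ำ", "th"));
// returns ['U+0E2B U+0E49', 'U+0E2D', 'U+0E07', 'U+0E19 U+0E49 U+0E33']

// NFKD splits SARA AM into NIKHAHIT (a combining mark) + SARA AA (a letter).
console.log(Array.from("\u0E33".normalize("NFKD"), toHex).join(" ")); // U+0E4D U+0E32
```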
Thanks Richard for your comments.
I'm not unaware that this article doesn't really do the job and I agree with the points in your long comment above. Part of me wants to rubbish the whole thing. A proper job would require a thorough rewrite, which is more work than I think I'm willing to do, so I may look for a volunteer to take it over.
I think it might be useful to add an example of IVS. For example, each of the characters on this page is made of two code points (U+9F8D + an ideographic variation selector), but for users, each should be input, selected, and deleted as a whole. Many input methods can already input IVS. We could mention cursor movement, selection, and deletion here.
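To illustrate the IVS suggestion, the sketch below measures one ideographic variation sequence. The comment does not say which selector the linked characters use, so U+E0100 VARIATION SELECTOR-17 is an assumption chosen only for illustration (we have not checked whether U+9F8D U+E0100 is registered in the Ideographic Variation Database). Variation selectors extend the preceding character under UAX #29, so the sequence is one grapheme cluster even though it is two code points and, because the selector lies outside the BMP, three UTF-16 code units. A cursor or deletion step that counts UTF-16 units or code points could therefore split it.

```javascript
// U+9F8D followed by U+E0100 VARIATION SELECTOR-17 (selector assumed for illustration).
// Assumes Intl.Segmenter is available.
const ivs = "\u9F8D\u{E0100}";
const segmenter = new Intl.Segmenter("ja", { granularity: "grapheme" });

const codeUnits = ivs.length;                         // 3: the selector is a surrogate pair
const codePoints = Array.from(ivs).length;            // 2
const graphemes = [...segmenter.segment(ivs)].length; // 1

console.log(codeUnits, codePoints, graphemes); // 3 2 1

// Removing one UTF-16 code unit, as a naive backspace might, leaves an
// unpaired high surrogate rather than the bare base character.
const naive = ivs.slice(0, -1);
console.log(naive.length, naive.codePointAt(1).toString(16)); // 2 db40
```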
The working group elected not to complete work on this QA document. However, I don't want to lose the invested effort. I'm merging the changes for now. I should probably add a visible deprecation too.
This is w3c/i18n-actions#6.
Preview it here: https://aphillips.github.io/i18n-drafts/questions/qa-backwards-deletion.en.html