Update the "backwards deletion" Q+A article #520

aphillips · 2023-08-24T14:54:40Z

Preview it here: https://aphillips.github.io/i18n-drafts/questions/qa-backwards-deletion.en.html

- Clean up javascript - Fix examples [NOT YET READY FOR REVIEW]

@r12a

- Thanks to @r12a for the page: https://r12a.github.io/scripts/thai/th.html#webSegmentation - I used the word 'toilet' from this example because it has a sara am (U+0E33) - I added mention of the sara am

netlify · 2023-08-24T14:54:45Z

✅ Deploy Preview for i18n-drafts ready!

Name	Link
🔨 Latest commit	`179dfa0`
🔍 Latest deploy log	https://app.netlify.com/sites/i18n-drafts/deploys/65a1575c89752f00097a0ebd
😎 Deploy Preview	https://deploy-preview-520--i18n-drafts.netlify.app/questions/qa-backwards-deletion.en

xfq · 2023-08-25T05:36:18Z

I updated the article to match the latest template.

We also need to:

- add a qa-backwards-deletion/translations.js file
- add alt attribute to the images

questions/qa-backwards-deletion.en.html

xfq · 2023-08-25T06:54:36Z

questions/qa-backwards-deletion.en.html

+  <div class="try">
+	  <h4>Try it in your browser</h4>
+	  <p>Try selecting, cursoring, deleting, and backspacing with this word in Hindi (in the Devanagari script). The word means "Unicode" and contains <em>four</em> graphemes and <em>seven</em> Unicode code points.</p>
+	  <p><input id="tryHindi" type="text" name="tryHindi" lang="hi" class="try" value="&#x92f;&#x942;&#x928;&#x93f;&#x915;&#x94b;&#x921;"></input>


Note that input is a void element, so it must not have a close tag. (Same for other instances.)
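As a quick check of the counts claimed in the quoted paragraph, here is a minimal sketch that counts code points and grapheme clusters for the same Hindi word. It assumes a runtime that provides `Intl.Segmenter` and that its grapheme segmentation follows the default extended grapheme clusters of UAX #29; the variable names are illustrative only and are not taken from the article's script.

```javascript
// The Hindi word from the quoted "try it" block, written as escapes.
const hindiWord = "\u092F\u0942\u0928\u093F\u0915\u094B\u0921";

// Iterating a string yields one entry per Unicode code point.
const hindiCodePoints = [...hindiWord];

// Grapheme clusters as reported by the runtime's segmenter (assumed available).
const hindiSegmenter = new Intl.Segmenter("hi", { granularity: "grapheme" });
const hindiGraphemes = [...hindiSegmenter.segment(hindiWord)].map((s) => s.segment);

console.log("code points:", hindiCodePoints.length);
console.log("graphemes:", hindiGraphemes.length);
```

If the runtime behaves as assumed, the two counts should match the "seven" and "four" stated in the paragraph.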

Co-authored-by: Fuqiao Xue <[email protected]>

- add a note about vertical text - fix tamil tag - reword about thai's relationship to indic

I took it to the logical conclusion and just made a small section about vertical text at the end. It makes the document more attractive.

questions/qa-backwards-deletion.en.html

style/article-2022.css

r12a · 2023-08-29T15:59:53Z

questions/qa-backwards-deletion.en.html


-  <p>In vertical text, the left arrow moves left one row and the right arrow right one row, while up and down arrows move one visual character up or down in the row of text.</p>
+  <p>In vertical text, the left arrow moves left one row and the right arrow right one row, while up and down arrows move one user-perceived character (grapheme) up or down in the row of text.</p>

  <h3 id="text_selection_desc">Selecting text via the keyboard</h3>

  <p>Text selection begins much like cursoring, by positioning the cursor at the start (or end) of the desired text and then selecting to the other end of the desired text. This can be done using a pointing instrument, such as a mouse, or using keyboard gestures such as holding "shift" and cursoring through the text. Unlike cursoring, text selection is constrained by the need to select logical characters, so a different number of keystrokes or gestures may be required compared to simple cursoring. This is particularly true for bidirectional text.</p>

  <p>Selection using a pointing device, such as a mouse, is subtly different in most implementations than using the cursor keys to extend a selection. When using a pointing device, the selection is entirely logical, between the start and end point of the selection. At least on most physical keyboards, the user can access text selection, usually by holding down the "shift" key while cursoring in the text. As noted before, the cursor keys always move visually and in the indicated direction of the key. For certain bidirectional texts this can mean that the entire text cannot be selected via the cursor keys alone!</p>


I believe this is incorrect. I just checked, using some bidi text in my Arabic picker, and when the shift key is held down and the cursor keys used to extend the highlight, the left and right cursor keys extend the highlight in the opposite direction to that observed when simply moving the cursor.

Try it on Firefox. Chrome does things differently.

r12a · 2023-08-29T16:14:03Z

Generally speaking, most text navigation and editing follows the user-perceived character boundaries. For most implementations this corresponds to Unicode's definition of "default extended grapheme cluster" boundaries [UAX29]. The main exception to this is backspacing, which usually follows Unicode code point boundaries in the underlying encoded text (although there are exceptions to this). For the simplest scripts and languages, these often amount to the same thing.

This and other parts of the document strike me as over-simplified and in places incorrect, but there are terminological problems (which we are already familiar with) that cloud the issue. My experience in working with these things has led me to view the world in terms of code points, which are grouped into grapheme clusters, which are in turn grouped into orthographic syllables. (I'm in the process of writing that up more clearly, elsewhere...)

I'm inclined to agree with Norbert that this idea of user-perceived character boundaries is too vague and not clearly substantiated enough to be used as the name of a unit of segmentation. Rather, it's merely a way of helping people imagine why code point units are not sufficient in some cases. The distinction between grapheme clusters and orthographic syllables is not captured by the term as it's used, but is crucial to the information provided by this article.

My experience has shown that browsers use these 3 different units for text operations such as cursor movement and deletion, depending on the language, and sometimes inconsistently within a single language, but also from browser to browser. I've been investigating this and writing up results for the various browsers in my orthography notes, under the section "Graphemes". It may be worth going to https://r12a.github.io/scripts/switch.html and selecting the 'graphemes' segment id, then cycling through the orthographies using the control "Select an orthography". You should especially look for the subheading "Browser behaviour", where it exists, to find the results per browser.

(I was wondering whether it would be useful to list behaviour against orthography in a table of some sort – not necessarily in this article, but somewhere.)

That said, it's not clear to me what your source of authoritative information is about how cursoring and deletion should work. I don't think the UAX makes clear how things should work; rather, it seems to leave the exact mechanism up to the application to decide (?). Or do you mean to describe what browsers currently do? I think it would be good to make that much clearer.

I also think that the article should make it much clearer (actually, I think it's hardly mentioned at all other than for one Thai example) that very different segmentation rules may apply for other operations on the text, such as line breaking, justification, text spacing, and the like – and that this is not an issue, but is useful.

The exceptions section alludes to the importance of orthographic syllables, but this isn't really an exception - even in terms of current browser support. Again it varies by browsers and by orthography, but it's something that needs to be mentioned either together with or given equal importance to the section entitled "Combining characters".
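To make the difference between two of these units concrete, here is a small sketch contrasting deletion by code point with deletion by grapheme cluster. Both helper functions are hypothetical and written only for this comment; they do not describe what any particular browser does, and the grapheme version assumes the runtime provides `Intl.Segmenter`. Orthographic syllables are not covered, since they have no standard segmenter.

```javascript
// Hypothetical helper: remove the last code point, as a code-point backspace would.
function backspaceByCodePoint(text) {
  return [...text].slice(0, -1).join("");
}

// Hypothetical helper: remove the last grapheme cluster as a whole.
function deleteLastGrapheme(text) {
  const seg = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const clusters = [...seg.segment(text)].map((s) => s.segment);
  return clusters.slice(0, -1).join("");
}

// DEVANAGARI LETTER KA + VOWEL SIGN O: two code points, one grapheme cluster.
const syllable = "\u0915\u094B";

console.log([...backspaceByCodePoint(syllable)].length); // the base letter remains
console.log([...deleteLastGrapheme(syllable)].length);   // nothing remains
```

The first call leaves the base consonant in place, while the second removes the consonant and vowel sign together.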

r12a · 2023-08-29T16:24:38Z

questions/qa-backwards-deletion.en.html


-  <p>Indic scripts, such as the Devanagari and Tamil examples above, are not the only scripts affected by this. The same can be found for combining marks in many languages. For example, the first cluster in this Thai word: <q>คืออะไร</q>. [get better example; demonstrate middle cursor deletion effects in Thai]</p>
+  <p>South-Asian scripts, such as the Devanagari and Tamil examples above, are not the only ones affected by this; similar behavior can be found in any script that employs combining marks. For example, the first cluster in this Thai word <q lang="th">ห้องน้ำ</q> has similar behavior. The end of this word shows additional complexity: the <span class="codepoint" translate="no"><bdi lang="th">&#xe33;</bdi><code class="uname">U+0E33 THAI CHARACTER SARA AM</code></span> appears as a separate typographical unit for effects such as inter-character spacing, but behaves as a single grapheme for the purposes of selection, cursoring, and forward deletion.</p>


The description of the SARA AM at the end of the word is not quite correct. It doesn't appear as a single typographic unit for text spacing: rather, it is split into a combining mark and a letter, and the space is introduced before the latter – so actually it is split into 2 typographic units, the first of which includes the preceding letter and its combining tone mark.
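The split into a combining mark and a letter has a parallel in the character's Unicode compatibility decomposition, which the following sketch prints. This only illustrates the two parts; it is an assumption that a text-spacing implementation would work this way, and the variable names are illustrative.

```javascript
// U+0E33 THAI CHARACTER SARA AM as a single code point.
const saraAm = "\u0E33";

// NFKD applies the compatibility decomposition of the character.
const saraAmParts = [...saraAm.normalize("NFKD")].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

// Expected parts: U+0E4D (NIKHAHIT, a combining mark) and U+0E32 (SARA AA).
console.log(saraAmParts.join(" "));
```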

Thanks Richard for your comments.

I'm not unaware that this article doesn't really do the job and I agree with the points in your long comment above. Part of me wants to rubbish the whole thing. A proper job would require a thorough rewrite, which is more work than I think I'm willing to do, so I may look for a volunteer to take it over.

xfq · 2023-10-29T07:29:25Z

I think it might be useful to add an example of IVS. For example, the characters on this page are made of two code points (U+9F8D + an ideographic variation selector), but for users, they should be input, selected, and deleted as a whole. Regarding input methods, many input methods can already input IVS. We can mention cursor movement, selection, and deletion here.
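A minimal sketch of what such an example could show, assuming a runtime with `Intl.Segmenter`. The selector U+E0100 (VARIATION SELECTOR-17) is chosen here only for illustration; the comment above does not say which selector the linked page uses.

```javascript
// U+9F8D followed by an assumed variation selector, U+E0100.
const ivsSequence = "\u9F8D\u{E0100}";

// Two code points (three UTF-16 code units, since U+E0100 needs a surrogate pair).
const ivsCodePoints = [...ivsSequence];

// Variation selectors extend the preceding character, so one grapheme cluster is expected.
const ivsSegmenter = new Intl.Segmenter("ja", { granularity: "grapheme" });
const ivsGraphemes = [...ivsSegmenter.segment(ivsSequence)].map((s) => s.segment);

console.log(ivsSequence.length, ivsCodePoints.length, ivsGraphemes.length);
```

A code-point-based backspace would strip only the selector and silently change the glyph variant, which is why treating the sequence as a whole matters for editing.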

aphillips · 2024-01-12T15:15:51Z

The working group elected not to complete work on this QA document. However, I don't want to lose the invested effort. I'm merging the changes for now. I should probably add a visible deprecation too.

aphillips and others added 9 commits August 9, 2023 12:50

Completing this Q+A article

18518f2

- Clean up javascript - Fix examples [NOT YET READY FOR REVIEW]

Further improvements of "try it" examples

56189c6

Add Thai example from 2023-08-10 telecon

83e3743

- Thanks to @r12a for the page: https://r12a.github.io/scripts/thai/th.html#webSegmentation - I used the word 'toilet' from this example because it has a sara am (U+0E33) - I added mention of the sara am

Minor edits.

aeb82b8

Cleanup JS and resetting of page

39d9342

fix the try blocks with good text

b4b5d19

Add character styles.

091d6bf

Update qa-backwards-deletion.en.html

d0f6fee

Merge branch 'w3c:gh-pages' into gh-pages

675efa4

aphillips requested review from xfq and r12a August 24, 2023 14:54

aphillips mentioned this pull request Aug 24, 2023

Check if unicode describes backwards deletion clearly and develop doc if not w3c/i18n-actions#6

Closed

Update the template

535f8ca