Incomplete word symbol #206
If the marker of an incomplete word is part of the orthography, to indicate word-internal structure, such as the hyphen `-`, … However, if the marker of an incomplete word is not part of the orthography, such as the tilde `~`, … The current crk alpha version actually deletes all incomplete word markers (…
I think it will make the system technically simpler, and thus easier to maintain, if we separate what is shown to the user from what is used internally: only one symbol internally, for all languages, while what is shown to users is language specific, as suggested above.
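A minimal sketch of that separation, assuming the speller appends a single internal marker to every fragment and the keyboard maps it to a per-language display form. The marker character (U+2060 WORD JOINER), the function names and the language table are all hypothetical; the `crk` entry follows the idea that the hyphen already gives enough feedback in Plains Cree.

```rust
/// Hypothetical internal marker shared by all languages. U+2060 WORD JOINER
/// is an arbitrary, invisible choice for this sketch only.
const INCOMPLETE_MARKER: char = '\u{2060}';

/// Hypothetical per-language display marker: `None` means the fragment is
/// shown as-is (e.g. when a hyphen in the orthography already signals it).
fn display_marker(lang: &str) -> Option<&'static str> {
    match lang {
        "crk" => None,
        _ => Some("~"),
    }
}

/// Convert an internal suggestion string into the label shown to the user.
fn display_label(raw: &str, lang: &str) -> String {
    match raw.strip_suffix(INCOMPLETE_MARKER) {
        // Fragment: swap the internal marker for the language's display marker.
        Some(stem) => match display_marker(lang) {
            Some(mark) => format!("{stem}{mark}"),
            None => stem.to_string(),
        },
        // Complete word: nothing to strip.
        None => raw.to_string(),
    }
}

fn main() {
    let raw = "nikî-wâpam\u{2060}";
    println!("{}", display_label(raw, "crk")); // nikî-wâpam
    println!("{}", display_label(raw, "xx")); // nikî-wâpam~
}
```

Only the display table is language specific here; everything upstream sees one marker.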
This would be fine in my opinion. But then the sign that a language's orthography uses to indicate morpheme boundaries (and thus incomplete words), e.g. the hyphen `-` …
I started to experiment with this a bit: hfst-ospell-predict and the analyser branch of divvunspell both have the switch `-C` for the incomplete word symbol, and in divvunspell I wrote it so that the `Suggestion` datatype has an extra bool field for the finishedness of the word. Maybe that will be usable downstream for the UI.
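A rough illustration of such a bool field, assuming fragments arrive from the analyser as raw strings ending in the configured marker. The struct and field names here are hypothetical and are not the actual divvunspell `Suggestion` definition.

```rust
/// Hypothetical suggestion type with an extra "finishedness" flag.
/// Field names are illustrative, not the actual divvunspell definition.
#[derive(Debug, Clone, PartialEq)]
struct Suggestion {
    value: String,
    weight: f32,
    completed: bool,
}

/// Build a `Suggestion` from a raw analyser string, assuming fragments carry
/// `marker` (the symbol chosen with a switch like -C) as their last character.
fn parse_suggestion(raw: &str, weight: f32, marker: char) -> Suggestion {
    match raw.strip_suffix(marker) {
        // Marker present: a word fragment, so strip it and flag as unfinished.
        Some(stem) => Suggestion { value: stem.to_string(), weight, completed: false },
        // No marker: a complete word.
        None => Suggestion { value: raw.to_string(), weight, completed: true },
    }
}

fn main() {
    for raw in ["nikî-wâpam~", "nikî-wâpamâw"] {
        let s = parse_suggestion(raw, 0.0, '~');
        // A UI could use `completed` to decide whether to append a space.
        println!("{} completed={}", s.value, s.completed);
    }
}
```

Carrying the flag as a field rather than inside the string means the UI never has to know which marker a given language uses.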
Another aspect of the table above is that sometimes a string is both complete and incomplete: e.g. nikî-wâpamâw would be the complete form with the translation 's/he saw him/her', but at the same time also the incomplete portion of nikî-wâpamâwak, translated as 's/he saw them'.
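A toy way to see the overlap, using a naive prefix test over a tiny word list instead of the actual FST (which would only propose fragments at natural break points). The `Completeness` enum and `classify` function are hypothetical names for this sketch.

```rust
/// Hypothetical classification of a candidate string.
#[derive(Debug, PartialEq)]
enum Completeness {
    Complete,
    Incomplete,
    Both,
    Unknown,
}

/// Naive approximation: a form is complete if it is in the lexicon, and
/// incomplete if some longer lexicon entry starts with it.
fn classify(form: &str, lexicon: &[&str]) -> Completeness {
    let complete = lexicon.contains(&form);
    let incomplete = lexicon
        .iter()
        .any(|w| w.len() > form.len() && w.starts_with(form));
    match (complete, incomplete) {
        (true, true) => Completeness::Both,
        (true, false) => Completeness::Complete,
        (false, true) => Completeness::Incomplete,
        (false, false) => Completeness::Unknown,
    }
}

fn main() {
    let lexicon = ["nikî-wâpamâw", "nikî-wâpamâwak"];
    for form in ["nikî-wâpam", "nikî-wâpamâw", "nikî-wâpamâwak"] {
        println!("{form}: {:?}", classify(form, &lexicon));
    }
}
```

With both forms in the list, nikî-wâpamâw comes out as `Both`, so a single bool per string cannot capture it on its own.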
For various word part prediction / completion approaches, we need a way to tell the keyboard that word form suggestion X is not a complete word (yet), and thus should not be followed by a space character.
The actual prediction system will probably vary (FST-based, machine-learning-based, something else?), but the point here is that in certain cases, the suggestions given by the speller are NOT complete words, just fragments of words. The basic idea is that for languages with complex morphology, it will help users write if we can suggest parts of words at natural break points, and that when one part is selected, the system will suggest the next part. This means that the suggested parts are not real or full words, and thus should not be followed by a space character when selected.
The character should not be visible; it is just a hint to the underlying system about whether a space character should be added after a selected suggestion. When an incomplete word suggestion is selected, the full input string up to the end of the selected suggestion should be used to create new suggestions.
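The selection behaviour described above can be sketched as a small buffer update, assuming the keyboard tracks committed text separately from the word currently being typed (which is what gets sent to the speller). `Buffer`, `select` and `query` are hypothetical names, not part of any Divvun keyboard API.

```rust
/// Hypothetical suggestion record: `completed == false` marks a word fragment.
struct Suggestion<'a> {
    value: &'a str,
    completed: bool,
}

/// Minimal sketch of the text state a keyboard might keep.
struct Buffer {
    /// Committed text before the word currently being typed.
    committed: String,
    /// The (possibly partial) word currently being typed; used as the query.
    current: String,
}

impl Buffer {
    /// Apply a selected suggestion, replacing the word being typed.
    fn select(&mut self, s: &Suggestion) {
        if s.completed {
            // A full word: commit it followed by a space, start a new word.
            self.committed.push_str(s.value);
            self.committed.push(' ');
            self.current.clear();
        } else {
            // A fragment: no space; the input up to the end of the fragment
            // becomes the query for the next round of suggestions.
            self.current = s.value.to_string();
        }
    }

    /// String to send to the speller for new suggestions.
    fn query(&self) -> &str {
        &self.current
    }

    /// Full text as the user sees it.
    fn text(&self) -> String {
        format!("{}{}", self.committed, self.current)
    }
}

fn main() {
    let mut b = Buffer { committed: String::new(), current: "nikî-wâp".to_string() };
    b.select(&Suggestion { value: "nikî-wâpam", completed: false });
    println!("query: {:?}", b.query());
    b.select(&Suggestion { value: "nikî-wâpamâw", completed: true });
    println!("text: {:?}", b.text());
}
```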
There is presently an alpha version of a system like this for Plains Cree (in the Divvun Dev Keyboard app), when using the circumflex SRO keyboard layout («nêhiyawêwin»; note circumflexes, not macrons). Getting a space character for every selected continuation is not fun.
To test it, try input as follows:
| Input | Suggestions | Comment |
|---|---|---|
| `nikî` | `nikî-~`, `nîki` | `nîki` is a complete word; `nikî-~` is an incomplete word |
| `nikî-wâp` | `nikî-wâpam~`, `nikî-wâpamâw` | `nikî-wâpam~` is an incomplete word; `nikî-wâpamâw` is a complete word |

`~` is used to visually mark a word fragment as incomplete.

Whether or not to use a visual marker for incompleteness probably has to be language specific. It is probably not needed for Plains Cree, as the hyphen will give enough feedback. For other languages it will probably be needed.
Further discussions and examples can be found in giellalt/keyboard-crk#14 and linked issues.
@Eijebong @bbqsrc @aarppe