Incomplete word symbol #206

Open
snomos opened this issue Sep 22, 2022 · 5 comments

Comments

@snomos
Member

snomos commented Sep 22, 2022

For various word part prediction / completion approaches, we need a way to tell the keyboard that word form suggestion X is not a complete word (yet), and thus should not be followed by a space character.

The actual prediction system will probably vary (fst-based, machine learning based, something else?), but the point here is that in certain cases, the suggestions given by the speller are NOT complete words, just fragments of words. The basic idea is that for languages with complex morphology, it will help users write if we can suggest parts of words at natural break points, and that when one part is selected, the system will suggest the next part. This means that the suggested parts are not real or full words, and thus should not be followed by a space character when selected.

The character should not be visible, it is just a hint to the underlying system whether a space character should be added or not for selected suggestions. When an incomplete word suggestion is selected, the full input string up until the end of the selected suggestion should be used to create new suggestions.
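As a rough illustration of the behaviour described above, here is a minimal sketch of what the keyboard could do when a suggestion is selected. The function name `apply_suggestion` and its parameters are hypothetical and not taken from any existing keyboard code; it assumes the selected suggestion replaces the word currently being typed.

```python
from typing import Tuple


def apply_suggestion(prefix: str, text: str, complete: bool) -> Tuple[str, str]:
    """Replace the word being typed with the selected suggestion.

    `prefix` is the text before the word being typed, `text` is the
    suggestion without any display marker, and `complete` says whether
    it is a full word. Returns the new buffer and the string to send to
    the predictor next. All names here are hypothetical.
    """
    buffer = prefix + text
    if complete:
        # A finished word: add the space and start a fresh word.
        return buffer + " ", ""
    # A fragment: no space, and the whole fragment seeds the next query.
    return buffer, text
```

With this shape, selecting the fragment nikî- leaves the cursor directly after the hyphen and asks the predictor for continuations of nikî-, while selecting nikî-wâpamâw appends a space as usual.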

There is presently an alpha version of a system like this for Plains Cree (in the Divvun Dev Keyboard app), available when using the circumflex SRO keyboard layout («nêhiyawêwin»; note circumflexes, not macrons). Getting a space character after every selected continuation is not fun.

To test it, try input as follows:

| User input | Suggestions | Explanation |
| --- | --- | --- |
| nikî | nikî-~, nîki | nîki is a complete word; nikî-~ is an incomplete word |
| nikî-wâp | nikî-wâpam~, nikî-wâpamâw | nikî-wâpam~ is an incomplete word; nikî-wâpamâw is a complete word |

The ~ is used to visually mark a word fragment as incomplete.

Whether or not to use a visual marker for incompleteness probably has to be language specific. It is probably not needed for Plains Cree, as the hyphen will give enough feedback. For other languages it will probably be needed.

Further discussions and examples can be found in giellalt/keyboard-crk#14 and linked issues.

@Eijebong @bbqsrc @aarppe

snomos changed the title from "Prediction incomplete word symbol" to "Incomplete word symbol" on Sep 22, 2022
@aarppe

aarppe commented Sep 22, 2022

If the marker of an incomplete word is part of the orthography, indicating word-internal structure, such as the hyphen -, then it must remain in the output when a suggestion of such an incomplete word is selected, but without a trailing space.

However, if the marker of an incomplete word is not part of the orthography, such as the tilde ~, then it should not be output when the suggestion is selected, and neither should such a selection be followed by a space.

The current crk alpha version actually deletes all incomplete word markers (~) from the input, in case such a marker has ended up in the input as a result of selecting a suggested incomplete word that carries it. The removal of such a marker could just as well be done on the code side.
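To make the two cases above concrete, here is a small sketch of how the code side might decide what to output for a selected suggestion. The function names and the `marker_is_orthographic` flag are hypothetical, and the sketch assumes the marker always appears at the end of an incomplete suggestion.

```python
def text_to_commit(suggestion: str, marker: str, marker_is_orthographic: bool) -> str:
    """Return the text to insert when `suggestion` is selected (hypothetical helper)."""
    if not marker:
        raise ValueError("marker must be a non-empty string")
    if not suggestion.endswith(marker):
        # No marker at the end: a complete word, so add the usual space.
        return suggestion + " "
    if marker_is_orthographic:
        # E.g. the hyphen: keep it in the output, but add no space.
        return suggestion
    # E.g. the tilde: drop the marker, and add no space.
    return suggestion[: -len(marker)]


def strip_markers(text: str, marker: str) -> str:
    """Remove stray non-orthographic markers that ended up in the input."""
    return text.replace(marker, "")
```

Under these assumptions, selecting nikî-~ with the tilde as marker outputs nikî- with no space, and a setup that used the hyphen itself as the marker would output nikî- unchanged.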

@snomos
Member Author

snomos commented Sep 22, 2022

I think it will make the system technically simpler, and thus easier to maintain, if we separate what is shown to the user and what is used internally. Only one symbol internally, for all languages. What is shown to users is language specific, as suggested above.
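A minimal sketch of that separation, assuming a single internal symbol and a per-language table of what to show. The internal symbol chosen here (U+001F) and the table contents are placeholders only; nothing in this thread fixes either of them.

```python
# Hypothetical internal symbol, shared by all languages; never shown to users.
INTERNAL_INCOMPLETE = "\u001f"

# Hypothetical per-language display markers. Plains Cree (crk) shows nothing
# extra, on the assumption that the hyphen already gives enough feedback.
DISPLAY_MARKER = {"crk": ""}
DEFAULT_DISPLAY_MARKER = "~"


def label_for_display(raw: str, lang: str) -> str:
    """Turn a raw suggestion into the label shown in the suggestion bar."""
    if not raw.endswith(INTERNAL_INCOMPLETE):
        return raw  # complete word: show as is
    visible = DISPLAY_MARKER.get(lang, DEFAULT_DISPLAY_MARKER)
    return raw[: -len(INTERNAL_INCOMPLETE)] + visible
```

The point of the design is that the predictor and the space logic only ever see the internal symbol, while changing how incompleteness looks for one language is a one-line change in the table.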

@aarppe

aarppe commented Sep 22, 2022

... we separate what is shown to the user and what is used internally.

This would be fine in my opinion. But then the sign that a language's orthography uses to indicate morpheme boundaries (and thus incomplete words), e.g. the hyphen - in the case of Plains Cree, needs to be kept functionally separate from the marker character that indicates such incompleteness; that is, if the hyphen were used as the internal marker, it would need to be both shown to the user and included in the selected suggestion. In this sense, a character such as the tilde ~ can be rendered in whichever way suits to show the incompleteness of a suggestion (as a tilde or in some other visual form), and then be discarded from the actual suggestion that the user chooses to continue with, whereas the hyphen, as the orthography-internal marker, would still need to be output and included in such suggestions.

@flammie
Contributor

flammie commented Oct 7, 2022

I started to experiment with this a bit: hfst-ospell-predict and the analyser branch of divvunspell have the switch -C for the incomplete word symbol, and in divvunspell I wrote it so that the Suggestion datatype has an extra bool field for the finishedness of the word. Maybe that will be usable downstream for the UI.
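For readers who want a picture of the idea, here is a sketch in Python of a suggestion record carrying a completeness flag, filled in from raw suggestion strings and a configured incomplete word symbol. This is not the divvunspell code or its API; the field names `value` and `finished` and the function `parse_suggestions` are made up for the illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Suggestion:
    value: str      # suggested form with the incomplete symbol removed
    finished: bool  # True for a complete word, False for a fragment


def parse_suggestions(raw: List[str], incomplete_symbol: str) -> List[Suggestion]:
    """Split raw suggestion strings into text plus a completeness flag."""
    parsed = []
    for item in raw:
        if incomplete_symbol and item.endswith(incomplete_symbol):
            parsed.append(Suggestion(item[: -len(incomplete_symbol)], False))
        else:
            parsed.append(Suggestion(item, True))
    return parsed
```

A UI consuming such records would never need to know which symbol the speller used; it only looks at the flag to decide about the trailing space.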

@aarppe

aarppe commented Oct 10, 2023

Another aspect to the table above is that sometimes a string is both complete and incomplete, e.g. nikî-wâpamâw would be the complete form with the translation 's/he saw him/her', but at the same time also the incomplete portion of nikî-wâpamâwak, translated as 's/he saw them'.
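The consequence is that completeness cannot be a single either/or property of a string. As a toy illustration only, using a plain word list instead of an fst or any real predictor, the two properties can be computed independently (the function name `completeness` is hypothetical):

```python
from typing import Iterable, Tuple


def completeness(form: str, full_words: Iterable[str]) -> Tuple[bool, bool]:
    """Return (is_complete, can_continue) for `form` against a toy word list."""
    words = set(full_words)
    is_complete = form in words
    # The form can continue if some longer word starts with it.
    can_continue = any(w != form and w.startswith(form) for w in words)
    return is_complete, can_continue
```

With a list containing nikî-wâpamâw and nikî-wâpamâwak, the form nikî-wâpamâw comes out as both complete and continuable, so a keyboard might need to offer it twice: once followed by a space and once without.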


3 participants