There is a considerable speed difference between speller runs on the same text depending on whether hfst-ospell is allowed to give all suggestions or just a few:
tf-hsl-m0020:hfst smo036$ time preprocess test.txt | hfst-ospell -S tools/spellcheckers/fstbased/hfst/kl.zhfst > test.res.hsp-all.txt
real 0m40.156s
user 0m40.152s
sys 0m0.046s
tf-hsl-m0020:hfst smo036$ time preprocess test.txt | hfst-ospell -S -n5 tools/spellcheckers/fstbased/hfst/kl.zhfst > test.res.hsp-5.txt
real 0m10.123s
user 0m10.132s
sys 0m0.039s
tf-hsl-m0020:hfst smo036$ time preprocess test.txt | hfst-ospell -S -n10 tools/spellcheckers/fstbased/hfst/kl.zhfst > test.res.hsp-10.txt
real 0m11.897s
user 0m11.897s
sys 0m0.043s
At the same time, voikkospell (which always returns at most 5 suggestions) is markedly slower than hfst-ospell limited to 5 suggestions:
$ time preprocess test.txt | voikkospell -s -d kl -p tools/spellcheckers/fstbased/hfst/ > test.res.vk.txt
real 0m16.588s
user 0m16.334s
sys 0m0.305s
I don't know the details of libvoikko's interactions with libhfstospell, but since there is no built-in configure-time/compile-time option to limit the number of suggestions in hfst-ospell, could it be that hfst-ospell is generating a lot of suggestions in the background that are never used? Please note that there would be fewer "misspellings" coming from voikkospell, since voikkospell handles upper/lower casing automatically, whereas hfst-ospell (at least with the tested fst) only accepts lexical case. This difference might be one explanation for voikko being faster than the all-suggestion call to hfst-ospell (but still about 1.5 times slower than the corresponding hfst-ospell run with only 5 suggestions).
In any case I believe that being able to set a default number of suggestions at compile time is an easy way to ensure that hfst-ospell is not slower than needed.
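For reference, the ratios implied by the timings above can be worked out directly. The snippet below is only arithmetic on the reported `real` values, relative to the `-n5` run; the dictionary keys and variable names are illustrative labels, not tool output.

```python
# Wall-clock ("real") times in seconds, copied from the runs above.
timings = {
    "hfst-ospell, all suggestions": 40.156,
    "hfst-ospell -n5": 10.123,
    "hfst-ospell -n10": 11.897,
    "voikkospell": 16.588,
}

# Express every run as a multiple of the fastest one (-n5).
baseline = timings["hfst-ospell -n5"]
ratios = {name: seconds / baseline for name, seconds in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the -n5 run")
```

On these numbers the unlimited run takes roughly four times as long as the `-n5` run, and voikkospell roughly 1.6 times as long.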
Reported by: snomos
IIRC the voikko interface predates the limit options, and especially the kind of limit implementation that provides speed gains. A good upgrade should probably use voikko options to determine the maximum number of suggestions at run time (possibly in addition to this static configure-time maximum). That would let users of the various interfaces tune it themselves, although the defaults in most implementations, such as the Office suites and enchant, are probably capped at 5–8 now?
It sounds like a good idea to use whatever voikko options there are. Different interfaces and apps behave differently: the Mac OS X system-wide speller does not impose any restriction, so the number of suggestions depends on the underlying speller (Hunspell seems to produce potentially huge lists of suggestions). Voikko limits the suggestions to 5 in all contexts, whereas MS Word shows 5 suggestions when right-clicking, but up to 20 suggestions in the spelling and grammar dialog. For almost any user, more than 5 suggestions make little sense: it becomes too hard to spot the correct one, or it takes too much time.
My idea was to use the static limit only as a default, letting the outside caller (Voikko, hfst-ospell command line tool, any other host app) set the actual limit via overrides.
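A minimal sketch of that precedence, assuming nothing about the actual hfst-ospell or libvoikko code: `DEFAULT_MAX_SUGGESTIONS`, `effective_limit` and `top_suggestions` are hypothetical names, and the constant merely stands in for a value that would be fixed at configure time. It only shows how a caller override would win over the built-in default; it truncates a finished candidate list and so does not by itself demonstrate the speed gain, which would require the limit to be applied during the search.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

# Hypothetical stand-in for a limit fixed at configure/compile time.
DEFAULT_MAX_SUGGESTIONS = 5


def effective_limit(caller_limit: Optional[int] = None) -> int:
    """Return the caller's limit if one is given, else the built-in default."""
    if caller_limit is None:
        return DEFAULT_MAX_SUGGESTIONS
    if caller_limit < 1:
        raise ValueError("suggestion limit must be at least 1")
    return caller_limit


def top_suggestions(
    candidates: Iterable[Tuple[float, str]],
    caller_limit: Optional[int] = None,
) -> List[str]:
    """Keep the lowest-weight candidates, up to the effective limit."""
    n = effective_limit(caller_limit)
    best = heapq.nsmallest(n, candidates, key=lambda pair: pair[0])
    return [word for _weight, word in best]
```

A host application that wants Word-style behaviour could then call `top_suggestions(candidates, 20)` for a dialog and `top_suggestions(candidates)` for a context menu.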