Following the conversations here and here, we should run our own comparison of language detection packages to see which performs best on short text. The speed and size of each package should also be considered. Packages of interest include langdetect (which we currently use), fasttext (if it is compatible with Python 3.11), langid, and lingua. We also currently use a regex pattern and an arbitrary text length limit to default to "eng", so that fallback should be reconsidered as part of the comparison.
See `detect_languages` in `lang.py`.
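As a starting point for the comparison, here is a minimal sketch of a harness that scores any set of detectors on labeled short texts and times them. It is not the existing `detect_languages` code: the names `evaluate`, `compare`, and `with_short_text_default` are hypothetical, and the `min_chars` threshold is a placeholder rather than the limit currently used in `lang.py`. Each detector is assumed to be a plain callable that takes a string and returns a language code.

```python
import time
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical type alias: a detector maps text to a language code.
Detector = Callable[[str], str]
Sample = Tuple[str, str]  # (text, expected language code)


def evaluate(detector: Detector, samples: Sequence[Sample]) -> Dict[str, float]:
    """Return accuracy and total wall-clock seconds for one detector."""
    correct = 0
    start = time.perf_counter()
    for text, expected in samples:
        try:
            predicted = detector(text)
        except Exception:
            # Some detectors raise on empty or non-linguistic input;
            # count that as a miss instead of aborting the run.
            predicted = None
        if predicted == expected:
            correct += 1
    elapsed = time.perf_counter() - start
    accuracy = correct / len(samples) if samples else 0.0
    return {"accuracy": accuracy, "seconds": elapsed}


def compare(
    detectors: Dict[str, Detector], samples: Sequence[Sample]
) -> Dict[str, Dict[str, float]]:
    """Run every detector over the same samples."""
    return {name: evaluate(fn, samples) for name, fn in detectors.items()}


def with_short_text_default(
    detector: Detector, min_chars: int = 10, default: str = "eng"
) -> Detector:
    """Wrap a detector so very short text falls back to a default code.

    min_chars is an assumed placeholder, not the project's actual limit.
    """

    def wrapped(text: str) -> str:
        if len(text.strip()) < min_chars:
            return default
        return detector(text)

    return wrapped
```

To plug in real packages, wrap each one in a small function, for example one that calls `langdetect.detect(text)` or takes the first element of `langid.classify(text)`. Both return two-letter codes such as "en", so the output would need to be mapped to the three-letter codes used here ("eng") before scoring. Running `compare` once with the bare detector and once with `with_short_text_default` applied would show whether the "eng" fallback helps or hurts on short text.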