Improve logic behind spell checking text #8
…s to the sentences count fix. The logic to calculate spelling score needs to be attended to, see issue #8.
A simpler solution would be to revert to the original logic:
and adjust the Words of Estimative Probability table to a stricter scoring:
We can tune this logic further with new input from users in the community. Eventually, this table could be made customisable or passed as a parameter to assist in the scoring.
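As a rough illustration of the "passed as a parameter" idea, here is a minimal sketch. The function name `spelling_quality`, the default table, and every threshold and label in it are hypothetical, chosen only for this example; they are not NLP Profiler's actual values or API.

```python
# Hypothetical table of (minimum score, label) rows.
# Thresholds and labels are illustrative only, not NLP Profiler's values.
DEFAULT_WEP_TABLE = (
    (0.99, "Very good"),
    (0.95, "Good"),
    (0.85, "Fair"),
    (0.50, "Bad"),
    (0.00, "Very bad"),
)


def spelling_quality(score, wep_table=DEFAULT_WEP_TABLE):
    """Map a spelling score in [0, 1] to a label using a caller-supplied table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    # Walk the rows from the highest threshold down and return the first
    # label whose minimum the score meets.
    for minimum, label in sorted(wep_table, reverse=True):
        if score >= minimum:
            return label
    raise ValueError("table has no row covering this score")


# Default table versus a stricter, user-supplied one (both hypothetical).
print(spelling_quality(0.97))
print(spelling_quality(0.97, wep_table=[(0.98, "Good"), (0.0, "Bad")]))
```

Keeping the table as plain data means a stricter or looser scoring can be tried without touching the scoring function itself.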
… score is calculated and also adjusting the WEP table - making it stricter. Fixes issues #8.
The new logic can be found in https://github.com/neomatrix369/nlp_profiler/blob/master/nlp_profiler/spelling_quality_check.py#L59 and the changes are as per the comment #8 (comment). It may not be the best or optimal fix, but it is a simple one to start with.
The issue is partially fixed via #16.
NLP Profiler has spell-checking functionality that uses a third-party library, TextBlob. It does a decent job, although the scores returned per misspelt word then need to be correctly amortised across the whole text, meaning that we evaluate fairly, on the whole, how bad the spelling in the text is.
At the moment it's using the below logic:
This can be improved, as there is a visible chance of false-positive or false-negative scores.
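To make "amortising across the whole text" concrete, one possible approach (not the project's current logic, and not necessarily the eventual fix) is to score the text by the fraction of its words that a checker accepts. In this sketch, `spelling_score`, the `is_correct` callback, and the toy `KNOWN_WORDS` set are all assumptions made for illustration; a real implementation would plug in the actual spell-checking library in place of the callback.

```python
import re


def spelling_score(text, is_correct):
    """Return the fraction of words in `text` accepted by `is_correct`.

    `is_correct` is a hypothetical callback taking a lowercase word and
    returning True when the word is spelt correctly.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return None  # nothing to score
    correct = sum(1 for word in words if is_correct(word))
    return correct / len(words)


# Toy stand-in for a real spell checker: a tiny set of known words.
KNOWN_WORDS = {"the", "cat", "sat", "on", "mat"}

# Five of the six words are known, so the score is 5/6.
score = spelling_score("The cat szat on the mat", KNOWN_WORDS.__contains__)
print(score)
```

Because every word carries equal weight, a single misspelling lowers the score of a short text far more than that of a long one, which is one reason the mapping from score to label needs careful tuning.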
PS: the performance of this feature is being addressed in #2, so this particular issue isn't about improving its speed. Performance issues may be addressed via other issues at a later stage. There have already been some significant performance improvements to the spell check and other aspects of NLP Profiler via #2.
The fix for #14 impacts this issue, so the two will need to be fixed together.
Replace the spellchecker with the package pyspellchecker (on PyPI), which appears to be closer to Peter Norvig's work.
Replaced with Symspellpy (https://pypi.org/project/symspellpy/).