Fix English word list retrieval in qmk generate-autocorrect-data #20915
- Update to address breaking changes in the english_words package
- english_words_lower_alpha_set is now replaced by get_english_words_set
Can confirm that this works.
Also, it looks like the new english_words package causes it to catch a LOT more potential issues.
Yes, the word lists seem quite a bit bigger (almost 10x). There are actually two lists, and I selected Version 1 list:
Version 2 lists:
Package documentation, including the source for each of the lists: https://pypi.org/project/english-words/
Perhaps add a CLI argument to select the different word lists?
I feel an argument might be over-engineering. A bigger word list will show more potential issues, but it doesn't block generation of the file or return an error exit code, so it should not be a breaking change unless someone is parsing the output. If someone is parsing the output for an automated process, then I feel like they care enough to at least look at the clashes, and they probably should have a means to acknowledge/ignore a clash. Both word lists are quite a bit bigger than the version 1 list. If the old behavior (smaller list) is desired the user can pin
If going down the argument path, then should it allow custom lists? I feel like it probably should if an argument is being added, as there isn't a significant difference between the two lists, and if someone is unhappy with the list they are more likely to want their own list than either of the ones provided by the package. I chose the

I feel like this feature is intended as more of a heads up that you might have some conflicts with correctly spelt words. A bigger list will generate more matches, but then that is probably desirable and it's not hard to scan the list for words you actually use. Thoughts?
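To illustrate why a bigger list reports more matches, here is a minimal sketch of a clash check. It is a simplified substring test, not the actual logic in qmk generate-autocorrect-data, and both the `find_clashes` helper and the tiny word sets are hypothetical.

```python
def find_clashes(typo, correct_words):
    """Return the correctly spelled words that contain `typo` as a substring.

    Simplified illustration only: the real tool may apply additional rules
    that are not modeled here.
    """
    return sorted(word for word in correct_words if typo in word)


# Hypothetical word lists standing in for a smaller and a larger dictionary.
small_list = {"the", "other"}
large_list = small_list | {"statehood"}

# The same typo entry produces more reported clashes against the larger list.
clashes_small = find_clashes("teh", small_list)
clashes_large = find_clashes("teh", large_list)
```

With the smaller set nothing is reported for "teh", while the larger set flags "statehood", which is the kind of heads up the feature is meant to give.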
I personally don't use autocorrect; was more that you had a few options for selecting a wordlist -- figured it could be parameterised. Will defer to @drashna here -- he's an avid user.
Would it make more sense to just implement support for v2?
Since the new version doesn't get automatically updated... worst case, we'd want to check for the old version and issue a notice. That said, given that the new version seems to be a lot more robust, I'd rather we push for that version.
Looks like it's from a 1913 Webster's dictionary, specifically. So yeah, the web2 library would likely be the best option: while still outdated, it's from 2017, a full century more recent. And as for the alpha/lower setting, that does look to be the same as in the 1.1 version.
It can probably be checked with
It's only a couple of lines to handle the old version. I don't think it's a significant maintenance issue, and it could be removed when people have had time to update if it does become a problem. I'll make it output a message to suggest updating. Thanks for the
The
Looks good to me and tested on my end.
No worries! And
Ideally, we should support both for now, push people to the v2 module, and then later (next cycle or so) remove support for the 1.x version.
Description
The english_words package has breaking changes in v2.0 that cause the English dictionary to fail to load, producing an error that says the english_words package should be installed for the functionality to work. This PR implements support for either v1 or v2 of the english_words package.
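As a rough sketch of what supporting both versions could look like (not necessarily the code merged in this PR), the loader below prefers the v2 get_english_words_set function and falls back to the v1 english_words_lower_alpha_set attribute with a notice. The function name `load_english_words`, its return shape, and the notice wording are assumptions made for illustration; the 'web2' source and the lower_case/alpha flags follow the discussion above.

```python
import importlib
import sys


def load_english_words(module=None):
    """Return (words, is_outdated); words is None if no list is available.

    `module` is injectable so the logic can be exercised without the
    english_words package installed (hypothetical design choice).
    """
    if module is None:
        try:
            module = importlib.import_module("english_words")
        except ImportError:
            return None, False

    # v2.x API: a function that builds the set from the named sources.
    getter = getattr(module, "get_english_words_set", None)
    if getter is not None:
        return getter(["web2"], lower_case=True, alpha=True), False

    # v1.x API: a ready-made, module-level set.
    legacy = getattr(module, "english_words_lower_alpha_set", None)
    if legacy is not None:
        print(
            "The english_words package is outdated; consider upgrading it.",
            file=sys.stderr,
        )
        return legacy, True

    return None, False
```

Checking for the v2 function first means an up-to-date install never takes the legacy path, and the fallback can be deleted later without touching the v2 branch.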
Types of Changes
Issues Fixed or Closed by This PR
Checklist