Consider ways to automate en_US->en_GB dictionary corrections #1468
With
one can reverse the dictionary automatically. I will write some tests that check dictionary_en-GB_to_en-US.txt against dictionary_en-US_to_en-GB.txt.
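Here is a rough sketch of such a test. It assumes both files use the `misspelling->correction` line format, with alternatives separated by commas. `parse_dictionary` and `find_unmatched` are hypothetical helper names, not part of codespell. The test checks that every GB->US pair has a matching US->GB pair:

```python
def parse_dictionary(lines):
    """Parse 'word->fix1, fix2,' lines into {word: [fix1, fix2]}.

    Simplified: treats every comma-separated field as a correction and
    does not handle codespell's optional trailing "reason" field.
    """
    entries = {}
    for line in lines:
        line = line.strip()
        if "->" not in line:
            continue  # skip blank or malformed lines
        word, data = line.split("->", 1)
        entries[word.strip()] = [fix.strip() for fix in data.split(",") if fix.strip()]
    return entries


def find_unmatched(forward, backward):
    """Return (word, fix) pairs in `forward` with no matching reverse entry."""
    return [
        (word, fix)
        for word, fixes in forward.items()
        for fix in fixes
        if word not in backward.get(fix, [])
    ]


gb_to_us = parse_dictionary(["colour->color", "licence->license"])
us_to_gb = parse_dictionary(["color->colour"])
print(find_unmatched(gb_to_us, us_to_gb))  # [('licence', 'license')]
```

A reported pair is not automatically a bug. Since license is also valid en-GB, a missing license->licence entry may be deliberate.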
It seems cleaner just to reverse it in Python at runtime to generate the opposite dict. Then there is less repetition in the repo.
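A minimal sketch of reversing at runtime. It assumes the GB->US dictionary has already been loaded into a Python dict that maps each word to a list of corrections. `reverse_dictionary` is a hypothetical name, not codespell's API:

```python
def reverse_dictionary(forward):
    """Build the opposite dictionary, e.g. en-GB->en-US into en-US->en-GB.

    `forward` maps each word to a list of corrections. Each correction
    becomes a key in the result, listing every word that maps to it.
    """
    backward = {}
    for word, fixes in forward.items():
        for fix in fixes:
            backward.setdefault(fix, []).append(word)
    return backward


gb_to_us = {"colour": ["color"], "favour": ["favor"]}
print(reverse_dictionary(gb_to_us))  # {'color': ['colour'], 'favor': ['favour']}
```

The result keeps a list per key, so a reversed entry that ends up with several candidates is visible rather than silently overwritten.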
I guess using
Yes, it would. But do British words with multiple spellings in US English exist? Or vice versa?
Dunno, I'm not a linguist. But I guess 'gas' in en-US can be spelled as both 'gas' and 'petrol' in en_GB 😉 🤣
Here be dragons:
Not really. We have Kinda weird though in that en-GB the Hmm, and just to confuse things en-US also keeps the distinction between
Which is exactly what this issue is about... Sorry, I've retitled it, as I realise that wasn't very clear.
Maybe, for the dictionary_en-GB_to_en-US.txt only, we could break some of the rules that apply to other dictionaries and allow something like
which would then become:
when "reversed" into the en_US -> en_GB dictionary, so that when codespell encounters That might be too confusing, though, so perhaps a better/simpler approach would be to somehow indicate that the There's also the problem that when converting from en_US to en_GB you'd want to correct "color" to "colour" in natural text, but you'd probably need to leave it as "color" in code, as many functions / classes / etc. use the US spelling of "color". (Hmm, does codespell have the ability to use different dictionaries based on the file extension of the file it's currently checking?)
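This thread doesn't settle whether codespell supports that. Here is a purely hypothetical sketch of the idea; the extension list, the dictionary file names and the `dictionaries_for` helper are all illustrative assumptions. A checker could pick the dictionary set for each file:

```python
from pathlib import Path

# Hypothetical set of extensions treated as source code, where identifiers
# such as "color" should keep their en-US spelling.
CODE_EXTENSIONS = {".c", ".cpp", ".h", ".java", ".js", ".py"}


def dictionaries_for(path, base, prose_only):
    """Return the dictionaries to apply to `path`.

    `base` is used for every file; `prose_only` (e.g. an en-US->en-GB
    dictionary) is added only when the extension is not in CODE_EXTENSIONS.
    """
    if Path(path).suffix.lower() in CODE_EXTENSIONS:
        return list(base)
    return list(base) + list(prose_only)


base = ["dictionary.txt"]
prose_only = ["dictionary_en-US_to_en-GB.txt"]
print(dictionaries_for("README.md", base, prose_only))  # ['dictionary.txt', 'dictionary_en-US_to_en-GB.txt']
print(dictionaries_for("widget.py", base, prose_only))  # ['dictionary.txt']
```

Switching on the extension is coarse. Comments and docstrings inside code files are natural text too, so this approach would skip some legitimate corrections.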
I think the fact that BE uses both licence and license, but AE only license, is the deal-breaker for automated reversing.
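Such cases could be caught automatically if a list of valid en-GB words is available, for example one derived from SCOWL, which is mentioned later in the thread. The small set below is illustrative only. The idea is to flag every correction that is itself a valid en-GB word. `find_unsafe_reversals` is a hypothetical helper:

```python
def find_unsafe_reversals(forward, target_words):
    """Return corrections that are also valid words in the target locale.

    Reversing `forward` (e.g. en-GB->en-US) turns each correction into a
    new "misspelling" for en-GB text. If that spelling is itself valid
    en-GB, the reversed entry would flag correct text.
    """
    return sorted(
        {fix for fixes in forward.values() for fix in fixes if fix in target_words}
    )


gb_to_us = {"colour": ["color"], "licence": ["license"], "practise": ["practice"]}
# Illustrative stand-in for a real en-GB word list.
en_gb_words = {"colour", "licence", "license", "practice", "practise"}
print(find_unsafe_reversals(gb_to_us, en_gb_words))  # ['license', 'practice']
```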
The construction rule could be "check for the reverse dict and add entries to it (in Python) as long as there is only one correction". Then GB->US can stay as it is, and a new US->GB file (for now) can have the single entry
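A minimal sketch of that construction rule. It assumes both dictionaries are loaded as Python dicts of word -> list of corrections. `build_reverse_dictionary` and its `skip` parameter are hypothetical. The `practice` entry mirrors the one discussed further down this thread:

```python
def build_reverse_dictionary(forward, manual_entries, skip=()):
    """Derive a reverse dictionary from `forward` plus hand-written entries.

    Entries in `manual_entries` (e.g. a small hand-maintained US->GB file)
    always win. Other entries are generated from `forward` only when the
    reversed entry has exactly one correction and its key is not in `skip`.
    """
    candidates = {}
    for word, fixes in forward.items():
        if len(fixes) != 1:
            continue  # a one-to-many entry cannot be reversed unambiguously
        candidates.setdefault(fixes[0], []).append(word)

    backward = {word: list(fixes) for word, fixes in manual_entries.items()}
    for word, fixes in candidates.items():
        if word in backward or word in skip:
            continue
        if len(fixes) == 1:
            backward[word] = fixes
    return backward


gb_to_us = {"colour": ["color"], "licence": ["license"], "practise": ["practice"]}
manual = {"practice": ["practice", "practise"]}
print(build_reverse_dictionary(gb_to_us, manual, skip={"license"}))
# {'practice': ['practice', 'practise'], 'color': ['colour']}
```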
and also
Yeah, probably; just starting off with one for the sake of discussion. I would not expect the dictionary to stay at a single entry :) I'm sure there are many examples...
It seems a shame to have to duplicate practice->practice,practise (it would also currently hit our "corrects to itself" test, so we'd need to make that optional in some places). Can't we leave it out of GB->US and use US->GB to populate that part of it? We've got:
Do we not potentially need four files to cover all cases? We can use the 1:many part to drive most of it, but I think we still need some way of skipping words that might seem to be reversible. And in the case of chips/crisps/fries, we'd potentially need a non-sorted dictionary, so that we don't do crisps->chips, chips->fries type things.
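A small sketch of detecting such chains. It assumes the dictionary is held as a Python dict of word -> list of corrections, and `find_chains` is a hypothetical helper. Anything it reports would need manual handling, for example via the skipping mechanism suggested above:

```python
def find_chains(dictionary):
    """Return (word, fix) pairs whose correction is itself a dictionary key.

    Applying such an entry produces a word that would be flagged again on
    the next check, e.g. crisps -> chips, then chips -> fries.
    """
    return [
        (word, fix)
        for word, fixes in dictionary.items()
        for fix in fixes
        if fix in dictionary
    ]


gb_to_us = {"crisps": ["chips"], "chips": ["fries"]}
print(find_chains(gb_to_us))  # [('crisps', 'chips')]
```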
And "fries" in US are "chips" in UK, but "French fries" are present in both UK and US? 🍟 EDIT: And I think "chocolate chip cookies" are the same in both? 🍪 And we definitely wouldn't want to auto-translate "silicon chips" to "silicon crisps" 😆 😋 Given how context-sensitive all of this is, maybe there's not actually much we can do? 😕
In en-GB_to_en-US, you can have:
Please find a script which auto-generates both en-GB_to_en-US and en-US_to_en-GB dictionaries using SCOWL VarCon: #1917
Here's how I'd do it: extend the format to use For example:
(Note the "one-way" conversion of
Yes. In international/traditional/proper English,
Do we force this to be only one correction and then provide a function to reverse this dictionary for converting the other way too?
Originally posted by @peternewman in #1142
Maybe we can do this sort of thing someday, but for now I think it makes sense just to have the gb-to-us dict, it's not enabled by default, and if you enable it then you probably want the conversions to be done
Originally posted by @larsoner in #1142