Dictionary questions #1660

Open
lurch opened this issue Aug 30, 2020 · 6 comments

Comments

@lurch
Contributor

lurch commented Aug 30, 2020

I've not yet got my head around the way the multiple dictionaries work and/or fit together (some examples in README of what the different dictionaries are and how they're expected to be used would be helpful 😉 ), but...

  • why is crate->create in dictionary_rare.txt ? crate seems like a perfectly normal word to me (and is obviously used quite extensively in the Rust community).
  • why are
    isconnection->isconnected
    iscrated->iscreated
    
    in dictionary.txt ?

Maybe the latter should be added to #1624 @sebweb3r ?

@sebweb3r
Contributor

I'm open to more suggestions. But I would favor a "fix stuff 2.0" 😄
#1624 is already quite old/big.

@sebweb3r
Contributor

I think a meta-problem exists. #1660, #1469, #1468, #1275

Codespell wants to fix en-GB, en-US and coding slang.
At the same time, the words and suggestions should be checked against existing dictionaries.

Thus, some merge requests cannot be accepted (e.g. #1485, #1626), since their words are not in aspell.

And the naming of dictionary_code.txt contradicts the naming of the other dictionaries. It is basically dictionary_don't_check_code_with_this.txt.

@lurch
Contributor Author

lurch commented Sep 1, 2020

versionaddded->versionadded, verticlealign->verticalalign and viewtransfromation->viewtransformation also all seem like things that don't belong in the main dictionary? (and I'm sure there's probably many others, I've not checked the whole thing!!)

As well as dictionary_don't_check_code_with_this.txt maybe we also want a dictionary_only_check_code_with_this.txt ? 😉

@peternewman
Collaborator

> I've not yet got my head around the way the multiple dictionaries work and/or fit together (some examples in README of what the different dictionaries are and how they're expected to be used would be helpful 😉 ), but...

The --help explains quite a bit. If it's not sufficient then we should be improving that first.

> * why is `crate->create` in [dictionary_rare.txt](https://github.com/codespell-project/codespell/blob/master/codespell_lib/data/dictionary_rare.txt) ? `crate` seems like a perfectly normal word to me

Same as sting->string: it comes down to the chances of meaning one but typing the other. In other words, rare was somewhere for us to move existing corrections where the "misspelling" is actually a real word. As @sebweb3r mentioned, I've pondered whether we should split rare further based on presence in some level of aspell dictionary, or on some other arbitrary decision (but something like aspell means it can be automated).

> (and is obviously used quite extensively in the Rust community).

I wasn't personally aware of this, that might be an argument to remove it, or just for you to add it to the ignore list when using it.
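To make the ignore-list idea concrete, here is a minimal Python sketch. It is not codespell's actual implementation: `parse_entry` and `load_corrections` are hypothetical helpers, and the only thing taken from this thread is the `typo->correction` line format.

```python
def parse_entry(line):
    """Split a 'typo->fix1, fix2,' line into (typo, [fixes])."""
    typo, _, fixes = line.strip().partition("->")
    return typo, [f.strip() for f in fixes.split(",") if f.strip()]


def load_corrections(lines, ignore=()):
    """Build a typo -> fixes mapping, skipping any typo on the ignore list."""
    ignored = {word.lower() for word in ignore}
    corrections = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        typo, fixes = parse_entry(line)
        if typo.lower() in ignored:
            continue  # the user has said this is a real word for them
        corrections[typo] = fixes
    return corrections


# Two entries of the kind discussed above (illustrative data only).
rare = ["crate->create", "sting->string"]
active = load_corrections(rare, ignore=["crate"])
print(active)
```

With `crate` ignored, only the `sting` entry stays active, so a Rust project would no longer get `crate` flagged. On the command line the equivalent should be the `-L` / `--ignore-words-list` option.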

> * why are
>   ```
>   isconnection->isconnected
>   iscrated->iscreated
>   ```
>
> Maybe the latter should be added to #1624 @sebweb3r ?

I don't know, but I'd imagine they're from the Linux kernel or something. Adding a few more suggestions (spaced versions of the original and of the correction) would probably keep everyone happy.

> At the same time, the words and suggestions should be checked against existing dictionaries.
>
> Thus, some merge requests cannot be accepted (e.g. #1485, #1626), since their words are not in aspell.

We could turn off the aspell checking and accept them now, but then we're back to #1624. To me personally, having clean and accurate replacements is more important than chasing an impossible 100% coverage. Although, as I've mentioned elsewhere, I'd personally favour splitting out some of these dictionaries for words that either aren't in aspell at all, or aren't in the easy aspell dictionaries. That would leave only a handful of words that we need to go over with a fine-toothed comb when they're added, because aspell can't vouch for them.
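A rough sketch of how that split could be automated, under stated assumptions: `known_words` is a plain set standing in for an aspell word list (the real check would query aspell), `split_for_review` is a hypothetical helper, and checking the corrections rather than the misspellings is my reading of the proposal, not a settled design.

```python
def split_for_review(entries, known_words):
    """Partition entries by whether every suggested word is a known word.

    entries: mapping of typo -> list of suggested corrections.
    known_words: set of lowercase words (stand-in for an aspell lookup).
    """
    verified, needs_review = {}, {}
    for typo, fixes in entries.items():
        # A suggestion may contain spaces, so check each word separately.
        words = [w for fix in fixes for w in fix.lower().split()]
        if all(w in known_words for w in words):
            verified[typo] = fixes
        else:
            needs_review[typo] = fixes
    return verified, needs_review


# Illustrative data only; a real run would use the full dictionaries.
known_words = {"abandoned", "version", "added"}
entries = {
    "abandonned": ["abandoned"],
    "iscrated": ["iscreated"],
    "versionaddded": ["versionadded", "version added"],
}
verified, needs_review = split_for_review(entries, known_words)
print(sorted(verified))
print(sorted(needs_review))
```

Only the `needs_review` bucket would then need a manual look when entries are added; everything in `verified` is backed by the word list.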

> And the naming of dictionary_code.txt contradicts the naming of the other dictionaries. It is basically dictionary_don't_check_code_with_this.txt.

Which does it conflict with? names has names correcting to other words, rare has real words that are rare correcting to what the person is more likely to have typed.

> versionaddded->versionadded, verticlealign->verticalalign and viewtransfromation->viewtransformation also all seem like things that don't belong in the main dictionary? (and I'm sure there's probably many others, I've not checked the whole thing!!)

Aren't they just missing a correction to the one with a space in as well?
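For illustration, this is roughly what such an entry could look like once the spaced variant is added. `format_entry` is a hypothetical helper, and the trailing comma on multi-suggestion lines is an assumption based on how existing multi-suggestion entries appear to be written, so check it against the dictionary file before relying on it.

```python
def format_entry(typo, fixes):
    """Render one dictionary line from a typo and its suggestions.

    Assumes single-suggestion lines are 'typo->fix' and that lines with
    several suggestions are comma-separated with a trailing comma.
    """
    if len(fixes) == 1:
        return f"{typo}->{fixes[0]}"
    return f"{typo}->{', '.join(fixes)},"


single = format_entry("versionaddded", ["versionadded"])
spaced = format_entry("versionaddded", ["versionadded", "version added"])
print(single)
print(spaced)
```

The second line offers both the joined identifier and the plain-English phrase, so the same entry works for code and for prose.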

@peternewman
Collaborator

peternewman commented Sep 2, 2020

I'd probably agree actually: looking at grep.app, isconnection->isconnected should probably go (or suggest disconnection instead), whereas iscrated->iscreated appears to have some merit.

@lurch
Contributor Author

lurch commented Sep 2, 2020

I'm afraid I don't have time (right now) to get bogged down with discussing different dictionaries, and which corrections should go where, so I'll leave this as something for you and @sebweb3r and @larsoner to discuss 😉

luzpaz added a commit that referenced this issue Sep 9, 2024
DimitriPapadopoulos pushed a commit that referenced this issue Sep 9, 2024

3 participants