Should fix and extend the pluralization rules (WAS: Consider using built in .NET for underlying pluralization) #142
Comments
Thanks. I seriously considered using that for singularization and pluralization, but as you said, I really didn't want to depend on a different package, particularly one so commonly disliked by the community. One of the reasons I don't want to depend on another package is that Humanizer has an aggressive release cycle: new features and patches are released on average every week. So if a bug is found in anything, including the pluralization (which is admittedly a bit unlikely), I want to be able to turn it around very quickly and not have to wait for the release of a third-party library. This way I can also add new features more freely. FWIW this feature has been there for a few months now and I haven't received any issues on it, so I think it should be good. Do you have an example of this working incorrectly?
Sure, it works for most words, but words which pluralize irregularly are hit and miss. I don't blame you for avoiding Entity of course. Here are a few I discovered while checking some irregular words.
These aren't the most common words in application development, but I like to make sure I don't have to double-check whether a word will work when using a function. Thanks for your attention.
Oopsi! That's quite a few for a quick check! D'oh. Thanks for your effort. Perhaps an unfair question, but do you think the issues may be mostly around irregular verbs? Because that should be relatively easy to fix.
No worries. I went directly to irregular nouns to test, so I'm not sure if the issues are specific to these or if there are deeper problems. Even if there are other issues, fixing the irregular ones is as simple as adding them to the irregular word dictionary in the code... it's a step in the right direction. I wish there was a place to get dictionaries of nicely formatted data to test against. (Is there?)
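To make the "irregular word dictionary" idea concrete, here is a minimal, language-neutral sketch in Python. It is not Humanizer's actual code: the table entries, the rule list, and the name `pluralize` are all assumptions chosen for illustration. The point is only that irregular words are looked up first and the general suffix rules are a fallback.

```python
import re

# Hypothetical irregular-noun table; the entries are illustrative only.
IRREGULAR_PLURALS = {
    "man": "men",
    "child": "children",
    "mouse": "mice",
    "tooth": "teeth",
}

# Ordered (pattern, replacement) suffix rules; the first match wins.
# These three rules are a deliberately tiny assumption, not a full ruleset.
SUFFIX_RULES = [
    (re.compile(r"(s|x|z|ch|sh)$"), r"\1es"),   # box -> boxes
    (re.compile(r"([^aeiou])y$"), r"\1ies"),    # city -> cities
    (re.compile(r"$"), "s"),                    # cat -> cats
]


def pluralize(word: str) -> str:
    """Return a plural for `word`, checking the irregular table first."""
    lower = word.lower()
    if lower in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[lower]
    for pattern, replacement in SUFFIX_RULES:
        if pattern.search(lower):
            return pattern.sub(replacement, lower, count=1)
    return lower
```

With this shape, fixing a newly reported irregular word is a one-line addition to the table and leaves the suffix rules untouched.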
Thanks. Yeah, I think we should at least do that. Wanna send me a PR for it? :p hehehe, getting a dictionary and iterating over it with Humanizer is a good idea :) If you find that solution we could also complete an abandoned kick-arse feature in Humanizer.
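Iterating a word list against a pluralizer could look roughly like the sketch below. It is written in Python purely as an illustration; the CSV column names (`singular`, `plural`), the sample rows, and the deliberately naive `naive_pluralize` stand-in are assumptions, not part of Humanizer or of any dataset mentioned in this thread.

```python
import csv
import io
from typing import Callable, List, Tuple


def load_pairs(text: str) -> List[Tuple[str, str]]:
    """Parse CSV text with assumed 'singular' and 'plural' columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["singular"], row["plural"]) for row in reader]


def find_mismatches(
    pairs: List[Tuple[str, str]],
    pluralize: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (singular, expected, actual) for every row that disagrees."""
    mismatches = []
    for singular, expected in pairs:
        actual = pluralize(singular)
        if actual != expected:
            mismatches.append((singular, expected, actual))
    return mismatches


def naive_pluralize(word: str) -> str:
    """Stand-in pluralizer that just appends 's' (intentionally wrong often)."""
    return word + "s"


# Tiny made-up dataset, only to show the shape of the input.
SAMPLE_CSV = "singular,plural\ncat,cats\nbox,boxes\nman,men\n"

if __name__ == "__main__":
    for singular, expected, actual in find_mismatches(
        load_pairs(SAMPLE_CSV), naive_pluralize
    ):
        print(f"{singular}: expected {expected!r}, got {actual!r}")
```

Swapping `naive_pluralize` for the real function under test turns the same loop into a regression check over whatever dictionary is available.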
There are some errors in the table you provided. "memorandum" => correct plurals are "memorandums" and "memoranda" |
Here's a CSV that I put together that merges the existing tests, the above table, and other cases that I came across when implementing this pluralization algorithm: https://gist.github.com/nemec/201f6e2b2af3a4390f0b Humanizer won't open in MonoDevelop (too lazy to boot into Windows), so I haven't yet tried running Humanizer against the dataset. Note that there are a couple of existing plural tests that are totally wrong, like virus, which have been corrected. When there was more than one valid plural option, I picked the one that made my matching code simpler -- I don't think I changed any of the existing tests, but the new ones I've added may not be picked up by Humanizer's current ruleset. The regex dictionary is also really awesome for double-checking your rules. I wanted to check my rule for "hoof" and other double-vowel-f's, so I plugged in
Thanks @nemec. I cannot be sure about these, to be honest. I see different things all over the place; e.g. for octopus (and there are some discussions around virus in there too) from Wikipedia:
I guess what we could do is provide a solid baseline for the rules by adding some of the missing ones and fixing the existing rules, and then open the API so users can plug in new entries or override the behavior. Thoughts?
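One possible shape for "a solid baseline plus user overrides" is sketched below in Python. Everything here is hypothetical: the `Vocabulary` class, its `add_irregular` method, and the one-rule baseline are invented for illustration and do not describe an existing Humanizer API.

```python
class Vocabulary:
    """Baseline rules plus a user-editable table that takes precedence."""

    def __init__(self) -> None:
        # User-supplied singular -> plural overrides (hypothetical API).
        self._overrides = {}

    def add_irregular(self, singular: str, plural: str) -> None:
        """Register or replace the plural used for `singular`."""
        self._overrides[singular.lower()] = plural

    def pluralize(self, word: str) -> str:
        lower = word.lower()
        override = self._overrides.get(lower)
        if override is not None:
            return override
        return self._baseline(lower)

    @staticmethod
    def _baseline(word: str) -> str:
        # Deliberately tiny baseline: '-es' after sibilants, otherwise '-s'.
        if word.endswith(("s", "x", "z", "ch", "sh")):
            return word + "es"
        return word + "s"


if __name__ == "__main__":
    vocab = Vocabulary()
    print(vocab.pluralize("octopus"))   # baseline rule applies
    vocab.add_irregular("octopus", "octopi")
    print(vocab.pluralize("octopus"))   # user override wins
```

Because overrides are consulted before the baseline, a user who prefers a disputed form can opt in without the library having to pick a side.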
You could add support for returning a list of matches in cases where multiple alternatives are equally viable, or maybe rank the forms based on Google N-gram hits, but what we really need to do is ban most existing languages and forbid people from speaking languages that don't fit a context-free grammar ;) For the opposite direction, plural -> singular, it would be cool if it was comprehensive enough to accept all forms, even disputed ones, but that may not be feasible. Plugins for new entries are definitely a great idea. Giving users the ability to extend it with regional dialects (you -> y'all, for example) or things like pop culture could make the library feel more intelligent, even if those additions aren't strictly necessary.
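A rough Python sketch of both directions follows: returning every accepted plural, and singularizing any of them, disputed or not. The table, the function names, and the ordering of the alternatives are assumptions made for illustration; in particular the order is hand-picked here and is not derived from N-gram counts.

```python
from typing import Dict, List

# Hypothetical table: each singular maps to its accepted plurals,
# listed in an arbitrary, hand-picked preference order.
PLURAL_FORMS: Dict[str, List[str]] = {
    "memorandum": ["memorandums", "memoranda"],
    "octopus": ["octopuses", "octopi"],
}

# Reverse index so that every listed plural maps back to its singular.
SINGULAR_OF: Dict[str, str] = {
    plural: singular
    for singular, forms in PLURAL_FORMS.items()
    for plural in forms
}


def plural_candidates(word: str) -> List[str]:
    """Return all known plurals, or a single naive '-s' guess."""
    lower = word.lower()
    return list(PLURAL_FORMS.get(lower, [lower + "s"]))


def singularize(word: str) -> str:
    """Accept any listed plural form; otherwise strip a trailing 's'."""
    lower = word.lower()
    if lower in SINGULAR_OF:
        return SINGULAR_OF[lower]
    return lower[:-1] if lower.endswith("s") else lower
```

A caller that wants one answer can take the first candidate, while a caller that wants to validate user input can check membership in the whole list.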
This is not possible as it would be a huge breaking change which not only impacts these methods but also things like
Good idea, although the mapping might get a bit complicated. Definitely worth considering. Show me the PR :)
Seems there is nothing to action here.
The built-in .NET PluralizationService has had more work put into correctly pluralizing most of English. Please consider using it as the service behind the Pluralize feature of Humanizer.
http://referencesource.microsoft.com/#System.Data.Entity.Design/Entity/Design/PluralizationService/EnglishPluralizationService.cs
I realize it is in an assembly you probably do not want to require as a dependency, but perhaps some notes can be taken from it.