Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Titlelize returns empty if text contains no known letters #385

Open
alowdon opened this issue Feb 24, 2015 · 21 comments
Open

Titlelize returns empty if text contains no known letters #385

alowdon opened this issue Feb 24, 2015 · 21 comments
Labels

Comments

@alowdon
Copy link

alowdon commented Feb 24, 2015

This might be by design, because this is not necessarily appropriate input, but if you try the following, for example:

"Майк".Titleize()
"@@".Titleize()

then it will throw an InvalidOperationException:

System.InvalidOperationException : Sequence contains no elements
   at System.Linq.Enumerable.Aggregate(IEnumerable`1 source, Func`3 func)
   at Humanizer.StringHumanizeExtensions.FromPascalCase(String input) in c:\TeamCity\buildAgent\work\8b6380f423671986\src\Humanizer\StringHumanizeExtensions.cs: line 32
   at Humanizer.StringHumanizeExtensions.Humanize(String input) in c:\TeamCity\buildAgent\work\8b6380f423671986\src\Humanizer\StringHumanizeExtensions.cs: line 64
   at Humanizer.StringHumanizeExtensions.Humanize(String input, LetterCasing casing) in c:\TeamCity\buildAgent\work\8b6380f423671986\src\Humanizer\StringHumanizeExtensions.cs: line 75
   at Humanizer.InflectorExtensions.Titleize(String input) in c:\TeamCity\buildAgent\work\8b6380f423671986\src\Humanizer\InflectorExtensions.cs: line 269

I would expect it to either throw a more relevant exception, or just to return the original input?

(P.S. Humanizer is a great library - thanks for producing and maintaining it!)

@alowdon alowdon changed the title Titlelize throws InvalidOperationException thrown if text contains no known letters Titlelize throws InvalidOperationException if text contains no known letters Feb 24, 2015
@hazzik hazzik added the bug label Feb 24, 2015
@MehdiK
Copy link
Member

MehdiK commented Feb 24, 2015

Thanks for reporting this Adrian.

This is not by design; it's a bug.

Did you want to send a PR? :)
On 24/02/2015 11:12 PM, "Adrian Lowdon" [email protected] wrote:

This might be by design, because this is not necessarily appropriate
input, but if you try the following, for example:

"Майк".Titleize()
"@@".Titleize()

then it will throw an InvalidOperationException:

System.InvalidOperationException : Sequence contains no elements
at System.Linq.Enumerable.Aggregate(IEnumerable1 source, Func3 func)
at Humanizer.StringHumanizeExtensions.FromPascalCase(String input) in c:\TeamCity\buildAgent\work\8b6380f423671986\src\Humanizer\StringHumanizeExtensions.cs: line 32
at Humanizer.StringHumanizeExtensions.Humanize(String input) in c:\TeamCity\buildAgent\work\8b6380f423671986\src\Humanizer\StringHumanizeExtensions.cs: line 64
at Humanizer.StringHumanizeExtensions.Humanize(String input, LetterCasing casing) in c:\TeamCity\buildAgent\work\8b6380f423671986\src\Humanizer\StringHumanizeExtensions.cs: line 75
at Humanizer.InflectorExtensions.Titleize(String input) in c:\TeamCity\buildAgent\work\8b6380f423671986\src\Humanizer\InflectorExtensions.cs: line 269

I would expect it to throw either a more relevant exception, or just to
return the original input?

(P.S. Humanizer is a great library - thanks for producing and maintaining
it!)


Reply to this email directly or view it on GitHub
#385.

@alowdon
Copy link
Author

alowdon commented Feb 28, 2015

Thanks Mehdi - if I get chance I'll try to have a look!

@cusmonaut
Copy link

Hello, I am unable to reproduce the InvalidOperationException even with the examples given it just returns an empty string. Is this still an issue and if so is returning an empty string the desired behavior?

@alowdon
Copy link
Author

alowdon commented Mar 9, 2015

I've just updated to v1.34.0 - it no longer throws the exception, but I'm not convinced that returning an empty string is the desired behaviour, personally I would expect it to return the original string if it can't process it?

@cusmonaut
Copy link

I agree, returning an empty string feels like data lose. I think we should always try to save as much information as we can in a situation like this.

@MehdiK MehdiK changed the title Titlelize throws InvalidOperationException if text contains no known letters Titlelize returns empty if text contains no known letters Mar 10, 2015
@MehdiK
Copy link
Member

MehdiK commented Mar 10, 2015

Thanks for double checking this @alowdon. I have changed the title of the issue accordingly.

I agree that this is not desired behavior; but we will have to maintain this behavior for the existing users as what might look like a bug to us might be an expected behavior to them and I don't want to introduce a bug for some users by fixing a bug for others.

I have flagged this to be fixed as part of V2 which will include breaking changes. Until then you could wrap this with your own API and make it behave!

@MehdiK MehdiK added this to the V2 milestone Mar 10, 2015
@cusmonaut
Copy link

@MehdiK If I were to fix this bug should I just do a pull request and comment it as for V2? Or is there a repo for V2?

@MehdiK
Copy link
Member

MehdiK commented Apr 5, 2015

@cusmonaut a separate repo would diverge from this one very quickly and so does a PR that stays here for a while.

The only way to have this now would be to add an optional parameter to the method that indicates what should happen to special and non-ascii characters defaulting to the current approach. Something like:

public static class Titlize
{
  public static ITitlizeStrategy DropSpecialChars = new TitlizeDropSpecialChars();
}

string Titlize(this string input, ITitlizeStrategy strategy = Titlize.DropSpecialChars)

@bahadirarslan
Copy link

Hi,

Titlize have some problems with non standart chars. For example there dotless and dotted i letter and this cause some problems.

Consider a text which contains dotless small i letter in it, like "kadın sesi" (which means "woman voice" in english btw)
So if you titlize it with Humanizer "kadın sesi".Titlize() the result will be like this "Kad N Sesi"
Or if you titlize word "Yönetici", which means manager in english", the result will be like "Netici"

@MehdiK
Copy link
Member

MehdiK commented Apr 17, 2015

On a second thought, due to this bug, I don't think anyone's using Titlize for non-English words. So it should be safe to fix this as part of V1 with no backward compatibility.

@MehdiK MehdiK removed this from the V2 milestone Apr 17, 2015
@jeremysimmons
Copy link
Contributor

This bug is in StringHumanizeExtensions:FromPascalCase(string input) string
PascalCaseWordPartsRegex causes strings like "Майк" and "@@" to become whitespace.
Did you ever experiment with TextInfo.ToTitleCase?

@chris03
Copy link

chris03 commented Aug 31, 2015

I don't think anyone's using Titlize for non-English words

I'm using Humanize(LetterCasing.Title) to format the name of our clients and a lot of them have french names with accents (i.e. Stéphanie, François). It's kind of lame that these simple characters with accents disappear when they go trough Humanizer.

I found a workaround by calling Humanizer.To.TitleCase.Transform(""); I get the expected output.

@MehdiK
Copy link
Member

MehdiK commented Sep 2, 2015

Thanks for the input @chris03. That's actually a perfect use case and proves my assumption completely wrong.

Again I think we should fix the bug. We accept PRs too :)

@clairernovotny
Copy link
Member

@jeremysimmons Any chance you could take a stab at fixing that RegEx?

@jeremysimmons
Copy link
Contributor

@onovotny I'll take a peek. Thank you for asking. Why aren't we just using TextInfo.ToTitleCase http://referencesource.microsoft.com/#mscorlib/system/globalization/textinfo.cs,706 for this?
It works on all the test cases mentioned above...
@@ -> @@
kadın sesi -> Kadın Sesi
yönetici -> Yönetici
stéphanie -> Stéphanie
françois -> François
mайк -> Mайк

@jeremysimmons
Copy link
Contributor

@onovotny this is a portable assembly. TextInfo.ToTitleCase isn't available in portable assemblies.

@jeremysimmons
Copy link
Contributor

and. wow. that was rather painless.

old and busted.
image

new hotness.
image

too bad it breaks other unit tests...

@jeremysimmons
Copy link
Contributor

@onovotny this is a working regex you can plug into PascalCaseWordPartsRegex in StringHumanizeExtensions.

PascalCaseWordPartsRegex = new Regex(@"\p{Lu}?\p{Ll}+|[0-9]+|\p{Lu}+(?=\p{Lu}\p{Ll}|[0-9]|\b)",

Does not work with @@ -> @@, still produces @@ -> String.Empty.
Works with all the other cases though, and no failing unit tests.

This should work for users that have a language that characters outside basic Latin.

I would have done a pull request, but I'm having some issues getting the dev branch working locally.

@jeremysimmons
Copy link
Contributor

@MehdiK @onovotny Any opinion on whether the proposed regex above will work? Are we ok not supporting strings that are entirely symbols (@, ?, &, etc) for now?

@mexx
Copy link
Collaborator

mexx commented Jan 6, 2016

IMHO any unrecognized characters should be left as is.

@MehdiK
Copy link
Member

MehdiK commented Jan 6, 2016

I'm with @mexx.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants