Internet.UserName ru locale return numeric characters only #225
Comments
Hi @Orygeunik,

Thanks for your question. I think this is due to issue #86. The reason is that people complained that Bogus generated emails with diacritics. So, Bogus would have generated an email address like

IIRC, glyphs other than ASCII are technically valid in email addresses, but people's systems don't pass validation if an address contains non-ASCII characters.

The issue here is really because

Bogus/Source/Bogus/DataSets/Internet.cs Lines 51 to 56 in f4cd220
Lines 15 to 19 in f4cd220
Bogus/Source/Bogus/Extensions/ExtensionsForString.cs Lines 33 to 53 in f4cd220

As you can see, the call stack ultimately hits

I guess this boils down to the following question: is

For now, there are a few workarounds, which basically extend Bogus with your own custom extension methods:

void Main()
{
    var faker = new UserInfoFaker();
    faker.Generate(10).Dump(); // Dump() is LINQPad's output helper
}

public class UserInfoFaker : Faker<UserInfo>
{
    public UserInfoFaker() : base("ru")
    {
        RuleFor(ui => ui.FirstName, f => f.Person.FirstName);
        RuleFor(ui => ui.LastName, f => f.Person.LastName);
        RuleFor(ui => ui.UserLogin, f => f.UserName2());
        RuleFor(ui => ui.UserLogin3, (f, ui) => f.UserName3(ui.FirstName, ui.LastName));
        RuleFor(ui => ui.UserPassword, f => f.Internet.Password());
    }
}

public class UserInfo
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string UserLogin { get; set; }
    public string UserLogin3 { get; set; }
    public string UserPassword { get; set; }
}

public static class CustomExtensions
{
    // Workaround 1: build the user name from English-locale names.
    public static string UserName2(this Faker f)
    {
        var en = f.Name["en"];
        return f.Internet.UserName(en.FirstName(), en.LastName());
    }

    // Workaround 2: build the user name directly from the given names,
    // keeping their original (non-ASCII) characters.
    public static string UserName3(this Faker f, string firstName, string lastName)
    {
        var val = f.Random.Number(2); // selects one of the three formats below
        string result;
        if (val == 0)
        {
            result = firstName + f.Random.Number(99);
        }
        else if (val == 1)
        {
            result = firstName + f.Random.ArrayElement(new[] { ".", "_" }) + lastName;
        }
        else
        {
            result = firstName + f.Random.ArrayElement(new[] { ".", "_" }) + lastName + f.Random.Number(99);
        }
        result = result.Replace(" ", string.Empty);
        return result;
    }
}

I think there's possibly some work we could do here to make it a little easier if you want to keep the diacritics. Perhaps a parameter like

Let me know what you think.

Thanks,
💨 🚶 "Bubbles of gas in my brain... Send me off balance, it's not enough"
It's good. I think a second solution would be to allow non-Latin characters to be translated into Latin (maybe an enum option: remove, translit, other?). For example

It's a simple and modular way.

Also, why is it that if you set the locale in the .ctor body (as in the example above), logins/mails are generated as if the locale were "en" (the default)?
Additional question.
Hi @Orygeunik,

I just learned that the process of translating Unicode characters to US-ASCII Latin/Roman characters is called "transliteration". Knowing is half the battle. Lol. 😃

I'll see what I can do to make Bogus better in this respect. To be honest, the issue described here is one I never really put to rest, and I had a feeling it was going to come up again. So I think it's finally time to put this issue to rest once and for all. If the community has more input or anyone can offer more insight, please let me know.

As far as I can tell from a quick Google search, projects that specifically 'solve' "transliteration" are linked below:

https://github.com/pid/speakingurl

If anyone has experience using them (or with transliteration in general), please let me know.

As for password generation with Cyrillic letters, I don't think Bogus will change the password generation algorithm at the moment. But I do get what you're saying, it would be nice if Bogus switched algorithms when

void Main()
{
    var faker = new UserInfoFaker();
    faker.Generate(10).Dump(); // Dump() is LINQPad's output helper
}

public class UserInfoFaker : Faker<UserInfo>
{
    public UserInfoFaker() : base("ru")
    {
        RuleFor(ui => ui.FirstName, f => f.Person.FirstName);
        RuleFor(ui => ui.LastName, f => f.Person.LastName);
        RuleFor(ui => ui.UserPassword, f => f.Internet.RuPassword());
    }
}

public class UserInfo
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string UserPassword { get; set; }
}

public static class CustomExtensions
{
    // Upper- and lower-case Cyrillic letters used as the password alphabet.
    private static readonly char[] RuChars = "АаБбВвГгДдЕеЁёЖжЗзИиЙйКкЛлМмНнОоПпРрСсТтУуФфХхЦцЧчШшЩщЪъЫыЬьЭэЮюЯя".ToCharArray();

    // Custom password generator; default length is 8 to 10 characters.
    public static string RuPassword(this Bogus.DataSets.Internet i, int? len = null)
    {
        var length = len ?? i.Random.Number(8, 10);
        var picked = i.Random.ArrayElements(RuChars, length);
        return new string(picked);
    }
}
Also, to answer your question:

I don't think you can set the

public class UserInfoFaker : Faker<UserInfo>
{
    public UserInfoFaker()// : base("ru")
    {
        // Locale = "ru"; doesn't work here.
        this.FakerHub = new Faker("ru");
        RuleFor(ui => ui.UserLogin, f => f.Internet.UserName());
        RuleFor(ui => ui.UserPassword, f => f.Internet.Password());
    }
}

But this still won't solve your original problem.
Each country has its own rules of transliteration (транслит). For Russian letters, you can quickly google a simple table (which at first will be more than enough).

Typically, the transliteration rules are the ones used when issuing a foreign passport.

Additionally.
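A table lookup of the kind described above could be sketched as follows. This is only an illustration: the `SimpleTranslit` class is hypothetical and not part of Bogus, and the mapping follows one simplified Russian-to-Latin convention chosen for the example. Official passport tables differ in some letters.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Bogus.
public static class SimpleTranslit
{
    // Illustrative, simplified mapping for lower-case Russian letters.
    // The hard and soft signs are dropped (mapped to an empty string).
    private static readonly Dictionary<char, string> Table = new Dictionary<char, string>
    {
        ['а'] = "a", ['б'] = "b", ['в'] = "v", ['г'] = "g", ['д'] = "d", ['е'] = "e",
        ['ё'] = "e", ['ж'] = "zh", ['з'] = "z", ['и'] = "i", ['й'] = "y", ['к'] = "k",
        ['л'] = "l", ['м'] = "m", ['н'] = "n", ['о'] = "o", ['п'] = "p", ['р'] = "r",
        ['с'] = "s", ['т'] = "t", ['у'] = "u", ['ф'] = "f", ['х'] = "kh", ['ц'] = "ts",
        ['ч'] = "ch", ['ш'] = "sh", ['щ'] = "shch", ['ъ'] = "", ['ы'] = "y", ['ь'] = "",
        ['э'] = "e", ['ю'] = "yu", ['я'] = "ya"
    };

    public static string Transliterate(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (var c in input)
        {
            // Look up the lower-case form; unmapped characters pass through unchanged.
            if (!Table.TryGetValue(char.ToLowerInvariant(c), out var latin))
            {
                sb.Append(c);
                continue;
            }

            // Preserve capitalization by upper-casing the first Latin letter.
            if (char.IsUpper(c) && latin.Length > 0)
            {
                sb.Append(char.ToUpperInvariant(latin[0]));
                sb.Append(latin.Substring(1));
            }
            else
            {
                sb.Append(latin);
            }
        }
        return sb.ToString();
    }
}
```

With this table, `SimpleTranslit.Transliterate("Иван Петров")` gives `"Ivan Petrov"`. A per-character table like this cannot express context-dependent rules, which is where the language-specific difficulties discussed in this thread come in.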
Hi @Orygeunik, @bchavez.

Accidentally found this conversation. The idea of having transliteration in Bogus is very cool! However, I don't think it is possible for every language. Actually, Russian could be one of the easiest cases. While it requires a simple table (a dictionary) to look up letters/syllables, some languages are not so easily transliterated. For example, as far as I know, kanji (hieroglyphs) in Japanese can be quite polyphonic, changing their sound depending on the context they are used in. This particular problem is also mentioned in this issue of the Slugify Elixir library.

What I want to say is that if you decide to implement this feature, you will face the need to assemble multiple transliteration libraries into one, which will definitely require considerable effort and may not be a complete and desirable solution to the problem.
Hi Arseni,

Thank you for the feedback and insights. I really appreciate it. I don't have much experience in this area, so anything helps! I think you are right about not being able to solve the problem 100% for all locales. My hope is we can cover a good majority of them; like

My first implementation attempt used several massive dictionary-like character replacements to perform transliteration. It was a straight-up port of speakingurl's algorithm to C#. The C# implementation was very ugly and brittle, but it worked. Ultimately, though, I wasn't happy with it.

My second implementation attempt uses a Trie data structure for character replacements. I just got the basic algorithm working successfully last night. IMHO, it is a big improvement over speakingurl and is more aligned with what I had in mind. The implementation is quite elegant too. Since we're using a Trie, you can "probe ahead" and replace chunks of the input string. For example,

It should also work with locale-specific transliterations, where the same character can have two different transliterations depending on the locale you're using. I.e.:

A lot of these libraries "slugify" text in the middle of processing characters, which tends to make understanding and porting algorithms like speakingurl's difficult and overly complex. In Bogus, I want transliteration and slugifying to be two separate and distinct operations. When both operations are separate, the algorithms tend to be more elegant and easier to understand and maintain.

I still have more work and experiments to do. Also, I still need to do more work understanding how other libraries (like the Elixir library you pointed out) solve the same problem. Hopefully, at the end of this, we'll have a half-way decent implementation for Bogus. :)

Again, I want to thank everyone for their input and feedback. It is immensely helpful when the people giving feedback are looking ahead with more experience.
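The "probe ahead" idea can be sketched with a small trie that always applies the longest matching replacement. This is a minimal illustration under assumptions, not the code that shipped in Bogus: the `TranslitTrie` class, its `Add` and `Transliterate` methods, and the sample mappings in the note below are all hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch, not the Bogus implementation.
public sealed class TranslitTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public string Replacement; // null when no mapping ends at this node
    }

    private readonly Node root = new Node();

    // Registers a mapping from a source chunk (one or more characters) to its replacement.
    public void Add(string source, string replacement)
    {
        var node = root;
        foreach (var c in source)
        {
            if (!node.Children.TryGetValue(c, out var next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.Replacement = replacement;
    }

    // Replaces the longest matching chunk at each position; unmapped characters pass through.
    public string Transliterate(string input)
    {
        var sb = new StringBuilder(input.Length);
        var i = 0;
        while (i < input.Length)
        {
            var node = root;
            string best = null;
            var bestLength = 0;

            // Probe ahead while the trie still has a path for the next character.
            for (var j = i; j < input.Length; j++)
            {
                if (!node.Children.TryGetValue(input[j], out var next)) break;
                node = next;
                if (node.Replacement != null)
                {
                    best = node.Replacement;
                    bestLength = j - i + 1;
                }
            }

            if (best == null)
            {
                sb.Append(input[i]);
                i++;
            }
            else
            {
                sb.Append(best);
                i += bestLength;
            }
        }
        return sb.ToString();
    }
}
```

As a usage example with made-up mappings: after adding `"к" → "k"`, `"с" → "s"`, and the two-character chunk `"кс" → "x"` (plus single letters for `м`, `а`, `и`, `о`, `т`), the input `"максим"` becomes `"maxim"` because the longer chunk wins, while `"кот"` still becomes `"kot"`. Locale-specific behavior could be handled by building one trie per locale and adding that locale's overrides last, since a later `Add` for the same source replaces the earlier mapping.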
Fixes #225. Transliteration using Trie data-structure. ❤️
Hi @Orygeunik, @rynkevich,

Basic transliteration support is now available in Bogus v28.0.1. The

Additionally,

void Main()
{
    Enumerable.Range(1, 10)
        .Select(_ => new Person("ru"))
        .Select(p => new { FullName = p.FullName, Email = p.Email, UserName = p.UserName })
        .Dump();
}

Additionally, new method

Hope it works out well.

Thanks,
Situation
I wrote the following code (in C#):
In another place, I called this code:
The resulting array contained:
If I change the code to:
The resulting array contained:
Another way. I wrote the following code:
The resulting array contained:
Btw, the Russian full name is correct :)
Why are UserName and Email not generated with the Russian locale ("ru")?