.find() method to retrieve all the group of diacritics from a specific char #20
What about just exporting …?
Exporting … serves the same goal as the `remove` method: instead of just having the list, you created a method to help. So I thought it could be good to have this helper in this package, but it's OK if you don't agree. Do you think you will update it to export the …
I'll have to defer to @andrewrk on this, but in my own opinion, I have to admit I don't really understand what the function is supposed to be used for. In particular, you lose some information when you concatenate … Can you give more information on the use case for this function?
To make a diacritic-insensitive … By having the group of diacritics, I can easily create a …
Did you mean … Isn't there a problem with multi-char diacritics like …
How about this function:

```javascript
// replacementList: the package's list of { base, chars } groups
function charToRegexPattern(chr) {
  for (var i = 0; i < replacementList.length; i++) {
    var replacement = replacementList[i];
    if (replacement.chars.indexOf(chr) === -1) continue;
    if (replacement.base.length > 1) {
      // allow the complete multi-char sequence or a literal diacritic character
      return '(?:' + replacement.base + '|[' + replacement.chars + '])';
    } else {
      // allow the ascii char or a literal diacritic character
      return '[' + replacement.base + replacement.chars + ']';
    }
  }
  // either already ascii or not a diacritic char;
  // escape regex metacharacters so that e.g. '.' is matched literally
  return chr.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
```

It's arguably less "general purpose", since it returns strings formatted for a regex, but I think it's the only way to make it actually work for multi-char sequences like "ae".
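As a rough sketch of how `charToRegexPattern` could be applied to a whole string, the snippet below uses a tiny hand-written stand-in for `replacementList` (only the `a`, `ae` and `c` groups, with shortened `chars` strings; the real list is much larger). The stand-in data, `escapeRegex` and `buildPattern` are hypothetical names for illustration only.

```javascript
// Hypothetical stand-in for the package's replacementList:
// same { base, chars } shape, but only a few shortened groups.
const replacementList = [
  { base: 'a', chars: 'àáâãäå' },
  { base: 'ae', chars: 'æǽǣ' },
  { base: 'c', chars: 'çćĉċč' },
];

// Escape characters that have a special meaning in a regular expression.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function charToRegexPattern(chr) {
  for (const replacement of replacementList) {
    if (replacement.chars.indexOf(chr) === -1) continue;
    if (replacement.base.length > 1) {
      // the complete multi-char sequence, or a literal diacritic character
      return '(?:' + replacement.base + '|[' + replacement.chars + '])';
    }
    // the ASCII char, or a literal diacritic character
    return '[' + replacement.base + replacement.chars + ']';
  }
  // already ASCII or not a diacritic: match it literally
  return escapeRegex(chr);
}

// Hypothetical helper: one case-insensitive pattern for a whole string.
// Array.from iterates by code point rather than by UTF-16 code unit.
function buildPattern(str) {
  return new RegExp(Array.from(str, charToRegexPattern).join(''), 'i');
}

const pattern = buildPattern('açãoae1æ');
console.log(pattern.source);
console.log('acaoae1ae'.match(pattern)[0]);
```

Because the multi-char base is emitted as an alternation rather than inside a character class, the pattern built from `'açãoae1æ'` matches all of `'acaoae1ae'`, including the trailing `e`.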
Yes @thejoshwolfe, I meant exactly what you wrote for the first RegExp; I just kept it short to give you a simple example. With the version I wrote, I would use it in a case like this:

```javascript
function toRegExp(str) {
  return new RegExp(str.split('').map(chr => `[${diacritics.find(chr) || chr}]`).join(''), 'gi');
}

let str = 'acaoae1ae';
let strDiacritic = 'açãoae1æ';

// RegExp will be: /[aⓐaẚàáâầấẫẩãāăằắẵẳȧǡäǟảåǻǎȁȃạậặḁąⱥɐɑ][ccⓒćĉċčçḉƈȼꜿↄ][aⓐaẚàáâầấẫẩãāăằắẵẳȧǡäǟảåǻǎȁȃạậặḁąⱥɐɑ][oⓞoòóôồốỗổõṍȭṏōṑṓŏȯȱöȫỏőǒȍȏơờớỡởợọộǫǭøǿꝋꝍɵɔᴑ][aⓐaẚàáâầấẫẩãāăằắẵẳȧǡäǟảåǻǎȁȃạậặḁąⱥɐɑ][eⓔeèéêềếễểẽēḕḗĕėëẻěȅȇẹệȩḝęḙḛɇǝ][1][aeæǽǣ]/gi

// And "str" will match "strDiacritic" (up to "acaoae1a": [aeæǽǣ] matches one character)
str.match(toRegExp(strDiacritic));
```

See that the expected input can be a diacritic, or a … And yes, this version does not handle the input of a diacritic of length > 1.
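To make the length > 1 limitation concrete, here is a minimal sketch of a `find(chr)` along the lines proposed in this issue, assuming it returns the group's `base` followed by its `chars` (which is what the generated RegExp above suggests). The standalone `find` function, its body, and the shortened stand-in list are hypothetical; they are not the package's API or the implementation mentioned in the issue.

```javascript
// Hypothetical stand-in for the package's replacementList (shortened groups).
const replacementList = [
  { base: 'a', chars: 'àáâãäå' },
  { base: 'ae', chars: 'æǽǣ' },
  { base: 'c', chars: 'çćĉċč' },
];

// Hypothetical find(): return the base plus every diacritic of the group
// that contains `chr`, or undefined when `chr` belongs to no group.
function find(chr) {
  for (const { base, chars } of replacementList) {
    if (base === chr || chars.includes(chr)) return base + chars;
  }
  return undefined;
}

// One character class per input character (regex escaping is not handled).
function toRegExp(str) {
  return new RegExp(str.split('').map((chr) => `[${find(chr) || chr}]`).join(''), 'gi');
}

console.log(toRegExp('açãoae1æ').source);
console.log('acaoae1ae'.match(toRegExp('açãoae1æ')));
```

Because `find('æ')` yields `'aeæǽǣ'` and that string ends up inside a character class, the last class matches a single character, so the match stops at `acaoae1a` and the final `e` of `str` is left out. This is the information lost by concatenating `base` and `chars` that was raised earlier in the thread.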
Hi @andrewrk,
What do you think about this? Right now I'm facing a case where I need the group of all possible diacritics for a specific character. I remembered your great list of diacritics, and since your package is named 'diacritics' and not something like 'remove-diacritics', I thought it would be better to extend it with one more method instead of creating another package.
I already created the new method:
If you think it is OK, I can send you a pull request.