This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Feature request: Alias generation for Vietnamese #6010

Closed
tiennguyenpy opened this issue Jul 26, 2013 · 2 comments

@tiennguyenpy

Hi developers,
Problem:
The alias generator fails to recognize some Vietnamese characters. Tested with Contao 3.1.1.
Input 1
aAàÀảẢãÃáÁạẠăĂằẰẳẲẵẴắẮặẶâÂầẦẩẨẫẪấẤậẬbBcCdDđĐeEèÈẻẺẽẼéÉẹẸêÊềỀểỂễỄếẾệỆfFgGhHiIìÌỉỈĩĨíÍịỊjJkKlLmMnN
Output 1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbccddddeeeeeeeeeeeeeeeeeeeeeeeeffgghhiiiiiiiiiiiijjkkllmmnn
Input 2
oOòÒỏỎõÕóÓọỌôÔồỒổỔỗỖốỐộỘơƠờỜởỞỡỠớỚợỢpPqQrRsStTuUùÙủỦũŨúÚụỤưƯừỪửỬữỮứỨựỰvVwWxXyYỳỲỷỶỹỸýÝỵỴzZ
Output 2
ooooooooooooppqqrrssttuuuuuuuuuuvvwwxxyyyyyyzz
88 Vietnamese characters are not recognized; they map to an empty string in the following list:
'a' =>'a',
'A' =>'a',
'à' =>'a',
'À' =>'a',
'ả' =>'',
'Ả' =>'',
'ã' =>'a',
'Ã' =>'a',
'á' =>'a',
'Á' =>'a',
'ạ' =>'',
'Ạ' =>'',
'ă' =>'a',
'Ă' =>'a',
'ằ' =>'',
'Ằ' =>'',
'ẳ' =>'',
'Ẳ' =>'',
'ẵ' =>'',
'Ẵ' =>'',
'ắ' =>'',
'Ắ' =>'',
'ặ' =>'',
'Ặ' =>'',
'â' =>'a',
'Â' =>'a',
'ầ' =>'',
'Ầ' =>'',
'ẩ' =>'',
'Ẩ' =>'',
'ẫ' =>'',
'Ẫ' =>'',
'ấ' =>'',
'Ấ' =>'',
'ậ' =>'',
'Ậ' =>'',
'b' =>'b',
'B' =>'b',
'c' =>'c',
'C' =>'c',
'd' =>'d',
'D' =>'d',
'đ' =>'d',
'Đ' =>'d',
'e' =>'e',
'E' =>'e',
'è' =>'e',
'È' =>'e',
'ẻ' =>'',
'Ẻ' =>'',
'ẽ' =>'',
'Ẽ' =>'',
'é' =>'e',
'É' =>'e',
'ẹ' =>'',
'Ẹ' =>'',
'ê' =>'e',
'Ê' =>'e',
'ề' =>'',
'Ề' =>'',
'ể' =>'',
'Ể' =>'',
'ễ' =>'',
'Ễ' =>'',
'ế' =>'',
'Ế' =>'',
'ệ' =>'',
'Ệ' =>'',
'f' =>'f',
'F' =>'f',
'g' =>'g',
'G' =>'g',
'h' =>'h',
'H' =>'h',
'i' =>'i',
'I' =>'i',
'ì' =>'i',
'Ì' =>'i',
'ỉ' =>'',
'Ỉ' =>'',
'ĩ' =>'i',
'Ĩ' =>'i',
'í' =>'i',
'Í' =>'i',
'ị' =>'',
'Ị' =>'',
'j' =>'j',
'J' =>'j',
'k' =>'k',
'K' =>'k',
'l' =>'l',
'L' =>'l',
'm' =>'m',
'M' =>'m',
'n' =>'n',
'N' =>'n',
'o' =>'o',
'O' =>'o',
'ò' =>'o',
'Ò' =>'o',
'ỏ' =>'',
'Ỏ' =>'',
'õ' =>'o',
'Õ' =>'o',
'ó' =>'o',
'Ó' =>'o',
'ọ' =>'',
'Ọ' =>'',
'ô' =>'o',
'Ô' =>'o',
'ồ' =>'',
'Ồ' =>'',
'ổ' =>'',
'Ổ' =>'',
'ỗ' =>'',
'Ỗ' =>'',
'ố' =>'',
'Ố' =>'',
'ộ' =>'',
'Ộ' =>'',
'ơ' =>'o',
'Ơ' =>'o',
'ờ' =>'',
'Ờ' =>'',
'ở' =>'',
'Ở' =>'',
'ỡ' =>'',
'Ỡ' =>'',
'ớ' =>'',
'Ớ' =>'',
'ợ' =>'',
'Ợ' =>'',
'p' =>'p',
'P' =>'p',
'q' =>'q',
'Q' =>'q',
'r' =>'r',
'R' =>'r',
's' =>'s',
'S' =>'s',
't' =>'t',
'T' =>'t',
'u' =>'u',
'U' =>'u',
'ù' =>'u',
'Ù' =>'u',
'ủ' =>'',
'Ủ' =>'',
'ũ' =>'u',
'Ũ' =>'u',
'ú' =>'u',
'Ú' =>'u',
'ụ' =>'',
'Ụ' =>'',
'ư' =>'u',
'Ư' =>'u',
'ừ' =>'',
'Ừ' =>'',
'ử' =>'',
'Ử' =>'',
'ữ' =>'',
'Ữ' =>'',
'ứ' =>'',
'Ứ' =>'',
'ự' =>'',
'Ự' =>'',
'v' =>'v',
'V' =>'v',
'w' =>'w',
'W' =>'w',
'x' =>'x',
'X' =>'x',
'y' =>'y',
'Y' =>'y',
'ỳ' =>'y',
'Ỳ' =>'y',
'ỷ' =>'',
'Ỷ' =>'',
'ỹ' =>'',
'Ỹ' =>'',
'ý' =>'y',
'Ý' =>'y',
'ỵ' =>'',
'Ỵ' =>'',
'z' =>'z',
'Z' =>'z',
This list was generated with the following code:
$vnChars = 'aAàÀảẢãÃáÁạẠăĂằẰẳẲẵẴắẮặẶâÂầẦẩẨẫẪấẤậẬbBcCdDđĐeEèÈẻẺẽẼéÉẹẸêÊềỀểỂễỄếẾệỆfFgGhHiIìÌỉỈĩĨíÍịỊjJkKlLmMnNoOòÒỏỎõÕóÓọỌôÔồỒổỔỗỖốỐộỘơƠờỜởỞỡỠớỚợỢpPqQrRsStTuUùÙủỦũŨúÚụỤưƯừỪửỬữỮứỨựỰvVwWxXyYỳỲỷỶỹỸýÝỵỴzZ';
$vnCharsArr = $this->str_split_unicode($vnChars);
$vnCharRomanizedArr = array();
foreach ($vnCharsArr as $vnChar) {
    $vnCharRomanizedArr[$vnChar] = standardize(\String::restoreBasicEntities($vnChar));
}
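And the function for splitting a Unicode string:
function str_split_unicode($str, $l = 0) {
    // With a positive chunk length, cut the string into pieces of $l characters.
    if ($l > 0) {
        $ret = array();
        $len = mb_strlen($str, "UTF-8");
        for ($i = 0; $i < $len; $i += $l) {
            $ret[] = mb_substr($str, $i, $l, "UTF-8");
        }
        return $ret;
    }
    // Otherwise split into single characters; the "u" modifier makes the pattern UTF-8 aware.
    return preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY);
}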
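
For readers without a Contao installation, the same kind of check can be sketched in a self-contained way. This is only an illustration: the `$lookup` array and the `find_unmapped_chars()` helper are hypothetical names, and the table is a tiny stand-in, not the real contents of utf8_lookup.php.

```php
<?php
// Hypothetical stand-in for a transliteration table; NOT the real
// contents of utf8_lookup.php.
$lookup = array(
    'à' => 'a',
    'á' => 'a',
    'đ' => 'd',
);

/**
 * Return the non-ASCII characters of $text that have no entry in $lookup.
 * (Hypothetical helper, written for illustration only.)
 */
function find_unmapped_chars($text, array $lookup)
{
    // The "u" modifier makes the empty pattern split between code points.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $missing = array();
    foreach ($chars as $char) {
        // In UTF-8, every non-ASCII character occupies more than one byte.
        if (strlen($char) > 1 && !isset($lookup[$char])) {
            $missing[$char] = true; // use keys to avoid duplicates
        }
    }
    return array_keys($missing);
}

echo implode(' ', find_unmapped_chars('aàảáạđ', $lookup)), "\n";
```

With the three-entry table above, the call reports `ả` and `ạ` as unmapped, assuming the input uses precomposed characters.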

To solve the problem, add mappings for the 88 missing characters to the file /system/helper/utf8_lookup.php at line 304:
'ả' =>'a',
'Ả' =>'a',
'ạ' =>'a',
'Ạ' =>'a',
'ằ' =>'a',
'Ằ' =>'a',
'ẳ' =>'a',
'Ẳ' =>'a',
'ẵ' =>'a',
'Ẵ' =>'a',
'ắ' =>'a',
'Ắ' =>'a',
'ặ' =>'a',
'Ặ' =>'a',
'ầ' =>'a',
'Ầ' =>'a',
'ẩ' =>'a',
'Ẩ' =>'a',
'ẫ' =>'a',
'Ẫ' =>'a',
'ấ' =>'a',
'Ấ' =>'a',
'ậ' =>'a',
'Ậ' =>'a',
'ẻ' =>'e',
'Ẻ' =>'e',
'ẽ' =>'e',
'Ẽ' =>'e',
'ẹ' =>'e',
'Ẹ' =>'e',
'ề' =>'e',
'Ề' =>'e',
'ể' =>'e',
'Ể' =>'e',
'ễ' =>'e',
'Ễ' =>'e',
'ế' =>'e',
'Ế' =>'e',
'ệ' =>'e',
'Ệ' =>'e',
'ỉ' =>'i',
'Ỉ' =>'i',
'ị' =>'i',
'Ị' =>'i',
'ỏ' =>'o',
'Ỏ' =>'o',
'ọ' =>'o',
'Ọ' =>'o',
'ồ' =>'o',
'Ồ' =>'o',
'ổ' =>'o',
'Ổ' =>'o',
'ỗ' =>'o',
'Ỗ' =>'o',
'ố' =>'o',
'Ố' =>'o',
'ộ' =>'o',
'Ộ' =>'o',
'ờ' =>'o',
'Ờ' =>'o',
'ở' =>'o',
'Ở' =>'o',
'ỡ' =>'o',
'Ỡ' =>'o',
'ớ' =>'o',
'Ớ' =>'o',
'ợ' =>'o',
'Ợ' =>'o',
'ủ' =>'u',
'Ủ' =>'u',
'ụ' =>'u',
'Ụ' =>'u',
'ừ' =>'u',
'Ừ' =>'u',
'ử' =>'u',
'Ử' =>'u',
'ữ' =>'u',
'Ữ' =>'u',
'ứ' =>'u',
'Ứ' =>'u',
'ự' =>'u',
'Ự' =>'u',
'ỷ' =>'y',
'Ỷ' =>'y',
'ỹ' =>'y',
'Ỹ' =>'y',
'ỵ' =>'y',
'Ỵ' =>'y',
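
As a rough illustration of how mappings like these take effect, here is a minimal sketch that applies a character table with PHP's built-in `strtr()`. It assumes a plain array lookup and precomposed input; the `$lookup` excerpt and the `romanize_sketch()` helper are hypothetical and are not Contao's actual implementation.

```php
<?php
// Hypothetical excerpt of a mapping table in the same 'char' => 'ascii' form.
$lookup = array(
    'ế' => 'e',
    'ệ' => 'e',
    'ả' => 'a',
    'ữ' => 'u',
);

/**
 * Replace every character that has a table entry; leave all others unchanged.
 * (Hypothetical helper, written for illustration only.)
 */
function romanize_sketch($text, array $lookup)
{
    // With an array argument, strtr() replaces each key with its value.
    // Keys are whole UTF-8 characters, so matches cannot start mid-character.
    return strtr($text, $lookup);
}

echo romanize_sketch('Tiếng Việt', $lookup), "\n"; // Tieng Viet
```

A character with no entry is passed through untouched in this sketch, so a later step that strips non-ASCII characters would drop it, which matches the empty results listed earlier.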

Thanks

@leofeyer
Member

Fixed in c034adc.

@tiennguyenpy
Author

Thank you Leo.
