s2t converts “背包” to “揹包”, but neither t2s nor tw2s converts it back #752

Open
NaitLee opened this issue Dec 16, 2022 · 3 comments

Comments

@NaitLee

NaitLee commented Dec 16, 2022

The s2t phrase definitions that contain “揹” are around here.

But TSCharacters.txt has no entry that maps 揹 to 背.

To reproduce:

$ echo "背包" | opencc -c s2t.json | opencc -c t2s.json
揹包

AFAIK “揹” is not used in Simplified Chinese at all, so please add “背” as its simplification :)
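
To illustrate why the round trip above does not come back to “背包”, here is a minimal sketch with toy dictionaries. Only the 背包 → 揹包 phrase entry and the missing 揹 key come from this issue; the other entries, the `convert` helper, and the greedy longest-match strategy are assumptions made for the example, not OpenCC's actual code or data.

```python
# Toy stand-ins for OpenCC dictionaries; the real ones are far larger.
S2T_PHRASES = {"背包": "揹包"}        # the s2t phrase entry reported in this issue
T2S_CHARS = {"這": "这", "裏": "里"}  # sample entries only; note there is no "揹" key


def convert(text, table):
    """Greedy longest-match replacement; unmatched characters pass through."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No dictionary entry starts here: keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


traditional = convert("背包", S2T_PHRASES)
round_trip = convert(traditional, T2S_CHARS)
print(traditional, round_trip)
```

Because the toy t2s table has no key for “揹”, the character passes through unchanged, which mirrors the behaviour reported above.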

Notes:
It seems that “揹” is just a variant of “背” in both Traditional and Simplified Chinese.
Both “揹” and “背” show up in web search results (on sites that use Traditional Chinese), so both are correct anyway.

@ayaka14732
Collaborator

“揹” is simply a variant character; I suggest removing it.

@NaitLee
Author

NaitLee commented Jan 9, 2023

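Under OpenCC's principle of “能分則不合” (keep characters distinct wherever they can be distinguished, rather than merging them), a character like “揹”, which marks a narrower usage, is actually logical.
Here, “揹” is treated as a Traditional character.
But some dictionaries (such as this one) list it as a variant character. Reportedly, neither the Kangxi Dictionary (《康熙字典》) nor the Shuowen Jiezi (《說文解字》) includes it.
The lower component of “背” is “肉” (flesh). The character can refer to the shoulders and the back, and as a verb it already carries the meaning “to bear a load”. By that logic, “揹” probably counts as a variant.
Judging from related web searches (mostly product listings from Hong Kong and Taiwan online shops), both “背包” and “揹包” are in use.
What to decide exactly is left for the experts to examine 😄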

Either way, this pair needs to be added to t2s: Simplified Chinese does not use “揹”, so it should be replaced wherever it appears.

@danny0838
Contributor

danny0838 commented Jan 9, 2023

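The Table of General Standard Chinese Characters (《通用規範漢字表》) lists “背” as the standard character and “揹” as a variant. What OpenCC calls Simplified Chinese is the Chinese standard character set, so in principle the variants listed in that table should be converted to their standard characters.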

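Besides this character, there are plenty of other variants that could be converted to standard characters on the same principle. I submitted a PR for this in #492 a long time ago, but the maintainer said at the time that it needed further discussion, and I don't know where that stands now...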
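
The variant-to-standard idea can be sketched as a table merged into the t2s character mapping. This is only an illustration under stated assumptions: `VARIANT_TO_STANDARD`, `merge_variants`, and `convert_chars` are hypothetical names, the tables are toy data, and only the 揹/背 pair comes from this thread. It is not how OpenCC or the PR in #492 implements it.

```python
# Hypothetical excerpt: variant -> standard character. Only the 揹/背 pair
# comes from this thread; a real table would be derived from 《通用規範漢字表》.
VARIANT_TO_STANDARD = {"揹": "背"}

# Toy t2s character table (sample entries, not OpenCC's actual data).
t2s_chars = {"這": "这", "裏": "里"}


def merge_variants(base, variants):
    """Return a new table with variant mappings added; entries in base win."""
    merged = dict(variants)
    merged.update(base)
    return merged


def convert_chars(text, table):
    """Character-by-character lookup; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


table = merge_variants(t2s_chars, VARIANT_TO_STANDARD)
print(convert_chars("揹包", table))  # 背包
```

Letting existing t2s entries take precedence is a design choice for this sketch: it keeps the variant table from silently overriding mappings that were already curated.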
