Converting UTF-8 to GBK and back again does not round-trip #7

Open
DoFelix opened this issue Apr 28, 2021 · 4 comments

Comments

@DoFelix

DoFelix commented Apr 28, 2021

package main

import (
	"fmt"

	"github.com/axgle/mahonia"
)

func main() {
	str := "你好"
	fmt.Println("UTF-8 to GBK: ", ConvertToString(str, "utf8", "gbk"))

	// data := ConvertToString(str, "utf8", "gbk")
	fmt.Println("GBK to UTF-8: ", ConvertToString(ConvertToString(str, "utf8", "gbk"), "gbk", "utf8"))
}

func ConvertToString(src string, srcCode string, tagCode string) string {
	srcCoder := mahonia.NewDecoder(srcCode)
	srcResult := srcCoder.ConvertString(src)
	tagCoder := mahonia.NewDecoder(tagCode)
	_, cdata, _ := tagCoder.Translate([]byte(srcResult), true)
	result := string(cdata)
	return result
}

@DoFelix
Author

DoFelix commented Apr 28, 2021

UTF-8 to GBK: 浣犲ソ
GBK to UTF-8: 娴g姴銈�

@ayanmw

ayanmw commented Jun 29, 2021

Since the target side is a conversion out of UTF-8, you should use an Encoder there. Just modify your ConvertToString like this:

func ConvertToString(src string, srcCode string, tagCode string) string {
	// Decode: bytes in srcCode -> UTF-8.
	srcCoder := mahonia.NewDecoder(srcCode)
	srcResult := srcCoder.ConvertString(src)
	//fmt.Println("srcResult["+srcCode+"]=", src, " => ", srcResult)

	// Encode: UTF-8 -> bytes in tagCode.
	tagCoder := mahonia.NewEncoder(tagCode)
	result := tagCoder.ConvertString(srcResult)
	//fmt.Println("toResult["+tagCode+"]=", srcResult, " => ", result)
	return result
}

Output after the change:
srcResult[UTF-8]= 你好 => 你好
toResult[GBK]= 你好 => ����
UTF-8 to GBK: ����
srcResult[UTF-8]= 你好 => 你好
toResult[GBK]= 你好 => ����
srcResult[GBK]= ���� => 你好
toResult[UTF-8]= 你好 => 你好
GBK to UTF-8: 你好

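PS: The output above was printed in a UTF-8 terminal, so only the UTF-8 strings display correctly; text in any other encoding shows up garbled.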
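
The difference between the two directions can be shown without a GBK table. The sketch below uses only the standard library, with Latin-1 (ISO-8859-1) standing in for GBK because each Latin-1 byte value equals its Unicode code point. `decodeLatin1` and `encodeLatin1` are hypothetical helpers written for this illustration, not mahonia APIs. Decoding bytes that are already UTF-8 produces mojibake, which is what the original ConvertToString did with its second Decoder, while encoding and then decoding round-trips.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeLatin1 interprets every input byte as one Latin-1 character and
// returns the text as UTF-8 (the job of a "Decoder").
func decodeLatin1(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c) // Latin-1 byte values equal Unicode code points
	}
	return string(runes)
}

// encodeLatin1 turns UTF-8 text into Latin-1 bytes (the job of an "Encoder").
// Characters that Latin-1 cannot represent become '?'.
func encodeLatin1(s string) []byte {
	out := make([]byte, 0, utf8.RuneCountInString(s))
	for _, r := range s {
		if r > 0xFF {
			r = '?'
		}
		out = append(out, byte(r))
	}
	return out
}

func main() {
	utf8Text := "é" // UTF-8 bytes C3 A9

	// Wrong direction: the bytes are already UTF-8 but get decoded as Latin-1.
	wrong := decodeLatin1([]byte(utf8Text))
	fmt.Printf("%q\n", wrong) // "Ã©"

	// Right direction: encode to Latin-1, then decode back to UTF-8.
	encoded := encodeLatin1(utf8Text)
	fmt.Printf("% x\n", encoded)              // e9
	fmt.Printf("%q\n", decodeLatin1(encoded)) // "é"
}
```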

@ayanmw

ayanmw commented Jun 29, 2021

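Actually, this can be optimized: if either side is UTF-8, that side needs no conversion, since converting UTF-8 to UTF-8 leaves the text unchanged.

If neither srcCode nor desCode is UTF-8, UTF-8 is in effect used as the intermediate layer for the conversion.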
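
A minimal sketch of that idea, not mahonia's implementation: UTF-8 is the pivot, the decode step is skipped when the source is already UTF-8, and the encode step is skipped when the target is UTF-8. The `codec` type, the `codecs` table, `isUTF8`, and `Convert` are hypothetical names introduced here, and Latin-1 stands in for GBK so the example needs only the standard library. With mahonia, the two function fields would presumably be backed by `NewDecoder(name).ConvertString` and `NewEncoder(name).ConvertString`.

```go
package main

import (
	"fmt"
	"strings"
)

// codec converts between one legacy charset and UTF-8 (hypothetical type).
type codec struct {
	decode func(string) string // charset bytes -> UTF-8
	encode func(string) string // UTF-8 -> charset bytes
}

// codecs lists the known charsets; only Latin-1 is registered in this sketch.
var codecs = map[string]codec{
	"latin1": {
		decode: func(s string) string {
			runes := make([]rune, len(s))
			for i := 0; i < len(s); i++ {
				runes[i] = rune(s[i]) // byte value equals code point
			}
			return string(runes)
		},
		encode: func(s string) string {
			var b strings.Builder
			for _, r := range s {
				if r > 0xFF {
					r = '?' // not representable in Latin-1
				}
				b.WriteByte(byte(r))
			}
			return b.String()
		},
	},
}

// isUTF8 reports whether name denotes UTF-8, so that side can be skipped.
func isUTF8(name string) bool {
	n := strings.ToLower(name)
	return n == "utf8" || n == "utf-8"
}

// Convert re-encodes src from srcCode to dstCode with UTF-8 as the pivot.
func Convert(src, srcCode, dstCode string) (string, error) {
	text := src
	if !isUTF8(srcCode) { // decode only when the source is not UTF-8
		c, ok := codecs[strings.ToLower(srcCode)]
		if !ok {
			return "", fmt.Errorf("unknown charset %q", srcCode)
		}
		text = c.decode(text)
	}
	if !isUTF8(dstCode) { // encode only when the target is not UTF-8
		c, ok := codecs[strings.ToLower(dstCode)]
		if !ok {
			return "", fmt.Errorf("unknown charset %q", dstCode)
		}
		text = c.encode(text)
	}
	return text, nil
}

func main() {
	enc, _ := Convert("é", "utf8", "latin1")
	back, _ := Convert(enc, "latin1", "utf8")
	fmt.Printf("% x -> %q\n", enc, back) // e9 -> "é"
}
```

Returning an error for an unknown charset, rather than a silently empty string, is a design choice of this sketch.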

@xmapst

xmapst commented Dec 29, 2021

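> Actually, this can be optimized: if either side is UTF-8, that side needs no conversion, since converting UTF-8 to UTF-8 leaves the text unchanged.
>
> If neither srcCode nor desCode is UTF-8, UTF-8 is in effect used as the intermediate layer for the conversion.

Is there a ready-made implementation? 🙂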
