Test set format #8

Open
guanleiming opened this issue Dec 17, 2018 · 3 comments

Comments

@guanleiming

https://upload-images.jianshu.io/upload_images/12081581-70d412eebb570280?imageMogr2/auto-orient/strip%7CimageView2/2/w/323

Could you check whether this format is acceptable? If not, does the test set have to be in the "label,txt" format?

@jimichan
Member

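I can see the image. The format is fine.
Are the label and the text before it separated by a space?
How many training examples do you have? Are the number of words read and the number of labels shown on the console correct?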

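fastText handles Chinese and English in essentially the same way. The number of label classes should not be too large, for example more than 100.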
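One way to cross-check the counts printed on the console is to count examples and distinct labels in the training file yourself. The following is a minimal sketch, not part of this project: it assumes whitespace-separated tokens and the `__label__` prefix used in this thread, and the helper name `summarize_labels` is hypothetical.

```python
from collections import Counter

# Assumption: labels are marked with the "__label__" prefix, as in this thread.
LABEL_PREFIX = "__label__"


def summarize_labels(lines):
    """Return (number of non-empty examples, Counter mapping label -> count)."""
    n_examples = 0
    counts = Counter()
    for line in lines:
        tokens = line.split()  # split on any whitespace
        if not tokens:
            continue  # skip blank lines
        n_examples += 1
        for tok in tokens:
            if tok.startswith(LABEL_PREFIX):
                counts[tok] += 1
    return n_examples, counts


# Illustrative data only; replace with the lines of your own training file.
sample = [
    "__label__sports the match ended in a draw",
    "__label__tech the new phone ships next week",
    "__label__sports the striker scored twice",
    "",
]
n, counts = summarize_labels(sample)
print(n, len(counts), counts.most_common())
```

If the number of distinct labels reported here differs from what the trainer prints, the separator or the label prefix in the file is a likely place to look.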

@guanleiming
Author

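There are roughly ten or so labels, with __label__xxxx appended to the end of each line. If I switch to putting __label__xxxx at the start of each line, that format works. However, when the training set has too few samples, the predicted probabilities across labels come out almost uniform. Even when I test with a passage copied verbatim from one label's data, the probabilities are still nearly uniform. Once I add more samples, the predicted probability goes up and is no longer so uniform. Have you seen this in your own work? Is this overfitting caused by having too few samples?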
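For anyone whose data has the label at the end of each line, the switch to the label-first layout described above can be sketched as below. This is an illustration under assumptions, not code from this project: it assumes the text is already segmented into whitespace-separated tokens, and the helper name `labels_first` is hypothetical.

```python
# Assumption: labels are marked with the "__label__" prefix, as in this thread.
LABEL_PREFIX = "__label__"


def labels_first(line):
    """Move every __label__ token to the front, keeping word order unchanged."""
    tokens = line.split()
    labels = [t for t in tokens if t.startswith(LABEL_PREFIX)]
    words = [t for t in tokens if not t.startswith(LABEL_PREFIX)]
    return " ".join(labels + words)


converted = labels_first("the striker scored twice __label__sports")
print(converted)  # __label__sports the striker scored twice
```

Lines that already start with the label pass through unchanged, so the function can be applied to a whole file without checking each line first.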

@jimichan
Member

jimichan commented Dec 18, 2018 via email
