fastText has an upper limit on the number of labels #44
The training code is as follows:
Are you sure this is a classification problem? That is a huge number of labels.
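I want to map fuzzy text to a unique ID, so that text with missing or extra characters still matches as closely as possible. I built a dedicated encoding for Chinese characters for this, hoping that similar characters can be matched too.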
For this you should use word vectors or something like simhash, not text classification.
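A minimal sketch of the simhash idea suggested above, assuming character bigrams as features and the first 8 bytes of an MD5 digest as the 64-bit feature hash (both are illustrative choices, not anything specified in this thread). Each ID's text gets a fingerprint, and a fuzzy query is matched to the ID whose fingerprint has the smallest Hamming distance. The helpers `char_ngrams`, `simhash`, `build_index` and `lookup`, and the sample records, are hypothetical.

```python
import hashlib

BITS = 64  # fingerprint width; matches the 8 bytes taken from each MD5 digest


def char_ngrams(text: str, n: int = 2) -> list[str]:
    # Character n-grams: one missing or extra character only disturbs a few n-grams.
    text = text.strip()
    if len(text) <= n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def simhash(text: str, n: int = 2) -> int:
    weights = [0] * BITS
    for gram in char_ngrams(text, n):
        # Stable 64-bit feature hash (the built-in hash() is randomized per process).
        h = int.from_bytes(hashlib.md5(gram.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is set when more n-grams vote 1 than 0 for it.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def build_index(records: dict[str, str]) -> dict[str, int]:
    # Hypothetical helper: map each unique ID to the fingerprint of its text.
    return {rid: simhash(text) for rid, text in records.items()}


def lookup(query: str, index: dict[str, int]) -> tuple[str, int]:
    # Linear scan for the ID whose fingerprint is closest to the query's.
    q = simhash(query)
    rid, fp = min(index.items(), key=lambda item: hamming(q, item[1]))
    return rid, hamming(q, fp)


if __name__ == "__main__":
    # Hypothetical sample records; the query drops one character from A001.
    index = build_index({
        "A001": "北京市朝阳区建国路88号",
        "B002": "上海市浦东新区世纪大道100号",
    })
    print(lookup("北京市朝阳区建国路8号", index))
```

The linear scan in `lookup` is only practical for small indexes. With 1,300,000+ IDs, simhash fingerprints are usually bucketed (for example by splitting the 64 bits into bands) so that only candidates sharing a band are compared.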
Thanks for the suggestion, I'll try a different approach.
This is the output printed after reading the train file:
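As you can see, the number of labels read tops out at 185898, while the train file I provided contains 1,300,000+ labels. To rule out a data problem, I split the original train file into 9 files of 150000 each and read them one by one. Every file returned its label count normally, so a problem with the data file itself can basically be ruled out.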
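Does fastText really impose this limit, or is there a limit on how much of a file it reads? The original train file is 480MB; after splitting, the largest piece is 52MB.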
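To cross-check the 1,300,000+ figure without going through fastText, one option is to count label tokens directly. A minimal sketch, assuming the files use fastText's default `__label__` prefix (adjust `LABEL_PREFIX` if training used a different `-label`); `count_labels` and `count_labels_in_file` are hypothetical helpers. Note that the number of distinct labels across all split files is the size of their union, which is smaller than the sum of the per-file counts whenever the same label appears in more than one chunk.

```python
import sys
from collections import Counter

# fastText's default label prefix; change it if a different -label was used.
LABEL_PREFIX = "__label__"


def count_labels(lines, prefix: str = LABEL_PREFIX) -> Counter:
    # Any whitespace-separated token starting with the prefix is counted as a label.
    counts = Counter()
    for line in lines:
        for token in line.split():
            if token.startswith(prefix):
                counts[token] += 1
    return counts


def count_labels_in_file(path: str, prefix: str = LABEL_PREFIX) -> Counter:
    # Stream line by line so a 480MB file is never loaded into memory at once.
    with open(path, encoding="utf-8") as f:
        return count_labels(f, prefix)


if __name__ == "__main__":
    overall = Counter()
    for path in sys.argv[1:]:
        per_file = count_labels_in_file(path)
        overall.update(per_file)
        print(f"{path}: {len(per_file)} distinct labels")
    # Union over all files: this is what a single run on the full file sees.
    print(f"all files: {len(overall)} distinct labels")
```

Passing the original file and the split files as arguments gives per-file distinct counts plus the distinct count over their union, which can then be compared with the 185898 reported above.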