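I hope it is not rude to ask in Chinese. I am a first-year graduate student and a beginner, and I would like to ask a few questions: 1. What is the relationship between featindex.txt and featindex.fm.txt? I noticed that featindex.txt mostly contains codes for columns 8, 10, and 12. (A side question: are these codes assigned arbitrarily? They do not seem to follow any order.) What about the codes for the other columns? 2. Samples with label 1 are far fewer than samples with label 0. Does anything need to be done to handle this? Does the class imbalance affect the results?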
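Hi pursuit1994, 1. featindex.txt encodes every feature that is used, in the form (a:b c), where a is the feature's index, b is the corresponding value, and c is the code. featindex.fm.txt encodes only a subset of the features. The codes are assigned randomly. 2. The imbalance does exist; during training we randomly drop some of the label-0 samples.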
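For readers who want to try the random removal of label-0 samples mentioned in point 2, here is a minimal sketch. It is not the maintainers' actual code: the function name `downsample_negatives`, the `keep_rate` parameter, and the `(label, features)` tuple layout are all assumptions made for illustration.

```python
import random

def downsample_negatives(samples, keep_rate, seed=0):
    """Keep every label-1 sample; keep each label-0 sample with probability keep_rate.

    `samples` is assumed to be an iterable of (label, features) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    kept = []
    for label, features in samples:
        if label == 1 or rng.random() < keep_rate:
            kept.append((label, features))
    return kept

# Toy data (hypothetical): 10 positives and 990 negatives.
samples = [(1, "pos")] * 10 + [(0, "neg")] * 990
kept = downsample_negatives(samples, keep_rate=0.1)

positives = sum(1 for label, _ in kept if label == 1)
negatives = len(kept) - positives
print(positives, negatives)
```

The `keep_rate` value of 0.1 is arbitrary; the thread does not say what fraction of label-0 samples is dropped.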
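Hello, thank you very much for clearing this up; your answer helped me a lot. One more question: do ID features need to be encoded? It seems the data would become very large after encoding, and as far as I can tell an ID serves no purpose other than linking to information in other fields. I would appreciate your opinion.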