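- Fetch doc pairs from the online (production) system, one pair per line, in the format: doc_id + \t + doc_id
- Send the pairs out for evaluation and labeling as required
- Build File_Fields with MongoUtils, in the format: doc_id + \t + doc_id + \t + jstr + \t + jstr
- Build File.arff from File_Fields with ARFFUtils. Note: to add a feature or to see how features are computed, see FeatureUtils. When adding a feature, make sure the number of features stays consistent with the arff header.
- Train the model from File.arff with ModelUtils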
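
The File_Fields and File.arff steps above can be illustrated with a minimal Python sketch. It is not the repository's MongoUtils, ARFFUtils, or FeatureUtils code. It assumes that each jstr column is a JSON-serialized document with a `title` field, that labels from the labeling step are supplied as a separate list, and that the two features (`title_jaccard`, `title_len_ratio`) are hypothetical stand-ins for whatever FeatureUtils actually computes. The point it shows is the consistency check between the feature vector length and the arff header.

```python
import json

# Hypothetical feature names; the real list lives in FeatureUtils.
ATTRIBUTES = ["title_jaccard", "title_len_ratio"]


def parse_fields_line(line):
    """Split one File_Fields line: doc_id \t doc_id \t jstr \t jstr."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        raise ValueError(f"expected 4 tab-separated columns, got {len(parts)}")
    id_a, id_b, jstr_a, jstr_b = parts
    # Assumption: each jstr is a JSON object describing one document.
    return id_a, id_b, json.loads(jstr_a), json.loads(jstr_b)


def pair_features(doc_a, doc_b):
    """Compute the illustrative pairwise features for two documents."""
    title_a = doc_a.get("title", "")
    title_b = doc_b.get("title", "")
    tokens_a = set(title_a.lower().split())
    tokens_b = set(title_b.lower().split())
    union = tokens_a | tokens_b
    jaccard = len(tokens_a & tokens_b) / len(union) if union else 0.0
    longest = max(len(title_a), len(title_b))
    len_ratio = min(len(title_a), len(title_b)) / longest if longest else 0.0
    return [jaccard, len_ratio]


def to_arff(relation, rows, labels):
    """Render feature rows and 0/1 labels as ARFF text."""
    lines = [f"@RELATION {relation}", ""]
    for name in ATTRIBUTES:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    lines.append("@ATTRIBUTE label {0,1}")
    lines += ["", "@DATA"]
    for feats, label in zip(rows, labels):
        # Every row must match the header, as the note above requires.
        if len(feats) != len(ATTRIBUTES):
            raise ValueError(
                f"row has {len(feats)} features, header declares {len(ATTRIBUTES)}"
            )
        lines.append(",".join(f"{v:.4f}" for v in feats) + f",{label}")
    return "\n".join(lines) + "\n"
```

Adding a feature in this sketch means appending a name to `ATTRIBUTES` and a value to the list returned by `pair_features` in the same change. If only one of the two is updated, `to_arff` raises instead of writing a file whose rows disagree with its header.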
newsbreak-yancy/doc-cluster-and-dedup-random-forest