import scala.collection.mutable
import com.hankcs.hanlp.HanLP
import com.hankcs.hanlp.mining.word2vec.{DocVectorModel, WordVectorModel}

// Call site: `rdd` and `words` are defined elsewhere in the driver program.
rdd.mapPartitions(iterator => myFunctions(iterator, words))

def myFunctions(iterator: Iterator[String], word: String): Iterator[mutable.HashMap[Integer, Float]] = {
  // Build the word2vec and doc2vec models once per partition.
  val wordVecModel = new WordVectorModel("data/model.txt")
  val docVectorModel = new DocVectorModel(wordVecModel)
  val sets = mutable.Set[mutable.HashMap[Integer, Float]]()
  for (item <- iterator) {
    val arrays = item.split("\t")
    val id = arrays(0).toInt
    val contents = item
    docVectorModel.addDocument(id, HanLP.convertToSimplifiedChinese(contents))
    val list = docVectorModel.nearest(word)
    import scala.collection.JavaConversions._
    for (entry <- list) { // renamed from `id` to avoid shadowing the outer `id`
      val map = mutable.HashMap[Integer, Float]()
      map.put(entry.getKey, entry.getValue)
      sets.add(map)
    }
  }
  sets.iterator
}
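One thing worth noting about the snippet above: `mapPartitions` rebuilds `WordVectorModel` for every partition, which is expensive for a large model. A common Spark pattern (a sketch only; `ModelHolder` is a hypothetical helper, not from the original post) is a lazily initialized singleton so each executor JVM loads the model at most once:

```scala
import com.hankcs.hanlp.mining.word2vec.{DocVectorModel, WordVectorModel}

// Hypothetical helper object: a `lazy val` inside an `object` is initialized
// on first access within each executor JVM, so every partition processed by
// that executor reuses the same loaded model.
object ModelHolder {
  lazy val docVectorModel: DocVectorModel =
    new DocVectorModel(new WordVectorModel("data/model.txt"))
}
```

Inside `myFunctions` one would then reference `ModelHolder.docVectorModel` instead of constructing the models locally.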
Notes

Please confirm the following:

Version

The latest released version is: portable-1.6.6
The version I am using is: portable-1.6.6
My question

The trained word2vec model is stored on HDFS. Loading it from a distributed Spark job fails. How can this be fixed?
Triggering code

Error output

Other information

The model loads fine on a single machine; loading it on the cluster fails.

# The default I/O adapter is based on the ordinary local file system.
IOAdapter=com.npl.spark.HadoopFileIoAdapter — I have already reimplemented this adapter.
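For context, a reimplemented Hadoop adapter like the one named above typically implements HanLP's `IIOAdapter` interface (`open`/`create`) on top of the Hadoop `FileSystem` API. A minimal sketch, assuming those two APIs (the body shown here is illustrative, not the poster's actual code):

```scala
import java.io.{InputStream, OutputStream}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import com.hankcs.hanlp.corpus.io.IIOAdapter

// Resolves paths through HDFS instead of the local file system.
class HadoopFileIoAdapter extends IIOAdapter {
  override def open(path: String): InputStream = {
    val fs = FileSystem.get(java.net.URI.create(path), new Configuration())
    fs.open(new Path(path))
  }

  override def create(path: String): OutputStream = {
    val fs = FileSystem.get(java.net.URI.create(path), new Configuration())
    fs.create(new Path(path))
  }
}
```

One common cause of "works locally, fails on the cluster" is that the `hanlp.properties` file carrying the `IOAdapter=` setting is only on the driver's classpath; it has to be visible to every executor as well (for example shipped with `--files` or packaged into the application jar), otherwise the executors fall back to the default local-file adapter.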