
BIDMach_Spark

Code to allow running BIDMach on Spark including HDFS integration and lightweight sparse model updates (Kylix).

Dependencies

This repo depends on BIDMat, as well as on lz4 and Hadoop. Assuming you have Hadoop installed and working, and that you've built a working BIDMat jar, copy these files into the lib directory of this repo:

cp BIDMat/BIDMat.jar BIDMach_Spark/lib
cp BIDMat/lib/lz4-*.*.jar BIDMach_Spark/lib

You'll also need the Hadoop common library from your Hadoop installation:

cp $HADOOP_HOME/share/hadoop/common/hadoop-common-*.*.jar BIDMach_Spark/lib

and then

cd BIDMach_Spark
./sbt package

will build BIDMatHDFS.jar. Copy this back to the BIDMat lib directory:

cp BIDMatHDFS.jar ../BIDMat/lib

Make sure $HADOOP_HOME is set to the Hadoop home directory (usually /usr/local/hadoop), and make sure HDFS is running:

$HADOOP_HOME/sbin/start-dfs.sh
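
To confirm the daemons came up, you can check for the NameNode and DataNode processes and list the HDFS root. This is standard Hadoop tooling, shown here only as a quick sanity check:

jps                                  # should list NameNode, DataNode, SecondaryNameNode
$HADOOP_HOME/bin/hdfs dfs -ls /      # lists the root of the HDFS filesystem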

Then you should have HDFS access from BIDMat: start the REPL and save a matrix (here a is any FMat you want to write) by invoking

BIDMat/bidmat
saveFMat("hdfs://localhost:9000/filename.fmat", a)
or
saveFMat("hdfs://filename.fmat", a)

Hadoop Config

The Hadoop quickstart guides don't mention this, but you need to set the HDFS config to point to a persistent set of directories to hold the HDFS data. Here's a typical hdfs-site.xml:
 
<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
     <property>
         <name>dfs.name.dir</name>
         <value>/data/hdfs/name</value>
     </property>
     <property>
         <name>dfs.data.dir</name>
         <value>/data/hdfs/data</value>
     </property>
</configuration>
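
The host and port in the hdfs:// URLs above come from the default filesystem setting in core-site.xml. A typical single-node example (Hadoop 2.x and later; adjust host and port to your cluster, and note that older Hadoop versions use fs.default.name instead) might look like:

<configuration>
     <property>
         <name>fs.defaultFS</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>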
