Spark-Seperate-And-Conquer-Algorithm

MSc Thesis

This repository contains the code for my MSc thesis: an implementation of a Separate-and-Conquer algorithm in Spark, together with its evaluation metrics (accuracy, precision, recall). The implementation was used to analyze large datasets using Spark's distributed in-memory data processing on clusters of 1, 5, 10, 15, and 20 commodity machines.
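As a rough illustration of the three evaluation metrics named above, here is a minimal sketch in plain Java that derives accuracy, precision, and recall from binary labels. The class name `EvaluationMetrics` and its methods are hypothetical and are not taken from this repository's source. The sketch also works on in-memory lists rather than Spark RDDs, and it assumes that undefined ratios (an empty denominator) are reported as 0.0.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not the class used in this repository.
class EvaluationMetrics {
    // Binary confusion-matrix counts.
    final long tp; // actual positive, predicted positive
    final long fp; // actual negative, predicted positive
    final long tn; // actual negative, predicted negative
    final long fn; // actual positive, predicted negative

    EvaluationMetrics(List<Boolean> actual, List<Boolean> predicted) {
        if (actual.size() != predicted.size()) {
            throw new IllegalArgumentException("label lists must have the same length");
        }
        long truePos = 0, falsePos = 0, trueNeg = 0, falseNeg = 0;
        for (int i = 0; i < actual.size(); i++) {
            boolean a = actual.get(i);
            boolean p = predicted.get(i);
            if (a && p) truePos++;
            else if (!a && p) falsePos++;
            else if (!a) trueNeg++;
            else falseNeg++;
        }
        tp = truePos;
        fp = falsePos;
        tn = trueNeg;
        fn = falseNeg;
    }

    // Fraction of all examples that were classified correctly.
    double accuracy() {
        long total = tp + fp + tn + fn;
        return total == 0 ? 0.0 : (tp + tn) / (double) total;
    }

    // Fraction of predicted positives that are actually positive.
    double precision() {
        long predictedPos = tp + fp;
        return predictedPos == 0 ? 0.0 : tp / (double) predictedPos;
    }

    // Fraction of actual positives that were predicted positive.
    double recall() {
        long actualPos = tp + fn;
        return actualPos == 0 ? 0.0 : tp / (double) actualPos;
    }

    public static void main(String[] args) {
        // Made-up labels, purely to exercise the methods.
        List<Boolean> actual = Arrays.asList(true, true, true, true, false, false, false, false);
        List<Boolean> predicted = Arrays.asList(true, true, false, false, true, false, false, false);
        EvaluationMetrics m = new EvaluationMetrics(actual, predicted);
        System.out.println("accuracy  = " + m.accuracy());
        System.out.println("precision = " + m.precision());
        System.out.println("recall    = " + m.recall());
    }
}
```

In a Spark job the four counts would more likely be computed with a distributed aggregation over the labelled records; the formulas applied to the counts stay the same.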

Setting up Spark

For Mac users, the easiest way to install Spark is with Homebrew: `brew install apache-spark`.

Alternatively, you can download it from:

[Spark](http://spark.apache.org/downloads.html)

For a quick start with Spark and a way to test your installation, see: http://spark.apache.org/docs/latest/quick-start.html

Additional Notes:

You can set up your IDE (IntelliJ or Eclipse) to run a Spark application locally inside the IDE without packaging an uber JAR:
- https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IDESetup

The Maven dependency required by the project:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.2</version>
</dependency>

If you are interested, you can read about the project and the problems I encountered along the way: [Wiki Page With Project Progress](http://timothy22000.wikidot.com/main)
