layout: global
title: Third-Party Projects
type: page singular
navigation:
  weight: 5
  show: true

This page tracks external software projects that supplement Apache Spark and add to its ecosystem.

spark-packages.org

spark-packages.org is an external, community-managed list of third-party libraries, add-ons, and applications that work with Apache Spark. You can add a package as long as you have a GitHub repository.

Infrastructure Projects

Spark Job Server - REST interface for managing and submitting Spark jobs on the same cluster (see blog post for details)
SparkR - R frontend for Spark
MLbase - Machine Learning research project on top of Spark
Apache Mesos - Cluster management system that supports running Spark
Alluxio (née Tachyon) - Memory-speed virtual distributed storage system that supports running Spark
Spark Cassandra Connector - Easily load your Cassandra data into Spark and Spark SQL; from Datastax
FiloDB - a Spark-integrated analytical/columnar database, with an in-memory option capable of sub-second concurrent queries
Elasticsearch - Spark SQL Integration
Spark-Scalding - Easily transition Cascading/Scalding code to Spark
Zeppelin - an IPython-like notebook for Spark. There is also ISpark, and the Spark Notebook.
IBM Spectrum Conductor with Spark - cluster management software that integrates with Spark
EclairJS - enables Node.js developers to code against Spark, and data scientists to use JavaScript in Jupyter notebooks.
SnappyData - an open source OLTP + OLAP database integrated with Spark on the same JVMs.
GeoSpark - Geospatial RDDs and joins
Spark Cluster Deploy Tools for OpenStack

Applications Using Spark

Apache Mahout - Previously on Hadoop MapReduce, Mahout has switched to using Spark as the backend
Apache MRQL - A query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, and Spark
BlinkDB - a massively parallel, approximate query engine built on top of Shark and Spark
Spindle - Spark/Parquet-based web analytics query engine
Spark Spatial - Spatial joins and processing for Spark
Thunderain - a framework for combining stream processing with historical data (think Lambda architecture)
DF from Ayasdi - a Pandas-like data frame implementation for Spark
Oryx - Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
ADAM - A framework and CLI for loading, transforming, and analyzing genomic data using Apache Spark

Additional Language Bindings

C# / .NET

CLR for Spark

Clojure

clj-spark
Sparkling

Groovy

groovy-spark-example

Julia

Spark.jl
