Shaunak Vairagare edited this page Nov 20, 2017 · 6 revisions

SpatialHadoop

SpatialHadoop is a comprehensive extension to Hadoop that provides efficient handling of big spatial data. It provides spatial data types, spatial indexes, spatial operations, and visualization. The data types include Point, Linestring, and Polygon. It also ships with a wide range of scalable indexes, including grid, R-tree, R+-tree, Quad-tree, K-d tree, Z-curve, and Hilbert curve. In addition, it uses these indexes to speed up various spatial operations such as range query, k nearest neighbor (kNN), spatial join, Voronoi diagram, Delaunay triangulation, polygon union, convex hull, skyline, and others. SpatialHadoop also contains an extensible visualization framework that can produce single-level and multilevel visualizations for big spatial data.

Install

The easiest way to install SpatialHadoop is to build a portable jar that can run on a standard Apache Hadoop installation.

  1. Obtain the latest version of SpatialHadoop

    git clone https://github.com/aseldawy/spatialhadoop2.git

  2. Build a portable jar

    mvn assembly:assembly -DskipTests

  3. Run the main class of SpatialHadoop using your installed Hadoop cluster

    hadoop jar spatialhadoop-2.4-uber.jar

Alternatively, you can integrate it with your installed Hadoop cluster by expanding the binary package into your Hadoop home directory.

    tar -C $HADOOP_HOME -xvzf spatialhadoop-2.4-bin.tar.gz

After that, you can run the main class of SpatialHadoop using the command

    shadoop
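
Because the two installation routes expose different launchers, a script meant to work with either one can choose at run time. The following is only a minimal sketch: it assumes the portable jar accepts the same operation arguments after `hadoop jar <jar>` that `shadoop` accepts, which is not stated above. The names `spatialhadoop_cmd` and `SHADOOP_JAR` are hypothetical and introduced here for illustration.

```shell
# Print the launcher command for whichever installation is available.
# spatialhadoop_cmd and SHADOOP_JAR are hypothetical names for this sketch.
spatialhadoop_cmd() {
    if command -v shadoop >/dev/null 2>&1; then
        # Binary package expanded into $HADOOP_HOME: use the shadoop script.
        echo "shadoop"
    else
        # Portable jar: run it through the stock hadoop launcher.
        echo "hadoop jar ${SHADOOP_JAR:-spatialhadoop-2.4-uber.jar}"
    fi
}
```

An unquoted `$(spatialhadoop_cmd)` can then stand in for the launcher at the start of a command line, relying on the shell's word splitting to separate `hadoop`, `jar`, and the jar path.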

Usage

To generate a 1 GB file that contains rectangles, run the command

    shadoop generate test.rects size:1.gb shape:rect mbr:0,0,1000000,1000000

Build a grid index over the generated file

    shadoop index test.rects sindex:grid test.grid shape:rect

Run a range query that selects rectangles overlapping the query area defined by the box with the two corners (10, 10) and (2000, 3000). Results are stored in the output file rangequery.out

    shadoop rangequery test.grid rect:10,10,2000,3000 rangequery.out shape:rect
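
To sanity-check a range query on a small sample, the overlap test can be reproduced locally without any index. The sketch below assumes each record is a text line of the form `x1,y1,x2,y2`, which is an assumption about the file layout rather than something stated above. It also treats rectangles that merely touch the query box as non-overlapping, which may differ from SpatialHadoop's own predicate. `range_filter` is a hypothetical helper, not part of SpatialHadoop.

```shell
# range_filter QX1 QY1 QX2 QY2
# Reads rectangles from stdin, one per line as x1,y1,x2,y2 (assumed layout),
# and prints those that overlap the query box. Hypothetical helper for local
# cross-checking only; it scans every record and uses no index.
range_filter() {
    awk -F, -v qx1="$1" -v qy1="$2" -v qx2="$3" -v qy2="$4" '
        # Two boxes overlap when their extents intersect on both axes.
        $1 < qx2 && $3 > qx1 && $2 < qy2 && $4 > qy1
    '
}

# Example: only the second rectangle intersects the box (10,10)-(2000,3000).
printf '%s\n' '0,0,5,5' '100,100,200,200' '2500,10,2600,20' |
    range_filter 10 10 2000 3000
```

A full scan like this is only practical for small samples. The indexed query above is what avoids reading every record on large files.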

Further References