-
Notifications
You must be signed in to change notification settings - Fork 98
Home
SpatialHadoop is an extensive extension to Hadoop that provides efficient handling of big spatial data. It provides spatial data types, spatial indexes, spatial operations, and visualization. The data types include Point
, Linestring
, and Polygon
. It also ships with a wide range of scalable indexes including grid, R-tree, R+-tree, Quad-tree, K-d tree, Z-Curve, and Hilbert Curve. In addition, it uses these indexes to speed up various spatial operations such as range query, k nearest neighbor (kNN), spatial join, Voronoi Diagram, Delaunay Triangulation, polygon union, convex hull, skyline, and others. SpatialHadoop contains also an extensible visualization framework that can produce single-level and multilevel visualizations for big spatial data.
The easiest way to install SpatialHadoop is to build a portable jar that can run on a standard Apache Hadoop.
-
Obtain the latest version of SpatialHadoop
git clone https://github.com/aseldawy/spatialhadoop2.git
-
Build a portable jar
mvn assembly:assembly -DskipTests
-
Run the main class of SpatialHadoop using your installed Hadoop cluster
hadoop jar spatialhadoop-2.4-uber.jar
Alternatively, you can integrate it with your installed Hadoop cluster by expanding the binary package into your Hadoop home directory.
tar -C $HADOOP_HOME -xvzf spatialhadoop-2.4-bin.tar.gz
After that, you can run the main class of SpatialHadoop using the command
shadoop
To generate a 1 GB file that contains rectangles, run the command
shadoop generate test.rects size:1.gb shape:rect mbr:0,0,1000000,1000000
Build a grid index over the generated file
shadoop index test.rects sindex:grid test.grid shape:rect
Run a range query that selects rectangles overlapping the query area defined by the box with the two corners (10, 20) and (2000, 3000). Results are stored in the output file rangequery.out
shadoop rangequery test.grid rect:10,10,2000,3000 rangequery.out shape:rect