This project aims to use the recommended implementation path for geospatial "geo near" queries against a data set of places of interest (POIs), backed by various data stores. The test data set contains POIs only for the San Francisco Bay Area. The Ratpack and RxJava powered API supports inserting and querying POIs.
POST operations to the /places endpoint insert a single record into the backing data store (see below for an example record).
GET operations to /places with the point and radius information query the backing data store. The expected structure of the query endpoint is /places/$latitude/$longitude/$radius, with the radius in meters.
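As a minimal sketch of the query URL structure described above, the helper below assembles a /places/$latitude/$longitude/$radius URL. The function name and the base URL (http://localhost:5050) are assumptions made for illustration; use whatever host and port the API is actually running on.

```python
def places_query_url(base_url: str, latitude: float, longitude: float, radius_m: int) -> str:
    """Build a geo-near query URL of the form /places/$latitude/$longitude/$radius."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    if radius_m <= 0:
        raise ValueError(f"radius must be a positive number of meters: {radius_m}")
    return f"{base_url.rstrip('/')}/places/{latitude}/{longitude}/{radius_m}"


if __name__ == "__main__":
    # Assumed base URL; the coordinates come from the sample record in this README.
    print(places_query_url("http://localhost:5050", 37.782874, -122.432868, 250))
```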
The available backing data stores are:
- Mongo
- Redis (3.2 beta)
- Elasticsearch
- RethinkDB
Requirements:
- Install the latest version of Java 8 JDK
- Install the latest version of Groovy
- Install the latest version of Gradle
- Install siege, a command-line load testing tool (presumably via brew: brew install siege)
Note: Unless your OS has up-to-date packages for Groovy, Gradle, etc., I recommend installing Java 8 and using sdkman to install the other JVM-related tools.
Data:
The data includes the name, address details, and relevant category information for each entry. The supplied test data set contains a little more than 650,000 entries. The data set and conversion scripts can be found in the data branch. The data set itself is shipped as a tar.gz archive.
Sample record:
{
"name": "Fitness SF Fillmore",
"address": "1455 Fillmore St",
"city": "San Francisco",
"state": "CA",
"zipCode": "94115",
"telephoneNumber": "(415) 927-4653",
"neighborhoods": [
"Thomas Paine Square",
"Japantown",
"Western Addition"
],
"categories": [
"Sports and Recreation",
"Gyms and Fitness Centers"
],
"latitude": 37.782874,
"longitude": -122.432868
}
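A record like the one above could be sent to the /places endpoint roughly as follows. This is only a sketch: the base URL is an assumption, the helper name is hypothetical, and the check that latitude and longitude are numeric is an illustrative guess rather than a documented rule of the API. The request is built but not sent.

```python
import json
import urllib.request


def build_insert_request(base_url: str, record: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that inserts one POI record."""
    # Assumption: a record needs numeric coordinates to be useful for geo queries.
    for key in ("latitude", "longitude"):
        if not isinstance(record.get(key), (int, float)):
            raise ValueError(f"record needs a numeric {key!r} field")
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/places",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    record = {
        "name": "Fitness SF Fillmore",
        "address": "1455 Fillmore St",
        "city": "San Francisco",
        "state": "CA",
        "zipCode": "94115",
        "latitude": 37.782874,
        "longitude": -122.432868,
    }
    request = build_insert_request("http://localhost:5050", record)  # assumed base URL
    print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would actually send it to a running instance.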
Load Test:
The data branch also has a script, GenerateLoadTestScripts.groovy, which reads the data set and generates two files:
- places_query_loadtest_urls.txt: a query for each point (lat, long pair) in the original data set, with a random distance of [25-50-100-250-500-100].
- places_insert_loadtest_urls.txt: the actual record payloads for insertion via the API.
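The lines in the two generated files might look something like the output of the sketch below. This is not the GenerateLoadTestScripts.groovy script itself, and its exact output format may differ. The function names and base URL are assumptions, the distance choices are copied from the list above exactly as written, and the insert line uses siege's "URL POST body" syntax for URL files.

```python
import json
import random

# Distance choices in meters, copied from the README text as written.
DISTANCES_M = (25, 50, 100, 250, 500, 100)


def query_line(base_url: str, record: dict, rng: random.Random) -> str:
    """One query URL per record: its own coordinates plus a random radius."""
    radius = rng.choice(DISTANCES_M)
    return f"{base_url}/places/{record['latitude']}/{record['longitude']}/{radius}"


def insert_line(base_url: str, record: dict) -> str:
    """One siege URL-file line that POSTs the record as a single-line JSON body."""
    return f"{base_url}/places POST {json.dumps(record)}"


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so the sketch is repeatable
    records = [
        {"name": "Fitness SF Fillmore", "latitude": 37.782874, "longitude": -122.432868},
    ]
    for rec in records:
        print(query_line("http://localhost:5050", rec, rng))
        print(insert_line("http://localhost:5050", rec))
```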
These files are intended for consumption by siege, a command-line load testing tool. Its straightforward usage is detailed below.
A sample query load test:
siege --concurrent=1000 -f places_query_loadtest_urls.txt --benchmark -t60S --quiet
A sample insertion load test:
siege --concurrent=250 -H 'Content-Type: application/json' -f places_insert_loadtest_urls.txt --reps=once --quiet
Caveats:
Take note of the caveats printed when installing siege. If siege reports many connection resets, especially under relatively light load, the OS-level TIME_WAIT timeout might need to be tweaked. The caveat shown when installing siege via brew reads:
Mac OS X has only 16K ports available that won't be released until socket
TIME_WAIT is passed. The default timeout for TIME_WAIT is 15 seconds.
Consider reducing in case of available port bottleneck.
You can check whether this is a problem with netstat:
# sysctl net.inet.tcp.msl
net.inet.tcp.msl: 15000
# sudo sysctl -w net.inet.tcp.msl=1000
net.inet.tcp.msl: 15000 -> 1000
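To see why the timeout matters, here is a rough back-of-the-envelope sketch that takes the caveat's figures at face value: about 16K ports (assumed here to mean 16384) and a port staying unavailable for the TIME_WAIT period after each connection. It assumes every request opens a fresh connection to the same destination, which is a simplification, and the function name is hypothetical.

```python
def max_new_connections_per_second(available_ports: int, time_wait_seconds: float) -> float:
    """Upper bound on sustained new connections if each port is held for TIME_WAIT."""
    if available_ports <= 0 or time_wait_seconds <= 0:
        raise ValueError("ports and timeout must be positive")
    return available_ports / time_wait_seconds


if __name__ == "__main__":
    ports = 16 * 1024  # "16K ports" from the caveat, assumed to be 16384
    for timeout_s in (15, 1):  # default from the caveat vs. the reduced setting
        rate = max_new_connections_per_second(ports, timeout_s)
        print(f"TIME_WAIT {timeout_s:>2}s -> about {rate:.0f} new connections/second")
```

Under these assumptions, the default setting caps sustained throughput at roughly a thousand new connections per second, which a siege run with high concurrency can exceed.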