
A "geo near" performance test project with Ratpack as the app backend, shuttling queries to various data stores: Mongo, Redis, RethinkDB, Elasticsearch, etc.


joshdurbin/places


Places -- Geo Near Performance Tests

This project aims to follow each data store's recommended implementation for geospatial "geo near" queries against a data set of places of interest (POIs). The test data set contains POIs for the San Francisco Bay Area only. The Ratpack- and RxJava-powered API supports inserting and querying POIs.

POST operations to the /places endpoint insert a single record into the backing data store (see below for an example record).

GET operations to /places with point and radius information query the backing data store. The query endpoint has the form /places/$latitude/$longitude/$radius, with the radius in meters.
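To illustrate the query URL shape, the hypothetical Python helper below builds a GET URL from a point and a radius in meters. It assumes the app is reachable at http://localhost:5050 (Ratpack's default port); the base URL, the function name, and the input validation are assumptions for illustration, not part of this project.

```python
# Assumption: the Ratpack app listens on localhost:5050 (Ratpack's default
# port). Adjust BASE_URL for your deployment.
BASE_URL = "http://localhost:5050"


def places_query_url(latitude: float, longitude: float, radius_m: float,
                     base_url: str = BASE_URL) -> str:
    """Build a URL of the form /places/$latitude/$longitude/$radius.

    radius_m is the search radius in meters.
    """
    # Basic sanity checks on the inputs (hypothetical; the API's own
    # validation rules are not documented here).
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    if radius_m <= 0:
        raise ValueError("radius must be positive")
    return f"{base_url}/places/{latitude}/{longitude}/{radius_m}"


# Example: a 250 m search around the sample record's coordinates.
print(places_query_url(37.782874, -122.432868, 250))
```

Passing the resulting URL to, for example, urllib.request.urlopen issues the GET request against a running instance; the response format depends on the backing store's handler and is not described here.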

The available backing data stores are:

  1. Mongo
  2. Redis (3.2 beta)
  3. Elasticsearch
  4. RethinkDB
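Because each store implements geo near queries in its own way, a brute-force reference can help cross-check results. The Python sketch below is not part of this project. It filters an in-memory list of records shaped like the sample record (only `latitude` and `longitude` are read) using the haversine great-circle distance on a spherical Earth with a mean radius of 6,371,008.8 m, and returns matches sorted nearest first. The stores may use slightly different Earth models or distance formulas, so points lying right at the radius boundary can legitimately differ.

```python
import math
from typing import Iterable, List, Mapping

# Mean Earth radius in meters (spherical approximation).
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geo_near(places: Iterable[Mapping], latitude: float, longitude: float,
             radius_m: float) -> List[Mapping]:
    """Return places within radius_m meters of the point, nearest first."""
    hits = []
    for place in places:
        d = haversine_m(latitude, longitude, place["latitude"], place["longitude"])
        if d <= radius_m:
            hits.append((d, place))
    # Sort on distance only, so the record dicts themselves are never compared.
    hits.sort(key=lambda hit: hit[0])
    return [place for _, place in hits]
```

Comparing a store's response for a given query with geo_near over the same records (for example, the sample record below queried with a small radius around its own coordinates) is a quick way to spot indexing or unit mistakes, such as a radius interpreted in kilometers instead of meters.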

Requirements:

  1. Install the latest version of Java 8 JDK
  2. Install the latest version of Groovy
  3. Install the latest version of Gradle
  4. Install siege, a command-line load testing tool (on macOS, e.g. via Homebrew: brew install siege)

Note: Unless your OS has up-to-date packages for Groovy, Gradle, etc., I recommend installing Java 8 and using sdkman to install the other JVM-related tools.


Data:

The data includes the name, address details, and relevant category information for each entry. There are a little more than 650,000 entries in the supplied test data set. The data set and conversion scripts can be found in the data branch; the data set itself is distributed as a gzipped tarball (.tar.gz).

Sample record:

{
  "name": "Fitness SF Fillmore",
  "address": "1455 Fillmore St",
  "city": "San Francisco",
  "state": "CA",
  "zipCode": "94115",
  "telephoneNumber": "(415) 927-4653",
  "neighborhoods": [
    "Thomas Paine Square",
    "Japantown",
    "Western Addition"
  ],
  "categories": [
    "Sports and Recreation",
    "Gyms and Fitness Centers"
  ],
  "latitude": 37.782874,
  "longitude": -122.432868
}
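As a sketch of inserting this record through POST /places, the Python snippet below builds (but does not send) the request with the standard library. The base URL http://localhost:5050 (Ratpack's default port) and the helper name build_insert_request are assumptions; the JSON Content-Type header matches the one used in the siege insertion test further down.

```python
import json
import urllib.request

# Assumption: the app listens on Ratpack's default port.
BASE_URL = "http://localhost:5050"

# The sample record above, as a Python dict.
SAMPLE_RECORD = {
    "name": "Fitness SF Fillmore",
    "address": "1455 Fillmore St",
    "city": "San Francisco",
    "state": "CA",
    "zipCode": "94115",
    "telephoneNumber": "(415) 927-4653",
    "neighborhoods": ["Thomas Paine Square", "Japantown", "Western Addition"],
    "categories": ["Sports and Recreation", "Gyms and Fitness Centers"],
    "latitude": 37.782874,
    "longitude": -122.432868,
}


def build_insert_request(record: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /places request whose body is the record as JSON (hypothetical helper)."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/places",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it (requires the app to be running):
# with urllib.request.urlopen(build_insert_request(SAMPLE_RECORD)) as resp:
#     print(resp.status)
```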

Load Test:

The data branch also has a script, GenerateLoadTestScripts.groovy, which reads the data set and generates two files:

  1. places_query_loadtest_urls.txt - one query URL per point (latitude/longitude pair) in the original data set, each with a radius chosen at random from [25-50-100-250-500-100].
  2. places_insert_loadtest_urls.txt - the record payloads to insert via the API.
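The Python sketch below approximates what such a generator might produce; it is not a port of GenerateLoadTestScripts.groovy, whose exact output format is not shown here. It assumes the app runs at http://localhost:5050, that records are dicts shaped like the sample record, and that siege reads POST lines in its `URL POST body` form, with the body taken as the rest of the line (the -H 'Content-Type: application/json' flag in the insertion command supplies the content type). The function names are hypothetical, and the radius list is copied verbatim from the list above.

```python
import json
import random
from typing import Iterable, Iterator, Mapping, Optional, Sequence

# Assumption: the app listens on Ratpack's default port.
BASE_URL = "http://localhost:5050"
# Radii in meters, copied verbatim from the list above.
RADII_M = (25, 50, 100, 250, 500, 100)


def query_lines(records: Iterable[Mapping], base_url: str = BASE_URL,
                radii: Sequence[int] = RADII_M,
                rng: Optional[random.Random] = None) -> Iterator[str]:
    """Yield one GET URL per record, each with a randomly chosen radius."""
    if rng is None:
        rng = random.Random()
    for r in records:
        radius = rng.choice(radii)
        yield f"{base_url}/places/{r['latitude']}/{r['longitude']}/{radius}"


def insert_lines(records: Iterable[Mapping], base_url: str = BASE_URL) -> Iterator[str]:
    """Yield one siege POST line per record (assumed 'URL POST body' form)."""
    for r in records:
        # json.dumps escapes newlines (and non-ASCII by default), so each
        # payload stays on a single line.
        yield f"{base_url}/places POST {json.dumps(r, separators=(',', ':'))}"


def write_lines(path: str, lines: Iterable[str]) -> None:
    """Write one line per entry, the layout siege's -f option reads."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
```

For example, write_lines("places_query_loadtest_urls.txt", query_lines(records)) produces a file suitable for the query command below. Passing a seeded random.Random makes the radius choices reproducible between runs.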

These files are intended for consumption by siege, a command-line load testing tool. Its usage is straightforward; sample commands follow.

A sample query load test:

siege --concurrent=1000 -f places_query_loadtest_urls.txt --benchmark -t60S --quiet

A sample insertion load test:

siege --concurrent=250 -H 'Content-Type: application/json' -f places_insert_loadtest_urls.txt --reps=once --quiet


Caveats:

If you observe many connection resets in siege, especially under relatively light load, the OS-level TIME_WAIT timeout might need to be tweaked. Homebrew prints the following caveat when installing siege:

Mac OS X has only 16K ports available that won't be released until socket
TIME_WAIT is passed. The default timeout for TIME_WAIT is 15 seconds.
Consider reducing in case of available port bottleneck.

You can check the current setting with sysctl, and lower it if needed:

    # sysctl net.inet.tcp.msl
    net.inet.tcp.msl: 15000

    # sudo sysctl -w net.inet.tcp.msl=1000
    net.inet.tcp.msl: 15000 -> 1000
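A rough back-of-the-envelope estimate shows why this matters. When one client opens new connections to a single server address and port, each local port stays unavailable while its socket sits in TIME_WAIT, so the sustainable rate of new connections is about (ephemeral ports) / (TIME_WAIT duration). The Python sketch below assumes the 16,384 ports mentioned in the caveat (the 49152-65535 range) and, by default, a TIME_WAIT of 2 x MSL as RFC 793 specifies; set the multiple to 1 to follow the caveat's reading of 15 seconds. It ignores connection reuse (keep-alive), so treat the result as a ceiling estimate rather than a measurement.

```python
def max_new_connections_per_sec(msl_ms: float, ephemeral_ports: int = 16_384,
                                 time_wait_msl_multiple: float = 2.0) -> float:
    """Rough ceiling on new connections/sec before ephemeral ports run out.

    msl_ms is the net.inet.tcp.msl value in milliseconds; TIME_WAIT is
    assumed to last time_wait_msl_multiple * MSL.
    """
    time_wait_s = time_wait_msl_multiple * msl_ms / 1000.0
    return ephemeral_ports / time_wait_s


# Default msl of 15000 ms: 16384 / 30 s, roughly 546 new connections per second.
print(round(max_new_connections_per_sec(15000), 2))
# After lowering msl to 1000 ms: 16384 / 2 s = 8192 new connections per second.
print(max_new_connections_per_sec(1000))
```

With siege running 1000 concurrent users in benchmark mode, a ceiling in the hundreds of new connections per second is easy to hit, which is consistent with the connection resets described above.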
