A StatsD backend that publishes stats to CouchDB
Within your statsd install:
npm install couch-statsd-backend
{
port: 8125
,deleteIdleStats: true
//, couchhost: '192.168.3.21' //default: localhost
//, couchport: '5984' //default: 5984
, couchdb: 'abtest'
//all below are optional
//, debug: true
//,bulk_size: 2500 // (0 to disable)
//,bulk_wait: 50
//, id_prefix: "c0-"
//, id_prefix: process.pid
//choose just one
//, id_generator: "couch"
//, id_generator: "cuid"
//, id_generator: "node-uuid-v1"
//, id_generator: "node-uuid-v4"
//, id_generator: "custom"
//only needed for 'custom'
//, uuid_generator: function(num){ return Math.round(new Date().getTime() / 1000) + '-' + num; }
, backends: [ "./backends/couch" ]
}
id_prefix: Depending on your setup it may be useful to prefix the _id with a string identifying the statsd process that sent the doc. The default is to add nothing.
id_generator: It is possible to choose how the couch doc._id is generated - by default a single request to couchhost/_uuids is made per flush, but an _id can be generated in javascript instead.
If 'cuid' or one of the 'node-uuid' generators is chosen you may need to install the npm packages first, as they are not installed by default:
npm install cuid
and
npm install node-uuid
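For reference, this is roughly what those optional generators produce when used directly (a quick sketch; the example output is illustrative):

```js
var cuid = require('cuid');      // npm install cuid
var uuid = require('node-uuid'); // npm install node-uuid

console.log(cuid());    // e.g. 'ch72gsb320000udocl363eofy'
console.log(uuid.v1()); // time-based uuid
console.log(uuid.v4()); // random uuid
```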
If 'custom' is chosen then uuid_generator must also be set, e.g.:
, id_generator: "custom"
, uuid_generator: function(num){ return Math.round(new Date().getTime() / 1000) + '-' + num; }
Note: The uuid_generator function is passed 'num', which is incremented on each call within a flush and reset at the start of each flush.
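So the example generator above produces _ids along these lines within a single flush (the timestamp and the starting value of num are only illustrative):

```js
var uuid_generator = function(num){ return Math.round(new Date().getTime() / 1000) + '-' + num; };

// num counts up for each doc within a flush and resets on the next flush
uuid_generator(0); // '1424818219-0'
uuid_generator(1); // '1424818219-1'
uuid_generator(2); // '1424818219-2'
```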
###Bulking
bulk_size: Number of docs to submit per bulk request. Set this to 0 to disable splitting, in which case all docs for a flush are sent in a single request.
bulk_wait: Time to wait (ms) between sending each bulk of docs for a flush to couch. Be careful with this: if the total time to insert all docs for a flush exceeds the flush period, the backlog of docs waiting to be posted will just keep growing.
It should be OK for the total time to insert all the bulks to occasionally run over the statsd flush period if a sudden rush of metrics arrives, but if this keeps happening I suspect node will run out of memory holding the backlog.
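As a rough back-of-the-envelope check (the doc count is illustrative): with bulk_size 2500 and bulk_wait 50 as in the example config above, a flush of 25,000 docs becomes 10 bulk requests with about 450 ms spent just waiting between them, which needs to fit comfortably inside the flush interval together with the time couch takes for the requests themselves:

```js
var docsPerFlush = 25000;  // illustrative
var bulk_size    = 2500;
var bulk_wait    = 50;     // ms

var bulks    = Math.ceil(docsPerFlush / bulk_size); // 10 bulk requests
var waitTime = (bulks - 1) * bulk_wait;             // 450 ms of waiting
// total per flush ~ waitTime + bulks * (time couch takes per _bulk_docs request)
// keep this well under the statsd flushInterval (default 10000 ms)
```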
##Supported Metrics
- Counters
- Counter Rates
- Gauges
- Timers
- Timer Data
- Sets
###Counters & Counter Rates
Counters and counter rates are combined into one doc - this can easily be modified to create separate docs for the counter_rates by un-commenting a few lines.
echo "myservice.mycount:67|c" | nc -n -w 1 -u 192.168.3.22 8125
Will add a doc like:
{
"_id": "f83a62b3456f4c2451cb7a166100f3e8",
"_rev": "1-d68546bae9169fa4b4a72b3ffe4167b9",
"type": "counter",
"name": "myservice.mycount",
"count": 67,
"rate": 6.7,
"ts": 1424818189
}
If two counts with the same name arrive within the same flush period:
echo "myservice.mycount:20|c" | nc -n -w 1 -u 192.168.3.22 8125
echo "myservice.mycount:30|c" | nc -n -w 1 -u 192.168.3.22 8125
Will add a doc like:
{
"_id": "f83a62b3456f4c2451cb7a1661010a10",
"_rev": "1-89dc6493c2418acb351e9767e6653c65",
"type": "counter",
"name": "myservice.mycount",
"count": 50,
"rate": 5,
"ts": 1424818219
}
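For reference, the rate field is just the count divided by the flush interval in seconds; the examples above assume statsd's default flushInterval of 10000 ms:

```js
var flushInterval = 10000; // statsd default, in ms

var perSecond = function(count){ return count / (flushInterval / 1000); };

perSecond(67); // 6.7 - first example above
perSecond(50); // 5   - second example (20 + 30 aggregated within one flush)
```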
###Gauges
echo "myservice.mygauge:42|g" | nc -n -w 1 -u 192.168.3.22 8125
Will add a doc like:
{
"_id": "f83a62b3456f4c2451cb7a1661012b57",
"_rev": "1-390bc981b63e2ca16d2ad42c042f9c1b",
"type": "gauge",
"name": "myservice.mygauge",
"value": 42,
"ts": 1424819100
}
###Timers & Timer Data
Timers and timer data are combined into one doc - this can easily be modified to create separate docs for the timer_data by un-commenting a few lines.
echo "myservice.mytimer:231|ms" | nc -n -w 1 -u 192.168.3.22 8125
Will add a doc like:
{
"_id": "2d0f912baefa5e8c6e131e29c4482e67",
"_rev": "1-30071b20880b8f0139a403856e0365d9",
"type": "timers",
"name": "myservice.mytimer",
"durations": [
231
],
"mean_90": 231,
"upper_90": 231,
"sum_90": 231,
"std": 0,
"upper": 231,
"lower": 231,
"count": 1,
"count_ps": 0.1,
"sum": 231,
"mean": 231,
"median": 231,
"ts": 1425055422
}
If more than one timer is received within the same flush interval, e.g.:
echo "myservice.mytimer:30|ms" | nc -n -w 1 -u 192.168.3.22 8125
echo "myservice.mytimer:23|ms" | nc -n -w 1 -u 192.168.3.22 8125
echo "myservice.mytimer:51|ms" | nc -n -w 1 -u 192.168.3.22 8125
Will add a doc like:
{
"_id": "2d0f912baefa5e8c6e131e29c44835d1",
"_rev": "1-5786babae3d5c82d6511a5ee7770fd86",
"type": "timers",
"name": "myservice.mytimer",
"durations": [
23,
30,
51
],
"mean_90": 34.666666666666664,
"upper_90": 51,
"sum_90": 104,
"std": 11.897712198383164,
"upper": 51,
"lower": 23,
"count": 3,
"count_ps": 0.3,
"sum": 104,
"mean": 34.666666666666664,
"median": 30,
"ts": 1425055602
}
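For reference, the derived fields follow statsd's standard timer calculations - a sketch for the three samples above (the *_90 fields cover the samples under the 90th percentile threshold, which with only 3 samples is all of them, and count_ps again assumes a 10 s flush interval):

```js
var durations = [23, 30, 51];
var flushInterval = 10000; // ms

var sum  = 23 + 30 + 51;                                   // 104
var mean = sum / durations.length;                         // 34.666...
var count_ps = durations.length / (flushInterval / 1000);  // 0.3

// population standard deviation
var variance = durations
  .map(function(d){ return Math.pow(d - mean, 2); })
  .reduce(function(a, b){ return a + b; }) / durations.length;
var std = Math.sqrt(variance);                             // 11.897...

// 90th percentile threshold: Math.round(0.9 * 3) = 3 samples, i.e. all of them here,
// so sum_90 = 104, mean_90 = mean and upper_90 = 51
```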
###Sets
echo "myservice.myset:129|s" | nc -n -w 1 -u 192.168.3.22 8125
Will add a doc like:
{
"_id": "f83a62b3456f4c2451cb7a16617f1d84",
"_rev": "1-07926b51ef9b5a163a9bde8f3635223c",
"type": "set",
"name": "myservice.myset",
"set": {
"129": "129"
},
"ts": 1424819898
}
*I am not sure whether this set data structure is correct.
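As far as I can tell it simply mirrors statsd's internal set store, which is an object mapping each unique value to itself, so duplicate values within a flush collapse into a single entry - a rough sketch (not the actual statsd source):

```js
var store = {};
var insert = function(value){ store[value] = value; };

// values arrive as strings off the wire
insert('129');
insert('129');
insert('130');
// store is now { "129": "129", "130": "130" } - one entry per unique value
```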
##Notes/Future Changes
A test db with snappy compression containing 2 million docs consumed 0.5GB of space.
I suspect having the full field names in each document increases the storage requirements, so it may be better to save the metric field names as single letters, for example for a 'counter' type:
{
"_id": "f83a62b3456f4c2451cb7a1661010a10",
"_rev": "1-89dc6493c2418acb351e9767e6653c65",
"t": "c",
"n": "myservice.mycount",
"c": 50,
"r": 5,
"ts": 1424818219
}
where:
- t = type (c = counter, g = gauge, t = timer, s = set)
- n = name
- c = count
- r = rate
It may even be worth encoding the 'type' within the '_id'?
Also, during testing I did hit issues when adding more than 20k documents every 3 seconds (before I added the bulk support, so this needs retesting) - I am not 100% sure where the bottleneck was. Look at sending each bulk batch off to a different couchdb server in round-robin fashion so a single couchdb doesn't need to handle all the metrics from a flush - this may not be needed with the couch 2.0 clustering stuff :).
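The round-robin idea could look something like the sketch below - purely hypothetical, couchhosts is not an option the backend currently has:

```js
// hypothetical: cycle through several couch hosts, one per bulk batch
var couchhosts = ['192.168.3.21', '192.168.3.22', '192.168.3.23'];
var next = 0;

var nextHost = function(){
  var host = couchhosts[next];
  next = (next + 1) % couchhosts.length;
  return host;
};

// each _bulk_docs POST for a flush would then go to nextHost() rather than the single couchhost
```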