
Creating Rollups for Kairos


This is the eventual problem everyone runs into with Kairos. You begin using Kairos and it just sucks data in like a vacuum cleaner. You lie awake at night thinking up new metrics to capture in your ever-growing Kairos cluster. Then, after days of gathering data, you get the crazy idea of querying for the whole lot and are rather dismayed when it doesn't return that metric crap ton of data in under 300 ms. Because it is pretty hard to visualize 10 million data points, you are most likely trying to aggregate them. The problem is that Kairos needs to read all 10 million data points from Cassandra in order to aggregate them down to what you can see.

Here is the solution - rollups!

We are working on a built-in solution to create rollups of data for you, but it just isn't finished yet. Fortunately all the tools are there so you can pretty easily do it yourself. This post will walk you through creating rollups of your data so you can query a year's worth of data in just a few seconds.

In order to create rollups you will need the 1.1.1 release of Kairos or later. In 1.1.1 we added two new aggregators that are key to creating rollups: save_as and trim. Here is what they do:

Trim: Pretty simple really, it just trims off the first, last, or both data points from a query result. It is a poor man's way of not needing to calculate start and end times exactly. Let's look at an example - say I want to sum up the last hour of data. The clock says 3:12 and I want to sum up the data from 2 to 3. If I query back using a relative time of 1h and use align_sampling, I'll get two data points that are only partial: one for 2:12 - 3:00 and another for 3:00 - 3:12. If I query back 2h I'll get three data points; the first and last will only be partial aggregations, but the middle one will represent the full hour from 2 to 3. With the trim aggregator I can query back more than the time I want and then use it to trim off the partial aggregations on the front and back of the query.
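As a rough sketch, the aggregator chain for that 2h example could look like this (the metric name is just a placeholder):

{
  "start_relative": {
    "value": "2",
    "unit": "hours"
  },
  "metrics": [
    {
      "name": "your.metric.name",
      "aggregators": [
        {
          "name": "sum",
          "align_sampling": true,
          "sampling": {
            "value": "1",
            "unit": "hours"
          }
        },
        {
          "name": "trim",
          "trim": "both"
        }
      ]
    }
  ]
}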

Save As: Again pretty simple, it saves the results of the query as a new metric.

Here is how I used these two aggregators to create a poor man's rollup using cron and a bash script. First the query:

{
  "metrics": [
	{
	  "tags": {
	    "datacenter": [
	      "AWSUSE"
	    ],
	    "environment": [
	      "production"
	    ]
	  },
	  "name": "Cassandra.System.df.cassandra.df_complex.used.value",
	  "group_by": [
	    {
	      "name": "tag",
	      "tags": [
	        "host"
	      ]
	    }
	  ],
	  "aggregators": [
	    {
	      "name": "max",
	      "align_sampling": true,
	      "sampling": {
	        "value": "1",
	        "unit": "hours"
	      }
	    },
	    {
	      "name": "trim",
	      "trim": "both"
	    },
	    {
	      "name": "save_as",
	      "metric_name": "Cassandra.System.df.cassandra.df_complex.used.rollup"
	    }
	  ]
	}
  ],
  "cache_time": 0,
  "start_relative": {
	"value": "HOUR_START",
	"unit": "hours"
  },
  "end_relative": {
	"value": "HOUR_END",
	"unit": "hours"
  }
}

In the above query I wanted to aggregate the total disk space usage over an hour. Instead of specifying the start and end times, I used placeholders that my script swaps out at run time. An important note about the save_as aggregator: whatever you group by will also be available as a tag in the new metric. Because I grouped by host, I'll be able to group by host or query by host when using the new rollup metric.
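For example, once the rollup metric exists you can query it and group by host just like the original. Here is a minimal sketch of such a query (the 7 day time range is only an illustration):

{
  "start_relative": {
    "value": "7",
    "unit": "days"
  },
  "metrics": [
    {
      "name": "Cassandra.System.df.cassandra.df_complex.used.rollup",
      "group_by": [
        {
          "name": "tag",
          "tags": [
            "host"
          ]
        }
      ]
    }
  ]
}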

Now for the bash magic:

#!/bin/bash

# Roll up one day of data at a time by substituting the HOUR_START and
# HOUR_END placeholders in the query template and posting it to Kairos.

host=kairos.lab.net:8080

FILE=$1   # query template containing the HOUR_START/HOUR_END placeholders
DAYS=$2   # number of days to roll up

for i in `seq 1 $DAYS`;
do
	# Day $i ends i*24 hours ago; start one extra hour back so the trim
	# aggregator can drop the partial buckets on each end.
	let HOUR_END=$i*24
	let HOUR_START=$HOUR_END+25

	sed -e "s/HOUR_START/$HOUR_START/g" -e "s/HOUR_END/$HOUR_END/g" "$FILE" > modified.json
	curl --data-binary @modified.json --header "Content-Type: application/json" http://$host/api/v1/datapoints/query
	echo $i
done

I run this script once a week to roll up 7 days of data. It queries the data one day at a time, swapping out the start and end times as it goes.

Put the script in a cron job and now you can query for a year's worth of data in just a few seconds.
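For example, a weekly crontab entry might look something like this (the paths and schedule are just placeholders, assuming the script above is saved as rollup.sh):

# min hour dom mon dow  command
0 2 * * 0  /opt/kairos/rollup.sh /opt/kairos/rollup_query.json 7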