Events
In essence, Cube is a structured logging system. Events are time-stamped JSON blobs that you can query later to compute metrics. Provided you have the capacity to store all these events, this approach can be much more powerful than precomputing aggregate metrics (e.g., collectd + Graphite). For example, by logging the latency of individual requests to a website, you can compute latency histograms and quantiles across arbitrary groups of requests. This gives you a much better sense of your website's performance: the full, possibly multimodal, distribution rather than a single precomputed number.
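To illustrate why raw events are more flexible than precomputed aggregates, here is a minimal Python sketch that computes a latency quantile per group from a list of "request" events shaped like the examples on this page. It runs client-side over events you already have in memory; it is not Cube's query language, and the function names and the nearest-rank quantile definition are choices made only for this illustration.

```python
import math
from collections import defaultdict


def nearest_rank(values, q):
    """Nearest-rank quantile: the smallest value with at least a fraction q of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def latency_quantile_by(events, field, q):
    """Group events by data[field] and return the q-quantile of data["duration_ms"] for each group."""
    groups = defaultdict(list)
    for event in events:
        data = event.get("data", {})
        if "duration_ms" in data:
            groups[data.get(field)].append(data["duration_ms"])
    return {key: nearest_rank(durations, q) for key, durations in groups.items()}


# Hypothetical sample events (same shape as the "request" examples on this page).
events = [
    {"type": "request", "time": "2011-09-12T21:33:12Z", "data": {"path": "/search", "duration_ms": 100}},
    {"type": "request", "time": "2011-09-12T21:33:13Z", "data": {"path": "/search", "duration_ms": 300}},
    {"type": "request", "time": "2011-09-12T21:33:14Z", "data": {"path": "/search", "duration_ms": 200}},
    {"type": "request", "time": "2011-09-12T21:33:15Z", "data": {"path": "/", "duration_ms": 50}},
]

print(latency_quantile_by(events, "path", 0.5))  # {'/search': 200, '/': 50}
```

Because the individual durations are kept, the same events can be regrouped by any other field (host, status, and so on) without having chosen that grouping in advance.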
When you send events to Cube, two fields are required:
- type - a name for grouping events (such as "request"); typically singular, of the form [a-z_][a-zA-Z0-9_]*.
- time - the event time in ISO 8601 format, UTC.
For example, here's a minimal event that might be used as a hit counter:
{
  "type": "request",
  "time": "2011-09-12T21:33:12Z"
}
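The two required fields can be produced with a small Python helper like the one below. It is a sketch, not part of Cube: `make_event` and `TYPE_RE` are hypothetical names, the type check simply applies the pattern given above, and the timestamp is truncated to whole seconds, which is a simplification (ISO 8601 also allows fractional seconds).

```python
import re
from datetime import datetime, timedelta, timezone

# Event type pattern as documented above: [a-z_][a-zA-Z0-9_]*
TYPE_RE = re.compile(r"[a-z_][a-zA-Z0-9_]*")


def make_event(event_type, when):
    """Build a minimal event dict from a type name and a timezone-aware datetime."""
    if not TYPE_RE.fullmatch(event_type):
        raise ValueError(f"invalid event type: {event_type!r}")
    if when.tzinfo is None:
        raise ValueError("time must be timezone-aware so it can be converted to UTC")
    # Normalize to UTC and format as ISO 8601 with a "Z" suffix (whole seconds only).
    utc = when.astimezone(timezone.utc)
    return {"type": event_type, "time": utc.strftime("%Y-%m-%dT%H:%M:%SZ")}


# 23:33:12 at UTC+2 is 21:33:12 UTC, matching the hit-counter example above.
local = datetime(2011, 9, 12, 23, 33, 12, tzinfo=timezone(timedelta(hours=2)))
print(make_event("request", local))
# {'type': 'request', 'time': '2011-09-12T21:33:12Z'}
```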
The type determines the name of the underlying collection in the Mongo database. For example, if you send an event of type "request", Cube will store the event in the collection "request_events" and the associated metrics in the collection "request_metrics". These collections, and the indexes they need, are created automatically if they do not already exist.
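The naming rule is simple enough to express as a one-line Python function, which can be handy when you want to inspect or index Cube's collections directly. `collection_names` is a hypothetical helper for illustration only.

```python
def collection_names(event_type):
    """Return the (events, metrics) collection names for an event type."""
    return f"{event_type}_events", f"{event_type}_metrics"


print(collection_names("request"))  # ('request_events', 'request_metrics')
```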
While it is possible to use a single event type for all events you send to Cube, it's a good idea to use descriptive event types. By storing events in separate collections, you can create custom indexes for those events, and you can control the size of the associated metrics cache. By default, the only index on an events collection is by time ({t: 1}). If you frequently query using a particular data field, you should add an index to Cube's backing Mongo database. For example, if you frequently query "request" events by the field "path", create an index on path and time: {"d.path": 1, t: 1}. This will greatly improve the performance of finding events for a particular path within a given time range.
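Because Cube stores the data object under `d` and the time under `t` (see the internal representation at the end of this page), a small helper can build the key list for such a compound index. This is a sketch: `event_index_keys` is a hypothetical name, and the list of `(field, direction)` pairs is the form accepted by pymongo's `Collection.create_index`.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def event_index_keys(*fields):
    """Compound index key list: each data field (prefixed with "d."), then time ("t") last."""
    return [(f"d.{name}", ASCENDING) for name in fields] + [("t", ASCENDING)]


print(event_index_keys("path"))  # [('d.path', 1), ('t', 1)]
```

With pymongo, and assuming a `db` handle for Cube's database, the index above would be created with `db["request_events"].create_index(event_index_keys("path"))`; in the mongo shell the equivalent is `db.request_events.createIndex({"d.path": 1, t: 1})`. Putting the equality field first and the time range last lets Mongo narrow to one path before scanning the time range.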
Events may also include two optional fields:
- id - a unique identifier, for replacing existing events.
- data - a data object, for storing additional event data.
By specifying an id, you allow Cube to replace a previous event with new data. (Note: the new event must have the same time as the old event; otherwise, Cube will only invalidate the metrics associated with the new time.) Unique identifiers are often used when replicating data from another data source (such as a SQL database). The id can be any JSON value, but is most commonly an integer. For example:
{
  "type": "request",
  "time": "2011-09-12T21:33:12Z",
  "id": 42,
  "data": {
    "duration_ms": 241
  }
}
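The replacement behavior can be modeled in a few lines of Python. This is a hypothetical, in-memory sketch meant only to show the semantics described above (an event with the same id replaces the earlier one, and only the new event's time is invalidated); `EventStore` and its methods are illustrative names, not part of Cube.

```python
import json


class EventStore:
    """Hypothetical in-memory model of id-based replacement; not Cube's implementation."""

    def __init__(self):
        self.by_id = {}       # canonical id -> latest event with that id
        self.without_id = []  # events sent without an id are simply appended

    def put(self, event):
        """Store an event and return the time whose metrics would be invalidated."""
        if "id" in event:
            # An id may be any JSON value (including objects, which are not
            # hashable in Python), so key on its canonical JSON encoding.
            key = json.dumps(event["id"], sort_keys=True)
            self.by_id[key] = event
        else:
            self.without_id.append(event)
        # Per the note above: only the new event's time is invalidated, so if a
        # replacement moves an event to a different time, metrics for the old
        # time are left stale.
        return event["time"]


store = EventStore()
store.put({"type": "request", "time": "2011-09-12T21:33:12Z", "id": 42, "data": {"duration_ms": 241}})
store.put({"type": "request", "time": "2011-09-12T21:33:12Z", "id": 42, "data": {"duration_ms": 250}})
print(len(store.by_id), store.by_id["42"]["data"])  # 1 {'duration_ms': 250}
```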
The data field stores arbitrary JSON that you wish to associate with the event. Typically this is an object containing a set of key-value pairs; however, you can store any JSON data, such as numbers, strings, booleans, arrays, nested objects, etc. Currently, Cube restricts the property names you can use when storing events: names must be of the form [a-zA-Z_][a-zA-Z0-9_$]*. For example, a more detailed "request" event might look like this:
{
  "type": "request",
  "time": "2011-09-12T21:33:12Z",
  "data": {
    "host": "web14",
    "path": "/search",
    "query": {
      "q": "flowers"
    },
    "duration_ms": 241,
    "status": 200,
    "user_agent": "Chrome/13.0.782.112"
  }
}
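A client could check property names before sending an event. The sketch below reads the documented restriction as the pattern `[a-zA-Z_][a-zA-Z0-9_$]*`, and it assumes (the text above does not say) that the rule applies to keys of nested objects as well as top-level ones; `invalid_names` is a hypothetical helper.

```python
import re

# Property-name pattern, read as [a-zA-Z_][a-zA-Z0-9_$]*.
# Assumption: the rule applies at every nesting level, including objects inside arrays.
NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_$]*")


def invalid_names(value, path=""):
    """Yield the dotted path of every object key that does not match NAME_RE."""
    if isinstance(value, dict):
        for key, child in value.items():
            here = f"{path}.{key}" if path else key
            if not NAME_RE.fullmatch(key):
                yield here
            yield from invalid_names(child, here)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from invalid_names(child, f"{path}[{index}]")


print(list(invalid_names({"path": "/search", "query": {"q": "flowers"}})))  # []
print(list(invalid_names({"user-agent": "Chrome", "query": {"1q": "x"}})))  # ['user-agent', 'query.1q']
```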
Note that Mongo is generally more efficient if you use short property names, because every property name is stored in full in every document. Perhaps in the future it'd be nice if Mongo (or an alternative datastore) could detect repeated properties across events and store them symbolically.
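Internally, events are transformed slightly for a more efficient representation: the type is implied by the collection name rather than stored, the time is stored as a date in t, and the data object is stored under d. The above "request" event is represented in the Mongo request_events collection as: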
{
  "_id": ObjectId("47cc67093475061e3d95369d"),
  "t": ISODate("2011-09-12T21:33:12Z"),
  "d": {
    "host": "web14",
    "path": "/search",
    "query": {
      "q": "flowers"
    },
    "duration_ms": 241,
    "status": 200,
    "user_agent": "Chrome/13.0.782.112"
  }
}
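The transformation can be sketched in Python as follows. This is an illustration, not Cube's code: `to_internal` is a hypothetical name, and the handling of `id` (promoting it to `_id`) is an assumption, since this page only shows an event without an id, whose `_id` is a generated ObjectId.

```python
from datetime import datetime, timezone


def to_internal(event):
    """Return (collection name, stored document) for an incoming event."""
    # Python < 3.11 fromisoformat does not accept a trailing "Z", so spell out the UTC offset.
    time = datetime.fromisoformat(event["time"].replace("Z", "+00:00"))
    if time.tzinfo is None:
        # Times are documented as UTC, so treat an offset-less time as UTC.
        time = time.replace(tzinfo=timezone.utc)
    doc = {"t": time.astimezone(timezone.utc)}
    if "id" in event:
        # Assumption: an explicit id becomes Mongo's _id; otherwise Mongo assigns an ObjectId.
        doc["_id"] = event["id"]
    if "data" in event:
        doc["d"] = event["data"]
    # The type is not stored in the document; it selects the collection instead.
    return event["type"] + "_events", doc


name, doc = to_internal({"type": "request", "time": "2011-09-12T21:33:12Z", "data": {"host": "web14"}})
print(name, doc)
```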