Bagheera

Version: 0.15-SNAPSHOT

REST service for Mozilla Metrics. This service currently uses Apache Kafka as its backing data store and provides a few Kafka consumer implementations that pull messages and persist them to various data sinks.

Version Compatibility

This code is built against the following versions. You may get mixed results if you deviate from them.

Kafka 0.7.1+
Protocol Buffers 2.4.1+
Hadoop 0.20.2+
HBase 0.90+

Prerequisites

Protocol Buffers
Zookeeper (for Kafka)
Kafka
Hadoop (if using HDFS based consumer)
HBase (if using HBase based consumer)

Building

To build a jar, run:

mvn package

The jar file is then located under the target directory.

Running an instance

Make sure your Kafka and Zookeeper servers are running first (see the Kafka documentation).

To run Bagheera on another machine, you will probably want to build the dist assembly:

mvn assembly:assembly

The zip file now under the target directory should be deployed to BAGHEERA_HOME on the remote server.

To run Bagheera, use bin/bagheera or copy the init.d script of the same name from bin/init.d to /etc/init.d. The init script assumes Bagheera is installed at /usr/lib/bagheera, but this can be changed by editing the BAGHEERA_HOME variable near the top of that script. Here is an example of using the regular bagheera script:

bin/bagheera 8080

REST Request Format

URI: /submit/namespace/id | /1.0/submit/namespace/id

POST/PUT

The namespace is required and is only accepted if it is in the configured white-list.
The id is optional. If you provide one, it currently must be a valid UUID unless id validation is disabled on the namespace.
The payload content length must be less than the configured maximum.

DELETE

The namespace is required and is only accepted if it is in the configured white-list.
The id is required and currently must be a valid UUID unless id validation is disabled on the namespace.
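The URI rules above can be sketched from the client side. This is a minimal Python sketch, not part of Bagheera: the function names, the base URL http://localhost:8080, and the strict canonical-UUID check are assumptions made for illustration, and the server's own id validation may be more lenient.

```python
import uuid
from typing import Optional
from urllib.parse import quote


def is_valid_uuid(doc_id: str) -> bool:
    """Return True if doc_id is a UUID in canonical hyphenated form.

    Assumption: the server's exact validation rule is not documented here,
    so this check is deliberately strict.
    """
    try:
        return str(uuid.UUID(doc_id)) == doc_id.lower()
    except ValueError:
        return False


def build_submit_url(base_url: str, namespace: str,
                     doc_id: Optional[str] = None,
                     method: str = "POST",
                     versioned: bool = False,
                     validate_id: bool = True) -> str:
    """Build /submit/namespace/id or /1.0/submit/namespace/id (hypothetical helper)."""
    if not namespace:
        raise ValueError("namespace is required")
    # DELETE needs an id; POST/PUT may omit it.
    if method.upper() == "DELETE" and doc_id is None:
        raise ValueError("id is required for DELETE")
    # Pass validate_id=False for namespaces with id validation disabled.
    if doc_id is not None and validate_id and not is_valid_uuid(doc_id):
        raise ValueError("id must be a valid UUID")
    prefix = "/1.0/submit" if versioned else "/submit"
    parts = [quote(namespace, safe="")]
    if doc_id is not None:
        parts.append(quote(doc_id, safe=""))
    return base_url.rstrip("/") + prefix + "/" + "/".join(parts)
```

For example, build_submit_url("http://localhost:8080", "mynamespace") yields a URL without an id, which is acceptable for POST/PUT but raises ValueError when method="DELETE".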

Here's the list of HTTP response codes that Bagheera could send back:

201 Created - Returns the id submitted/generated. (default)
403 Forbidden - Violated access restrictions. Most likely because of the method used.
413 Request Entity Too Large - Request payload was larger than the configured maximum.
400 Bad Request - Returned if the POST/PUT failed validation in some manner.
404 Not Found - Returned if the URI path doesn't exist or if the URI was not in the proper format.
500 Internal Server Error - General server error. Someone with access should look at the logs for more details.
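A client may want to group these codes by what to do next. The following Python sketch shows one possible client-side policy. The Outcome names and the decision to treat 500 as retryable are assumptions for illustration, not behavior that Bagheera prescribes.

```python
from enum import Enum


class Outcome(Enum):
    ACCEPTED = "accepted"        # 201: the response carries the id
    FIX_REQUEST = "fix_request"  # 400, 403, 404, 413: resending unchanged is unlikely to help
    RETRY_LATER = "retry_later"  # 500: assumed to be worth retrying after a delay
    UNKNOWN = "unknown"          # any code not listed above


# Mapping from the documented status codes to a hypothetical client action.
_STATUS_TO_OUTCOME = {
    201: Outcome.ACCEPTED,
    400: Outcome.FIX_REQUEST,
    403: Outcome.FIX_REQUEST,
    404: Outcome.FIX_REQUEST,
    413: Outcome.FIX_REQUEST,
    500: Outcome.RETRY_LATER,
}


def classify_status(status: int) -> Outcome:
    """Map an HTTP status code to a client-side outcome."""
    return _STATUS_TO_OUTCOME.get(status, Outcome.UNKNOWN)
```

Keeping the mapping in a single table makes it easy to adjust the policy, for example if a deployment prefers not to retry on 500.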

Example Bagheera Configuration (conf/bagheera.properties)

# valid namespaces (whitelist only, comma separated)
valid.namespaces=mynamespace,othernamespace
max.content.length=1048576
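To illustrate how these two settings relate to the request rules, here is a small Python sketch that reads the example above and checks a submission against it. It is not Bagheera's implementation: the parser handles only simple key=value lines and # comments, the function names are hypothetical, and treating a payload of exactly max.content.length bytes as acceptable is an assumption.

```python
from typing import Dict, List

EXAMPLE_PROPERTIES = """\
# valid namespaces (whitelist only, comma separated)
valid.namespaces=mynamespace,othernamespace
max.content.length=1048576
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_submission(props: Dict[str, str], namespace: str,
                     payload: bytes) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    allowed = {ns.strip() for ns in props.get("valid.namespaces", "").split(",")
               if ns.strip()}
    max_len = int(props["max.content.length"])
    problems = []
    if namespace not in allowed:
        problems.append("namespace not in whitelist")
    # Assumption: only payloads strictly larger than the maximum are rejected.
    if len(payload) > max_len:
        problems.append("payload exceeds max.content.length")
    return problems
```

With the example configuration, check_submission(parse_properties(EXAMPLE_PROPERTIES), "mynamespace", b"{}") returns an empty list, while an unlisted namespace or a payload over 1048576 bytes is reported.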

Example Kafka Producer Configuration (conf/kafka.producer.properties)

# comma delimited list of ZK servers
zk.connect=127.0.0.1:2181
# use bagheera message encoder
serializer.class=com.mozilla.bagheera.serializer.BagheeraEncoder
# asynchronous producer
producer.type=async
# compression.codec (0=uncompressed,1=gzip,2=snappy)
compression.codec=2
# batch size (one of many knobs to turn in kafka depending on expected data size and request rate)
batch.size=100

Example Kafka Consumer Configuration (conf/kafka.consumer.properties)

# kafka consumer properties
zk.connect=127.0.0.1:2181
fetch.size=1048576
#serializer.class=com.mozilla.bagheera.serializer.BagheeraDecoder
# bagheera specific kafka consumer properties
consumer.threads=2

Notes on consumers

We currently use the consumers implemented here, but it may also be of interest to look at systems such as Storm to process the messages. Storm contains a Kafka spout (consumer) and there are at least a couple of HBase bolts (processing/sink) already out there.

License

All aspects of this software are distributed under the Apache Software License 2.0. See the LICENSE file for the full license text.

Contributors

Xavier Stevens (@xstevens)
Daniel Einspanjer (@deinspanjer)
Anurag Phadke (@anuragphadke)
Mark Reid (@reid_write)
Harsha Chintalapani
