Version: 0.15-SNAPSHOT
REST service for Mozilla Metrics. This service currently uses Apache Kafka as its backing data store, and provides a few implementations of Kafka consumers to pull messages and persist them to various data sinks.
This code is built with the following assumptions. You may get mixed results if you deviate from these versions.
- Kafka 0.7.1+
- Protocol Buffers 2.4.1+
- Hadoop 0.20.2+
- HBase 0.90+
##### Prerequisites #####

- Protocol Buffers
- Zookeeper (for Kafka)
- Kafka
- Hadoop (if using the HDFS-based consumer)
- HBase (if using the HBase-based consumer)
To make a jar you can do:

    mvn package

The jar file is then located under the `target` directory.
Make sure your Kafka and Zookeeper servers are running first (see the Kafka documentation).

In order to run Bagheera on another machine, you will probably want to use the dist assembly like so:

    mvn assembly:assembly

The zip file now under the `target` directory should be deployed to `BAGHEERA_HOME` on the remote server.
To run Bagheera, you can use `bin/bagheera` or copy the init.d script by the same name from `bin/init.d` to `/etc/init.d`. The init script assumes an installation of Bagheera at `/usr/lib/bagheera`, but this can be modified by changing the `BAGHEERA_HOME` variable near the top of that script. Here is an example of using the regular bagheera script:

    bin/bagheera 8080
##### URI /submit/namespace/id | /1.0/submit/namespace/id #####

POST/PUT
- The namespace is required and is only accepted if it is in the configured white-list.
- The id is optional; if you provide one, it currently needs to be a valid UUID unless id validation is disabled for the namespace.
- The payload content length must be less than the configured maximum.
DELETE
- The namespace is required and is only accepted if it is in the configured white-list.
- The id is required; it currently needs to be a valid UUID unless id validation is disabled for the namespace.
Here's the list of HTTP response codes that Bagheera could send back:
- 201 Created - Returns the id submitted/generated. (default)
- 403 Forbidden - Violated access restrictions. Most likely because of the method used.
- 413 Request Entity Too Large - Request payload was larger than the configured maximum.
- 400 Bad Request - Returned if the POST/PUT failed validation in some manner.
- 404 Not Found - Returned if the URI path doesn't exist or if the URI was not in the proper format.
- 500 Server Error - General server error. Someone with access should look at the logs for more details.
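The submit rules above can be mirrored client-side. Here is a minimal Python sketch (the host, port, and namespace are hypothetical, and the helper names are this sketch's own) that builds a submit URI and enforces the UUID and content-length rules before a request is ever sent:

```python
# Client-side sketch of Bagheera's submit rules; not part of Bagheera itself.
import uuid

MAX_CONTENT_LENGTH = 1048576  # should match the server's max.content.length

def submit_url(base, namespace, doc_id=None):
    """Build a /submit URI. The namespace must be on the server's
    whitelist (valid.namespaces). If doc_id is omitted, Bagheera
    generates one and returns it with the 201 response."""
    if doc_id is not None:
        uuid.UUID(doc_id)  # raises ValueError for a malformed id
        return f"{base}/submit/{namespace}/{doc_id}"
    return f"{base}/submit/{namespace}"

def check_payload(payload):
    """Reject oversized payloads client-side; the server would answer 413."""
    if len(payload) > MAX_CONTENT_LENGTH:
        raise ValueError("payload exceeds max.content.length")
    return payload

# Example with a hypothetical host and namespace:
url = submit_url("http://localhost:8080", "mynamespace",
                 "01234567-89ab-cdef-0123-456789abcdef")
```

Checking these constraints before sending avoids a round trip that would otherwise end in a 400 or 413 response.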
Example Kafka producer properties for Bagheera:

    # valid namespaces (whitelist only, comma separated)
    valid.namespaces=mynamespace,othernamespace
    max.content.length=1048576
    # comma delimited list of ZK servers
    zk.connect=127.0.0.1:2181
    # use bagheera message encoder
    serializer.class=com.mozilla.bagheera.serializer.BagheeraEncoder
    # asynchronous producer
    producer.type=async
    # compression.codec (0=uncompressed,1=gzip,2=snappy)
    compression.codec=2
    # batch size (one of many knobs to turn in kafka depending on expected data size and request rate)
    batch.size=100
Example Kafka consumer properties:

    zk.connect=127.0.0.1:2181
    fetch.size=1048576
    #serializer.class=com.mozilla.bagheera.serializer.BagheeraDecoder
    # bagheera specific kafka consumer properties
    consumer.threads=2
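One relationship worth sanity-checking when tuning these files (an assumption of this README's examples, not something Bagheera enforces) is that the consumer's `fetch.size` is at least the server's `max.content.length`, so the largest accepted payload can still be fetched. A small Python sketch:

```python
# Sketch: parse simple key=value properties and sanity-check fetch.size.
# Not part of Bagheera; the check reflects this README's example values.

def parse_properties(text):
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

consumer_props = parse_properties("""
zk.connect=127.0.0.1:2181
fetch.size=1048576
consumer.threads=2
""")

MAX_CONTENT_LENGTH = 1048576  # from the producer-side example above
assert int(consumer_props["fetch.size"]) >= MAX_CONTENT_LENGTH
```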
We currently use the consumers implemented here, but it may also be of interest to look at systems such as Storm to process the messages. Storm contains a Kafka spout (consumer) and there are at least a couple of HBase bolts (processing/sink) already out there.
All aspects of this software are distributed under the Apache Software License 2.0. See the LICENSE file for the full license text.
- Xavier Stevens (@xstevens)
- Daniel Einspanjer (@deinspanjer)
- Anurag Phadke (@anuragphadke)
- Mark Reid (@reid_write)
- Harsha Chintalapani