reaper is a simple tool to collect access logs from web servers and publish the logs to an external message queue.

- Collect logs over TCP/UDP syslog
- Syslog RFC3164 or RFC5424
- Collect logs on stdin
- Parse log formats: JSON, key/values, common, combined
- Stream access logs with websocket
- Download logs with HTTP
- Filter out unwanted log lines (predicate in Javascript)
- Can write collected logs to stdout, stderr, file
- Can write collected logs to databases: PostgreSQL/TimescaleDB, Elasticsearch
- Can write collected logs to message brokers: RabbitMQ, nsqd, STOMP enabled message broker
- Can write collected logs to a distributed log: Kafka
- Can write collected logs to a redis list
- Can forward collected logs to another reaper instance
- Should work on any *NIX
Alpha. Version 0.1.0.
reaper is functional and can be used in simple environments, but it lacks proper test cases and has not been performance-tested in busy environments.
Binary releases
https://github.com/stephane-martin/reaper/releases
Just copy the binary to a directory in your PATH.
Compile from source
git clone https://github.com/stephane-martin/reaper
in an appropriate folder (GOPATH…), then
make debug
or
make release
Currently reaper does not use a configuration file. Arguments are passed on the command line or with environment variables.
reaper --help
reaper (command) --help
Start reaper with --tcp 127.0.0.1:1514. Here 127.0.0.1 is the listen address and 1514 the listen port.
Start reaper with --udp 127.0.0.1:1514.
This can be used with nginx or caddy. In nginx.conf:
access_log syslog:server=127.0.0.1:1514,facility=daemon,tag=nginxaccess,severity=info jrich;
By default the syslog protocol is assumed to be RFC3164. Use the global flag --rfc5424 to switch to RFC5424.
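To check a UDP listener without a web server, a test message can be sent by hand. The following is a minimal sketch, assuming reaper was started with --udp 127.0.0.1:1514 and the default RFC3164 framing. The hostname `webhost` and the helper names are hypothetical, and the facility and severity mirror the nginx example above.

```python
import socket
from datetime import datetime

FACILITY_DAEMON = 3  # syslog facility code for "daemon"
SEVERITY_INFO = 6    # syslog severity code for "info"


def rfc3164_message(tag, content, hostname="webhost", now=None):
    """Build a BSD-syslog line: <PRI>TIMESTAMP HOSTNAME TAG: CONTENT."""
    now = now or datetime.now()
    pri = FACILITY_DAEMON * 8 + SEVERITY_INFO
    # RFC 3164 timestamps look like "Jan  2 15:04:05" (day padded with a space).
    timestamp = f"{now:%b} {now.day:2d} {now:%H:%M:%S}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {content}"


def send_udp(message, addr=("127.0.0.1", 1514)):
    """Send one datagram to the address given to --udp (assumed here)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), addr)
```

Calling `send_udp(rfc3164_message("nginxaccess", '{"status":200}'))` sends a single entry. The content must still match the format selected with --format.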
Start reaper with --stdin.
This can be used with Apache. For example in Apache configuration:
CustomLog "||/path/to/reaper --format combined --stdin" combined
reaper needs to know the format in which the web server writes access log entries. Use the --format flag.
reaper --udp 127.0.0.1:1514 --format json
Example nginx configuration:
log_format jrich escape=json '{'
    '"timestamp":"$time_iso8601",'
    '"method":"$request_method",'
    '"scheme":"$scheme",'
    '"host":"$host",'
    '"server":"$server_name",'
    '"uri":"$uri",'
    '"duration":$request_time,'
    '"length":$request_length,'
    '"status":$status,'
    '"sent":$bytes_sent,'
    '"agent":"$http_user_agent",'
    '"remoteaddr":"$remote_addr",'
    '"remoteuser":"$remote_user"'
'}';
access_log syslog:server=127.0.0.1:1514,facility=daemon,tag=nginxaccess,severity=info jrich;
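For illustration, here is what one entry produced by the jrich format looks like once decoded. This is a sketch with a hypothetical sample line, not reaper's own parser. It only shows that the unquoted variables (duration, length, status, sent) arrive as JSON numbers while the others arrive as strings.

```python
import json

# Hypothetical log line, shaped like the jrich log_format shown above.
SAMPLE = (
    '{"timestamp":"2018-03-01T12:00:00+00:00","method":"GET","scheme":"https",'
    '"host":"example.org","server":"example.org","uri":"/index.html",'
    '"duration":0.004,"length":312,"status":200,"sent":1024,'
    '"agent":"curl/7.58.0","remoteaddr":"192.0.2.10","remoteuser":""}'
)


def parse_json_entry(line):
    """Decode one JSON access log entry into a dict of fields."""
    return json.loads(line)


entry = parse_json_entry(SAMPLE)
print(entry["host"], entry["status"], entry["duration"])
```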
reaper --udp 127.0.0.1:1514 --format kv
Example nginx configuration:
log_format rich 'remote_addr="$remote_addr" remote_user="$remote_user" time="$time_iso8601" length=$request_length'
    ' host="$host" request="$request_uri" uri="$uri" status=$status bytes_sent=$bytes_sent agent="$http_user_agent"'
    ' duration=$request_time upstream_duration=$upstream_response_time method="$request_method" scheme="$scheme"'
    ' server="$server_name"';
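A key/values line such as the one produced by the rich format can be split into fields as sketched below. This is an assumption about how such lines can be read, not reaper's actual parser. The regular expression and the sample line are illustrative.

```python
import re

# key="quoted value" (backslash escapes allowed) or key=bare_token
_PAIR = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|(\S*))')


def parse_kv(line):
    """Return a dict of the key=value pairs found in one log line."""
    entry = {}
    for key, quoted, bare in _PAIR.findall(line):
        # Only one of the two value groups is non-empty for a given pair.
        entry[key] = quoted or bare
    return entry


# Hypothetical line shaped like the rich log_format above.
line = (
    'remote_addr="192.0.2.10" remote_user="" time="2018-03-01T12:00:00+00:00" '
    'length=312 host="example.org" status=200 agent="Mozilla/5.0 (X11; Linux)"'
)
print(parse_kv(line))
```

All values come back as strings. Converting numeric fields such as status or length is left to the consumer.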
reaper --udp 127.0.0.1:1514 --format common
reaper --udp 127.0.0.1:1514 --format combined
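The common and combined formats are the classic Apache access log layouts. Combined is common with the referer and user agent appended. The sketch below parses both with one regular expression. The group names are hypothetical and need not match the field names reaper uses.

```python
import re

ACCESS_LINE = re.compile(
    r'(?P<remoteaddr>\S+) (?P<ident>\S+) (?P<remoteuser>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<sent>\d+|-)'
    # The two trailing quoted fields only exist in the combined format.
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_access_line(line):
    """Return the fields of a common/combined line, or None if it does not match."""
    match = ACCESS_LINE.match(line.strip())
    return match.groupdict() if match else None


common = '192.0.2.10 - frank [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326'
combined = common + ' "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
print(parse_access_line(combined))
```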
The --filterout EXPR global flag can be set to specify a filter.
EXPR is a JavaScript expression that can use the log entry fields. If EXPR evaluates to true, the entry is filtered out. Multiple --filterout flags can be used. In that case, an entry is filtered out if any of the expressions is true.
Example:
reaper --udp 127.0.0.1:1514 --format json --filterout 'host=="example.org"' stdout
Log entries for requests to http://example.org will be filtered out.
Please note that filtering has a performance cost, because it runs an embedded JavaScript engine.
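The "filtered out if any expression is true" rule can be modelled as follows. In this sketch, Python callables stand in for the JavaScript expressions. The second predicate and the sample entries are hypothetical.

```python
def keep(entry, filters):
    """Return True when no filter matches, i.e. the entry is not filtered out."""
    return not any(predicate(entry) for predicate in filters)


# Each callable plays the role of one --filterout expression.
filters = [
    lambda e: e.get("host") == "example.org",
    lambda e: e.get("status") == 404,
]

entries = [
    {"host": "example.org", "status": 200},  # dropped by the first filter
    {"host": "example.com", "status": 404},  # dropped by the second filter
    {"host": "example.com", "status": 200},  # kept: no filter matches
]

kept = [e for e in entries if keep(e, filters)]
print(kept)
```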
reaper can forward access logs to various destinations. The type of destination is selected with a command on the reaper command line, placed after the global flags described above.
When the destination is not reachable, log entries are buffered in the embedded nsqd instance. When the destination is reachable again, buffered entries will be forwarded. So you do not need to start the destination before reaper.
Each destination has specific flags to configure it.
reaper --udp 127.0.0.1 stdout
reaper --udp 127.0.0.1 stderr
reaper --udp 127.0.0.1 file --filename /tmp/access.log
=> write log entries to /tmp/access.log
reaper --udp 127.0.0.1 file --gzip --filename /tmp/access.log.gz
=> write compressed log entries to /tmp/access.log.gz
Forward logs to a RabbitMQ exchange.
reaper --udp 127.0.0.1 rabbitmq --uri "amqp://guest:guest@localhost:5672/" --exchange exname --routing-key key --type direct
This will forward entries to a RabbitMQ broker, located at localhost:5672, using guest/guest as credentials, to the / virtual host, in the direct exchange exname, and with “key” as a routing key.
Forward logs to a STOMP-enabled message broker.
./reaper_debug --udp 127.0.0.1:1514 stomp --login user --passcode password --host virtualhost --destination /queue/reaper --addr 192.168.1.2:61613
Forward logs to an Elasticsearch server.
reaper --udp 127.0.0.1 elasticsearch --url http://127.0.0.1:9200 --index indexname
Forward logs to Redis, using a redis list (think LPOP, RPUSH).
reaper --udp 127.0.0.1 redis --addr 127.0.0.1:6379 --listname thelistkey --database 6 --password pass
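The "LPOP, RPUSH" hint suggests the list is used as a FIFO queue: entries pushed on the right, consumers popping from the left. The sketch below models that ordering with an in-memory stand-in. It is not a Redis client, and it assumes (without confirmation from the source) that reaper appends with RPUSH.

```python
from collections import deque


class ListModel:
    """In-memory stand-in for one Redis list, for illustration only."""

    def __init__(self):
        self._items = deque()

    def rpush(self, value):
        """Append on the right and return the new length, like RPUSH."""
        self._items.append(value)
        return len(self._items)

    def lpop(self):
        """Remove and return the leftmost item, or None when empty, like LPOP."""
        return self._items.popleft() if self._items else None


thelistkey = ListModel()
thelistkey.rpush("entry-1")  # producer side (assumed to be reaper)
thelistkey.rpush("entry-2")
first = thelistkey.lpop()    # consumer side: the oldest entry comes out first
print(first)
```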
Forward logs to a Kafka topic.
reaper --udp 127.0.0.1 kafka --broker 192.168.1.2:9092 --broker 192.168.1.3:9092 --broker 192.168.1.4:9092 --topic topicname
First you need to create a table in PostgreSQL that is consistent with the log format.
For example:
+------------+--------------------------+-------------------+
| Column     | Type                     | Modifiers         |
|------------+--------------------------+-------------------|
| timestamp  | timestamp with time zone | not null          |
| method     | text                     | default ''::text  |
| scheme     | text                     | default ''::text  |
| host       | text                     | default ''::text  |
| server     | text                     | default ''::text  |
| uri        | text                     | default ''::text  |
| duration   | double precision         | default 0         |
| length     | integer                  | default 0         |
| status     | integer                  | default 0         |
| sent       | integer                  | default 0         |
| agent      | text                     | default ''::text  |
| remoteaddr | text                     | default ''::text  |
| remoteuser | text                     | default ''::text  |
+------------+--------------------------+-------------------+
Indexes:
    "reaper_duration_timestamp_idx" btree (duration, "timestamp" DESC)
    "reaper_host_timestamp_idx" btree (host, "timestamp" DESC)
    "reaper_length_timestamp_idx" btree (length, "timestamp" DESC)
    "reaper_method_timestamp_idx" btree (method, "timestamp" DESC)
    "reaper_remoteaddr_timestamp_idx" btree (remoteaddr, "timestamp" DESC)
    "reaper_scheme_timestamp_idx" btree (scheme, "timestamp" DESC)
    "reaper_sent_timestamp_idx" btree (sent, "timestamp" DESC)
    "reaper_server_timestamp_idx" btree (server, "timestamp" DESC)
    "reaper_timestamp_idx" btree ("timestamp" DESC)
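A table with these columns could be created with DDL along the following lines. This is a sketch derived from the listing above. The helper names are hypothetical, the table name `tablename` is a placeholder, and the indexes are left out.

```python
# (column, type, modifier) triples copied from the table description above.
COLUMNS = [
    ("timestamp", "timestamp with time zone", "NOT NULL"),
    ("method", "text", "DEFAULT ''"),
    ("scheme", "text", "DEFAULT ''"),
    ("host", "text", "DEFAULT ''"),
    ("server", "text", "DEFAULT ''"),
    ("uri", "text", "DEFAULT ''"),
    ("duration", "double precision", "DEFAULT 0"),
    ("length", "integer", "DEFAULT 0"),
    ("status", "integer", "DEFAULT 0"),
    ("sent", "integer", "DEFAULT 0"),
    ("agent", "text", "DEFAULT ''"),
    ("remoteaddr", "text", "DEFAULT ''"),
    ("remoteuser", "text", "DEFAULT ''"),
]


def create_table_sql(table):
    """Build a CREATE TABLE statement; identifiers are double-quoted."""
    cols = ",\n".join(f'    "{name}" {ctype} {mod}' for name, ctype, mod in COLUMNS)
    return f'CREATE TABLE "{table}" (\n{cols}\n);'


def fields_argument():
    """Comma-separated column names, in table order."""
    return ",".join(name for name, _, _ in COLUMNS)


print(create_table_sql("tablename"))
```

`fields_argument()` yields the column names in the same order, which is the shape of value the --fields flag takes.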
Then:
reaper --udp 127.0.0.1:1514 pgsql \
    --uri "postgres://user:[email protected]/dbname" --table tablename --fields "timestamp,method,scheme,host,server,uri,duration,length,status,sent,agent,remoteaddr,remoteuser"
Forward logs to an external nsqd instance.
reaper --udp 127.0.0.1:1514 nsq --addr 192.168.1.2:4150 --topic topicname --json
To forward logs to another reaper instance, on machine A 192.168.1.2 (with web server):
reaper --udp 127.0.0.1:1514 nsq --addr 192.168.1.3:4150 --topic embedded
On machine B 192.168.1.3:
reaper --nsqd-address 192.168.1.3 --nsqd-tcp-port 4150 pgsql ...
If started with --http-address, reaper exposes an HTTP API.
Endpoints:
- /status => just returns a 200 HTTP status code.
- /metrics => Prometheus metrics (including the embedded nsqd metrics).
- POST /download/:clientid?wait=3000&size=1000 => creates a channel of access log entries and downloads entries from it. size is the number of entries to return; wait is the number of milliseconds to wait. The first POST call creates an nsq channel, and all entries received from then on are copied to it. Each successive POST call returns different entries.
- DELETE /download/:clientid => deletes a previously created channel.
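A client for the download endpoints might look like the sketch below. The base URL `http://127.0.0.1:8080` is a hypothetical value for --http-address, and the function names are illustrative. The response body format is not specified here, so the body is returned as raw bytes.

```python
import urllib.parse
import urllib.request


def download_url(base, client_id, wait_ms=3000, size=1000):
    """Build the /download URL for one client id."""
    query = urllib.parse.urlencode({"wait": wait_ms, "size": size})
    return f"{base}/download/{urllib.parse.quote(client_id, safe='')}?{query}"


def download(base, client_id, wait_ms=3000, size=1000):
    """POST to /download/:clientid and return the raw response body."""
    req = urllib.request.Request(
        download_url(base, client_id, wait_ms, size), data=b"", method="POST"
    )
    # Leave some margin over the server-side wait before timing out.
    with urllib.request.urlopen(req, timeout=wait_ms / 1000 + 5) as resp:
        return resp.read()


def delete_channel(base, client_id):
    """DELETE /download/:clientid and return the HTTP status code."""
    url = f"{base}/download/{urllib.parse.quote(client_id, safe='')}"
    req = urllib.request.Request(url, method="DELETE")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

A consumer would call `download(base, "client1")` repeatedly with the same client id to receive successive batches, then `delete_channel(base, "client1")` when done.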
If started with --websocket-address, reaper exposes a websocket endpoint.
- /stream: stream received entries to the websocket client.
By default, reaper's own logs are written to stderr.
The logging level can be set with --loglevel [debug, info, warn, error, crit].
Alternatively, reaper can log to syslog with --syslog.
reaper embeds an nsqd service (https://nsq.io). When access log entries are received on TCP, UDP or stdin, they are first stored in the embedded nsqd. Thus, reaper only deletes an access log entry when it has been reliably sent to the configured destination.
Forwarding to the destination is done asynchronously to achieve good performance.
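The store-then-forward behaviour can be illustrated with a small model. This is a simplified sketch, not reaper's implementation: a plain in-memory deque stands in for the embedded nsqd, and the `send` callable is a hypothetical destination writer that returns True once delivery is confirmed.

```python
from collections import deque


class StoreAndForward:
    """Buffer entries and drop each one only after a confirmed delivery."""

    def __init__(self, send):
        self._send = send        # callable(entry) -> True when delivered
        self._pending = deque()  # stand-in for the embedded queue

    def receive(self, entry):
        """Store an incoming entry before any forwarding is attempted."""
        self._pending.append(entry)

    def pending(self):
        return len(self._pending)

    def flush(self):
        """Forward buffered entries in order; stop at the first failure."""
        delivered = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # destination unreachable: keep the entry for a retry
            self._pending.popleft()  # delete only after confirmed delivery
            delivered += 1
        return delivered
```

While the destination is down, `flush()` delivers nothing and the entries stay buffered. Once it is reachable again, the next `flush()` forwards them in arrival order.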
https://github.com/stephane-martin/reaper/blob/master/CHANGELOG.md