Merge pull request barryclark#30 from Dutzu/master
Syntax highlighting fix
hlgr360 committed Feb 4, 2016
2 parents 9084438 + cd5ee61 commit 86e8ba0
Showing 2 changed files with 18 additions and 15 deletions.
20 changes: 11 additions & 9 deletions _posts/2016-01-11-log-aggregation.md
@@ -73,10 +73,11 @@ This is a pain because if you want to properly visualize a set of log messages g

Let's take a look at what fluentd sends to Elasticsearch. Here is a sample log file with 2 log messages:

~~~java
~~~
2015-11-12 06:34:01,471 [ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO ==== Request ===
2015-11-12 06:34:01,473 [ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO GET /monitor/broker/ HTTP/1.1
~~~
{: .language-java}

A message sent to Elasticsearch from fluentd would contain these values:

@@ -87,6 +88,7 @@ A message sent to Elasticsearch from fluentd would contain these values:
2015-11-12 06:34:01 -0800 tag.common: {"message":"[ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO GET /monitor/broker/ HTTP/1.1\n","time_as_string":"2015-11-12 06:34:01 -0800"}
~~~
{: .language-java}

I added the `time_as_string` field in there just so you can see the literal string that is sent as the time value.

@@ -107,15 +109,15 @@ Next you need to parse the timestamp of your logs into separate date, time and m
</record>
</filter>
~~~

{: .language-xml}
The result is that the above sample will come out like this:


~~~
2015-12-12 05:26:15 -0800 akai.common: {"date_string":"2015-11-12","time_string":"06:34:01","msec":"471","message":"[ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO ==== Request ===","@timestamp":"2015-11-12T06:34:01.471Z"}
2015-12-12 05:26:15 -0800 akai.common: {"date_string":"2015-11-12","time_string":"06:34:01","msec":"473","message":"[ ajp-apr-127.0.0.1-8009-exec-3] LogInterceptor INFO GET /monitor/broker/ HTTP/1.1\n","@timestamp":"2015-11-12T06:34:01.473Z"}
~~~

{: .language-java}
*__Note__: you can use the same record_transformer filter to remove the 3 separate time components after creating the `@timestamp` field via the `remove_keys` option.*
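As a sketch of that cleanup step (assuming fluentd v0.12 `record_transformer` syntax and the field names from the sample above; the tag is hypothetical):

~~~
<filter tag.common>
  type record_transformer
  enable_ruby true
  # drop the helper fields once @timestamp has been assembled from them
  remove_keys date_string,time_string,msec
  <record>
    @timestamp ${date_string + "T" + time_string + "." + msec + "Z"}
  </record>
</filter>
~~~
{: .language-xml}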

### Do not analyse
@@ -146,14 +148,14 @@ Using this example configuration I tried to create a pie chart showing the numbe
</record>
</filter>
~~~

{: .language-xml}
Sample output from stdout:


~~~
2015-12-12 06:01:35 -0800 clear: {"date_string":"2015-10-15","time_string":"06:37:32","msec":"415","message":"[amelJettyClient(0xdc64419)-706] jetty:test/test INFO totallyAnonymousContent: http://whyAreYouReadingThis?:)/history/3374425?limit=1","@timestamp":"2015-10-15T06:37:32.415Z","sourceProject":"Test-Analyzed-Field"}
~~~

{: .language-java}
And here is the result of trying to use it in a visualization:

{:.center}
@@ -175,11 +177,11 @@ curl -XPUT localhost:9200/_template/template_doru -d '{
"settings" : {....
}'
~~~

{: .language-bash}
The main thing to note in the whole template is this section:


~~~ json
~~~
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
@@ -192,12 +194,12 @@ The main thing to note in the whole template is this section:
}
}
~~~

{: .language-json}
This tells Elasticsearch that for every string field it receives, it should create an analyzed string mapping plus an additional field with a `.raw` suffix that is not analyzed.

The not_analyzed `.raw` field is the one you can safely use in visualizations, but do keep in mind that this creates the scenario mentioned before: storing both the analyzed and not_analyzed versions of each field can inflate storage requirements by up to 40%.
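For instance, a terms aggregation behind a pie chart would target the `.raw` variant instead of the analyzed field (a sketch only; the index name is hypothetical and `sourceProject` comes from the earlier example):

~~~
curl -XGET 'localhost:9200/logstash-*/_search?size=0' -d '{
  "aggs" : {
    "projects" : {
      "terms" : { "field" : "sourceProject.raw" }
    }
  }
}'
~~~
{: .language-bash}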

# Have fun
So, now you know what we went through here at [HaufeDev](http://haufe-lexware.github.io/), the problems we faced, and how we overcame them.

If you want to give it a try you can take a look at [our docker templates on github](https://github.com/Haufe-Lexware/docker-templates). There you will find a [logaggregation template](https://github.com/Haufe-Lexware/docker-templates/tree/master/logaggregation) for an EFK setup, plus a shipper that can transfer messages securely to the EFK solution, and you can have it up and running in a matter of minutes.
13 changes: 7 additions & 6 deletions _posts/2016-01-18-fluentd-log-parsing.md
@@ -26,7 +26,7 @@ The simplest approach is to just parse all messages using the common denominator

In the case of a typical log file, the configuration could look something like this (but not necessarily):

~~~ xml
~~~
<source>
type tail
path /var/log/test.log
@@ -39,7 +39,7 @@ In the case of a typical log file a configuration can be something like this (bu
format1 /(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}) (?<message>(.|\s)*)/
</source>
~~~

{: .language-xml}
You will notice we still do a bit of parsing; the minimal level would be to use a multiline format just to split the log contents into separate messages and then push them on.

The reason we do not simply capture everything into a single field with a greedy regex pattern is so that the timestamp pushed along with the rest of the message reflects the time the log entry was written, not the time it was read by the log shipper.
@@ -90,6 +90,7 @@ An example of this is shown in the configuration below:
type stdout
</match>
~~~
{: .language-ruby}

This approach is useful when we have multiline log messages within our logfile and the messages themselves have different formats for the content. Still, the important thing to note is that all log messages are prefixed by a standard timestamp; this is key to successfully splitting messages correctly.

@@ -99,10 +100,10 @@ Fluentd will continue to read logfile lines and keep them in a buffer until a li


Looking at the example, all our log messages (single or multiline) will take the form:
~~~ json
~~~
{ "time":"2015-10-15 08:21:04,716", "message":"[ ttt-grp-127.0.0.1-8119-test-11] LogInterceptor INFO HTTP/1.1 200 OK" }
~~~

{: .language-json}
Being tagged with *log.unprocessed*, all the messages will be caught by the *rewrite_tag_filter* match tag, and it is at this point that we can pinpoint what type of content each message has and re-tag it for individual processing.

This module is key to the whole mechanism as the *rewrite_tag_filter* takes the role of a router. You can use this module to redirect messages to different processing modules or even outputs depending on the rules you define in it.
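A minimal routing sketch, using the plugin's `rewriterule` syntax; the match patterns and target tags here are hypothetical:

~~~
<match log.unprocessed>
  type rewrite_tag_filter
  # messages containing LogInterceptor go to their own processing chain
  rewriterule1 message LogInterceptor http.trace
  # everything else falls through to a generic tag
  rewriterule2 message .+ generic.log
</match>
~~~
{: .language-xml}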
@@ -159,7 +160,7 @@ An example of this approach can be seen below:
</pattern>
</source>
~~~

{: .language-ruby}
When choosing this path there are multiple issues you need to be aware of:
* Pattern matching is done sequentially, and the first pattern that matches is used to parse the message before it is passed along
* You need to make sure the most specific patterns are higher in the list and the more generic ones lower
@@ -208,7 +209,7 @@ AKA_ARGO_LOG2 %{AKAIDATESTAMP2:time} %{WORD:argoComponent} *%{LOGLEVEL:logLevel}
AKA_ARGO_SOURCE (GC|CMS)
AKA_ARGO_GC \[%{AKA_ARGO_SOURCE:source} %{AKA_GREEDYMULTILINE:message}
~~~

{: .language-bash}

To use Grok you will need to install the *fluent-plugin-grok-parser* plugin; then you can use grok patterns with any of the regex-based techniques described previously: multiline and multi-format.
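A tail source using such patterns could look like this (a sketch; `format grok`, `grok_pattern` and `custom_pattern_path` are options of fluent-plugin-grok-parser, while the path, tag and pattern file are hypothetical):

~~~
<source>
  type tail
  path /var/log/argo.log
  tag argo.log
  format grok
  # reference one of the custom patterns defined above
  grok_pattern %{AKA_ARGO_LOG}
  custom_pattern_path /etc/fluent/patterns/argo
</source>
~~~
{: .language-xml}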

