
document support for setting severity #51

Open
mr-salty opened this issue Jan 7, 2016 · 11 comments

@mr-salty
Contributor

mr-salty commented Jan 7, 2016

Splitting this off from #43 from @sadovnychyi

Is it possible to capture severity information using a regexp like format /^(?<message>(?<time>[^ ]*\s*[^ ]* [^ ]*) .*)$/ in the config file?

That sample is used to capture the timestamp from syslog, but it does not seem to work properly. I would like to see documentation about this.

@mr-salty
Contributor Author

mr-salty commented Jan 7, 2016

We've had a TODO to document this properly, but the short answer is yes: you can use a regexp (or JSON) to set the severity for a log entry. The key is 'severity' (#4 also discusses making that name configurable). http://docs.fluentd.org/articles/in_tail has generic documentation on how to do this.

The value of 'severity' ultimately has to be one of the enum values accepted by the Cloud Logging API as documented here. However, the plugin also has a list of translations from other common values (WARN=WARNING, FINE=DEBUG, etc.). Matching is case-insensitive.

Numeric values are also translated into valid severity values (0=DEFAULT, 100=DEBUG, 200=INFO, ... 800=EMERGENCY). Values are rounded down to the nearest 100, and anything >=800 maps to EMERGENCY.
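The translation rules above can be sketched in Ruby, the language fluentd and this plugin are written in. This is a minimal illustration under stated assumptions, not the plugin's code: `normalize_severity` is a hypothetical name, the translation table holds only the two examples mentioned here, unknown values fall back to DEFAULT, and results are returned as names using the LogSeverity enum's numeric values (DEBUG=100, INFO=200, and so on), whereas the plugin may hand numeric severities to the API as numbers.

```ruby
# LogSeverity names accepted by the Cloud Logging API, with numeric values.
SEVERITY_VALUES = {
  'DEFAULT' => 0, 'DEBUG' => 100, 'INFO' => 200, 'NOTICE' => 300,
  'WARNING' => 400, 'ERROR' => 500, 'CRITICAL' => 600,
  'ALERT' => 700, 'EMERGENCY' => 800
}.freeze

# Illustrative subset only: the two translations mentioned in this thread.
TRANSLATIONS = { 'WARN' => 'WARNING', 'FINE' => 'DEBUG' }.freeze

# Hypothetical helper: maps a raw 'severity' value to a LogSeverity name.
def normalize_severity(raw)
  value = raw.to_s.strip.upcase # case is ignored

  # Already a valid name.
  return value if SEVERITY_VALUES.key?(value)

  # Numeric: round down to the nearest 100, capping at 800 (EMERGENCY).
  if value.match?(/\A\d+\z/)
    level = [(value.to_i / 100) * 100, 800].min
    return SEVERITY_VALUES.key(level)
  end

  # Known alias, else fall back to DEFAULT (an assumption of this sketch).
  TRANSLATIONS.fetch(value, 'DEFAULT')
end

p normalize_severity('warn') # => "WARNING"
p normalize_severity(450)    # => "WARNING"
p normalize_severity('999')  # => "EMERGENCY"
```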

@j-walker23

Does anyone have any examples of how to do this in the syslog config file? It would be great if the default syslog config could parse the severity. I am having trouble with the format regex.

@qingling128
Contributor

@j-walker23

A typical GCP instance has syslog formatted as below:

Mar 27 15:09:01 agent-test-20170119 CRON[6659]: (root) CMD (  [ -x /usr/lib/php5/sessionclean ] && /usr/lib/php5/sessionclean)
Mar 27 15:17:01 agent-test-20170119 CRON[6707]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Mar 27 15:29:39 agent-test-20170119 collectd[5346]: write_gcm: Asking metadata server for auth token

Severity does not appear in the message itself by default. I guess you have some customized logs with severity in the log messages? To figure out a proper regex, this doc on the tail plugin's format parameter might be a good reference.

A sample syslog format is also included there:

format /^(?<time>[^ ]*\s*[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
time_format %b %d %H:%M:%S

Hope this helps.
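To see what the sample format captures, here is a small Ruby check (fluentd format regexps are Ruby regexps) run against one of the sample lines above. It is only a sketch: it reuses the regexp and `time_format` from the docs, and the constant names are illustrative.

```ruby
require 'time'

# The sample syslog format from the in_tail docs, as a Ruby regexp literal.
SYSLOG_FORMAT = /^(?<time>[^ ]*\s*[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/

SYSLOG_LINE = 'Mar 27 15:29:39 agent-test-20170119 collectd[5346]: write_gcm: Asking metadata server for auth token'

fields = SYSLOG_FORMAT.match(SYSLOG_LINE)

# time_format %b %d %H:%M:%S has no year; Time.strptime fills in the current one.
timestamp = Time.strptime(fields[:time], '%b %d %H:%M:%S')

p fields[:host]    # => "agent-test-20170119"
p fields[:ident]   # => "collectd"
p fields[:pid]     # => "5346"
p fields[:message] # => "write_gcm: Asking metadata server for auth token"
p [timestamp.month, timestamp.day, timestamp.hour] # => [3, 27, 15]
```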

@j-walker23

j-walker23 commented Apr 16, 2017

@qingling128 Thanks for your response. The syslog format is what I was having trouble with.
The google-fluentd default format is:
format /^(?<message>(?<time>[^ ]*\s*[^ ]* [^ ]*) .*)$/
Trying to figure out how to add severity to it, with no examples of similar nested named groups, was pretty confusing for a while.

Although it's pretty ugly, I finally got Stackdriver syslogs to show severity. I probably could have used look(ahead|behind)s to remove the duplication, but regex isn't my strong suit.
Here is what I ended up with, in case anyone needs this before there's an official fix. This is the complete snippet I added to my startup-script.sh.

# Write a custom syslog source that also extracts a severity field.
mkdir -p /etc/google-fluentd/config.d/
# Quoting 'EOF' keeps the shell from expanding anything inside the regex.
cat >/etc/google-fluentd/config.d/syslog.conf << 'EOF'
<source>
  type tail

  format /^(?<message>(?<time>[^ ]*\s*[^ ]* [^ ]*) .*?[^INFO|ERROR|WARNING|DEBUG|CRITICAL]*(?<severity>INFO|ERROR|WARNING|DEBUG|CRITICAL)?.*?)$/

  path /var/log/syslog
  pos_file /var/lib/google-fluentd/pos/syslog.pos
  read_from_head true
  tag syslog
</source>
EOF

# Reload the agent so it picks up the new config.
service google-fluentd reload

I still don't understand why formats like the example you posted don't nest <time> inside the <message> named group, while the default syslog format does. Not important, just curious. Any idea?
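The format regexp in this snippet can be sanity-checked in plain Ruby before reloading the agent. The two sample lines below are made up for illustration (only the program name differs). The check suggests the pattern works when everything before the level is lowercase, but misses the level when an earlier character appears in the bracketed class, such as the C in CRON, because the engine then accepts the match with an empty `severity`.

```ruby
# The format regexp from the syslog.conf snippet above, unchanged.
WALKER_FORMAT = /^(?<message>(?<time>[^ ]*\s*[^ ]* [^ ]*) .*?[^INFO|ERROR|WARNING|DEBUG|CRITICAL]*(?<severity>INFO|ERROR|WARNING|DEBUG|CRITICAL)?.*?)$/

# Hypothetical sample lines; only the program name differs.
LOWERCASE_APP = 'Mar 27 15:29:39 agent-test-20170119 myapp[123]: ERROR disk full'
UPPERCASE_APP = 'Mar 27 15:29:39 agent-test-20170119 CRON[6659]: ERROR job failed'

p WALKER_FORMAT.match(LOWERCASE_APP)[:severity] # => "ERROR"
# The class stops at the C in CRON, the optional capture fails there,
# and the engine accepts the match with an empty severity.
p WALKER_FORMAT.match(UPPERCASE_APP)[:severity] # => nil
```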

@igorpeshansky
Member

igorpeshansky commented Apr 16, 2017

@j-walker23, you can remove the [^INFO|ERROR|WARNING|DEBUG|CRITICAL]* group from your regex — it doesn't do anything.
Embedding time in the message allows us to preserve the exact line from syslog as message in case we mis-parse the timestamp and you need to find an exact log line.
Note that your question is related to GoogleCloudPlatform/fluentd-catch-all-config#10, which requests structured log parsing in the default configs.
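The effect of nesting `time` inside `message` can be seen by running the default format quoted earlier against one of the sample syslog lines in Ruby (a sketch; the constant names are illustrative): `time` holds just the timestamp for parsing, while `message` still holds the complete original line.

```ruby
# The default google-fluentd syslog format: <time> is nested in <message>.
DEFAULT_FORMAT = /^(?<message>(?<time>[^ ]*\s*[^ ]* [^ ]*) .*)$/

CRON_LINE = 'Mar 27 15:17:01 agent-test-20170119 CRON[6707]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)'

fields = DEFAULT_FORMAT.match(CRON_LINE)
p fields[:time]                 # => "Mar 27 15:17:01"
# Even if the timestamp is later mis-parsed, the exact original line survives.
p fields[:message] == CRON_LINE # => true
```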

@igorpeshansky
Member

Sorry about accidentally closing this — fat-fingered the "Close and comment" button in the mobile UI...

@j-walker23

@igorpeshansky awesome, thanks for explaining why. That makes sense.

Because I don't control all the log formats written to syslog, the severity can be anywhere after the time, if it appears at all. The only way I could get it to work in all scenarios was to add the optional repeated character class that excludes the severities.

@igorpeshansky
Member

@j-walker23, I was talking specifically about the square-bracketed group I've quoted, which is effectively equivalent to [^ABCDEFGILNORTUW|]*, and doesn't do what you seem to think it does. The previous .*? should be enough to cover all of the relevant prefixes. The parenthesized <severity> capture is fine.
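A quick Ruby check supports the first point (the bracketed class is just a set of characters) but suggests a caveat on the second: with a bare lazy `.*?` in front of an optional capture, the engine can succeed with an empty prefix and an empty `severity`. One way around that, shown here as an assumption rather than a documented plugin recipe, is to wrap the lazy prefix and the capture in a single optional group. The sample lines and constant names are illustrative.

```ruby
LEVELS = 'INFO|ERROR|WARNING|DEBUG|CRITICAL'

# Inside [^...], "|" and the repeated letters are just members of one set.
p LEVELS.chars.uniq.sort.join # => "ABCDEFGILNORTUW|"

# The regexp with the bracketed class removed, as suggested above.
LAZY_ONLY = /^(?<message>(?<time>[^ ]*\s*[^ ]* [^ ]*) .*?(?<severity>#{LEVELS})?.*?)$/

# Alternative (an assumption, not from the plugin docs): the lazy prefix and
# the capture share one optional group, so the engine looks for a level first
# and only skips the group when no level appears anywhere in the line.
GROUPED = /^(?<message>(?<time>[^ ]*\s*[^ ]* [^ ]*) (?:.*?(?<severity>#{LEVELS}))?.*)$/

WITH_LEVEL = 'Mar 27 15:29:39 agent-test-20170119 CRON[6659]: ERROR job failed'
WITHOUT_LEVEL = 'Mar 27 15:17:01 agent-test-20170119 CRON[6707]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)'

p LAZY_ONLY.match(WITH_LEVEL)[:severity]  # => nil
p GROUPED.match(WITH_LEVEL)[:severity]    # => "ERROR"
p GROUPED.match(WITHOUT_LEVEL)[:severity] # => nil
```

Both patterns are case-sensitive and would also match a level inside a longer word (for example INFO in INFORMATION); wrapping the alternation in `\b` anchors is one way to tighten that.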

@j-walker23

Gotcha. Thanks for your help.

@marcinrosinski

In case someone finds it useful: I had a similar issue/requirement and managed to solve it. Here is the solution:

https://stackoverflow.com/questions/44095746/stackdriver-logging-log-severity-levels-not-reported-received-when-sent-via-sy/44096765#44096765

@jkohen
Contributor

jkohen commented Jul 31, 2019

@igorpeshansky can you check whether there's anything left to do here?


6 participants