Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grok GREEDYDATA not greedy, or last log line symbol dissappear (tail plugin at least) #4998

Closed
andyceo opened this issue Nov 16, 2018 · 1 comment · Fixed by #5608
Closed
Labels
area/tail bug unexpected problem or unintended behavior
Milestone

Comments

@andyceo
Copy link

andyceo commented Nov 16, 2018

Relevant telegraf.conf:

[agent]
  interval = "1s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "1s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  logfile = "/tmp/telegraf.log"

[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"

[[inputs.tail]]
  name_override = "logs"
  files = [
    "/tmp/test.log",
  ]
  from_beginning = true
  pipe = false
  watch_method = "inotify"
  data_format = "grok"
  grok_patterns = ["%{GREEDYDATA:log_line:string}"]

System info:

  • Ubuntu 4.4.0-139-generic More mysql metrics #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Telegraf 1.8.3 (git: HEAD f297910)

Steps to reproduce:

  1. Create /tmp/telegraf.conf file given earlier

  2. Create /tmp/test.log with following data:

     192.168.2.219 - - [19/Aug/2018:15:38:07 +0000] "GET / HTTP/1.1" 200 117 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"
     10.0.6.29 - - [19/Aug/2018:17:44:40 +0000] "GET / HTTP/1.1" 200 119 "-" "Go-http-client/1.1"
     "This is the string"
     And this is too!
     'Another one'
    
  3. Execute telegraf with telegraf --config /tmp/telegraf.conf

Output would be something like this:

logs,host=newhope,path=/tmp/test.log log_line="192.168.2.219 - - [19/Aug/2018:15:38:07 +0000] \"GET / HTTP/1.1\" 200 117 \"-\" \"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0" 1542388327005459777
logs,host=newhope,path=/tmp/test.log log_line="10.0.6.29 - - [19/Aug/2018:17:44:40 +0000] \"GET / HTTP/1.1\" 200 119 \"-\" \"Go-http-client/1.1" 1542388327005495981
logs,host=newhope,path=/tmp/test.log log_line="This is the string" 1542388327005520572
logs,host=newhope,path=/tmp/test.log log_line="And this is too!" 1542388327005532212
logs,host=newhope,path=/tmp/test.log log_line="'Another one'" 1542388327005593750

Expected behavior:

Every line must parsed and returned as-is.

Actual behavior:

As you can see, in some lines, last symbol, the double quote in given example, was mystically eaten. For the line entirely in double quotes, both leading and tailing quotes are gone. String in single quotes and without any quotes, was parsed as expexted.

Additional info:

If I use %{HTTPD_COMBINEDLOG} instead of %{GREEDYDATA:log_line:string} in grok patterns, every correct Nginx combined log parsed ok, as expected.

@danielnelson danielnelson added the bug unexpected problem or unintended behavior label Nov 16, 2018
@danielnelson danielnelson added this to the 1.10.0 milestone Nov 16, 2018
@russorat russorat modified the milestones: 1.10.0, 1.11.0 Jan 14, 2019
@sslupsky
Copy link

sslupsky commented Mar 19, 2019

I have encountered this issue today.

Here is an example. The input is captured by the tail plugin configured to use the grok data input format. The input is configured as follows:

[[inputs.tail]]
  files = ["/motescan/scannet03/log/ag_history.log"]
  name_override = "test3" 
  from_beginning = false
  pipe = false
  watch_method = "inotify"
  data_format = "grok"
  grok_patterns = ['^ts=%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02T15:04:05.999"} %{GREEDYDATA:message:string}']


The "message" field is parsed using a parser processor configured as follows:

[[processors.parser]]
  namepass = [ "test3" ]
  order = 1
  parse_fields = [ "message" ]
  drop_original = false
  merge = "override"
  data_format = "logfmt"


The input and output lines are as follows:

input:
ts=2019-03-19T13:05:27.679 level=info id=cmdopts nwkwait=480 msg="ok"

output:
test3,application_name=grok,city=edmonton,country=ca,device_type=gateway,device_version=2v0,host=ip-xxx,hostname=scannet03,organization=scanimetrics,path=/motescan/scannet03/log/ag_history.log,project=P100 message="level=info id=cmdopts nwkwait=480 msg=\"ok" 1553000727679000000

Note the quoted string value for the "msg" field in the input line. Also note the position of this field in the input line and particularly the double quote at the end of the line.

In the output line there are two issues. First, the "message" field is missing the second escaped double quote that goes with the "msg" field value. Second, the "message" field has not been parsed by the logfmt parser. I believe this is what is throwing a logfmt syntax error message in the telegraf log (unterminated quoted value). Accordingly, the parser processor ignores the line which explains why the parsed fields and tags do not appear in the output.

As the OP has described, I believe I have tracked this problem down and isolated it to the log lines that end with a double quote. Thus, the lines that end with msg="some text" are mangled and are missing the second escaped double quote. I have confirmed that any line from the input that ends without a double quote is properly parsed by the grok input data processor and passed along to the logfmt parser processor. This appears to be a bug and I think it is with the grok input data format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/tail bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants