
Timeout exception caused fatal error in grok filter #95

Closed
algestam opened this issue Oct 13, 2016 · 10 comments
  • Version: Logstash 5.0.0-rc1 (logstash-filter-grok-3.2.2), Elasticsearch 5.0.0-rc1, Kibana 5.0.0-rc1
  • Operating System: Ubuntu 16.04 x64
  • Config File (if you have sensitive info, please remove it):
  • Sample Data:
  • Steps to Reproduce: Make the grok filter generate a timeout exception

I am trying out the upcoming 5.0.0 Elastic stack and have stumbled upon a problem that I have not encountered before.

The problem is that Logstash fails with a fatal error: undefined method 'pattern' for nil:NilClass. Before the fatal error occurred, seven exceptions of type PeriodicPoller: exception occurred, starting at 2016-10-12T18:48:34,436 (~50 minutes before the fatal error).

The latest entries in the log look like this:

[2016-10-12T19:35:54,747][ERROR][logstash.instrument.periodicpoller.jvm] PeriodicPoller: exception {:poller=>#<LogStash::Instrument::PeriodicPoller::JVM:0x2880ff6f @task=#<Concurrent::TimerTask:0x549645fe @observers=#<Concurrent::Collection::CopyOnNotifyObserverSet:0x631f50cf @observers={#<LogStash::Instrument::PeriodicPoller::JVM:0x2880ff6f ...>=>:update}>, @timeout_interval=60.0, @running=#<Concurrent::AtomicBoolean:0x7f5655a4>, @StoppedEvent=#<Concurrent::Event:0x19095605 @set=false, @iteration=0>, @execution_interval=1.0, @do_nothing_on_deref=true, @run_now=nil, @freeze_on_deref=nil, @executor=#<Concurrent::SafeTaskExecutor:0x9f4c3 @task=#<Proc:0x7dab948d@/logstash/logstash-core/lib/logstash/instrument/periodic_poller/base.rb:52>, @exception_class=StandardError>, @StopEvent=#<Concurrent::Event:0x309d05e9 @set=false, @iteration=0>, @value=nil, @copy_on_deref=nil, @dup_on_deref=nil>, @peak_threads=38, @peak_open_fds=7824, @metric=#<LogStash::Instrument::Metric:0x63f7c481 @collector=#<LogStash::Instrument::Collector:0x1e3fead8 @agent=nil, @metric_store=#<LogStash::Instrument::MetricStore:0x2c179bd0 @store=#<Concurrent::Map:0xdb97d23 @default_proc=nil>, @structured_lookup_mutex=#<Mutex:0x55d38330>, @fast_lookup=#<Concurrent::Map:0x85d86c7 @default_proc=nil>>, @observer_state=true, @snapshot_task=#<Concurrent::TimerTask:0x18844507 @observers=#<Concurrent::Collection::CopyOnNotifyObserverSet:0x5e126351 @observers={#<LogStash::Instrument::Collector:0x1e3fead8 ...>=>:update}>, @timeout_interval=600.0, @running=#<Concurrent::AtomicBoolean:0x739e762b>, @StoppedEvent=#<Concurrent::Event:0x4988a922 @set=false, @iteration=0>, @execution_interval=1.0, @do_nothing_on_deref=true, @run_now=nil, @freeze_on_deref=nil, @executor=#<Concurrent::SafeTaskExecutor:0x44d53a9 @task=#<Proc:0x1e4b6e21@/logstash/logstash-core/lib/logstash/instrument/collector.rb:87>, @exception_class=StandardError>, @StopEvent=#<Concurrent::Event:0x1ee47ced @set=false, @iteration=0>, @value=false, @copy_on_deref=nil, @dup_on_deref=nil>>>, @options={:polling_interval=>1, :polling_timeout=>60}>, :result=>nil, :exception=>#<Concurrent::TimeoutError: Concurrent::TimeoutError>, :executed_at=>2016-10-12 19:35:54 +0000}
[2016-10-12T19:36:24,695][ERROR][logstash.pipeline        ] Exception in pipelineworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash. {"exception"=>#<NoMethodError: undefined method `pattern' for nil:NilClass>, "backtrace"=>["/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-3.2.2/lib/logstash/filters/grok/timeout_exception.rb:12:in `message'", "/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-3.2.2/lib/logstash/filters/grok.rb:304:in `filter'", "/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-3.2.2/lib/logstash/filters/grok.rb:302:in `filter'", "/logstash/logstash-core/lib/logstash/filters/base.rb:156:in `multi_filter'", "org/jruby/RubyArray.java:1613:in `each'", "/logstash/logstash-core/lib/logstash/filters/base.rb:153:in `multi_filter'", "/logstash/logstash-core/lib/logstash/filter_delegator.rb:41:in `multi_filter'", "(eval):1687:in `filter_func'", "/logstash/logstash-core/lib/logstash/pipeline.rb:260:in `filter_batch'", "org/jruby/RubyProc.java:281:in `call'", "/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:178:in `each'", "org/jruby/RubyHash.java:1342:in `each'", "/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:177:in `each'", "/logstash/logstash-core/lib/logstash/pipeline.rb:258:in `filter_batch'", "/logstash/logstash-core/lib/logstash/pipeline.rb:246:in `worker_loop'", "/logstash/logstash-core/lib/logstash/pipeline.rb:225:in `start_workers'"]}
[2016-10-12T19:36:25,099][FATAL][logstash.runner          ] An unexpected error occurred! {:error=>#<NoMethodError: undefined method `pattern' for nil:NilClass>, :backtrace=>["/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-3.2.2/lib/logstash/filters/grok/timeout_exception.rb:12:in `message'", "/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-3.2.2/lib/logstash/filters/grok.rb:304:in `filter'", "/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-3.2.2/lib/logstash/filters/grok.rb:302:in `filter'", "/logstash/logstash-core/lib/logstash/filters/base.rb:156:in `multi_filter'", "org/jruby/RubyArray.java:1613:in `each'", "/logstash/logstash-core/lib/logstash/filters/base.rb:153:in `multi_filter'", "/logstash/logstash-core/lib/logstash/filter_delegator.rb:41:in `multi_filter'", "(eval):1687:in `filter_func'", "/logstash/logstash-core/lib/logstash/pipeline.rb:260:in `filter_batch'", "org/jruby/RubyProc.java:281:in `call'", "/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:178:in `each'", "org/jruby/RubyHash.java:1342:in `each'", "/logstash/logstash-core/lib/logstash/util/wrapped_synchronous_queue.rb:177:in `each'", "/logstash/logstash-core/lib/logstash/pipeline.rb:258:in `filter_batch'", "/logstash/logstash-core/lib/logstash/pipeline.rb:246:in `worker_loop'", "/logstash/logstash-core/lib/logstash/pipeline.rb:225:in `start_workers'"]}

The Elasticsearch logs have no indication that anything has gone wrong, and Kibana can continue to use Elasticsearch to generate graphs etc.

Looking at the grok filter code, it seems that the @grok variable is nil in lib/logstash/filters/grok/timeout_exception.rb, causing the message method to raise a NoMethodError (see the sketch below).
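
For illustration, here is a minimal, self-contained sketch that reproduces the same NoMethodError. The class shape and message format are assumptions inferred from the backtrace, not copied from the plugin source:

    # Simplified stand-in for LogStash::Filters::Grok::TimeoutException;
    # names and message format are assumptions inferred from the error.
    class TimeoutException < RuntimeError
      attr_accessor :grok, :field, :value

      def message
        # If @grok was never assigned it is nil, and calling nil.pattern
        # raises NoMethodError: undefined method `pattern' for nil:NilClass
        "Timeout executing grok '#{@grok.pattern}' against field '#{@field}' with value '#{@value}'!"
      end
    end

    e = TimeoutException.new
    puts e.message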

If you need help with reproducing the problem, I can try to make a smaller data set/config.

jsvd (Member) commented Oct 13, 2016

Are you running Logstash in debug mode in that production environment? It seems the uncaught exception happens while logging an event in debug mode: https://github.com/logstash-plugins/logstash-filter-grok/blob/v3.2.2/lib/logstash/filters/grok.rb#L302

algestam (Author) commented

No, I am not running in debug mode, AFAIK. Please advise me on how to verify whether Logstash runs in debug mode or not.

I believe that message is logged due to a timeout here: https://github.com/logstash-plugins/logstash-filter-grok/blob/v3.2.2/lib/logstash/filters/grok.rb#L304

For some reason, the timeout exception is initialized with a nil @grok variable, and the message method fails here: https://github.com/logstash-plugins/logstash-filter-grok/blob/v3.2.2/lib/logstash/filters/grok/timeout_exception.rb#L12

jsvd (Member) commented Oct 13, 2016

Looking at the rest of the stack trace you showed:

[2016-10-12T19:36:25,099][FATAL][logstash.runner          ] An unexpected error occurred! {:error=>#<NoMethodError: undefined method `pattern' for nil:NilClass>, :backtrace=>[
  "/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-3.2.2/lib/logstash/filters/grok/timeout_exception.rb:12:in `message'",
  "/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-3.2.2/lib/logstash/filters/grok.rb:304:in `filter'",
  "/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-3.2.2/lib/logstash/filters/grok.rb:302:in `filter'",
  "/logstash/logstash-core/lib/logstash/filters/base.rb:156:in `multi_filter'", "org/jruby/RubyArray.java:1613:in `each'",
  # ..
]}

Line 302 of grok.rb is:

      @logger.debug? and @logger.debug("Event now: ", :event => event)

which is really strange.

@algestam
Copy link
Author

algestam commented Oct 13, 2016

Well, yes, but I don't think line 302 is ever reached when this occurs, since a TimeoutException is raised. When the exception is logged, it fails on line 304:

@logger.warn(e.message)

The reason for this seems to be that @grok is nil (perhaps the timeout was raised before grok was set to a value?).

jsvd (Member) commented Oct 13, 2016

In the normal code path, @grok is set properly:

    rescue ::LogStash::Filters::Grok::TimeoutException => e
      # These fields aren't present at the time the exception was raised
      # so we add them here.
      # We could store this metadata in the @threads_to_start_time hash
      # but that'd come at a perf cost and this works just as well.
      e.grok = grok
      e.field = field
      e.value = value
      raise e
    end

I understand that the other rescue block in grok.rb:304 should be setting e.grok = @grok like this one does (a sketch of such a guard follows), but I'm still curious about what code path led to the TimeoutException leaking out of grok_till_timeout.
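
For reference, such a guard might look like this (a hypothetical sketch of the suggestion above, not a merged patch; it assumes line 304 is the @logger.warn call quoted earlier):

    rescue ::LogStash::Filters::Grok::TimeoutException => e
      # Hypothetical guard: attach @grok before e.message interpolates it,
      # so an exception that escaped grok_till_timeout undecorated cannot
      # crash the logging call with a NoMethodError.
      e.grok ||= @grok
      @logger.warn(e.message)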

andrewvc added a commit to andrewvc/logstash-filter-grok that referenced this issue Oct 13, 2016
Previously we could see errors as in logstash-plugins#95 due to some very esoteric
race conditions where Thread#raise would raise outside of the rescue
context. This patch changes the mechanism to be setting Thread.interrupt
which is more robust.
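
A minimal sketch of the race this commit message describes (simplified names and contrived timing; an assumption-based demo, not the plugin's code):

    # Thread#raise delivers an exception asynchronously, so it can land after
    # the worker has already left the rescue block that would have populated
    # e.grok -- the exception then leaks out undecorated.
    class GrokTimeout < StandardError
      attr_accessor :grok
    end

    worker = Thread.new do
      begin
        sleep 0.05               # stands in for a long-running regexp match
      rescue GrokTimeout => e
        e.grok = :some_pattern   # only runs if the raise lands inside begin
        raise
      end
      sleep 0.05                 # a raise landing here skips the rescue
    end

    sleep 0.07                   # time the raise to hit outside the rescue
    worker.raise(GrokTimeout)    # delivered at an unpredictable point
    begin
      worker.join
    rescue GrokTimeout => e
      p e.grok                   # => nil: the exception escaped undecorated
    end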
andrewvc (Contributor) commented

Thanks so much for reporting this! #96 should fix it once merged :)

algestam (Author) commented

Thanks! I will try out the fix when the next version is released.

elasticsearch-bot pushed a commit that referenced this issue Oct 14, 2016
Previously we could see errors as in #95 due to some very esoteric
race conditions where Thread#raise would raise outside of the rescue
context. This patch changes the mechanism to be setting Thread.interrupt
which is more robust.

Fixes #96
sjivan commented Oct 30, 2016

Did this fix make it into the 5.0 release? I'm seeing a similar issue with the 5.0 release.

https://discuss.elastic.co/t/grok-terminating-logstash-5-0/64307/2

Edit: it looks like the fix was made against v3.2.3, which is the version bundled with LS 5.0, so it appears to still be happening.

andrewvc (Contributor) commented Nov 1, 2016

@sjivan thanks for reporting this. This looks like a separate issue involving interrupts during mutexes: an interrupt delivered during a Mutex operation raises an unexpected exception type. I'll work on a patch.

andrewvc (Contributor) commented Nov 1, 2016

Moving this to #97

andrewvc closed this as completed Nov 1, 2016