Logstash error on POD shutdown - shutdown process appears to be stalled due to busy or blocked plugins #13814

adessureaultcoveo opened this issue Feb 23, 2022 · 2 comments

Logstash information:

  1. Logstash version: 7.16.1
  2. Logstash installation source: Docker
  3. How is Logstash being run: Kubernetes StatefulSet (sts) on AWS

Plugins installed:

logstash-codec-avro
logstash-codec-cef
logstash-codec-cloudtrail
logstash-codec-collectd
logstash-codec-dots
logstash-codec-edn
logstash-codec-edn_lines
logstash-codec-es_bulk
logstash-codec-fluent
logstash-codec-graphite
logstash-codec-gzip_lines
logstash-codec-json
logstash-codec-json_lines
logstash-codec-line
logstash-codec-msgpack
logstash-codec-multiline
logstash-codec-netflow
logstash-codec-plain
logstash-codec-rubydebug
logstash-filter-aggregate
logstash-filter-anonymize
logstash-filter-cidr
logstash-filter-clone
logstash-filter-csv
logstash-filter-date
logstash-filter-de_dot
logstash-filter-dissect
logstash-filter-dns
logstash-filter-drop
logstash-filter-elasticsearch
logstash-filter-fingerprint
logstash-filter-geoip
logstash-filter-grok
logstash-filter-http
logstash-filter-json
logstash-filter-json_encode
logstash-filter-kv
logstash-filter-memcached
logstash-filter-metrics
logstash-filter-mutate
logstash-filter-prune
logstash-filter-ruby
logstash-filter-sleep
logstash-filter-split
logstash-filter-syslog_pri
logstash-filter-throttle
logstash-filter-translate
logstash-filter-truncate
logstash-filter-urldecode
logstash-filter-useragent
logstash-filter-uuid
logstash-filter-xml
logstash-input-azure_event_hubs
logstash-input-beats
└── logstash-input-elastic_agent (alias)
logstash-input-couchdb_changes
logstash-input-dead_letter_queue
logstash-input-elasticsearch
logstash-input-exec
logstash-input-file
logstash-input-ganglia
logstash-input-gelf
logstash-input-generator
logstash-input-graphite
logstash-input-heartbeat
logstash-input-http
logstash-input-http_poller
logstash-input-imap
logstash-input-jms
logstash-input-kinesis
logstash-input-pipe
logstash-input-redis
logstash-input-s3
logstash-input-snmp
logstash-input-snmptrap
logstash-input-sqs
logstash-input-stdin
logstash-input-syslog
logstash-input-tcp
logstash-input-twitter
logstash-input-udp
logstash-input-unix
logstash-integration-elastic_enterprise_search
 ├── logstash-output-elastic_app_search
 └── logstash-output-elastic_workplace_search
logstash-integration-jdbc
 ├── logstash-input-jdbc
 ├── logstash-filter-jdbc_streaming
 └── logstash-filter-jdbc_static
logstash-integration-kafka
 ├── logstash-input-kafka
 └── logstash-output-kafka
logstash-integration-rabbitmq
 ├── logstash-input-rabbitmq
 └── logstash-output-rabbitmq
logstash-output-amazon_es
logstash-output-cloudwatch
logstash-output-csv
logstash-output-elasticsearch
logstash-output-email
logstash-output-file
logstash-output-graphite
logstash-output-http
logstash-output-lumberjack
logstash-output-nagios
logstash-output-null
logstash-output-pipe
logstash-output-redis
logstash-output-s3
logstash-output-sns
logstash-output-sqs
logstash-output-stdout
logstash-output-tcp
logstash-output-udp
logstash-output-webhdfs
logstash-patterns-core

JVM (e.g. java -version):

  1. JVM version (java -version):
openjdk version "11.0.13" 2021-10-19
OpenJDK Runtime Environment Temurin-11.0.13+8 (build 11.0.13+8)
OpenJDK 64-Bit Server VM Temurin-11.0.13+8 (build 11.0.13+8, mixed mode)
  2. JVM installation:
    /usr/share/logstash/jdk/

OS version (uname -a if on a Unix-like system):
Linux siem-logstash-1 5.11.0-1028-aws #31~20.04.1-Ubuntu SMP Fri Jan 14 14:37:50 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
When we scale down the number of pods in AWS, the Logstash process quits with an error.

Steps to reproduce:

Have two replicas of our Logstash pods running, using:
kubectl scale sts k8s-pod-name --replicas 2

Wait until the pods are running.

Scale down the pods using: kubectl scale sts k8s-pod-name --replicas 1

The Logstash process quits with an error, unable to finish processing our input queues into S3 buckets. The ingest uses multiple pipelines whose inputs read from AWS SQS and whose outputs write to AWS S3 (see the sketch below).
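
For reference, the pipelines have roughly this shape. This is only a minimal sketch; the queue name, region, bucket, and codec are placeholders, not our real configuration:

input {
  sqs {
    queue  => "example-ingest-queue"
    region => "us-east-1"
  }
}

output {
  s3 {
    bucket => "example-archive-bucket"
    region => "us-east-1"
    codec  => "json_lines"
  }
}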

The error:

logEvent.message
The shutdown process appears to be stalled due to busy or blocked plugins. Check the logs for more information.

loggerName
org.logstash.execution.ShutdownWatcherExt
Logs:
Ruby stack trace around the error:

"inflight_count"=>0,
"stalling_threads_info"=>
   {
       "other"=>
       [
        {
            "thread_id"=>91,
            "name"=>"[bullish-dev]<sqs",
            "current_call"=>"uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/protocol.rb:181:in `wait_readable'"
        },
        {
            "thread_id"=>94,
            "name"=>"[bullish-dev]<sqs",
            "current_call"=>"uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/protocol.rb:181:in `wait_readable'"
        }
       ],
       [
           "LogStash::Filters::Mutate",
           {
               "remove_field"=>["field_1", "field_2"],
               "id"=>"remove_useless_fields"
            }
            ]=>[
                {
                    "thread_id"=>77,
                    "name"=>"[bullish-dev]>worker0", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:299:in `block in start_workers'"
                }
                ]
            }
        }
yaauie (Member) commented Feb 25, 2022

This one may need some back-and-forth in our discuss forums. GitHub issues are typically reserved for minimal reproductions of confirmed bugs.


In the warning message (not an error), we see that there are three (or more, it is truncated) threads still doing work.

When a Logstash process receives a shutdown signal, it immediately begins the process of shutting down: first, input plugins are told to stop, so that we prevent more work from getting into the queues. Next, any pipelines that are using a persistent queue and NOT configured to drain that queue are told to stop picking up work from their queues. Finally, we wait for the workers to finish up the work that they have already picked up off of their queue. This is the bit that can take a while and is really dependent on configuration and the shape of your pipeline. It is normal for the shutdown watcher to log a few messages during this timeline.
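
For reference, draining is opt-in and configured per pipeline; a minimal pipelines.yml sketch (the pipeline id and config path here are placeholders, not this user's setup):

- pipeline.id: example-sqs-to-s3
  path.config: "/usr/share/logstash/pipeline/example.conf"
  queue.type: persisted   # events are buffered in an on-disk persistent queue
  queue.drain: true       # on shutdown, workers keep running until that queue is empty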

The first two indicate that two sqs input plugins in your bullish-dev pipeline are waiting for a network response. I looked over at the plugin repo and found an old bug indicating that it is possible for the AWS client's QueuePoller to loop internally when it doesn't receive any messages, which prevents it from seeing that it should be shut down. I've opened a PR to fix that (logstash-plugins/logstash-input-sqs#65), and will get it merged and released shortly. If this is your problem, we should have a fix out soon.
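
For illustration only (this is not the contents of that PR), the general pattern is to give the AWS SDK's QueuePoller a chance to observe the plugin's stop flag between polls, roughly like this inside the input plugin's run method; the @queue_url and @sqs_client variables are assumed to be set up elsewhere in the plugin:

# Illustrative sketch, not the actual logstash-plugins/logstash-input-sqs#65 change.
# Assumes this lives in a LogStash::Inputs::Base subclass, where `stop?` becomes true
# once Logstash has asked the plugin to shut down.
def run(queue)
  poller = Aws::SQS::QueuePoller.new(@queue_url, client: @sqs_client)

  # before_request runs ahead of every receive_message call; throwing :stop_polling
  # is the AWS SDK's documented way to break out of QueuePoller#poll. Without a hook
  # like this, an idle queue keeps the poller looping and it never notices the shutdown.
  poller.before_request do |_stats|
    throw :stop_polling if stop?
  end

  poller.poll(wait_time_seconds: 20) do |message|
    queue << LogStash::Event.new("message" => message.body)
  end
end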

JPLachance commented

Greetings,

After upgrading to the logstash-input-sqs 3.3.2 plugin (released today), Logstash stops within seconds instead of staying stuck forever! 🥳

That being said, we still have the "The shutdown process appears to be stalled due to busy or blocked plugins. Check the logs for more information." error message in our Logstash logs.

That error is then followed by:

{"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>80, "name"=>"[bullish-dev]<sqs", "current_call"=>"uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/net/protocol.rb:181:in `wait_readable'"}], ["LogStash::Filters::Json", {"source"=>"default", "id"=>"bcac8084c1248faa492ec71b7b274941aca0126c75b43de0bce303d00d671432"}]=>[{"thread_id"=>62, "name"=>"[bullish-dev]>worker0", "current_call"=>"[...]/logstash-core/lib/logstash/java_pipeline.rb:299:in `block in start_workers'"}]}}

Then we see "Pipeline terminated" for that specific pipeline a second later.

So, Logstash stops, but it still logs errors in the process.

Is this something we could address?

Thank you!
