Telegraf does not exit on Linux #1603

Closed
butitsnotme opened this issue Aug 8, 2016 · 1 comment

@butitsnotme
Contributor

Bug report

When Telegraf runs as a Linux daemon (under either systemd or SysV init), it does not exit when told to stop; it has to be killed manually with kill -9. My telegraf.conf is included below, but the problem also occurs with the default telegraf.conf. (The longest I've left it is overnight, so it doesn't seem to be waiting for the next collection cycle: it was still running ~16 hours after being told to stop.)

When using systemd (on Ubuntu), the service's cgroup contains two processes during normal operation: /bin/sh and /usr/bin/telegraf, the first being the parent of the second. When systemctl stop telegraf is executed, the first (/bin/sh) exits but the second does not; it is still running hours later.

Relevant telegraf.conf:

# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "10m"
  ## Rounds collection interval to 'interval'
  ## i.e., if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at
  ## most metric_batch_size metrics.
  metric_batch_size = 1000
  ## For failed writes, telegraf will cache metric_buffer_limit metrics for each
  ## output, and will flush this buffer on a successful write. Oldest metrics
  ## are dropped first when this buffer fills.
  metric_buffer_limit = 10000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. You shouldn't set this below
  ## interval. Maximum flush_interval will be flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## i.e., a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## Run telegraf in debug mode
  debug = false
  ## Run telegraf in quiet mode
  quiet = false
  ## Override default hostname, if empty use os.Hostname()
  hostname = "<set but redacted>"
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false

System info:

Telegraf - version 0.13.2
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:        16.04
Codename:       xenial

Steps to reproduce:

  1. Install telegraf on Linux (tested on Ubuntu 16.04, CentOS 6, CentOS 7)
  2. Start the service (using the normal OS mechanism)
  3. Run ps aux | grep telegraf
  4. Observe that there is an sh process and a telegraf process (the sh process is the parent of the telegraf process)
  5. Stop the service (using the same OS tool)
  6. Run ps aux | grep telegraf
  7. Observe that the telegraf process is still running
  8. Wait a while (30 minutes to 1 hour should be enough for demonstration)
  9. Run ps aux | grep telegraf again
  10. Observe that it is still running.

Expected behavior:

Telegraf should exit within a few seconds of being stopped by the OS (or, at the very latest, the next time it runs a collection).

Actual behavior:

It continues running indefinitely.

Additional info:

There is nothing abnormal in the logs; they just show Telegraf running normally.

@sparrc
Contributor

sparrc commented Aug 8, 2016

This is fixed in 1.0; see #1252 and #1279.

sparrc closed this as completed Aug 8, 2016