SQL Input plugin crashes at startup if server is unreachable #13282

Closed
neelayu opened this issue May 18, 2023 · 3 comments · Fixed by #13289
Labels
bug (unexpected problem or unintended behavior), help wanted (request for community participation, code, contribution), size/m (2-4 day effort)

Comments

@neelayu
Contributor

neelayu commented May 18, 2023

Relevant telegraf.conf

[[inputs.sql]]
  dsn = "<server dsn>"
  driver = "sqlserver"
  [[inputs.sql.query]]
      ... query specific

Logs from Telegraf

2023-05-18T09:54:29Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"my-host", Flush Interval:10s
2023-05-18T09:54:44Z E! [telegraf] Error running agent: starting input inputs.sql: connecting to database failed: unable to open tcp connection with host '<redacted>': dial tcp redacted: i/o timeout
exit status 1

System info

master

Docker

No response

Steps to reproduce

  1. Start Telegraf with a sample config in which the sql input plugin points at an unreachable server.
  2. Telegraf exits even if the other inputs are valid.

Expected behavior

Technically, only the affected plugin should fail (or exit), but instead the whole Telegraf process exits.

Actual behavior

Telegraf exits with code 1

Additional info

The sqlserver input plugin does not error out in this situation. It simply accumulates the error.
Other plugins, such as vsphere, try to address this behaviour for disconnected servers (see #12828).

@neelayu neelayu added the bug unexpected problem or unintended behavior label May 18, 2023
@neelayu
Contributor Author

neelayu commented May 18, 2023

cc @goswamisandeep

@powersj
Contributor

powersj commented May 18, 2023

Telegraf exits with code 1

This is expected behavior if Telegraf cannot connect to an input. For example, how would we know that you did not mistype the password, IP address, or other connection details? Additionally, if Telegraf starts up without an error, users may assume that data is being collected successfully when in fact it is not.

We have allowed, on a per-plugin basis, the addition of an option that lets connections retry on startup. This is done via a setting that is disabled by default, as you have seen in other places.

We would be happy to see a PR with this sort of change.

@powersj powersj added help wanted Request for community participation, code, contribution size/m 2-4 day effort labels May 18, 2023
@goswamisandeep
Contributor

@powersj the problem is that we are monitoring multiple SQL servers, and each instance has its own config in a separate [[inputs.sql]] section.

Now let's say we are monitoring 100 SQL instances and only one of them is unreachable: Telegraf exits and does not monitor the 99 healthy instances.
