-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(inputs.mongodb): add an option to bypass connection errors on start #11629
fix(inputs.mongodb): add an option to bypass connection errors on start #11629
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments about moving the connect to mongo from the Init
to the Start
method still counts.
Oh yes, sorry. Will fix. Edit: fixed |
Here is a refactor, I did not find an example of the |
This is the interface: input.go And these are some sample implementations: telegraf/plugins/inputs/sql/sql.go Lines 354 to 392 in b07e94b
telegraf/plugins/inputs/mqtt_consumer/mqtt_consumer.go Lines 151 to 166 in b07e94b
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, but shouldn't the ping then also be done at the start of the gathering for the server, it will reduce the amount of errors every gather interval?
Sounds good to me but how do we want that ? In the current implementation on master if one of the Mongo servers becomes unreachable after init/start errors will be written in the logs. IMO, it's ok to attach the behavior you describe to the new option, WDYT ? |
Yeah, so I was thinking, if this is enabled, then first ping the device and then only try to gather all the metrics if it does respond ok? Does that sounds reasonable? |
LGTM, i will implement it asap ! |
There is just one behavior left we may want to discuss : When Sorry for the the review requests mess, my internet was lagging and Github just goes crazy for no reason |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean this is the current default behaviour? I think that's good, they can change to retry when they don't want that. But when thinking about that, "retry" might give wrong impression. I would expect telegraf to retry in the same poll interval. Maybe the value for this option should be "skip"?
Also, I don't get the new logic in Gather when behaviour is "retry". So seems like it returns and skips collection for the server if there is no error? That sounds a bit strange.
Sounds good to me to work on that in a future PR. |
@Hipska I changed the wording from "retry" to "skip", I also find it more clear. Thanks. |
Sorry I didn't get your point, if behavior is skip, we ping, if ping fails, we drop fetch from the current server : if m.DisconnectedServersBehavior == "skip" {
if err := srv.ping(); err != nil {
return
}
} |
Nevermind, I was indeed looking wrong. Maybe we could add a debug log message with the content of the error before we return? |
Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip. 🥳 This pull request decreases the Telegraf binary size by -1.04 % for linux amd64 (new size: 150.2 MB, nightly size 151.7 MB) 📦 Click here to get additional PR build artifactsArtifact URLs |
The start method handler did not match the interface, nor was there a stop function. As a result, start was never called and the plugin was never setting up the servers to connect to and collect from correctly. This was introduced in influxdata#11629. fixes: influxdata#11830
Required for all PRs
resolves #10078
This PR replaces #10086.
Summary of changes :
ignore_unreachable_hosts
to not return an error on init if a ping fails