Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(inputs.mongodb): add an option to bypass connection errors on start #11629

Merged
merged 27 commits into from
Sep 7, 2022
Merged

fix(inputs.mongodb): add an option to bypass connection errors on start #11629

merged 27 commits into from
Sep 7, 2022

Conversation

papey
Copy link
Contributor

@papey papey commented Aug 6, 2022

Required for all PRs

resolves #10078

This PR replaces #10086.

Summary of changes :

  • Added an option ignore_unreachable_hosts to not return an error on init if a ping fails

@telegraf-tiger telegraf-tiger bot added the fix pr to fix corresponding bug label Aug 6, 2022
@Hipska Hipska requested review from sspaink and Hipska August 8, 2022 07:38
Copy link
Contributor

@Hipska Hipska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments about moving the connect to mongo from the Init to the Start method still counts.

@papey
Copy link
Contributor Author

papey commented Aug 8, 2022

My comments about moving the connect to mongo from the Init to the Start method still counts.

Oh yes, sorry. Will fix.

Edit: fixed

@sspaink sspaink changed the title Fix(inputs/mongodb): add an option to by pass connexions error on init fix(inputs.mongodb): add an option to by pass connexions error on init Aug 8, 2022
@papey
Copy link
Contributor Author

papey commented Aug 8, 2022

Here is a refactor, I did not find an example of the Start function, so not sure if correct. If not correct, can you provide some example ? Thanks a lot for the feedback.

@Hipska Hipska added area/mongodb plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins labels Aug 9, 2022
@Hipska
Copy link
Contributor

Hipska commented Aug 9, 2022

This is the interface: input.go

And these are some sample implementations:

func (s *SQL) Start(_ telegraf.Accumulator) error {
var err error
// Connect to the database server
s.Log.Debugf("Connecting to %q...", s.Dsn)
s.db, err = dbsql.Open(s.driverName, s.Dsn)
if err != nil {
return err
}
// Set the connection limits
// s.db.SetConnMaxIdleTime(time.Duration(s.MaxIdleTime)) // Requires go >= 1.15
s.db.SetConnMaxLifetime(time.Duration(s.MaxLifetime))
s.db.SetMaxOpenConns(s.MaxOpenConnections)
s.db.SetMaxIdleConns(s.MaxIdleConnections)
// Test if the connection can be established
s.Log.Debugf("Testing connectivity...")
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(s.Timeout))
err = s.db.PingContext(ctx)
cancel()
if err != nil {
return fmt.Errorf("connecting to database failed: %v", err)
}
// Prepare the statements
for i, q := range s.Queries {
s.Log.Debugf("Preparing statement %q...", q.Query)
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(s.Timeout))
stmt, err := s.db.PrepareContext(ctx, q.Query) //nolint:sqlclosecheck // Closed in Stop()
cancel()
if err != nil {
return fmt.Errorf("preparing query %q failed: %v", q.Query, err)
}
s.Queries[i].statement = stmt
}
return nil
}

func (m *MQTTConsumer) Start(acc telegraf.Accumulator) error {
m.state = Disconnected
m.acc = acc.WithTracking(m.MaxUndeliveredMessages)
m.sem = make(semaphore, m.MaxUndeliveredMessages)
m.ctx, m.cancel = context.WithCancel(context.Background())
m.client = m.clientFactory(m.opts)
// AddRoute sets up the function for handling messages. These need to be
// added in case we find a persistent session containing subscriptions so we
// know where to dispatch persisted and new messages to. In the alternate
// case that we need to create the subscriptions these will be replaced.
for _, topic := range m.Topics {
m.client.AddRoute(topic, m.recvMessage)
}
m.state = Connecting
return m.connect()
}

Copy link
Contributor

@Hipska Hipska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, but shouldn't the ping then also be done at the start of the gathering for the server, it will reduce the amount of errors every gather interval?

plugins/inputs/mongodb/mongodb.go Outdated Show resolved Hide resolved
plugins/inputs/mongodb/mongodb.go Outdated Show resolved Hide resolved
plugins/inputs/mongodb/mongodb.go Outdated Show resolved Hide resolved
plugins/inputs/mongodb/mongodb_server_test.go Outdated Show resolved Hide resolved
plugins/inputs/mongodb/README.md Outdated Show resolved Hide resolved
@papey
Copy link
Contributor Author

papey commented Aug 11, 2022

Looks good overall, but shouldn't the ping then also be done at the start of the gathering for the server, it will reduce the amount of errors every gather interval?

Sounds good to me but how do we want that ?

In the current implementation on master if one of the Mongo servers becomes unreachable after init/start errors will be written in the logs.

IMO, it's ok to attach the behavior you describe to the new option, WDYT ?

@Hipska
Copy link
Contributor

Hipska commented Aug 12, 2022

Yeah, so I was thinking, if this is enabled, then first ping the device and then only try to gather all the metrics if it does respond ok? Does that sounds reasonable?

@papey
Copy link
Contributor Author

papey commented Aug 12, 2022

LGTM, i will implement it asap !

@papey papey requested review from reimda and Hipska and removed request for reimda and Hipska August 23, 2022 16:59
@papey
Copy link
Contributor Author

papey commented Aug 23, 2022

There is just one behavior left we may want to discuss :

When Gather is called on an unreachable server it waits a very looong time before returning back an error. This is the current implementation and this PR do not change anything about it. We may want to change it in another PR. WDYT ?

Sorry for the the review requests mess, my internet was lagging and Github just goes crazy for no reason

Copy link
Contributor

@Hipska Hipska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean this is the current default behaviour? I think that's good, they can change to retry when they don't want that. But when thinking about that, "retry" might give wrong impression. I would expect telegraf to retry in the same poll interval. Maybe the value for this option should be "skip"?

Also, I don't get the new logic in Gather when behaviour is "retry". So seems like it returns and skips collection for the server if there is no error? That sounds a bit strange.

plugins/inputs/mongodb/mongodb.go Outdated Show resolved Hide resolved
@reimda
Copy link
Contributor

reimda commented Aug 24, 2022

When Gather is called on an unreachable server it waits a very looong time before returning back an error. This is the current implementation and this PR do not change anything about it. We may want to change it in another PR. WDYT ?

Sounds good to me to work on that in a future PR.

@papey
Copy link
Contributor Author

papey commented Aug 30, 2022

@Hipska I changed the wording from "retry" to "skip", I also find it more clear. Thanks.

@papey
Copy link
Contributor Author

papey commented Aug 30, 2022

Do you mean this is the current default behaviour? I think that's good, they can change to retry when they don't want that. But when thinking about that, "retry" might give wrong impression. I would expect telegraf to retry in the same poll interval. Maybe the value for this option should be "skip"?

Also, I don't get the new logic in Gather when behaviour is "retry". So seems like it returns and skips collection for the server if there is no error? That sounds a bit strange.

Sorry I didn't get your point, if behavior is skip, we ping, if ping fails, we drop fetch from the current server :

			if m.DisconnectedServersBehavior == "skip" {
				if err := srv.ping(); err != nil {
					return
				}
			}

@papey papey requested a review from Hipska August 30, 2022 15:13
@Hipska Hipska changed the title fix(inputs.mongodb): add an option to by pass connexions error on init fix(inputs.mongodb): add an option to bypass connection errors on start Aug 31, 2022
@Hipska
Copy link
Contributor

Hipska commented Aug 31, 2022

Sorry I didn't get your point, if behavior is skip, we ping, if ping fails, we drop fetch from the current server :

Nevermind, I was indeed looking wrong. Maybe we could add a debug log message with the content of the error before we return?

@telegraf-tiger
Copy link
Contributor

telegraf-tiger bot commented Sep 7, 2022

Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip.
Downloads for additional architectures and packages are available below.

🥳 This pull request decreases the Telegraf binary size by -1.04 % for linux amd64 (new size: 150.2 MB, nightly size 151.7 MB)

📦 Click here to get additional PR build artifacts

Artifact URLs

DEB RPM TAR GZ ZIP
amd64.deb aarch64.rpm darwin_amd64.tar.gz windows_amd64.zip
arm64.deb armel.rpm darwin_arm64.tar.gz windows_i386.zip
armel.deb armv6hl.rpm freebsd_amd64.tar.gz
armhf.deb i386.rpm freebsd_armv7.tar.gz
i386.deb ppc64le.rpm freebsd_i386.tar.gz
mips.deb riscv64.rpm linux_amd64.tar.gz
mipsel.deb s390x.rpm linux_arm64.tar.gz
ppc64el.deb x86_64.rpm linux_armel.tar.gz
riscv64.deb linux_armhf.tar.gz
s390x.deb linux_i386.tar.gz
linux_mips.tar.gz
linux_mipsel.tar.gz
linux_ppc64le.tar.gz
linux_riscv64.tar.gz
linux_s390x.tar.gz
static_linux_amd64.tar.gz

@sspaink sspaink merged commit e46f90e into influxdata:master Sep 7, 2022
powersj added a commit to powersj/telegraf that referenced this pull request Sep 20, 2022
The start method handler did not match the interface, nor was there a
stop function. As a result, start was never called and the plugin was
never setting up the servers to connect to and collect from correctly.

This was introduced in influxdata#11629.

fixes: influxdata#11830
dba-leshop pushed a commit to dba-leshop/telegraf that referenced this pull request Oct 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/mongodb fix pr to fix corresponding bug plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[inputs.mongodb] Telegraf crash on init if conn to MongoDB fails
4 participants