-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][io] Update Elasticsearch sink idle cnx timeout to 30s #19377
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -83,7 +83,6 @@ public RestClient(ElasticSearchConfig elasticSearchConfig, BulkProcessor.Listene | |
// idle+expired connection evictor thread | ||
this.executorService = Executors.newSingleThreadScheduledExecutor(); | ||
this.executorService.scheduleAtFixedRate(() -> { | ||
configCallback.connectionManager.closeExpiredConnections(); | ||
configCallback.connectionManager.closeIdleConnections( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Q: Is it required at all to evict idle connections? I wonder what's wrong with long lived connection that has a life cycle coupled with that of the sink instance. If it is not required, we could drop the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good question. The motivation of this PR assumes that closing these connections is necessary, however, I am not sure that it is. The fundamental risk is that something between the client and the server closes the connection. In my mind, the canonical example is a network load balancer with a 4 or 5 minute timeout. Closing expired and idle connections is one solution to prevent such errors due to inactivity. While troubleshooting the underlying behavior this PR aims to fix, I came across elastic/elasticsearch#65213, which indicates that an alternative solution is to enable socket keepalives and to decrease the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After re-reading that elasticsearch issue, it could be reasonable to move in the direction of enabling tcp keep-alives. At the very least, I think we should merge this and fix the existing default values. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@michaeljmarshall On managed cloud k8s environments, the OS settings are already properly tuned. related comment: #14841 (comment) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @lhotari - that was not the case in the AKS cluster that I was testing with as of yesterday. When I tried to override the settings using https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#setting-sysctls-for-a-pod, I got an error because overriding |
||
config.getConnectionIdleTimeoutInMs(), TimeUnit.MILLISECONDS); | ||
}, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this config should be validated against bulkFlushIntervalInMs when bulk API is enabled - something like
connectionIdleTimeoutInMs > 2 * bulkFlushIntervalInMs
because it seems the connection will set idle by design in-between flushesThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be fine to let the connection get closed in that scenario. My main goal here is to make sure we have working defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I understand both comments are probably outside scope of this PR: First step is to have working defaults, and later maybe make them foolproof...