When an index in Elasticsearch is configured as read-only (via either `index.blocks.read_only` or `index.blocks.write`) for maintenance purposes, Filebeat logs the following and drops the data. There is no backoff mechanism in place, and the data is lost.
Note that Filebeat raises only a WARN, not even an ERROR, even though data is actually being lost.
On the other hand, if the whole cluster is set to read-only (via `cluster.blocks.read_only`, a cluster-level setting), everything works as expected: Filebeat logs an ERROR and the backoff mechanism kicks in. No data is lost.
```
2019-09-04T07:35:57.549Z ERROR pipeline/output.go:121 Failed to publish events: 403 Forbidden: {"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/6/cluster read-only (api)];"}],"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/6/cluster read-only (api)];"},"status":403}
2019-09-04T07:35:57.596Z INFO elasticsearch/client.go:743 Attempting to connect to Elasticsearch version 7.2.0
2019-09-04T07:35:57.783Z INFO [index-management] idxmgmt/std.go:252 Auto ILM enable success.
2019-09-04T07:36:01.667Z ERROR pipeline/output.go:100 Failed to connect to backoff(elasticsearch(https://a073fd7d4f61450d97cdca4dba0cc4df.eu-west-1.aws.found.io:9243)): Connection marked as failed because the onConnect callback failed: failed to check for policy name 'filebeat-7.3.1': (status=403) {"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/6/cluster read-only (api)];"}],"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/6/cluster read-only (api)];"},"status":403}: 403 Forbidden: {"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/6/cluster read-only (api)];"}],"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/6/cluster read-only (api)];"},"status":403}
# Backoff works fine.
2019-09-04T07:39:00.163Z INFO pipeline/output.go:93 Attempting to reconnect to backoff(elasticsearch(https://a073fd7d4f61450d97cdca4dba0cc4df.eu-west-1.aws.found.io:9243)) with 7 reconnect attempt(s)
2019-09-04T07:39:00.163Z INFO [publisher] pipeline/retry.go:189 retryer: send unwait-signal to consumer
2019-09-04T07:39:00.163Z INFO [publisher] pipeline/retry.go:191 done
2019-09-04T07:39:00.163Z INFO [publisher] pipeline/retry.go:166 retryer: send wait signal to consumer
2019-09-04T07:39:00.163Z INFO [publisher] pipeline/retry.go:168 done
```
In the working scenario the message is raised as an ERROR (and there is no data loss). In the non-working scenario, data is lost yet Filebeat logs only a WARN. That does not look accurate.
Version: Verified on 6.8 and 7.3
Operating System: Linux
Steps to Reproduce:
1. Set the destination index to read-only, then revert a few minutes later: observe that data is lost.
2. Set the cluster to read-only, then revert a few minutes later: observe that no data is lost.
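The two toggles above can be driven with the Elasticsearch settings API, roughly as follows (the index name `filebeat-test`, the `localhost:9200` address, and any credentials are placeholders for your environment):

```shell
# Non-working scenario: block writes on a single index (Filebeat drops data).
curl -X PUT 'localhost:9200/filebeat-test/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only": true}'

# Revert a few minutes later.
curl -X PUT 'localhost:9200/filebeat-test/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only": false}'

# Working scenario: block the whole cluster (Filebeat backs off and retries).
curl -X PUT 'localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.blocks.read_only": true}}'

# Revert (null removes the transient setting).
curl -X PUT 'localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.blocks.read_only": null}}'
```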
As a final comment: in both described scenarios (working and non-working), Elasticsearch responds with a 403, but with different payloads.
I'm not familiar with this part of the code. Maybe we should add a separate check for status == 403 in this case? Instead of counting these events as nonIndexable, they should be counted as fails. @urso WDYT?
Most probably related: elastic/elasticsearch#49393
If Elasticsearch starts returning 429 instead of 403 in this case (index read-only), the back-off mechanism should work.
HTTP 4xx status codes indicate a client error. Retrying on client errors can leave Beats stuck, as the event might never become indexable. The only 4xx status code that tells the client to retry is 429 (well, there is the non-standard 449 status code, but we don't handle that).
We depend on Elasticsearch returning the correct status code for each situation; there is nothing we can or should do on the Beats side. That is what elastic/elasticsearch#49393 (and other related issues) is about: having Elasticsearch return more sensible status codes.
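The retry policy described above can be sketched as a tiny status classifier. This is a hypothetical illustration, not Filebeat's actual code; the function name `should_retry` is made up:

```shell
#!/bin/sh
# Sketch of the policy described above -- retry only on 429 and on
# server-side 5xx errors; drop events on any other 4xx client error.
should_retry() {
  case "$1" in
    429) echo retry ;;  # Too Many Requests: server explicitly asks to back off
    5??) echo retry ;;  # server-side error: safe to try again later
    *)   echo drop  ;;  # other client errors (incl. 403): event is not retried
  esac
}

for status in 403 409 429 503; do
  echo "HTTP $status -> $(should_retry "$status")"
done
```

Under this policy, the read-only index's 403 falls into the drop bucket, which is exactly why a 429 from Elasticsearch would make the existing back-off machinery do the right thing.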
As there will be no change in Beats, I'm closing this. Please comment/object if you think otherwise.