Index in read-only mode not handled properly by Filebeat #13844

Closed
eedugon opened this issue Sep 30, 2019 · 3 comments
Assignees
urso
Labels
Filebeat, Team:Integrations

Comments

@eedugon
Contributor

eedugon commented Sep 30, 2019

When Elasticsearch indices are set to read-only (via either index.blocks.read_only or index.blocks.write) for maintenance purposes, Filebeat logs the following and drops the data. There is no backoff mechanism in place, so the data is lost.

Note that Filebeat raises only a WARN, not even an ERROR, even though data is actually being lost.

2019-09-04T08:11:12.384Z	WARN	elasticsearch/client.go:535	Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0x16e86bcd, ext:63703181468, loc:(*time.Location)(nil)}, Meta:common.MapStr{"pipeline":"filebeat-7.3.1-apache-access-default"}, Fields:common.MapStr{"agent":common.MapStr{"ephemeral_id":"ac7b7012-7ffa-4e74-9974-4ca6e5a7b58b", "hostname":"f10453d46724", "id":"2d08bdc4-174c-4a75-ae8b-dfa911d62ba4", "type":"filebeat", "version":"7.3.1"}, "container":common.MapStr{"id":"84e89d283c71af548cb3a1a4f96130520021cc4b681fa3fad3a8538afd26d71d", "image":common.MapStr{"name":"httpd:2.4"}, "name":"my-apache-app"}, "docker":common.MapStr{"container":common.MapStr{"labels":common.MapStr{"co_elastic_logs/fileset_stderr":"error", "co_elastic_logs/fileset_stdout":"access", "co_elastic_logs/module":"apache2", "co_elastic_metrics/hosts":"${data.host}:${data.port}", "co_elastic_metrics/metricsets":"status", "co_elastic_metrics/module":"apache"}}}, "ecs":common.MapStr{"version":"1.0.1"}, "event":common.MapStr{"dataset":"apache.access", "module":"apache"}, "fileset":common.MapStr{"name":"access"}, "host":common.MapStr{"name":"f10453d46724"}, "input":common.MapStr{"type":"container"}, "log":common.MapStr{"file":common.MapStr{"path":"/var/lib/docker/containers/84e89d283c71af548cb3a1a4f96130520021cc4b681fa3fad3a8538afd26d71d/84e89d283c71af548cb3a1a4f96130520021cc4b681fa3fad3a8538afd26d71d-json.log"}, "offset":3181}, "message":"172.17.0.1 - - [04/Sep/2019:08:11:08 +0000] \"GET /caca3 HTTP/1.1\" 404 196", "service":common.MapStr{"type":"apache"}, "stream":"stdout"}, Private:file.State{Id:"", Finished:false, Fileinfo:(*os.fileStat)(0xc0000a3380), Source:"/var/lib/docker/containers/84e89d283c71af548cb3a1a4f96130520021cc4b681fa3fad3a8538afd26d71d/84e89d283c71af548cb3a1a4f96130520021cc4b681fa3fad3a8538afd26d71d-json.log", Offset:3327, Timestamp:time.Time{wall:0xbf53f9f04f421fbe, ext:96510498560, loc:(*time.Location)(0x30d3080)}, TTL:-1, Type:"container", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0x361ad7, Device:0x801}}, TimeSeries:false}, Flags:0x1} (status=403): {"type":"cluster_block_exception","reason":"index [filebeat-7.3.1-2019.09.04-000001] blocked by: [FORBIDDEN/5/index read-only (api)];"}
2019-09-04T08:11:12.385Z	WARN	elasticsearch/client.go:535	Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0x16e86bcd, ext:63703181468, loc:(*time.Location)(nil)}, Meta:common.MapStr{"pipeline":"filebeat-7.3.1-apache-error-pipeline"}, Fields:common.MapStr{"agent":common.MapStr{"ephemeral_id":"ac7b7012-7ffa-4e74-9974-4ca6e5a7b58b", "hostname":"f10453d46724", "id":"2d08bdc4-174c-4a75-ae8b-dfa911d62ba4", "type":"filebeat", "version":"7.3.1"}, "container":common.MapStr{"id":"84e89d283c71af548cb3a1a4f96130520021cc4b681fa3fad3a8538afd26d71d", "image":common.MapStr{"name":"httpd:2.4"}, "name":"my-apache-app"}, "docker":common.MapStr{"container":common.MapStr{"labels":common.MapStr{"co_elastic_logs/fileset_stderr":"error", "co_elastic_logs/fileset_stdout":"access", "co_elastic_logs/module":"apache2", "co_elastic_metrics/hosts":"${data.host}:${data.port}", "co_elastic_metrics/metricsets":"status", "co_elastic_metrics/module":"apache"}}}, "ecs":common.MapStr{"version":"1.0.1"}, "event":common.MapStr{"dataset":"apache.error", "module":"apache"}, "fileset":common.MapStr{"name":"error"}, "host":common.MapStr{"name":"f10453d46724"}, "input":common.MapStr{"type":"container"}, "log":common.MapStr{"file":common.MapStr{"path":"/var/lib/docker/containers/84e89d283c71af548cb3a1a4f96130520021cc4b681fa3fad3a8538afd26d71d/84e89d283c71af548cb3a1a4f96130520021cc4b681fa3fad3a8538afd26d71d-json.log"}, "offset":3181}, "message":"172.17.0.1 - - [04/Sep/2019:08:11:08 +0000] \"GET /caca3 HTTP/1.1\" 404 196", "service":common.MapStr{"type":"apache"}, "stream":"stdout"}, Private:file.State{Id:"", Finished:false, Fileinfo:(*os.fileStat)(0xc0001dd860), Source:"/var/lib/docker/containers/84e89d283c71af548cb3a1a4f96130520021cc4b681fa3fad3a8538afd26d71d/84e89d283c71af548cb3a1a4f96130520021cc4b681fa3fad3a8538afd26d71d-json.log", Offset:3327, Timestamp:time.Time{wall:0xbf53f9f04f473c6f, ext:96510833585, loc:(*time.Location)(0x30d3080)}, TTL:-1, Type:"container", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0x361ad7, Device:0x801}}, TimeSeries:false}, Flags:0x1} (status=403): {"type":"cluster_block_exception","reason":"index [filebeat-7.3.1-2019.09.04-000001] blocked by: [FORBIDDEN/5/index read-only (api)];"}

On the other hand, if the whole cluster is set to read-only (with the cluster-level setting cluster.blocks.read_only), everything works as expected: Filebeat logs an ERROR and the backoff mechanism works fine, so no data is lost.

2019-09-04T07:35:57.549Z	ERROR	pipeline/output.go:121	Failed to publish events: 403 Forbidden: {"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/6/cluster read-only (api)];"}],"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/6/cluster read-only (api)];"},"status":403}


2019-09-04T07:35:57.596Z	INFO	elasticsearch/client.go:743	Attempting to connect to Elasticsearch version 7.2.0
2019-09-04T07:35:57.783Z	INFO	[index-management]	idxmgmt/std.go:252	Auto ILM enable success.
2019-09-04T07:36:01.667Z	ERROR	pipeline/output.go:100	Failed to connect to backoff(elasticsearch(https://a073fd7d4f61450d97cdca4dba0cc4df.eu-west-1.aws.found.io:9243)): Connection marked as failed because the onConnect callback failed: failed to check for policy name 'filebeat-7.3.1': (status=403) {"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/6/cluster read-only (api)];"}],"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/6/cluster read-only (api)];"},"status":403}: 403 Forbidden: {"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/6/cluster read-only (api)];"}],"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/6/cluster read-only (api)];"},"status":403}

# Backoff works fine.

2019-09-04T07:39:00.163Z	INFO	pipeline/output.go:93	Attempting to reconnect to backoff(elasticsearch(https://a073fd7d4f61450d97cdca4dba0cc4df.eu-west-1.aws.found.io:9243)) with 7 reconnect attempt(s)
2019-09-04T07:39:00.163Z	INFO	[publisher]	pipeline/retry.go:189	retryer: send unwait-signal to consumer
2019-09-04T07:39:00.163Z	INFO	[publisher]	pipeline/retry.go:191	  done
2019-09-04T07:39:00.163Z	INFO	[publisher]	pipeline/retry.go:166	retryer: send wait signal to consumer
2019-09-04T07:39:00.163Z	INFO	[publisher]	pipeline/retry.go:168	  done

In the working scenario the message is raised as an ERROR (and there is no data loss). In the non-working scenario there is data loss, yet Filebeat logs only a WARN. That does not look accurate.

  • Version: Verified on 6.8 and 7.3
  • Operating System: Linux
  • Steps to Reproduce: Set the destination index to read-only mode and revert it a few minutes later to see that data is lost. Set the cluster to read-only mode and revert it a few minutes later to see that no data is lost. (The settings calls involved are sketched below.)
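
A minimal sketch of those two settings toggles, assuming an unsecured Elasticsearch on localhost:9200 and a filebeat-* index pattern (both placeholders; adjust the URL, credentials and index name as needed). Apply one block at a time, wait a few minutes, then revert it:

```go
// Sketch only: the Elasticsearch settings calls used to reproduce the two
// scenarios. The endpoint and index pattern below are assumptions.
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func putJSON(url, body string) {
	req, err := http.NewRequest(http.MethodPut, url, strings.NewReader(body))
	if err != nil {
		fmt.Println(url, "->", err)
		return
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println(url, "->", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(url, "->", resp.Status)
}

func main() {
	// Scenario 1: block only the destination index (Filebeat drops events).
	putJSON("http://localhost:9200/filebeat-*/_settings",
		`{"index.blocks.read_only": true}`)
	// Revert later with:
	//   putJSON("http://localhost:9200/filebeat-*/_settings",
	//           `{"index.blocks.read_only": false}`)

	// Scenario 2: block the whole cluster (backoff kicks in, no data loss).
	//   putJSON("http://localhost:9200/_cluster/settings",
	//           `{"persistent": {"cluster.blocks.read_only": true}}`)
	// Revert later with:
	//   putJSON("http://localhost:9200/_cluster/settings",
	//           `{"persistent": {"cluster.blocks.read_only": null}}`)
}
```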

As a final comment: in both described scenarios (working and non-working) the response from Elasticsearch is a 403, but with a different payload.

@kaiyan-sheng
Contributor

kaiyan-sheng commented Nov 18, 2019

Hmmm, it seems like the problem is caused by this code: https://github.com/elastic/beats/blob/master/libbeat/outputs/elasticsearch/client.go#L517

I'm not familiar with this part of the code. Maybe we should have a separate status check for status == 403 in this case? Instead of being counted as nonIndexable, it should be counted as a failure. @urso WDYT?
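
For illustration, a rough and purely hypothetical sketch of that idea (this is not the actual client.go code; classify, action and retryOn403 are made-up names): a 403 coming back on a bulk item would be treated as a temporary failure and retried, instead of being counted as non-indexable and dropped.

```go
// Hypothetical sketch of the per-item status handling being discussed,
// not the real libbeat implementation.
package esstatus

import "net/http"

type action int

const (
	actionOK    action = iota // event was indexed
	actionRetry               // keep the event and let the backoff retry it
	actionDrop                // count as non-indexable and discard the event
)

// classify decides what to do with a bulk item response status.
func classify(status int, retryOn403 bool) action {
	switch {
	case status < 300:
		return actionOK
	case status == http.StatusTooManyRequests:
		// 429 is the one 4xx code that is retried today.
		return actionRetry
	case status == http.StatusForbidden && retryOn403:
		// Proposed: treat the index-level cluster_block_exception
		// (HTTP 403) as a temporary failure and keep the event.
		return actionRetry
	case status < 500:
		// Other client errors: the event is considered non-indexable
		// and dropped, which is the behavior reported in this issue.
		return actionDrop
	default:
		// 5xx and anything else: assumed retryable for the sake of
		// this sketch.
		return actionRetry
	}
}
```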

urso self-assigned this Dec 3, 2019
@eedugon
Contributor Author

eedugon commented Dec 4, 2019

Most probably related: elastic/elasticsearch#49393
If Elasticsearch starts sending 429 instead of 403 in this case (index in read-only mode), then the backoff mechanism should work.

@urso

urso commented Dec 4, 2019

HTTP 4xx status codes indicate a client error. Retrying on client errors can lead to Beats getting stuck, as the event might never become indexable. The only 4xx status code that tells the client to retry is 429 (well, there is also the non-standard 449 status code, but we don't handle that).

We depend on Elasticsearch returning the correct status codes for particular situations; there is nothing we can or should do on the Beats side. This is what elastic/elasticsearch#49393 (and other related issues) is about: having Elasticsearch return more sensible status codes.

As there will be no change in Beats, I'm closing this issue. Please comment or object if you think otherwise.

urso closed this as completed Dec 4, 2019
andresrc added the Team:Integrations label Mar 6, 2020