
Filebeat autodiscovery for docker seems to miss collecting logs of crashed containers #10374

Closed
farodin91 opened this issue Jan 28, 2019 · 15 comments
Labels: containers, Filebeat, review, Team:Integrations

@farodin91

We are running a multi-node swarm. If a service crashes and produces a log entry with the crash exception, these logs are not forwarded to our Logstash. However, we are able to see these logs with docker logs.

Please include configurations and logs if available.

For confirmed bugs, please report:

  • Version: 6.5.4
  • Operating System: docker.elastic.co/beats/filebeat:6.5.1
  • Discuss Forum URL:
  • Steps to Reproduce:

filebeat.yml

logging.metrics.enabled: false

filebeat.registry_file: ${path.data}/registry

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true

fields:
  env: ${swarm.environment}

output.logstash:
  hosts: ["${logstash.url}:${logstash.port}"]
  slow_start: true

docker-compose.yml

version: '3.2'

services:
  logstash:
    image: logstash_image
    volumes:
      - /usr/share/logstash/queue/:/usr/share/logstash/queue/
    deploy:
      mode: replicated
      replicas: 1

  filebeat:
    image: logstash_image
    volumes:
     - /var/lib/docker/containers/:/var/lib/docker/containers/:ro
     - /var/run/docker.sock:/var/run/docker.sock:ro
     - /usr/share/filebeat/data/:/usr/share/filebeat/data/
     - /etc/hostname:/etc/hostname:ro
     - /var/log/:/var/log/:ro
    environment:
      swarm.environment: develop
    deploy:
      mode: global

networks:
  default:

Which modules are you running?

Only system and docker autodiscover

Have you checked filebeat logs for errors?

There is one error which is already reported and fixed in master: #9305

Have you checked if filebeat is reading the log file (registry file contains offset, log includes info message on Start/Stop of a harvester)?

I see only logs up to the registry position.

date level path message
2018-12-05T14:41:57.938Z INFO log/input.go:138 Configured paths: [/var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/*.log]
2018-12-05T14:41:57.938Z INFO input/input.go:114 Starting input of type: docker; ID: 11189854344855006298
2018-12-05T14:41:57.938Z INFO log/harvester.go:254 Harvester started for file: /var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48-json.log
2018-12-05T14:43:13.375Z INFO input/input.go:149 input ticker stopped
2018-12-05T14:43:13.375Z INFO input/input.go:167 Stopping Input: 11189854344855006298
2018-12-05T14:43:13.375Z INFO log/harvester.go:275 Reader was closed: /var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48-json.log. Closing.

Why are we not seeing these logs in Logstash?

Copied from https://discuss.elastic.co/t/filebeat-autodiscovery-for-docker-seems-to-miss-collecting-logs-of-crashed-containers/159324/3

@ruflin added the review, Filebeat, containers, and Team:Integrations labels on Jan 28, 2019
@alvarolobato

@jsoriano can you have a look at this, please?

@jsoriano
Member

jsoriano commented Feb 5, 2019

In kubernetes autodiscover the cleanup_timeout option is used to give the inputs some time to finish collecting logs. We should add a similar option in docker; otherwise the input can be stopped before the whole file has been read.
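
For reference, a minimal sketch of how cleanup_timeout is used with the kubernetes provider today; the proposal is to accept the same setting under the docker provider, which does not support it yet at this point in the thread:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      # keep the generated configurations around for a while after the pod
      # stops, so the inputs have time to read the remaining log lines
      cleanup_timeout: 60s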

@jsoriano self-assigned this Feb 5, 2019
@farodin91
Author

@jsoriano Any progress? Do you need any information?

@jsoriano
Copy link
Member

@farodin91 I have given a quick try at adding the cleanup_timeout option to docker autodiscover. With this, configurations are not removed until some time after the container has been stopped (defaults to 60s), so filebeat has some time to collect logs after the container crashed.
It'd be good if you could give it a try before merging to see if it solves your issue. You can find the patch in #10905.
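
With the patch applied, the autodiscover section from this issue should work unchanged thanks to the 60s default; to make the window explicit, or to enlarge it, the new option could be set like this (a sketch based on the option name and default described above, shown with a larger value purely as an example of overriding the default):

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      # added by the patch in #10905; defaults to 60s. Configurations for a
      # stopped container are kept for this long so its last lines can be read.
      cleanup_timeout: 120s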

@farodin91
Author

I will try it on Monday.

@farodin91
Author

Is it possible to get a docker image to test this?

@jsoriano
Member

@farodin91 I have pushed jsoriano/filebeat:6.5.4-10905-1 docker image with a build of 6.5.4 with this patch.

The PR will need some work, as there are some tests failing.

@farodin91
Author

It works.
Thank you.

@jsoriano
Member

@farodin91 thanks for testing it!

jsoriano added a commit that referenced this issue Feb 28, 2019
cleanup_timeout is used in kubernetes autodiscover to wait some time before the
configurations associated to stopped containers are removed. Add an equivalent
option to docker autodiscover.

Fix #10374
@farodin91
Author

@jsoriano What version will contain the fix?

@jsoriano
Member

jsoriano commented Mar 5, 2019

@farodin91 it is not included in any version yet, so I guess the first one with this will be 7.1.0.

@farodin91
Author

Is there any release date for 7.1.0?

@jsoriano
Member

jsoriano commented Mar 6, 2019

@farodin91 not yet, sorry.

But I am now thinking that we could backport this to 7.0 and 6.7, but disabled by default (with cleanup_timeout set to zero), so the default behaviour doesn't change but users affected by this, like you, can start using it right away. Would that work for you?

@farodin91
Author

This would work for me.
Thank you.

jsoriano added a commit to jsoriano/beats that referenced this issue Mar 14, 2019
cleanup_timeout is used in kubernetes autodiscover to wait some time before the
configurations associated to stopped containers are removed. Add an equivalent
option to docker autodiscover.

Fix elastic#10374

(cherry picked from commit f771497)
jsoriano added a commit that referenced this issue Mar 14, 2019
…iscover (#11244)

cleanup_timeout is used in kubernetes autodiscover to wait some time before the
configurations associated to stopped containers are removed. Add an equivalent
option to docker autodiscover.

Fix #10374

(cherry picked from commit f771497)
jsoriano added a commit that referenced this issue Mar 14, 2019
…iscover (#11245)

cleanup_timeout is used in kubernetes autodiscover to wait some time before the
configurations associated to stopped containers are removed. Add an equivalent
option to docker autodiscover.

Fix #10374

(cherry picked from commit f771497)
@jsoriano
Member

@farodin91 we have backported #10905 to 6.7 and 7.0. In 6.7 it will be disabled by default (configured with a zero cleanup timeout); on that version you'll need to set cleanup_timeout: 60s to get the same behaviour as the default in 7.0.
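
Applied to the configuration from this issue, the 6.7 setup would then look something like this (a sketch; only the cleanup_timeout line is new compared to the original filebeat.yml above):

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      # disabled (zero timeout) by default in the 6.7 backport; set explicitly
      # to get the same behaviour as the 7.0 default
      cleanup_timeout: 60s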
