
Filebeat autodiscovery for docker seems to miss collecting logs of crashed containers #10374

Closed
farodin91 opened this issue Jan 28, 2019 · 15 comments
Labels: containers, Filebeat, review, Team:Integrations

@farodin91

We are running a multi-node swarm. If a service crashes and produces a log entry with the crash exception, these logs are not forwarded to our Logstash. However, we are able to see these logs with docker logs.

Please include configurations and logs if available.

For confirmed bugs, please report:

  • Version: 6.5.4
  • Operating System: docker.elastic.co/beats/filebeat:6.5.1
  • Discuss Forum URL:
  • Steps to Reproduce:

filebeat.yml

logging.metrics.enabled: false

filebeat.registry_file: ${path.data}/registry

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true

fields:
  env: ${swarm.environment}

output.logstash:
  hosts: ["${logstash.url}:${logstash.port}"]
  slow_start: true

docker-compose.yml

version: '3.2'

services:
  logstash:
    image: logstash_image
    volumes:
      - /usr/share/logstash/queue/:/usr/share/logstash/queue/
    deploy:
      mode: replicated
      replicas: 1

  filebeat:
    image: logstash_image
    volumes:
     - /var/lib/docker/containers/:/var/lib/docker/containers/:ro
     - /var/run/docker.sock:/var/run/docker.sock:ro
     - /usr/share/filebeat/data/:/usr/share/filebeat/data/
     - /etc/hostname:/etc/hostname:ro
     - /var/log/:/var/log/:ro
    environment:
      swarm.environment: develop
    deploy:
      mode: global

networks:
  default:

Which modules are you running?

Only system and docker autodiscover

Have you checked filebeat logs for errors?

There is one error which is already reported and fixed in master: #9305

Have you checked if filebeat is reading the log file (registry file contains offset, log includes info message on Start/Stop of a harvester)?

I see only logs up to the registry position.

date level path message
2018-12-05T14:41:57.938Z INFO log/input.go:138 Configured paths: [/var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/*.log]
2018-12-05T14:41:57.938Z INFO input/input.go:114 Starting input of type: docker; ID: 11189854344855006298
2018-12-05T14:41:57.938Z INFO log/harvester.go:254 Harvester started for file: /var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48-json.log
2018-12-05T14:43:13.375Z INFO input/input.go:149 input ticker stopped
2018-12-05T14:43:13.375Z INFO input/input.go:167 Stopping Input: 11189854344855006298
2018-12-05T14:43:13.375Z INFO log/harvester.go:275 Reader was closed: /var/lib/docker/containers/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48/02da23a669acb638c061b582999f0a9262e01fce1d2e2624ab745f22c2902b48-json.log. Closing.

Why are we not seeing these logs in Logstash?

Copied from https://discuss.elastic.co/t/filebeat-autodiscovery-for-docker-seems-to-miss-collecting-logs-of-crashed-containers/159324/3

@ruflin added the review, Filebeat, containers, and Team:Integrations labels on Jan 28, 2019
@alvarolobato

@jsoriano can you have a look at this, please?

@jsoriano
Member

jsoriano commented Feb 5, 2019

In kubernetes autodiscover the cleanup_timeout option is used to give the inputs some time to finish collecting logs. We should add a similar option in docker; otherwise the input can be stopped before the whole file has been read.
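
For reference, a minimal sketch of how cleanup_timeout is used with the kubernetes provider today; the proposal is to accept the same setting under the docker provider, which does not support it yet at this point in the thread:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      # keep the generated configurations around for a while after the pod
      # stops, so the inputs have time to read the remaining log lines
      cleanup_timeout: 60s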

@jsoriano self-assigned this Feb 5, 2019
@farodin91
Author

@jsoriano Any progress? Do you need any information?

@jsoriano
Copy link
Member

@farodin91 I have given a quick try at adding the cleanup_timeout option to docker autodiscover. With this, configurations are not removed until some time after the container has been stopped (defaults to 60s), so filebeat has some time to collect logs after the container crashed.
It'd be good if you could give it a try before merging to see if it solves your issue. You can find the patch in #10905.
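
With the patch applied, the autodiscover section from this issue should work unchanged thanks to the 60s default; to make the window explicit, or to enlarge it, the new option could be set like this (a sketch based on the option name and default described above, shown with a larger value purely as an example of overriding the default):

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      # added by the patch in #10905; defaults to 60s. Configurations for a
      # stopped container are kept for this long so its last lines can be read.
      cleanup_timeout: 120s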

@farodin91
Author

I will try it on Monday.

@farodin91
Author

Is it possible to get a docker image to test this?

@jsoriano
Member

@farodin91 I have pushed jsoriano/filebeat:6.5.4-10905-1 docker image with a build of 6.5.4 with this patch.

The PR will need some work, as there are some tests failing.

@farodin91
Author

It works.
Thank you.

@jsoriano
Member

@farodin91 thanks for testing it!

jsoriano added a commit that referenced this issue Feb 28, 2019
cleanup_timeout is used in kubernetes autodiscover to wait some time before the
configurations associated to stopped containers are removed. Add an equivalent
option to docker autodiscover.

Fix #10374
@farodin91
Author

@jsoriano What version will contain the fix?

@jsoriano
Member

jsoriano commented Mar 5, 2019

@farodin91 it is not included in any version yet, so I guess the first one with this will be 7.1.0.

@farodin91
Author

Is there any release date for 7.1.0?

@jsoriano
Member

jsoriano commented Mar 6, 2019

@farodin91 not yet, sorry.

But I am now thinking that we could backport this to 7.0 and 6.7, but disabled by default (with cleanup_timeout set to zero), so the default behaviour doesn't change but users affected by this, like you, can start using it right away. Would that work for you?

@farodin91
Author

This would work for me.
Thank you.

jsoriano added a commit to jsoriano/beats that referenced this issue Mar 14, 2019
cleanup_timeout is used in kubernetes autodiscover to wait some time before the
configurations associated to stopped containers are removed. Add an equivalent
option to docker autodiscover.

Fix elastic#10374

(cherry picked from commit f771497)
jsoriano added a commit that referenced this issue Mar 14, 2019
…iscover (#11244)

cleanup_timeout is used in kubernetes autodiscover to wait some time before the
configurations associated to stopped containers are removed. Add an equivalent
option to docker autodiscover.

Fix #10374

(cherry picked from commit f771497)
jsoriano added a commit that referenced this issue Mar 14, 2019
…iscover (#11245)

cleanup_timeout is used in kubernetes autodiscover to wait some time before the
configurations associated to stopped containers are removed. Add an equivalent
option to docker autodiscover.

Fix #10374

(cherry picked from commit f771497)
@jsoriano
Member

@farodin91 we have backported #10905 to 6.7 and 7.0. In 6.7 it will be disabled by default (configured with a zero cleanup timeout); on that version you'll need to set cleanup_timeout: 60s to get the same behaviour as the default in 7.0.
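
Applied to the configuration from this issue, the 6.7 setup would then look something like this (a sketch; only the cleanup_timeout line is new compared to the original filebeat.yml above):

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      # disabled (zero timeout) by default in the 6.7 backport; set explicitly
      # to get the same behaviour as the 7.0 default
      cleanup_timeout: 60s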
