Cherry-pick #18564 to 7.8: [Autodiscover] Check if runner is already running before starting again #18689

Merged 2 commits on May 21, 2020

ChrsMark
Member

Cherry-pick of PR #18564 to 7.8 branch. Original message:

What does this PR do?

This PR fixes runner reload so that it does not start a new runner when a runner for the same configuration is already running. This can happen in Autodiscover when a container is queued for termination and a new one with exactly the same configuration appears, which leaves two identical configurations in the reload. The first one is skipped, but the second one creates a new runner while the previous one is still running. The following if/else check is what causes the problem when there are two identical configurations:

if _, ok := stopList[hash]; ok {

For more information check the related Discuss topic: https://discuss.elastic.co/t/multiple-monitoring-cycles-after-recreating-docker-image/231565/9

Why is it important?

When autodiscover catches a new start event, it tries to start a new runner without checking whether one is already running. The new runner then overwrites the old one in the runner list without the old one being stopped. As a result, two runners are running, and the old one is orphaned and untracked.
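
To make the failure mode concrete, here is a minimal sketch of the reload diff. It is a simplified model, not the actual code in cfgfile/list.go: planReload, its parameters, and the checkRunning flag are hypothetical names used only for illustration, and configurations are reduced to their hashes.

```go
package main

import "fmt"

// planReload is a simplified, hypothetical model of the reload diff.
// running holds the hashes of runners that are currently running,
// desired holds the hashes of the requested configs (may contain duplicates),
// and checkRunning toggles the "already running" guard.
func planReload(running map[uint64]bool, desired []uint64, checkRunning bool) (start, stop []uint64) {
	// Begin by assuming that every running runner has to be stopped.
	stopList := make(map[uint64]bool, len(running))
	for h := range running {
		stopList[h] = true
	}

	queued := map[uint64]bool{}
	for _, h := range desired {
		if stopList[h] {
			// The config is still wanted: keep its runner.
			delete(stopList, h)
			continue
		}
		if checkRunning && running[h] {
			// Guard: a runner for this config already exists.
			continue
		}
		if !queued[h] {
			queued[h] = true
			start = append(start, h)
		}
	}

	for h := range stopList {
		stop = append(stop, h)
	}
	return start, stop
}

func main() {
	running := map[uint64]bool{42: true}
	desired := []uint64{42, 42} // two identical configurations

	start, stop := planReload(running, desired, false)
	fmt.Println("without guard:", len(start), "to start,", len(stop), "to stop")

	start, stop = planReload(running, desired, true)
	fmt.Println("with guard:", len(start), "to start,", len(stop), "to stop")
}
```

Without the guard, the first copy of the configuration removes the hash from the stop list, so the second copy no longer finds it there and is queued for start even though its runner is alive. With the guard, both lists stay empty.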

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Enable autodiscover:
metricbeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            contains:
              docker.container.image: prometheus
          config:
            - module: prometheus
              metricsets: ["collector"]
              hosts: "${data.host}:${data.port}"
  2. Start Metricbeat: ./metricbeat -e -d "module,autodiscover"
  3. Start a container that matches the template using this docker-compose project: https://github.com/ChrsMark/docker-prometheus-playground
  4. Edit the Prometheus service by adding a new label to it:
  prometheus:
    labels:
    - "some=Some"
  5. Restart the service with docker-compose up -d
  6. Verify that no new runner starts:
    The log should not contain: 2020-05-15T08:25:01.563Z DEBUG [module] module/wrapper.go:127 Starting Wrapper[name=prometheus, len(metricSetWrappers)=1]
    The log should contain:
2020-05-21T07:47:11.487Z	DEBUG	[autodiscover]	cfgfile/list.go:62	Starting reload procedure, current runners: 1
2020-05-21T07:47:11.487Z	DEBUG	[autodiscover]	cfgfile/list.go:80	Start list: 0, Stop list: 0
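
If you capture the Metricbeat output to a file, the check in the last step can be automated. The following is only a sketch: countLines and sampleLog are hypothetical names, and the sample contains just the two debug lines quoted above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countLines returns how many lines of log contain marker.
func countLines(log, marker string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if strings.Contains(sc.Text(), marker) {
			n++
		}
	}
	return n
}

// sampleLog holds the two debug lines expected after the service restart.
const sampleLog = "2020-05-21T07:47:11.487Z\tDEBUG\t[autodiscover]\tcfgfile/list.go:62\tStarting reload procedure, current runners: 1\n" +
	"2020-05-21T07:47:11.487Z\tDEBUG\t[autodiscover]\tcfgfile/list.go:80\tStart list: 0, Stop list: 0\n"

func main() {
	// A fixed build should log no new wrapper start after the restart
	// and should log a reload with empty start and stop lists.
	fmt.Println("wrapper starts:", countLines(sampleLog, "Starting Wrapper[name=prometheus"))
	fmt.Println("no-op reloads:", countLines(sampleLog, "Start list: 0, Stop list: 0"))
}
```

Only the part of the log written after the restart should be checked, because the initial start of the container legitimately logs one Starting Wrapper line.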

Related issues

Discuss: https://discuss.elastic.co/t/multiple-monitoring-cycles-after-recreating-docker-image/231565/9

This might solve #12011 too.

@botelastic added the needs_team label May 21, 2020
@ChrsMark added the Team:Integrations and Team:Platforms labels May 21, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic removed the needs_team label May 21, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@elasticmachine
Collaborator

elasticmachine commented May 21, 2020

💚 Build Succeeded

Build stats

  • Build Cause: [Pull request #18689 updated]

  • Start Time: 2020-05-21T11:37:46.727+0000

  • Duration: 74 min 16 sec

Test stats 🧪

Test Results
Failed 0
Passed 6612
Skipped 1053
Total 7665

Steps errors

  • Name: Report to Codecov
    • Description:
        curl -sSLo codecov https://codecov.io/bash
        for i in auditbeat filebeat heartbeat libbeat metricbeat packetbeat winlogbeat journalbeat
        do
          FILE="${i}/build/coverage/full.cov"
          if [ -f "${FILE}" ]; then
            bash codecov -f "${FILE}"
          fi
        done

    • Duration: 2 min 22 sec

    • Start Time: 2020-05-21T12:14:44.622+0000

@ChrsMark ChrsMark merged commit e4c0fed into elastic:7.8 May 21, 2020
Labels: backport, review, Team:Integrations, Team:Platforms
3 participants