
Cherry-pick #24742 to 7.x: Refactor kubernetes autodiscover to avoid skipping short-living pods #25167

Merged

jsoriano merged 1 commit into elastic:7.x from backport_24742_7.x on Apr 20, 2021

Conversation

@jsoriano (Member) commented Apr 20, 2021

Cherry-pick of PR #24742 to 7.x branch. Original message:

What does this PR do?

Refactor logic in kubernetes autodiscover that decides when to generate events to try to address #22718.

Kubernetes autodiscover can now generate events without network information (without host or port/ports). This allows generating events for pods that haven't started yet, or that have succeeded or failed before a running event was generated. These events still include the container ID, so they can be used to collect logs. Still, no start event is generated if neither the pod IP nor any container IDs are available.

Some helpers have been added to obtain relevant information from pods and their containers.

Some additional small refactors are done to improve readability.
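A minimal sketch, not the PR's actual code: hypothetical helpers in the spirit of the ones described above, using the upstream k8s.io/api/core/v1 types. One collects the container IDs known for a pod, the other the named ports declared by a container spec.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// containerIDs returns the runtime IDs of all containers (init and regular)
// that have been assigned one, regardless of the pod phase. With an ID, logs
// can be collected even if the pod never reached the Running state.
func containerIDs(pod *v1.Pod) []string {
	statuses := make([]v1.ContainerStatus, 0,
		len(pod.Status.InitContainerStatuses)+len(pod.Status.ContainerStatuses))
	statuses = append(statuses, pod.Status.InitContainerStatuses...)
	statuses = append(statuses, pod.Status.ContainerStatuses...)

	var ids []string
	for _, s := range statuses {
		if s.ContainerID != "" {
			ids = append(ids, s.ContainerID)
		}
	}
	return ids
}

// namedPorts returns the named ports declared by a container spec, the kind
// of information container-level events can expose.
func namedPorts(c v1.Container) map[string]int32 {
	ports := map[string]int32{}
	for _, p := range c.Ports {
		if p.Name != "" {
			ports[p.Name] = p.ContainerPort
		}
	}
	return ports
}

func main() {
	pod := &v1.Pod{}
	pod.Status.ContainerStatuses = []v1.ContainerStatus{{ContainerID: "containerd://abc123"}}
	fmt.Println(containerIDs(pod)) // [containerd://abc123]

	redis := v1.Container{Name: "redis", Ports: []v1.ContainerPort{{Name: "redis", ContainerPort: 6379}}}
	fmt.Println(namedPorts(redis)) // map[redis:6379]
}
```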

Why is it important?

The current logic checks similar things at both the pod and container levels; this change tries to simplify it by focusing on containers only.

No matter the state of the pod, if a container is running or trying to run, even if it is unhealthy, some configuration should be generated so its logs can be collected.
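Illustrative only, with assumed function names rather than the refactor's actual ones: a container-focused decision along these lines, where any container that is running, waiting to run, or terminated with a known ID gets some configuration.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// containerNeedsConfig reports whether a container, in its current state,
// should get some configuration so its logs can be collected.
func containerNeedsConfig(s v1.ContainerStatus) bool {
	switch {
	case s.State.Running != nil:
		return true // running, healthy or not
	case s.State.Waiting != nil:
		return true // trying to run, e.g. CrashLoopBackOff or ImagePullBackOff
	case s.State.Terminated != nil:
		return s.ContainerID != "" // already finished, logs still reachable by ID
	default:
		return false
	}
}

func main() {
	crashLooping := v1.ContainerStatus{
		ContainerID: "containerd://abc123",
		State: v1.ContainerState{
			Waiting: &v1.ContainerStateWaiting{Reason: "CrashLoopBackOff"},
		},
	}
	fmt.Println(containerNeedsConfig(crashLooping)) // true
}
```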

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. But more tests would be needed.
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Fix update so it definitely stops configurations, respecting the cleanup timeout.
  • Double-check that nothing breaks with events without data.host.
    This is logged at the debug level:
    2021-04-08T12:53:34.453+0200	DEBUG	[autodiscover]	template/config.go:156	Configuration template cannot be resolved: field 'data.host' not available in event or environment accessing 'hosts' (source:'/home/jaime/tmp/metricbeat-autodiscover-k8s.yml')
    
  • Actually check that the refactor solves the issues and doesn't break anything else.
  • Try to add more testing. A couple of cases added to current tests.
  • Check that nothing breaks with hints including data.host or data.ports (an example template is sketched after this list).
    • data.ports.* doesn't seem to be working, check if this is a regression. Named ports were not added to container events. They are now.
    • Multiple configurations generated for pods with multiple containers, expected? compare with master. False alarm.
  • Check autodiscover with heartbeat.
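For reference, an example of the kind of template affected by the data.host and data.ports checks above, adapted from the standard Metricbeat autodiscover docs example (the redis pod and its app label are assumptions). With this change, events without data.host can be emitted for such a pod; the template then fails to resolve and only the debug message quoted above is logged, instead of the pod being skipped.

```yaml
metricbeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        - condition:
            contains:
              kubernetes.labels.app: "redis"
          config:
            - module: redis
              metricsets: ["info"]
              # fails to resolve (debug log above) when the event has no data.host
              hosts: ["${data.host}:6379"]
```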

Author's notes for the future

  • When using autodiscover templates, if the condition matches on pod metadata, container events will match too, because they also include pod metadata. In pods with multiple containers this may lead to events with metadata from the wrong container. This could possibly be avoided by matching pod metadata only in pod events. This also happens in master, so it is not addressed in this PR.
  • data.ports is not included in container-level events, so it doesn't work when conditions for specific containers are used.
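A hypothetical template illustrating both notes: the condition is scoped to a container name instead of pod metadata, and the named-port reference is included only to flag the data.ports limitation. Module, container name, and port name are made up for the example.

```yaml
metricbeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        - condition:
            equals:
              # matching on the container name avoids picking up sibling
              # containers of the same pod via shared pod metadata
              kubernetes.container.name: "metrics"
          config:
            - module: prometheus
              metricsets: ["collector"]
              # per the note above, data.ports is not included in
              # container-level events, so this reference may not resolve
              hosts: ["${data.host}:${data.ports.metrics}"]
```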

How to test this PR locally

  • Check that everything keeps running with some autodiscover happy cases.
  • Check that logs are collected from a pod in CrashLoopBackOff (kubectl run --image=redis redis -- echo foo).
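For both checks, a minimal hints-based Filebeat autodiscover configuration (the default_config shown in the Filebeat docs) can be used; it collects logs by container ID, which is exactly what the refactor preserves for short-living pods. It assumes Filebeat runs inside the cluster with access to /var/log/containers.

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          # log path is resolved from the container ID in the event
          - /var/log/containers/*${data.kubernetes.container.id}.log
```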

Related issues

Use cases

Collect logs from containers in short-living or failing pods.

Refactor kubernetes autodiscover to avoid skipping short-living pods (elastic#24742)

Refactor logic in kubernetes autodiscover that decides when
to generate events to try to address issues with short-living
containers.

Kubernetes autodiscover can now generate events without network
information (without host or port/ports). This allows generating
events for pods that haven't started yet, or that have succeeded or
failed before a running event was generated. These events still
include the container ID, so they can be used to collect logs. Still,
no start event is generated if neither the pod IP nor any container
IDs are available.

Some helpers have been added to obtain relevant information from
pods and their containers.

Some additional small refactors are done to improve readability.

(cherry picked from commit b4b6e6e)
@botelastic bot added the needs_team label (indicates that the issue/PR needs a Team:* label) on Apr 20, 2021
@elasticmachine (Collaborator) commented:

Pinging @elastic/integrations (Team:Integrations)

@botelastic bot removed the needs_team label on Apr 20, 2021
@elasticmachine (Collaborator) commented:

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #25167 opened

  • Start Time: 2021-04-20T08:58:56.446+0000

  • Duration: 158 min 47 sec

  • Commit: 0a8c5b9

Test stats 🧪

  • Failed: 0
  • Passed: 46992
  • Skipped: 5176
  • Total: 52168

Trends 🧪

(charts: Build Times, Tests)

💚 Flaky test report

Tests succeeded.


@jsoriano merged commit 2f4b0ce into elastic:7.x on Apr 20, 2021
@zube bot added the [zube]: Done label and removed the [zube]: Inbox label on Apr 20, 2021
@jsoriano deleted the backport_24742_7.x branch on April 20, 2021
@zube bot removed the [zube]: Done label on Jul 19, 2021
Labels: backport, Team:Integrations