Cherry-pick #19033 to 7.x: Auditbeat: Fixes for system/socket dataset #19079

Merged: 1 commit, Jun 9, 2020

@adriansr (Contributor) commented Jun 9, 2020

Cherry-pick of PR #19033 to 7.x branch. Original message:

What does this PR do?

Fixes two problems with the system/socket dataset:

  • A bug in the internal state of the socket dataset that led to an infinite loop on systems where the kernel aggressively reuses sockets (observed on kernel 2.6 / CentOS/RHEL 6.x).

  • Socket expiration wasn't working as expected because it used an uninitialized timestamp: flows were expiring at every check.

Also fixes two other minor issues:

  • A flow could be terminated twice by different code paths, leading to a wrong numFlows count and duplicate flows being indexed.
  • Decoupled the status debug log and socket cleanup into separate goroutines so that logging is still performed under high load.
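
The double-termination fix amounts to making termination idempotent. A minimal Go sketch, assuming a hypothetical flowTable keyed by flow ID (flowTable, add, terminate, numFlows and emitted are illustrative names, not Auditbeat's): only the first code path to terminate a flow decrements numFlows and hands the flow off for indexing; later attempts are no-ops.

```go
package main

import (
	"fmt"
	"sync"
)

// flowTable is a hypothetical registry of live flows. numFlows must always
// equal len(flows), and each flow should be emitted (indexed) exactly once.
type flowTable struct {
	mu       sync.Mutex
	flows    map[uint64]struct{}
	numFlows int
	emitted  []uint64 // IDs of flows handed off for indexing
}

func newFlowTable() *flowTable {
	return &flowTable{flows: make(map[uint64]struct{})}
}

func (t *flowTable) add(id uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.flows[id]; ok {
		return // already tracked
	}
	t.flows[id] = struct{}{}
	t.numFlows++
}

// terminate ends a flow and reports whether this call did the work.
// If two code paths (for example a close event and an expiration sweep)
// both terminate the same flow, only the first one has any effect.
func (t *flowTable) terminate(id uint64) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.flows[id]; !ok {
		return false // already terminated: do not count or emit it again
	}
	delete(t.flows, id)
	t.numFlows--
	t.emitted = append(t.emitted, id)
	return true
}

func main() {
	t := newFlowTable()
	t.add(1)
	t.add(2)
	fmt.Println(t.terminate(1))        // true: first code path
	fmt.Println(t.terminate(1))        // false: second code path is a no-op
	fmt.Println(t.numFlows, t.emitted) // 1 [1]
}
```

Guarding on map membership under a single mutex keeps the check and the decrement atomic, so two goroutines racing to end the same flow cannot both pass the check.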

Why is it important?

It has been observed that the dataset could use 100% CPU and stop reporting events. During testing, it was also discovered that socket expiration, a new feature meant to prevent excessive memory usage, wasn't working as expected.

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [x] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

The infinite loop is easy to trigger in RHEL 6.x by running:

nmap -n -sT 127.0.0.1 -p 1-65535

(cherry picked from commit 665b67f)
@adriansr requested a review from a team as a code owner on June 9, 2020 16:07
@botelastic (bot) added the needs_team label (Indicates that the issue/PR needs a Team:* label) on Jun 9, 2020
@elasticmachine (Collaborator):

Pinging @elastic/siem (Team:SIEM)

@botelastic (bot) removed the needs_team label (Indicates that the issue/PR needs a Team:* label) on Jun 9, 2020
@elasticmachine (Collaborator):

💚 Build Succeeded

Build stats

  • Build Cause: [Pull request #19079 opened]

  • Start Time: 2020-06-09T16:08:04.720+0000

  • Duration: 40 min 34 sec

Test stats 🧪

Test Results
Failed 0
Passed 209
Skipped 33
Total 242

@adriansr merged commit 0997b5f into elastic:7.x on Jun 9, 2020