Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Auditbeat] Reduce system/socket logging noise #15890

Merged
merged 4 commits into from
Feb 4, 2020

Conversation

adriansr
Copy link
Contributor

@adriansr adriansr commented Jan 28, 2020

What does this PR do?

This PR updates the logging of system/socket dataset to be less noisy, especially when -d '*' is used. Also updates some code that could introduce a difficult to spot bug in the future.

  • Make ThreadEnter errors less noisy

    The socket dataset uses ThreadEnter(event) / ThreadLeave(event) as a
    single-event per-thread state tracking, useful for correlating function
    calls and their return values.

    However, in some cases functions are stacked, like sys_execve() calling
    itself recursively or inet6_create() calling inet_create(). This results
    in an existing stored event to be evicted, which is not a problem but
    currently is causing a warning to be printed to the logs.

    This patch makes two changes to this situation:

    • Only print warnings from the state machine when socketdetailed selector
      is enabled. The state machine currently only generates warnings for
      ThreadEnter/ThreadExit issues.

    • Change ThreadEnter errors to be constructed on demand by their Error()
      method, so that the somewhat expensive fmt.Sprintf() / event.String()
      is only invoked if the error is going to be printed to the log.

    This is a huge CPU saving in systems where this benign eviction is
    happening frequently.

  • socketdetailed selector has to be enabled explicitly

    This selector is extremely noisy. This change excludes it from being
    enabled when debug is enabled with -d '*' and requires it to be
    explicitly defined: -d '*,socketdetailed'.

  • Print template variables when socket debug is enabled

    It makes no sense to print guesses' progress in regular debug (socket)
    and the resulting template variables only when socketdetailed is set.

  • Fix syscall arguments usage in guesses

    The syscall parameter templates (SYS_Pn) were not valid until
    guess_syscall_args runs. As the variables SYS_Pn already existed,
    another guess using them could use the wrong values because the
    dependency mechanism only checks if the variable exists.

    The fix for this is to have temporary variables (_SYS_Pn) and have
    guess_syscall_args create the definitive ones.

    This didn't cause any bug as the only guess that used syscall
    args is the new guess_deref which is enabled on demand via an
    environment variable for diagnostic purposes.

Why is it important?

Two reasons:

  • In some cases this dataset would generate noisy and misleading messages in the logs at INFO level. This causes unnecessary concern and increased CPU usage.
  • The socketdetailed selector being activated by default when -d '*' is specified caused the logs to be full of kprobe events at rates of 10.000 per second or more. Log files would rotate every few minutes.

Checklist

  • My code follows the style guidelines of this project

  • I have commented my code, particularly in hard-to-understand areas

  • I have made corresponding changes to the documentation

  • I have made corresponding change to the default configuration files

  • I have added tests that prove my fix is effective or that my feature works

  • I haven't added a changelog entry because I don't think these changes have entity to deserve it.

The socket dataset uses ThreadEnter(event) / ThreadLeave(event) as a
single-event per-thread state tracking, useful for correlating function
calls and their return values.

However, in some cases functions are stacked, like sys_execve() calling
itself recursively or inet6_create() calling inet_create(). This results
in an existing stored event to be evicted, which is not a problem but
currently is causing a warning to be printed to the logs.

This patch makes two changes to this situation:
- Only print warnings from the state machine when socketdetailed selector
  is enabled. The state machine currently only generates warnings for
  ThreadEnter/ThreadExit issues.

- Change ThreadEnter errors to be constructed on demand by their Error()
  method, so that the somewhat expensive fmt.Sprintf() / event.String()
  is only invoked if the error is going to be printed to the log.

This is a huge CPU saving in systems where this benign eviction is
happening frequently.
This selector is extremely noisy. This change excludes it from being
enabled when debug is enabled with `-d '*'` and requires it to be
explicitly defined: `-d '*,socketdetailed'`.
It makes no sense to print guesses' progress in regular debug (socket)
and the resulting template variables only when socketdetailed is set.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/siem (Team:SIEM)

The syscall parameter templates (`SYS_Pn`) were not valid until
guess_syscall_args runs. As the variables `SYS_Pn` already existed,
another guess using them could use the wrong values because the
dependency mechanism only checks if the variable exists.

The fix for this is to have temporary variables (`_SYS_Pn`) and have
guess_syscall_args create the definitive ones.

This didn't cause any bug as the only guess that used syscall
args is the new guess_deref which is enabled on demand via an
environment variable for diagnostic purposes.
Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@adriansr adriansr merged commit 32a3cde into elastic:master Feb 4, 2020
@adriansr adriansr added the needs_backport PR is waiting to be backported to other branches. label Mar 19, 2020
adriansr added a commit to adriansr/beats that referenced this pull request Mar 19, 2020
* Make ThreadEnter errors less noisy

The socket dataset uses ThreadEnter(event) / ThreadLeave(event) as a
single-event per-thread state tracking, useful for correlating function
calls and their return values.

However, in some cases functions are stacked, like sys_execve() calling
itself recursively or inet6_create() calling inet_create(). This results
in an existing stored event to be evicted, which is not a problem but
currently is causing a warning to be printed to the logs.

This patch makes two changes to this situation:
- Only print warnings from the state machine when socketdetailed selector
  is enabled. The state machine currently only generates warnings for
  ThreadEnter/ThreadExit issues.

- Change ThreadEnter errors to be constructed on demand by their Error()
  method, so that the somewhat expensive fmt.Sprintf() / event.String()
  is only invoked if the error is going to be printed to the log.

This is a huge CPU saving in systems where this benign eviction is
happening frequently.

* socketdetailed selector has to be enabled explicitly

This selector is extremely noisy. This change excludes it from being
enabled when debug is enabled with `-d '*'` and requires it to be
explicitly defined: `-d '*,socketdetailed'`.

* Print template variables when socket debug is enabled

It makes no sense to print guesses' progress in regular debug (socket)
and the resulting template variables only when socketdetailed is set.

* Fix syscall arguments usage in guesses

The syscall parameter templates (`SYS_Pn`) were not valid until
guess_syscall_args runs. As the variables `SYS_Pn` already existed,
another guess using them could use the wrong values because the
dependency mechanism only checks if the variable exists.

The fix for this is to have temporary variables (`_SYS_Pn`) and have
guess_syscall_args create the definitive ones.

This didn't cause any bug as the only guess that used syscall
args is the new guess_deref which is enabled on demand via an
environment variable for diagnostic purposes.

(cherry picked from commit 32a3cde)
@adriansr adriansr added v7.7.0 and removed needs_backport PR is waiting to be backported to other branches. labels Mar 19, 2020
adriansr added a commit that referenced this pull request Mar 19, 2020
* Make ThreadEnter errors less noisy

The socket dataset uses ThreadEnter(event) / ThreadLeave(event) as a
single-event per-thread state tracking, useful for correlating function
calls and their return values.

However, in some cases functions are stacked, like sys_execve() calling
itself recursively or inet6_create() calling inet_create(). This results
in an existing stored event to be evicted, which is not a problem but
currently is causing a warning to be printed to the logs.

This patch makes two changes to this situation:
- Only print warnings from the state machine when socketdetailed selector
  is enabled. The state machine currently only generates warnings for
  ThreadEnter/ThreadExit issues.

- Change ThreadEnter errors to be constructed on demand by their Error()
  method, so that the somewhat expensive fmt.Sprintf() / event.String()
  is only invoked if the error is going to be printed to the log.

This is a huge CPU saving in systems where this benign eviction is
happening frequently.

* socketdetailed selector has to be enabled explicitly

This selector is extremely noisy. This change excludes it from being
enabled when debug is enabled with `-d '*'` and requires it to be
explicitly defined: `-d '*,socketdetailed'`.

* Print template variables when socket debug is enabled

It makes no sense to print guesses' progress in regular debug (socket)
and the resulting template variables only when socketdetailed is set.

* Fix syscall arguments usage in guesses

The syscall parameter templates (`SYS_Pn`) were not valid until
guess_syscall_args runs. As the variables `SYS_Pn` already existed,
another guess using them could use the wrong values because the
dependency mechanism only checks if the variable exists.

The fix for this is to have temporary variables (`_SYS_Pn`) and have
guess_syscall_args create the definitive ones.

This didn't cause any bug as the only guess that used syscall
args is the new guess_deref which is enabled on demand via an
environment variable for diagnostic purposes.

(cherry picked from commit 32a3cde)
andrewstucki pushed a commit to andrewstucki/beats that referenced this pull request Apr 4, 2020
This unrelated bug was introduced by elastic#15890 (unreleased).
andrewstucki pushed a commit that referenced this pull request Apr 6, 2020
* Add code for reaping sockets after a set timeout

* Fix broken fileset under 32-bit OS.

This unrelated bug was introduced by #15890 (unreleased).

* Add docs

* Update changelog

* Fix ordering issue

Co-authored-by: Adrian Serrano <[email protected]>
andrewstucki pushed a commit to andrewstucki/beats that referenced this pull request Apr 6, 2020
* Add code for reaping sockets after a set timeout

* Fix broken fileset under 32-bit OS.

This unrelated bug was introduced by elastic#15890 (unreleased).

* Add docs

* Update changelog

* Fix ordering issue

Co-authored-by: Adrian Serrano <[email protected]>
(cherry picked from commit ff68af6)
andrewstucki pushed a commit to andrewstucki/beats that referenced this pull request Apr 6, 2020
* Add code for reaping sockets after a set timeout

* Fix broken fileset under 32-bit OS.

This unrelated bug was introduced by elastic#15890 (unreleased).

* Add docs

* Update changelog

* Fix ordering issue

Co-authored-by: Adrian Serrano <[email protected]>
(cherry picked from commit ff68af6)
andrewstucki pushed a commit that referenced this pull request Apr 6, 2020
…17513)

* [Auditbeat] Memory leak and syscall fix (#17500)

* Add code for reaping sockets after a set timeout

* Fix broken fileset under 32-bit OS.

This unrelated bug was introduced by #15890 (unreleased).

* Add docs

* Update changelog

* Fix ordering issue

Co-authored-by: Adrian Serrano <[email protected]>
(cherry picked from commit ff68af6)

* fix changelog
andrewstucki pushed a commit that referenced this pull request Apr 7, 2020
…17514)

* [Auditbeat] Memory leak and syscall fix (#17500)

* Add code for reaping sockets after a set timeout

* Fix broken fileset under 32-bit OS.

This unrelated bug was introduced by #15890 (unreleased).

* Add docs

* Update changelog

* Fix ordering issue

Co-authored-by: Adrian Serrano <[email protected]>
(cherry picked from commit ff68af6)

* fix changelog
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
… fix (elastic#17514)

* [Auditbeat] Memory leak and syscall fix (elastic#17500)

* Add code for reaping sockets after a set timeout

* Fix broken fileset under 32-bit OS.

This unrelated bug was introduced by elastic#15890 (unreleased).

* Add docs

* Update changelog

* Fix ordering issue

Co-authored-by: Adrian Serrano <[email protected]>
(cherry picked from commit a92b736)

* fix changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants