Auditbeat 7.7.x Poor Performance: 100%+ CPU Usage with System Module Socket Dataset Enabled #19141

Closed
BenB196 opened this issue Jun 11, 2020 · 16 comments · Fixed by #19764

@BenB196

BenB196 commented Jun 11, 2020

Auditbeat 7.7.x with the System module's socket dataset enabled will randomly start using 100%+ CPU on some servers. This was not an issue prior to 7.7.x.

Restarting the Auditbeat service returns CPU usage to normal for a while, but the problem eventually recurs.

The issue doesn't happen on every server. I run Auditbeat on ~100 servers with the same config (below), and it occurs on 10-15% of them. I see it on both openSUSE and CentOS servers, across multiple kernels and with different applications running.

Screenshot showing the issue (percentages on the graph are of total CPU, not of individual cores; this example server has 4 cores, so Auditbeat is using one of them entirely):

image

Version Output:

# auditbeat version
auditbeat version 7.7.1 (amd64), libbeat 7.7.1 [932b273e8940575e15f10390882be205bad29e1f built 2020-05-28 15:20:33 +0000 UTC]

System versions:

# uname -a
Linux server 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Configuration:

###################### Auditbeat Configuration #########################

#==========================  Modules configuration =============================
auditbeat.modules:

- module: auditd
  resolve_ids: true
  failure_mode: silent
  backlog_limit: 8192
  rate_limit: 0
  include_raw_message: false
  include_warnings: false
  backpressure_strategy: auto
  # Load audit rules from separate files. Same format as audit.rules(7).
  audit_rule_files: [ '${path.config}/audit.rules.d/*.conf' ]
  audit_rules: |
    ## Define audit rules here.
    ## Create file watches (-w) or syscall audits (-a or -A). Uncomment these
    ## examples or add your own rules.

    ## If you are on a 64 bit platform, everything should be running
    ## in 64 bit mode. This rule will detect any use of the 32 bit syscalls
    ## because this might be a sign of someone exploiting a hole in the 32
    ## bit API.
    -a always,exit -F arch=b32 -S all -F key=32bit-abi

    ## Executions.
    -a always,exit -F arch=b64 -S execve,execveat -k exec

    ## External access (warning: these can be expensive to audit).
    -a always,exit -F arch=b64 -S accept,bind,connect -F key=external-access

    ## Identity changes.
    -w /etc/group -p wa -k identity
    -w /etc/passwd -p wa -k identity
    -w /etc/gshadow -p wa -k identity
    -w /etc/shadow -p wa -k identity

    ## Unauthorized access attempts.
    -a always,exit -F arch=b32 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EACCES -k access
    -a always,exit -F arch=b32 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EPERM -k access
    -a always,exit -F arch=b64 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EACCES -k access
    -a always,exit -F arch=b64 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EPERM -k access

- module: file_integrity
  paths:
  - /bin
  - /usr/bin
  - /sbin
  - /usr/sbin
  - /etc
  - /root
  - /usr/local/bin
  - /home
  exclude_files:
  - '(?i)\.sw[nop]$'
  - '~$'
  - '/\.git($|/)'
  - '\.rrd$'
  include_files: []
  scan_at_start: true
  scan_rate_per_sec: 50 MiB
  max_file_size: 100 MiB
  hash_types: [md5,sha256]
  recursive: true

- module: system
  datasets:
    - host    # General host information, e.g. uptime, IPs
    - login   # User logins, logouts, and system boots.
    - package # Installed, updated, and removed packages
    - process # Started and stopped processes
    - socket  # Opened and closed sockets
    - user    # User information

  # How often datasets send state updates with the
  # current state of the system (e.g. all currently
  # running processes, all open sockets).
  state.period: 12h

  # Enabled by default. Auditbeat will read password fields in
  # /etc/passwd and /etc/shadow and store a hash locally to
  # detect any changes.
  user.detect_password_changes: true

  # File patterns of the login record files.
  login.wtmp_file_pattern: /var/log/wtmp*
  login.btmp_file_pattern: /var/log/btmp*

#================================ Outputs =====================================

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["<snipped>"]
  loadbalance: true

#================================ Processors =====================================

processors:
  - add_host_metadata: ~
  - add_tags:
      tags: [auditbeat]
  - dns:
      type: reverse
      fields:
        server.ip: server.hostname
        client.ip: client.hostname
        source.ip: source.hostname
        destination.ip: destination.hostname
      nameservers: ['<snipped>']
      tag_on_failure: [_dns_reverse_lookup_failed]

#================================ Logging =====================================

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/auditbeat
  name: auditbeat
  keepfiles: 2
  permissions: 0600
  rotateeverybytes: 5242880

#============================== X-Pack Monitoring ===============================
monitoring.enabled: true
monitoring.elasticsearch:
  hosts: ["<snipped>"]
  protocol: "https"
  username: "<snipped>"
  password: "<snipped>"
  ssl.enabled: true
  ssl.verification_mode: full
  ssl.certificate_authorities: ["<snipped>"]
monitoring.cluster_uuid: "<snipped>"

Steps to reproduce:

  1. Install Auditbeat 7.7.x (I installed from the RPM file)
  2. Configure Auditbeat to use the System module with the socket dataset enabled
  3. Start Auditbeat
  4. Wait for Auditbeat to start consuming more CPU than it should
  5. Stop Auditbeat
  6. Remove the socket dataset from the System module
  7. Start Auditbeat
  8. Auditbeat will no longer use more CPU than it should

@BenB196 changed the title from "Auditbeat 7.7.x Poor 100%+ CPU Usage with System Module Socket Dataset Enabled" to "Auditbeat 7.7.x Poor Performance: 100%+ CPU Usage with System Module Socket Dataset Enabled" on Jun 11, 2020
@elasticmachine
Collaborator

Pinging @elastic/siem (Team:SIEM)

@awei82

awei82 commented Jun 16, 2020

I'm running into the exact same issue with Auditbeat 7.7.1 on Ubuntu 16.04.

@adriansr
Contributor

It looks like you're running into the issue fixed by #19033.

The fix was too late for 7.7.1, but it will make it into 7.8.0.

@adriansr
Contributor

Fix available in 7.8.0

@tlandschoff-scale

I have that version installed and I am still seeing this problem:

$ auditbeat version
auditbeat version 7.8.0 (amd64), libbeat 7.8.0 [f79387d32717d79f689d94fda1ec80b2cf285d30 built 2020-06-14 18:11:10 +0000 UTC]

According to perf top, this is where the CPU time goes:

  42,62%  auditbeat [.] runtime.mapaccess2_fast64
  15,19%  auditbeat [.] github.com/elastic/beats/v7/x-pack/auditbeat/module/system/socket.(*state).ExpireOlder
  10,50%  auditbeat [.] runtime.aeshash64
   7,76%  auditbeat [.] github.com/elastic/beats/v7/x-pack/auditbeat/module/system/socket.(*state).onSockDestroyed
   3,57%  auditbeat [.] time.Time.Before
   2,92%  auditbeat [.] github.com/elastic/beats/v7/x-pack/auditbeat/module/system/socket.(*socket).Timestamp
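
To make the split in that listing easier to read, here is a small sketch that tallies the percentages by whether the symbol belongs to the socket dataset package. It only does arithmetic on the six lines quoted above; the helper names are made up for illustration. Note that `perf top` shows a flat profile, so the sketch cannot tell which caller is responsible for the `runtime.*` time.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Lines copied from the perf top output quoted in this comment.
const perfTop = `
  42,62%  auditbeat [.] runtime.mapaccess2_fast64
  15,19%  auditbeat [.] github.com/elastic/beats/v7/x-pack/auditbeat/module/system/socket.(*state).ExpireOlder
  10,50%  auditbeat [.] runtime.aeshash64
   7,76%  auditbeat [.] github.com/elastic/beats/v7/x-pack/auditbeat/module/system/socket.(*state).onSockDestroyed
   3,57%  auditbeat [.] time.Time.Before
   2,92%  auditbeat [.] github.com/elastic/beats/v7/x-pack/auditbeat/module/system/socket.(*socket).Timestamp
`

// parsePercent reads the leading percentage of a perf top line and
// accepts the decimal comma used in the output above.
func parsePercent(line string) (float64, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return 0, fmt.Errorf("empty line")
	}
	s := strings.TrimSuffix(fields[0], "%")
	s = strings.Replace(s, ",", ".", 1)
	return strconv.ParseFloat(s, 64)
}

// tally sums the percentages, split by whether the symbol lives in the
// system/socket package or elsewhere (Go runtime, standard library).
func tally(output string) (float64, float64, error) {
	var socket, other float64
	for _, line := range strings.Split(strings.TrimSpace(output), "\n") {
		p, perr := parsePercent(line)
		if perr != nil {
			return 0, 0, perr
		}
		if strings.Contains(line, "module/system/socket.") {
			socket += p
		} else {
			other += p
		}
	}
	return socket, other, nil
}

func main() {
	socket, other, err := tally(perfTop)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("socket package: %.2f%%\n", socket)
	fmt.Printf("runtime/stdlib: %.2f%%\n", other)
	fmt.Printf("total listed:   %.2f%%\n", socket+other)
}
```

Together the six listed symbols account for a little over 80% of the samples, with the map lookup and hashing functions taking the largest share.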

As this profile suggests, removing the socket dataset from the system module makes this problem go away:

--- auditbeat.yml.cpuhog	2020-06-23 09:22:49.122378568 +0200
+++ auditbeat.yml	2020-06-23 09:22:58.938317272 +0200
@@ -59,7 +59,7 @@
     - host    # General host information, e.g. uptime, IPs
     - login   # User logins, logouts, and system boots.
     - process # Started and stopped processes
-    - socket  # Opened and closed sockets
+    # - socket  # Opened and closed sockets
     - user    # User information
 
   # How often datasets send state updates with the

@btnrsec

btnrsec commented Jun 29, 2020

> Fix available in 7.8.0

I have upgraded a client to Auditbeat 7.8.0 and am still experiencing the same issue (on Ubuntu 16.04.6 LTS). One client upgraded from 7.6.1 (which did not have the socket issue) to 7.8.0 and is now getting high CPU usage. The workaround is still to comment out the socket dataset.

@BenB196
Author

BenB196 commented Jun 30, 2020

@adriansr could this issue be reopened as the issue does not appear to be fixed in 7.8.0?

@adriansr
Contributor

adriansr commented Jun 30, 2020

Reopening.

Can someone please run Auditbeat with -httpprof :8080 and, once it's using 100% CPU, run curl 'http://localhost:8080/debug/pprof/profile?seconds=30' -o profile.prof and share the resulting profile.prof binary file?
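
For anyone who would rather script the capture than run curl by hand, the request can be sketched in Go as below. This is only an illustration of the same HTTP call: the program and function names are made up, and the address, the 30-second window, and the output filename simply mirror the curl command above.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// profileURL builds the CPU profile URL served by the -httpprof endpoint.
func profileURL(addr string, seconds int) string {
	return fmt.Sprintf("http://%s/debug/pprof/profile?seconds=%d", addr, seconds)
}

// fetchProfile downloads a CPU profile and writes it to path.
func fetchProfile(addr string, seconds int, path string) error {
	// The server only responds after the sampling window ends, so allow
	// some extra time beyond it before giving up.
	client := &http.Client{Timeout: time.Duration(seconds+10) * time.Second}
	resp, err := client.Get(profileURL(addr, seconds))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: fetchprofile <host:port>")
		return
	}
	if err := fetchProfile(os.Args[1], 30, "profile.prof"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Run it as `fetchprofile localhost:8080` while Auditbeat is busy. The resulting file can also be inspected locally with `go tool pprof -top profile.prof` before sharing it.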

@adriansr adriansr reopened this Jun 30, 2020
@BenB196
Author

BenB196 commented Jul 1, 2020

@adriansr Here are three servers with the issue. The attached zip file contains the three profiles:

Server A:

# uname -a
Linux assetmgmt01 4.12.14-lp150.12.82-default #1 SMP Tue Nov 12 16:32:38 UTC 2019 (c939e24) x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/os-release
NAME="openSUSE Leap"
VERSION="15.0"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.0"
PRETTY_NAME="openSUSE Leap 15.0"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.0"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"

Server B:

# uname -a
Linux dmiml01-stg 4.12.14-lp150.12.82-default #1 SMP Tue Nov 12 16:32:38 UTC 2019 (c939e24) x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/os-release
NAME="openSUSE Leap"
VERSION="15.0"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.0"
PRETTY_NAME="openSUSE Leap 15.0"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.0"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"

Server C:

# uname -a
Linux dnsdist 4.18.0-147.5.1.el8_1.x86_64 #1 SMP Wed Feb 5 02:00:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/os-release
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"

auditbeat_profiles.zip

@wixaw
Contributor

wixaw commented Jul 17, 2020

Hello,
7.8 doesn't fix this issue for me.
I'd also like to mention that commenting out "socket" reduced the CPU usage, but after a while it increased again,
whereas with 7.5 Auditbeat was completely transparent on my servers.
Now it is only on servers running Apache that it does not work properly.
I'm still using the original configuration file.

image

Thanks

@vinnytroia

same.

@andrewstucki
Contributor

andrewstucki commented Aug 4, 2020

@wixaw & @vinnytroia what versions of Auditbeat are you running? The fix for the bug I found shipped in 7.8.1, which was released on July 27th. I'm trying to determine whether this is another issue or whether you just need to upgrade to that patch version.

@vinnytroia

vinnytroia commented Aug 4, 2020 via email

@wixaw
Contributor

wixaw commented Aug 25, 2020

Hello
I had not seen the information in the 7.8.1 release
I installed 7.8.1 on my servers and have no more CPU issues
Thank you

@mileskelsey

I still see this problem in version 7.9.3

@HaZet1968

I also still have the problem (with version 7.9.1) on machines with a lot of network traffic (e.g. Squid, web servers).
