Investigate the best way to decide when to read system logs from files or journald #10797
Since all Debian 12 installations use systemd-journald, maybe a condition like |
A condition might be enough to start with; at the moment I'm not sure which information about the distros (like name and flavour) is available to use as conditions in the policy. Detecting this is also a general problem across all Linux hosts, and solving it generally means we don't have to manually update the condition whenever a new distro/version starts (or stops) using journald for system logs. The last part of the challenge (maybe not covered in this issue) is how to handle ingest pipelines and other assets that expect the event to be in a specific format (mostly the plain-text form from the traditional log files) that differs from what the journald input will create. For the ingest pipelines, it might just be a matter of updating them to also support the events from the journald input, as they're capable of quite complex logic. |
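To illustrate the format difference described above, here is a minimal Python sketch, not an ingest pipeline, that maps a plain-text syslog line and a journald entry onto the same small event shape. The journald field names (`MESSAGE`, `_HOSTNAME`, `_PID`, `SYSLOG_IDENTIFIER`) are standard journal fields; the function names, the regular expression, and the output keys are assumptions made for this example only.

```python
import re

# Classic syslog layout: "<Mon> <day> <HH:MM:SS> <host> <program>[<pid>]: <message>"
# This pattern is an illustrative assumption, not the integration's real grok pattern.
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<program>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<message>.*)$"
)


def from_syslog_line(line):
    """Parse a plain-text syslog line; returns None if it does not match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    return {
        "host": m["host"],
        "program": m["program"],
        "pid": int(m["pid"]) if m["pid"] else None,
        "message": m["message"],
    }


def from_journald_entry(entry):
    """Map an already-structured journald entry onto the same shape."""
    pid = entry.get("_PID")
    return {
        "host": entry.get("_HOSTNAME"),
        "program": entry.get("SYSLOG_IDENTIFIER"),
        "pid": int(pid) if pid is not None else None,
        "message": entry.get("MESSAGE"),
    }
```

The file-based path needs parsing to recover fields that journald already provides, which is why the two sources would likely need separate processing before they can share dashboards and queries.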
@belimawr as discussed yesterday, could you come up with some options that you identified to solve this issue and support operating systems that rely on journald instead of syslog? |
Problem statement
To support the system integration on Debian 12 and other distributions
Currently Filebeat's system module implements this by looking if
Observations
Questions
Do we need to support a scenario where the system integration can collect system logs from any supported OS without any configuration?
If we add a toggle in the system logs to enable journald
My answer:
Do we need to keep the Elastic-Agent running different processes for actual input Beats types?
Honestly, what is an input type here? Is it a Beat input type or just
If it is the latter, then we can use the system-logs input that will
Best option
Use the
|
After discussion in the data plane team meeting today, we concluded that the current approach of the new systemlogs input does not work well for agent. Instead, we should pursue an approach based on agent's input conditions. https://www.elastic.co/guide/en/fleet/current/dynamic-input-configuration.html. This will clearly show which input is active in Elastic Agent's state and health reports.
First, we will revert the change in 8.16 that causes the system module to use the systemlogs input by default. This is so that no user is forced to use this input by default while we evaluate if we need it at all.
Then, we will work to implement a conditions based approach. This will require the introduction of a new OS version or distribution (Debian 12, Windows 10, etc) field in Elastic Agent's host provider. This condition can then be used by us to specify which OS versions require use of journald by default. For example:

inputs:
  - type: log
    id: a
    paths:
      - /var/log/*.log
    condition: ${host.os_version} != 'debian12'
  - type: journald
    id: b
    condition: ${host.os_version} == 'debian12'

It must be possible for users to force use of journald or syslog regardless of the condition in the integration. We need to confirm that this can be done with the existing package templating support. The current OS conditional support for winlog can likely be used as a reference.
The system logs input is being removed from the Filebeat system module to give us the freedom to see if we can also use this approach with Beats. Beats also have support for conditions, but primarily for autodiscovery which may not do what we need. While we evaluate this we do not want users using the new input.
After we have support for conditions based on the OS version, we should evaluate a way to simplify this detection. For example, the inputs would be conditional on the presence of the syslog file paths in the file system, but agent currently has no way for us to do this. If it is simpler to implement a condition based on the paths in the log input, we should pursue that immediately to avoid having to maintain the OS version detection logic. |
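The path-presence idea mentioned above could look roughly like the following Python sketch. It is only an illustration of the decision logic, not something Elastic Agent does today; `choose_input` and `SYSLOG_GLOBS` are hypothetical names, and the glob patterns are assumed to be the syslog paths the system integration reads.

```python
import glob

# Assumed syslog locations (hypothetical default for this sketch).
SYSLOG_GLOBS = ("/var/log/messages*", "/var/log/syslog*", "/var/log/system*")


def choose_input(patterns=SYSLOG_GLOBS):
    """Return "logfile" if any traditional syslog file exists, else "journald"."""
    for pattern in patterns:
        if glob.glob(pattern):
            return "logfile"
    return "journald"
```

One caveat with this approach: a host upgraded from an older release may still have stale log files on disk even though new entries only go to journald, so file presence alone may not be a reliable signal.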
I've been doing some testing and managed to get the conditions working, some key points:
This allows users to install the integration accepting the defaults and have it working on any supported OS while still allowing them to fine tune when to use log or journald input to their specific needs.
Screenshot of the integration
The copy can be greatly improved, I'm focusing on the overall user experience here
Policy example
I removed some fields for simplicity, but that's the rendering from Fleet

inputs:
  - id: journald-system-e23caacf-2836-4068-981e-5e7cd7ffe3cc
    name: system-1
    revision: 1
    type: journald
    use_output: default
    meta:
      package:
        name: system
        version: 1.61.1
    data_stream:
      namespace: default
    package_policy_id: e23caacf-2836-4068-981e-5e7cd7ffe3cc
    streams:
      - id: journald-system.syslog-e23caacf-2836-4068-981e-5e7cd7ffe3cc
        type: journald
        data_stream:
          dataset: null
        condition: '${host.os_version} == "12 (bookworm)"'
  - id: logfile-system-e23caacf-2836-4068-981e-5e7cd7ffe3cc
    name: system-1
    revision: 1
    type: logfile
    use_output: default
    meta:
      package:
        name: system
        version: 1.61.1
    data_stream:
      namespace: default
    package_policy_id: e23caacf-2836-4068-981e-5e7cd7ffe3cc
    streams:
      - id: logfile-system.auth-e23caacf-2836-4068-981e-5e7cd7ffe3cc
        data_stream:
          dataset: system.auth
          type: logs
        condition: '${host.os_version} != "12 (bookworm)"'
        ignore_older: 72h
        paths:
          - /var/log/auth.log*
          - /var/log/secure*
        tags:
          - system-auth
      - id: logfile-system.syslog-e23caacf-2836-4068-981e-5e7cd7ffe3cc
        data_stream:
          dataset: system.syslog
          type: logs
        condition: '${host.os_version} != "12 (bookworm)"'
        paths:
          - /var/log/messages*
          - /var/log/syslog*
          - /var/log/system*

Elastic-Agent status output
An alternative to listing each distro explicitly in the conditions field is to provide a list in the host provider for the "use systemd distros"; then we can simplify the condition to something like
cc: @nimarezainia, @cmacknz |
Filtering by os/version is going to be interesting, here are some examples of what we get with go-sysinfo:
Amazon Linux 2:
Debian 12:
Debian 11:
Ubuntu 22.04:
|
The integration and condition approach is working as expected, with some refining to do on the conditions, also expected. You are going to need to filter on both the distro name and version, case in point being Amazon Linux, we do not want a condition just against the number "2" as that is far too ambiguous. |
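As a rough sketch of why the condition needs both the distro name and the version, the Python below keys the decision on a (name, version) pair rather than on the version alone. It reads `os-release`-style text, which is an assumption for this example (the thread uses go-sysinfo output instead); `parse_os_release`, `uses_journald`, and `JOURNALD_ONLY` are hypothetical names, and the two entries in the set are the journald-only releases named in this thread.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, stripping quotes."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


# Hypothetical allow-list keyed on (ID, VERSION_ID): Debian 12 and Amazon Linux 2023.
JOURNALD_ONLY = {("debian", "12"), ("amzn", "2023")}


def uses_journald(info):
    """True only when both the distro ID and its version match a known entry."""
    return (info.get("ID"), info.get("VERSION_ID")) in JOURNALD_ONLY
```

With this shape, Amazon Linux 2 (`ID="amzn"`, `VERSION_ID="2"`) does not match, and neither would any other distro that happens to report version "12" or "2023".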
Yes, version worked for Debian 12, but I had no hope it would hold true for other distros. Amazon Linux is the most interesting one, platform is more specific than family. |
Sharing more of the progress and challenges. I've opened a draft PR for anybody who wants to follow the code (#11618), and a draft PR with the Elastic-Agent changes: elastic/elastic-agent#5941. Regarding the challenges: to properly manage the ingest pipelines, I created new data_streams for the journald versions of the syslog and auth data streams, which also allows them to have their own tests. Journald provides much more structured information, while traditional log files rely on more parsing or even on fetching information from the host system, which makes part of the processing very different. This creates an interesting situation where Another challenge is the system tests (when |
We need to track fixing this as it will break all queries and visualizations. Make sure there is a bug tracking a fix for this. |
I'll try to fix it as part of my PR. |
Amazon Linux 2023 only uses journald, here is the host info:
|
I managed to fix the value for I also tested the dashboards, they all seem to work well, I added some screenshots in the integrations PR. |
Debian 12 has stopped writing system logs to traditional log files and now only uses journald by default (see release notes).
This makes the system integration unable to ingest some data because it expects to read directly from files.
We need to find the best way to detect whether files or journald are used to store the system logs and configure the correct input (log/filestream or journald).
There is a similar issue in the Beats repository to handle the same situation in Filebeat's system module: elastic/beats#40526.