Filebeat: ingest Elasticsearch structured audit logs #8852

ycombinator · 2018-10-31T15:48:37Z

Resolves #8831.

This PR teaches the elasticsearch/audit fileset to ingest structured audit logs in addition to the semi-structured audit logs, which it already knows how to ingest.

ruflin

I was initially thinking of a different implementation in which the JSON decoding happens on the Filebeat side. The advantage is that the user can then filter at ingest time based on these fields. The disadvantage is that we need to find out on the fileset side whether it's JSON or not. Instead of having all these "if" statements, would it be possible to have two different pipelines for cleaner code? I think there are multiple options to get to the same result, and I'm not sure yet which one is the best implementation.

This change will also need an addition to the docs and changelog.

ruflin · 2018-11-01T08:27:00Z

filebeat/module/elasticsearch/audit/ingest/pipeline.json

+        },
+        {
+            "grok": {
+                "if": "ctx.first_char != '{'",


I assume this will require Elasticsearch 6.5 or newer?

filebeat/module/elasticsearch/audit/test/test.log

filebeat/module/elasticsearch/audit/test/test.log-expected.json

ycombinator · 2018-11-01T08:46:49Z

> Instead of having all these "if" statements, would it be possible to have two different pipelines for cleaner code?

I would like that as well, but I'm not sure how to achieve it :) Ideally we could use https://www.elastic.co/guide/en/elasticsearch/reference/master/pipeline-processor.html but I'm not sure how to push multiple ingest pipelines into ES for the same fileset. AFAICT that's not currently supported in Filebeat but maybe we should add support for that?

ruflin · 2018-11-02T12:48:36Z

You are correct that at the moment it's not supported but we should add it as this will happen in other places too. We probably still need a "root" pipeline that we send all data to and which then routes the events. Or would you do the separation on the Ingest side already?

ycombinator · 2018-11-02T13:28:24Z

> You are correct that at the moment it's not supported but we should add it as this will happen in other places too.

Okay, great. I'm going to suspend this PR and create a new one just to introduce this multi-pipeline functionality. This PR will then depend on the new PR.

> We probably still need a "root" pipeline that we send all data to and which then routes the events. Or would you do the separation on the Ingest side already?

I was thinking Beats' job would be just to create the necessary pipelines. The separation would then happen in the root pipeline.
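The routing idea discussed here can be sketched as a root pipeline that inspects each event and hands it to a format-specific pipeline. The following is only a minimal illustration in Python that builds such a pipeline definition as a dictionary. The `first_char` field comes from the diff under review; the pipeline names `audit-plain` and `audit-json` and the helper `build_root_pipeline` are hypothetical and not part of this PR.

```python
import json


def build_root_pipeline(plain_name, json_name):
    """Build an entry-point ingest pipeline that routes on the first character.

    Assumes an earlier processor has stored the first character of the raw
    log line in ctx.first_char (as in the diff under review).
    """
    return {
        "description": "Route audit log events by format (illustrative)",
        "processors": [
            # Lines starting with '{' are treated as structured (JSON) logs.
            {"pipeline": {"if": "ctx.first_char == '{'", "name": json_name}},
            # Everything else goes to the semi-structured (grok-based) pipeline.
            {"pipeline": {"if": "ctx.first_char != '{'", "name": plain_name}},
        ],
    }


# Hypothetical pipeline names, chosen only for this example.
root = build_root_pipeline("audit-plain", "audit-json")
print(json.dumps(root, indent=2))
```

With this layout Beats only has to load the pipelines; the decision about which one handles an event is made inside Elasticsearch by the root pipeline.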

ycombinator · 2018-11-02T22:45:02Z

As noted in my previous comment, I've started work on teaching Filebeat to support multiple ingest pipelines here: #8914.

Motivated by #8852 (comment).

Starting with 6.5.0, Elasticsearch Ingest Pipelines have gained the ability to:

- run sub-pipelines via the [`pipeline` processor](https://www.elastic.co/guide/en/elasticsearch/reference/6.5/pipeline-processor.html), and
- conditionally run processors via an [`if` field](https://www.elastic.co/guide/en/elasticsearch/reference/6.5/ingest-processors.html).

These abilities combined present the opportunity for a fileset to ingest the same _logical_ information presented in different formats, e.g. plaintext vs. json versions of the same log files. Imagine an entry point ingest pipeline that detects the format of a log entry and then conditionally delegates further processing of that log entry, depending on the format, to another pipeline.

This PR allows filesets to specify one or more ingest pipelines via the `ingest_pipeline` property in their `manifest.yml`. If more than one ingest pipeline is specified, the first one is taken to be the entry point ingest pipeline.

#### Example with multiple pipelines

```yaml
ingest_pipeline:
  - pipeline-ze-boss.json
  - pipeline-plain.json
  - pipeline-json.json
```

#### Example with a single pipeline

_This is just to show that the existing functionality will continue to work as-is._

```yaml
ingest_pipeline: pipeline.json
```

Now, if the root pipeline wants to delegate processing to another pipeline, it must use a `pipeline` processor to do so. This processor's `name` field will need to reference the other pipeline by its name. To ensure correct referencing, the `name` field must be specified as follows:

```json
{
  "pipeline" : {
    "name": "{< IngestPipeline "pipeline-plain" >}"
  }
}
```

This will ensure that the specified name gets correctly converted to the corresponding name in Elasticsearch, since Filebeat prefixes its "raw" Ingest pipeline names with `filebeat-<version>-<module>-<fileset>-` when loading them into Elasticsearch.
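To make the name conversion concrete, here is a rough Python sketch of how a "raw" pipeline name could be expanded with the `filebeat-<version>-<module>-<fileset>-` prefix described in this PR description. It is not Filebeat's implementation (Filebeat is written in Go and uses its own templating); the function names, the regular expression, and the version `6.7.0` are assumptions made for illustration.

```python
import json
import re

# Matches the template call shown in the description, e.g.
# {< IngestPipeline "pipeline-plain" >}
TEMPLATE_RE = re.compile(r'\{<\s*IngestPipeline\s+"([^"]+)"\s*>\}')


def full_pipeline_name(version, module, fileset, raw_name):
    """Prefix a raw pipeline name as described in the PR text."""
    return f"filebeat-{version}-{module}-{fileset}-{raw_name}"


def resolve_references(text, version, module, fileset):
    """Replace every IngestPipeline template call with the prefixed name."""
    return TEMPLATE_RE.sub(
        lambda m: full_pipeline_name(version, module, fileset, m.group(1)),
        text,
    )


raw = '{ "pipeline": { "name": "{< IngestPipeline "pipeline-plain" >}" } }'
resolved = resolve_references(raw, "6.7.0", "elasticsearch", "audit")

# After substitution the text is valid JSON and can be parsed.
processor = json.loads(resolved)
print(processor["pipeline"]["name"])
```

The point of the indirection is that the fileset author never hard-codes the version-dependent name; only the short name from `manifest.yml` appears in the pipeline source.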

Cherry-pick of PR #8914 to 6.x branch (#9811).

ycombinator · 2019-01-02T20:40:01Z

Will rebase on master once ~~#9851~~ #9855 has been merged.

ycombinator · 2019-01-03T00:35:12Z

@ruflin After #8914 was merged recently, I resurrected this PR to use the functionality introduced in #8914. Would love your review when you get a chance. Thanks!

ruflin

Seeing a needs_backport label here I think we should discuss what our compatibility promise here is.

If someone collects logs from ES 6.3 with FB 6.7 and sends the data to ES 6.3, I assume the pipeline would stop working? In other words, for a user upgrading FB from 6.3 to 6.7, ingestion would stop.

ruflin · 2019-01-07T08:37:05Z

filebeat/module/elasticsearch/audit/ingest/pipeline-json.json

+            }
+        },
+        {
+            "dot_expander": {


Is this only needed to make the event look nicer?

No, actually (and unfortunately IMO), it is required for the next processor (rename) to work. If I remove this dot_expander processor entry, I will get an error like so from ES when it tries to execute the rename processor:

field [elasticsearch.audit.event.type] doesn't exist

Perhaps we should file an enhancement request around this with ES?

elastic/elasticsearch#37507
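The behaviour described in this thread can be modelled in a few lines of Python. This is a simplified model, not Elasticsearch code: it assumes that the decoded JSON leaves a key with a literal dot (`event.type`) under `elasticsearch.audit`, and that a dotted field path is resolved by walking nested objects. The helpers `get_path` and `dot_expand` are hypothetical names.

```python
def get_path(doc, path):
    """Walk a dotted path through nested dicts; raise KeyError if a step is missing."""
    node = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"field [{path}] doesn't exist")
        node = node[part]
    return node


def dot_expand(doc, path, field):
    """Expand a literal dotted key `field` found under `path` into nested dicts."""
    parent = get_path(doc, path)
    value = parent.pop(field)
    node = parent
    parts = field.split(".")
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


# Assumed shape of the event right after JSON decoding: one flat key with a dot.
doc = {"elasticsearch": {"audit": {"event.type": "access_granted"}}}

try:
    get_path(doc, "elasticsearch.audit.event.type")
    found_before = True
except KeyError:
    # The path lookup expects nested objects, so the flat key is not found.
    found_before = False

dot_expand(doc, "elasticsearch.audit", "event.type")
value_after = get_path(doc, "elasticsearch.audit.event.type")
print(found_before, value_after)
```

In this model the lookup fails before expansion and succeeds afterwards, which mirrors why the `rename` processor needs the preceding `dot_expander` entry.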

ruflin · 2019-01-07T08:38:53Z

filebeat/module/elasticsearch/audit/test/test-audit.log-expected.json

+        "elasticsearch.audit.action": "cluster:admin/xpack/security/realm/cache/clear",
+        "elasticsearch.audit.event_type": "access_granted",
+        "elasticsearch.audit.layer": "transport",
+        "elasticsearch.audit.origin_address": "127.0.0.1",


Looks like quite a few fields here we should map to ECS (follow up PR).

Agreed. I've never done an ECS conversion before. Would you mind pointing me to a PR that did a similar conversion and I could use as a reference? Thanks!

Here you have quite a list of PRs: #8655

ycombinator · 2019-01-08T12:53:13Z

> Seeing a needs_backport label here I think we should discuss what our compatibility promise here is.
>
> If someone collects logs from ES 6.3 with FB 6.7 and sends the data to ES 6.3, I assume the pipeline would stop working? In other words, for a user upgrading FB from 6.3 to 6.7, ingestion would stop.

Yes, this is true (and not ideal, of course).

The whole reason behind wanting to get this change into 6.x was so that the ES team could deprecate the plaintext audit log in 6.7 and then remove it in 7.0.

If you recall, this PR is built on top of #8914, which introduces the ability for Filebeat modules to have multiple ingest pipelines with an entrypoint pipeline. In that PR you had brought up the version compatibility issue as well: #8914 (comment). @urso brought this up with me off-PR as well, so we decided that I would make a follow up PR to #8914 and add a version check. If the user is running Filebeat against an ES < 6.5.0 and using a Filebeat module with multiple pipelines, we will throw an error and stop.
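The version check described above could look roughly like the following Python sketch. It is only an illustration of the rule "multiple pipelines require ES >= 6.5.0"; the function names, the error type, and the message are assumptions, and the parser deliberately ignores pre-release suffixes.

```python
MIN_MULTI_PIPELINE_VERSION = (6, 5, 0)


def parse_version(version):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release handling)."""
    return tuple(int(part) for part in version.split("."))


def check_pipeline_support(es_version, pipeline_count):
    """Raise if a fileset with several pipelines targets an ES that is too old."""
    if pipeline_count > 1 and parse_version(es_version) < MIN_MULTI_PIPELINE_VERSION:
        raise RuntimeError(
            f"Elasticsearch {es_version} does not support multiple ingest "
            "pipelines per fileset; 6.5.0 or newer is required"
        )


# A single-pipeline fileset keeps working against an older cluster.
check_pipeline_support("6.3.0", 1)

# A multi-pipeline fileset works from 6.5.0 onwards.
check_pipeline_support("6.5.0", 3)

# A multi-pipeline fileset against 6.3.0 is rejected.
try:
    check_pipeline_support("6.3.0", 3)
    rejected = False
except RuntimeError:
    rejected = True
print(rejected)
```

Filesets with a single pipeline are unaffected by the check, so only modules that opt into multiple pipelines would see the error.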

Now, obviously this means that this is a breaking change in a minor version. However, the only module to use this feature would be the Elasticsearch module and it is currently marked as beta. Given the benefit I mentioned earlier about letting the ES team deprecate the plaintext audit log in 6.7.0 and remove it in 7.0.0, I would like to suggest that we allow this breaking change with the Elasticsearch Filebeat module, provided the follow up PR with the version check is done first.

Thoughts?

ycombinator · 2019-01-25T22:24:36Z

jenkins, test this

ycombinator · 2019-01-25T22:53:23Z

jenkins, test this

This is a "forward port" of #8852. In #8852, we taught Filebeat to ingest either structured or unstructured ES audit logs but the resulting fields conformed to the 6.x mapping structure. In this PR we also teach Filebeat to ingest either structured or unstructured ES audit logs but the resulting fields conform to the 7.0 (ECS-based) mapping structure.

ycombinator added Filebeat, v7.0.0-alpha1, v6.6.0, needs_backport, in progress, review and removed in progress labels Oct 31, 2018

ycombinator requested a review from ruflin November 1, 2018 02:05

ruflin reviewed Nov 1, 2018

ycombinator force-pushed the filebeat-elasticsearch-structured-audit-log branch from 343d223 to 72ad1c1 Compare November 1, 2018 09:00

ycombinator mentioned this pull request Nov 2, 2018

Accept multiple ingest pipelines in Filebeat #8914

Merged

ycombinator added in progress and removed review labels Nov 2, 2018

ycombinator added v6.7.0 and removed v6.6.0 labels Dec 21, 2018

ycombinator mentioned this pull request Dec 27, 2018

Cherry-pick #8914 to 6.x: Accept multiple ingest pipelines in Filebeat #9811

Merged

ycombinator force-pushed the filebeat-elasticsearch-structured-audit-log branch from 0ccd910 to 45d21d7 Compare January 2, 2019 18:49

ycombinator requested a review from a team as a code owner January 2, 2019 18:49

ycombinator force-pushed the filebeat-elasticsearch-structured-audit-log branch from cc6784e to 90ec899 Compare January 2, 2019 20:18

ycombinator force-pushed the filebeat-elasticsearch-structured-audit-log branch from 90ec899 to dd93205 Compare January 2, 2019 23:59

ruflin reviewed Jan 7, 2019

ycombinator force-pushed the filebeat-elasticsearch-structured-audit-log branch from 116639c to d21a462 Compare January 25, 2019 22:05

ycombinator added 21 commits January 26, 2019 05:52

Amending pipeline to handle structured logs

f06e2ff

Adding globs for structured audit log files

36f9ba0

Fixing up pipeline

b4ba744

Fixing up log fixture

c99efdd

Only build URI field if it's not going to be empty

dad4795

Fixing up log fixture

73e7219

Updating fields.ymls

5888ace

Splitting test logs and expected files

ec290bd

Fixing up test fixtures

30d23f9

Reverting content in original expected test file

87452c1

Fixing offsets after splitting files

79e5d88

Updating generated files

bbc5d39

Fixing up pipeline

700291c

Regenerating golden files

b3d899e

Using multiple pipelines

527d057

Regenerating golden files

cdf7046

Adding CHANGELOG entries

c381143

Regenerating generated files

2b4c477

Updating golden files for 6.x

04657ed

Removing file accidentally ported over from master

79765ad

Rebasing...

2667c21

ycombinator force-pushed the filebeat-elasticsearch-structured-audit-log branch from d21a462 to 2667c21 Compare January 26, 2019 13:52

ycombinator merged commit 3334adf into elastic:6.x Jan 26, 2019

ycombinator deleted the filebeat-elasticsearch-structured-audit-log branch January 26, 2019 14:50

This was referenced Jan 27, 2019

Fixing regression in macOS path #10351

Merged

Ingest ES structured audit logs #10352

Merged
