fluent: docs: update grep for style #1458

Merged
100 changes: 66 additions & 34 deletions pipeline/filters/grep.md
@@ -1,28 +1,33 @@
---
description: Select or exclude records per patterns
description: Select or exclude records using patterns
---

# Grep

The _Grep Filter_ plugin allows you to match or exclude specific records based on regular expression patterns for values or nested values.
The _Grep Filter_ plugin lets you match or exclude specific records based on
regular expression patterns for values or nested values.

## Configuration Parameters
## Configuration parameters

The plugin supports the following configuration parameters:

| Key | Value Format | Description |
| :--- | :--- | :--- |
| Regex | KEY REGEX | Keep records in which the content of KEY matches the regular expression. |
| Exclude | KEY REGEX | Exclude records in which the content of KEY matches the regular expression. |
| Logical_Op| Operation | Specify which logical operator to use. `AND` , `OR` and `legacy` are allowed as an Operation. Default is `legacy` for backward compatibility. In `legacy` mode the behaviour is either AND or OR depending whether the `grep` is including (uses AND) or excluding (uses OR). Only available from 2.1+. |
| Key | Value Format | Description |
| ------------ | ------------ | ----------- |
| `Regex` | KEY REGEX | Keep records where the content of KEY matches the regular expression. |
| `Exclude` | KEY REGEX | Exclude records where the content of KEY matches the regular expression. |
| `Logical_Op` | Operation | Specify a logical operator: `AND`, `OR`, or `legacy` (default). In `legacy` mode, the behavior is `AND` when the filter includes records (`Regex`) and `OR` when it excludes records (`Exclude`). Available in version 2.1 and later. |

#### Record Accessor Enabled
### Record Accessor Enabled

This plugin enables the [Record Accessor](../../administration/configuring-fluent-bit/classic-mode/record-accessor.md) feature to specify the KEY. Using the _record accessor_ is suggested if you want to match values against nested values.
This plugin supports the [Record Accessor](../../administration/configuring-fluent-bit/classic-mode/record-accessor.md)
feature for specifying the KEY. Use the record accessor to match values against
nested values.

## Getting Started
## Filter records

In order to start filtering records, you can run the filter from the command line or through the configuration file. The following example assumes that you have a file called `lines.txt` with the following content:
To start filtering records, run the filter from the command line or through the
configuration file. The following example assumes that you have a file named
`lines.txt` with the following content:

```text
{"log": "aaa"}
@@ -35,20 +40,25 @@ In order to start filtering records, you can run the filter from the command lin
{"log": "ggg"}
```

### Command Line
### Command line

> Note: using the command line mode need special attention to quote the regular expressions properly. It's suggested to use a configuration file.
When using the command line, take care to quote the regular expressions properly.
Using a configuration file might be easier.

The following command will load the _tail_ plugin and read the content of `lines.txt` file. Then the _grep_ filter will apply a regular expression rule over the _log_ field \(created by tail plugin\) and only _pass_ the records which field value starts with _aa_:
The following command loads the [tail](../../pipeline/inputs/tail) plugin and
reads the content of the `lines.txt` file. The `grep` filter then applies a regular
expression rule to the `log` field created by the `tail` plugin and only passes
records whose field value matches the pattern `aa`:

```text
$ bin/fluent-bit -i tail -p 'path=lines.txt' -F grep -p 'regex=log aa' -m '*' -o stdout
```
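
As a rough illustration of what the `regex=log aa` rule does, here is a minimal Python sketch. It isn't Fluent Bit code: the `grep_regex` helper and the sample records are hypothetical, Python's `re` module stands in for the regex engine that Fluent Bit uses, and it assumes the pattern can match anywhere in the value unless it's anchored.

```python
import json
import re


def grep_regex(lines, key, pattern):
    """Keep JSON records whose `key` value matches `pattern` (hypothetical helper)."""
    rule = re.compile(pattern)
    kept = []
    for line in lines:
        record = json.loads(line)
        value = record.get(key)
        # A missing key or a non-matching value drops the record.
        if isinstance(value, str) and rule.search(value):
            kept.append(record)
    return kept


# Hypothetical sample input, not the full contents of lines.txt.
sample = ['{"log": "aaa"}', '{"log": "baa"}', '{"log": "ggg"}']

print(grep_regex(sample, "log", "aa"))   # unanchored pattern
print(grep_regex(sample, "log", "^aa"))  # anchored to the start of the value
```

Under that assumption, anchoring the pattern with `^` is the explicit way to require that a value starts with `aa`.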

### Configuration File
### Configuration file

{% tabs %}
{% tab title="fluent-bit.conf" %}

```python
[SERVICE]
parsers_file /path/to/parsers.conf
@@ -67,9 +77,11 @@ $ bin/fluent-bit -i tail -p 'path=lines.txt' -F grep -p 'regex=log aa' -m '*' -o
name stdout
match *
```

{% endtab %}

{% tab title="fluent-bit.yaml" %}

```yaml
service:
parsers_file: /path/to/parsers.conf
@@ -87,14 +99,21 @@ pipeline:
match: '*'

```

{% endtab %}
{% endtabs %}

The filter allows to use multiple rules which are applied in order, you can have many _Regex_ and _Exclude_ entries as required.
The filter supports multiple rules, which are applied in order. You can have as
many `Regex` and `Exclude` entries as required.

### Nested fields example

If you want to match or exclude records based on nested values, you can use a [Record Accessor ](../../administration/configuring-fluent-bit/classic-mode/record-accessor.md)format as the KEY name. Consider the following record example:
To match or exclude records based on nested values, use the
[Record Accessor](../../administration/configuring-fluent-bit/classic-mode/record-accessor.md)
format as the `KEY` name.

Consider the following record example:

```javascript
{
@@ -113,40 +132,45 @@ If you want to match or exclude records based on nested values, you can use a [R
}
```

if you want to exclude records that match given nested field \(for example `kubernetes.labels.app`\), you can use the following rule:
For example, to exclude records that match the nested field `kubernetes.labels.app`,
use the following rule:

{% tabs %}
{% tab title="fluent-bit.conf" %}

```python
[FILTER]
Name grep
Match *
Exclude $kubernetes['labels']['app'] myapp
```
{% endtab %}

{% endtab %}
{% tab title="fluent-bit.yaml" %}

```yaml
filters:
- name: grep
match: '*'
exclude: $kubernetes['labels']['app'] myapp
```

{% endtab %}
{% endtabs %}
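
To make the nested lookup concrete, the following Python sketch models how an accessor such as `$kubernetes['labels']['app']` could be resolved and used by an `Exclude` rule. It's a simplified model, not the plugin's implementation: `resolve` and `is_excluded` are hypothetical helpers, the accessor parsing only handles the `$key['sub']['sub']` form shown above, and the sample records are invented.

```python
import re

SUBKEY = re.compile(r"\['([^']+)'\]")


def resolve(record, accessor):
    """Resolve a simplified accessor like $kubernetes['labels']['app']."""
    head = re.match(r"\$(\w+)", accessor)
    if head is None:
        raise ValueError(f"unsupported accessor: {accessor}")
    keys = [head.group(1)] + SUBKEY.findall(accessor)
    value = record
    for key in keys:
        # Stop as soon as any level of the path is missing.
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def is_excluded(record, accessor, pattern):
    """Return True when the nested value exists and matches the pattern."""
    value = resolve(record, accessor)
    return isinstance(value, str) and re.search(pattern, value) is not None


# Hypothetical records for illustration only.
myapp = {"log": "x", "kubernetes": {"labels": {"app": "myapp"}}}
other = {"log": "y", "kubernetes": {"labels": {"app": "other"}}}

for rec in (myapp, other):
    print(rec["log"], is_excluded(rec, "$kubernetes['labels']['app']", "myapp"))
```

In this model, only the record whose `app` label matches `myapp` is dropped.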

### Excluding records missing/invalid fields

It may be that in your processing pipeline you want to drop records that are missing certain keys.
### Excluding records with missing or invalid fields

A simple way to do this is just to `exclude` with a regex that matches anything, a missing key will fail this check.
You might want to drop records that are missing certain keys.

Here is an example that checks for a specific valid value for the key as well:
One way to do this is to add a `Regex` rule for the key with a pattern that matches
anything. A record that's missing the key fails this check and is dropped.

The following example checks for a specific valid value for the key:

{% tabs %}
{% tab title="fluent-bit.conf" %}
```

```text
# Use Grep to verify the contents of the iot_timestamp value.
# If the iot_timestamp key does not exist, this will fail
# and exclude the row.
@@ -156,30 +180,34 @@ Here is an example that checks for a specific valid value for the key as well:
Match iots_thread.*
Regex iot_timestamp ^\d{4}-\d{2}-\d{2}
```
{% endtab %}

{% endtab %}
{% tab title="fluent-bit.yaml" %}

```yaml
filters:
- name: grep
alias: filter-iots-grep
match: iots_thread.*
regex: iot_timestamp ^\d{4}-\d{2}-\d{2}
```

{% endtab %}
{% endtabs %}

The specified key `iot_timestamp` must match the expected expression - if it does not or is missing/empty then it will be excluded.
The specified key `iot_timestamp` must match the expected expression. If it doesn't,
or is missing or empty, then it will be excluded.
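
The effect of that rule can be sketched in Python. This is only a model of the behavior described above: `passes` is a hypothetical helper, the sample records are invented, and Python's `re` module stands in for the regex engine that Fluent Bit uses.

```python
import re

# Same pattern as the Regex rule above.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}")


def passes(record):
    """Return True when iot_timestamp exists and starts with YYYY-MM-DD."""
    value = record.get("iot_timestamp")
    return isinstance(value, str) and TIMESTAMP.search(value) is not None


# Hypothetical records for illustration only.
records = [
    {"iot_timestamp": "2023-01-22T09:46:49Z"},  # valid value
    {"iot_timestamp": "not-a-date"},            # invalid value
    {"iot_timestamp": ""},                      # empty value
    {"device": "sensor-1"},                     # key is missing
]

kept = [r for r in records if passes(r)]
print(kept)
```

Only the first record survives in this sketch; the invalid, empty, and missing cases are all dropped by the same rule.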

### Multiple conditions

If you want to set multiple `Regex` or `Exclude`, you can use `Logical_Op` property to use logical conjuction or disjunction.

Note: If `Logical_Op` is set, setting both 'Regex' and `Exclude` results in an error.
To combine multiple `Regex` or `Exclude` rules, set the `Logical_Op` property
to apply a logical conjunction (`AND`) or disjunction (`OR`).

If `Logical_Op` is set, setting both `Regex` and `Exclude` results in an error.
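
The following Python sketch models how `AND` and `OR` could combine several rules of a single type. It rests on assumptions rather than the plugin's source: the `keep` helper, the rule list, and the sample record are hypothetical, and the sketch assumes that a combined `Regex` result of true keeps the record while a combined `Exclude` result of true drops it.

```python
import re


def keep(record, rule_type, rules, logical_op):
    """Decide whether to keep a record.

    rule_type: "regex" or "exclude" (one type only, as Logical_Op requires).
    rules: list of (key, pattern) pairs.
    logical_op: "AND" or "OR".
    """
    hits = [
        isinstance(record.get(key), str)
        and re.search(pattern, record[key]) is not None
        for key, pattern in rules
    ]
    combined = all(hits) if logical_op == "AND" else any(hits)
    # Regex keeps on a combined match; Exclude drops on a combined match.
    return combined if rule_type == "regex" else not combined


# Hypothetical record and rules for illustration only.
record = {"endpoint": "localhost", "value": "something"}
rules = [("value", "something"), ("value", "error")]

print(keep(record, "regex", rules, "OR"))     # one rule matches
print(keep(record, "regex", rules, "AND"))    # not every rule matches
print(keep(record, "exclude", rules, "OR"))   # one rule matches, so drop
print(keep(record, "exclude", rules, "AND"))  # not every rule matches, so keep
```

Restricting each call to one rule type mirrors the constraint that `Regex` and `Exclude` can't be mixed when `Logical_Op` is set.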

{% tabs %}
{% tab title="fluent-bit.conf" %}

```python
[INPUT]
Name dummy
@@ -196,9 +224,11 @@ Note: If `Logical_Op` is set, setting both 'Regex' and `Exclude` results in an e
[OUTPUT]
Name stdout
```

{% endtab %}

{% tab title="fluent-bit.yaml" %}

```yaml
pipeline:
inputs:
@@ -215,11 +245,13 @@ pipeline:
outputs:
- name: stdout
```

{% endtab %}
{% endtabs %}

Output will be
```
The output looks similar to:

```text
Fluent Bit v2.0.9
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
@@ -236,4 +268,4 @@ Fluent Bit v2.0.9
[2023/01/22 09:46:49] [ info] [output:stdout:stdout.0] worker #0 started
[0] dummy: [1674348410.558341857, {"endpoint"=>"localhost", "value"=>"something"}]
[0] dummy: [1674348411.546425499, {"endpoint"=>"localhost", "value"=>"something"}]
```
```