
Fields using wildcard type should not specify ignore_above param #30096

Closed
ebeahan opened this issue Jan 28, 2022 · 4 comments · Fixed by #30668
Labels
ecs Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team v8.0.0

Comments

@ebeahan
Member

ebeahan commented Jan 28, 2022

Version: 8.0.0-rc1, 7.16.2
Operating System: Seen on Windows and MacOS

Description

Index templates created by Beats are still specifying an ignore_above setting for wildcard fields.

Example from the winlogbeat-8.0.0-rc1 template on the process.command_line field:

            "command_line": {
              "ignore_above": 1024,
              "type": "wildcard",
              "fields": {
                "text": {
                  "type": "match_only_text"
                }
              }
            }

Wildcard field values longer than 1024 characters are not being indexed, as Kibana confirms:

[Screenshot from Kibana, 2022-01-28]

Keyword fields include an ignore_above param in their ECS field definitions, but wildcard fields do not:

https://github.com/elastic/beats/blob/8.0/libbeat/_meta/fields.ecs.yml#L4443-L4454

Steps to Reproduce:

  • Install Beats 8.0.0-rc1 with default settings and run setup to install default index templates.
  • Index events whose wildcard field values are longer than 1024 characters.
@elasticmachine
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@andrewkroh andrewkroh added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Jan 31, 2022
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@adriansr
Contributor

adriansr commented Mar 2, 2022

@ebeahan

This is caused by the following code in libbeat:

switch f.IgnoreAbove {
case 0: // Use libbeat default
	property["ignore_above"] = defaultIgnoreAbove
case -1: // Use ES default
default: // Use user value
	property["ignore_above"] = f.IgnoreAbove
}

This treats a missing (zero) value as a request to use libbeat's default (1024). This behavior was copied from keyword fields.
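To make the effect concrete, here is a standalone reproduction of the switch quoted above. The `field` type and the `applyIgnoreAbove` function are hypothetical stand-ins; only the `IgnoreAbove` member, the three cases, and the 1024 default come from this thread. It is a sketch, not libbeat's actual processor code.

```go
package main

import "fmt"

// defaultIgnoreAbove mirrors the libbeat default discussed in this thread.
const defaultIgnoreAbove = 1024

// field is a hypothetical, simplified stand-in for libbeat's field
// definition; only the IgnoreAbove member from the snippet is modelled.
type field struct {
	IgnoreAbove int
}

// applyIgnoreAbove reproduces the quoted switch: 0 (unset) injects the
// libbeat default, -1 leaves the key out so the Elasticsearch default
// applies, and any other value is copied through unchanged.
func applyIgnoreAbove(f field, property map[string]interface{}) {
	switch f.IgnoreAbove {
	case 0: // Use libbeat default
		property["ignore_above"] = defaultIgnoreAbove
	case -1: // Use ES default
	default: // Use user value
		property["ignore_above"] = f.IgnoreAbove
	}
}

func main() {
	for _, v := range []int{0, -1, 8191} {
		property := map[string]interface{}{"type": "wildcard"}
		applyIgnoreAbove(field{IgnoreAbove: v}, property)
		fmt.Println(v, property)
	}
}
```

A wildcard field whose ECS definition omits `ignore_above` arrives with `IgnoreAbove == 0` and therefore picks up 1024, which matches the template excerpt in the issue description.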

Also, I see that event.original (keyword) suffers from the same problem: ECS doesn't specify an ignore_above for it, which causes libbeat to inject the default value of 1024. I understand we may want to fix this one too.

I wonder what the best approach would be here. Removing this behavior for wildcards only impacts one field outside ECS: x-pack/filebeat/module/threatintel/anomalithreatstream/_meta/fields.yml defines one wildcard field for a custom dataset field.

For keywords, though, it will be much trickier, as they are used everywhere and we have relied on the default being injected for a long time.

This could also be fixed in ECS' Beats generator by forcing an ignore_above: -1 when the setting is missing, to ensure libbeat doesn't overwrite it.
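A minimal sketch of that generator-side fix, assuming the generator represents a missing ECS setting as a nil pointer. `ecsField` and `beatsIgnoreAbove` are hypothetical names rather than the real ECS generator code, and the `process.name` limit in `main` is purely illustrative.

```go
package main

import "fmt"

// ecsField is a hypothetical, simplified view of an ECS field definition as
// a Beats generator might see it; IgnoreAbove is nil when ECS omits it.
type ecsField struct {
	Name        string
	Type        string
	IgnoreAbove *int
}

// beatsIgnoreAbove returns the value to write into the generated Beats
// fields.yml. A missing setting becomes -1, which libbeat's template
// processor treats as "use the Elasticsearch default" instead of
// injecting its own 1024 default.
func beatsIgnoreAbove(f ecsField) int {
	if f.IgnoreAbove == nil {
		return -1
	}
	return *f.IgnoreAbove
}

func main() {
	limit := 1024 // illustrative explicit limit for a keyword field
	fields := []ecsField{
		{Name: "process.command_line", Type: "wildcard"},
		{Name: "process.name", Type: "keyword", IgnoreAbove: &limit},
	}
	for _, f := range fields {
		fmt.Printf("%s (%s): ignore_above=%d\n", f.Name, f.Type, beatsIgnoreAbove(f))
	}
}
```

Because the check keys on the setting being absent rather than on the field type, this approach would also cover keyword fields such as event.original that ECS leaves without an `ignore_above`.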

/cc @kvch

@ebeahan
Member Author

ebeahan commented Mar 2, 2022

The event.original field is already not indexed (index: false and doc_values: false), so specifying an ignore_above value for it isn't necessary.

A big benefit of wildcard over keyword is that it can index strings larger than Lucene's 32,766-byte term limit, but users can't gain that benefit if ignore_above is set to skip string values longer than 1024 characters.

I only suggest we remove ignore_above from type: wildcard. There have been conversations about increasing the ignore_above standard value to higher than 1024, but let's leave that topic out of this discussion.

adriansr added a commit to adriansr/beats that referenced this issue Mar 3, 2022
Modifies libbeat's template processor to stop hardcoding a default
`ignore_above` limit of 1024 on wildcard fields. This behavior was
inherited from keyword fields.

Closes elastic#30096
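The behavior described in this commit message could look roughly like the following. This is a hedged reconstruction, not the merged change: `ignoreAboveFor` is a hypothetical helper, and the real processor may structure the check differently.

```go
package main

import "fmt"

// defaultIgnoreAbove mirrors the libbeat default discussed in this thread.
const defaultIgnoreAbove = 1024

// ignoreAboveFor returns the ignore_above value to emit and whether to emit
// it at all. Keyword fields keep the hardcoded default when the setting is
// unset (0); wildcard fields no longer receive that default.
func ignoreAboveFor(fieldType string, ignoreAbove int) (int, bool) {
	switch ignoreAbove {
	case 0:
		if fieldType == "wildcard" {
			return 0, false // no hardcoded default for wildcard fields
		}
		return defaultIgnoreAbove, true // keyword keeps the libbeat default
	case -1:
		return 0, false // leave the key out; Elasticsearch default applies
	default:
		return ignoreAbove, true // explicit user value
	}
}

func main() {
	for _, typ := range []string{"keyword", "wildcard"} {
		property := map[string]interface{}{"type": typ}
		if v, ok := ignoreAboveFor(typ, 0); ok {
			property["ignore_above"] = v
		}
		fmt.Println(property)
	}
}
```

Keeping the keyword default untouched matches the concern raised earlier in the thread that Beats has long relied on 1024 being injected for keyword fields.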
adriansr added a commit that referenced this issue Mar 7, 2022
mergify bot pushed a commit that referenced this issue Mar 7, 2022
(cherry picked from commit 677229f)

# Conflicts:
#	libbeat/template/processor_test.go
mergify bot pushed a commit that referenced this issue Mar 7, 2022
(cherry picked from commit 677229f)
adriansr added a commit that referenced this issue Mar 7, 2022
#30708)

(cherry picked from commit 677229f)

Co-authored-by: Adrian Serrano
adriansr added a commit that referenced this issue Mar 7, 2022
(cherry picked from commit 677229f)

4 participants