Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate fields skipped in type assertion #147

Open
mtojek opened this issue Oct 26, 2020 · 9 comments
Open

Validate fields skipped in type assertion #147

mtojek opened this issue Oct 26, 2020 · 9 comments
Labels
enhancement New feature or request

Comments

@mtojek
Copy link
Contributor

mtojek commented Oct 26, 2020

Follow-up on:
#143 (comment)
#143 (comment)

Currently we skip validation for generic fields present in every (most?) document collected by Elastic Agents but undefined in fields.yml files (not part of integrations). Sample fields: agent.*, elastic_agent.*.

The goal is to introduce validation for the special fields as well.

@mtojek
Copy link
Contributor Author

mtojek commented Oct 27, 2020

See: https://beats-ci.elastic.co/blue/organizations/jenkins/Beats%2Felastic-package/detail/PR-143/35/tests/

one or more errors found in documents stored in metrics-apache.status-ep data stream: [0] field "cloud.availability_zone" is not defined
[1] field "cloud.instance.id" is not defined
[2] field "cloud.instance.name" is not defined
[3] field "cloud.machine.type" is not defined
[4] field "cloud.project.id" is not defined
[5] field "cloud.provider" is not defined
[6] field "event.duration" is not defined
[7] field "event.module" is not defined
[8] field "host.architecture" is not defined
[9] field "host.containerized" is not defined
[10] field "host.hostname" is not defined
[11] field "host.ip" is not defined
[12] field "host.mac" is not defined
[13] field "host.name" is not defined
[14] field "host.os.codename" is not defined
[15] field "host.os.family" is not defined
[16] field "host.os.kernel" is not defined
[17] field "host.os.name" is not defined
[18] field "host.os.platform" is not defined
[19] field "host.os.version" is not defined
[20] field "service.address" is not defined
one or more errors found in documents stored in logs-apache.access-ep data stream: [0] field "cloud.instance.id" is not defined
[1] field "cloud.instance.name" is not defined
[2] field "cloud.provider" is not defined
[3] field "event.dataset" is not defined
[4] field "event.outcome" is not defined
[5] field "host.architecture" is not defined
[6] field "host.containerized" is not defined
[7] field "host.hostname" is not defined
[8] field "host.ip" is not defined
[9] field "host.mac" is not defined
[10] field "host.name" is not defined
[11] field "host.os.codename" is not defined
[12] field "host.os.family" is not defined
[13] field "host.os.kernel" is not defined
[14] field "host.os.name" is not defined
[15] field "host.os.platform" is not defined
[16] field "host.os.version" is not defined
[17] field "input.type" is not defined
one or more errors found in documents stored in logs-apache.error-ep data stream: [0] field "cloud.availability_zone" is not defined
[1] field "cloud.instance.id" is not defined
[2] field "cloud.instance.name" is not defined
[3] field "cloud.machine.type" is not defined
[4] field "cloud.project.id" is not defined
[5] field "cloud.provider" is not defined
[6] field "event.category" is not defined
[7] field "event.kind" is not defined
[8] field "event.timezone" is not defined
[9] field "event.type" is not defined
[10] field "host.architecture" is not defined
[11] field "host.containerized" is not defined
[12] field "host.hostname" is not defined
[13] field "host.ip" is not defined
[14] field "host.mac" is not defined
[15] field "host.name" is not defined
[16] field "host.os.codename" is not defined
[17] field "host.os.family" is not defined
[18] field "host.os.kernel" is not defined
[19] field "host.os.name" is not defined
[20] field "host.os.platform" is not defined
[21] field "host.os.version" is not defined
[22] field "input.type" is not defined

@mtojek
Copy link
Contributor Author

mtojek commented May 28, 2021

Looking at this issue, it seems to be a good candidate to implement while we will introduce support for processors' fields in elastic/package-spec#63 .

@mtojek
Copy link
Contributor Author

mtojek commented Jul 19, 2021

Related: elastic/package-spec#199

@mtojek
Copy link
Contributor Author

mtojek commented Nov 19, 2021

This issue is relatively old now, so I assume it isn't really important. I will close it for now.

@andrewkroh
Copy link
Member

I think we need to revisit the check for event.*. There have been many reported issues due to a lack of mappings for various event fields. I did not realize this wasn't being validated.

isFieldFamilyMatching("event", key) || // too many common fields

andrewkroh added a commit to andrewkroh/integrations that referenced this issue Mar 9, 2022
Several event field mappings were missing (tests do not validate event.* as per elastic/elastic-package#147).
After adding those mappings some of the data types didn't match so I added a few convert processors.

I modified the pipeline tests to use simulated data from the Beats decode_cef processor.

Fixes: elastic#2805
andrewkroh added a commit to elastic/integrations that referenced this issue Mar 10, 2022
Several event field mappings were missing (tests do not validate event.* as per elastic/elastic-package#147).
After adding those mappings some of the data types didn't match so I added a few convert processors.

I modified the pipeline tests to use simulated data from the Beats decode_cef processor.

Fixes: #2805
@mtojek
Copy link
Contributor Author

mtojek commented Mar 10, 2022

I think we need to revisit the check for event.*. There have been many reported issues due to a lack of mappings for various event fields. I did not realize this wasn't being validated.

I remember that we disabled them because it means that we have to add more exactly the same fields to every data stream. We tried to figure out an option to introduce common fields and the topic was deprioritized. I agree that's a high time to refresh the discussion :)

cc @jsoriano @jlind23

andrewkroh added a commit to andrewkroh/elastic-package that referenced this issue Mar 22, 2022
This enables field validation for `event.*`. There are exemptions for fields contained in the
Fleet managed .fleet_component_template that is added to all data streams. This template
contains event.ingested and event.agent_id_status.

Relates elastic#147
eyalkraft pushed a commit to build-security/integrations that referenced this issue Mar 30, 2022
Several event field mappings were missing (tests do not validate event.* as per elastic/elastic-package#147).
After adding those mappings some of the data types didn't match so I added a few convert processors.

I modified the pipeline tests to use simulated data from the Beats decode_cef processor.

Fixes: elastic#2805
@jsoriano
Copy link
Member

Reopening as the skip is still in the code, and there seems to be related issues that could be detected by the skipped validation.

@jsoriano jsoriano reopened this Apr 13, 2022
andrewkroh added a commit to andrewkroh/elastic-package that referenced this issue Apr 14, 2022
This enables field validation for `event.*`. There are exemptions for fields contained in the
Fleet managed .fleet_component_template that is added to all data streams. This template
contains event.ingested and event.agent_id_status.

Relates elastic#147
andrewkroh added a commit to andrewkroh/elastic-package that referenced this issue Apr 14, 2022
This enables field validation for `event.*`. There are exemptions for fields contained in the
Fleet managed .fleet_component_template that is added to all data streams. This template
contains event.ingested and event.agent_id_status.

Relates elastic#147
@andrewkroh
Copy link
Member

I wanted to mention that there are two fields under event that we probably do want to skip validating because Fleet automatically adds a mapping for them. Those are "event.agent_id_status" and "event.ingested".

Perhaps something like this:

diff --git a/internal/fields/validate.go b/internal/fields/validate.go
index afed354..f149e57 100644
--- a/internal/fields/validate.go
+++ b/internal/fields/validate.go
@@ -263,16 +263,29 @@ func skipValidationForField(key string) bool {
 	return isFieldFamilyMatching("agent", key) ||
 		isFieldFamilyMatching("elastic_agent", key) ||
 		isFieldFamilyMatching("cloud", key) || // too many common fields
-		isFieldFamilyMatching("event", key) || // too many common fields
 		isFieldFamilyMatching("host", key) || // too many common fields
 		isFieldFamilyMatching("metricset", key) || // field is deprecated
-		isFieldFamilyMatching("event.module", key) // field is deprecated
+		isFleetManagedMapping(key)
 }
 
 func isFieldFamilyMatching(family, key string) bool {
 	return key == family || strings.HasPrefix(key, family+".")
 }
 
+// isFleetManagedMapping return true if the field is contained in a Fleet
+// managed component template that is added to all data streams.
+//
+// The template is controlled in elastic/kibana at:
+// https://github.com/elastic/kibana/blob/v8.1.0/x-pack/plugins/fleet/server/constants/fleet_es_assets.ts#L24-L39
+func isFleetManagedMapping(key string) bool {
+	switch key {
+	case "event.agent_id_status",
+		"event.ingested":
+		return true
+	}
+	return false
+}
+
 func isFieldTypeFlattened(key string, fieldDefinitions []FieldDefinition) bool {
 	definition := FindElementDefinition(key, fieldDefinitions)
 	return definition != nil && definition.Type == "flattened"

@jsoriano
Copy link
Member

Gave a try to this in the context of elastic/package-spec#399, but I think that we need some kind of centralized schema for data providers (elastic/package-spec#199).
So Fleet can install the mappings for these providers, and elastic-package can also query this data for validations of these fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants