[EPM] Add mapping field types to index template generation #59894

skh · 2020-03-11T13:43:54Z (edited by jen-huang)

❗️DO NOT MERGE❗️ until #59376 is merged. Change this PR's base to master once it is.

Summary

Mostly implements #55865

- Mostly follows the logic in https://github.com/elastic/beats/blob/master/libbeat/template/processor.go for handling the various data types.
- Does not, however, care about backward compatibility with anything earlier than 7.6, which simplifies the logic quite a bit.
- ~~Flattens all fields (instead of expanding all fields, as beats does) after discussion with @ruflin~~
- Expands (un-flattens) and deduplicates all fields.
- Silently removes alias fields whose path points to a non-existing field.
- Still missing:
  - handling of the object data type
  - integration tests
  - review of all other settings in the generated index templates
- There is some code duplication now between fields/field.ts and kibana/index_pattern/install.ts, which I kept because the two might still diverge.
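The alias-removal step listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the Field shape and the helpers getFieldByPath and removeInvalidAliases are hypothetical, and alias paths are assumed to be full dotted paths from the root of the field tree, as in Elasticsearch alias mappings.

```typescript
// Hypothetical, simplified field shape for illustration only.
interface Field {
  name: string;
  type?: string;
  path?: string; // alias fields only: full dotted path of the target field
  fields?: Field[]; // children of a group field
}

// Look up a field by its path segments, e.g. ["process", "pid"].
function getFieldByPath(fields: Field[], path: string[]): Field | undefined {
  const [head, ...rest] = path;
  const match = fields.find(f => f.name === head);
  if (!match || rest.length === 0) {
    return match;
  }
  return match.fields ? getFieldByPath(match.fields, rest) : undefined;
}

// Return a copy of the tree without alias fields whose path does not resolve.
// Alias paths are always resolved against the root, at every nesting level.
function removeInvalidAliases(fields: Field[], root: Field[] = fields): Field[] {
  return fields
    .filter(
      f =>
        f.type !== 'alias' ||
        (f.path !== undefined && getFieldByPath(root, f.path.split('.')) !== undefined)
    )
    .map(f => (f.fields ? { ...f, fields: removeInvalidAliases(f.fields, root) } : f));
}
```

Returning a filtered copy rather than splicing the input keeps this step free of side effects, so it can be chained after expansion and deduplication.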

How to test this?

- On a fresh Elasticsearch setup (i.e. with the system and base packages not yet installed), open the Ingest Manager UI. The initial package installation, including index template generation, should succeed without errors.
- Installation of all other available packages should likewise succeed without errors.
- Inspect the mappings in the generated index templates in Kibana's Dev Tools console (GET _cat/templates, then GET /_template/$TEMPLATE_NAME). The relevant templates are logs-$PACKAGE.$DATASET and metrics-$PACKAGE.$DATASET, e.g. metrics-system.process.
- Inspect server/services/epm/elasticsearch/template/__snapshots__/template.test.ts.snap, especially the snapshot for the system.yml test.
- server/services/epm/fields/__snapshots__/field.test.ts.snap might also be interesting, as it shows the intermediate data structure after processFields() but before the transformation into the mappings structure.

elasticmachine · 2020-03-11T13:44:43Z

Pinging @elastic/ingest-management (Feature:EPM)


kibanamachine · 2020-03-12T18:34:13Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request
Commit: 8ed6eb1

History

💚 Build #32757 succeeded 2c6e6e4
💛 Build #32751 was flaky dd117de
💚 Build #32705 succeeded 75ac4c1
💔 Build #32407 failed 95d9d66

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

neptunian · 2020-03-13T13:17:13Z

@skh Personally, I'm not sure how useful it is to have a large snapshot with no unit tests. If I were using the snapshot to understand the mappings logic, it would be difficult. A snapshot that big is also probably very redundant in its use cases, so unless they are all unique cases, we end up with long tests and bigger files unnecessarily. I think we should at least create unit tests similar to these, so we know what we are testing for: https://github.com/elastic/beats/blob/d9a4c9c240a9820fab15002592e5bb6db318543b/libbeat/template/processor_test.go . I did something similar with the index patterns: https://github.com/elastic/kibana/blob/master/x-pack/plugins/ingest_manager/server/services/epm/kibana/index_pattern/install.test.ts. Also, a snapshot test or a unit test for expandFields, dedupFields, validateAliasFields and getField seems like it would be pretty straightforward.

skh · 2020-03-13T14:04:37Z

@neptunian Good point, I can add more specific tests.

The reason I added such a large test file and snapshot was that this contains real data from the system package, which was very useful during development. It will, however, grow stale over time and lose usefulness so I'm fine with replacing it with something else.

skh · 2020-03-13T14:27:00Z

Changing the base to master looked messy, so I changed it back. Please continue reviewing. I will NOT merge this into feature-ingest, but I might need to create a new branch from master and cherry-pick my changes.

neptunian · 2020-03-13T19:51:16Z

x-pack/plugins/ingest_manager/server/services/epm/fields/field.ts

```diff
+};
+
+export function processFields(fields: Fields): Fields {
+  expandFields(fields);
```

Can we make expandFields more functional by not mutating the fields array like this?

This is a leftover of the old processFields implementation. Would it be acceptable to change it in a follow-up PR?

Yes, please.
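One way to address the mutation concern is an expandFields that returns a new array instead of modifying its argument. This is a minimal sketch, not the PR's implementation, assuming a simplified, hypothetical Field shape (name, type, nested fields) rather than the real type in fields/field.ts. Each dotted name such as a.b.c becomes nested group fields; siblings that share a prefix (e.g. a.b and a.c) stay separate entries here, and merging them is left to a later dedup step.

```typescript
// Hypothetical, simplified field shape for illustration only.
interface Field {
  name: string;
  type?: string;
  fields?: Field[];
}

// Returns a new array; neither the input array nor its objects are modified.
// A dotted name like "a.b.c" becomes nested group fields a > b > c, and the
// original properties (type etc.) end up on the innermost field.
function expandFields(fields: Field[]): Field[] {
  return fields.map(field => {
    const parts = field.name.split('.');
    // Shallow copy of the original field, renamed to its last path segment.
    const leaf: Field = { ...field, name: parts[parts.length - 1] };
    // Children of groups may use dotted names too, so expand them recursively.
    if (field.fields) {
      leaf.fields = expandFields(field.fields);
    }
    // Wrap the leaf in one group per remaining path segment, innermost first.
    let expanded: Field = leaf;
    for (let i = parts.length - 2; i >= 0; i--) {
      expanded = { name: parts[i], type: 'group', fields: [expanded] };
    }
    return expanded;
  });
}
```

With a pure function like this, processFields can chain its stages by passing each stage's return value to the next instead of relying on side effects.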

neptunian · 2020-03-13T21:44:49Z

x-pack/plugins/ingest_manager/server/services/epm/fields/field.ts

```diff
+ * These can result from expandFields when the input contains dotted field
+ * names that share parts of their hierarchy.
+ */
+function dedupFields(fields: Fields): Fields {
```

If there are two fields that have the same name but are not of group type, it throws an error. Is that desirable? I thought this was possible.

The two fields could be acceptable if they are exactly the same. I can amend that.

If they differ in any way (different type, or, for some types, different values for other properties), that would result in two different mappings being present for the same field. Even if ES accepts that, I don't think it is desirable.

Okay. If two fields have the same name and are not group types with fields, I don't think it should throw an error, but continue. If they have the same name but different types, perhaps we should continue as well? I'm not sure whether we should throw an error and stop the install. @ruflin what do you think?

Currently there is an error when trying to install nginx, which has a field with the same name but different types, where at least one of the two is not a group type:

UnhandledPromiseRejectionWarning: Error: Can't merge fields {"name":"answers","level":"extended","type":"object","object_type":"keyword","description":"An array containing an object for each answer section returned by the server.\nThe main keys that should be present in these objects are defined by ECS. Records that have more information may contain more keys than what ECS defines.\nNot all DNS data sources give all details about DNS answers. At minimum, answer objects must contain the data key. If more information is available, map as much of it to ECS as possible, and add any additional fields to the answer objects as custom fields."} and {"name":"answers","type":"group","fields":[{"name":"class","level":"extended","type":"keyword","ignore_above":1024,"description":"The class of DNS data contained in this resource record.","example":"IN"}]}

It depends. If we create the Kibana index pattern, I think we should ignore some errors, as the data might come from different packages and there might be some conflicts (there shouldn't be). If we generate the template for a package, everything should fit perfectly. In the best case, we already detect that something is off when building the package.

What should be possible is that the same field shows up twice and one definition has more details. For example, foo is defined as keyword and later again with ignore_above: 1024. But in general we should try to keep things as simple as possible, as this kind of inheritance is a problem for index templates today.

My suggestion: let's ignore errors for now as part of this PR to keep it moving forward, since we need all the other changes in this PR. Then we follow up with a discussion on exactly how we handle conflicts.
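The rule suggested above (the same field may appear twice if one definition only adds details) could be expressed roughly like this. This is a hypothetical sketch, not the PR's implementation: the Field shape and mergeFieldDefinitions are illustrative names. Two definitions merge when no property is defined in both with different values; otherwise the function reports a conflict by returning undefined, leaving the caller to decide whether to ignore it (as proposed for this PR) or to fail. Group fields are assumed to be merged separately, child by child.

```typescript
// Hypothetical, simplified field shape; extra properties such as
// ignore_above are allowed through the index signature.
interface Field {
  name: string;
  type?: string;
  ignore_above?: number;
  [key: string]: unknown;
}

// Merge two definitions of the same (non-group) field.
// Returns the combined field when they are compatible, i.e. no property is
// defined in both with a different value; returns undefined on a conflict.
function mergeFieldDefinitions(a: Field, b: Field): Field | undefined {
  for (const key of Object.keys(b)) {
    // JSON comparison is a simple deep-equality check for YAML-derived plain
    // data (note: it is sensitive to key order inside nested objects).
    if (key in a && JSON.stringify(a[key]) !== JSON.stringify(b[key])) {
      return undefined; // e.g. type "object" vs. type "group"
    }
  }
  return { ...a, ...b };
}
```

Under this rule, the foo / ignore_above: 1024 example merges cleanly, while the nginx answers field (object vs. group) is reported as a conflict.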

neptunian · 2020-03-13T21:49:58Z

x-pack/plugins/ingest_manager/server/services/epm/fields/field.ts

```diff
+function dedupFields(fields: Fields): Fields {
+  const dedupedFields: Fields = [];
+  fields.forEach(field => {
+    const found = dedupedFields.find(f => {
```

Is there a more efficient way to dedup than searching the array again for each field? Could you create an object and check whether a key exists instead, or something like that?

Is this an issue with the size of the input this function gets? It is at most the content of any one fields.yml file from a dataset in a package.

If it is an issue, can we change it in a follow-up PR? I can open an issue so we don't forget.

It's not the input length (fields); it's that it searches another list (dedupedFields) for each field to check whether it's a duplicate. Since this is only a single package it's not a big deal, but another PR would be ideal: use an object, set each field name as a property of that object, and then for each field check whether it has already been set (instead of looping through the deduped array).

Yes, but dedupedFields can only grow to the length of fields, so if I'm not mistaken this is still O(n^2), which is not terrible with the data we have in the packages.

I'll happily open another issue to fix this soon, but I don't think it should be a blocker.
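A Map-keyed variant of dedupFields, along the lines suggested above, might look like this. It is a sketch under stated assumptions, not the PR's code: the Field shape is simplified and hypothetical, same-named group fields are merged and their children deduplicated recursively, and for any other duplicate the first definition wins (conflicts are ignored rather than thrown, matching the interim approach discussed above). A Map is used instead of a plain object so that field names such as constructor cannot collide with inherited object properties.

```typescript
// Hypothetical, simplified field shape for illustration only.
interface Field {
  name: string;
  type?: string;
  fields?: Field[];
}

// Deduplicate sibling fields by name. Each lookup is a Map access
// (constant time on average) instead of a linear scan of the deduped array.
function dedupFields(fields: Field[]): Field[] {
  const byName = new Map<string, Field>();
  for (const field of fields) {
    const existing = byName.get(field.name);
    if (!existing) {
      byName.set(field.name, field);
    } else if (existing.type === 'group' && field.type === 'group') {
      // Same-named groups: combine their children; duplicates among the
      // children are handled by the recursive call below.
      byName.set(field.name, {
        ...existing,
        fields: [...(existing.fields ?? []), ...(field.fields ?? [])],
      });
    }
    // Any other duplicate: keep the first definition and ignore the rest.
  }
  // Map iteration follows first insertion order, so output order is stable.
  return Array.from(byName.values()).map(f =>
    f.fields ? { ...f, fields: dedupFields(f.fields) } : f
  );
}
```

Updating an existing Map key keeps its original position, so merged groups stay where their name first appeared in the fields.yml input.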

neptunian · 2020-03-13T22:06:59Z

> @neptunian Good point, I can add more specific tests.
>
> The reason I added such a large test file and snapshot was that this contains real data from the system package, which was very useful during development. It will, however, grow stale over time and lose usefulness so I'm fine with replacing it with something else.

I'm okay with keeping it, as long as we have other unit tests. It's just that it doesn't necessarily prove that everything is correct, since the system package probably doesn't cover every possible use case. I see it more as a backup test, unless the snapshot illustrated distinct, readable cases.

skh · 2020-03-16T14:14:54Z

Obsoleted by #60266

skh added 3 commits March 11, 2020 14:24

- Add properties needed for index templates to Field (e0a4571)
- Add data type handling to template generation (6dd96df)
- Adjust tests (95d9d66)

skh changed the title ~~55865 index template types~~ [EPM] Add mapping field types to index template generation Mar 11, 2020

skh added the Feature:EPM Fleet team's Elastic Package Manager (aka Integrations) project label Mar 11, 2020

- Update fields test snapshots (75ac4c1)

ruflin assigned skh Mar 12, 2020

skh added 5 commits March 12, 2020 13:12

- Remove duplicate fields from test file (4250b87)
- Add test cases (b067dd8)
- Enhance processFields (dd117de)
  - move expand stage to expandFields
  - fix expandFields
  - add deduplication stage dedupFields
- Use processField() to preprocess fields (2c6e6e4)
- Remove alias fields with invalid path (432bbb0)

skh marked this pull request as ready for review March 12, 2020 16:35

- Remove obsolete code. (b8a2284)

skh requested a review from neptunian March 12, 2020 16:54

- Fix documentation. (8ed6eb1)

skh changed the base branch from feature-ingest to master March 13, 2020 14:22

skh changed the base branch from master to feature-ingest March 13, 2020 14:24

neptunian reviewed Mar 13, 2020


skh closed this Mar 16, 2020

skh mentioned this pull request Mar 18, 2020: [EPM] Add mapping field types to index template generation v2 #60266 (Merged)

jen-huang added the Team:Fleet Team label for Observability Data Collection Fleet team label Mar 26, 2020
