-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EPM] Add mapping field types to index template generation #59894
Conversation
Pinging @elastic/ingest-management (Feature:EPM) |
* move expand stage to expandFields * fix expandFields * add deduplication stage dedupFields
💚 Build SucceededHistory
To update your PR or re-run it, just comment with: |
@skh Personally, I'm not sure I see how useful it is to have a large snapshot with no unit tests. If I'm using the snapshot to understand the mappings logic, it would be difficult. It also seems like a snapshot that big is probably very redundant in the use cases so we just have long tests and bigger files unnecessarily, unless they are all unique use cases? I think we should at least create unit tests similar to these so we know what we are testing for https://github.com/elastic/beats/blob/d9a4c9c240a9820fab15002592e5bb6db318543b/libbeat/template/processor_test.go . I did something similar with the index patterns https://github.com/elastic/kibana/blob/master/x-pack/plugins/ingest_manager/server/services/epm/kibana/index_pattern/install.test.ts. Also, a snapshot test or a unit test for |
@neptunian Good point, I can add more specific tests. The reason I added such a large test file and snapshot was that this contains real data from the |
Changing the base to |
}; | ||
|
||
export function processFields(fields: Fields): Fields { | ||
expandFields(fields); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make expandFields
more functional by not mutating the fields
array like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a left-over of the old processFields
implementation. Would it be acceptable to change it in a follow-up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* These can result from expandFields when the input contains dotted field | ||
* names that share parts of their hierarchy. | ||
*/ | ||
function dedupFields(fields: Fields): Fields { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there are two fields that have the same name but are not a group type, it throws an error. Is that desirable? I thought this was possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two fields could be acceptable if the are exactly the same. I can amend that.
If they differ in any way (different type, or for some types, different in other properties) that would result in two different mappings for the same field present. Even if ES accepts that, I don't think it is desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. If two fields are have the same name and are not group types with fields, I don't think it should throw an error, but continue. If they have the same name, but different types, we should perhaps continue as well? I'm not sure if we should throw an error and stop the install. @ruflin what do you think?
Currently there is an error when trying to install nginx which has a field with the same name but different types with one or both types not being a group type :
UnhandledPromiseRejectionWarning: Error: Can't merge fields {"name":"answers","level":"extended","type":"object","object_type":"keyword","description":"An array containing an object for each answer section returned by the server.\nThe main keys that should be present in these objects are defined by ECS. Records that have more information may contain more keys than what ECS defines.\nNot all DNS data sources give all details about DNS answers. At minimum, answer objects must contain the
data key. If more information is available, map as much of it to ECS as possible, and add any additional fields to the answer objects as custom fields."} and {"name":"answers","type":"group","fields":[{"name":"class","level":"extended","type":"keyword","ignore_above":1024,"description":"The class of DNS data contained in this resource record.","example":"IN"}]}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends. If we create the Kibana index pattern, I think we should ignore some errors as the data might come from different packages and there might be some conflicts (there shouldn't be). If we generate the template for a package, all should fit perfectly. In the best case we detect it already when building the package that something is off.
What should be possible is that the same field shows up twice and one has more details. For example foo
is keyword
and later also as ignore_above: 1024
. But in general we should try to keep things as simple as possible as this kind of inheritance is a problem for index templates today.
My suggestion: Lets ignore errors for now as part of this PR to keep it moving forward, we need all the other changes in this PR. And then we follow up on a discussion on how we handle conflicts exactly.
function dedupFields(fields: Fields): Fields { | ||
const dedupedFields: Fields = []; | ||
fields.forEach(field => { | ||
const found = dedupedFields.find(f => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a more efficient way to dedup instead of having the fields array search itself for each field. Could you create an object that checks if a key exists instead or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an issue with the size of the input this function gets? It is at most the content of any one fields.yml
file from a dataset in a package.
If it is an issue, can we change it in a follow-up PR? I can open an issue so we don't forget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not the input length (fields), its that its searching another list (dedupedFields) for each field to check if its duplicated. Since this is only a single package it's not a big deal, but having another PR would be ideal where we use an object and set each field name as a property of the object and then for each field check if it's been set (instead of looping through the dupes array).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but dedupedFields
only can grow to the length of fields
, so if I'm not mistaken this is still O(n^2), which is not terrible with the data we have in the packages.
I'll happily open another issue to fix this soon, but I don't think it should be a blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with keeping it, so long as we have other unit tests. It's just that it doesn't necessarily prove that everything is correct as every possible use case is probably not met in the system package? I guess I see it as more of a backup test unless the snapshot illustrated distinct readable cases. |
Obsoleted by #60266 |
❗️DO NOT MERGE❗️ until #59376 is merged. Change this PR's base to master once it is.
Summary
Mostly implements #55865
flattens all fields (instead of expanding all fields, as beats does) after discussion with @ruflinpath
pointing to a non-existing fieldobject
data typefields/field.ts
andkibana/index_pattern/install.ts
which I kept because it still might divergeHow to test this?
GET _cat/templates
, thenGET /_template/$TEMPLATE_NAME
). The relevant templates arelogs-$PACKAGE.$DATASET
andmetrics-$PACKAGE.$DATASET
, so e.g.metrics-system.process
server/services/epm/elasticsearch/template/__snapshots__/template.test.ts.snap
, especially the snapshot for thesystem.yml
testserver/services/epm/fields/__snapshots__/field.test.ts.snap
might also be interesting, as it shows the intermediate data structure afterprocessFields()
, but before the transformation into themappings
structure.