ECS tooling rewrite brought about by the need to reuse field sets as a different name #864
…d of 'top level').
Bunch of stuff moving to new script schema_processor.py
Anything can be merged over other fields. Here's not the place to enforce good ECS citizenry.
My brain can't deal with Python list comprehensions
I'm not super happy with the way I'm currently fixing the tracing fields getting mangled. I would have preferred a change that didn't create so many changes in
To illustrate, where we used to have
client:
  description: 'A client is defined ...'
  fields:
    address: # <-- contextual name
      name: address
      flat_name: client.address
      type: keyword
Now we have this instead:
client:
  description: 'A client is defined ...'
  fields:
    client.address: # <-- full name
      name: address
      flat_name: client.address
      type: keyword
I took this approach because the current attribute
The new approach of adding
I'm open to suggestions on how we can solve this better, if there's any :-)
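The change of keys described above (contextual name versus full name) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `key_by_flat_name` is a hypothetical helper, not a function from the ECS tooling, and it assumes every field dict carries a `flat_name` entry as in the snippets above.

```python
def key_by_flat_name(fieldset):
    """Return a copy of `fieldset` whose 'fields' dict is keyed by full name.

    Hypothetical helper for illustration only; it assumes each field dict
    has a 'flat_name' entry.
    """
    rekeyed = dict(fieldset)  # shallow copy, the input dict is left untouched
    rekeyed_fields = {}
    for field in fieldset.get('fields', {}).values():
        rekeyed_fields[field['flat_name']] = field
    rekeyed['fields'] = rekeyed_fields
    return rekeyed


client = {
    'description': 'A client is defined ...',
    'fields': {
        # keyed by contextual name, as in the "before" snippet
        'address': {
            'name': 'address',
            'flat_name': 'client.address',
            'type': 'keyword',
        },
    },
}

rekeyed_client = key_by_flat_name(client)
print(list(rekeyed_client['fields']))  # ['client.address']
```

Keying by full name keeps two fields with the same contextual name from sharing a key, at the cost of the larger diff in generated files mentioned above.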
A couple more observations, asking more-so for my own better understanding of the changes.
One more use case that throws a wrench in things... in our custom endpoint schemas we have a
Since this PR isn't making that change yet it's not a blocker but something to consider since that change is coming up soon. Could we implement a flag that determines whether self-reused fieldsets get carried with the fieldset in regular reuse?
Haha I had been wondering if this assumption would hold. Seems like it doesn't 😂 There's no inherent reason to prevent this wholesale. The two cases that currently make use of self-nestings fall on both sides of the question:
So I'm open to adding an option to handle whether the self-nestings follow along. Another way we may be able to work around this on the endpoint side is by you explicitly reusing twice, like this:
- name: process
  ...
  reusable:
    expected:
      - at: target
        as: process
      - at: target.process
        as: parent
Right now this would probably break, as I've been using the "nest as" notation exclusively for self-nestings. So the code makes assumptions in this direction for now. But the semantics of the schema DSL make it sensible to define things like this. We could adjust the tools to support this as well. So these are two ways we can make this work, I think. Happy to go either way.
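To make the "reuse twice" idea concrete, here is a minimal Python sketch of how each `at`/`as` pair maps to a destination. `reuse_destinations` is a hypothetical helper written for this illustration; it is not part of the ECS scripts, which (as noted above) may not support this declaration yet.

```python
def reuse_destinations(expected):
    """Return the full destination name for each {'at': ..., 'as': ...} entry.

    Hypothetical helper: the destination is assumed to be '<at>.<as>'.
    """
    destinations = []
    for entry in expected:
        destinations.append(entry['at'] + '.' + entry['as'])
    return destinations


# Mirrors the reusable.expected list from the YAML above.
process_reuse = [
    {'at': 'target', 'as': 'process'},
    {'at': 'target.process', 'as': 'parent'},
]

destinations = reuse_destinations(process_reuse)
print(destinations)  # ['target.process', 'target.process.parent']
```

Declaring both entries explicitly means the tooling does not need a flag to decide whether self-nestings follow along; the schema author states each destination directly.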
Ah I didn't think about reusing it twice, that's a good idea and probably the simplest and most versatile solution.
FYI 2 of the branches for follow-up PRs (once this is merged), if you're curious:
I haven't started work on the 3rd PR to put in place the various improvements I've noted along the way.
@marshallmain The code may still need adjustments to make it work, though.
@marshallmain Are there any blockers for you here?
Looks good!
LGTM - no additional comments or outstanding issues from my end.
@webmat I was recently working on Go struct generation using the
It would make parsing the file more straightforward, and since every field additionally has the
Don't feel blocked by my comments, just some thoughts and feedback around working with the nested fields.
@simitt Funny you ask that. First, to directly answer your question of documenting the intermediate parent fields, this is already supported. We do it for
So if you need to do this to add context on a parent field in your custom fields, or in fields you'd like to submit as a PR to ECS, go ahead! It works 🙂
As an aside, the format you describe is precisely the format of the new git-ignored file introduced by this PR, at
For now I'm holding off on publishing a third official representation for the fields. It seems like 3 would be too much. Although I'm curious if other folks also think it would be good to replace ecs_nested.yml with this new format. I think ecs_nested is easier to parse in that there's only 2 levels of details, the field sets details, and a simple list of all of their fields. On the other hand, when building some of the artifacts like the Elasticsearch template, we do have to re-create the deeply nested structure again, to match ES' way of defining all the fields 😂
So we're not adding it for now, but if people think there's a need for that, let us know 🙂
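For readers wondering what "re-creating the deeply nested structure" from a flat list of fields involves, here is a small Python sketch. It is an illustration only: `nest_fields` is a hypothetical helper, and the `fields` / `field_details` key names are assumptions chosen for this example, not a statement about the exact layout of any generated file.

```python
def nest_fields(flat_fields):
    """Build a deeply nested dict from a dict keyed by dotted flat names.

    Hypothetical helper. Each level keeps its children under 'fields' and
    the leaf's own attributes under 'field_details' (assumed key names).
    """
    nested = {}
    for flat_name, details in flat_fields.items():
        parts = flat_name.split('.')
        node = nested
        # Walk (and create as needed) every intermediate parent.
        for part in parts[:-1]:
            node = node.setdefault(part, {}).setdefault('fields', {})
        node.setdefault(parts[-1], {})['field_details'] = details
    return nested


flat = {
    'client.address': {'type': 'keyword'},
    'client.user.name': {'type': 'keyword'},
}
nested = nest_fields(flat)
print(nested['client']['fields']['user']['fields']['name'])
# {'field_details': {'type': 'keyword'}}
```

The two-level format (field set, then a flat list of its fields) is simpler to iterate over, while the nested form matches how a mapping with nested properties has to be laid out.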
Thanks for the feedback, everyone
This PR introduces a significant rewrite of the ECS tooling. It's ultimately meant to allow us to reuse field sets as a different name inside themselves, which was not possible before. Examples of this are reusing process at process.parent, or the upcoming ways to capture multiple users in an event (#809).
This PR does not use the new features to fix how process.parent.* fields are currently defined (they're currently explicitly duplicated, and there are subtle mistakes), nor does it introduce the new user fields. The new features are however thoroughly tested in the unit tests.
The Plan
Since this was a significant rewrite, I've tried to reduce the amount of changes to ECS per se to a minimum.
This way reviewers can review the generated files (the asciidoc, and ecs_nested.yml), in order to confirm that things are still working as expected. Of course a deep review of the code is welcome for those who have time, as well 😉
I will open up follow-up PRs to this one:
- TODO. Additional cleanup and improvements noticed when working on #864 (#871)
- process.parent.* fields with the new nesting mechanism instead of manual duplication: Define process.parent via the new field reuse mechanism (#868)
- user.* fields to represent multiple users in an event, as described in the #809 proposal ([ECS] Multiple users in an event proposal): Define fields to allow representing multiple users in an event. (#869)
TODO
I'm opening this PR as a draft because there's still one thing that must be fixed before this is fully ready. However I wanted to open it as early as possible, to leave more time for review.
root=true just like Base fields (meaning they're not nested under tracing.). However it contains two nested ID fields, trace.id and transaction.id. This rewrite didn't account for this, and one of them gets overwritten.
Major changes
schemas/*.yml)
--include result in schema definitions that are "good ECS citizens".
--include CLI flag will soon be used by the new RFC process, to build ECS artifacts that include early stage (unofficial) major additions. See Document ECS RFC process #833 for the first step of introducing this new process.
schemas/README.md) was updated significantly.
generated/ecs/ecs_nested.yml
generated/ecs/ecs_nested.yml, the format of the array reusable.expected is changing in a breaking way. It used to be an array of flat names (e.g. client.user, server.user...). Now it's an array of objects ({at:client, as:user, full: client.user}, {at:user, as:target, full: user.target}). Programs that consume this file will need to adjust to this new format, in order to be able to consume this file as of ECS 1.6.0. This is necessary to support field set reuse as a different name.
source field set, if you think of reusing user at source.user) now have a new attribute called reused_here, listing all field sets that are, well, reused here :-)
nestings which was a list of flat names and is not enough to capture reuse as a different name.
schemas/*.yml
{at: process, as: parent} or {at: user, as: target}. See comments in schemas/process.yml and schemas/user.yml.
reusable.order, to ensure chained reuses happen in the right order (group => user, then user => many places). We used to do reuse by reference, which caused problems and added complexity.
order is used exactly once in all of ECS, for group. So this better approach sounded overkill.
generated/ecs/ that contains the deeply nested representation of fields. This is a representation we've had to develop a while ago, to allow field merging. But it's not meant to be published. So the file is generated at generated/ecs/ecs.yml, but it's ignored by git. Its purpose is to help us debug the code, not to publish a third file format :-)
reused_here made this possible.
generator.py contains almost no logic now. All of the logic is now in new scripts with narrower focus. This makes them much easier to test, without massive test setups. The new scripts are in scripts/schema/*.py. They also supersede scripts/schema_reader.py which is now gone.
Caveats
generated/ecs/ecs_nested.yml: chained reuses don't populate nestings all the way anymore. Example: group is reused in user, then user in client: client's nestings attribute will no longer list client.user.group.
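Since the reusable.expected format change described under "Major changes" is breaking for consumers of generated/ecs/ecs_nested.yml, here is a hedged Python sketch of how a consumer might accept both shapes. `normalize_expected` is a hypothetical helper, not something shipped with ECS, and splitting an old flat name on its last dot is an assumption that only holds for the old-style entries shown above.

```python
def normalize_expected(expected):
    """Return reusable.expected entries as {'at', 'as', 'full'} dicts.

    Hypothetical consumer-side helper. Old-style entries are flat names
    such as 'client.user'; new-style entries are already dicts.
    """
    normalized = []
    for entry in expected:
        if isinstance(entry, str):
            # Old format: assume the last segment is the reuse name.
            at, _, as_name = entry.rpartition('.')
            normalized.append({'at': at, 'as': as_name, 'full': entry})
        else:
            # New format: copy so callers can modify the result safely.
            normalized.append(dict(entry))
    return normalized


old_style = ['client.user', 'server.user']
new_style = [
    {'at': 'client', 'as': 'user', 'full': 'client.user'},
    {'at': 'user', 'as': 'target', 'full': 'user.target'},
]

print(normalize_expected(old_style)[0])
# {'at': 'client', 'as': 'user', 'full': 'client.user'}
print(normalize_expected(new_style) == new_style)  # True
```

A consumer that normalizes at load time can keep the rest of its code on the new object shape, regardless of which ECS version produced the file.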