Change stream.* fields to dataset.* fields #482
Comments
👍
Yeah I like this new proposal better.
I believe there are at least 2 queries in Ingest Manager that currently use the stream fields.
I'll leave this open until Monday 2020-06-01 and, if no objections are raised by then, will proceed with the implementation.
@ruflin one key question is whether dataset.type will have separate values for logs and events. I think in ECS there is event.kind with a value of event but not log. If we align the values with event.kind as you mentioned, it might make sense to also open a PR to add log to event.kind.
There's currently no plan to add the value "log" to
@mostlyjason Good point, I missed that
@ruflin what determines the allowed values for
@mostlyjason @ruflin I don't think we should allow everything under that field; we could expand it later. I would expect the allowed types to be listed as part of the indexing strategy document?
We will enforce it in the validation code of the package-registry. So if a package creator uses a value that is not allowed, the package is invalid and cannot be published. ++ on publishing it. We will figure out where, potentially ECS ;-)
Closing this issue as I think we should move forward here. Follow up implementation issue can be found here: #491
@ruflin Did you forget to actually close the issue? ;-)
For the new indexing strategy, the fields currently used are stream.type, stream.dataset, and stream.namespace. Over the last weeks it became apparent that these fields might not be optimal, so the proposal is to change them to dataset.type, dataset.name, and dataset.namespace.
Note: This issue is in the package registry because at the moment the registry enforces these fields and is public, but many other places will need an update if we move forward with this.
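To make the proposed rename concrete, here is a minimal sketch of how a document could be migrated from the current field names to the proposed ones. The mapping table follows the proposal above; the function name, the flat dotted-key document shape, and the sample document are assumptions made for illustration only and are not taken from any existing implementation.

```python
# Mapping from the current field names to the proposed ones (per the proposal above).
STREAM_TO_DATASET = {
    "stream.type": "dataset.type",
    "stream.dataset": "dataset.name",
    "stream.namespace": "dataset.namespace",
}


def rename_stream_fields(doc: dict) -> dict:
    """Return a copy of a flat (dotted-key) document with stream.* keys renamed.

    Keys that are not part of the mapping are copied unchanged.
    """
    return {STREAM_TO_DATASET.get(key, key): value for key, value in doc.items()}


# Hypothetical sample document, for illustration only.
old_doc = {
    "stream.type": "logs",
    "stream.dataset": "nginx.access",
    "stream.namespace": "default",
    "message": "GET /index.html",
}
new_doc = rename_stream_fields(old_doc)
```

Note that stream.dataset becomes dataset.name rather than dataset.dataset, so this is not a pure prefix swap.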
What is the problem with stream.* fields?
stream.* came initially out of building the Elastic Agent configuration, as there we have inputs with streams, and each stream goes to a single dataset. But anyone can use the new indexing strategy, so it should not be tied to a specific technology.
stream.type can also be content which is not necessarily a stream. See also [Meta] Add ECS Dataset fields ecs#845.
logs-nginx.access-default and logs-nginx.access-prod are two different datasets.
Based on the above I came to the conclusion that dataset should be an object and used for the indexing strategy fields.
One alternative that was discussed is using datastream instead, as each dataset is stored in a datastream. But not each datastream is a dataset per this definition, and it would attach it again to a specific technology implementation.
The other alternative discussed was using existing ECS fields like event.kind and event.dataset, but as the types are different (constant_keyword), this does not work, and we will be even stricter on names than we currently are in these fields. But the idea is that these fields will be closely linked on possible values.
Benefits of dataset.*
Using dataset.* also solves some existing problems:
stream.* conflicts with an existing docker input field in Filebeat, which is a keyword.
It separates input.type in the Elastic Agent config from dataset.type. Even if the input.type is log, the dataset.type could be metrics if the log file contains metrics.
stream and streams.
.Changes needed
Places to change current stream.* implementation:
This change will likely have no impact on the UI side.