[ECS] [TSDB] Centralisation of Dimension Fields #5193
For the packages owned by the Service Integrations team, certain fields are part of every package, and these are potential candidates for dimension fields.
|
When metrics are collected from a resource running in the cloud or in a container, the fields mentioned below are potential candidates for dimension fields.
Should subnet / network name be included? TBD |
|
CC @tommyers-elastic @gizas @felixbarny for any suggestions/comments on the TSDB ECS dimension fields. Once we close on these ECS fields, we can raise a PR for the same. |
@kruskall has tried to define the dimensions for APM (elastic/apm-server#9730) but quickly hit the dimension limit (elastic/elasticsearch#93564). |
I like the direction elastic/elasticsearch#93564 is taking. An initial shortcut might be to just increase the limit to 32, which could already help. Looking at elastic/apm-server#9730, there is some overlap with the dimensions proposed here, but there are also quite a few dimensions which I would argue are potentially unique to APM data. It would be nice if all the default dimensions could be set in ECS (or a common base) but only be used / applied when there is actual data. This goes back to my question: is the limit reached when a field for the dimension is present in the data, or when it is in the mapping itself? If the mapping is already enough, will it help to have it in the dynamic template? For each default dimension defined, I would like us to have a note on why it is a dimension. A lot of thought went into which dimensions to pick, and it should be persisted and shared. |
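To make the mapping question above concrete, here is a minimal sketch of what "a dimension in the mapping" looks like and how one might count mapped dimensions against a limit. It assumes dimensions are declared with the `time_series_dimension` mapping parameter on keyword fields; the helper names `build_mapping` and `count_dimensions` are hypothetical, and the field list is only illustrative (the fields are taken from this thread, not from an agreed final list). It does not answer how Elasticsearch itself counts dimensions.

```python
def build_mapping(dimension_fields):
    """Build a nested, Elasticsearch-style mapping with keyword dimensions.

    `dimension_fields` is a list of dotted field paths such as "host.name".
    """
    mapping = {"properties": {}}
    for path in dimension_fields:
        node = mapping
        parts = path.split(".")
        # Walk (or create) the object fields leading to the leaf.
        for part in parts[:-1]:
            node = node["properties"].setdefault(part, {"properties": {}})
        node["properties"][parts[-1]] = {
            "type": "keyword",
            "time_series_dimension": True,
        }
    return mapping


def count_dimensions(mapping):
    """Count fields flagged as dimensions anywhere in the mapping tree."""
    total = 0
    for field in mapping.get("properties", {}).values():
        if field.get("time_series_dimension"):
            total += 1
        total += count_dimensions(field)  # leaf fields contribute 0 here
    return total


# Illustrative fields mentioned in this thread; not a final list.
mapping = build_mapping(["host.name", "container.id", "agent.id"])
mapped_dimensions = count_dimensions(mapping)
limit = 32  # the value floated above as a possible shortcut; an assumption here
within_limit = mapped_dimensions <= limit
```

With a count like this, a package could check its own dimensions plus the ECS defaults before hitting the limit at index creation time.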
Can we consider the above-mentioned list for Service Integrations and for Cloud (#5193 (comment))? We may have to expect new common fields to be added to the above list as we test new scenarios. We know, in such cases, explicit Is the above list good enough to work towards preparing an RFC in ECS? |
I think, having |
I assume so, you know best :-)
Is a |
I could find no references saying that a container.id repeats within a cluster. I understand the scenario you are referring to: the one where multiple k8s clusters are provisioned. Cluster name is a good logical segregation in such cases. Reference: https://cloud.google.com/stackdriver/docs/solutions/gke/observing It may then be checked
These may be the questions that can be asked to the owners of GKE / EKS integration team. |
My preference would be not to modify the ECS dimension fields based on an application or cluster technology / deployment architecture. Every technology such a As part of the integration enhancement to use TSDB, it is expected that the unique fields that represent a resource are carefully identified as dimension fields in the integration. |
|
Can you share some details why |
As you rightly said, this is unnecessary. |
Can we summarise the final list of ECS fields which are dimensions below and one-line description for each, providing the rationale? |
When the dimension limit is removed (elastic/elasticsearch#93564), can we just make any non-metric (keyword?) field a dimension by default? I don't see the value of spending our time on finding out what good dimension fields are. Other TSDBs only support two types of fields: metrics and dimensions. Can we just operate under the same mental model? |
It's a good point. In particular, it's not very clear what the difference is between non-dimension meta fields which are keywords and dimension fields. TSDB documents will primarily contain a combination of metric fields, dimension fields and meta fields. From a document query point of view, I assume dimension fields and meta fields behave the same. We are currently missing any specific details that link the number of dimension fields to TSDB size/performance. I am assuming there is a relationship, and I hope someone from the ES team can provide more details on this. In the meantime, we can just continue to annotate dimensions, as this is the ask for TSDB enablement. |
@martijnvg could you give us some guidance on the impact of having a lot of dimensions, assuming the _tsid is a hash and there are no size restrictions. See also elastic/elasticsearch#93564 (comment). What if any negative consequences do we need to expect if we declare too many dimensions or when making all non-metric fields a dimension by default? Note that this is the default in other TSDBs so if there are negative consequences in ES when treating non-metric fields as dimensions by default, I'd be curious to have your thoughts on whether they're tolerable, and if not, what we could do to minimize the impact so that we can work with ES like with any other TSDB. |
@felixbarny I need to think more about this.
How are keyword labels modelled in this model? |
Keyword labels would be mapped as a dimension. By default, everything except actual metrics would be a dimension. |
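As an illustration of the "metrics and dimensions only" mental model described above, the sketch below marks every field that is not a metric as a dimension. It assumes metrics are identified by the `time_series_metric` mapping parameter; the function name `mark_default_dimensions` and the sample fields `labels.env` and `system.cpu.total.pct` are hypothetical, and this is not how Elasticsearch behaves today.

```python
from copy import deepcopy


def mark_default_dimensions(fields):
    """Return a copy of `fields` where every non-metric field is a dimension.

    `fields` maps a dotted field path to its mapping definition. A field is
    treated as a metric only if it carries `time_series_metric` (assumption).
    """
    result = deepcopy(fields)  # leave the caller's mapping untouched
    for definition in result.values():
        if "time_series_metric" not in definition:
            definition["time_series_dimension"] = True
    return result


fields = {
    "host.name": {"type": "keyword"},
    "labels.env": {"type": "keyword"},  # hypothetical keyword label
    "system.cpu.total.pct": {  # hypothetical gauge metric
        "type": "double",
        "time_series_metric": "gauge",
    },
}

marked = mark_default_dimensions(fields)
dimensions = sorted(
    path for path, d in marked.items() if d.get("time_series_dimension")
)
print(dimensions)  # ['host.name', 'labels.env']
```

Under this model a keyword label needs no per-field decision: it becomes a dimension simply because it is not a metric.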
While we discuss the limitations and the possible future enhancements, I would like to freeze the ECS fields which must be marked as dimension fields.
|
Let's separate immediate changes from future plans. TSDB is to be released soonish and we want to adopt it in integrations, also to make sure it all works as expected. This is where we need the list from @agithomas. These are all ECS fields, and if we add the annotation to ECS, all integrations will have these dimensions by default as soon as ECS is updated. Everything using ECS will have these fields annotated as dimensions from then on, but as long as TSDB is not enabled, it won't have any effect. @agithomas List LGTM. Then there is the mid term and long term, and I agree with @felixbarny: ideally we should not have to think about dimensions at all, but this will likely not happen immediately. I suggest keeping the "no dimension" discussion in the Elasticsearch issue. |
Based on the recommendations, the new list will be
|
@lalit-satapathy, can you please help by approving, if there are no further queries? |
Let's update the TSDB migration document to change from host.ip to host.name |
The above
The above list may be needed when more fields are added to ECS and used. For example: details of the subnet (for on-prem infrastructure). The above list will be used to prepare the RFC-1 of RFC-0. @ruflin, @felixbarny, @martijnvg, kindly help by reviewing the new list mentioned here. |
I stumbled over the following line.
If two agents are monitoring the same resource, shouldn't it be the same time series? Can you provide an example of where this happens? That would likely clarify things. |
We can have one policy deployed on any number of agents. This permits two agents to monitor the same resource, which the customer may do intentionally or accidentally.

Case 1: If it is intentional, it is important that agent.id is a dimension field so that the data can be recorded as separate time series. A valid use case I can think of: a standalone elastic-agent may be running on a single node, monitoring several infra assets. The admin, on noticing a problem related to disk or over-utilisation, chooses to migrate to a different system. As part of the cut-over, during the maintenance window, it is important that the user verifies that the data received from the new agent is consistent. Without agent.id, the data in ES from the new agent will be recorded in a staggered manner.

Case 2: If the agent policy is installed accidentally on more than one agent, is Elasticsearch expected to do the de-duplication by making use of the dimension field constraint (not a feature) of the time series database? We think it may be best that a datastore is a true representation of the data received from the upstream system, in this case the integration packages. |
At the moment, I would rather opt for too many than too few dimensions, so I'm good with the approach. |
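The two-agent scenario above can be sketched with a simplified stand-in for time series identity: documents whose dimension values match are treated as belonging to the same series. The helpers `series_key` and `group_by_series` are hypothetical and only approximate the idea of a series identifier derived from dimensions; the agent IDs, host name, and metric value are made up for illustration.

```python
def series_key(doc, dimensions):
    """Identify a series by the values of its dimension fields."""
    return tuple((name, doc.get(name)) for name in sorted(dimensions))


def group_by_series(docs, dimensions):
    """Group documents by series key for the given set of dimensions."""
    groups = {}
    for doc in docs:
        groups.setdefault(series_key(doc, dimensions), []).append(doc)
    return groups


# Two agents report on the same host at the same timestamp (made-up data).
docs = [
    {"@timestamp": "2023-02-20T10:00:00Z", "host.name": "node-1",
     "agent.id": "agent-a", "cpu.pct": 0.42},
    {"@timestamp": "2023-02-20T10:00:00Z", "host.name": "node-1",
     "agent.id": "agent-b", "cpu.pct": 0.42},
]

without_agent = group_by_series(docs, ["host.name"])
with_agent = group_by_series(docs, ["host.name", "agent.id"])

print(len(without_agent))  # 1: both agents collapse into one series
print(len(with_agent))     # 2: each agent gets its own series
```

Without agent.id, both documents share one series and one timestamp, so the store has to either drop one or treat it as a duplicate; with agent.id, both are kept as received.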
Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as |
Scope