Add daemonset name when enriching pod logs or metrics #25816
Pinging @elastic/integrations (Team:Integrations)
Thanks for opening this @eedugon. Your point about the controller's information looks really valid. We can definitely add information about Daemonsets and even generalise that for other types of resources. This brings us to the suggestion you mention about kubernetes.controller.kind / kubernetes.controller.name.
@ChrsMark could you elaborate on the potential compatibility issues in the UI? Where do you see it could break?
My concern is mostly about whether we could/should replace existing fields like kubernetes.deployment.name and kubernetes.statefulset.name. In addition, as I mentioned in the previous comment, maybe we can just add those 2 fields on top of what we have and avoid replacing anything, with the following approach: let's take Pod as an example; we can populate its metadata with the 2 new fields in addition to the existing ones.
We can coordinate this change with the UI.
Firstly, I think the different types of resources are called Workloads in Kubernetes, so it's best we maintain consistency. Workloads can be represented by e.g. …

As you can see, currently there's no workload-related field in use that can break the UI. I can easily imagine that the introduction of workloads would increase the usefulness of the Kubernetes inventory, for example by introducing Kubernetes Workloads as another inventory alongside Kubernetes Pods and Kubernetes Nodes, providing a workload-related perspective with the option to group the inventory by workload type. This way the user will quickly see what types of workloads, and how many, are running on a given Kubernetes cluster. In any case we will coordinate any change to the ingested data with the UI to make sure everything works as it should.
I think by just adding a specific workload name we won't capture the fact that …
Thanks for the feedback @sorantis! Summarising what we would like to do here:
@jsoriano feel free to share any comment/feedback :).
TL;DR: +1 to add the daemonset fields. For the controller/workload fields, at first look it seems tempting to do that, but after thinking a bit more I wonder if this would be so useful, especially considering that:
We have to take into account that there can be a chain of controllers for a single resource, for instance when a Deployment creates a ReplicaSet, which in turn creates the Pods. We will have to decide what to store as workload type and name for any given resource that can have a chain of controllers. If we leave CRDs aside for now, we probably want to store the top-level controller in these fields, but there can still be tricky cases.

A use case I would like to highlight is a failing upgrade of a Deployment. When a deployment is modified, it creates a new ReplicaSet to start containers with, while the old ones are stopped at the configured pace. If there are problems, operators will probably want to check metrics and logs of both ReplicaSets separately, to compare or to check what may be going wrong in the new one. Even if the upgrade doesn't fail, on big deployments operators may want to compare the metrics to see that everything goes well in the new version and revert if needed. So Deployments are more useful in general and for inventory purposes, but ReplicaSets are also useful for certain operations, or to compare the behaviour of different deployed versions.

Another point is about leveraging this in the UI: there is a limited set of Workloads (~4), and they have different natures, so we may still want to consume their data differently. For example, for Deployments and ReplicaSets you want to see the health of the instances, but for CronJobs/Jobs you are more interested in a list of the last executions and their results. If implemented, this will probably lead to different UIs, and then it would probably be OK to have this in different fields as it is now.
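To make the controller chain concrete, here is a minimal sketch of how a pod's top-level controller could be resolved via ownerReferences (this is plain client-go, not the Beats enrichment code; the helper name, namespace, and pod name are made up for illustration, and a recent client-go with context-aware Get calls is assumed):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// topLevelController is an illustrative helper: it walks a pod's
// ownerReferences to find the controlling workload, following the
// ReplicaSet -> Deployment hop when needed.
func topLevelController(ctx context.Context, client kubernetes.Interface, namespace string, owners []metav1.OwnerReference) (kind, name string) {
	for _, ref := range owners {
		if ref.Controller == nil || !*ref.Controller {
			continue
		}
		// A ReplicaSet is usually owned by a Deployment; follow that link.
		if ref.Kind == "ReplicaSet" {
			rs, err := client.AppsV1().ReplicaSets(namespace).Get(ctx, ref.Name, metav1.GetOptions{})
			if err == nil {
				for _, rsRef := range rs.OwnerReferences {
					if rsRef.Controller != nil && *rsRef.Controller && rsRef.Kind == "Deployment" {
						return rsRef.Kind, rsRef.Name
					}
				}
			}
			// Standalone ReplicaSet (or lookup failed): report it as-is.
			return ref.Kind, ref.Name
		}
		// DaemonSet, StatefulSet, Job, or a CRD-owned pod.
		return ref.Kind, ref.Name
	}
	return "", ""
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	pod, err := client.CoreV1().Pods("default").Get(context.Background(), "example-pod", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	kind, name := topLevelController(context.Background(), client, pod.Namespace, pod.OwnerReferences)
	fmt.Printf("controller: kind=%s name=%s\n", kind, name)
}
```

Note the extra API call needed to go from the ReplicaSet to its owning Deployment; that is the additional collection effort mentioned later in the thread.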
I don't think we can do that; we would need different aliases for data stored in the same index/data stream, which is not possible.
@jsoriano, I don't think we have to replicate exactly the data model used in Kubernetes. In my view, the majority of users don't create …

Regarding CRDs support, I guess that's a totally different scope, sorry for bringing it up here. That probably has many more implications than I thought.

In my view, consolidating (even if we have to duplicate some information) the potential deployment, statefulset, daemonset, job, etc. information into a single pair of fields would be valuable. Of course I agree with your technical view at the low level, where the relations are more complex; I'm just suggesting to offer a simplification of those relations, trying to cover the most common use cases.

Also, I agree that different workloads have a different nature and it might be interesting to have dedicated visualizations per workload type. In such a case the consolidation of fields wouldn't add much benefit, I totally agree. I was more focused on …
Just thinking out loud now :) We could have: …

For example, an Elasticsearch pod could be shown as part of an …
Could you describe some use cases where this would be important in visualizations focused on pods? I see it can be useful to know which workload controls the pod, but I don't see in which use cases this is so relevant; kubectl doesn't explicitly show this information in commands focused on pods. On the other hand, I see several use cases where having specific visualizations for specific workloads can be really useful:
This is a bit like the "pets vs. cattle" thing: pods would be the pets and workloads would be the cattle, and you are usually more interested in the cattle point of view. In this case each kind of cattle has its own particularities.
No worries, this is a very interesting discussion 🙂 Maybe we can create a different meta issue to add the metadata of other controllers/workload types, and keep this discussion here about adding common workload fields.
I think that keeping separate fields will be needed at least for the Deployment/ReplicaSet case. Even if ReplicaSet is a bit of an implementation detail in Kubernetes, and not usually created on its own, it allows distinguishing one deployed version from others, which is useful for monitoring rolling upgrades, or for comparing the historical performance of different versions of the same deployment (other fields could be used for this too, but this one is the native way to do it in Kubernetes).

Regarding forcing us to add extra fields: in any case, when we want to support a new type we need to dedicate some effort to collect and test it, so adding the extra field would be only a little extra work. Depending on how the workload name and type would be collected, it would require additional effort in any case; for example, the Deployment cannot be obtained directly from the Pod, you need to obtain it from the ReplicaSet. There are not so many types of workloads, so having a single field is not much of an advantage. If this were a completely generic thing, I would totally agree with having generic fields.
I'm +1 on adding the daemonset fields. In this regard, I think we can close this one when #26808 is merged and continue with the follow-up issue. @jsoriano what do you think? @MichaelKatsoulis if we agree on this, could you please take care of the follow-up issue so as to not lose the discussion around workload fields?
SGTM, thanks @ChrsMark and @MichaelKatsoulis!
I have created #26817 as a follow-up issue.
Describe the enhancement:
When using add_kubernetes_metadata we populate a lot of fields, some of them related to the controller of the pod, like kubernetes.deployment.name (implemented in #23610) and kubernetes.statefulset.name.
It would be interesting to enrich the documents of pods belonging to daemonsets with something like kubernetes.daemonset.name.
Describe a specific use case for the enhancement or feature:
Being able to create visualizations or filter logs based on the daemonset name.
Considering that there are other kinds of resources where this might be beneficial (jobs, cronjobs, or even CRDs), a different approach could be storing the fields:
kubernetes.controller.kind: values could be "Deployment", "StatefulSet", "DaemonSet", etc.
kubernetes.controller.name: the name of the resource controlling this pod (when applicable).

That change in the schema would also allow much more flexible and powerful visualizations, because having data of the same nature (who controls the pod) spread across different fields is challenging.
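As a rough sketch of how both shapes could coexist in a single event (illustrative only; controllerFields and the example daemonset name are made up, and this is not how add_kubernetes_metadata is actually implemented):

```go
package main

import (
	"fmt"
	"strings"
)

// controllerFields is only an illustrative sketch: given the controlling
// resource's kind and name, it builds both the existing per-kind field
// (e.g. kubernetes.daemonset.name) and the proposed generic
// kubernetes.controller.kind / kubernetes.controller.name fields.
func controllerFields(kind, name string) map[string]interface{} {
	return map[string]interface{}{
		"kubernetes": map[string]interface{}{
			// Proposed generic fields: one place to look regardless of the kind.
			"controller": map[string]interface{}{
				"kind": kind, // "Deployment", "StatefulSet", "DaemonSet", ...
				"name": name,
			},
			// Existing style: one field per workload kind.
			strings.ToLower(kind): map[string]interface{}{
				"name": name,
			},
		},
	}
}

func main() {
	// Hypothetical daemonset name, used only for the example output.
	fmt.Println(controllerFields("DaemonSet", "example-daemonset"))
}
```

With the generic fields, a single filter on kubernetes.controller.name would cover all workload kinds, while the per-kind field remains available for kind-specific dashboards.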
For your consideration ;)
cc: @ChrsMark