
Add daemonset name when enriching pod logs or metrics #25816

Closed
eedugon opened this issue May 21, 2021 · 13 comments · Fixed by #26808
Labels
kubernetes Enable builds in the CI for kubernetes Team:Integrations Label for the Integrations team

Comments

@eedugon
Contributor

eedugon commented May 21, 2021

Describe the enhancement:
When using add_kubernetes_metadata we populate a lot of fields, some of them related to the controller of the pod, such as:
kubernetes.deployment.name (implemented in #23610)
kubernetes.statefulset.name

It would be interesting to enrich the documents of pods belonging to DaemonSets with something like kubernetes.daemonset.name.

Describe a specific use case for the enhancement or feature:
Being able to create visualizations or filter logs based on the daemonset name.


Considering that there are other kinds of resources where this might be beneficial (Jobs, CronJobs, or even CRDs), a different approach could be to store the following fields:

kubernetes.controller.kind: Values could be "Deployment", "StatefulSet", "DaemonSet", etc.
kubernetes.controller.name: the name of the resource controlling this pod (when applicable).

That change in the schema would also allow much more flexible and powerful visualizations, because having data of the same nature (who controls the pod) spread across different fields is challenging.
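To make the idea concrete, here is a minimal sketch of how the enriched metadata of a DaemonSet-managed pod could look with this proposal (the field layout is only the suggestion above, not an existing Beats schema, and the "filebeat" DaemonSet name is made up):

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Hypothetical metadata for a pod managed by a DaemonSet called "filebeat":
	// the existing per-kind style field plus the proposed generic controller fields.
	meta := map[string]interface{}{
		"kubernetes": map[string]interface{}{
			"daemonset": map[string]interface{}{"name": "filebeat"},
			"controller": map[string]interface{}{
				"kind": "DaemonSet",
				"name": "filebeat",
			},
		},
	}
	out, _ := json.MarshalIndent(meta, "", "  ")
	fmt.Println(string(out))
}
```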

For your consideration ;)

cc: @ChrsMark

@eedugon eedugon added the kubernetes Enable builds in the CI for kubernetes label May 21, 2021
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label May 21, 2021
@ChrsMark ChrsMark added the Team:Integrations Label for the Integrations team label May 24, 2021
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label May 24, 2021
@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@ChrsMark
Member

Thanks for opening this @eedugon. Your point about the controller's information looks really valid. We can definitely add information about DaemonSets and even generalise that for other types of resources. This brings us to your suggestion about kubernetes.controller.kind and kubernetes.controller.name, which looks more than sane and is already technically doable through the API's object meta (https://github.com/kubernetes/apimachinery/blob/master/pkg/apis/meta/v1/types.go#L312-L315). What I'm not so sure about is whether this change would also be compatible with the UI, so this should be evaluated by @sorantis and coordinated carefully. Maybe we can keep what we have so far as is and add these two fields as extra information. In addition, this also looks like a possible candidate for orchestrator ECS.
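As a reference for the ownerReferences approach, a minimal sketch built on apimachinery (an illustration only, not the Beats implementation; the controllerOf helper name is ours):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// controllerOf returns the kind and name of the owner reference marked as
// controller, if the pod has one.
func controllerOf(pod *corev1.Pod) (kind, name string, ok bool) {
	if ref := metav1.GetControllerOf(pod); ref != nil {
		return ref.Kind, ref.Name, true
	}
	return "", "", false
}

func main() {
	controller := true
	// A pod as a DaemonSet would create it: the controller owner reference
	// carries the kind and name that would map to the proposed fields.
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name: "filebeat-abc12",
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "apps/v1",
				Kind:       "DaemonSet",
				Name:       "filebeat",
				Controller: &controller,
			}},
		},
	}
	if kind, name, ok := controllerOf(pod); ok {
		fmt.Printf("kubernetes.controller.kind=%s kubernetes.controller.name=%s\n", kind, name)
	}
}
```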

fyi @jsoriano @exekias

@sorantis
Contributor

@ChrsMark could you elaborate on the potential compatibility in the UI? Where do you see it could break?

@ChrsMark
Member

@ChrsMark could you elaborate on the potential compatibility in the UI? Where do you see it could break?

My concern is mostly about whether we could/should replace existing fields like kubernetes.deployment.name with kubernetes.controller.name, if we finally decide that we want to introduce kubernetes.controller.name and kubernetes.controller.kind (or kubernetes.controller.type). I'm not sure which fields exactly are used, or planned to be used, by the UI.

In addition, as I mentioned in the previous comment, maybe we can just add those two fields on top of what we have, and avoid replacing anything, with the following approach:

Let's take Pod as an example. We can populate its metadata with kubernetes.statefulset.name and kubernetes.deployment.name but also add kubernetes.controller.name (same value as Deployment's name) and kubernetes.controller.type.

@sorantis
Contributor

We can coordinate this change with the UI.

@ChrsMark could you elaborate on the potential compatibility in the UI? Where do you see it could break?

My concern is mostly about whether we could/should replace existing fields like kubernetes.deployment.name with kubernetes.controller.name, if we finally decide that we want to introduce kubernetes.controller.name and kubernetes.controller.kind (or kubernetes.controller.type). I'm not sure which fields exactly are used, or planned to be used, by the UI.

Firstly, I think the different types of resources are called Workloads in Kubernetes, so it's best we maintain consistency. Workloads can be represented by e.g. kubernetes.workload.type, kubernetes.workload.name.
The Metrics UI today works with the following non-metric fields:

[Screenshot: the non-metric Kubernetes fields currently used by the Metrics UI]

As you can see, currently there's no workload-related field in use that could break the UI. I can easily imagine that the introduction of workloads would increase the usefulness of the Kubernetes inventory. For example, we could introduce Kubernetes Workloads as another inventory alongside Kubernetes Pods and Kubernetes Nodes, providing a workload-related perspective with the option to group the inventory by workload type. This way the user will quickly see what types of workloads, and how many, are running on a given Kubernetes cluster.

In any case we will coordinate any change to the ingested data with the UI to make sure everything works as it should.

In addition, as I mentioned in the previous comment, maybe we can just add those two fields on top of what we have, and avoid replacing anything, with the following approach:

Let's take Pod as an example. We can populate its metadata with kubernetes.statefulset.name and kubernetes.deployment.name but also add kubernetes.controller.name (same value as Deployment's name) and kubernetes.controller.type.

I think that by just adding a specific workload name we won't capture the fact that kubernetes.statefulset is a workload type, and hence the UI and dashboards will have to instrument that relationship manually. So +1 on adding the workload type and name fields. Is it possible to use aliasing here? For example, if kubernetes.workload.type = "StatefulSet", then alias kubernetes.workload.name to kubernetes.statefulset.name. This could potentially remove the need for storing duplicate information.

@ChrsMark
Member

Thanks for the feedback @sorantis! Summarising what we would like to do here:

  1. Address the initial request by adding kubernetes.daemonset.name and kubernetes.job/cronjob.name (we already have these fields coming from the respective state_* metricsets).
  2. Introduce new fields kubernetes.controller/workload.name and kubernetes.controller/workload.type. We need to decide about controller vs workload. I think workload is more user friendly.
  3. Use aliases to improve storage efficiency (see Verify ECS alignment in container fields reported by kubernetes module and meta processor #23585 for relevant information). Store only kubernetes.deployment.name or kubernetes.daemonset.name and then alias kubernetes.workload.name to this.

@jsoriano feel free to share any comment/feedback :).

@jsoriano
Member

jsoriano commented May 28, 2021

TL;DR: +1 to add the daemonset fields. For the controller/workload fields, at first glance it looks tempting to do that, but after thinking a bit more I wonder whether this would be so useful, especially considering that:

  • There are going to be resources that belong to a chain of controllers, and there are use cases for keeping all of them stored.
  • Different controller/workload kinds have different natures, so they are probably going to be queried in different ways in any case.
  • We may need to duplicate data for this.

2. Introduce new fields kubernetes.controller/workload.name and kubernetes.controller/workload.type. We need to decide about controller vs workload. I think workload is more user friendly.

We have to take into account that there can be a chain of controllers for a single resource. For instance, when a Pod is created by a Deployment, its controller is a ReplicaSet controlled by the Deployment. If we consider CRDs, as suggested in the description, a Deployment could itself be controlled by another resource. So, from the technical point of view, the controller of the pod would be the ReplicaSet, the controller of the ReplicaSet would be the Deployment, and the controller of the Deployment would be the CRD, but the user is usually interested only in the Deployment, or the CRD. On the other hand, there may be pods controlled only by a ReplicaSet created directly by an operator (or a CRD) without a Deployment (there was a time when Deployments didn't even exist and we deployed things using ReplicationControllers, the legacy resource replaced by ReplicaSet).
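To make the chain concrete, here is a minimal sketch (ours, not the actual Beats/Metricbeat code) of how the top-level workload of a pod could be resolved with client-go, assuming a clientset is available and leaving CRDs aside:

```go
package workload

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// topLevelWorkload follows the controller chain described above:
// Pod -> ReplicaSet -> Deployment, or directly Pod -> DaemonSet/StatefulSet/Job.
func topLevelWorkload(ctx context.Context, client kubernetes.Interface, pod *corev1.Pod) (kind, name string, err error) {
	owner := metav1.GetControllerOf(pod)
	if owner == nil {
		return "", "", nil // bare pod with no controller
	}
	// A Deployment never owns a pod directly; it owns a ReplicaSet, which in
	// turn owns the pod, so one extra lookup is needed for this case.
	if owner.Kind == "ReplicaSet" {
		rs, err := client.AppsV1().ReplicaSets(pod.Namespace).Get(ctx, owner.Name, metav1.GetOptions{})
		if err != nil {
			return "", "", err
		}
		if rsOwner := metav1.GetControllerOf(rs); rsOwner != nil {
			return rsOwner.Kind, rsOwner.Name, nil // usually the Deployment
		}
		return "ReplicaSet", rs.Name, nil // standalone ReplicaSet
	}
	return owner.Kind, owner.Name, nil // DaemonSet, StatefulSet, Job, ...
}
```

In the real code this lookup would presumably go through whatever watchers/caches are already in place; the sketch is only meant to illustrate the extra hop for the Deployment case.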

We will have to decide what to store as workload type and name for any given resource that can have a chain of controllers. If we leave CRDs aside for now, we probably want to store the top-level controller in these fields, but there can still be tricky cases.

A use case I would like to highlight is a failing upgrade of a Deployment. When a deployment is modified, it creates a new ReplicaSet to start containers with the new version, while the old ones are stopped at the configured pace. If there are problems, operators will probably want to check metrics and logs of both ReplicaSets separately, to compare or to check what might be going wrong in the new one. Even if the upgrade doesn't fail, on big deployments operators may want to compare the metrics to see that everything goes well in the new version and revert if needed. So, Deployments are more useful in general and for inventory purposes, but ReplicaSets are also useful for certain operations, or to compare the behaviour of different deployed versions.

Another point is about leveraging this in the UI: there is a limited set of Workloads (~4), and they have different natures, so we may still want to consume their data differently. For example, for Deployments and ReplicaSets you want to see the health of the instances, but for CronJobs/Jobs you are more interested in a list of the last executions and their results. If implemented, this will probably lead to different UIs, and then it would probably be OK to have this in different fields, as it is now.

3. Use aliases to improve storage efficiency (see #23585 for relevant information). Store only kubernetes.deployment.name or kubernetes.daemonset.name and then alias kubernetes.workload.name to this.

I don't think we can do that: we would need different aliases for data stored in the same index/data stream, which is not possible. kubernetes.workload.name could be an alias to only one of the other fields, but not to both depending on the case. So we would need to duplicate this data.

@eedugon
Contributor Author

eedugon commented Jun 2, 2021

@jsoriano, I don't think we have to replicate exactly the data model used in Kubernetes in terms of who controls what.

In my view, the majority of users don't create ReplicaSets directly, and in some cases they might not even be aware of their existence. Also, ReplicaSets usually have dynamic names (when created by a different workload), so it's difficult to track them, as they won't be easily recognized by users. I'm not saying we shouldn't store the associated ReplicaSet, as it could be useful in some cases and it wouldn't make sense to lose that information.

Regarding CRD support, I guess that's a totally different scope; sorry for bringing it up here. It probably has many more implications than I thought.

In my view, consolidating the potential deployment, statefulset, daemonset, job, etc. name and the type of high-level workload (deployment, statefulset, etc.) in the same fields (even if we have to duplicate some information) would add a lot of flexibility to dashboards and visualizations focused on pods, as it would allow creating selection boxes to filter different types of workloads. Anyway, we can definitely move this conversation somewhere else, as the issue was created just for the DaemonSet metadata that was missing :-D (sorry for the off-topic!)

Of course I agree with your technical view at the low level, where the relations are more complex; I'm just suggesting we offer a simplification of those relations, trying to cover the most common use cases.

I also agree that different workloads have different natures and it might be interesting to have dedicated visualizations per workload type. In that case the consolidation of fields wouldn't add much benefit, I totally agree. I was more focused on pod analysis.

@eedugon
Contributor Author

eedugon commented Jun 2, 2021

Just thinking out loud now :)

We could have:

  • kubernetes.workload.name + type: for high-level workloads like Deployment, StatefulSet, etc. (or keep the separate field names, which would force us to add extra fields in the future if a new type is supported).

  • kubernetes.crd.name + type: for CRD support (if we support this in the future, and I don't know whether this is technically easy or even feasible)

For example, an Elasticsearch pod could be shown as part of an Elasticsearch resource and as part of a StatefulSet. Then the ReplicaSet could still be stored in the same field as it is now, in case more in-depth filtering is needed.
If there are more intermediate levels of ownership, we should consider whether they are really useful or not.

@jsoriano
Member

jsoriano commented Jun 3, 2021

In my view, consolidating the potential deployment, statefulset, daemonset, job, etc. name and the type of high-level workload (deployment, statefulset, etc.) in the same fields (even if we have to duplicate some information) would add a lot of flexibility to dashboards and visualizations focused on pods, as it would allow creating selection boxes to filter different types of workloads.

Could you describe some use cases where this would be important in visualizations focused on pods? I see it can be useful to see which workload controls the pod, but I don't see under what use case this is so relevant. kubectl doesn't explicitly show this information in commands focused on pods.
We can implement this and have this kind of view, but if they are not so relevant in real use cases, should we make the effort?

On the other hand I see several use cases where having specific visualizations of specific workloads can be really useful:

  • For deployments, a view that answers how many healthy pods there are, how they are performing, and, during a release, how many pods are pending upgrade...
  • For daemonsets, a view that answers whether they are all healthy and helps to understand which nodes are running it and which ones aren't.
  • For statefulsets, apart from the health of the pods, their view could also show information about their volume claims.
  • For cronjobs, a list of the last executions and metrics about them.

This is a bit like the "pets vs. cattle" thing: pods would be the pets and workloads would be the cattle, and you are usually more interested in the cattle point of view. In this case each kind of cattle has its own particularities.

Anyway, we can definitely move this conversation somewhere else, as the issue was created just for the DaemonSet metadata that was missing :-D (sorry for the off-topic!)

No worries, this is a very interesting discussion 🙂 Maybe we can create a different meta issue for adding the metadata of other controllers/workload types, and keep the discussion about adding common workload fields here.

In my view, the majority of users don't create ReplicaSets directly, and in some cases they might not even be aware of their existence. Also, ReplicaSets usually have dynamic names (when created by a different workload), so it's difficult to track them, as they won't be easily recognized by users. I'm not saying we shouldn't store the associated ReplicaSet, as it could be useful in some cases and it wouldn't make sense to lose that information.

  • kubernetes.workload.name + type: for high-level workloads like Deployment, StatefulSet, etc. (or keep the separate field names, which would force us to add extra fields in the future if a new type is supported).

I think that keeping separate fields will be needed at least for the Deployment/ReplicaSet case. Even if the ReplicaSet is a bit of an implementation detail in Kubernetes, and not usually created on its own, it allows distinguishing one deployed version from others, which is useful for monitoring rolling upgrades, or for comparing the historical performance of different versions of the same deployment (other fields could be used for this too, but this one is the native way to do it in Kubernetes).

Regarding forcing us to add extra fields: in any case, when we want to support a new type we need to dedicate some effort to collect it and test it, so adding the extra field would only be a small addition on top of that. Depending on how the workload name and type were collected, it could require additional effort in any case; for example, the Deployment cannot be obtained directly from the Pod, you need to obtain it from the ReplicaSet.

There are not so many types of workloads, so being able to add a single field instead of one per type is not much of an advantage. If this were a completely generic thing, I would totally agree with having generic fields.

@ChrsMark
Member

ChrsMark commented Jul 9, 2021

I'm +1 on adding daemonset.name for now and continuing the discussion about the workload fields in a separate issue.

In this regard, I think we can close this one when #26808 is merged and continue with the follow-up issue. @jsoriano what do you think?

@MichaelKatsoulis if we agree on this, could you please take care of the follow-up issue so that we don't lose the discussion around workload fields?

@jsoriano
Member

jsoriano commented Jul 9, 2021

In this regard, I think we can close this one when #26808 is merged and continue with the follow-up issue. @jsoriano what do you think?

SGTM, thanks @ChrsMark and @MichaelKatsoulis!

@MichaelKatsoulis
Contributor

I have created #26817 as a follow-up issue.
