Feature Request
For analytic and batch workflows, precise telemetry is essential. We need to analyze run time as well as memory and CPU usage so that we can further tune job scheduling.
It would be very useful to have a hierarchy of stats posted to metrics-server, the Prometheus PushGateway, or some configurable endpoint. All of the standard pod stats reported by metrics-server would be great.
I'm not clear on the mechanics of metrics-server -> Prometheus. I assume there is some logic that tells it to report on pods only if they were spun up by Deployments or similar. Perhaps the Argo operator could emit something when it creates new workflows to ensure they are captured in Prometheus.
I'm happy to help implement this, but I have no idea where to hook in for stat collection and reporting.
* Prometheus metrics server
* Use unstructured informer
* Fix linter errors
* Use a dedicated informer for metrics
* Pass context to RunServer. Close the http server
* Check the return value in metrics defer Close