Feature Request
For analytic and batch workflows, precise telemetry is essential. We need to analyze run time as well as memory and CPU usage so that we can further tune job scheduling.
It would be very useful to have a hierarchy of stats posted to metrics-server, the Prometheus PushGateway, or some configurable endpoint. All of the standard pod stats reported by metrics-server would be great.
I'm not clear on the mechanics of metrics-server -> Prometheus. I assume there is some logic that tells it to report on pods only if they were spun up by Deployments or similar. Perhaps the Argo operator could emit something when it creates new workflows to ensure they are captured in Prometheus.
I'm happy to help implement this, but I have no idea where to hook in for stat collection and reporting.
* Prometheus metrics server
* Use unstructured informer
* Fix linter errors
* Use a dedicated informer for metrics
* Pass context to RunServer. Close the http server
* Check the return value in metrics defer Close