Start measuring Tekton Pipelines performance #540
Hello @bobcatfish
My gut feeling is that I'd lean more toward exposing the metrics from Pipelines itself:
Question: I'm not super familiar with Prometheus; how vital would it be to making the metrics usable? Could we simply emit the metrics and allow the user to provide their own metrics-gathering mechanism (which could be Prometheus but could be something else), or would it make more sense for us to include Prometheus out of the box? (I'm very sensitive to adding new dependencies, esp. since I'm under the impression that managing Prometheus is a job in itself, but maybe I'm wrong!) Another option, which I think is a variation on your first suggestion @pradeepitm12:
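To make the "just emit the metrics" option concrete: a minimal sketch using Prometheus's Go client (`github.com/prometheus/client_golang`), which is one possible library, not necessarily what Pipelines would adopt, and a hypothetical `taskrun_duration_seconds` histogram invented purely for illustration. The point is that the process only serves an endpoint; actually running Prometheus (or any other scraper) stays entirely on the user's side.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical metric, for illustration only.
var taskRunDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "taskrun_duration_seconds",
	Help:    "Time taken for a TaskRun to complete.",
	Buckets: prometheus.DefBuckets,
})

func main() {
	prometheus.MustRegister(taskRunDuration)

	// Exposing /metrics only makes the process scrapeable; whether anything
	// scrapes it (Prometheus or something else) is the operator's choice,
	// so no Prometheus server needs to be bundled or managed by Pipelines.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```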
+1 we're looking at the same thing and just started looking at Prometheus too, hopefully we can help each other out here. 📈
Maybe the first thing to do would be to identify the metrics we're interested in? I'm not super familiar with Prometheus, but I would think that before we monitor the metrics, we'd want to figure out what needs monitoring (maybe there's a Jenkins/Jenkins X precedent we can draw on :D?)
We had our first meeting regarding observability, specifically metrics, today, and work is now underway. There are a couple of other issues that overlap in theme with this one. I am linking them together here for us to review later and figure out which to keep and which to close.
Often, as a developer or administrator (ops), I want some insight into pipeline behavior: the time taken to execute a PipelineRun/TaskRun, its success or failure ratio, pod latencies, etc. At present Tekton Pipelines has very limited ways to surface such information, and it's hard to get those details by looking at resource YAMLs. This patch exposes the above-mentioned pipeline metrics on a '/metrics' endpoint using the knative `pkg/metrics` package. Users can collect these metrics with Prometheus, Stackdriver, or another supported metrics system. To some extent this solves #540 and #164.
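For context, a minimal sketch of the pattern the patch describes: recording a measurement through knative's `pkg/metrics` (which sits on top of OpenCensus), leaving the choice of exporter to configuration. The measure name, unit, and bucket boundaries here are illustrative assumptions, not the metric names the patch actually adds.

```go
package metricsdemo

import (
	"context"
	"time"

	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"knative.dev/pkg/metrics"
)

// Illustrative measure; real names/units/buckets may differ in the patch.
var prDuration = stats.Float64(
	"pipelinerun_duration_seconds",
	"time taken to execute a PipelineRun",
	"s")

func init() {
	// A view tells OpenCensus how to aggregate recorded values; whichever
	// exporter is configured (Prometheus, Stackdriver, ...) then serves the
	// aggregate, e.g. on /metrics for Prometheus.
	if err := view.Register(&view.View{
		Description: prDuration.Description(),
		Measure:     prDuration,
		Aggregation: view.Distribution(10, 30, 60, 300, 900),
	}); err != nil {
		panic(err)
	}
}

// reportPipelineRunDuration records one completed run's duration.
func reportPipelineRunDuration(ctx context.Context, start, end time.Time) {
	metrics.Record(ctx, prDuration.M(end.Sub(start).Seconds()))
}
```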
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
We haven't worked on this lately but it is an item on our roadmap and I think we should keep it open. /lifecycle frozen
I want to start gathering some requirements around this and get it moving :D
#3521 has some use cases that we might be able to use
This PR starts a TEP to begin to measure Tekton Pipelines performance and address tektoncd/pipeline#540. This first iteration just tries to describe the problem rather than suggest a solution. It DOES recommend measuring SLOs and SLIs as a goal, which is kind of part of the solution, so if we think it's useful we could step back even further, but I think this is a reasonable path forward. Curious what other folks think!
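To make "measuring SLIs" concrete, here's a hedged sketch of one candidate SLI: PipelineRun start latency, computed from object timestamps. This assumes Tekton's `v1beta1` API types and is just an example of the shape of an SLI, not one the TEP actually defines; an SLO would then be a target over it, e.g. "p95 start latency under 5s".

```go
package slis

import (
	"time"

	"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
)

// startLatency is one candidate SLI: how long a PipelineRun sat between
// being created and actually being started by the controller. The second
// return value is false when the run hasn't started, so there is nothing
// to measure yet.
func startLatency(pr *v1beta1.PipelineRun) (time.Duration, bool) {
	if pr.Status.StartTime == nil {
		return 0, false
	}
	return pr.Status.StartTime.Sub(pr.CreationTimestamp.Time), true
}
```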
@bobcatfish, is there a Tekton performance white paper anywhere? For example, how many PipelineRuns or Runs can we support so far on a mid-sized cluster (say, 1 master + 1 compute node)?
Expected Behavior
We should be measuring performance for Pipelines. This task includes both adding the actual measurement mechanism and the design of what exactly we want to measure.
Some ideas for measurement:
Requirements
Actual Behavior
We do not measure or track this.
Additional Info