Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Add endpoint exposing runtime metrics #524

Merged
merged 29 commits into from
Mar 27, 2023
Merged

Conversation

hamersaw
Copy link
Contributor

@hamersaw hamersaw commented Feb 16, 2023

TL;DR

This PR implements an endpoint to expose runtime metrics from flyteadmin.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

^^^

Tracking Issue

flyteorg/flyte#3272

Follow-up issue

NA

Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
@codecov
Copy link

codecov bot commented Feb 23, 2023

Codecov Report

Merging #524 (dad0ed5) into master (680cac3) will increase coverage by 1.78%.
The diff coverage is 67.35%.

❗ Current head dad0ed5 differs from pull request most recent head a4a2330. Consider uploading reports for the commit a4a2330 to get more accurate results

@@            Coverage Diff             @@
##           master     #524      +/-   ##
==========================================
+ Coverage   58.51%   60.29%   +1.78%     
==========================================
  Files         169      170       +1     
  Lines       15558    13144    -2414     
==========================================
- Hits         9103     7925    -1178     
+ Misses       5654     4385    -1269     
- Partials      801      834      +33     
Flag Coverage Δ
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/rpc/adminservice/base.go 3.88% <0.00%> (+0.34%) ⬆️
pkg/rpc/adminservice/execution.go 0.00% <0.00%> (ø)
pkg/manager/impl/metrics_manager.go 70.83% <70.83%> (ø)
pkg/repositories/transformers/node_execution.go 74.04% <84.61%> (+2.86%) ⬆️
pkg/repositories/transformers/task_execution.go 76.03% <84.61%> (+2.28%) ⬆️
pkg/rpc/adminservice/metrics.go 100.00% <100.00%> (ø)

... and 150 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@hamersaw hamersaw requested a review from eapolinario as a code owner March 17, 2023 16:32
Signed-off-by: Daniel Rammer <[email protected]>
Signed-off-by: Daniel Rammer <[email protected]>
EngHabu
EngHabu previously approved these changes Mar 24, 2023
// iterate over task executions
for index, taskExecution := range taskExecutions {
if index > 0 {
*spans = append(*spans, createOperationSpan(taskExecutions[index-1].Closure.UpdatedAt, taskExecution.Closure.CreatedAt, nodeReset))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this efficient?
Is it equivalent to:

newSpans := *spans
...
newSpans = append(newSpans, ...)
...
...
*spans = newSpans

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, everything I've read says this is the same efficiency.

pkg/manager/impl/metrics_manager.go Outdated Show resolved Hide resolved
@hamersaw hamersaw merged commit cca9e17 into master Mar 27, 2023
@hamersaw hamersaw deleted the feature/runtime-metrics branch March 27, 2023 20:01
eapolinario pushed a commit that referenced this pull request Sep 6, 2023
* metrics manager working for task nodes

Signed-off-by: Daniel Rammer <[email protected]>

* dynamic tasks working

Signed-off-by: Daniel Rammer <[email protected]>

* subworkflow node working

Signed-off-by: Daniel Rammer <[email protected]>

* refactored to allow branch and gate node parsing

Signed-off-by: Daniel Rammer <[email protected]>

* added node transition times

Signed-off-by: Daniel Rammer <[email protected]>

* sorting node and task executions

Signed-off-by: Daniel Rammer <[email protected]>

* added metrics metrics ... inception

Signed-off-by: Daniel Rammer <[email protected]>

* fixed duplicate metrics

Signed-off-by: Daniel Rammer <[email protected]>

* branch node working?

Signed-off-by: Daniel Rammer <[email protected]>

* implemented gate node

Signed-off-by: Daniel Rammer <[email protected]>

* working with partial completions and failures and cache hits

Signed-off-by: Daniel Rammer <[email protected]>

* added unit tests

Signed-off-by: Daniel Rammer <[email protected]>

* fixed most lint issues

Signed-off-by: Daniel Rammer <[email protected]>

* fixed lint issues

Signed-off-by: Daniel Rammer <[email protected]>

* added docs

Signed-off-by: Daniel Rammer <[email protected]>

* updated flyteidl dependency

Signed-off-by: Daniel Rammer <[email protected]>

* fixed unit tests

Signed-off-by: Daniel Rammer <[email protected]>

* chnaged to local flyteidl for testing

Signed-off-by: Daniel Rammer <[email protected]>

* using task event reported_at timestamps for updated_at in task execution models

Signed-off-by: Daniel Rammer <[email protected]>

* updated flyteidl

Signed-off-by: Daniel Rammer <[email protected]>

* fixed lint issues

Signed-off-by: Daniel Rammer <[email protected]>

* using reported_at for node executions

Signed-off-by: Daniel Rammer <[email protected]>

* updated flyteidl

Signed-off-by: Daniel Rammer <[email protected]>

* fixes unit tests and lint issues

Signed-off-by: Daniel Rammer <[email protected]>

* updated flyteidl

Signed-off-by: Daniel Rammer <[email protected]>

* bumped flyteidl deps

Signed-off-by: Daniel Rammer <[email protected]>

* using consts for start-node and end-node

Signed-off-by: Daniel Rammer <[email protected]>

---------

Signed-off-by: Daniel Rammer <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants