Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not emit execution id label by default in single binary #5704

Merged
merged 2 commits into from
Aug 29, 2024

Conversation

eapolinario
Copy link
Contributor

Tracking issue

Closes #3991

Why are the changes needed?

Single binary emits execution-id labels to a bunch of flytepropeller and flyteadmin metrics, but since the golang prometheus golang client is setup in such a way that no metric is ever dropped (as described in prometheus/client_golang#920 (comment)), this ends up causing an apparent memory leak due to how these high cardinality metrics are stored in-memory.

What changes were proposed in this pull request?

This PR removes the execution-id label from the single binary metrics and adds a comment to hint at the potential problems caused by that setting.

I confirmed that the metric is not enabled by default in any component, nor is set in any of the shipped helm charts, while being left as a last resort debugging tool.

One could argue that we should never have had that label in the first place and instead reach out to other debugging means, like open telemetry, etc, but that's a separate discussion.

How was this patch tested?

I had a local test where I'd have a workflow composed of a single cached task. Running that in a tight loop and confirming that the increase in memory of the single binary executable is now orders of magnitude slower than before the PR.

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Copy link

codecov bot commented Aug 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 36.17%. Comparing base (5c147ef) to head (cc714d3).
Report is 155 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5704   +/-   ##
=======================================
  Coverage   36.17%   36.17%           
=======================================
  Files        1302     1302           
  Lines      109614   109614           
=======================================
+ Hits        39653    39658    +5     
+ Misses      65816    65810    -6     
- Partials     4145     4146    +1     
Flag Coverage Δ
unittests-datacatalog 51.37% <ø> (ø)
unittests-flyteadmin 55.33% <ø> (+0.03%) ⬆️
unittests-flytecopilot 12.17% <ø> (ø)
unittests-flytectl 62.22% <ø> (+0.04%) ⬆️
unittests-flyteidl 7.12% <ø> (ø)
unittests-flyteplugins 53.34% <ø> (ø)
unittests-flytepropeller 41.70% <ø> (-0.02%) ⬇️
unittests-flytestdlib 55.33% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

wild-endeavor
wild-endeavor previously approved these changes Aug 29, 2024
Co-authored-by: Thomas J. Fan <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
@eapolinario eapolinario enabled auto-merge (squash) August 29, 2024 15:23
@davidmirror-ops
Copy link
Contributor

Thank you! This is great as it simplifies a bit some of the queries in the Grafana user dashboard (example)

@eapolinario eapolinario merged commit 27d9746 into master Aug 29, 2024
50 checks passed
@eapolinario eapolinario deleted the do-not-emit-execution-id-labels-in-single-binary branch August 29, 2024 20:01
pmahindrakar-oss pushed a commit that referenced this pull request Sep 9, 2024
* Do not emit execution id label by default in single binary

Signed-off-by: Eduardo Apolinario <[email protected]>

* Update flytestdlib/contextutils/context.go

Co-authored-by: Thomas J. Fan <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>

---------

Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Thomas J. Fan <[email protected]>
Signed-off-by: pmahindrakar-oss <[email protected]>
bgedik pushed a commit to bgedik/flyte that referenced this pull request Sep 12, 2024
…5704)

* Do not emit execution id label by default in single binary

Signed-off-by: Eduardo Apolinario <[email protected]>

* Update flytestdlib/contextutils/context.go

Co-authored-by: Thomas J. Fan <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>

---------

Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Eduardo Apolinario <[email protected]>
Co-authored-by: Thomas J. Fan <[email protected]>
Signed-off-by: Bugra Gedik <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Investigate memory leak in single-binary deployment
5 participants