[Investigation] Understand latency of Admin list task executions #328

Closed
2 of 20 tasks
EngHabu opened this issue May 27, 2020 · 2 comments
Labels: bug (Something isn't working)


EngHabu commented May 27, 2020

Describe the bug
Listing task executions in FlyteAdmin is slow. We should figure out why and improve performance. If performance cannot be improved, we need to find a better solution.

Expected behavior
We think this query should be faster than it is. It currently takes about a second, which is much too slow.

SELECT
  task_executions.*
FROM
  task_executions
  LEFT JOIN tasks
    ON task_executions.project = tasks.project
    AND task_executions.domain = tasks.domain
    AND task_executions.name = tasks.name
    AND task_executions.version = tasks.version
  INNER JOIN node_executions
    ON task_executions.node_id = node_executions.node_id
    AND task_executions.execution_project = node_executions.execution_project
    AND task_executions.execution_domain = node_executions.execution_domain
    AND task_executions.execution_name = node_executions.execution_name
  INNER JOIN executions
    ON node_executions.execution_project = executions.execution_project
    AND node_executions.execution_domain = executions.execution_domain
    AND node_executions.execution_name = executions.execution_name
WHERE
  "task_executions"."deleted_at" IS NULL
  AND ((executions.execution_name = 'f939d0eea6e9c479981b')
  AND (executions.execution_project = 'ourgenesisteam')
  AND (executions.execution_domain = 'production')
  )

Anand and Katrina have already looked through the query plan, and nothing looks amiss: all the right indices are being hit. Debugging this, and understanding whether we have a fundamental problem with our data model, will require a deeper look than the query plan alone.
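A minimal sketch of how this kind of query-plan inspection can be reproduced outside production, using SQLite's `EXPLAIN QUERY PLAN` through Python's built-in `sqlite3` module. This rests on assumptions: SQLite stands in for FlyteAdmin's Postgres database, and the trimmed-down schema and primary keys below are hypothetical, modelling only the columns the list query joins and filters on. SQLite reports, per table, whether it scans the table in full (`SCAN`) or seeks through an index (`SEARCH ... USING INDEX`), and which columns constrain the seek. On Postgres itself the counterpart is `EXPLAIN (ANALYZE, BUFFERS)` followed by the query, which also reports actual row counts and timings.

```python
import sqlite3

# Hypothetical, trimmed-down schema: SQLite stands in for FlyteAdmin's Postgres
# database, and only the columns the list query touches are modelled.
SCHEMA = """
CREATE TABLE executions (
    execution_project TEXT, execution_domain TEXT, execution_name TEXT,
    PRIMARY KEY (execution_project, execution_domain, execution_name));
CREATE TABLE tasks (
    project TEXT, domain TEXT, name TEXT, version TEXT,
    PRIMARY KEY (project, domain, name, version));
CREATE TABLE node_executions (
    execution_project TEXT, execution_domain TEXT, execution_name TEXT,
    node_id TEXT,
    PRIMARY KEY (execution_project, execution_domain, execution_name, node_id));
CREATE TABLE task_executions (
    project TEXT, domain TEXT, name TEXT, version TEXT,
    execution_project TEXT, execution_domain TEXT, execution_name TEXT,
    node_id TEXT, retry_attempt INTEGER, deleted_at TEXT,
    PRIMARY KEY (project, domain, name, version, execution_project,
                 execution_domain, execution_name, node_id, retry_attempt));
"""

# The list query from the issue, with the literal filter values as parameters.
QUERY = """
SELECT task_executions.*
FROM task_executions
LEFT JOIN tasks
  ON task_executions.project = tasks.project
  AND task_executions.domain = tasks.domain
  AND task_executions.name = tasks.name
  AND task_executions.version = tasks.version
INNER JOIN node_executions
  ON task_executions.node_id = node_executions.node_id
  AND task_executions.execution_project = node_executions.execution_project
  AND task_executions.execution_domain = node_executions.execution_domain
  AND task_executions.execution_name = node_executions.execution_name
INNER JOIN executions
  ON node_executions.execution_project = executions.execution_project
  AND node_executions.execution_domain = executions.execution_domain
  AND node_executions.execution_name = executions.execution_name
WHERE task_executions.deleted_at IS NULL
  AND executions.execution_name = ?
  AND executions.execution_project = ?
  AND executions.execution_domain = ?
"""


def explain(conn, sql, params=()):
    """Return the human-readable 'detail' column of each plan row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
plan = explain(conn, QUERY, ("f939d0eea6e9c479981b", "ourgenesisteam", "production"))
for step in plan:
    print(step)
```

A plan from an empty sandbox database only shows which access path the planner chooses; it says nothing about how much data a production table holds, so latency still has to be measured against realistic data.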

Flyte component

  • Overall
  • Flyte Setup and Installation scripts
  • Flyte Documentation
  • Flyte communication (slack/email etc)
  • FlytePropeller
  • FlyteIDL (Flyte specification language)
  • Flytekit (Python SDK)
  • FlyteAdmin (Control Plane service)
  • FlytePlugins
  • DataCatalog
  • FlyteStdlib (common libraries)
  • FlyteConsole (UI)
  • Other

Environment
Flyte component

  • Sandbox (local or on one machine)
  • Cloud hosted
    • AWS
    • GCP
    • Azure
  • Baremetal
  • Other

Additional context
NA

EngHabu added the bug (Something isn't working) and untriaged (This issue has not yet been looked at by the Maintainers) labels May 27, 2020

EngHabu commented May 27, 2020

CC @wild-endeavor

wild-endeavor self-assigned this Jun 25, 2020
wild-endeavor changed the title from "[Investigation] Flyte Admin 95th Percentile read latency is bursty" to "[Investigation] Understand latency of Admin list task executions" Jun 27, 2020

katrogan commented Sep 1, 2020

For context (thank you @anandswaminathan for the pointers): the list-task-executions-for-node query was filtering on a subset of the primary key fields (execution project, execution domain, execution name, node id) for which there was no index. Adding an index for this access pattern brought query latency down from the order of seconds to ~0.1 ms.
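A minimal sketch of why such an index helps, again using Python's built-in `sqlite3` module as a stand-in for FlyteAdmin's Postgres database. The table layout, the primary key column order, and the index name `task_executions_exec_node_idx` are assumptions for illustration, not FlyteAdmin's actual migration. The sketch demonstrates the leftmost-prefix rule for composite B-tree indexes: a key that starts with `project, domain, name, version` cannot be used to seek on `execution_project, execution_domain, execution_name, node_id` alone, whereas a separate index that leads with those four columns can.

```python
import sqlite3

# Hypothetical schema: the composite primary key leads with the task
# identifier; the execution/node columns come later in the key.
SCHEMA = """
CREATE TABLE task_executions (
    project TEXT, domain TEXT, name TEXT, version TEXT,
    execution_project TEXT, execution_domain TEXT, execution_name TEXT,
    node_id TEXT, retry_attempt INTEGER, phase TEXT,
    PRIMARY KEY (project, domain, name, version, execution_project,
                 execution_domain, execution_name, node_id, retry_attempt));
"""

# "List task executions for a node" filters only on trailing key columns.
NODE_QUERY = """
SELECT * FROM task_executions
WHERE execution_project = ? AND execution_domain = ?
  AND execution_name = ? AND node_id = ?
"""
# Illustrative values; "n0" is a hypothetical node id.
PARAMS = ("ourgenesisteam", "production", "f939d0eea6e9c479981b", "n0")


def query_plan(conn):
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + NODE_QUERY, PARAMS)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# No index leads with the filtered columns, so the planner falls back to a scan.
before = query_plan(conn)

# Hypothetical index whose leading columns match the filter exactly.
conn.execute(
    "CREATE INDEX task_executions_exec_node_idx ON task_executions "
    "(execution_project, execution_domain, execution_name, node_id)"
)
after = query_plan(conn)

print("before:", before)
print("after: ", after)
```

Postgres behaves somewhat differently: its documentation on multicolumn indexes notes that a B-tree index can still be used when only non-leading columns are constrained, but those constraints do not reduce the portion of the index that has to be scanned. That could explain a plan that appears to use an index while the query stays slow, though this thread does not confirm it.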

katrogan closed this as completed Sep 2, 2020
wild-endeavor removed the untriaged (This issue has not yet been looked at by the Maintainers) label Sep 2, 2020
wild-endeavor added this to the 0.7.0 milestone Sep 2, 2020
eapolinario pushed a commit to eapolinario/flyte that referenced this issue Dec 6, 2022
* CRD validation

Signed-off-by: Kevin Su <[email protected]>

* CRD validation

Signed-off-by: Kevin Su <[email protected]>

* Revert json tag name

Signed-off-by: Kevin Su <[email protected]>

* Address comment

Signed-off-by: Kevin Su <[email protected]>

* Address comment

Signed-off-by: Kevin Su <[email protected]>

* Fixed test

Signed-off-by: Kevin Su <[email protected]>

* Rebase

Signed-off-by: Kevin Su <[email protected]>

* Updated CRD

Signed-off-by: Kevin Su <[email protected]>

* test

Signed-off-by: Kevin Su <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this issue Dec 6, 2022
eapolinario pushed a commit to eapolinario/flyte that referenced this issue Dec 20, 2022
…w standart #patch (flyteorg#328)

* chore: update eslint and prettier, use arbnb rules as base for standart
* chore: prettier run through p.1 - p8
* chore: eslint shuffle imports
* chore: eslint set few rules to warn, so we can start working on them

Signed-off-by: Nastya Rusina <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this issue Dec 20, 2022
eapolinario pushed a commit to eapolinario/flyte that referenced this issue Aug 9, 2023
eapolinario pushed a commit to eapolinario/flyte that referenced this issue Aug 9, 2023
eapolinario pushed a commit to eapolinario/flyte that referenced this issue Apr 30, 2024
austin362667 pushed a commit to austin362667/flyte that referenced this issue May 7, 2024
robert-ulbrich-mercedes-benz pushed a commit to robert-ulbrich-mercedes-benz/flyte that referenced this issue Jul 2, 2024
troychiu pushed a commit that referenced this issue Jul 8, 2024
## Overview
This PR adds support for managing orphaned pods. This is an extreme corner case that may be impossible to hit, but handling it ensures we do not orphan pods when GCing fasttask environments.

## Test Plan
This was tested locally, and unit tests were added.

## Rollout Plan (if applicable)
May be rolled out immediately.

## Upstream Changes
Should this change be upstreamed to OSS (flyteorg/flyte)? If not, please uncheck this box, which is used for auditing. Note, it is the responsibility of each developer to actually upstream their changes. See [this guide](https://unionai.atlassian.net/wiki/spaces/ENG/pages/447610883/Flyte+-+Union+Cloud+Development+Runbook/#When-are-versions-updated%3F).
- [ ] To be upstreamed to OSS

## Issue
https://linear.app/unionai/issue/COR-993/support-adding-pods-to-an-orphaned-environment

## Checklist
* [x] Added tests
* [ ] Ran a deploy dry run and shared the terraform plan
* [ ] Added logging and metrics
* [ ] Updated [dashboards](https://unionai.grafana.net/dashboards) and [alerts](https://unionai.grafana.net/alerting/list)
* [x] Updated documentation

3 participants