[UI Feature] UX improvement when running out of project quota #3357

Open
2 tasks done
pradithya opened this issue Feb 21, 2023 · 3 comments
Labels
enhancement New feature or request stale ui Admin console user interface

Comments

@pradithya
Member

Motivation: Why do you think this is important?

Flyte can enforce CPU and memory quotas per project. We use this feature to prevent any single project from consuming all of the shared Kubernetes cluster's resources.

As our workflows grow larger, projects hit their quota limit more often, which delays workflow execution.

Currently, there are several UX areas that could be improved to handle the project quota.

  1. When a task is scheduled but the host project has run out of quota, the timeline page still shows the task state as running. Ideally, the running state should be entered only when there is a pod executing the task.

Screenshot 2023-02-21 at 2 19 20 PM

  2. There should be a way to inform users that their workflow execution is affected by project quota. As of now, we can only see this information in the FlytePropeller log:
Failed to launch job, resource quota exceeded. err: [BackOffError] The operation was attempted but failed, caused by: pods "fec52e914a8c0b0b7000-n3-0" is forbidden: exceeded quota: project-quota, requested: limits.memory=64Gi, used: limits.memory=192Gi, limited: limits.memory=200Gi
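
To illustrate how the quota details in this log line could be surfaced to users, here is a minimal Python sketch that extracts the quota name and the requested, used, and limited amounts from the message. The function names (`parse_quota_error`, `_parse_resources`) are hypothetical and not part of Flyte. The regular expression assumes the message layout shown above, and the handling of several comma-separated resources per section is an assumption.

```python
import re
from typing import Optional

# Assumption: the message follows the layout seen in the log line above.
QUOTA_RE = re.compile(
    r"exceeded quota: (?P<quota>[^,]+), "
    r"requested: (?P<requested>.+?), "
    r"used: (?P<used>.+?), "
    r"limited: (?P<limited>\S+)"
)


def _parse_resources(text: str) -> dict:
    """Turn 'limits.memory=64Gi,limits.cpu=2' into a dict (hypothetical helper)."""
    pairs = (item.split("=", 1) for item in text.split(",") if "=" in item)
    return {name.strip(): value.strip() for name, value in pairs}


def parse_quota_error(message: str) -> Optional[dict]:
    """Return the quota details, or None if the message is not a quota error."""
    match = QUOTA_RE.search(message)
    if match is None:
        return None
    return {
        "quota": match.group("quota"),
        "requested": _parse_resources(match.group("requested")),
        "used": _parse_resources(match.group("used")),
        "limited": _parse_resources(match.group("limited")),
    }


message = (
    'pods "fec52e914a8c0b0b7000-n3-0" is forbidden: exceeded quota: '
    "project-quota, requested: limits.memory=64Gi, "
    "used: limits.memory=192Gi, limited: limits.memory=200Gi"
)
info = parse_quota_error(message)
print(info)
```

A UI or CLI could use such a structure to show, for example, that the task requested 64Gi of memory while 192Gi of the 200Gi limit was already in use.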

Goal: What should the final outcome look like, ideally?

  • Users should be able to identify quickly if their workflow execution is affected by project quota.
  • The transition to the "RUNNING" state should be accurate: a task should be shown as running only once a pod is executing it.

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@pradithya pradithya added enhancement New feature or request ui Admin console user interface untriaged This issues has not yet been looked at by the Maintainers labels Feb 21, 2023
@hamersaw
Contributor

hamersaw commented Mar 6, 2023

@pradithya thanks so much for the insight here! We have been working pretty diligently to improve observability. Have you had a chance to look at the performance observability RFC? Specifically the "Runtime Metrics" section.

The current plan is to update this timeline view to break down the execution into separate categories, for example platform-level overhead, plugin overhead, plugin execution, etc. The main goal here is to offer users better observability into what Flyte is doing. In this example, Flyte would never enter the plugin execution phase (i.e. the task in the 'RUNNING' phase), which would indicate that Flyte is unable to schedule the tasks. Do you think this is fine-grained enough? The difficulty here is handling all of the failure / queued scenarios, and this is further complicated when executing different plugin types.

Re "There should be a way to inform users that their workflow execution is affected by project quota." This message should already be displayed in the task status pane of the UI, which is meant to show the latest task status. If that is not happening, we should handle it properly.

Would be very interested in hearing your thoughts on this. The PRs for the backend implementation of performance observability runtime metrics are all open for review. However, we haven't had a discussion about exactly what this looks like in the UI. I think your proposal above is very similar to what we were thinking.

@pradithya
Member Author

@hamersaw Thanks for the RFC, that's quite comprehensive and I enjoyed reading it!

The current plan is to update this timeline view to break down the execution into separate categories, for example platform-level overhead, plugin overhead, plugin execution, etc.

The visualisation in the RFC is precisely what I would like to see in Flyte.

Do you think this is fine-grained enough?

If you are referring to breaking down the task state to the granularity of TaskExecution.Phase, then it is fine-grained enough. One of the phases (WAITING_FOR_RESOURCES) is straightforward enough for users to infer that their workflow was stuck due to resource unavailability.
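
As a rough illustration of this idea, the Python sketch below maps a task phase to the label a timeline view might show, so that a task blocked on quota is not reported as running. The `Phase` enum is a local stand-in written for this example, not imported from Flyte; only RUNNING and WAITING_FOR_RESOURCES are named in this thread, and the other members, the `timeline_label` function, and the label wording are assumptions.

```python
from enum import Enum


class Phase(Enum):
    # Local stand-in for TaskExecution.Phase; members other than RUNNING and
    # WAITING_FOR_RESOURCES are assumed for the sake of the example.
    QUEUED = "QUEUED"
    INITIALIZING = "INITIALIZING"
    WAITING_FOR_RESOURCES = "WAITING_FOR_RESOURCES"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


def timeline_label(phase: Phase, reason: str = "") -> str:
    """Return a user-facing label for a task phase (hypothetical helper)."""
    if phase is Phase.WAITING_FOR_RESOURCES:
        label = "Waiting for resources"
        # Call out project quota explicitly when the reason mentions it.
        if "exceeded quota" in reason:
            label += " (project quota exceeded)"
        return label
    # Every other phase, including RUNNING, is shown under its own name.
    return phase.value.replace("_", " ").capitalize()


print(timeline_label(Phase.WAITING_FOR_RESOURCES, "exceeded quota: project-quota"))
print(timeline_label(Phase.RUNNING))
```

With a mapping like this, "Running" would appear only once the task has actually reached the RUNNING phase.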

@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label Mar 10, 2023

github-actions bot commented Dec 6, 2023

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable.
Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot added the stale label Dec 6, 2023