Adding functionality to support CRD validation for flyteworkflow #353

Closed
wants to merge 5 commits into from

Conversation

bnsblue
Contributor

@bnsblue bnsblue commented Jun 16, 2020

No description provided.

eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Dec 6, 2022
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Dec 20, 2022
…lyteorg#353)

* improvement: allow to enable/disable content based on admin version
* docs: add info about FeatureFlags and LocalStorage usage

Signed-off-by: Nastya Rusina <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Dec 20, 2022
* Add support dev cluster

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* make generate

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

Signed-off-by: Kevin Su <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Jul 24, 2023
* Lazy load grpc plugin

Signed-off-by: Kevin Su <[email protected]>

* rename

Signed-off-by: Kevin Su <[email protected]>

* rename

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

* rename

Signed-off-by: Kevin Su <[email protected]>

* lint

Signed-off-by: Kevin Su <[email protected]>

* nit

Signed-off-by: Kevin Su <[email protected]>

---------

Signed-off-by: Kevin Su <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Aug 9, 2023
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Aug 21, 2023
@github-actions github-actions bot added the stale label Aug 26, 2023
@github-actions github-actions bot closed this Sep 2, 2023
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Apr 30, 2024
austin362667 pushed a commit to austin362667/flyte that referenced this pull request May 7, 2024
robert-ulbrich-mercedes-benz pushed a commit to robert-ulbrich-mercedes-benz/flyte that referenced this pull request Jul 2, 2024
troychiu pushed a commit that referenced this pull request Jul 8, 2024
## Overview
This PR addresses two separate issues. (1) The fasttask plugin can assign tasks to a worker that is already at full capacity (parallelism + backlog_length). The worker then transparently drops these tasks and the plugin fails over to another worker. (2) The fasttask plugin has no notion of `PhaseVersion`, which FlytePropeller uses to determine whether any updates have occurred and, consequently, whether it needs to store state to etcd. This means that updates to the `LastAccessedAt` field on the fasttask plugin state are never persisted. Therefore, all backlogged tasks fail once the grace period elapses, regardless of whether the worker heartbeat sends updates.

The former is addressed by making the worker backlog_length a suggestion, similar to how `max-parallelism` is applied within FlytePropeller. That is, the fasttask plugin attempts to assign tasks only up to full worker capacity (i.e., parallelism + backlog_length), but if it assigns more (for example, because of a race condition), the worker backlogs them instead of dropping them.
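
As a rough illustration of the "suggestion" semantics described above, here is a minimal sketch in Go. It is not the fasttask plugin's code: the `Worker` type, its fields, and the `HasCapacity` and `Accept` methods are hypothetical names chosen for this example. The plugin side checks capacity before assigning, while the worker side never drops a task that arrives over the limit.

```go
package main

// Worker is a hypothetical, simplified model of a fasttask worker.
// None of these names come from the actual plugin.
type Worker struct {
	Parallelism   int // tasks the worker executes concurrently
	BacklogLength int // suggested (soft) number of queued tasks
	Running       int // tasks currently executing
	Backlogged    int // tasks currently queued
}

// HasCapacity is the plugin-side check: prefer workers that are below
// parallelism + backlog_length. Under concurrent assignment this check
// can race, so it is only a best-effort filter.
func (w *Worker) HasCapacity() bool {
	return w.Running+w.Backlogged < w.Parallelism+w.BacklogLength
}

// Accept is the worker-side behavior: run the task if an execution
// slot is free, otherwise queue it, even when the queue already
// exceeds BacklogLength. The task is never dropped.
func (w *Worker) Accept() {
	if w.Running < w.Parallelism {
		w.Running++
		return
	}
	w.Backlogged++
}
```

In this model, a worker with `Parallelism: 1` and `BacklogLength: 1` reports no capacity after two tasks, yet a third `Accept` call still queues the task rather than discarding it.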

The latter is fixed by adding a `PhaseVersion` field on the fasttask plugin state that is incremented with each worker task status heartbeat.
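
The effect of the version counter can be sketched as follows. This is a simplified model under stated assumptions, not FlytePropeller's actual persistence logic: `FastTaskState`, `OnHeartbeat`, and `ShouldPersist` are hypothetical names, and `LastAccessedAt` is stored as Unix seconds only to keep the example free of imports.

```go
package main

// FastTaskState is a hypothetical, trimmed-down plugin state.
type FastTaskState struct {
	PhaseVersion   uint32 // bumped on every worker status heartbeat
	LastAccessedAt int64  // Unix seconds of the last heartbeat (assumed representation)
}

// OnHeartbeat returns the state after a worker heartbeat: the access
// time is refreshed and the version is incremented so that the change
// becomes visible to whoever decides whether to persist.
func OnHeartbeat(s FastTaskState, nowUnix int64) FastTaskState {
	s.LastAccessedAt = nowUnix
	s.PhaseVersion++
	return s
}

// ShouldPersist models a caller that writes state only when the
// version has changed. A state that differs only in LastAccessedAt
// is treated as unchanged, which is the failure mode described above.
func ShouldPersist(prev, next FastTaskState) bool {
	return next.PhaseVersion != prev.PhaseVersion
}
```

Under this model, refreshing `LastAccessedAt` alone leaves `ShouldPersist` returning false, whereas a state produced by `OnHeartbeat` is always written.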

## Test Plan
This has been tested locally against a variety of backlog scenarios (e.g., differing backlog lengths and timeouts).

## Rollout Plan (if applicable)
This may be rolled out to all tenants immediately.

## Upstream Changes
Should this change be upstreamed to OSS (flyteorg/flyte)? If not, please uncheck this box, which is used for auditing. Note, it is the responsibility of each developer to actually upstream their changes. See [this guide](https://unionai.atlassian.net/wiki/spaces/ENG/pages/447610883/Flyte+-+Union+Cloud+Development+Runbook/#When-are-versions-updated%3F).
- [ ] To be upstreamed to OSS

## Issue
https://linear.app/unionai/issue/COR-1455/execution-frequently-fails-due-to-missing-task-status-reporting

## Checklist
* [x] Added tests
* [ ] Ran a deploy dry run and shared the terraform plan
* [x] Added logging and metrics
* [ ] Updated [dashboards](https://unionai.grafana.net/dashboards) and [alerts](https://unionai.grafana.net/alerting/list)
* [ ] Updated documentation
1 participant