forked from flyteorg/flyte
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[RFC] Config Override (flyteorg#3553)
* [RFC] Config Override * rfc Signed-off-by: byhsu <[email protected]> --------- Signed-off-by: byhsu <[email protected]> Co-authored-by: byhsu <[email protected]>
- Loading branch information
Showing
2 changed files
with
106 additions
and
0 deletions.
There are no files selected for viewing
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,106 @@ | ||
# Flyte Config Overrides RFC | ||
|
||
|
||
https://github.com/flyteorg/flyte/issues/475 | ||
|
||
## Motivation | ||
As a user of Flyte, configuration like "resources (cpu/mem/gpu)", "catalog (enable/disable/version)", "retries", "spark config", "hive cluster" are created only at registration time. The only way to update is to re-register. This is not desirable, as there are times when the user may want to update these parameters for specific launchplans, or executions. | ||
|
||
Using `with_overrides` can provide node-level override for workflow, but this is still at compile time. Users need to use dynamic workflow if they want to pass the override values as inputs to the workflow. Moreover, with_overrides can not be applied to reference workflow because the workflow function body is not exposed. | ||
|
||
## Design | ||
|
||
By default, all task nodes are not overridable at runtime. However, users can call `.with_runtime_override(name: str)` on task nodes to register a "hook" on tasks, which allows the task to be indentified by `name` and overridden with new config at runtime. | ||
|
||
One main motivation for introducing such a "hook mechanism" based on identifiers/names is that we couldn't come up with a good answer how a good UX can be achieved when having to specify overrides in a nested workflow graph in the UI, in code, or via the CLI since all approaches would require replicating the workflow graph structure in the overrides config. This way, the overrides can be specified in a simple map and the structure of the workflow graph does not matter/does not have to be shown. | ||
|
||
```python | ||
@task | ||
def t1(): | ||
... | ||
|
||
@task | ||
def t2(): | ||
... | ||
|
||
@workflow | ||
def wf(): | ||
t1() # this task node cannot be overridden | ||
t1().with_runtime_override("task-yee") # can be overriden under the name "task-yee" | ||
t2().with_runtime_override("task-ketan") # can be overriden under the name "task-ketan" | ||
t3() # this task node cannot be overridden | ||
``` | ||
|
||
We will have reuse proto `taskNodeConfigOverride`, which contains all the overridable fields. | ||
|
||
```proto | ||
// Optional task node overrides that will be applied at task execution time. | ||
message TaskNodeOverrides { | ||
// A customizable interface to convey resources requested for a task container. | ||
Resources resources = 1; | ||
// Boolean that indicates if caching should be enabled | ||
bool cache = 2; | ||
// Boolean that indicates if identical (ie. same inputs) instances of this task should be | ||
// executed in serial when caching is enabled. | ||
bool cache_serialize = 3; | ||
// Cache version to use | ||
string cache_version = 4; | ||
// Number of times to retry this task during a workflow execution | ||
int32 retries = 5; | ||
// Boolean that indicates that this task can be interrupted and/or scheduled on nodes with lower QoS guarantees | ||
bool interruptible = 6; | ||
// Container image to use | ||
string container_image = 7; | ||
// Environment variables that should be added for this tasks execution | ||
map<string, string> environment = 9; | ||
// This argument provides configuration for a specific task types. | ||
google.protobuf.Struct task_config = 10; | ||
} | ||
``` | ||
|
||
We provide multiple ways to override the task nodes at runtime, including UI, workflow decorator, launch plan, etc. | ||
|
||
### 1. UI | ||
|
||
The registered task nodes will prompt on UI for users to assign config override values. | ||
|
||
<img src="../images/config-override-ui.png" style="width: 50%; height: 50%"> | ||
|
||
### 2. Inside workflow | ||
|
||
Users can override values inside workflow. The value will become the default values on UI. | ||
|
||
```python | ||
@workflow | ||
def wf(): | ||
t0().with_runtime_override("model_1_resources", runtime_override_config(cpu=1, mem="1Gi")) | ||
sub_wf().with_runtime_override(...).with_runtime_override(...).with_runtime_override(...) | ||
``` | ||
|
||
### 3. launch plan | ||
|
||
Users can also provide overrides to `LaunchPlan.get_or_create`. If users provide both workflow decorator and launch plan predefined values, launch plan one will override workflow decorator one. | ||
|
||
```python | ||
launch_plan.LaunchPlan.get_or_create( | ||
workflow=wf, | ||
name="your_lp_name_5", | ||
runtime_override={ | ||
"task-yee": TaskNodeConfigOverride(...), | ||
"task-ketan": TaskNodeConfigOverride(...) | ||
} | ||
) | ||
``` | ||
|
||
## Concerns | ||
|
||
1. How to resolve when two hooks have the same names? | ||
- The flytekit compiler should error out |