[RFC] Config Override #3553

ByronHsu · 2023-03-30T06:12:17Z

Tracking issue

Describe your changes

Add RFC

Note to reviewers

Need discussion on the UI part!

bstadlbauer · 2023-03-30T09:05:09Z

I've just read the linked issue (#475) and it seems like there is a lot of good conversation going on there.
From reading it, the trickiest part seems to be doing "per-node" configuration changes, mostly because node names are not trivially knowable before starting a workflow and especially for deeply nested graphs, those are hard to figure out.

Should we finish the discussion on how to best achieve that in #475 and once things are ready copy the result over to this RFC?

bstadlbauer · 2023-03-30T09:45:41Z

I've added my thoughts in #475 (comment)

davidmirror-ops · 2023-03-30T18:30:56Z

03-30-2023 Meeting notes. Byron Hsu: not sure how to implement this in the UI
FG: It would be a very complicated string. We can mark the tasks as overrideable
BH: what if your task resides at a deeper level
Eduardo: how does this work with Dynamic?
FG: in the dynamic subworkflow, when I call a task it could be .task.override. Whenever a user registers something dynamically with runtime.override: foo, it could select runtime-level things like GPU
EA: basically naming overrides
FG: assign the name of the override at registration time
KU: create "override hooks"
FG: for dynamic it would be impossible
Greg Gydush: regarding the UX, (see #475 (comment)) I'd love to pass promises to the override config. That is the UX that I'd want
KU: we should not confuse them with inputs, it's config. Executions should take a config
HSU: we can parameterize it and pass it to override config
KU: with async tasks you should be able to do something like this
Bernhard: for nested workflows it could be complicated. If you run the same task twice you couldn't run it with different resources
KU: we could call it dynamic_config. In the dynamic case
Dan Rammer: scalability concerns. Checking for naming conflicts would be necessary but complicated
KU: what about making this an experimental feature

fg91 · 2023-03-30T20:16:38Z

I agree with @bstadlbauer that when workflows become very nested/deep, this syntax will be very difficult to write for the user:

wf_overrides = WFOverride(
    n0=TaskOverride(
        container_image="repo/image:0.0.1",
        limits=Resource(cpu="2"),
    ),
    n1=WFOverride(  # subwf
        n0=TaskOverride(
            container_image="repo/image:0.0.3",
            limits=Resource(cpu="3"),
        )
    )
)

In addition, it will also be very complicated to come up with a way to show these overrides in the UI.

This is why I proposed in the contributors meeting to create "named overrides" in order to avoid replicating the DAG structure again in the overrides.

This could look e.g. like this:

@task 
def train(dataset_uri: str) -> str:
    pass

@task 
def evaluate(model_uri: str):
   ...

@workflow
def wf(...):
    model1 = train(...).with_runtime_overrides("model_1_resources")
    model2 = train(...).with_runtime_overrides("model_2_resources")
    
    evaluate(model_uri = model1)
    evaluate(model_uri = model2)

This effectively means that at registration time the author of the workflow can mark the configs of specific nodes overridable when starting an execution. By marking them this way in the workflow and not the respective task decorator, the same task, in this case train, can be executed with different sets of resources configured e.g. in FlyteConsole.

By naming the overrides, we also circumvent the nesting problem, which would make it very simple to configure overrides in FlyteConsole.

In addition, the FlyteConsole would not be overloaded when pre-filling the overridable configs since this is only possible for selected tasks, not the entire (potentially giant) workflow.
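
A minimal sketch of why named overrides sidestep the nesting problem, assuming a hypothetical in-memory node tree (Node and collect_override_names are illustrative names, not Flyte APIs). However deep a marked task sits, its name ends up in one flat list, which is roughly what the console would need in order to pre-fill the form:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a workflow node; not a Flyte class."""
    name: str
    override_name: Optional[str] = None  # set by a with_runtime_overrides-style marker
    children: List["Node"] = field(default_factory=list)  # non-empty for sub-workflows


def collect_override_names(node: Node) -> List[str]:
    """Depth-first walk returning every override name in encounter order."""
    found = [node.override_name] if node.override_name else []
    for child in node.children:
        found.extend(collect_override_names(child))
    return found


wf = Node("wf", children=[
    Node("n0", override_name="model_1_resources"),
    Node("n1", children=[Node("n0", override_name="model_2_resources")]),
    Node("n2"),  # not marked, so it never shows up in the form
])

print(collect_override_names(wf))
```

The nested node n1/n0 contributes its name exactly like the top-level n0 does, so the user never has to spell out the path to it.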

Considerations:

@task 
def train(dataset_uri: str) -> str:
    pass

@dynamic 
def subwf(model_uri: str):
    model = train(...).with_runtime_overrides("model_resources")
    ...

@workflow
def wf(...):
    subwf()

Let us consider a workflow where the runtime override is configured in a dynamic subworkflow. I don't know whether it would be possible when creating an execution of wf to already know which tasks the dynamic subwf will launch and whether "they are marked overridable".
If not, we would not be able to pre-fill the config in the FlyteConsole or the execution spec yaml overrides section.

However, I feel it would be absolutely acceptable if the user specifies such a named override manually in the FlyteConsole when launching the execution and then when the dynamic subworkflow registers a task with a named override that matches a name the user specified when creating the execution, the override is applied.

@hamersaw raised another concern:

Let's say team 1 maintains this workflow:

@task 
def train_task_team1():
    ...

@workflow
def wf_team1():
    train_task_team1().with_runtime_overrides("train_resources")

And let's say team 2 maintains this workflow:

from team_1 import wf_team1

@task 
def train_task_team2():
    ...

@workflow
def wf_team2():
    wf_team1()

    train_task_team2().with_runtime_overrides("train_resources")

An engineer in team 2 might start an execution in the FlyteConsole and override the resources for their train task but potentially unknowingly also for the task of team 1.

I would definitely say that the named overrides should only be valid within an execution.

I personally am not too worried about this scenario but this might be a bigger issue in large organisations.

In theory (without knowing at this point whether this is feasible) I could imagine something like this:

@workflow
def wf_team2():
    wf_team1().without_runtime_overrides()

    train_task_team2().with_runtime_overrides("train_resources")

However, this might be going too far.
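
Short of an opt-out, the shared name could at least be detected at registration time. A rough sketch, assuming each marked node is recorded as an (owner, override name) pair; find_shared_override_names, the pair layout, and the team_2.evaluate entry are all made up for illustration:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_shared_override_names(marked: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return the override names that are attached to more than one node.

    `marked` holds (owner, override_name) pairs, where owner is for example
    the fully qualified task name.
    """
    owners_by_name: Dict[str, List[str]] = defaultdict(list)
    for owner, override_name in marked:
        owners_by_name[override_name].append(owner)
    return {name: owners for name, owners in owners_by_name.items() if len(owners) > 1}


marked_nodes = [
    ("team_1.train_task_team1", "train_resources"),
    ("team_2.train_task_team2", "train_resources"),
    ("team_2.evaluate", "eval_resources"),
]
conflicts = find_shared_override_names(marked_nodes)
print(conflicts)
```

Whether a shared name is an error or an intentional way to configure several nodes at once is a policy decision; the sketch only surfaces it so that it does not happen unknowingly.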

yubofredwang · 2023-03-30T23:43:41Z

Hi @fg91, your approach looks pretty neat. Can you add more details on how model_1_resources is defined? Is that a different entity type other than task/workflow?

bstadlbauer · 2023-03-31T10:16:22Z

@fg91 I like this, the UX is great!

Trying to take a step back here, it seems like this is pretty much a special case of "tagging" + matching on a tag (option 2 from #475 (comment)), right?

So in theory, if .with_runtime_overrides("train_resources") would add some sort of tag to the task, say {"runtime_override": "train_resources"} this could be one of many things to match on.

I.e. on the backend, we could represent an override as something like (pseudocode; names all TBD):

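from dataclasses import dataclass, field
from typing import Any, Dict, Optional

from flytekit import Resources

@dataclass
class NodeMatchCriteria:
    task_name: str
    node_name: Optional[str] = None
    # default_factory avoids sharing one mutable dict between instances
    tags: Dict[str, str] = field(default_factory=dict)
    function_arguments: Dict[str, Any] = field(default_factory=dict)
    ...

@dataclass
class ConfigOverride:
    match: NodeMatchCriteria
    resources: Resources
    ...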

I think if we could keep the matching a bit more general, we could set ourselves up for future use cases without too much additional implementation effort. One thing this would also allow (on the backend) is to re-run a workflow with an override based on a node name (as the node name is known from the UI in the second run).

What do you think?
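
To make the matching idea concrete, here is a small sketch under assumptions: NodeInfo and MatchCriteria are hypothetical, simplified variants of the pseudocode above in which every unset field matches anything, and the tag key runtime_override is taken from the example in this comment.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NodeInfo:
    """Hypothetical view of a node at execution time."""
    task_name: str
    node_name: str
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class MatchCriteria:
    """Simplified criteria; fields left unset match any node."""
    task_name: Optional[str] = None
    node_name: Optional[str] = None
    tags: Dict[str, str] = field(default_factory=dict)

    def matches(self, node: NodeInfo) -> bool:
        if self.task_name is not None and self.task_name != node.task_name:
            return False
        if self.node_name is not None and self.node_name != node.node_name:
            return False
        # Every requested tag must be present on the node with the same value.
        return all(node.tags.get(key) == value for key, value in self.tags.items())


node = NodeInfo("train", "n0", tags={"runtime_override": "train_resources"})
by_tag = MatchCriteria(tags={"runtime_override": "train_resources"})
by_node_name = MatchCriteria(node_name="n1")

print(by_tag.matches(node), by_node_name.matches(node))
```

A named override then becomes just one kind of criteria (a tag match), and matching on a node name for a re-run uses the same code path.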

ByronHsu · 2023-04-17T07:00:35Z

My concern with the "named override" method is that users can only override the configs in the UI/CLI, but cannot override them inside the code. There are two kinds of overriding in the code.

constant value (already supported)

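from flytekit import Resources, task, workflow

@task
def t1():
  ...

@workflow
def wf():
  # existing flytekit API: call the task, then override on the returned node
  t1().with_overrides(requests=Resources(cpu="1"))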

promise value

@task
def t1():
  ...
  
@workflow
def wf(x):
  t1.with_override(cpu=x)

The second case would be extremely useful when reference workflows are shared across the team, but each team wants to tailor the config to their need.

@ref_launch_plan
def ref_wf(cpu:int):
  ...

# team1
@workflow
def wf():
   ref_wf(cpu=3)

# team2
@workflow
def wf():
   ref_wf(cpu=2)

ByronHsu · 2023-04-17T07:03:55Z

To resolve the second case above, one possible way is to define a special input parameter (task_config_override) with a special dataclass (TaskConfigOverride) that will be parsed at runtime and override the taskTemplate.

@task(requests=Resources(cpu="3"), limits=Resources(cpu="4"))
def python_task(task_config_override: TaskConfigOverride):
    print("hello python task")
 
cfg = PythonTaskConfigOverride(
        requests=ResourcesOverride(cpu=requests_cpu),
        limits=ResourcesOverride(cpu=limits_cpu)
 )
 
@workflow
def python_wf():
    python_task(task_config_override=cfg)

fg91 · 2023-04-23T11:33:39Z

I'm personally not a fan of passing task resources (or their overrides) as workflow/task arguments since this mixes task logic configuration with task resource config:

@task
def t1():
 ...
 
@workflow
def wf(x):
 t1.with_override(cpu=x)

When being able to modify the code and not having to rely on overrides at runtime, what is the reason for not using the existing .with_overrides()?

ByronHsu · 2023-04-27T18:29:22Z

https://docs.google.com/document/d/1gaWU3lsa66APG2aD95_BeL9hFEsEsFiNGGj7KcQmt8c/edit?usp=sharing

I listed our use cases to step back and think from the user's side.

ByronHsu · 2023-05-04T20:35:37Z

To address the concern with @fg91's idea that users cannot override configs inside the code, @pingsutw and I came up with a solution.
If users don't want to specify overrides in the UI, they can pass the override config through the workflow definition.

@workflow(
  override={
    "model_1_resources": <override obj 1>,
    "model_2_resources": <override obj 2>,
  }
)
def my_wf():
  ...

There are two ways they can override a value:

specify it in @workflow with an override obj, which will become the default override value in the UI
directly put the override obj in the UI

The idea basically combines @fg91 and @kumare3's ideas.

fg91 · 2023-05-04T21:41:15Z

That looks like a very nice UX to me @ByronHsu 👏

I think override could also become an arg to LaunchPlan.get_or_create(...).

ByronHsu · 2023-05-05T01:10:52Z

yeah that would be nice @fg91

kumare3 · 2023-05-08T04:34:17Z

@ByronHsu can we update the rfc itself with the proposal?

ByronHsu · 2023-05-08T04:48:45Z

@goyalankit What do you think ^?

rfc/system/3553-config-override.md

bstadlbauer

Thanks @ByronHsu for being persistent here even though this took longer than anticipated! Let's make sure to get this across the finish line and start implementing soon 👍

I think the UX is great now! The part that is still unclear to me is how the TaskNodeConfigOverride is tied to a particular task on the backend.

Two options would come to mind:

.with_runtime_override("task-yee") would fill a new field (something like string runtime_override_name) in the TaskNode proto. Then the WorkflowMetadata, the LaunchPlanSpec as well as the ExecutionSpec would have a new field map<string, TaskNodeConfigOverride> runtime_overrides.
We add a new field map<string, string> tags to TaskNode. Then .with_runtime_override("task-yee") would set a "runtime_override_name": "task-yee" tag on the node. The TaskNodeConfigOverride would then need to know what it applies to (similar to NodeMatchCriteria in this comment).
While the UX would be the same, this would also allow for other matches (e.g. based on node name or like here) in the future, without us needing to change the flyteidl API.

Also, should we add a sentence about precedence of these? E.g. something like UI (=Execution) > LaunchPlan > Workflow?
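
As one possible reading of that precedence, the sketch below merges the three layers field by field, with the execution winning over the launch plan and the launch plan over the workflow. The function name, the plain-dict configs, and the field-wise merge (rather than replacing a whole override object) are assumptions, not something the RFC has settled:

```python
from typing import Dict, Mapping

Overrides = Dict[str, Dict[str, object]]


def resolve_overrides(workflow: Mapping, launch_plan: Mapping, execution: Mapping) -> Overrides:
    """Merge per-name override configs; later layers win field by field."""
    merged: Overrides = {}
    for layer in (workflow, launch_plan, execution):  # lowest to highest precedence
        for name, config in layer.items():
            merged.setdefault(name, {}).update(config)
    return merged


resolved = resolve_overrides(
    workflow={"task-yee": {"cpu": "1", "mem": "1Gi"}},
    launch_plan={"task-yee": {"cpu": "2"}},
    execution={"task-yee": {"mem": "4Gi"}},
)
print(resolved)
```

Here cpu comes from the launch plan and mem from the execution, while the workflow values only act as defaults.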

rfc/system/3553-config-override.md

fg91

I very much like the proposed UX now 👏

with_runtime_overrides was the first name that came to my mind when I first had this idea. @kumare3 called it with_override_hook(?) in the contributors meeting. Which name should we write into the RFC in the end?

davidmirror-ops · 2023-05-11T18:09:48Z

2023-05-11: Agree on naming before merge, also address implementation concerns. Byron might need help to distribute implementation tasks. The suggestion is to work on backend first (propeller and flytekit first). TSC happy to help kickoff implementation.

pingsutw · 2023-05-16T22:50:51Z

Here are all the different ways to override the config.

wf2.py

from flytekit import task, workflow, Resources
@task
def t1():
    print("hello world")

@workflow
def sub_wf():
    t1()

wf1.py

from flytekit import task, workflow, Resources
from wf2 import sub_wf

@task
def t0():
    print("hello world")

@workflow()
def wf():
    t0()
    sub_wf()

To override the t0 at runtime

# wf1.py
@workflow()
def wf():
    t0().with_runtime_override("model_1_resources", runtime_override_config(cpu=1, mem="1Gi"))
    ...

cpu=1, mem="1Gi" is the default execution config; people can still update the cpu and mem in FlyteConsole.

To override t1 at runtime, update wf2.py:

# wf2.py
@workflow
def sub_wf():
    t1().with_runtime_override("model_2_resources", runtime_override_config(cpu=1, mem="1Gi"))

To override the default execution config of t1 from wf1.py, add an override config to the @workflow decorator. The t1 task will then use 2 CPUs by default.

# wf1.py
@workflow(override={"model_2_resources": runtime_override_config(cpu=2, mem="2Gi")})
def wf():
    ...

FlyteIDL:

Add more config to TaskOverrides
Add string runtime_override_name to TaskNode
Add runtimeOverrideConfigs (map<string, TaskOverrides>) to ExecutionSpec
Add runtimeOverrideConfigs (map<string, TaskOverrides>) to LaunchPlanSpec

FlyteAdmin:

Iterate over runtimeOverrideConfigs. If the runtime_override_name in the TaskNode matches one of the entries in runtimeOverrideConfigs, update the task config (cpu, role, etc.). https://github.com/flyteorg/flyteadmin/blob/610451d21ca2fa8f598ebc26b8ebb96e423e270a/pkg/manager/impl/execution_manager.go#L780-L783
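
The matching step could look roughly like this. FlyteAdmin itself is written in Go; the sketch uses Python for readability, and TaskNode here is a hypothetical, simplified mirror of the proposed IDL fields with configs reduced to plain dicts:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaskNode:
    """Hypothetical, simplified mirror of the proposed TaskNode fields."""
    task_id: str
    runtime_override_name: Optional[str] = None
    config: Dict[str, str] = field(default_factory=dict)


def apply_runtime_overrides(
    nodes: List[TaskNode],
    runtime_override_configs: Dict[str, Dict[str, str]],
) -> None:
    """Update, in place, each node whose override name appears in the map."""
    for node in nodes:
        if node.runtime_override_name is None:
            continue  # node was never marked as overridable
        override = runtime_override_configs.get(node.runtime_override_name)
        if override is not None:
            node.config.update(override)


nodes = [
    TaskNode("t0", "model_1_resources", {"cpu": "1", "mem": "1Gi"}),
    TaskNode("t1", "model_2_resources", {"cpu": "1", "mem": "1Gi"}),
    TaskNode("t2"),  # unmarked, so it is left alone
]
apply_runtime_overrides(nodes, {"model_1_resources": {"cpu": "3"}})
print([node.config for node in nodes])
```

Only t0 changes: t1 is marked but no config was supplied under its name, and t2 cannot be targeted at all.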

Flytekit:

Add runtimeOverrideConfigs to LaunchPlanSpec
Add runtimeOverrideConfigs to ExecutionSpec
Update Flytekit remote and CLI (pyflyte run --remote --override "{'model_1_resources': {'cpu': 3}}" wf1.py wf).
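
One detail of the CLI example: the value passed to --override uses single-quoted strings, which is a Python literal rather than valid JSON. A small sketch of a parser that accepts both forms, assuming the flag stays a single string argument (parse_override_arg is a hypothetical helper, and --override itself is only proposed here):

```python
import ast
import json
from typing import Any, Dict


def parse_override_arg(raw: str) -> Dict[str, Dict[str, Any]]:
    """Parse the value of the proposed --override flag into a dict of dicts."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Single-quoted strings are not valid JSON; fall back to Python literals.
        # literal_eval only evaluates literals, so it does not execute code.
        parsed = ast.literal_eval(raw)
    if not isinstance(parsed, dict) or not all(isinstance(v, dict) for v in parsed.values()):
        raise ValueError("expected a mapping of override name to config mapping")
    return parsed


print(parse_override_arg("{'model_1_resources': {'cpu': 3}}"))
print(parse_override_arg('{"model_1_resources": {"cpu": 3}}'))
```

Both calls yield the same dictionary, so users would not have to fight shell quoting to get strict JSON through.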

@ByronHsu is going to update IDL. cc @fg91 @bstadlbauer would you like to help work on flytekit and admin?

ByronHsu · 2023-05-17T00:27:52Z

@pingsutw do we also want to override on launchplan.get_or_create? https://github.com/flyteorg/flyte/blob/c77eeaa89a1ff7fffdc30537868b19cc98bb3ea6/rfc/system/3553-config-override.md#3-launch-plan

fg91 · 2023-05-19T11:47:25Z

@pingsutw @ByronHsu what do you think of this:

# wf1.py
@workflow
def wf():
    t0()
    sub_wf().override={"model_2_resources": runtime_override_config(cpu=2, mem="2Gi")}

instead of

# wf1.py
@workflow(override={"model_2_resources": runtime_override_config(cpu=2, mem="2Gi")})
def wf():
    ...

?

@pingsutw do we also want to override on launchplan.get_or_create?

Yes, I think this would be very useful 👍

pingsutw · 2023-05-19T20:28:58Z

@fg91 I like that, too. It will be much easier to implement.

do we also want to override on launchplan.get_or_create?

yup, we should support it as well

ByronHsu · 2023-05-19T21:51:19Z

What if users want to override a task under the subworkflow's subworkflow, and want to do that inside the wf function?
I think it's more generic if users can override every overridable task from the top-level workflow.

# wf1.py
@workflow
def sub_wf():
   sub_sub_wf()
   ...

@workflow
def wf():
    t0()
    sub_wf()

pingsutw · 2023-05-19T21:59:56Z

@ByronHsu You still do it at top-level workflow.

old one:

@workflow(override={"model_2_resources": runtime_override_config(cpu=2, mem="2Gi")})
def wf():
    t0()
    sub_wf()

Fabio suggested:

@workflow
def wf():
    t0()
    sub_wf().override={"model_2_resources": runtime_override_config(cpu=2, mem="2Gi")}

ByronHsu · 2023-05-19T23:35:11Z

Got it. So you mean that if model_2_resources is attached to a task in sub_wf's own subworkflow, we can still override it.
Yeah, I agree @fg91's idea is cleaner because we don't have to clutter the @workflow arguments.
So for a task, they do

t1().with_runtime_override("model_2_resources", runtime_override_config(cpu=1, mem="1Gi"))

My idea for subworkflow is that they do

sub_wf().with_runtime_override("model_2_resources", runtime_override_config(cpu=2, mem="2Gi"), "model_1_resources", runtime_override_config(cpu=2, mem="2Gi"))

The args are (name1, override1, name2, override2). Or would passing a dict be better?
Also, do you think we can use the same method name .with_runtime_override for both tasks and subworkflows?

pingsutw · 2023-05-20T00:44:15Z

with_runtime_override can return a promise, so it could be

sub_wf().with_runtime_override(...).with_runtime_override(...).with_runtime_override(...)

ByronHsu · 2023-05-20T01:03:32Z

We can provide 'with_runtime_override' to override a single task in a sub workflow and 'with_runtime_overrides' for multiple tasks in a sub workflow.

bstadlbauer · 2023-05-22T07:07:45Z

I'm in favor of chaining (e.g. sub_wf().with_runtime_override(...).with_runtime_override(...).with_runtime_override(...)) instead of two different methods.

@pingsutw Happy to help out!

fg91 · 2023-05-22T07:16:17Z

Instead of chaining or two methods, one with s, we could also simply pass a dict?

sub_wf().with_runtime_override({"model_2_resources": runtime_override_config(cpu=1, mem="1Gi"), "model_3_resources": ...})
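
The chaining and dict styles are not mutually exclusive. A sketch of a single method supporting both, where NodeHandle is a hypothetical stand-in for whatever calling a task or sub-workflow returns, plain dicts stand in for runtime_override_config(...), and the model_3_resources values are invented for the example:

```python
from typing import Any, Dict, Optional, Union


class NodeHandle:
    """Hypothetical stand-in for the object returned by sub_wf() or t1()."""

    def __init__(self) -> None:
        self.runtime_overrides: Dict[str, Any] = {}

    def with_runtime_override(
        self,
        name_or_mapping: Union[str, Dict[str, Any]],
        config: Optional[Any] = None,
    ) -> "NodeHandle":
        if isinstance(name_or_mapping, dict):
            # dict form: several named overrides in one call
            self.runtime_overrides.update(name_or_mapping)
        else:
            # (name, config) form: one override per call
            self.runtime_overrides[name_or_mapping] = config
        return self  # returning self is what makes chaining possible


chained = (
    NodeHandle()
    .with_runtime_override("model_2_resources", {"cpu": 1, "mem": "1Gi"})
    .with_runtime_override("model_3_resources", {"cpu": 2, "mem": "2Gi"})
)
from_dict = NodeHandle().with_runtime_override({
    "model_2_resources": {"cpu": 1, "mem": "1Gi"},
    "model_3_resources": {"cpu": 2, "mem": "2Gi"},
})
print(chained.runtime_overrides == from_dict.runtime_overrides)
```

Either spelling ends in the same name-to-config map, so the choice is mostly about readability at the call site.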

Also happy to take a ticket @pingsutw :)

bstadlbauer · 2023-05-22T07:55:22Z

Also works of course 👍

rfc/system/3553-config-override.md

fg91 · 2023-05-25T15:58:17Z

To me the RFC looks ready to approve after the results of the latest discussions have been incorporated into the rfc doc itself 🚀

eapolinario

This is shaping up to be a great feature!

rfc/system/3553-config-override.md

bstadlbauer · 2023-06-07T07:33:59Z

@ByronHsu Also ok with this after all changes discussed there are incorporated in the RFC

bstadlbauer · 2023-06-07T12:34:15Z

Also, one question that came up during a first stab at implementing this - what would the UX for overriding task_config look like?

ByronHsu · 2023-06-07T16:57:05Z

@bstadlbauer how about passing a nested dict?

Signed-off-by: byhsu <[email protected]>

[RFC] Config Override

6c5107e

davidmirror-ops added the rfc label Mar 30, 2023

davidmirror-ops requested review from fg91, cosmicBboy, bstadlbauer and eapolinario March 30, 2023 17:30

bstadlbauer mentioned this pull request Mar 31, 2023

[Feature] Generic support for overrides during an execution #475

Open

13 tasks

Merge branch 'master' of https://github.com/flyteorg/flyte into byhsu…

5aee519

…/config-override

kumare3 reviewed May 10, 2023

kumare3 reviewed May 10, 2023

bstadlbauer reviewed May 10, 2023

fg91 reviewed May 11, 2023

fg91 reviewed May 11, 2023

fg91 reviewed May 11, 2023

fg91 reviewed May 11, 2023

fg91 previously approved these changes May 11, 2023

pingsutw previously approved these changes May 23, 2023

fg91 reviewed May 25, 2023

eapolinario reviewed May 25, 2023

ByronHsu mentioned this pull request May 28, 2023

Add config override idl flyteorg/flyteidl#412

Closed

8 tasks

This was referenced Jun 2, 2023

Feat: Allow config overrides at runtime flyteorg/flytekit#1672

Draft

Add "runtime overrides" working group flyteorg/community#10

Merged

rfc

b62ec71

Signed-off-by: byhsu <[email protected]>

ByronHsu dismissed stale reviews from pingsutw and fg91 via b62ec71 June 7, 2023 17:55

ByronHsu force-pushed the byhsu/config-override branch from 5af675d to b62ec71 Compare June 7, 2023 18:02

davidmirror-ops merged commit 4d7b656 into flyteorg:master Jul 4, 2023
