-
Notifications
You must be signed in to change notification settings - Fork 681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core feature] Add execution id as an available log parameter for log links UI #3696
Comments
Thank you for opening your first issue here! 🛠 |
Thank you for opening the issue. I really like the idea of adding execution id to the log parameters so that it can be used to power things like weights and biases directly from Flyte UI. Also we should have a capability of turning it on and off per task. I think @task(obs_config="{"wandb":"on", "...."}
def foo():
... Something like that? |
@fg91 if I recall you discussed some deeper integration with wandb - mind summarizing here? |
I wouldn't make it specific to wandb. I described our use case here in the rfc incubator:
The link template could, in a general case, look for instance like this: Configuring this link template on the platform side in the helm values would solve our use case. However, the assumption is that tasks then actually create the required tag in the experiment tracking server. Before this happens, the link points nowhere. Also, not all tasks might create a wandb run but would then still have a link to wandb in the Flyte console. It would, thus, be even nicer if one could add log links to the flyte console from the task itself. If I understood @cosmicBboy correctly in the contributors sync, he proposed something like this: @task
def train():
tag = flytekit.current_context().execution_id
my_experiment_tracking_api.add_tag(tag)
flytekit.current_context().add_log_link(f"https://my-experiment-tracking-server.com/?tag={tag}") However, would it be tricky to make this link show up in the flyte console immediately and not only later when the task finished? For long running trainings we would like to monitor the loss curves and are looking for easy access to this link before the training ends. If adding the link at runtime is complicated, I could also imagine the following (similar to what @kumare3 proposed above but not specific to wandb): @task(log_links="{"wandb": f"https://my-experiment-tracking-server.com/?tag={{.executionId}}")
def foo():
... Users, in their task, would then still have to make sure that this link points to something meaningful but it would be on a per-task basis and not for the entire platform. |
@fg91 exactly I do not think it should be specific to wandb |
I think it should be up to the user which tags/names they want to assign to their runs (aka whether they use execution id or a combination of execution id and version id or whether the project or domain is in there ... Maybe even which node in the workflow or which retry). Basically I think it would be great if all these vars that are already (partially )available in the flyte context could be used to template links. |
Would love to see this feature! So far we worked around it by using Deck. As the link only shows up after the execution finished, a solution to show it already during execution would be great. @task(log_links="{"wandb": f"https://my-experiment-tracking-server.com/?tag={{.executionId}}")
def foo():
... This would be an elegant solution for wandb but might bring some limitations for other tracking servers. ClearML for example uses UUIDs for both the project and the experiment:
One could get around this limitation by searching for a tag in all projects though: Linking to a single run is probably not possible with the decorator approach though. If specifying the link at runtime was possible and the link then shows up immediately, it would probably be the most versatile solution. |
I think including additional log links the the Adding logging links to the @btang-stripe thoughts? |
Not sure if this would be a follow up discussion or part of this: For our usecase I would love to have additionally the user_id which would allow to see which user did start the experiment. Also I wonder how I would automatically create a new experiment for each task. I would image something like: @task(log_links="{"wandb": f"https://my-experiment-tracking-server.com/?tag={{.executionId}}")
def train():
fqn= flytekit.current_context().task_fqn
version = flytekit.current_context().task_id
run_id = flytekit.current_context().execution_id
user_id = flytekit.current_context().user_id
experiment = my_experiment_tracking_api.get_or_create(fqn)
experiment.version(version)
experiment.run_id(run_id)
experiment.user_id(user_id) The downside is that people would still need to write this. I think it would be even more awesome if this kinda logic could be abstracted: @task(plugins=[wand_plugin])
def train():
...
def wand_plugin(task_precontext):
fqn= task_precontext.task_fqn
version = task_precontext.task_id
run_id = task_precontext.execution_id
user_id = task_precontext.user_id
task_precontext.add_log_link(f"https://my-experiment-tracking-server.com/?tag={run_id}")
experiment = my_experiment_tracking_api.get_or_create(fqn)
experiment.version(version)
experiment.run_id(run_id)
experiment.user_id(user_id) What do you think about it or how do you solve those challenges? |
@vkaiser-mb have you seen the the mlflow plugin - precontext can be done, but sadly today Flyte engine cannot read partial outputs from a task. Something folks have asked about in flytedecks as well |
These are all great suggestions! After taking another look at the actual run link on wandb — providing just the execution id would actually still not be enough to properly link back from flyte console. We'd still be missing the wandb project (which may not match the flyte project), and would like to allow users to customize their run groups. I think @fg91 's suggestion of adding something directly to the flytekit context would work nicely for our case as it would be flexible enough to handle effectively any url schema eg.,:
The url definition at the task level — while convenient — has limitations in just being able to use the parameters that are available in the log links, which is insufficient for W&B |
@kumare3 thanks for the mlflow pointer! I would totally agree that there should be a way to communicate with the backend and update the status during execution for many usecases. But I would hope that you can generate a link to an external webapp already before the execution is started (so there is no waiting time before you can see the link). @btang-stripe where or how would you get the other variables (team, project_name) from? Would this also be availalbe before execution? |
I checked the mlfow plugin example, but if I am not getting it wrong, it just wraps the trainings function and will be executed when the task is started. I would argue that the links should be created in advance, thats why in my example above its part of the task definition. This would also avoid the problem that you would need the task to end before anything can be seen on the backend. And in the Again I am not sure if this still part of this issue, but I also posted some ideas for UI costumization of mine here in the discussion part: #3569 |
Something I just realized and which would vote against my code proposals is that the webapp (wandb) config needs to be done from inside the task and can not be done in advance. Therefore forget about my suggestion having it in the task decorator. You probably realized that already. My personal feeling is that we would need to go the way @fg91 and @cosmicBboy suggested. Which would need some interactive commincation to the the backend so the link can popup immediately. (Which I think is anyways a very favorable feature and maybe would also allow something like streaming STDOUT) |
Summary of a discussion in the contributor's sync (25/5/2023): Creating such a communication channel and allowing user code to directly talk to backend services would be a major and undesirable architectural change because of the potential implications on reliability. A solution that was favoured in the discussion is to allow the user code to flush the deck while the task is still running. Currently, the deck is only uploaded to blob storage after the task has finished. This is typically too late to show links to experiment tracking servers in the Flyte console as one might want to click on them to observe the progress of the training before the training finished. If the user could trigger the upload of the deck to blob storage, one could do the following: @task
def train():
run = some_experiment_tracking_api.start_run()
deck = flytekit.Deck("Summary", ...)
deck.append(MarkdownRenderer.to_html(f"... {run.link} ..."))
deck.flush() This would then not appear in the logs section but would accomplish the same goal. What do you think about this @vkaiser-mb ? |
For another idea, if the limitation of adding a static log template to the task decorator is the limitation of log parameters, then we could extend this scheme to take parameters from elsewhere in the task definition, for example environment variables or pod labels / annotations. These can be set at the workflow or task level and then applied immediately on task start. Log links could be like:
So advantages over the previous (1) much cleaner - no 3rd party dependencies (2) available immediately on task execution, but of course not as flexible. I don't think this will work for the clearML case described above. It looks like these schemes are unable to be templatized, which seems like poor design to me. I have to admit I'm not familiar with most experiment tracking frameworks. |
@fg91 I like the idea of a flush. From UX perspective I would prefer to see a link similiar to the other loglinks already on task level and no need to press the Flyte Deck button that again opens an overlay. So I wonder if something like this would be possible: @task
def train():
fqn= flytekit.current_context().task_fqn
version = flytekit.current_context().task_id
run_id = flytekit.current_context().execution_id
user_id = flytekit.current_context().user_id
# Here is the interesting part
flytekit.current_context().log_links.append(f"https://my-experiment-tracking-server.com/?run={run_id }")
flytekit.current_context().flush
# Or flytekit.current_context().log_links.flush()
experiment = my_experiment_tracking_api.get_or_create(fqn)
experiment.version(version)
experiment.run_id(run_id)
experiment.user_id(user_id) I guess the idea of @hamersaw could also work in the w&bs usecase. |
2023-06-08 Contrib meeting notes: in terms of usability, option 2 presented at the meeting is preferable but potentially requires more effort. (See https://hackmd.io/gmwGOpD_TWiyw9_dJ_34Dg). |
@btang-stripe are you ok with closing this issue and instead continuing the discussion how to best implement this in the RFC incubator? |
Awesome — yep! let's continue the discussion there |
Motivation: Why do you think this is important?
We want to integrate other products with Flyte and be able to link to the logs. Specifically we'd like to link to wandb which requires an identifier created at runtime. Using the Flyte execution id as a wand group id would allow us to create the log link
Goal: What should the final outcome look like, ideally?
the execution id of the workflow should be available as a log parameter
Describe alternatives you've considered
Providing the ability to pass arbitrary strings as log parameters would be ideal, then we could pull the actual wandb run id instead. This, however, would not be as straight forward as passing the Flyte execution id.
Propose: Link/Inline OR Additional context
Currently the plugin machinery does not have the execution id. Talking with @kumare3 this should be updated here and here
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?
The text was updated successfully, but these errors were encountered: