[BUG] Can't retrieve env vars from pyflyte run #4503

Closed
2 tasks done
jiwidi opened this issue Nov 30, 2023 · 12 comments · Fixed by flyteorg/flytekit#2132
Labels
backlogged For internal use. Reserved for contributor team workflow. bug Something isn't working flytekit FlyteKit Python related issue waiting for reporter Used for when we need input from the bug reporter

Comments

@jiwidi

jiwidi commented Nov 30, 2023

Describe the bug

I am not able to retrieve env vars passed to pyflyte run with the --env/--envvars parameter; they are not accessible within the workflow code.

I'm basing this on the documentation at https://docs.flyte.org/projects/flytekit/en/latest/pyflyte.html and on issue #4092, which mentions that this is possible.

Expected behavior

Env vars passed to pyflyte run should be accessible while the code executes.

Additional context to reproduce

I have a python file main.py:

from flytekit import task, workflow
import os

@workflow
def workflow() -> None:
    print([key for key in os.environ])
    print(os.environ["DUMMY_KEY"])
    return None

Which can be run with

pyflyte run --envvars '{"DUMMY_KEY"="DUMMY_VALUE"}' main.py workflow
or
pyflyte run --env '{"DUMMY_KEY"="DUMMY_VALUE"}' main.py workflow

Both result in failure:

Failed with Unknown Exception <class 'KeyError'> Reason: "Error encountered while executing 'workflow':\n  'DUMMY_KEY'"
"Error encountered while executing 'workflow':\n  'DUMMY_KEY'"

The failure is raised by print(os.environ["DUMMY_KEY"]), and DUMMY_KEY does not appear in the printed list of os.environ keys.

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@jiwidi jiwidi added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Nov 30, 2023
dosubot bot commented Nov 30, 2023

🤖

Hello @jiwidi,

I'm Dosu, a bot here to assist you with your issues and queries regarding Flyte. While we wait for one of our human maintainers, I'm here to help you understand, troubleshoot, and resolve bugs, as well as guide you on your journey to becoming a contributor. Let's work together to resolve your issue!

Based on the information you've provided and the similar issues I've found in the Flyte repository, it seems that the environment variables passed to pyflyte run with the --env/--envvars parameter are not being exported to the environment of your script. This is why they are not accessible within your workflow code.

To make the environment variables available within the workflow code, you need to export them in your script. Here's how you can modify your script:

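#!/bin/bash

# Our SDK entrypoint can be configured to call this script
# This script maps that command to the conventional location of the virtual environment in Flyte containers

set -e

# Export every VARNAME=value pair in the file: `set -a` marks all
# variables assigned while sourcing it for export
set -a
. <path_to_your_env_file>
set +a

. "${VENV}/bin/activate"

# Preserve argument boundaries when handing off to the real command
exec "$@"

In this script, <path_to_your_env_file> should be replaced with the path to the file containing your environment variables. Sourcing the file between set -a and set +a defines each variable and marks it for export, so both the names and the values reach the environment of the command started by exec. This assumes the file is valid shell syntax (values containing spaces must be quoted).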

Please note that this solution assumes that your environment variables are stored in a file in the format VARNAME=value. If your environment variables are passed in a different way, you might need to adjust the script accordingly. You can refer to the flyte_venv.sh script in the Flyte repository for more details.
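If you would rather not touch the container entrypoint, the same idea can be expressed in Python. This is a minimal sketch, not part of flytekit: `load_env_file` is a hypothetical helper that assumes one VARNAME=value pair per line, skips blank lines and `#` comments, and does not handle shell quoting or `export` prefixes.

```python
import os
from pathlib import Path


def load_env_file(path: str, override: bool = False) -> dict[str, str]:
    """Hypothetical helper: read VARNAME=value lines into os.environ.

    Assumes one pair per line; blank lines and lines starting with '#'
    are skipped. Shell quoting and 'export' prefixes are NOT handled.
    Returns every pair that was parsed, whether or not it was applied.
    """
    parsed: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='
        name, sep, value = line.partition("=")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"malformed line in {path!r}: {raw!r}")
        # By default, keep any value that is already set in the environment
        if override or name not in os.environ:
            os.environ[name] = value
        parsed[name] = value
    return parsed
```

`override=False` keeps values already present in the environment, which is usually the safer default when the same variable might also be injected some other way.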

As for the exact mechanism by which the --env/--envvars parameter passes environment variables to pyflyte run, I wasn't able to find a specific answer within the repository. If you have more information about how you're passing the environment variables or if there's a specific part of the process you're interested in, please let me know and I'll do my best to assist you further.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

-Dosu


@jiwidi
Author

jiwidi commented Nov 30, 2023

This is not the best solution: the env vars I would be using come from another runner, so they won't be in a file. I could hack something together before Flyte runs (a bash script that writes them to a file), but I'd prefer not to and would rather pass them directly in the pyflyte command.


dosubot bot commented Nov 30, 2023

🤖

Based on the information you've provided, it seems like you're trying to access environment variables directly within your Flyte workflow. However, due to the way Flyte is designed, environment variables passed to pyflyte run with the --env/--envvars parameter are not directly accessible within your workflow code.

Flyte tasks are designed to be portable and reproducible across different environments. This means that they don't have direct access to the environment variables of the machine where they're being executed. Instead, Flyte tasks should be written in a way that all of their dependencies are explicitly declared, either as task inputs or as part of the task's container definition.

If you need to pass sensitive information to your tasks, such as secrets or credentials, you should use Flyte's secret management feature. This allows you to securely pass sensitive information to your tasks without exposing it in your code or your task definitions.

Here's an example of how you can modify your code to use Flyte's secret management feature:

import os
import flytekit
from flytekit import CronSchedule, LaunchPlan, Secret, task, workflow

SECRET_NAME = "user_secret"
SECRET_GROUP = "user-info"

@task(secret_requests=[Secret(group=SECRET_GROUP, key=SECRET_NAME)])
def secret_task() -> str:
    secret_val = flytekit.current_context().secrets.get(SECRET_GROUP, SECRET_NAME)
    # Please do not print the secret value, we are doing so just as a demonstration
    print(secret_val)
    return secret_val

@workflow
def wf() -> str:
    x = secret_task()
    return x

sslp = LaunchPlan.get_or_create(
    name="scheduled_secrets",
    workflow=wf,
    schedule=CronSchedule(schedule="0/1 * * * *"),
)

In this code, the secret_task function is declared with a secret_requests parameter, which is a list of Secret objects that the task needs to access. Each Secret object is defined with a group and a key, which correspond to the name of the secret in your secret management system.

Then, within the secret_task function, you can use flytekit.current_context().secrets.get(SECRET_GROUP, SECRET_NAME) to retrieve the value of the secret.

You can find more information about Flyte's secret management feature in the Flyte documentation.

I hope this helps! If you have any other questions, feel free to ask.

@eapolinario eapolinario added flytekit FlyteKit Python related issue backlogged For internal use. Reserved for contributor team workflow. and removed untriaged This issues has not yet been looked at by the Maintainers labels Nov 30, 2023
@jasonlai1218
Contributor

I would like to work on this, could you assign me, please?

@jasonlai1218
Contributor

#take

@jiwidi
Author

jiwidi commented Jan 9, 2024

@jasonlai1218 hey! Did you have time to look at this?

@jiwidi
Author

jiwidi commented Jan 18, 2024

@eapolinario @jasonlai1218 hello?

@eapolinario
Contributor

@jiwidi , in your example you're trying to access the environment variables in a workflow, not in a task. The arguments set via --env are passed to the tasks that compose a workflow, not the workflow itself (since that's only used at registration time to get the structure of the DAG).
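The timing difference described here can be illustrated with a deliberately simplified, stdlib-only toy model. It is not how flytekit works internally: `build_workflow` and `run_tasks` are hypothetical stand-ins for "the workflow body runs once to record the DAG" and "task bodies run later, with the --env values applied", and `unittest.mock.patch.dict` plays the executor's role of injecting the variables.

```python
import os
from typing import Callable
from unittest import mock


def sample_task() -> str:
    # Task body: runs at execution time, after the env values are injected
    return os.environ.get("DUMMY_KEY", "<unset>")


def build_workflow() -> tuple[str, list[Callable[[], str]]]:
    # Workflow body: runs once, up front, only to record which tasks to call.
    # Whatever it reads from os.environ is captured at this moment.
    seen_by_workflow = os.environ.get("DUMMY_KEY", "<unset>")
    return seen_by_workflow, [sample_task]


def run_tasks(dag: list[Callable[[], str]], env: dict[str, str]) -> list[str]:
    # Hypothetical executor: inject env vars only while tasks run;
    # patch.dict restores os.environ when the block exits.
    with mock.patch.dict(os.environ, env):
        return [task() for task in dag]


os.environ.pop("DUMMY_KEY", None)  # start from a clean slate for the demo
seen, dag = build_workflow()
results = run_tasks(dag, {"DUMMY_KEY": "DUMMY_VALUE"})
print(seen)     # → <unset>
print(results)  # → ['DUMMY_VALUE']
```

In this model the workflow body never sees DUMMY_KEY, no matter what is passed at run time, while the task does.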

Can you talk about your use case a bit more?

@eapolinario eapolinario added the waiting for reporter Used for when we need input from the bug reporter label Jan 24, 2024
@jasonlai1218 jasonlai1218 removed their assignment Jan 25, 2024
@jiwidi
Author

jiwidi commented Jan 25, 2024

Yeah! Maybe you can tell me if there is a better way to do what I had planned.

I want to run the same workflow with different configurations. Let's say it's an ML job that fetches data from a source, runs its computation, outputs some artifacts, and finishes.

I want to parametrize the data source config and the output path. The idea in the original post was to pass those as env variables, since they're available to me when calling pyflyte. As a workaround, I'm currently saving the env vars I need into a YAML file, reading it in the workflow, and then passing the values on to the task.
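One way to structure this use case, assuming the env vars do reach the task, is to read and validate them in a single place inside the task body. A minimal sketch: `JobConfig` and the JOB_DATA_SOURCE / JOB_OUTPUT_PATH variable names are hypothetical. Declaring the two settings as workflow inputs is an alternative worth weighing, since pyflyte run exposes workflow inputs as command-line options.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class JobConfig:
    # Hypothetical settings for the ML job described above
    data_source: str
    output_path: str

    @classmethod
    def from_env(cls, prefix: str = "JOB_") -> "JobConfig":
        """Build the config from JOB_DATA_SOURCE / JOB_OUTPUT_PATH (hypothetical names).

        Intended to be called inside a task body, not the workflow body.
        """
        required = ("DATA_SOURCE", "OUTPUT_PATH")
        missing = [prefix + n for n in required if prefix + n not in os.environ]
        if missing:
            # Fail with a readable message instead of a bare KeyError
            raise RuntimeError("missing env vars: " + ", ".join(missing))
        return cls(
            data_source=os.environ[prefix + "DATA_SOURCE"],
            output_path=os.environ[prefix + "OUTPUT_PATH"],
        )
```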

Based on your comment, should my task then have access to the vars passed? I tried running this workflow:

from flytekit import task, workflow
import os

@task
def sample_task() -> None:
    print([key for key in os.environ])
    print(os.environ["DUMMY_KEY"])
    return None


@workflow
def workflow() -> None:
    sample_task()
    return None

With

pyflyte run --envvars '{"DUMMY_KEY"="DUMMY_VALUE"}' main.py workflow
or
pyflyte run --env '{"DUMMY_KEY"="DUMMY_VALUE"}' main.py workflow

Both resulted in failure, with no access to the var. How can I access it within the task, then?

@eapolinario
Contributor

@jiwidi , this is a bug in local executions. You should be able to do pyflyte run --env DUMMY_KEY=DUMMY_VALUE main.py workflow. The same works on a Flyte cluster, i.e. pyflyte run --env DUMMY_KEY=DUMMY_VALUE --remote main.py workflow.
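Note that the suggested command passes a plain NAME=VALUE pair rather than the JSON-like argument used in the original report. The sketch below is a hypothetical parser, not flytekit's code; assuming each value is split on its first `=`, it shows why the JSON-like form would not produce a variable named DUMMY_KEY.

```python
def parse_env_pairs(values: list[str]) -> dict[str, str]:
    """Hypothetical parser for repeated NAME=VALUE options (not flytekit's code)."""
    env: dict[str, str] = {}
    for item in values:
        # Split on the first '=' only, so values may contain '='
        name, sep, value = item.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {item!r}")
        env[name] = value
    return env


print(parse_env_pairs(["DUMMY_KEY=DUMMY_VALUE"]))        # → {'DUMMY_KEY': 'DUMMY_VALUE'}
print(parse_env_pairs(['{"DUMMY_KEY"="DUMMY_VALUE"}']))  # → {'{"DUMMY_KEY"': '"DUMMY_VALUE"}'}
```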

flyteorg/flytekit#2132 has a fix.

@jiwidi
Author

jiwidi commented Jan 26, 2024

Hi! That's great! Thanks for fixing the bug :)
