Introduce @task.bash TaskFlow decorator #30176
A casual example of how this could work can be taken from this tutorial on using the BashOperator.

Current:

get_ISS_coordinates = BashOperator(
    task_id="get_ISS_coordinates",
    bash_command="node $AIRFLOW_HOME/include/my_java_script.js"
)
print_ISS_coordinates = BashOperator(
    task_id="print_ISS_coordinates",
    bash_command="Rscript $AIRFLOW_HOME/include/my_R_script.R $ISS_COORDINATES",
    env={
        "ISS_COORDINATES": "{{ task_instance.xcom_pull(task_ids='get_ISS_coordinates') }}"
    },
    append_env=True
)
get_ISS_coordinates >> print_ISS_coordinates

Using @task.bash:

@task.bash(append_env=True)
def get_ISS_coordinates():
    return "node $AIRFLOW_HOME/include/my_java_script.js"

@task.bash(append_env=True, push_op_kwargs_to_env=True)
def print_ISS_coordinates(coords):
    return f"Rscript $AIRFLOW_HOME/include/my_R_script.R $coords"

print_ISS_coordinates(coords=get_ISS_coordinates()) |
I wonder if, instead of a decorator, it would be better to have:

task.bash(
    """... bash command ...""",
    **kwargs,
) |
I agree with @uranusjr (at least I think that is why the comment was raised). It sounds strange to me to have Python code return a bash command to execute as a string. I see where it might be useful, but it has certain properties that make it difficult to debug and reason about. While other decorators focus on the Python to execute and simply change the context where it is executed (docker/k8s), a bash decorator that returns a string to be executed as a bash command is kind of a different beast altogether. I am not sure if it will be intuitive enough :) |
Although @potiuk - I would say that although this does feel a little weird it does effectively match the pattern?
You are running @uranusjr 's does look clean, but it also doesn't feel like it matches other existing patterns that we have (except maybe some |
Actually, yeah. Why not? I think it's quite nice actually. Especially if you can make multi-line bash with |
I'm a big fan of the approach, and have been using a half-baked version of a @bash decorator. Are you setting the default of |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
Reviving this after finally getting out of a large hole. @fritz-astronomer Yes, this approach is very much inspired by the Astro SDK. @potiuk I can confirm this works for multi-line strings. @sethwoodworth Initially I was thinking about keeping the existing default for |
My comment was rather about "why do we need a method at all?" (following @uranusjr's comment). For me the proposed syntax:

task.bash(
    """
    node $AIRFLOW_HOME/include/my_java_script.js
    """,
    **kwargs,
)

felt a bit better than:

@task.bash(append_env=True)
def get_ISS_coordinates():
    return """
    node $AIRFLOW_HOME/include/my_java_script.js
    """

However, I reconsidered. I actually found a very good reason why the second approach is better. It allows for much more dynamic construction of the bash command, with all the logic executed during task execution rather than during DAG parsing. The first one allows f-strings (resolved during parsing) and Jinja (resolved during execution), and pretty much nothing else. When you return a string from a Python function, you combine the Python code that builds the command with the Bash operator that actually runs it, all during task execution. This is really powerful, especially in the case of mapped operators, for example. It is much more powerful than simply passing a string. For example, this one would be much more difficult to express nicely in case 1):

@task.bash(append_env=True)
def get_ISS_coordinates():
    base = "mypy"
    for file in get_all_possible_files_to_check():
        # Skip test files; separate the remaining file names with spaces.
        if not file.startswith("test_"):
            base += f" {file}"
    return base

(and that's only an example). So, summarizing: I am all in for the proposal @josh-fell |
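The point about building the command at execution time can be tried outside Airflow. The following is a minimal sketch, not code from this PR: the helper name `build_mypy_command` and the sample file names are hypothetical, and the `shlex.quote` call is an added precaution that the comment above does not mention. Inside a real task, this logic would sit in the body of the decorated function.

```python
import shlex
from typing import Iterable


def build_mypy_command(files: Iterable[str]) -> str:
    """Build a mypy command line, skipping test files (hypothetical helper)."""
    parts = ["mypy"]
    for file in files:
        if not file.startswith("test_"):
            # Quote each name so spaces or shell metacharacters stay literal.
            parts.append(shlex.quote(file))
    return " ".join(parts)


command = build_mypy_command(["dags/etl.py", "test_etl.py", "my file.py"])
print(command)
```

Because the string is assembled by ordinary Python, the file list can come from anything available when the task runs, which is what a parse-time f-string cannot offer.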
@potiuk Just about summed everything I was thinking! I completely agree being able to enrich Bash with Python is strong. Also, it's a more straightforward approach to parametrize the command string itself rather than the current approach of using |
One question I do have is should the For example, with the

BashOperator(task_id="bash", bash_command="echo 'ti={{ task_instance_key_str }}'")

But should this be allowed with the decorator?

@task.bash
def do_bash_things():
    return "echo 'ti={{ task_instance_key_str }}'"

Or not, and users continue with the current TaskFlow approach of:

@task.bash
def do_bash_things(task_instance_key_str=None):
    return f"echo 'ti={task_instance_key_str}'"

Allowing a templated string to be returned is consistent with classic |
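The two options in this question can be compared side by side without Airflow. This is only a sketch under stated assumptions: `render` is a tiny stand-in for Jinja rendering that handles nothing beyond `{{ name }}` placeholders, and the context value is made up. It shows that both styles can yield the same final command and differ only in who performs the substitution.

```python
import re


def render(template: str, context: dict) -> str:
    """Stand-in for Jinja: replace each {{ name }} with context[name]."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(context[match.group(1)]),
        template,
    )


def templated_command() -> str:
    # Returns a template; a later rendering step must fill in the value.
    return "echo 'ti={{ task_instance_key_str }}'"


def fstring_command(task_instance_key_str: str) -> str:
    # The value arrives as an argument and is substituted by Python itself.
    return f"echo 'ti={task_instance_key_str}'"


context = {"task_instance_key_str": "example_dag__bash__20240101"}  # made-up value
rendered = render(templated_command(), context)
direct = fstring_command(**context)
print(rendered == direct)
```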
Disable it by default and add a parameter is_templated? Why? I think the reason we want it templated is to allow people to convert old operators easily, but I think with a Python return value an f-string is the more natural way, and we should encourage that rather than Jinja |
Adding a parameter might add some confusion. The current implementation will be able to read a templated Bash script out of the box and there could be Jinja expressions in its contents. But would users be confused about needing to add DAG authors can write the command/script the way they need it to run. It's possible DAG authors are implementing tasks from other folks who aren't involved in the Airflow implementation (e.g. a security engineer wanting to orchestrate a series of commands that a data engineer (the DAG author in this scenario) enriches to make idempotent and better fit into an Airflow context). |
We can absolutely add a note to the docs that using the f-string approach is encouraged. I completely agree it's more "Pythonic" and fits better in the TaskFlow paradigm. |
Fine with that :) |
I would argue if you need logic it is better to create a shell script file and use templating to pass in the values instead. Trying to generate a shell command with Python code—or more generally, ad-hoc code generation by hand—is a very good way to shoot yourself in the foot. |
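The alternative suggested here, a fixed script that receives its values from outside, can be sketched with the standard library alone. This is an illustration under assumptions, not Airflow code: the script text and the function name `run_script` are made up, and the values are passed through the environment, much as the `env` and `append_env=True` arguments do in the BashOperator example at the top of this thread.

```python
import os
import subprocess
import tempfile

# A fixed script: it never changes, only the environment around it does.
SCRIPT = 'echo "$ISS_COORDINATES"\n'


def run_script(coords: str) -> str:
    """Write the script to a temporary file and run it with coords in the environment."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as handle:
        handle.write(SCRIPT)
        path = handle.name
    try:
        result = subprocess.run(
            ["sh", path],
            # Extend the current environment instead of replacing it.
            env={**os.environ, "ISS_COORDINATES": coords},
            capture_output=True,
            text=True,
            check=True,
        )
    finally:
        os.unlink(path)
    return result.stdout.strip()


print(run_script("51.5,-0.1; echo pwned"))
```

Since the value is never spliced into the command text, the `; echo pwned` part is printed as data rather than executed, which is the foot-gun this comment warns about.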
I do not think about much logic - I think it's good to give our users a versatile set of tools. I find it really easy to have some patterns usable. Simple |
Adding a @task.bash TaskFlow decorator to the collection of existing core TaskFlow decorators. This particular decorator will use the return value of the decorated callable as the Bash command to execute using the existing BashOperator functionality.
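The idea in this description, using a callable's return value as the command to run, can be sketched in plain Python. This is not the implementation from this PR: `bash_task` is a hypothetical stand-in that calls `subprocess` directly instead of BashOperator, and it falls back to `sh` when `bash` is not installed. Returning the last line of stdout mirrors what BashOperator pushes to XCom by default.

```python
import functools
import shutil
import subprocess


def bash_task(func):
    """Hypothetical stand-in for @task.bash: run the string the callable returns."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        command = func(*args, **kwargs)
        if not isinstance(command, str):
            raise TypeError("the decorated callable must return a command string")
        shell = shutil.which("bash") or "sh"
        result = subprocess.run(
            [shell, "-c", command], capture_output=True, text=True, check=True
        )
        lines = result.stdout.splitlines()
        # Hand back only the last line of output.
        return lines[-1] if lines else ""

    return wrapper


@bash_task
def greet(name):
    return f"echo starting; echo hello {name}"


print(greet("airflow"))
```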
TODO: