Introduce @task.bash TaskFlow decorator #30176

josh-fell · 2023-03-18T03:11:19Z

Adding a @task.bash TaskFlow decorator to the collection of existing core TaskFlow decorators. This particular decorator will use the return value of the decorated callable as the Bash command to execute using the existing BashOperator functionality.

TODO:

Add docs
Add examples

josh-fell · 2023-03-18T03:23:11Z

A casual example of how this could work can be taken from this tutorial on using the BashOperator.

Current:

get_ISS_coordinates = BashOperator(
    task_id="get_ISS_coordinates",
    bash_command="node $AIRFLOW_HOME/include/my_java_script.js"
)

print_ISS_coordinates = BashOperator(
    task_id="print_ISS_coordinates",
    bash_command="Rscript $AIRFLOW_HOME/include/my_R_script.R $ISS_COORDINATES",
    env={
        "ISS_COORDINATES": "{{ task_instance.xcom_pull(task_ids='get_ISS_coordinates') }}"
    },
    append_env=True
)

get_ISS_coordinates >> print_ISS_coordinates

Using @task.bash:

@task.bash(append_env=True)
def get_ISS_coordinates():
    return "node $AIRFLOW_HOME/include/my_java_script.js"

@task.bash(append_env=True, push_op_kwargs_to_env=True)
def print_ISS_coordinates(coords):
    return f"Rscript $AIRFLOW_HOME/include/my_R_script.R $coords"

print_ISS_coordinates(coords=get_ISS_coordinates())

uranusjr · 2023-03-22T07:19:35Z

I wonder that, instead of a decorator, if it would be better to have

task.bash(
    """... bash command ...""",
    **kwargs,
)

potiuk · 2023-03-22T11:42:32Z

I agree with @uranusjr (I think at least why I think the comment was raised). it sounds strange to me to have a Python code to return bash commmand to execute as string. I see where it might be useful, but it has certain properties, that make it difficult to debug and reason about. While other decorators focus on the python to execue and simply change the context where they are executed (docker/k8s) - having bash decorator with returning a string that should be executed as bash command is kinda different beast altogether. I am not sure if it will be intuitive enough :)

fritz-astronomer · 2023-04-21T17:00:56Z

Although @potiuk - I would say that although this does feel a little weird it does effectively match the pattern?

@task.bash
def fn(...):
   return "echo hi"

You are running echo hi, in bash. Handing off to the other system, in effect. Yes not the whole function, but 🤷

@uranusjr 's does look clean, but it also doesn't feel like it matches other existing patterns that we have (except maybe some astro-sdk stuff that has veered in that direction - e.g. load_file( ...)

potiuk · 2023-04-27T22:09:19Z

Actually, yeah. why not. I think it's quite nice actuallly. Especially if you can make multi-line bash with """ that does fill like useful thing.

sethwoodworth · 2023-05-24T14:16:58Z

I'm a big fan of the approach, and have been using a half-baked version of a @bash decorator. Are you setting the default of append_env to False to match the behavior of the BashOperator? In my use-cases, I could see wanting them always available.

github-actions · 2023-07-09T00:15:59Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

github-actions · 2023-08-24T00:10:12Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

github-actions · 2023-10-10T00:11:13Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

github-actions · 2023-11-25T00:11:09Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

josh-fell · 2023-11-28T02:42:22Z

Reviving this after finally getting out of a large hole.

@fritz-astronomer Yes, this approach is very much inspired by the Astro SDK.

@potiuk I can confirm this works for multi-line strings.

@sethwoodworth Initially I was thinking about keeping the existing default for append_env. However, I think I will ultimately do away with the push_op_kwargs_to_env parameter. It feels a bit quirky and can lead to less explicit scripting. For an "introduction" of this decorator, it seems a bit more pragmatic to not gum-up this PR but rather discuss that particular feature in a future PR. WDYT?

potiuk · 2023-11-29T00:01:19Z

@potiuk I can confirm this works for multi-line strings.

My comment was rather about "why do we need method at all?" (following @uranusjr comment.

For me the proposed syntax:

task.bash(
    """
node $AIRFLOW_HOME/include/my_java_script.js"
""",
    **kwargs,
)

felt a bit better than:

@task.bash(append_env=True)
def get_ISS_coordinates():
    return """
node $AIRFLOW_HOME/include/my_java_script.js
"""

However ... I reconsidered....

I actually found a very good reason why the second approach is better.

It allows for much more dynamic construction of the bash command - with all the logic executed during task execution rather than during DAG parsing. The first one allows f-strings (rsolved during parsing) and JINJA (resolved during execution) - and pretty much nothing else.

When you return string from Python funtionm you kind of combine both Python code wiht logic to build command nd Bash operator to actually do it - all during task execution. This is really powerful - especially in case of mapped operators for example - much more powerful than simply passing string.

For example this one would be much more difficult to express nicely in case 1):

@task.bash(append_env=True)
def get_ISS_coordinates():
    base = "mypy "
    for file in get_all_possible_files_to_check():
        base += file if not file.startswith("test_")
    return base

(and that's only an example).

So - summarizing: I am all in for the proposal @josh-fell

josh-fell · 2023-11-29T04:03:53Z

@potiuk Just about summed everything I was thinking! I completely agree being able to enrich Bash with Python is strong. Also, it's a more straightforward approach to parametrize the command string itself rather than the current approach of using env (which IMO is really for static attributes).

josh-fell · 2023-11-30T03:24:06Z

One question I do have is should the bash_command attribute (aka the return value from the callable) be templated? Clearly BashOperator has this now, but with TaskFlow historically the args passed in to the callable can be templated. So should the behavior for this particular decorator stay the same? I've been wavering a little so does anyone have any thoughts?

For example, with the BashOperator you can do:

BashOperator(task_id="bash", bash_command="echo 'ti={{ task_instance_key_str }}'")

But should this be allowed with the decorator?

@task.bash
def do_bash_things():
    return "echo 'ti={{ task_instance_key_str }}'"

Or not, and users continue with the current TaskFlow approach of:

@task.bash
def do_bash_things(task_instance_key_str=None):
    return f"echo 'ti={task_instance_key_str}'"

Allowing a templated string to be returned is consistent with classic BashOperator authoring, but disallowing it seems to align with TaskFlow authoring generally. I'm leaning towards handling a templated string to be consistent with the BashOperator.

potiuk · 2023-11-30T09:39:11Z

Disabke it by default add ad parameter is_templated?

WHY?

I think the reason why we want it templated is to allow people to convert old operators easily but I think with python return value - f-string is more natural way and we should encourage that rather than Jinja

josh-fell · 2023-12-07T02:22:42Z

Disabke it by default add ad parameter is_templated?

WHY?

I think the reason why we want it templated is to allow people to convert old operators easily but I think with python return value - f-string is more natural way and we should encourage that rather than Jinja

Add a parameter might add some confusion. The current implementation will be able to read a templated Bash script out of the box and there could be Jinja expressions in its contents. But would users be confused about needing to add is_templated_cmd parameter when using a script? Right now, no, but they would if they returned a string from the function that was templated. Maybe we just handle it and the authoring experience is less specialized and consistent between these use cases.

DAG authors can write the command/script they way they need it to run. It's possible DAG authors are implementing tasks from other folks who aren't involved in the Airflow implementation (e.g. a security engineer wanting to orchestrate a series of commands that a data engineer (the DAG author in this scenario) enriches to make idempotent and better fit into an Airflow context).

josh-fell · 2023-12-07T02:34:16Z

We can absolutely add a note to the docs that using the f-string approach is encouraged. I completely agree it's more "Pythonic" and fits better in the TaskFlow paradigm.

potiuk · 2023-12-07T04:55:04Z

We can absolutely add a note to the docs that using the f-string approach is encouraged. I completely agree it's more "Pythonic" and fits better in the TaskFlow paradigm.

Fine with that :)

uranusjr · 2023-12-07T09:52:33Z

I would argue if you need logic it is better to create a shell script file and use templating to pass in the values instead. Trying to generate a shell command with Python code—or more generally, ad-hoc code generation by hand—is a very good way to shoot yourself in the foot.

potiuk · 2023-12-07T13:18:53Z

I would argue if you need logic it is better to create a shell script file and use templating to pass in the values instead. Trying to generate a shell command with Python code—or more generally, ad-hoc code generation by hand—is a very good way to shoot yourself in the foot.

I do not think about much logic - I think it's good to give our users a versaitle set of tools. I find ot really easy to have some patterns usable. Simple if x return "this script" else return "other script" is much better if the logic is done in Python for example.

docs/apache-airflow/howto/operator/bash.rst

Adding a @task.bash TaskFlow decorator to the collection of existing core TaskFlow decorators. This particular decorator will use the return value of the decorated callable as the Bash command to execute using the existing BashOperator functionality.

github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jul 9, 2023

josh-fell removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Jul 9, 2023

github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Aug 24, 2023

josh-fell removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Aug 25, 2023

github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Oct 10, 2023

josh-fell removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Oct 10, 2023

github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 25, 2023

josh-fell removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 28, 2023

josh-fell force-pushed the bash-deco branch from 2a0fa5b to 69b0c59 Compare November 29, 2023 04:12

josh-fell force-pushed the bash-deco branch from 69b0c59 to cb3f973 Compare December 1, 2023 05:06

josh-fell force-pushed the bash-deco branch 2 times, most recently from c6d3cf7 to 3d1367e Compare December 15, 2023 20:05

josh-fell marked this pull request as ready for review December 15, 2023 20:05

josh-fell requested a review from potiuk as a code owner December 15, 2023 20:05

josh-fell requested a review from uranusjr December 15, 2023 20:06

josh-fell force-pushed the bash-deco branch 2 times, most recently from 7cd3612 to cb9ef21 Compare December 16, 2023 15:06

josh-fell commented Dec 16, 2023

View reviewed changes

docs/apache-airflow/howto/operator/bash.rst Show resolved Hide resolved

josh-fell force-pushed the bash-deco branch from cb9ef21 to f91090b Compare December 16, 2023 18:36

potiuk approved these changes Dec 17, 2023

View reviewed changes

josh-fell force-pushed the bash-deco branch from f91090b to 094791c Compare December 19, 2023 02:11

josh-fell added 2 commits December 19, 2023 16:01

Refresh bash_command once and earlier

e3146e4

josh-fell force-pushed the bash-deco branch from 094791c to e3146e4 Compare December 19, 2023 21:01

josh-fell merged commit e920344 into apache:main Dec 19, 2023
52 checks passed

josh-fell deleted the bash-deco branch December 19, 2023 21:51

ephraimbuddy added this to the Airflow 2.9.0 milestone Jan 10, 2024

ephraimbuddy added the type:new-feature Changelog: New Features label Jan 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce @task.bash TaskFlow decorator #30176

Introduce @task.bash TaskFlow decorator #30176

josh-fell commented Mar 18, 2023 •

edited

Loading

josh-fell commented Mar 18, 2023 •

edited

Loading

uranusjr commented Mar 22, 2023

potiuk commented Mar 22, 2023 •

edited

Loading

fritz-astronomer commented Apr 21, 2023 •

edited

Loading

potiuk commented Apr 27, 2023

sethwoodworth commented May 24, 2023

github-actions bot commented Jul 9, 2023

github-actions bot commented Aug 24, 2023

github-actions bot commented Oct 10, 2023

github-actions bot commented Nov 25, 2023

josh-fell commented Nov 28, 2023

potiuk commented Nov 29, 2023

josh-fell commented Nov 29, 2023

josh-fell commented Nov 30, 2023 •

edited

Loading

potiuk commented Nov 30, 2023 •

edited

Loading

josh-fell commented Dec 7, 2023

josh-fell commented Dec 7, 2023 •

edited

Loading

potiuk commented Dec 7, 2023

uranusjr commented Dec 7, 2023

potiuk commented Dec 7, 2023

Introduce @task.bash TaskFlow decorator #30176

Introduce @task.bash TaskFlow decorator #30176

Conversation

josh-fell commented Mar 18, 2023 • edited Loading

josh-fell commented Mar 18, 2023 • edited Loading

uranusjr commented Mar 22, 2023

potiuk commented Mar 22, 2023 • edited Loading

fritz-astronomer commented Apr 21, 2023 • edited Loading

potiuk commented Apr 27, 2023

sethwoodworth commented May 24, 2023

github-actions bot commented Jul 9, 2023

github-actions bot commented Aug 24, 2023

github-actions bot commented Oct 10, 2023

github-actions bot commented Nov 25, 2023

josh-fell commented Nov 28, 2023

potiuk commented Nov 29, 2023

josh-fell commented Nov 29, 2023

josh-fell commented Nov 30, 2023 • edited Loading

potiuk commented Nov 30, 2023 • edited Loading

josh-fell commented Dec 7, 2023

josh-fell commented Dec 7, 2023 • edited Loading

potiuk commented Dec 7, 2023

uranusjr commented Dec 7, 2023

potiuk commented Dec 7, 2023

josh-fell commented Mar 18, 2023 •

edited

Loading

josh-fell commented Mar 18, 2023 •

edited

Loading

potiuk commented Mar 22, 2023 •

edited

Loading

fritz-astronomer commented Apr 21, 2023 •

edited

Loading

josh-fell commented Nov 30, 2023 •

edited

Loading

potiuk commented Nov 30, 2023 •

edited

Loading

josh-fell commented Dec 7, 2023 •

edited

Loading