Using .output on non-templated fields #26938

stephenonethree · 2022-10-07T15:37:01Z

I just discovered the .output property functionality that apparently was released in Airflow 2 for classic operators, as a simple way of accessing their output XComs. I think that this is a super useful feature because it would allow simpler connections between tasks than what I have been doing until now.

Until now, I've been explicitly giving a downstream task the task_ids and XCom names that it needs to pull from its upstreams (as hardcoded string parameters). Something like:

# pushes XCom named myvar
upstream_task = SomeOperator(task_id='upstream_task')

# this is a custom operator and within execute() I have a line roughly like:
# actual_input = context['task_instance'].xcom_pull(
#     dag_id=context['dag'].dag_id, task_ids=upstream_task_id, key=upstream_xcom_name)
this_task = OtherOperator(
    task_id='this_task',
    upstream_task_id='upstream_task',
    upstream_xcom_name='myvar'
)
upstream_task >> this_task

With .output I could simplify this to:

# pushes XCom named myvar
upstream_task = SomeOperator(task_id='upstream_task')

this_task = OtherOperator(
    task_id='this_task',
    actual_input=upstream_task.output['myvar']
)
upstream_task >> this_task

Unfortunately, it seems that there is one limitation. On the TaskFlow documentation page (https://airflow.apache.org/docs/apache-airflow/2.4.1/tutorial/taskflow.html#consuming-xcoms-between-decorated-and-traditional-tasks) it says: "Using the .output property as an input to another task is supported only for operator parameters listed as a template_field."

I don't use Jinja templating for many of my parameters, as it's mostly irrelevant for me. So my questions are: is there a technical reason for this limitation? If not, is this limitation something you are considering dropping in a future Airflow version? I suppose I could just make these fields templated to get around it, but I don't really want to turn on templating if I don't expect to need it, since it introduces the possibility of incorrect interpolation (though perhaps that's a remote possibility, because I don't think most of my variables will include {{ or }}).
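
To make the interpolation concern concrete, here is a minimal sketch in plain Python. It is not Airflow code: `toy_render`, `ToyOperator`, `PlainOperator`, and `TemplatedOperator` are hypothetical names, and the regex is only a stand-in for Jinja. It shows that the same string is left alone in a non-templated field but rewritten once the field is listed in `template_fields`.

```python
import re

# Toy stand-in for Jinja: replaces "{{ name }}" with context[name].
# Airflow uses Jinja2 for real rendering; this regex only illustrates the idea.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def toy_render(value, context):
    """Render placeholders in strings; return any other value unchanged."""
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), value)
    return value


class ToyOperator:
    """Hypothetical operator model: only fields in template_fields are rendered."""

    template_fields = ()

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)

    def render(self, context):
        for name in self.template_fields:
            setattr(self, name, toy_render(getattr(self, name), context))


class PlainOperator(ToyOperator):
    template_fields = ()  # "message" is not templated


class TemplatedOperator(ToyOperator):
    template_fields = ("message",)  # "message" is templated


context = {"ds": "2022-10-07"}
literal = "keep {{ ds }} as-is"

plain = PlainOperator(message=literal)
plain.render(context)

templated = TemplatedOperator(message=literal)
templated.render(context)

print(plain.message)      # the string is untouched
print(templated.message)  # the placeholder has been substituted
```

In this toy model, opting a field into templating just to accept upstream output also opts it into string substitution, which is the side effect described above.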

Let me know if I should file this as a feature request instead. For now, I guess the Ideas category works.

potiuk · 2022-10-07T22:34:13Z

Collaborator

This is the same reason why JINJA only works on templated fields.

Why for JINJA: We do not want (for performance reasons and because of some ambiguities it might produce) to walk through all the fields of all the tasks to pre-process them every time a task is run. That slows down and complicates the processing, and has the potential to create unforeseen problems. Currently both templating and output processing can be used in arbitrarily nested fields of the parameters passed, even if they are complex structures. Running discovery and preprocessing on such structures to find out whether you need to pull extra data from XCom (which is the case) would lead to extra overhead and might have unintended behaviours.
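
The trade-off can be sketched with a simplified model in plain Python. This is not Airflow's implementation: `ToyRef` is a hypothetical stand-in for whatever `.output[...]` returns, and `resolve` and `render_fields` are illustrative names. The walker recurses into nested structures, and the visit counter shows how much more work is done when every parameter is scanned instead of only the declared fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for a reference to an upstream task's XCom."""
    task_id: str
    key: str


def resolve(value, xcoms, stats):
    """Recursively replace ToyRef objects, counting every node visited."""
    stats["visited"] += 1
    if isinstance(value, ToyRef):
        return xcoms[(value.task_id, value.key)]
    if isinstance(value, dict):
        return {k: resolve(v, xcoms, stats) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(resolve(v, xcoms, stats) for v in value)
    return value


def render_fields(params, field_names, xcoms):
    """Resolve references only inside the named fields; leave the rest as-is."""
    stats = {"visited": 0}
    rendered = dict(params)
    for name in field_names:
        rendered[name] = resolve(params[name], xcoms, stats)
    return rendered, stats["visited"]


xcoms = {("upstream_task", "myvar"): 42}
params = {
    "actual_input": {"nested": [ToyRef("upstream_task", "myvar")]},
    "extra_input": ToyRef("upstream_task", "myvar"),
    "big_config": list(range(1000)),
}

# Only the declared field is walked: cheap, but extra_input stays unresolved.
only_declared, visited_declared = render_fields(params, ["actual_input"], xcoms)

# Every field is walked: everything is resolved, at the cost of a full scan.
everything, visited_all = render_fields(params, list(params), xcoms)

print(visited_declared, visited_all)
```

Restricting the walk to declared fields keeps the cost proportional to what the operator author opted into, which matches the reasoning given above; the price is that a reference passed to an undeclared field is never resolved.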

Why for output: I think this choice was made mainly to keep the behaviour consistent. Unintended behaviours (where an arbitrary string would have to be pre-processed by Jinja) are far less of a concern here, and in this case it probably would have been less of a problem. You have to explicitly pass the output from the upstream task, so there is no danger of accidentally interpolating some string via JINJA templating. So both the optimisation concern and accidental interpolation are far less of a problem.

I think we could remove the limit of only templated fields for outputs, even without introducing breaking changes. And you are right that forcing you to turn your operator fields into templated ones does indeed open them up to these accidental interpolation problems.

Unless there are any other reasons, I'd be for doing it. Maybe it should be protected by adding a flag to the DAG (similar to render_template_as_native_obj) if there are any other concerns, but I see no major problems with it.

COMMENT/UPDATE: I thought a bit more. I think it does introduce a little overhead from having to walk through all the parameters and find out whether an "output" is there, so it is not "zero impact". It also has the potential to trigger some accidental behaviours (for example, turning lazy objects into non-lazy ones before the task starts), so if we go for it, I think a flag at the DAG level would be necessary.
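
One way the lazy-object problem could show up is sketched below in plain Python. This is an interpretation rather than a description of Airflow internals, and `lazy_rows` and `eager_walk` are hypothetical names. A generic walker that descends into anything iterable will consume a generator while merely looking for references, so the task later finds it empty.

```python
produced = []  # records when the lazy source actually does its work


def lazy_rows():
    """A lazy source: nothing is produced until something iterates over it."""
    for i in range(3):
        produced.append(i)
        yield i


def eager_walk(value):
    """Naive walker that descends into any iterable except str and bytes."""
    if isinstance(value, (str, bytes)):
        return value
    try:
        items = iter(value)
    except TypeError:
        return value  # not iterable, so it is a leaf
    return [eager_walk(item) for item in items]


rows = lazy_rows()
produced_before = list(produced)  # still empty: the generator has not run

walked = eager_walk(rows)         # scanning the parameter forces evaluation
left_for_task = list(rows)        # the generator is already exhausted

print(produced_before, walked, left_for_task)
```

A walker limited to declared fields never touches such a parameter unless the operator author lists it, which is one argument for gating any broader scan behind an explicit opt-in.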

I wonder what others think?

1 reply

o-nikolas Oct 20, 2022
Collaborator

I would love to see this change made! I often see people in PRs adding more fields to be templated to get around this, which feels like the wrong solution. It would be great from a user experience perspective to remove this restriction.

Shall we convert this Discussion to a GitHub Issue so that someone can be assigned the development task, or do we want to wait for more folks to weigh in?

potiuk · 2022-10-26T02:49:22Z

Collaborator

Created #27285
