Using .output on non-templated fields #26938
Replies: 2 comments 1 reply
-
This is the same reason why Jinja only works on templated fields.

Why for Jinja: We do not want (for performance reasons, and because of some ambiguities it might produce) to walk through all the fields of all the tasks to pre-process them every time a task is run. That slows things down, complicates the processing, and has the potential to create unforeseen problems. Currently both templating and output processing can be used in arbitrarily nested fields of the parameters passed, even if they are complex structures. Running discovery and preprocessing on such structures to find out whether you need to pull extra data from XCom (which is the case) would lead to extra overhead and might have unintended behaviours.

Why for output: I think this choice was made mainly to keep the behaviour consistent. It's far less problematic in terms of unintended behaviours (compared to Jinja, where an arbitrary string would have to be pre-processed), and in this case it probably would have been less of a problem. You have to explicitly pass the output from the upstream task, so there is no danger of accidentally interpolating some string via Jinja templating. So both the optimisation concern and the accidental-interpolation concern are far less of a problem.

I think we could remove the limit of only templated fields for outputs, even without introducing breaking changes. And you are right that forcing you to turn your operator fields into templated ones does indeed open them up to these accidental interpolation problems. Unless there are any other reasons, I'd be for doing it. Maybe it should be protected by adding a flag to the DAG (similar to

COMMENT/UPDATE: I thought about it a bit more. I think it does introduce a little overhead from having to walk through all the parameters to find out whether "output" is there, so it is not "zero impact". It also has the potential to trigger some accidental behaviours (for example turning lazy objects into non-lazy ones before the task starts), so if we go for it, I think a flag at the DAG level would be necessary.
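To make the overhead argument a bit more concrete, here is a rough plain-Python sketch of the idea. It is not Airflow's actual implementation: `OutputRef`, `FakeOperator`, `params_in`, `other`, and `resolve` are all made-up names, and the `xcoms` dict just stands in for whatever the upstream tasks pushed. It only shows that resolving output references means recursively walking nested structures, and that limiting the walk to `template_fields` keeps every other attribute untouched.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class OutputRef:
    """Hypothetical stand-in for a reference to an upstream task's output."""
    task_id: str


def resolve(value: Any, xcoms: Dict[str, Any]) -> Any:
    """Recursively replace OutputRef placeholders inside nested containers."""
    if isinstance(value, OutputRef):
        return xcoms[value.task_id]
    if isinstance(value, dict):
        return {k: resolve(v, xcoms) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(resolve(v, xcoms) for v in value)
    return value  # strings, numbers, and anything else pass through unchanged


class FakeOperator:
    """Hypothetical operator: only attributes named in template_fields are walked."""
    template_fields = ("params_in",)

    def __init__(self, params_in: Any, other: Any) -> None:
        self.params_in = params_in
        self.other = other

    def render(self, xcoms: Dict[str, Any]) -> None:
        # The walk is limited to the declared fields; `other` is never inspected.
        for name in self.template_fields:
            setattr(self, name, resolve(getattr(self, name), xcoms))


op = FakeOperator(
    params_in={"rows": [OutputRef("extract"), 1]},  # nested reference, gets resolved
    other=OutputRef("extract"),                     # not a template field, left as-is
)
op.render({"extract": [10, 20]})
```

Dropping the restriction would amount to calling something like `resolve` on every attribute of every task on each run, including `other` here, which is where the extra cost and the possible surprises (such as forcing lazy objects early) would come from.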
I wonder what others think?
-
Created #27285
-
I just discovered the .output property functionality that apparently was released in Airflow 2 for classic operators, as a simple way of accessing their output XComs. I think that this is a super useful feature because it would allow simpler connections between tasks than what I have been doing until now.
Until now, I've been explicitly giving a downstream task the task_ids and XCom names that it needs to pull from its upstreams (as hardcoded string parameters). Something like:
With .output I could simplify this to:
Unfortunately it seems that there is one limitation. On the TaskFlow documentation page (https://airflow.apache.org/docs/apache-airflow/2.4.1/tutorial/taskflow.html#consuming-xcoms-between-decorated-and-traditional-tasks) it says: "Using the .output property as an input to another task is supported only for operator parameters listed as a template_field."
I don't use Jinja templating for very many of my parameters, as it's mostly irrelevant for me. So my questions are: is there a technical reason for this limitation? If not, is this limitation something that you are considering dropping in a future Airflow version? I suppose I could just make these fields templated to get around it, but I don't really want to turn on templating if I don't expect to need it, since I suppose it introduces the possibility of incorrect interpolation (though perhaps that's a remote possibility, because I don't think most of my variables will include {{ or }}).
Let me know if I should file this as a feature request instead; for now I guess the Ideas category works.