-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not silently allow the use of undefined variables in jinja2 templates #11016
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
|
d8af9de
to
a071fd7
Compare
@lattwood Can you add etry in UPDATING.md? This will allow us to avoid surprises after migrating to Airflow 2.0 |
@mik-laj will do |
a071fd7
to
f0bd33b
Compare
@mik-laj done |
f0bd33b
to
c7b1b1b
Compare
This can have *extremely* bad consequences. After this change, a jinja2 template like the one below will cause the task instance to fail, if the DAG being executed is not a sub-DAG. This may also display an error on the Rendered tab of the Task Instance page. task_instance.xcom_pull('z', key='return_value', dag_id=dag.parent_dag.dag_id) Prior to the change in this commit, the above template would pull the latest value for task_id 'z', for the given execution_date, from *any DAG*. If your task_ids between DAGs are all unique, or if DAGs using the same task_id always have different execution_date values, this will appear to act like dag_id=None. Our current theory is SQLAlchemy/Python doesn't behave as expected when comparing `jinja2.Undefined` to `None`.
c7b1b1b
to
6a3b3fd
Compare
@mik-laj, docs added, PR is green! |
Awesome work, congrats on your first merged pull request! |
```python | ||
import jinja2 | ||
|
||
dag = DAG('simple_dag', template_undefined=jinja2.Undefined) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it also possible on a template-by-template basis to do it as {{ something | default "x" }}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe all standard jinja2 templating features are still available.
This can have extremely bad consequences. After this change, a jinja2 template like the one below will cause the task instance to fail, if the DAG being executed is not a sub-DAG. This may also display an error on the Rendered tab of the Task Instance page.
task_instance.xcom_pull('z', key='return_value', dag_id=dag.parent_dag.dag_id)
Prior to the change in this commit, the above template would pull the latest value for task_id 'z', for the given execution_date, from any DAG. If your task_ids between DAGs are all unique, or if DAGs using the same task_id always have different execution_date values, this will appear to act like dag_id=None.
Our current theory is SQLAlchemy/Python doesn't behave as expected when comparing
jinja2.Undefined
toNone
.Admittedly this issue boils down to PEBKAC, but this is also a case of Airflow not doing the (perceived) right thing in the face of ambiguity, and this PR fixes that up.
I did find this note about why it was only set to
jinja2.Undefined
, but given the way this can fail (and the difficulty in tracking down the root cause), I strongly feel this warrants breaking backwards compatibility with DAGs/task instances that are attempting to erroneously use undefined variables.#4951 (comment)