Explain the users how they can check if python code is top-level #34006

Merged
merged 1 commit into from
Sep 1, 2023
90 changes: 90 additions & 0 deletions docs/apache-airflow/best-practices.rst
@@ -176,6 +176,96 @@ Good example:

In the Bad example, NumPy is imported each time the DAG file is parsed, which will result in suboptimal performance in the DAG file processing. In the Good example, NumPy is only imported when the task is running.

Since it is not always obvious which code is "top-level" code, the next section shows how to check it.

How to check if my code is "top-level" code
-------------------------------------------

To understand whether your code is "top-level" or not, you need to understand many of the
intricacies of how Python parsing works. In general, when Python parses a Python file, it executes
the code it sees, except for the bodies of functions and methods, which only run when they are called.
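
As a minimal sketch of this rule in plain Python (no Airflow involved; the names ``events`` and
``helper`` are hypothetical and used only for illustration), the module-level statement runs as soon as
the file is loaded, while the function body runs only when the function is called:

.. code-block:: python

    events = []  # records what actually ran, in order

    events.append("module level")  # runs as soon as the file is loaded


    def helper():
        # This body runs only when helper() is called.
        events.append("inside helper")
        return 42


    # Snapshot taken after the definition but before any call.
    events_after_definition = list(events)

    result = helper()  # calling the function runs its body now

At the point of the snapshot only ``"module level"`` has been recorded; ``"inside helper"`` is added
only by the explicit call on the last line.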

There are a number of special cases that are not obvious. For example, top-level code also includes
any expression used to compute the default value of a function or method parameter, because
default values are evaluated when the function is defined.
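
The default-value case can be sketched as follows. This is an illustration in plain Python, not Airflow
code; ``expensive_default`` and ``process`` are hypothetical names:

.. code-block:: python

    calls = []  # records every call to expensive_default()


    def expensive_default():
        calls.append("expensive_default")
        return [1, 2, 3]


    # The default expression is evaluated once, when the "def" statement runs,
    # which makes expensive_default() top-level code.
    def process(values=expensive_default()):
        return sum(values)


    calls_after_definition = list(calls)  # already contains one call

    total = process()  # reuses the stored default; no second call happens

Here ``expensive_default`` has already run by the time ``process`` is defined, and calling ``process()``
afterwards does not run it again.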

However, there is an easy way to check whether your code is "top-level" or not. You simply need to
parse your code and see if the piece of code gets executed.

Imagine this code:

.. code-block:: python

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    import pendulum


    def get_task_id():
        return "print_array_task"  # <- is that code going to be executed?


    def get_array():
        return [1, 2, 3]  # <- is that code going to be executed?


    with DAG(
        dag_id="example_python_operator",
        schedule=None,
        start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
        catchup=False,
        tags=["example"],
    ) as dag:

        operator = PythonOperator(
            task_id=get_task_id(),
            python_callable=get_array,
            dag=dag,
        )

Review comment (Contributor), on lines +219 to +222:

From what I gather from #33997, the confusion is more around templated parameters, or parameters in general?

I wonder if the author's question was more about this case:

      operator = PythonOperator(
          task_id=get_task_id(),
          python_callable=get_array,
          op_kwargs={"0": my_func()},  # <- this one
          dag=dag,

Reply (Member Author):

Could be, but the mechanism for checking it is the same. If they add a print in my_func(), they will also see it.

To check it, add some print statements to the code you want to check and run
``python <my_dag_file>.py``.


.. code-block:: python

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    import pendulum


    def get_task_id():
        print("Executing 1")
        return "print_array_task"  # <- is that code going to be executed? YES


    def get_array():
        print("Executing 2")
        return [1, 2, 3]  # <- is that code going to be executed? NO


    with DAG(
        dag_id="example_python_operator",
        schedule=None,
        start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
        catchup=False,
        tags=["example"],
    ) as dag:

        operator = PythonOperator(
            task_id=get_task_id(),
            python_callable=get_array,
            dag=dag,
        )

When you execute that code you will see:

.. code-block:: bash

    root@cf85ab34571e:/opt/airflow# python /files/test_python.py
    Executing 1

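This means that ``get_task_id`` is executed as top-level code, but ``get_array`` is not:
``get_task_id()`` is called while the operator is being constructed, whereas ``get_array`` is only
passed as a reference and is not called when the file is parsed.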
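
The same check applies to values passed through other parameters, such as ``op_kwargs``. The sketch below
reproduces the mechanism without Airflow, so it can be run anywhere. ``FakeOperator`` is a hypothetical
stand-in that only stores its arguments (it is not an Airflow class), and ``my_func`` and ``executed`` are
illustrative names:

.. code-block:: python

    executed = []  # records which functions ran while the file was loaded


    class FakeOperator:
        """Hypothetical stand-in for an operator; it only stores its arguments."""

        def __init__(self, task_id, python_callable, op_kwargs=None):
            self.task_id = task_id
            self.python_callable = python_callable
            self.op_kwargs = op_kwargs or {}


    def get_task_id():
        executed.append("get_task_id")
        return "print_array_task"


    def get_array():
        executed.append("get_array")
        return [1, 2, 3]


    def my_func():
        executed.append("my_func")
        return "value"


    operator = FakeOperator(
        task_id=get_task_id(),  # called here, while the file is loaded
        python_callable=get_array,  # only a reference is passed; nothing runs
        op_kwargs={"0": my_func()},  # called here as well
    )

After the file is loaded, ``executed`` contains ``get_task_id`` and ``my_func`` but not ``get_array``,
because only the first two are actually called in the argument list.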

.. _best_practices/dynamic_dag_generation:

Dynamic DAG Generation