Tutorial improvements. #25

Merged: 7 commits, Jun 15, 2015
1 change: 1 addition & 0 deletions airflow/example_dags/tutorial.py
@@ -27,6 +27,7 @@

dag = DAG('tutorial', default_args=default_args)

# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(
task_id='print_date',
bash_command='date',
48 changes: 23 additions & 25 deletions docs/tutorial.rst
@@ -40,6 +40,7 @@ complicated, a line by line explanation follows below.

dag = DAG('tutorial', default_args=default_args)

# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(
task_id='print_date',
bash_command='date',
@@ -123,8 +124,9 @@ We also pass the default argument dictionary that we just define.

Tasks
-----
Tasks are generated when instantiating objects from operators. The first
argument ``task_id`` acts as a unique identifier for the task.
Tasks are generated when instantiating objects from operators. An object
instantiated from an operator is called a constructor. The first argument
Review comment (Member):

"constructor" is the name of the special method that defines how the object is created. It's an OOP term, not specific to Airflow or anything.

Reply (Contributor Author):

Got it, apologies, I'm coming from an FP background (R). :)

``task_id`` acts as a unique identifier for the task.

.. code:: python

@@ -141,24 +143,25 @@ argument ``task_id`` acts as a unique identifier for the task.

Notice how we pass a mix of operator specific arguments (``bash_command``) and
an argument common to all operators (``email_on_failure``) inherited
from BaseOperator to the operators constructor. This is simpler than
from BaseOperator to the operator's constructor. This is simpler than
passing every argument for every constructor call. Also, notice that in
the second call we override ``email_on_failure`` parameter with ``False``.
the second task we override ``email_on_failure`` parameter with ``False``.

The precedence rules for operator is:
The precedence rules for a task are as follows:

* Use the argument explicitly passed to the constructor
* Look in the default_args dictonary, use the value from there if it exists
* Use the operator's default, if any
* If none of these are defined, Airflow raises an exception
1. Explicitly passed arguments
2. Values that exist in the ``default_args`` dictionary
3. The operator's default value, if one exists

A task must include or inherit the arguments ``task_id`` and ``owner``,
otherwise Airflow will raise an exception.
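
The precedence rules and the required-argument check can be illustrated with a minimal plain-Python sketch. It does not use Airflow: `resolve_task_args`, `REQUIRED_ARGS`, and the values in `default_args` and `operator_defaults` are hypothetical stand-ins, and the sketch raises `ValueError` where the tutorial only says Airflow raises "an exception".

```python
# Hypothetical sketch of the argument precedence described above.
# None of these names are Airflow APIs.
REQUIRED_ARGS = ("task_id", "owner")


def resolve_task_args(explicit, default_args, operator_defaults):
    """Merge a task's arguments: explicit > default_args > operator defaults."""
    # In a ** merge later dicts win, so list them from lowest to highest priority.
    resolved = {**operator_defaults, **default_args, **explicit}
    missing = [name for name in REQUIRED_ARGS if name not in resolved]
    if missing:
        # This sketch picks ValueError; the tutorial only says "an exception".
        raise ValueError(f"missing required task arguments: {missing}")
    return resolved


default_args = {"owner": "example_owner", "email_on_failure": True}  # hypothetical
operator_defaults = {"retries": 0}  # hypothetical operator default

# First task: no overrides, so email_on_failure comes from default_args
# and retries falls through to the operator default.
t1_args = resolve_task_args({"task_id": "print_date"}, default_args, operator_defaults)

# Second task: the explicitly passed argument overrides default_args.
t2_args = resolve_task_args(
    {"task_id": "second_task", "email_on_failure": False},
    default_args,
    operator_defaults,
)

print(t1_args["email_on_failure"], t2_args["email_on_failure"])  # True False
```

In this sketch, a task that has no `owner` in either its explicit arguments or `default_args`, e.g. `resolve_task_args({"task_id": "x"}, {}, {})`, raises `ValueError`.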

Templating with Jinja
---------------------
Airflow leverages the power of
`Jinja Templating <http://jinja.pocoo.org/docs/dev/>`_ and provides
the pipeline author
with a set of builtin parameters and macros. Airflow also provides
with a set of built-in parameters and macros. Airflow also provides
hooks for the pipeline author to define their own parameters, macros and
templates.

@@ -172,7 +175,7 @@ curly brackets, and point to the most common template variable: ``{{ ds }}``.
templated_command = """
{% for i in range(5) %}
echo "{{ ds }}"
echo "{{ macros.ds_add(ds, 7)}}"
echo "{{ macros.ds_add(ds, 7) }}"
echo "{{ params.my_param }}"
{% endfor %}
"""
@@ -185,25 +188,20 @@ curly brackets, and point to the most common template variable: ``{{ ds }}``.

Notice that the ``templated_command`` contains code logic in ``{% %}`` blocks,
references parameters like ``{{ ds }}``, calls a function as in
``{{ macros.ds_add(ds, 7)}}``, and references a user defined parameter
``{{ macros.ds_add(ds, 7) }}``, and references a user-defined parameter
in ``{{ params.my_param }}``.

The ``params`` hook in BaseOperator allows you to pass a dictionary of
The ``params`` hook in ``BaseOperator`` allows you to pass a dictionary of
parameters and/or objects to your templates. Please take the time
to understand how the parameter ``my_param`` makes it through to the template.
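
To see roughly what ``templated_command`` expands to for a single run, here is a plain-Python sketch that avoids Jinja entirely. `ds_add` is a stdlib stand-in for the `macros.ds_add` call (shift a `YYYY-MM-DD` date string by a number of days), `render_templated_command` is hypothetical, and the run date and `my_param` value are made up for illustration. Whitespace from the real template is ignored.

```python
from datetime import datetime, timedelta


def ds_add(ds, days):
    """Stand-in for ``macros.ds_add``: shift a YYYY-MM-DD string by ``days``."""
    shifted = datetime.strptime(ds, "%Y-%m-%d") + timedelta(days=days)
    return shifted.strftime("%Y-%m-%d")


def render_templated_command(ds, params):
    """Hypothetical hand expansion of ``templated_command``.

    The ``{% for i in range(5) %}`` loop repeats its body five times, and
    each ``{{ ... }}`` expression is replaced by its value.
    """
    lines = []
    for _ in range(5):
        lines.append(f'echo "{ds}"')
        lines.append(f'echo "{ds_add(ds, 7)}"')
        lines.append(f'echo "{params["my_param"]}"')
    return "\n".join(lines)


# Made-up run date and params dict, standing in for what Airflow would supply.
rendered = render_templated_command("2015-06-15", {"my_param": "hello"})
print(rendered.splitlines()[:3])
# ['echo "2015-06-15"', 'echo "2015-06-22"', 'echo "hello"']
```

The point of the sketch is the data flow: `ds` is supplied per run, while `my_param` only reaches the command because it was placed in the `params` dictionary.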

Note that templated fields can point to files if you prefer.
It may be desirable for many reasons, like keeping your scripts logic
outside of your pipeline code, getting proper code highlighting in files,
and just generally allowing you to organize your pipeline's logic as you
please.

In the above example, we could have
had a file ``templated_command.sh``, and referenced it in the ``bash_command``
parameter, as in
``bash_command='templated_command.sh'`` where the file location is relative
to the pipeline's (``tutorial.py``) location. Note that it is also possible
to define your ``template_searchpath`` pointing to any folder
Files can also be passed to the ``bash_command`` argument, like
``bash_command='templated_command.sh'`` where the file location is relative to
the directory containing the pipeline file (``tutorial.py`` in this case). This
may be desirable for many reasons, like separating your script's logic and
pipeline code, allowing for proper code highlighting in files composed in
different languages, and general flexibility in structuring pipelines. It is
also possible to define your ``template_searchpath`` pointing to any folder
locations in the DAG constructor call.
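
A rough sketch of the file lookup described here: given a template file name, check the folder containing the pipeline file, then each extra folder in a ``template_searchpath``-style list. `find_template` and `other_command.sh` are hypothetical; the sketch assumes the pipeline's own folder is checked first, and it only models where a file is searched for, not how Airflow renders it.

```python
import tempfile
from pathlib import Path


def find_template(name, pipeline_file, template_searchpath=()):
    """Hypothetical lookup: the pipeline file's folder first, then each
    folder in ``template_searchpath``. Returns None if nothing matches."""
    folders = [Path(pipeline_file).parent, *(Path(p) for p in template_searchpath)]
    for folder in folders:
        candidate = folder / name
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as dags_dir, tempfile.TemporaryDirectory() as scripts_dir:
    pipeline_file = Path(dags_dir) / "tutorial.py"
    pipeline_file.write_text("# pipeline definition goes here\n")
    # A script kept next to the pipeline file is found with no extra setup.
    (Path(dags_dir) / "templated_command.sh").write_text('echo "{{ ds }}"\n')
    # A script kept elsewhere is only found via the extra search folder.
    (Path(scripts_dir) / "other_command.sh").write_text('echo "{{ ds }}"\n')

    local_hit = find_template("templated_command.sh", pipeline_file)
    miss = find_template("other_command.sh", pipeline_file)
    searchpath_hit = find_template("other_command.sh", pipeline_file, [scripts_dir])

    print(local_hit.name, miss, searchpath_hit.name)
    # templated_command.sh None other_command.sh
```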

Setting up Dependencies