diff --git a/docs/configuration/index.rst b/docs/configuration/index.rst
index 919ed9b1e..ec69c1f52 100644
--- a/docs/configuration/index.rst
+++ b/docs/configuration/index.rst
@@ -20,6 +20,7 @@ Cosmos offers a number of configuration options to customize its behavior. For m
     Scheduling
     Testing Behavior
     Selecting & Excluding
+    Partial Parsing
     Operator Args
     Compiled SQL
     Logging
diff --git a/docs/configuration/partial-parsing.rst b/docs/configuration/partial-parsing.rst
new file mode 100644
index 000000000..911e828b3
--- /dev/null
+++ b/docs/configuration/partial-parsing.rst
@@ -0,0 +1,68 @@
+.. _partial-parsing:
+
+Partial parsing
+===============
+
+Starting in the 1.4 version, Cosmos tries to leverage dbt's partial parsing (``partial_parse.msgpack``) to speed up both the task execution and the DAG parsing (if using ``LoadMode.DBT_LS``).
+
+This feature is bound to `dbt partial parsing limitations `_.
+As an example, ``dbt`` requires the same ``--vars``, ``--target``, ``--profile``, and ``profile.yml`` environment variables (as called by the ``env_var()`` macro) while running dbt commands, otherwise it will reparse the project from scratch.
+
+Profile configuration
+---------------------
+
+To respect the dbt requirement of having the same profile to benefit from partial parsing, Cosmos users should either:
+* If using Cosmos profile mapping (``ProfileConfig(profile_mapping=...``), disable using mocked profile mappings by setting ``render_config=RenderConfig(enable_mock_profile=False)``
+* Declare their own ``profiles.yml`` file, via ``ProfileConfig(profiles_yml_filepath=...)``
+
+If users don't follow these guidelines, Cosmos will use different profiles to parse the dbt project and to run tasks, and the user won't leverage dbt partial parsing.
+Their logs will contain multiple ``INFO`` messages similar to the following, meaning that Cosmos is not using partial parsing:
+
+.. code-block::
+
+    13:33:16 Unable to do partial parsing because profile has changed
+    13:33:16 Unable to do partial parsing because env vars used in profiles.yml have changed
+
+dbt vars
+--------
+
+If the Airflow scheduler and worker processes run in the same node, users must ensure the dbt ``--vars`` flag is the same in the ``RenderConfig`` and ``ExecutionConfig``.
+
+Otherwise, users may see messages similar to the following in their logs:
+
+.. code-block::
+
+    [2024-03-14, 17:04:57 GMT] {{subprocess.py:94}} INFO - Unable to do partial parsing because config vars, config profile, or config target have changed
+
+
+Caching
+-------
+
+If the dbt project ``target`` directory has a ``partial_parse.msgpack``, Cosmos will attempt to use it.
+
+There is a chance, however, that the file is stale or was generated in a way that is different to how Cosmos runs the dbt commands.
+
+Therefore, Cosmos also caches the most up-to-date ``partial_parse.msgpack`` file after running a dbt command in the `system temporary directory `_.
+With this, unless there are code changes, each Airflow node should only run the dbt command with a full dbt project parse once, and benefit from partial parsing from then onwards.
+
+
+Caching is enabled by default.
+It is possible to disable caching or override the directory that Cosmos uses caching with the Airflow configuration:
+
+.. code-block:: cfg
+
+    [cosmos]
+    cache_dir = path/to/docs/here  # to override default caching directory (by default, uses the system temporary directory)
+    enable_cache = False  # to disable caching (enabled by default)
+
+Or environment variable:
+
+.. code-block:: cfg
+
+    AIRFLOW__COSMOS__CACHE_DIR="path/to/docs/here"  # to override default caching directory (by default, uses the system temporary directory)
+    AIRFLOW__COSMOS__ENABLE_CACHE="False"  # to disable caching (enabled by default)
+
+Disabling
+---------
+
+To switch off partial parsing in Cosmos, use the argument ``partial_parse=False`` in the ``ProjectConfig``.
diff --git a/docs/configuration/project-config.rst b/docs/configuration/project-config.rst
index 3bf524ac8..2882ee9cc 100644
--- a/docs/configuration/project-config.rst
+++ b/docs/configuration/project-config.rst
@@ -25,7 +25,7 @@ variables that should be used for rendering and execution. It takes the followin
   env vars is only supported when using ``RenderConfig.LoadMode.DBT_LS`` load mode.
 - ``partial_parse``: (new in v1.4) If True, then attempt to use the ``partial_parse.msgpack`` if it exists. This is only used for the ``LoadMode.DBT_LS`` load mode, and for the ``ExecutionMode.LOCAL`` and ``ExecutionMode.VIRTUALENV``
-  execution modes.
+  execution modes. Due to the way that dbt `partial parsing works `_, it does not work with Cosmos profile mapping classes. To benefit from this feature, users have to set the ``profiles_yml_filepath`` argument in ``ProfileConfig``.
 
 Project Config Example
 ----------------------
 
diff --git a/docs/getting_started/execution-modes.rst b/docs/getting_started/execution-modes.rst
index 8f7013572..1765144d9 100644
--- a/docs/getting_started/execution-modes.rst
+++ b/docs/getting_started/execution-modes.rst
@@ -56,8 +56,10 @@ The ``local`` execution mode assumes a ``dbt`` binary is reachable within the Ai
 
 If ``dbt`` was not installed as part of the Cosmos packages, users can define a custom path to ``dbt`` by declaring the argument ``dbt_executable_path``.
 
-By default, if Cosmos sees a ``partial_parse.msgpack`` in the target directory of the dbt project directory when using ``local`` execution, it will use this for partial parsing to speed up task execution.
-This can be turned off by setting ``partial_parse=False`` in the ``ProjectConfig``.
+.. note::
+    Starting in the 1.4 version, Cosmos tries to leverage the dbt partial parsing (``partial_parse.msgpack``) to speed up task execution.
+    This feature is bound to `dbt partial parsing limitations `_.
+    Learn more: :ref:`partial-parsing`.
 
 When using the ``local`` execution mode, Cosmos converts Airflow Connections into a native ``dbt`` profiles file (``profiles.yml``).
 
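
The new ``partial-parsing.rst`` page in this patch documents two caching options, ``cache_dir`` and ``enable_cache``, settable either in the ``[cosmos]`` section of the Airflow configuration or through ``AIRFLOW__COSMOS__...`` environment variables. The sketch below only illustrates how the two spellings relate, using Airflow's ``AIRFLOW__{SECTION}__{KEY}`` naming convention. The helpers ``airflow_env_var`` and ``resolve_cache_settings`` are hypothetical names invented for this illustration and are not part of Cosmos or Airflow. Using ``tempfile.gettempdir()`` as the fallback is an assumption about what "system temporary directory" means, and the boolean parsing is a simplification.

```python
import os
import tempfile
from pathlib import Path


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name that overrides ``[section] key``."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def resolve_cache_settings(environ=None):
    """Hypothetical helper: read the two documented caching options.

    Falls back to the documented defaults: the system temporary directory
    (assumed here to be tempfile.gettempdir()) and caching enabled.
    """
    env = os.environ if environ is None else environ

    cache_dir = Path(
        env.get(airflow_env_var("cosmos", "cache_dir"), tempfile.gettempdir())
    )

    # Simplified boolean parsing; Airflow's own parser is the authority.
    raw_flag = env.get(airflow_env_var("cosmos", "enable_cache"), "True")
    enabled = raw_flag.strip().lower() in {"true", "t", "1"}

    return cache_dir, enabled


if __name__ == "__main__":
    overrides = {
        "AIRFLOW__COSMOS__CACHE_DIR": "path/to/docs/here",
        "AIRFLOW__COSMOS__ENABLE_CACHE": "False",
    }
    print(resolve_cache_settings(overrides))
```

Passing an explicit mapping instead of reading ``os.environ`` directly keeps the sketch easy to exercise without touching the real environment.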