Skip to content

Commit

Permalink
Add more selective provider tests (#24666)
Browse files Browse the repository at this point in the history
After implementing #24610 and few follow-up fixes, it is now easy
to add more optimizations to our unit test execution in CI (and
to give this capability back to our contributors).

This PR adds capability of running tests for selected set of
providers - not for the whole "Providers" group. You can
specify `--test-type "Providers[airbyte,http]" to only run tests
for the two selected providers.

This is the step towards separating providers to separate
repositories, but it also allows to optimize the experience of
the contributors developing only single provider changes (which
is vast majority of contributions).

This also allows to optimize build and elapsed time needd to run
tests for those PRs that only affects selected providers (again -
vast majority of PRs).

The CI selection of which provider tests is done now in Selective
Checkcs - they are a bit smarter in just selecting the providers
that has been changed, they also check if there are any other
providers that depend on it (we keep automatically updated by
pre-commit dependencies.json file and this file determines
which files should be run.

(cherry picked from commit 3dedbd3)
  • Loading branch information
potiuk committed Jul 21, 2022
1 parent 60c64bc commit 4daa5c1
Show file tree
Hide file tree
Showing 10 changed files with 311 additions and 112 deletions.
53 changes: 48 additions & 5 deletions BREEZE.rst
Original file line number Diff line number Diff line change
Expand Up @@ -491,9 +491,6 @@ Airflow Breeze is a bash script serving as a "swiss-army-knife" of Airflow testi
hood it uses other scripts that you can also run manually if you have problem with running the Breeze
environment. Breeze script allows performing the following tasks:

Development tasks
-----------------

Those are commands mostly used by contributors:

* Execute arbitrary command in the test environment with ``breeze shell`` command
Expand All @@ -503,6 +500,7 @@ Those are commands mostly used by contributors:
* Initialize local virtualenv with ``./scripts/tools/initialize_virtualenv.py`` command
* Run static checks with autocomplete support ``breeze static-checks`` command
* Run test specified with ``breeze tests`` command
* Run docker-compose tests with ``breeze docker-compose-tests`` command.
* Build CI docker image with ``breeze build-image`` command
* Cleanup breeze with ``breeze cleanup`` command

Expand All @@ -515,8 +513,53 @@ Additional management tasks:
Tests
-----

* Run docker-compose tests with ``breeze docker-compose-tests`` command.
* Run test specified with ``breeze tests`` command.
You can regular unit tests with ``breeze`` in two different ways, either interactively run tests with
the default ``shell`` command or via the ``tests`` command.

Iterate on tests interactively
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can simply enter the ``breeze`` container and run ``pytest`` command there. You can enter the
container via just ``breeze`` command or ``breeze shell`` command (the latter has more options
useful when you run integration or system tests). This is the best way if you want to interactively
run selected tests and iterate with the tests. Once you enter ``breeze`` environment it is ready
out-of-the-box to run your tests by running the right ``pytest`` command (autocomplete should help
you with autocompleting test name if you start typing ``pytest tests<TAB>``).

Here are few examples:

Running single test:

.. code-block:: bash
pytest tests/core/test_core.py::TestCore::test_check_operators
To run the whole test class:

.. code-block:: bash
pytest tests/core/test_core.py::TestCore
You can re-run the tests interactively, add extra parameters to pytest and modify the files before
re-running the test to iterate over the tests. You can also add more flags when starting the
``breeze shell`` command when you run integration tests or system tests. Read more details about it
in the ``TESTING.rst <TESTING.rst#>`` where all the test types of our are explained and more information
on how to run them.

Running group of tests
~~~~~~~~~~~~~~~~~~~~~~

You can also run tests via built-in ``breeze tests`` command - similarly as iterative ``pytest`` command
allows to run test individually, or by class or in any other way pytest allows to test them, but it
also allows to run the tests in the same test "types" that are used to run the tests in CI: Core, Always
API, Providers. This how our CI runs them - running each group in parallel to other groups and you can
replicate this behaviour.

Another interesting use of the ``breeze tests`` command is that you can easily specify sub-set of the
tests for Providers. ``breeze tests --test-type "Providers[airbyte,http]`` for example will only run
tests for airbyte and http providers.

Here is the detailed set of options for the ``breeze tests`` command.

.. image:: ./images/breeze/output-tests.svg
:width: 100%
Expand Down
7 changes: 6 additions & 1 deletion Dockerfile.ci
Original file line number Diff line number Diff line change
Expand Up @@ -952,13 +952,18 @@ else
${TEST_TYPE} == "Long" || \
${TEST_TYPE} == "Integration" ]]; then
SELECTED_TESTS=("${ALL_TESTS[@]}")
elif [[ ${TEST_TYPE} =~ Providers\[(.*)\] ]]; then
SELECTED_TESTS=()
for provider in ${BASH_REMATCH[1]//,/ }
do
SELECTED_TESTS+=("tests/providers/${provider//./\/}")
done
else
echo
echo "${COLOR_RED}ERROR: Wrong test type ${TEST_TYPE} ${COLOR_RESET}"
echo
exit 1
fi

fi
readonly SELECTED_TESTS CLI_TESTS API_TESTS PROVIDERS_TESTS CORE_TESTS WWW_TESTS \
ALL_TESTS ALL_PRESELECTED_TESTS
Expand Down
15 changes: 14 additions & 1 deletion TESTING.rst
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,8 @@ To run unit tests from the Visual Studio Code:
:alt: Running tests

Running Unit Tests
--------------------------------
------------------

To run unit, integration, and system tests from the Breeze and your
virtualenv, you can use the `pytest <http://doc.pytest.org/en/latest/>`_ framework.

Expand Down Expand Up @@ -188,6 +189,18 @@ You can also limit the tests to execute to specific group of tests
breeze tests --test-type Core
In case of Providers tests, you can run tests for all providers

.. code-block:: bash
breeze tests --test-type Providers
You can also limit the set of providers you would like to run tests of

.. code-block:: bash
breeze tests --test-type "Providers[airbyte,http]"
You can also write tests in "limited progress" mode (useful in the future to run CI). In this mode each
test just prints "percentage" summary of the run as single line and only dumps full output of the test
Expand Down
8 changes: 8 additions & 0 deletions dev/breeze/SELECTIVE_CHECKS.md
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,14 @@ The logic implements the following rules:
* If no Source files are changed - no tests are run and no further rules below are checked.
* `Image building` is enabled if either test are run, docs are build or kubernetes tests are run. All those
need `CI` or `PROD` images to be built.
* In case of `Providers` test in regular PRs, additional check is done in order to determine which
providers are affected and the actual selection is made based on that:
* if directly provider code is changed (either in the provider, test or system tests) then this provider
is selected.
* if there are any providers that depend on the affected providers, they are also included in the list
of affected providers (but not recursively - only direct dependencies are added)
* if there are any changes to "common" provider code not belonging to any provider (usually system tests
or tests), then tests for all Providers are run
* The specific unit test type is enabled only if changed files match the expected patterns for each type
(`API`, `CLI`, `WWW`, `Providers`). The `Always` test type is added always if any unit tests are run.
`Providers` tests are removed if current branch is different than `main`
Expand Down
10 changes: 7 additions & 3 deletions dev/breeze/src/airflow_breeze/commands/testing_commands.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@
option_verbose,
)
from airflow_breeze.utils.console import get_console, message_type_from_return_code
from airflow_breeze.utils.custom_param_types import BetterChoice
from airflow_breeze.utils.custom_param_types import NotVerifiedBetterChoice
from airflow_breeze.utils.docker_command_utils import (
get_env_variables_for_docker_commands,
perform_environment_checks,
Expand Down Expand Up @@ -247,9 +247,10 @@ def run_with_progress(
@option_mount_sources
@click.option(
"--test-type",
help="Type of test to run.",
help="Type of test to run. Note that with Providers, you can also specify which provider "
"tests should be run - for example --test-type \"Providers[airbyte,http]\"",
default="All",
type=BetterChoice(ALLOWED_TEST_TYPE_CHOICES),
type=NotVerifiedBetterChoice(ALLOWED_TEST_TYPE_CHOICES),
)
@option_db_reset
@click.argument('extra_pytest_args', nargs=-1, type=click.UNPROCESSED)
Expand All @@ -272,6 +273,9 @@ def tests(
os.environ["RUN_TESTS"] = "true"
if test_type:
os.environ["TEST_TYPE"] = test_type
if "[" in test_type and not test_type.startswith("Providers"):
get_console().print("[error]Only 'Providers' test type can specify actual tests with \\[\\][/]")
sys.exit(1)
if integration:
if "trino" in integration:
integration = integration + ("kerberos",)
Expand Down
87 changes: 77 additions & 10 deletions dev/breeze/src/airflow_breeze/utils/selective_checks.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,12 +17,19 @@

from __future__ import annotations

import json
import os
import sys
from enum import Enum

from rich.markup import escape

from airflow_breeze.utils.path_utils import AIRFLOW_SOURCES_ROOT

if sys.version_info >= (3, 8):
from functools import cached_property
else:
# noinspection PyUnresolvedReferences
from cached_property import cached_property

from functools import lru_cache
Expand Down Expand Up @@ -59,7 +66,7 @@
def get_ga_output(name: str, value: Any) -> str:
output_name = name.replace('_', '-')
printed_value = str(value).lower() if isinstance(value, bool) else value
get_console().print(f"[info]{output_name}[/] = [green]{printed_value}[/]")
get_console().print(f"[info]{output_name}[/] = [green]{escape(str(printed_value))}[/]")
return f"::set-output name={output_name}::{printed_value}"


Expand Down Expand Up @@ -127,6 +134,7 @@ def __hash__(self):
r"^airflow/.*\.py$",
r"^chart",
r"^providers",
r"^tests/system",
r"^CHANGELOG\.txt",
r"^airflow/config_templates/config\.yml",
r"^chart/RELEASE_NOTES\.txt",
Expand All @@ -148,6 +156,7 @@ def __hash__(self):
r"^kubernetes_tests",
r"^airflow/providers/cncf/kubernetes/",
r"^tests/providers/cncf/kubernetes/",
r"^tests/system/providers/cncf/kubernetes/",
],
FileGroupForCi.ALL_PYTHON_FILES: [
r"\.py$",
Expand Down Expand Up @@ -178,11 +187,61 @@ def __hash__(self):
SelectiveUnitTestTypes.PROVIDERS: [
"^airflow/providers/",
"^tests/providers/",
"^tests/system/",
],
SelectiveUnitTestTypes.WWW: ["^airflow/www", "^tests/www", "^airflow/ui"],
}
)

TESTS_PROVIDERS_ROOT = AIRFLOW_SOURCES_ROOT / "tests" / "providers"
SYSTEM_TESTS_PROVIDERS_ROOT = AIRFLOW_SOURCES_ROOT / "tests" / "system" / "providers"
AIRFLOW_PROVIDERS_ROOT = AIRFLOW_SOURCES_ROOT / "airflow" / "providers"


def find_provider_affected(changed_file: str) -> str | None:
file_path = AIRFLOW_SOURCES_ROOT / changed_file
# is_relative_to is only available in Python 3.9 - we should simplify this check when we are Python 3.9+
for provider_root in (TESTS_PROVIDERS_ROOT, SYSTEM_TESTS_PROVIDERS_ROOT, AIRFLOW_PROVIDERS_ROOT):
try:
file_path.relative_to(provider_root)
relative_base_path = provider_root
break
except ValueError:
pass
else:
return None

for parent_dir_path in file_path.parents:
if parent_dir_path == relative_base_path:
break
relative_path = parent_dir_path.relative_to(relative_base_path)
if (AIRFLOW_PROVIDERS_ROOT / relative_path / "provider.yaml").exists():
return str(parent_dir_path.relative_to(relative_base_path)).replace(os.sep, ".")
# If we got here it means that some "common" files were modified. so we need to test all Providers
return "Providers"


def add_dependent_providers(
providers: set[str], provider_to_check: str, dependencies: dict[str, dict[str, list[str]]]
):
for provider, provider_info in dependencies.items():
if provider_to_check in provider_info['cross-providers-deps']:
providers.add(provider)


def find_all_providers_affected(changed_files: tuple[str, ...]) -> set[str]:
all_providers: set[str] = set()
for changed_file in changed_files:
provider = find_provider_affected(changed_file)
if provider == "Providers":
return set()
if provider is not None:
all_providers.add(provider)
dependencies = json.loads((AIRFLOW_SOURCES_ROOT / "generated" / "provider_dependencies.json").read_text())
for provider in list(all_providers):
add_dependent_providers(all_providers, provider, dependencies)
return all_providers


class SelectiveChecks:
__HASHABLE_FIELDS = {'_files', '_default_branch', '_commit_ref', "_pr_labels", "_github_event"}
Expand Down Expand Up @@ -461,6 +520,11 @@ def _get_test_types_to_run(self) -> list[str]:
get_console().print(remaining_files)
candidate_test_types.update(all_selective_test_types())
else:
if "Providers" in candidate_test_types:
affected_providers = find_all_providers_affected(changed_files=self._files)
if len(affected_providers) != 0:
candidate_test_types.remove("Providers")
candidate_test_types.add(f"Providers[{','.join(sorted(affected_providers))}]")
get_console().print(
"[warning]There are no core/other files. Only tests relevant to the changed files are run.[/]"
)
Expand All @@ -474,22 +538,25 @@ def test_types(self) -> str:
if not self.run_tests:
return ""
if self._run_everything:
current_test_types = list(all_selective_test_types())
current_test_types = set(all_selective_test_types())
else:
current_test_types = self._get_test_types_to_run()
current_test_types = set(self._get_test_types_to_run())
if self._default_branch != "main":
if "Providers" in current_test_types:
get_console().print(
"[warning]Removing 'Providers' because the target branch "
f"is {self._default_branch} and not main[/]"
)
current_test_types.remove("Providers")
test_types_to_remove: set[str] = set()
for test_type in current_test_types:
if test_type.startswith("Providers"):
get_console().print(
f"[warning]Removing {test_type} because the target branch "
f"is {self._default_branch} and not main[/]"
)
test_types_to_remove.add(test_type)
if "Integration" in current_test_types:
get_console().print(
"[warning]Removing 'Integration' because the target branch "
f"is {self._default_branch} and not main[/]"
)
current_test_types.remove("Integration")
test_types_to_remove.add("Integration")
current_test_types = current_test_types - test_types_to_remove
return " ".join(sorted(current_test_types))

@cached_property
Expand Down
Loading

0 comments on commit 4daa5c1

Please sign in to comment.