Fix test dependencies after Airflow 2.8 release #806
Conversation
The integration tests, except for the Expensive ones, are failing now.
Fix pendulum dependency issue after 3.0 release
Hatch always installs the latest Airflow version available within the upper bound, regardless of what is declared in `tool.hatch.envs.tests.overrides`. When it installs Airflow 2.8, it installs `apache-airflow-providers-common-io==1.2.0`. This library conflicts with all previous versions of Airflow, raising the exception:

```
FAILED tests/operators/test_local.py::test_run_test_operator_with_callback - sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: task_instance
[SQL: SELECT task_instance.try_number, task_instance.task_id, task_instance.dag_id, task_instance.run_id, task_instance.map_index, task_instance.start_date, task_instance.end_date, task_instance.duration, task_instance.state, task_instance.max_tries, task_instance.hostname, task_instance.unixname, task_instance.job_id, task_instance.pool, task_instance.pool_slots, task_instance.queue, task_instance.priority_weight, task_instance.operator, task_instance.custom_operator_name, task_instance.queued_dttm, task_instance.queued_by_job_id, task_instance.pid, task_instance.executor_config, task_instance.updated_at, task_instance.external_executor_id, task_instance.trigger_id, task_instance.trigger_timeout, task_instance.next_method, task_instance.next_kwargs, dag_run_1.state AS state_1, dag_run_1.id, dag_run_1.dag_id AS dag_id_1, dag_run_1.queued_at, dag_run_1.execution_date, dag_run_1.start_date AS start_date_1, dag_run_1.end_date AS end_date_1, dag_run_1.run_id AS run_id_1, dag_run_1.creating_job_id, dag_run_1.external_trigger, dag_run_1.run_type, dag_run_1.conf, dag_run_1.data_interval_start, dag_run_1.data_interval_end, dag_run_1.last_scheduling_decision, dag_run_1.dag_hash, dag_run_1.log_template_id, dag_run_1.updated_at AS updated_at_1 FROM task_instance JOIN dag_run ON dag_run.dag_id = task_instance.dag_id AND dag_run.run_id = task_instance.run_id JOIN dag_run AS dag_run_1 ON dag_run_1.dag_id = task_instance.dag_id AND dag_run_1.run_id = task_instance.run_id WHERE task_instance.dag_id = ? AND task_instance.task_id IN (?, ?) AND dag_run.execution_date >= ? AND dag_run.execution_date <= ? AND task_instance.operator = ?]
[parameters: ('test-id-2', 'run', 'test', '2024-01-22 23:11:55.593478', '2024-01-22 23:11:55.593478', 'ExternalTaskMarker')]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
```

The suggested solution is to uninstall `apache-airflow-providers-common-io` for all Airflow versions, and only install it for Airflow 2.8.
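A minimal sketch of that suggestion, using Hatch's documented conditional-override syntax and assuming a matrix variable named `airflow` (the names here are illustrative, not the project's exact configuration):

```toml
# pyproject.toml (sketch): declare the provider only for the Airflow 2.8 matrix entry
[tool.hatch.envs.tests.overrides]
matrix.airflow.dependencies = [
    { value = "apache-airflow-providers-common-io==1.2.0", if = ["2.8"] },
]
```

An override like this only adds the dependency for the 2.8 environments; the uninstall half of the suggestion still needs a setup step, since overrides can add packages but cannot remove ones already pulled in transitively.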
@jbandoro I'm optimistic e86e57c will solve the remaining failing tests. When building an environment, the first step Hatch does is to install the project dependencies. So, for all of our Airflow test matrix, Hatch first installs Airflow 2.8. As part of this, it installs `apache-airflow-providers-common-io==1.2.0`. Therefore, tests running for versions of Airflow before 2.8 were failing because of `apache-airflow-providers-common-io`.
I did a workaround to uninstall `apache-airflow-providers-common-io` for all Airflow versions and only install it for Airflow 2.8.
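A sketch of what such a workaround can look like, assuming the suite is invoked through Hatch scripts (the script name and pytest arguments are illustrative):

```toml
# pyproject.toml (sketch): strip the leftover provider before running the tests
[tool.hatch.envs.tests.scripts]
test = [
    # remove the provider left behind by the initial Airflow 2.8 dependency resolution
    "pip uninstall -y apache-airflow-providers-common-io",
    "pytest tests/",
]
```

For the Airflow 2.8 entry of the matrix, the provider then has to be put back after the uninstall step, for example by reinstalling Airflow 2.8 so pip restores the now-missing dependency.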
Thank you @tatiana for finding the issue and a workaround for resolving it! 🎉
Once Airflow 2.8 was released, Cosmos tests started failing. There were two main issues: a conflicting `pendulum` version and the installation of `apache-airflow-providers-common-io`.

# Details on `pendulum`:

```
_________________ ERROR collecting tests/airflow/test_graph.py _________________
tests/airflow/test_graph.py:6: in <module>
    from airflow import __version__ as airflow_version
../../../.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.8-2.4/lib/python3.8/site-packages/airflow/__init__.py:34: in <module>
    from airflow import settings
../../../.local/share/hatch/env/virtual/astronomer-cosmos/Za_bFbg4/tests.py3.8-2.4/lib/python3.8/site-packages/airflow/settings.py:49: in <module>
    TIMEZONE = pendulum.tz.timezone('UTC')
E   TypeError: 'module' object is not callable
```

[Example here](https://github.com/astronomer/astronomer-cosmos/actions/runs/7590233614/job/20676384033). I think this is because Airflow v2.8.1, [released today](https://github.com/apache/airflow/releases/tag/2.8.1), now targets the 3.0.0 version of Pendulum, which has the breaking API changes seen above. Any pip install of `apache-airflow<2.8.1` is, I think, now installing `pendulum==3.0.0`, because the pendulum constraint is only applied if you install Airflow [with a constraint file](https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html). I don't think Hatch dependencies allow constraint file referencing, so this attempt pins `pendulum` directly, kind of like what is already done for `pydantic`.

# Details on `apache-airflow-providers-common-io`:

When building an environment, the first step Hatch does is to install the project dependencies. It does not consider `tool.hatch.envs.tests.overrides` when first doing this. So, for our entire Airflow test matrix, Hatch first installs Airflow 2.8. As part of this, it installs `apache-airflow-providers-common-io==1.2.0`. This new Airflow dependency conflicts with previous versions of Airflow, and when Hatch downgrades the version of Airflow, it does not uninstall `apache-airflow-providers-common-io`.
Therefore, tests running for versions of Airflow before 2.8 were failing because of `apache-airflow-providers-common-io` with:

```
FAILED tests/operators/test_local.py::test_run_test_operator_with_callback - sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: task_instance
[SQL: SELECT task_instance.try_number, task_instance.task_id, task_instance.dag_id, task_instance.run_id, task_instance.map_index, task_instance.start_date, task_instance.end_date, task_instance.duration, task_instance.state, task_instance.max_tries, task_instance.hostname, task_instance.unixname, task_instance.job_id, task_instance.pool, task_instance.pool_slots, task_instance.queue, task_instance.priority_weight, task_instance.operator, task_instance.custom_operator_name, task_instance.queued_dttm, task_instance.queued_by_job_id, task_instance.pid, task_instance.executor_config, task_instance.updated_at, task_instance.external_executor_id, task_instance.trigger_id, task_instance.trigger_timeout, task_instance.next_method, task_instance.next_kwargs, dag_run_1.state AS state_1, dag_run_1.id, dag_run_1.dag_id AS dag_id_1, dag_run_1.queued_at, dag_run_1.execution_date, dag_run_1.start_date AS start_date_1, dag_run_1.end_date AS end_date_1, dag_run_1.run_id AS run_id_1, dag_run_1.creating_job_id, dag_run_1.external_trigger, dag_run_1.run_type, dag_run_1.conf, dag_run_1.data_interval_start, dag_run_1.data_interval_end, dag_run_1.last_scheduling_decision, dag_run_1.dag_hash, dag_run_1.log_template_id, dag_run_1.updated_at AS updated_at_1 FROM task_instance JOIN dag_run ON dag_run.dag_id = task_instance.dag_id AND dag_run.run_id = task_instance.run_id JOIN dag_run AS dag_run_1 ON dag_run_1.dag_id = task_instance.dag_id AND dag_run_1.run_id = task_instance.run_id WHERE task_instance.dag_id = ? AND task_instance.task_id IN (?, ?) AND dag_run.execution_date >= ? AND dag_run.execution_date <= ? AND task_instance.operator = ?]
[parameters: ('test-id-2', 'run', 'test', '2024-01-22 23:11:55.593478', '2024-01-22 23:11:55.593478', 'ExternalTaskMarker')]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
```

We did a workaround to uninstall `apache-airflow-providers-common-io` for all Airflow versions and only install it for Airflow 2.8. It is ugly, but it seems to work. Once the tests pass, I'll merge our PR so the CI can be back to green. We can revisit the approach in the future.

Co-authored-by: Tatiana Al-Chueyr <[email protected]>
(cherry picked from commit f953cae)
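For the `pendulum` half of the fix described above, a direct pin in the test environment dependencies is a minimal sketch of the approach (the exact bounds are an assumption; the point is to stay below 3.0, where `pendulum.tz.timezone` stopped being callable):

```toml
# pyproject.toml (sketch): keep pendulum below 3.0 in the test environments,
# in the same spirit as the existing pydantic pin
[tool.hatch.envs.tests]
dependencies = [
    "pendulum>=2.0,<3.0",  # pendulum 3.0 turned pendulum.tz.timezone into a module, not a callable
]
```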
**Bug fixes**

* Fix: ensure ``DbtGraph.update_node_dependency`` is called for all load methods by @jbandoro in #803
* Fix: ensure operator ``execute`` method is consistent across all execution base subclasses by @jbandoro in #805
* Fix custom selector when ``test`` node has no ``depends_on`` values by @tatiana in #814
* Fix forwarding selectors to test task when using ``TestBehavior.AFTER_ALL`` by @tatiana in #816

**Others**

* Docs: Remove incorrect docstring from ``DbtLocalBaseOperator`` by @jakob-hvitnov-telia in #797
* Add more logs to troubleshoot custom selector by @tatiana in #809
* Fix OpenLineage integration documentation by @tatiana in #810
* Fix test dependencies after Airflow 2.8 release by @jbandoro and @tatiana in #806
* Use Airflow constraint file for test environment setup by @jbandoro in #812
* pre-commit updates in #799, #807

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justin Bandoro <[email protected]>
Co-authored-by: Jakob Aron Hvitnov <[email protected]>
Description
Cosmos unit tests are failing with errors like the `TypeError: 'module' object is not callable` traceback shown above.

Example here. I think this is because Airflow v2.8.1 was released today and now targets the 3.0.0 version of Pendulum, which has the breaking API changes seen above. Any pip install of `apache-airflow<2.8.1` is, I think, now installing `pendulum==3.0.0`, because the pendulum constraint is only applied if you install Airflow with a constraint file. I don't think Hatch dependencies allow constraint file referencing, so this attempt pins `pendulum` directly, kind of like what is already done for `pydantic`. A sketch of the constraint-file mechanism follows.
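While Hatch's declared dependencies cannot reference a constraint file, a plain pip call in a Hatch script can. For reference, this is roughly what the mechanism from the Airflow installation docs looks like when wired into a script (a sketch; the Airflow version and Python version in the constraint URL are illustrative):

```toml
# pyproject.toml (sketch): install Airflow against its published constraint file
[tool.hatch.envs.tests.scripts]
airflow-setup = [
    'pip install "apache-airflow==2.7.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.3/constraints-3.8.txt"',
]
```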
Related Issue(s)
Breaking Change?
Checklist