Running airflow dags test <dag_id> <execution_dt> results in error when run twice. #21023

Closed
1 of 2 tasks
howardyoo opened this issue Jan 21, 2022 · 4 comments · Fixed by #21031
@howardyoo
Contributor

Apache Airflow version

main (development)

What happened

Product and Version

Airflow version: v2.3.0.dev0 (Git version: .release:2.3.0.dev0+7a9ab1d7170567b1d53938b2f7345dae2026c6ea), which I am using to test and learn its functionality. I installed it by cloning the Git repository and building Airflow on my macOS environment with Python 3.9.

Problem Statement

While testing my DAG, I wanted to run airflow dags test <dag_id> <execution_dt> so that I would not have to trigger DAG runs from the UI each time. Running the test command and inspecting its output is more productive for rapid iteration on a DAG.

The test runs successfully the first time, but when I re-run it with the same arguments, the following error appears:

[2022-01-21 10:30:33,530] {migration.py:201} INFO - Context impl SQLiteImpl.
[2022-01-21 10:30:33,530] {migration.py:204} INFO - Will assume non-transactional DDL.
[2022-01-21 10:30:33,568] {dagbag.py:498} INFO - Filling up the DagBag from /Users/howardyoo/airflow/dags
[2022-01-21 10:30:33,588] {example_python_operator.py:67} WARNING - The virtalenv_python example task requires virtualenv, please install it.
[2022-01-21 10:30:33,594] {tutorial_taskflow_api_etl_virtualenv.py:29} WARNING - The tutorial_taskflow_api_etl_virtualenv example DAG requires virtualenv, please install it.
Traceback (most recent call last):
  File "/Users/howardyoo/python3/bin/airflow", line 33, in <module>
    sys.exit(load_entry_point('apache-airflow==2.3.0.dev0', 'console_scripts', 'airflow')())
  File "/Users/howardyoo/python3/lib/python3.9/site-packages/airflow/__main__.py", line 48, in main
    args.func(args)
  File "/Users/howardyoo/python3/lib/python3.9/site-packages/airflow/cli/cli_parser.py", line 50, in command
    return func(*args, **kwargs)
  File "/Users/howardyoo/python3/lib/python3.9/site-packages/airflow/utils/session.py", line 71, in wrapper
    return func(*args, session=session, **kwargs)
  File "/Users/howardyoo/python3/lib/python3.9/site-packages/airflow/utils/cli.py", line 98, in wrapper
    return f(*args, **kwargs)
  File "/Users/howardyoo/python3/lib/python3.9/site-packages/airflow/cli/commands/dag_command.py", line 429, in dag_test
    dag.clear(start_date=args.execution_date, end_date=args.execution_date, dag_run_state=State.NONE)
  File "/Users/howardyoo/python3/lib/python3.9/site-packages/airflow/utils/session.py", line 71, in wrapper
    return func(*args, session=session, **kwargs)
  File "/Users/howardyoo/python3/lib/python3.9/site-packages/airflow/models/dag.py", line 1906, in clear
    clear_task_instances(
  File "/Users/howardyoo/python3/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 286, in clear_task_instances
    dr.state = dag_run_state
  File "<string>", line 1, in __set__
  File "/Users/howardyoo/python3/lib/python3.9/site-packages/airflow/models/dagrun.py", line 207, in set_state
    raise ValueError(f"invalid DagRun state: {state}")
ValueError: invalid DagRun state: None
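
To make the failure mode in the traceback easier to follow, here is a minimal sketch of the mechanism in plain Python. It is not Airflow's code: `FakeDagRun`, `VALID_DAG_RUN_STATES`, and `clear_runs` are hypothetical stand-ins. It assumes only what the traceback shows, namely that the clear step assigns the requested state to each existing DAG run and that the state setter rejects values it does not recognise.

```python
from typing import List, Optional

# Hypothetical stand-in for the set of states a DAG run may hold.
VALID_DAG_RUN_STATES = {"queued", "running", "success", "failed"}


class FakeDagRun:
    """Simplified stand-in for a DAG run row; not Airflow's real model."""

    def __init__(self, state: str = "success") -> None:
        self._state = state

    @property
    def state(self) -> str:
        return self._state

    @state.setter
    def state(self, value: Optional[str]) -> None:
        # Mirrors the check seen in the traceback: unknown states are rejected.
        if value not in VALID_DAG_RUN_STATES:
            raise ValueError(f"invalid DagRun state: {value}")
        self._state = value


def clear_runs(existing_runs: List[FakeDagRun], dag_run_state: Optional[str]) -> None:
    """Assign dag_run_state to every existing run, as a clear step would."""
    for run in existing_runs:
        run.state = dag_run_state


# First invocation: no run exists yet, so the setter is never called.
clear_runs([], None)

# Second invocation: the run left behind by the first test is assigned None.
leftover = FakeDagRun()
try:
    clear_runs([leftover], None)
except ValueError as err:
    message = str(err)
    print(message)
```

Under these assumptions, the first call succeeds only because there is nothing to clear, which would explain why the command works once and fails on every later run until the leftover DAG run is removed.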

When going through the DAG runs in the UI, I noticed the following entry for my dag test run.
[Screenshot of the DAG run entry, taken 2022-01-21 at 10:31 AM]
It looks like running a DAG in test mode submits the DAG run with the backfill type. I am not sure why airflow dags test only succeeds once, but my theory is that a step that should clear out the previous test run is missing.

Workaround

A viable workaround is to find and delete the DAG run instance created by the first test. Once that DAG run entry is deleted, I could run the airflow dags test command successfully again.

What you expected to happen

The documentation (https://airflow.apache.org/docs/apache-airflow/stable/tutorial.html#id2) states:

The same applies to airflow dags test [dag_id] [logical_date], but on a DAG level. It performs a single DAG run of the given DAG id. While it does take task dependencies into account, no state is registered in the database. It is convenient for locally testing a full run of your DAG, given that e.g. if one of your tasks expects data at some location, it is available.

It does not say that the DAG run instance has to be deleted before re-running the test, so I would expect the airflow dags test command to succeed on the first run and on every consecutive run, without errors.

How to reproduce

  • Install the reported version of Airflow.
  • Start Airflow with the airflow standalone command. It should start the basic webserver, scheduler, and triggerer.
  • Pick any DAG from the DAG list and run airflow dags test <dag_id> <start_dt> to start the test.
  • Once the test has finished, re-run the same command and observe the error.
  • Go to the DAG runs, delete the DAG run that the first test produced, and run the test again - it should now succeed.
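
The steps above can be scripted roughly as follows. This is only a sketch: `build_test_command` and `run_twice` are hypothetical helper names, and it assumes the `airflow` executable is on the PATH of an environment where `airflow standalone` has already initialised the database.

```python
import subprocess
from typing import List


def build_test_command(dag_id: str, execution_date: str) -> List[str]:
    """Return the argv for `airflow dags test <dag_id> <execution_dt>`."""
    return ["airflow", "dags", "test", dag_id, execution_date]


def run_twice(dag_id: str, execution_date: str) -> List[int]:
    """Run the same test command twice and collect both exit codes."""
    command = build_test_command(dag_id, execution_date)
    # Each call blocks until the CLI exits; output goes to the terminal.
    return [subprocess.run(command).returncode for _ in range(2)]
```

Calling `run_twice` with a DAG id and date from your own environment returns the two exit codes. On the affected build, the second invocation would be expected to end with the traceback shown earlier and a nonzero exit code.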

Operating System

MacOS Monterey (Version 12.1)

Versions of Apache Airflow Providers

No providers were used

Deployment

Other

Deployment details

This Airflow instance runs standalone on my local macOS environment. I set up a dev environment by cloning the repository from GitHub and building Airflow to run locally. It uses SQLite as its backend database and the SequentialExecutor to execute tasks sequentially.

Anything else

Nothing much. I would like this issue to be resolved so that I can test my DAGs easily without 'actually' running them or relying on the UI. There also seems to be little information on what this test does and how it differs from a normal run, so clarifying that in the documentation would be nice.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

howardyoo added the area:core and kind:bug labels on Jan 21, 2022
@boring-cyborg

boring-cyborg bot commented Jan 21, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@howardyoo
Contributor Author

Thank you, @chenglongyan, for such a quick turnaround on this fix! And thank you, @potiuk, for approving it!

@uplsh580
Contributor

uplsh580 commented Jan 24, 2022

In my case, the same problem occurs with airflow dags backfill in version 2.2.3.
Is it the same cause? And will the bug be fixed in the 2.2.x line?

@potiuk
Member

potiuk commented Jan 24, 2022

In my case, the same problem occurs with airflow dags backfill in version 2.2.3.
Is it the same cause? And will the bug be fixed in the 2.2.x line?

Yeah. I already cherry-picked it to 2.2.4, which is a few weeks away.

potiuk pushed a commit that referenced this issue Jan 24, 2022
…r when run twice (#21031)

related: #21023
(cherry picked from commit 515ea84)
jedcunningham pushed a commit that referenced this issue Jan 27, 2022
…r when run twice (#21031)

related: #21023
(cherry picked from commit 515ea84)