Add filter by state in DagRun REST API (List Dag Runs) #20485

fbertos · 2021-12-23T23:36:17Z

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

boring-cyborg · 2021-12-23T23:36:23Z

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
Here are some useful points:

Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
Be sure to read the Airflow Coding style.
Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: [email protected]
Slack: https://s.apache.org/airflow-slack

uranusjr

Can you add a test to filter by the none state? I believe this is the state that causes the most problems (because the internal representation is not str). And there is actually a util function somewhere (for task instances? @ephraimbuddy may remember better) for this problem.

uranusjr · 2021-12-24T05:47:48Z

tests/api_connexion/endpoints/test_dag_run_endpoint.py

+        result = session.query(DagRun).all()
+        assert len(result) == 2


Suggested change

-        result = session.query(DagRun).all()
-        assert len(result) == 2
+        assert session.query(DagRun).count() == 2

ephraimbuddy · 2021-12-24T06:55:11Z

airflow/api_connexion/endpoints/dag_run_endpoint.py

+    if state:
+        query = query.filter(DagRun.state == state)


Suggested change

-    if state:
-        query = query.filter(DagRun.state == state)
+    if state:
+        query = query.filter(DagRun.state.in_(state))

The FilterState query component is an array.
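The difference between comparing against a single value and testing membership in a list can be sketched without a database. The snippet below is a plain-Python analogue only, not SQLAlchemy and not the Airflow endpoint; the `runs` data and both helper names are made up for illustration.

```python
# Hypothetical in-memory stand-in for DagRun rows.
runs = [
    {"run_id": "run_1", "state": "running"},
    {"run_id": "run_2", "state": "queued"},
    {"run_id": "run_3", "state": "success"},
]


def filter_equals(rows, state):
    """Equality filter: only meaningful when `state` is a single string."""
    return [row for row in rows if row["state"] == state]


def filter_in(rows, state):
    """Membership filter: what an array-valued query parameter needs."""
    return [row for row in rows if row["state"] in state]


requested = ["running", "queued"]  # the parameter arrives as a list

by_equality = filter_equals(runs, requested)  # a str never equals a list
by_membership = filter_in(runs, requested)
```

With a list-valued parameter the equality version matches nothing in this analogue, while the membership version returns the runs whose state is any of the requested values (an OR condition).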

ephraimbuddy · 2021-12-24T07:03:40Z

> Can you add a test to filter by none state? I believe this is the state that causes the most problems (because the internal representation is not str). And there is actually a util function somewhere (for task instances? @ephraimbuddy may remember better) for this problem.

I think we are fine because we are dealing with DagRuns, which have only 3 states.

fbertos · 2021-12-24T08:00:52Z

@uranusjr @ephraimbuddy thank you very much for your recommendations, I just applied them in a new commit. In the end I did not add a test for filtering by the None state, following @ephraimbuddy's comment. Thanks

ephraimbuddy · 2021-12-24T08:34:50Z

tests/api_connexion/endpoints/test_dag_run_endpoint.py

+        self._create_test_dag_run()
+        assert session.query(DagRun).count() == 2        
+        response = self.client.get(
+            "api/v1/dags/TEST_DAG_ID/dagRuns?state=running,failed", environ_overrides={'REMOTE_USER': "test"}


The failed state is not tested here. Can we fail one DagRun, or set the state to queued and adjust the test to match?

fbertos · 2021-12-24T08:36:34Z

@ephraimbuddy @uranusjr I also got a failure in the static checks with the following details, which I do not understand:
black....................................................................................Failed

hook id: black
files were modified by this hook

I did not touch any hook in this PR. Could you please help me understand this issue? Thanks!

uranusjr · 2021-12-24T08:44:11Z

You have trailing whitespace in your code.
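The "files were modified by this hook" message means the formatter rewrote a file, which pre-commit reports as a failure. A minimal sketch for locating trailing whitespace before committing is shown below; the helper name and the sample snippet are illustrative, and in practice re-running the pre-commit hooks and committing the reformatted files achieves the same result.

```python
def lines_with_trailing_whitespace(source):
    """Return 1-based line numbers whose content ends in spaces or tabs."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if line != line.rstrip()
    ]


# Sample text modelled on the test lines quoted earlier in this thread;
# the first line ends with several spaces.
snippet = (
    "assert session.query(DagRun).count() == 2        \n"
    "response = self.client.get(\n"
)

offending = lines_with_trailing_whitespace(snippet)
print(offending)
```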

fbertos · 2021-12-25T18:11:49Z

@uranusjr, @ephraimbuddy I have addressed all your suggestions. Is everything OK now, or should I do something else? Thanks

fbertos · 2021-12-29T18:01:55Z

Hi @mik-laj, it looks like this PR is pending your review. Is there anything else I should check or change? This is my first PR so I am a little bit lost. Thank you for your help!!

kaxil · 2021-12-30T20:05:12Z

airflow/api_connexion/endpoints/dag_run_endpoint.py

@@ -149,6 +149,7 @@ def get_dag_runs(
    execution_date_lte: Optional[str] = None,
    end_date_gte: Optional[str] = None,
    end_date_lte: Optional[str] = None,
+    state: Optional[str] = None,


Suggested change

-    state: Optional[str] = None,
+    states: Optional[List[str]] = None,

cc @ephraimbuddy

ephraimbuddy · 2022-01-04

We have state in the OpenAPI doc instead of states, so I don't think it'll work to change it to states.

kaxil · 2021-12-30T20:05:21Z

airflow/api_connexion/endpoints/dag_run_endpoint.py

+    if state:
+        query = query.filter(DagRun.state.in_(state))


Suggested change

-    if state:
-        query = query.filter(DagRun.state.in_(state))
+    if states:
+        query = query.filter(DagRun.state.in_(states))

fbertos · 2021-12-30

Hi @kaxil, thanks for your recommendations. I will change the datatype to Optional[List[str]]. However, I am not sure it is a good idea to change the name to the plural, because the FilterState query component is defined in the singular in the specification v1.yml, even though it is an array:

    FilterState:
      in: query
      name: state
      schema:
        type: array
        items:
          type: string
      required: false
      description: The value can be repeated to retrieve multiple matching values (OR condition).

To change that to the plural I would need to create a new FilterStates query component.

On the other hand, in the batch method I did use the plural to match the other existing fields (for instance dag_ids):

    {
      "order_by": "string",
      "page_offset": 0,
      "page_limit": 100,
      "dag_ids": ["string"],
      ...
      "states": ["string"]
    }
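The FilterState description says the value can be repeated, while the tests in this PR pass a comma-separated value (`state=running,queued`). How the API framework parses the parameter is not shown in this thread, so the following is only a standard-library sketch of turning either form into a list of states; `state_values` is a hypothetical helper, not part of Airflow.

```python
from urllib.parse import parse_qs, urlsplit


def state_values(url):
    """Collect every `state` value from a URL's query string.

    Accepts both repeated parameters and comma-separated values.
    """
    query = parse_qs(urlsplit(url).query)
    values = []
    for raw in query.get("state", []):
        # parse_qs does not split on commas, so do it explicitly.
        values.extend(part for part in raw.split(",") if part)
    return values


repeated = state_values("api/v1/dags/TEST_DAG_ID/dagRuns?state=running&state=queued")
comma_separated = state_values("api/v1/dags/TEST_DAG_ID/dagRuns?state=running,queued")
no_filter = state_values("api/v1/dags/TEST_DAG_ID/dagRuns")
```

Either way the handler ends up with a list, which is why the parameter keeps the singular name `state` from the specification while being typed as a list.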

fbertos · 2021-12-30

Hi @kaxil, I just changed the datatype to Optional[List[str]] as you mentioned, but now when I run the static checks I get hundreds of mypy errors. Is this normal? What am I doing wrong? Maybe it has nothing to do with my change? Thanks so much for any help.

cc @ephraimbuddy, @uranusjr

uranusjr · 2022-01-04

As long as CI passes it should be OK (not optimal, but OK if you don’t want to fix those).

kaxil

Overall lgtm, 2 minor suggestions

uranusjr · 2022-01-04T04:35:02Z

airflow/api_connexion/endpoints/dag_run_endpoint.py

+    if data.get("states"):
+        states = set(data["states"])
+        query = query.filter(DagRun.state.in_(states))


Suggested change

-    if data.get("states"):
-        states = set(data["states"])
-        query = query.filter(DagRun.state.in_(states))
+    states = data.get("states")
+    if states:
+        query = query.filter(DagRun.state.in_(states))

Python variables are function-scoped, not block-scoped, so it’s not useful to put the variable inside the if block. And moving the variable out avoids one unnecessary member access overhead.
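The scoping point can be illustrated with a small standalone sketch. Both functions below are hypothetical and only mimic the shape of the code under review: a name assigned inside an `if` block is still a local of the whole function, so hoisting the lookup changes nothing about visibility and reads the dictionary once.

```python
def assigned_inside_block(data):
    """Assign `states` inside the if block, then use it after the block."""
    if data.get("states"):
        states = set(data["states"])
    # `states` is function-scoped: visible here whenever the branch ran,
    # and unbound (UnboundLocalError) when it did not.
    return states


def assigned_before_block(data):
    """Hoisted form: one lookup, and the name is always bound."""
    states = data.get("states")
    if states:
        return set(states)
    return None


visible_after_block = assigned_inside_block({"states": ["queued", "running"]})

try:
    assigned_inside_block({})
    unbound_error_raised = False
except UnboundLocalError:
    unbound_error_raised = True
```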

uranusjr · 2022-01-04T04:35:40Z

One minor change, otherwise lgtm.

ephraimbuddy

We can avoid the duplication of the create dagruns method

ephraimbuddy · 2022-01-04T06:43:41Z

tests/api_connexion/endpoints/test_dag_run_endpoint.py

+            with create_session() as session:
+                session.add_all(dag_runs)
+                session.add_all(dags)
+        return dag_runs


I don't think we need this duplication; we can use _create_test_dag_run instead. I understand you want to limit it to 2 DagRuns, but that's the essence of the PR: we can filter down to 2. _create_test_dag_run takes a state argument. You can call it twice or more with different states, thereby creating more DagRuns, and then test the added filtering.

fbertos · 2022-01-04

Hi @ephraimbuddy, if we follow that approach we get these errors:
E MySQLdb._exceptions.IntegrityError: (1062, "Duplicate entry 'TEST_DAG_ID-2020-06-11 18:00:00.000000' for key 'dag_run_dag_id_execution_date_key'")

The problem is that the method _create_test_dag_run statically uses self.default_time and self.default_time_2 as execution dates. So when we call the method twice, we violate the unique index.
To make this dynamic we would also have to change the way the execution dates are assigned.
How do you advise proceeding?
Thanks.

ephraimbuddy · 2022-01-04

What do you think about changing the _create_test_dag_run method to avoid that? If it's possible, let's do that instead.

fbertos · 2022-01-04

Ok @ephraimbuddy, let's try that. Thanks! I will try to minimize changes to the other parts of the code.

fbertos · 2022-01-04

Dear @ephraimbuddy, I just made a new commit with the changes. Please let me know if you agree with them. Thanks so much.
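The refactored helper itself is not shown in this thread. One way to avoid the duplicate (dag_id, execution_date) key is to derive each execution date from a running counter, sketched below with plain dictionaries instead of DagRun rows. The class name, the run_id pattern, and the one-day offset are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from itertools import count

# Base time taken from the duplicate-key error message above.
DEFAULT_TIME = datetime(2020, 6, 11, 18, 0, tzinfo=timezone.utc)


class DagRunFactory:
    """Hypothetical stand-in for a test helper; builds dicts, not DagRun rows."""

    def __init__(self, dag_id="TEST_DAG_ID"):
        self.dag_id = dag_id
        self._counter = count(1)  # shared across calls, so dates never repeat

    def create(self, state="running", how_many=2):
        runs = []
        for _ in range(how_many):
            index = next(self._counter)
            runs.append(
                {
                    "dag_id": self.dag_id,
                    "run_id": f"TEST_DAG_RUN_ID_{index}",
                    "execution_date": DEFAULT_TIME + timedelta(days=index - 1),
                    "state": state,
                }
            )
        return runs


factory = DagRunFactory()
all_runs = factory.create() + factory.create(state="queued")

# Every (dag_id, execution_date) pair is distinct, so a unique index holds.
unique_keys = {(run["dag_id"], run["execution_date"]) for run in all_runs}
```

Calling the helper twice with different states then yields four runs that can coexist in the table, which is what the state-filter test needs.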

ephraimbuddy · 2022-01-04T06:45:04Z

tests/api_connexion/endpoints/test_dag_run_endpoint.py

@@ -307,6 +336,41 @@ def test_should_respond_200(self, session):
            "total_entries": 2,
        }

+    def test_filter_by_state(self, session):
+        self._create_test_dag_run_with_queued()


Suggested change

-        self._create_test_dag_run_with_queued()
+        self._create_test_dag_run()
+        self._create_test_dag_run(state='queued')

fbertos · 2022-01-04

Hi @ephraimbuddy, understood, but we need to clarify the previous point before going further with this. Thanks.

ephraimbuddy · 2022-01-04T06:46:55Z

tests/api_connexion/endpoints/test_dag_run_endpoint.py

+            "api/v1/dags/TEST_DAG_ID/dagRuns?state=running,queued", environ_overrides={'REMOTE_USER': "test"}
+        )
+        assert response.status_code == 200
+        assert response.json == {


You may not need to check and list every field. You can check the important things, like the total returned and the states of the items returned.

fbertos · 2022-01-04

Hi @ephraimbuddy, OK, understood. I can change that according to the recommendation. Thanks.
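A focused assertion along those lines might look like the sketch below. It works on a sample dictionary rather than a real response; the `dag_runs` key and the run ids are assumptions about the payload shape (only `total_entries` appears in the diff above), and `summarize` is a hypothetical helper.

```python
def summarize(payload):
    """Reduce a list response to the parts the filter test cares about."""
    states = sorted(run["state"] for run in payload["dag_runs"])
    return payload["total_entries"], states


# Sample payload standing in for response.json in the test.
sample_payload = {
    "dag_runs": [
        {"dag_run_id": "TEST_DAG_RUN_ID_1", "state": "running"},
        {"dag_run_id": "TEST_DAG_RUN_ID_2", "state": "queued"},
    ],
    "total_entries": 2,
}

total, states = summarize(sample_payload)
assert total == 2
assert states == ["queued", "running"]
```

Checking only the count and the states keeps the test stable when unrelated fields are added to the response.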

ephraimbuddy

LGTM. cc: @uranusjr @kaxil

github-actions · 2022-01-04T16:49:28Z

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

boring-cyborg · 2022-01-05T06:58:21Z

Awesome work, congrats on your first merged pull request!

fbertos requested review from ephraimbuddy and mik-laj as code owners December 23, 2021 23:36

uranusjr reviewed Dec 24, 2021

ephraimbuddy reviewed Dec 24, 2021

kaxil reviewed Dec 30, 2021

uranusjr reviewed Jan 4, 2022

ephraimbuddy requested changes Jan 4, 2022

fbertos added 7 commits January 4, 2022 10:03

Add filter by state in DagRun REST API (List Dag Runs)

aaf175f

Add filter by state in DagRun REST API (List Dag Runs) after commiter…

0547eaa

…s recomendations

Add filter by state in DagRun REST API (List Dag Runs) after commiter…

d4dd702

…s recomendation with testing queued

Add filter by state in DagRun REST API (List Dag Runs) after commiter…

8481c0d

…s recomendation with testing queued and removing trailing whitespaces

Add filter by state in DagRun REST API (List Dag Runs) after commiter…

4d33bd0

…s recomendation with testing queued and removing trailing whitespaces

Add filter by state in DagRun REST API (List Dag Runs) after commiter…

d0dc414

…s recomendation List Optional

Add filter by state in DagRun REST API (List Dag Runs) after commiter…

2c9715a

…s recomendation of refactoring _create_test_dag_run method

ephraimbuddy approved these changes Jan 4, 2022

github-actions bot added the "okay to merge" label (It's ok to merge this PR as it does not require more tests) Jan 4, 2022

uranusjr merged commit b83084b into apache:main Jan 5, 2022

fbertos mentioned this pull request Jan 5, 2022

Rest API: allow filtering DagRuns by state. #16844

Closed

potiuk pushed a commit that referenced this pull request Jan 6, 2022

Modify Swagger documentation to align with PR #20485 (#20697)

376da6a

jedcunningham added the "type:improvement" label (Changelog: Improvements) Apr 14, 2022
