Add filter by state in DagRun REST API (List Dag Runs) #20485

Merged
merged 7 commits into from
Jan 5, 2022

Conversation

Contributor

@fbertos fbertos commented Dec 23, 2021


boring-cyborg bot commented Dec 23, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally. It’s a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

Member

@uranusjr uranusjr left a comment

Can you add a test to filter by none state? I believe this is the state that causes the most problems (because the internal representation is not str). There is also a util function somewhere (for task instances? @ephraimbuddy may remember better) for this problem.

Comment on lines 312 to 313:

    result = session.query(DagRun).all()
    assert len(result) == 2

Member

Suggested change:

    - result = session.query(DagRun).all()
    - assert len(result) == 2
    + assert session.query(DagRun).count() == 2

Comment on lines 168 to 169:

    if state:
        query = query.filter(DagRun.state == state)

Contributor

Suggested change:

    - if state:
    -     query = query.filter(DagRun.state == state)
    + if state:
    +     query = query.filter(DagRun.state.in_(state))

The FilterState query component is an array.
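To illustrate why an array-valued parameter calls for an IN filter rather than an equality check, here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in for SQLAlchemy. The table layout, the run ids, and the helper filter_by_states are hypothetical and exist only for this example. In SQLAlchemy, an expression such as DagRun.state.in_(state) renders as a comparable SQL IN clause.

```python
import sqlite3


def filter_by_states(conn, states):
    """Return run ids whose state matches any entry in `states` (OR semantics)."""
    if not states:
        # No filter requested: return every row.
        rows = conn.execute("SELECT run_id FROM dag_run ORDER BY run_id")
        return [row[0] for row in rows]
    # One placeholder per requested state, e.g. "IN (?, ?)".
    placeholders = ", ".join("?" for _ in states)
    sql = f"SELECT run_id FROM dag_run WHERE state IN ({placeholders}) ORDER BY run_id"
    return [row[0] for row in conn.execute(sql, list(states))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (run_id TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?)",
    [("run_1", "running"), ("run_2", "queued"), ("run_3", "success")],
)

matching = filter_by_states(conn, ["running", "queued"])
```

With an equality check, a list of two states could never match a single column value, which is the situation the suggested change avoids.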

@ephraimbuddy
Contributor

> Can you add a test to filter by none state? I believe this is the state that causes the most problems (because the internal representation is not str). There is also a util function somewhere (for task instances? @ephraimbuddy may remember better) for this problem.

I think we are fine, because we are dealing with DAG runs, which have only 3 states.

@fbertos
Contributor Author

fbertos commented Dec 24, 2021

@uranusjr @ephraimbuddy thank you very much for your recommendations. I have just applied them in a new commit. In the end I did not add a test for filtering by state None, following @ephraimbuddy's comment. Thanks

    self._create_test_dag_run()
    assert session.query(DagRun).count() == 2
    response = self.client.get(
        "api/v1/dags/TEST_DAG_ID/dagRuns?state=running,failed", environ_overrides={'REMOTE_USER': "test"}

Contributor

The failed state is not tested here. Can we fail one DAG run, or set its state to queued, and adjust the test to match?

@fbertos
Contributor Author

fbertos commented Dec 24, 2021

@ephraimbuddy @uranusjr I also got a failure on the static checks with the following details, which I do not understand:

    black....................................................................................Failed

      • hook id: black
      • files were modified by this hook

I did not touch any hook in this PR. Could you please help me understand this issue? Thanks!

@uranusjr
Member

You have trailing whitespace in your code.
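The hook output says that files were modified, meaning the formatter rewrote something in the changed files. As a rough illustration of how the offending lines could be located before committing, here is a small sketch. The helper find_trailing_whitespace is hypothetical and is not part of Airflow, black, or pre-commit.

```python
from typing import List


def find_trailing_whitespace(source: str) -> List[int]:
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


# Line 1 ends in two spaces and line 3 consists of a single tab.
sample = "if state:  \n    query = query.filter(DagRun.state.in_(state))\n\t\n"
offending = find_trailing_whitespace(sample)
```

Many editors can also strip trailing whitespace on save, which avoids the problem entirely.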

@fbertos
Contributor Author

fbertos commented Dec 25, 2021

@uranusjr, @ephraimbuddy all your suggestions have been applied. Is everything OK now, or should I do something else? Thanks

@fbertos
Contributor Author

fbertos commented Dec 29, 2021

Hi @mik-laj, it looks like this PR is pending your review. Is there anything else I should check or change? This is my first PR, so I am a little bit lost. Thank you for your help!!

@@ -149,6 +149,7 @@ def get_dag_runs(
    execution_date_lte: Optional[str] = None,
    end_date_gte: Optional[str] = None,
    end_date_lte: Optional[str] = None,
    state: Optional[str] = None,

Member

Suggested change:

    - state: Optional[str] = None,
    + states: Optional[List[str]] = None,

cc @ephraimbuddy

Contributor

The OpenAPI doc uses state rather than states, so I don't think renaming it to states will work.

Comment on lines +168 to +169:

    if state:
        query = query.filter(DagRun.state.in_(state))

Member

Suggested change:

    - if state:
    -     query = query.filter(DagRun.state.in_(state))
    + if states:
    +     query = query.filter(DagRun.state.in_(states))

Contributor Author

@fbertos fbertos Dec 30, 2021

Hi @kaxil, thanks for your recommendations. I will change the datatype to Optional[List[str]]. However, I am not sure it is a good idea to switch to the plural, because the FilterState query component is defined in the singular in the specification v1.yml, even though it is an array:

    FilterState:
      in: query
      **name: state**
      schema:
        **type: array**
        items:
          type: string
      required: false
      description:
        The value can be repeated to retrieve multiple matching values (OR condition).

To change that to plural I would need to create a new FilterStates query component....

On the other hand in the batch method I did use plural to match with the other existing fields (for instance dag_ids):

{
   "order_by": "string",
   "page_offset": 0,
   "page_limit": 100,
   "**dag_ids**": [
      "string"
   ],
   ...
   "states": [
       "string"
   ]
}
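To make the naming difference concrete, here is a minimal sketch of how a client might build both requests with the Python standard library. It assumes the list endpoint accepts the singular state parameter repeated once per value, as the FilterState description says, and that the batch body uses the plural states key shown above. The state values and the TEST_DAG_ID entry are illustrative only.

```python
import json
from urllib.parse import parse_qs, urlencode

# List endpoint: the query parameter keeps its singular name and is repeated.
query_string = urlencode({"state": ["running", "queued"]}, doseq=True)

# Parsing the query string back yields a list under the singular key.
parsed = parse_qs(query_string)

# Batch endpoint: the JSON body uses plural keys that hold lists.
batch_body = json.dumps(
    {"dag_ids": ["TEST_DAG_ID"], "states": ["running", "queued"]}
)
```

In both cases the server ends up with a list of states, so the handler can apply the same IN-style filter regardless of which name the client-facing field uses.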

Contributor Author

@fbertos fbertos Dec 30, 2021

Hi @kaxil, I just changed the datatype to Optional[List[str]] as you suggested, but now when I run the static checks I get hundreds of mypy errors. Is this normal? What am I doing wrong? Maybe it has nothing to do with my change? Thanks so much for any help.

cc @ephraimbuddy, @uranusjr

Member

As long as CI passes it should be OK (not optimal, but OK if you don’t want to fix those).

Member

@kaxil kaxil left a comment

Overall lgtm, 2 minor suggestions

Comment on lines 210 to 212:

    if data.get("states"):
        states = set(data["states"])
        query = query.filter(DagRun.state.in_(states))

Member

Suggested change:

    - if data.get("states"):
    -     states = set(data["states"])
    -     query = query.filter(DagRun.state.in_(states))
    + states = data.get("states")
    + if states:
    +     query = query.filter(DagRun.state.in_(states))

Python variables are function-scoped, not block-scoped, so it’s not useful to put the variable inside the if block. And moving the variable out avoids one unnecessary member access overhead.
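The scoping point can be checked with a short, self-contained sketch. Both functions below are hypothetical illustrations and are not taken from the Airflow code base: the first shows that a name bound inside an if block remains visible in the rest of the function, and the second follows the look-up-once pattern from the suggestion.

```python
def block_is_not_a_scope(flag):
    if flag:
        inside = "assigned in the if block"
    # `inside` is a local of the whole function, so it is visible here
    # whenever the branch above has run.
    return inside if flag else None


def requested_states(data):
    # Look the key up once, then test and reuse the local variable.
    states = data.get("states")
    if states:
        return sorted(states)
    return None


visible = block_is_not_a_scope(True)
selected = requested_states({"states": ["running", "queued"]})
```

Because an if block creates no new scope, binding the variable before the condition costs nothing in readability and saves the second dictionary lookup.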

@uranusjr
Member

uranusjr commented Jan 4, 2022

One minor change, otherwise lgtm.

Contributor

@ephraimbuddy ephraimbuddy left a comment

We can avoid duplicating the method that creates the DAG runs.

    with create_session() as session:
        session.add_all(dag_runs)
        session.add_all(dags)
    return dag_runs

Contributor

I don't think we need this duplication; we can use _create_test_dag_run instead. I understand you want to limit it to 2 DAG runs, but that is the essence of the PR: we can filter down to 2. _create_test_dag_run takes a state argument. You can call it twice or more with different states, creating more DAG runs, and then test the added filtering.

Contributor Author

@fbertos fbertos Jan 4, 2022

Hi @ephraimbuddy, if we follow that approach we get this error:

    E MySQLdb._exceptions.IntegrityError: (1062, "Duplicate entry 'TEST_DAG_ID-2020-06-11 18:00:00.000000' for key 'dag_run_dag_id_execution_date_key'")

The problem is that _create_test_dag_run always uses self.default_time and self.default_time_2 as the execution dates. So when we call the method twice, we violate the unique index.
To make this dynamic we would also have to change the way the execution dates are assigned...
How would you advise proceeding?
Thanks.
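The failure mode described here can be reproduced in isolation. The sketch below uses the standard-library sqlite3 module instead of MySQL, with a hypothetical table and a hypothetical create_test_dag_run helper that only mimics a fixture with a hard-coded execution date. It is not the Airflow test helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run ("
    " dag_id TEXT, execution_date TEXT, state TEXT,"
    " UNIQUE (dag_id, execution_date))"
)


def create_test_dag_run(state, execution_date="2020-06-11 18:00:00"):
    # Hypothetical stand-in for a fixture whose execution date is fixed.
    conn.execute(
        "INSERT INTO dag_run VALUES (?, ?, ?)",
        ("TEST_DAG_ID", execution_date, state),
    )


create_test_dag_run("running")
try:
    # Same (dag_id, execution_date) pair as the first call.
    create_test_dag_run("queued")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Passing a distinct execution date per call avoids the collision.
create_test_dag_run("queued", execution_date="2020-06-12 18:00:00")
row_count = conn.execute("SELECT COUNT(*) FROM dag_run").fetchone()[0]
```

One way out, under these assumptions, is to let the helper accept the execution date (or an offset) as a parameter so that repeated calls produce distinct rows.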

Contributor

What do you think about changing _create_test_dag_run to avoid that? If it's possible, let's do that instead.

Contributor Author

@fbertos fbertos Jan 4, 2022

Ok @ephraimbuddy, let's try that. Thanks! I will try to minimize changes on the other parts of the code...

Contributor Author

Dear @ephraimbuddy, I just made a new commit with the changes. Please let me know if you agree with them. Thanks so much.

@@ -307,6 +336,41 @@ def test_should_respond_200(self, session):
            "total_entries": 2,
        }

    def test_filter_by_state(self, session):
        self._create_test_dag_run_with_queued()

Contributor

Suggested change:

    - self._create_test_dag_run_with_queued()
    + self._create_test_dag_run()
    + self._create_test_dag_run(state='queued')

Contributor Author

Hi @ephraimbuddy, understood, but we need to settle the previous point before going further with this. Thx.

"api/v1/dags/TEST_DAG_ID/dagRuns?state=running,queued", environ_overrides={'REMOTE_USER': "test"}
)
assert response.status_code == 200
assert response.json == {
Contributor

You may not need to check and list every field. You can check the important things, such as the total returned and the states of the items returned.
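A narrower assertion along those lines might look like the sketch below. The payload is a hypothetical stand-in for response.json: the total_entries key appears in the existing test, while the dag_runs key, the run ids, and the summarize helper are assumptions made for this example.

```python
# Hypothetical response payload; "dag_runs" and the run ids are assumptions.
response_json = {
    "dag_runs": [
        {"dag_run_id": "TEST_DAG_RUN_ID_1", "state": "running"},
        {"dag_run_id": "TEST_DAG_RUN_ID_2", "state": "queued"},
    ],
    "total_entries": 2,
}


def summarize(payload):
    """Reduce a list response to the parts the filter test cares about."""
    states = {run["state"] for run in payload["dag_runs"]}
    return payload["total_entries"], states


total, states = summarize(response_json)
assert total == 2
assert states == {"running", "queued"}
```

Comparing a set of states keeps the test independent of result ordering and of unrelated fields that may change later.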

Contributor Author

Hi @ephraimbuddy, OK, understood. I will change that as recommended. Thanks.

Contributor

@ephraimbuddy ephraimbuddy left a comment

LGTM. cc: @uranusjr @kaxil

@github-actions github-actions bot added the "okay to merge" label Jan 4, 2022

github-actions bot commented Jan 4, 2022

The PR is likely OK to be merged with just a subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@uranusjr uranusjr merged commit b83084b into apache:main Jan 5, 2022

boring-cyborg bot commented Jan 5, 2022

Awesome work, congrats on your first merged pull request!

Labels: area:API, okay to merge, type:improvement
5 participants