-
Maybe convert it into a feature request. Sounds plausible.
-
We recently (2.4?) added updated_at columns to the dag_run and task_instance tables, so we can now add much better filtering/polling: "Give me anything that changed (in this DAG or in all DAGs) since time X".
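To make the "anything that changed since time X" idea concrete, here is a minimal sketch of the filter semantics over in-memory records. The `RunRecord` type and the `changed_since` helper are hypothetical names used only for illustration; they are not part of Airflow. The only thing taken from this thread is the `updated_at` column.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical stand-in for a row that carries an updated_at value."""
    dag_id: str
    run_id: str
    state: str
    updated_at: datetime


def changed_since(
    records: Iterable[RunRecord],
    since: datetime,
    dag_id: Optional[str] = None,
) -> List[RunRecord]:
    """Return records updated strictly after `since`, oldest change first.

    If `dag_id` is None, records from all DAGs are considered.
    """
    matches = (
        r for r in records
        if r.updated_at > since and (dag_id is None or r.dag_id == dag_id)
    )
    return sorted(matches, key=lambda r: r.updated_at)
```

In a real deployment this comparison would presumably run in the database against the updated_at column rather than in Python; the sketch only shows what a caller would expect to get back.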
-
+1 to adding updated_after as a query param for the dag run and task instance
endpoints. That could even be marked as a "good first issue" for new
contributors.
On Fri, Nov 18, 2022 at 8:01 AM Ash Berlin-Taylor wrote:
Getting push-based notification of changes is a very non-trivial thing to
achieve from an architectural standpoint, and while it would be nice,
adding "global" support for updated_after to the API would be a small
change to get to an 80% solution.
-
+1 on
-
+1 too. We just need to actually implement it :)
-
Airflow/MWAA does not seem to have a scalable API for returning the status of a dagRun; the states-for-dag-run and list-runs APIs do not scale well. To fetch dagRun status, every team seems to build a custom solution, either using sns_notification or writing the status to an external data store via Airflow callbacks.
My use case is to fetch the status of all active DAG runs and update the status tables in our system. A poller does this, with a timeout of 150s configured based on our SLA. The states-for-dag-run API seems to do a scan operation internally, so the time to get the status of a dagRun grows as the number of DAG runs in the system increases. Initially, fetching the status of 100 runs took 2.5 minutes. After the number of dagRuns in the system increased by 50, fetching the status of 100 dagRuns takes more than 5 minutes.
Fetching the status of a workflow is a very common use case, and I think Airflow should provide a scalable API for it.
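For illustration, a poller like the one described above could keep a high-water-mark timestamp and ask only for runs that changed after it, instead of re-reading every active run on each cycle. The sketch below assumes some backend call that accepts a "changed since" timestamp; `IncrementalPoller`, `fetch_changed`, and `make_fake_fetch` are hypothetical names, and the backend is faked in memory so the example stays self-contained.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Tuple

# (run_id, state, updated_at); a hypothetical shape for one changed run.
Run = Tuple[str, str, datetime]


class IncrementalPoller:
    """Tracks run states by fetching only what changed since the last poll."""

    def __init__(self, fetch_changed: Callable[[datetime], List[Run]], start: datetime):
        self._fetch = fetch_changed   # assumed backend call, injected by the caller
        self._cursor = start          # high-water mark of updated_at seen so far
        self.status: Dict[str, str] = {}

    def poll_once(self) -> int:
        """Apply all changes after the cursor; return how many rows were applied."""
        changed = self._fetch(self._cursor)
        for run_id, state, updated_at in changed:
            self.status[run_id] = state
            if updated_at > self._cursor:
                self._cursor = updated_at
        return len(changed)


def make_fake_fetch(store: List[Run]) -> Callable[[datetime], List[Run]]:
    """In-memory stand-in for the backend: rows updated strictly after `since`."""
    def fetch(since: datetime) -> List[Run]:
        return sorted((r for r in store if r[2] > since), key=lambda r: r[2])
    return fetch


if __name__ == "__main__":
    t0 = datetime(2022, 11, 18, 8, 0, tzinfo=timezone.utc)
    store: List[Run] = [("run_1", "running", t0 + timedelta(seconds=1))]
    poller = IncrementalPoller(make_fake_fetch(store), t0)
    poller.poll_once()
    print(poller.status)
```

With this shape, the work per poll is proportional to the number of runs that changed rather than the total number of runs in the system. One caveat of the strict "after the cursor" comparison is that two updates sharing the exact same timestamp could be missed if they arrive across poll boundaries; a real poller might overlap the window slightly and deduplicate.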