-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance Issues on Initial Page Load #38776
Comments
hi @vchiapaikeo , were you able to make progress on this? Would like to learn if you added any additional indexes or so and would be more than happy to collaborate with you on this one. My guess is we might need few more indexes by looking at the slow queries. |
Hey @pankajkoti ! Yes, that would be wonderful! The only thing I can think of from looking at the queries and the explain plan is something around the dag table - since there already is an index on dag_run and the state col and the full table scans are happening there. Based on those explain plans, perhaps something on the dag table around dag_id, is_subdag, and is_active? I see two existing indices on dag right now: CREATE INDEX idx_root_dag_id
ON dag (root_dag_id ASC);
CREATE INDEX idx_next_dagrun_create_after
ON dag (next_dagrun_create_after ASC); The other thing I was thinking about (but have just been very slow to implement and try) is a ThreadPool for some of the queries in the /home route. The slow queries are here and are retrieved from this call. These get loaded serially here --> https://github.com/apache/airflow/blob/main/airflow/www/views.py#L858-L859 My thought is that these queries plus this one can all get submitted to a threadpool for execution so that they run concurrently. Wanted to get the teams' thoughts on that and if you think it would work. The implementation shouldn't be too difficult there. |
Related work on making sql queries async. cc @hussein-awala |
@hussein-awala , a ThreadPoolExecutor implementation would be easier to get going intially but my guess is that the switch to async calls here makes more sense overall and would be cleaner. Let me know what ya think when you have a few mins! |
These queries (https://github.com/apache/airflow/blob/main/airflow/www/views.py#L855, https://github.com/apache/airflow/blob/main/airflow/www/views.py#L860, https://github.com/apache/airflow/blob/main/airflow/www/views.py#L866, and so on) are executed sequentially as specified by @vchiapaikeo. I attempted to run these queries locally with 4000 DAGs, and it took 1.86 seconds to display the running DAGs. Running these critical database queries concurrently in the index method using ThreadPoolExecutor may not be the best approach. Using threads means it still depends on blocking I/O, which can lead to inefficiencies, especially under heavy load limited by the Python Global Interpreter Lock. Using async queries seems like a better approach #36504. SQLAlchemy has supported Asynchronous I/O (asyncio) since 1.4, which we can utilize here. (https://docs.sqlalchemy.org/en/14/orm/extensions/asyncio.html) cc: @tirkarthi @vchiapaikeo @pankajkoti have you tried adding additional indexes as suggested in the thread? I tried it in my local setup using breeze but it doesn't seem to improve the performance drastically. |
Right, makes sense - it sounds like the best solution then may be to wait for @hussein-awala 's AIP on introducing async SQLAlchemy sessions into the codebase (seems like they will start with connections and variables first) and then calling these db functions asynchronously on the webserver. Wdyt? |
@sunank200 yes, I did try adding few indexes based on the SQL queries that get fired but I did not observe any improvement either |
Sure. Waiting for response from @hussein-awala. |
This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author. |
This issue has been closed because it has not received response from the issue author. |
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.7.3
What happened?
We have started to notice more slowness on initial page load. On our deployment, it takes nearly 6s for the server to respond.
We started to do some analysis of the queries that are being run and we found 3 that take over 1400ms (and likely are being run serially).
All of these queries are attached in the following txt file with explain plans and explain analyzes:
slow_queries_on_initial_load.txt
What you think should happen instead?
We are wondering if there are any ways we could possibly improve the performance of initial page load. Some ideas that come to mind:
How to reproduce
Add ~2-4k dags to an Airflow deployment and reload the initial page
Operating System
Debian 11
Versions of Apache Airflow Providers
N/A
Deployment
Official Apache Airflow Helm Chart
Deployment details
We deploy our cluster on GCP's GKE
Anything else?
No response
Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: