Webserver is slow after upgrading to v2.9.3 #41851

Closed
1 of 2 tasks
TreasureMaster opened this issue Aug 29, 2024 · 5 comments
Labels
area:core area:performance area:webserver Webserver related Issues kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet pending-response stale Stale PRs per the .github/workflows/stale.yml policy file

Comments


TreasureMaster commented Aug 29, 2024

Apache Airflow version

2.9.3

If "Other Airflow 2 version" selected, which one?

No response

What happened?

We upgraded from 2.6.2 to 2.9.3. Since the upgrade, the main page opens very slowly, and clicking the 'failed' button causes Airflow to fail with a timeout.

What you think should happen instead?

We currently have 3200 DAGs, and the number keeps growing.
The slowdown is caused by line 855 in the file /airflow/www/views.py:

status_count_failed = get_query_count(failed_dags, session=session)

Not the whole statement is slow, only the part that filters DAGs by access rights. We commented out line 799 to skip the IN filter. That line is:

dags_query = dags_query.where(DagModel.dag_id.in_(filter_dag_ids))

This filter condition is inefficient and makes the query very slow. The slow part is highlighted in the query screenshot (image: airflow-failed-sql).

It is generated from this line:

WHERE NOT dag.is_subdag AND dag.is_active AND dag.dag_id IN (__[POSTCOMPILE_dag_id_1])

The entire request takes between 60 and 70 seconds on our main page. Without this filtering part, it takes just over 1 second.
DBeaver runs out of memory and hangs when I run the query directly against the database.

Is it possible to optimize this query, or to split it into parts and filter by permissions in Python?
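
One way to read the "filter by permissions in Python" idea is sketched below. It is a minimal illustration under stated assumptions, not Airflow's code: the `dag` and `dag_run` tables are a cut-down hypothetical schema in an in-memory SQLite database, and `SCHEMA`, `FAILED_DAG_IDS_SQL`, `make_demo_db`, and `count_failed_dags` are names invented for the example. The query selects the DAGs whose latest run failed without any `IN (...)` clause, and the access check is then done against a Python set.

```python
import sqlite3

# Hypothetical, simplified schema; the real Airflow tables have many more columns.
SCHEMA = """
CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_subdag INTEGER, is_active INTEGER);
CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT);
"""

# DAGs whose most recent run failed. Note: no "dag_id IN (...)" filter here.
FAILED_DAG_IDS_SQL = """
SELECT dag.dag_id
FROM dag
JOIN (
    SELECT dag_id, MAX(execution_date) AS max_date
    FROM dag_run
    GROUP BY dag_id
) AS last_run ON last_run.dag_id = dag.dag_id
JOIN dag_run
  ON dag_run.dag_id = last_run.dag_id
 AND dag_run.execution_date = last_run.max_date
WHERE NOT dag.is_subdag AND dag.is_active AND dag_run.state = 'failed'
"""


def make_demo_db():
    """Build a tiny in-memory database with made-up rows for the example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dag VALUES (?, ?, ?)",
        [("a", 0, 1), ("b", 0, 1), ("c", 0, 1), ("d", 1, 1), ("e", 0, 0)],
    )
    conn.executemany(
        "INSERT INTO dag_run VALUES (?, ?, ?)",
        [
            ("a", "2024-01-01", "success"), ("a", "2024-01-02", "failed"),
            ("b", "2024-01-01", "failed"), ("b", "2024-01-02", "success"),
            ("c", "2024-01-01", "failed"),
            ("d", "2024-01-01", "failed"),   # subdag, excluded by the WHERE clause
            ("e", "2024-01-01", "failed"),   # inactive, excluded by the WHERE clause
        ],
    )
    return conn


def count_failed_dags(conn, permitted_dag_ids):
    """Count failed DAGs, applying the permission filter in Python instead of SQL."""
    permitted = set(permitted_dag_ids)
    rows = conn.execute(FAILED_DAG_IDS_SQL)
    return sum(1 for (dag_id,) in rows if dag_id in permitted)


if __name__ == "__main__":
    conn = make_demo_db()
    print(count_failed_dags(conn, {"a", "b"}))
```

The trade-off is that every failed DAG id travels to the webserver before filtering, so whether this beats the `IN` filter on a real Postgres database with 3200 DAGs would still have to be measured.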

How to reproduce

You need to have a large number of dags and dagruns.

Operating System

CentOS 7

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

Our infrastructure:

  1. database server (Postgres 13)
  2. server with Airflow webserver, scheduler and Redis
  3. three servers with workers

Modified bitnami images are used. Additional libraries installed.

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@TreasureMaster TreasureMaster added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Aug 29, 2024

boring-cyborg bot commented Aug 29, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@dosubot dosubot bot added area:performance area:webserver Webserver related Issues labels Aug 29, 2024
@tirkarthi
Contributor

Related

#38776
#40547

@jscheffl
Contributor

jscheffl commented Aug 30, 2024

I would assume a setup with 3000+ DAGs will not run well without special attention to performance. Rewriting the queries is of course possible, but the data selection itself is complex and might remain expensive.

Plans are ongoing to rewrite the UI and also make async calls in a future Airflow 3, so I would not expect a major investment in the current Airflow 2.10 line.

Some options that might help compensate, as you are already at the level of patching code:

  • Have you attempted to analyze the query and add more resources to the DB to improve queries or add specific indexes?
  • Do you use DAG level permissions? If not, as you are patching the code, removing the DAG access level filter might be a simple option to improve the query
  • Have you tested whether Airflow 2.10 improves the situation?
  • Would you be willing to supply a PR as a performance patch?
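
As a rough illustration of the first option (analyze the query, then check whether an index changes the plan), here is a minimal sketch. It uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in so it runs without a database server; on Postgres the analogous step would be running `EXPLAIN (ANALYZE, BUFFERS)` on the slow statement. The `dag_run` table is a cut-down hypothetical schema, and `query_plan` and `idx_dag_run_state` are names invented for the example, not anything Airflow ships.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down table; not the real Airflow schema.
conn.execute("CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT)")

sql = "SELECT dag_id FROM dag_run WHERE state = ?"

# Plan before any index exists: the planner has to scan the table.
plan_before = query_plan(conn, sql, ("failed",))

# Add a candidate index (an assumption for illustration), then look at the plan again.
conn.execute("CREATE INDEX idx_dag_run_state ON dag_run (state, dag_id)")
plan_after = query_plan(conn, sql, ("failed",))

print(plan_before)
print(plan_after)
```

Comparing the two plans shows whether the planner actually picks up the new index. The same before/after comparison on the real database is what would tell you if an index is worth adding.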


This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Sep 14, 2024

This issue has been closed because it has not received response from the issue author.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 21, 2024