Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-3160] Load latest_dagruns asynchronously, speed up front page load time #4005

Merged
merged 1 commit into from
Nov 5, 2018

Conversation

aoen
Copy link
Contributor

@aoen aoen commented Oct 5, 2018

Make sure you have checked all steps below.

Jira

Description

The front page loads very slowly when the DB has latency because one blocking query is made per DAG against the DB. Even if we were to add indexes, the queries should still be batched to avoid an overhead of RTT * # of DAGs.

The latest dagruns should be loaded asynchronously and in batch like the other UI elements that query the database.

From my tests I was able to get the front page to reduce page load time of the front page from 8s to ~0.7s after this change with a DB in the west part of the USA with the Airflow cluster in the east part of the USA.

The load on the DB should also be slightly reduced due to this change as well.

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    UI changes only, tested on local webservers and our prod hosts.

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

cc @ashb

@ashb
Copy link
Member

ashb commented Oct 5, 2018

The load on the DB should also be slightly reduced due to this change as well.

Aren't we still making all the same queries?

What's the time to the page being fully populated too please?

@aoen
Copy link
Contributor Author

aoen commented Oct 5, 2018

Good call on the RBAC templates, will make a fix.

Aren't we still making all the same queries?

Queries are slightly different, in the past we made one query per DAG, now it's just a single query, so the constant overhead of each query is removed.

What's the time to the page being fully populated too please?

Locally the time for the page to be fully populated is pretty much instant. Prod is within a second, the query is pretty much a more lightweight version of the one being used to fetch and display task instance state on the same page so performance shouldn't be a concern.

@aoen aoen force-pushed the ddavydov--lazy_load_last_dag_run branch 7 times, most recently from dfcaf50 to f5593a2 Compare October 5, 2018 20:11
@aoen
Copy link
Contributor Author

aoen commented Oct 5, 2018

@ashb addressed your comments, retrying CI now, but the two build jobs that failed out of nine were caused by CI flakiness.

@codecov-io
Copy link

codecov-io commented Oct 5, 2018

Codecov Report

Merging #4005 into master will decrease coverage by 0.04%.
The diff coverage is 57.69%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4005      +/-   ##
==========================================
- Coverage   75.79%   75.74%   -0.05%     
==========================================
  Files         199      199              
  Lines       15946    15972      +26     
==========================================
+ Hits        12086    12098      +12     
- Misses       3860     3874      +14
Impacted Files Coverage Δ
airflow/www/views.py 69.06% <100%> (+0.2%) ⬆️
airflow/www_rbac/views.py 71.54% <26.66%> (-0.5%) ⬇️
airflow/jobs.py 82.13% <0%> (-0.27%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b8be322...6607e48. Read the comment docs.

@aoen
Copy link
Contributor Author

aoen commented Oct 8, 2018

CI has passed, ready to be merged.

@aoen aoen force-pushed the ddavydov--lazy_load_last_dag_run branch from f5593a2 to 62aa5a7 Compare October 8, 2018 17:44
@aoen aoen force-pushed the ddavydov--lazy_load_last_dag_run branch from 62aa5a7 to 6607e48 Compare October 8, 2018 17:57
@codecov-io
Copy link

Codecov Report

Merging #4005 into master will decrease coverage by 0.04%.
The diff coverage is 57.69%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4005      +/-   ##
==========================================
- Coverage   75.79%   75.74%   -0.05%     
==========================================
  Files         199      199              
  Lines       15946    15972      +26     
==========================================
+ Hits        12086    12098      +12     
- Misses       3860     3874      +14
Impacted Files Coverage Δ
airflow/www/views.py 69.06% <100%> (+0.2%) ⬆️
airflow/www_rbac/views.py 71.54% <26.66%> (-0.5%) ⬇️
airflow/jobs.py 82.13% <0%> (-0.27%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b8be322...6607e48. Read the comment docs.

@aoen aoen changed the title [AIRFLOW-3160] Load latest_dagruns asynchronously [AIRFLOW-3160] Load latest_dagruns asynchronously, speed up front page load time Oct 10, 2018
@aoen
Copy link
Contributor Author

aoen commented Oct 15, 2018

Ready to merge FYI.

@aoen
Copy link
Contributor Author

aoen commented Oct 29, 2018

I'm happy to merge myself and take responsibility in fixing it if there is a breakage FYI.

@ashb

@Fokko Fokko merged commit 0287cce into apache:master Nov 5, 2018
@Fokko
Copy link
Contributor

Fokko commented Nov 5, 2018

Merged @aoen. High risk, high reward! 💪

@aoen
Copy link
Contributor Author

aoen commented Nov 5, 2018

@Fokko thanks! I owe you a PR review :).

@ashb
Copy link
Member

ashb commented Nov 6, 2018

@aoen Speak of the devil :D this seems to be causing test failures after this PR was merged, but only on MySQL somewhat odly. Can you take a look please, (we might revert this PR temporarily)

@ashb
Copy link
Member

ashb commented Nov 6, 2018

Examples that (we think) started happening after this PR was merged:

https://travis-ci.org/apache/incubator-airflow/jobs/451428747#L4660

in ERROR: test_success (tests.www_rbac.test_views.TestAirflowBaseViews) (postgres this time)

And same build on Mysql https://travis-ci.org/apache/incubator-airflow/jobs/451428748#L4660

Going to try reverting this PR and see if it fixes things, even though the error doesn't make any sense.

ashb added a commit to ashb/airflow that referenced this pull request Nov 6, 2018
ashb added a commit that referenced this pull request Nov 7, 2018
…4145)

* Revert "[AIRFLOW-3190] Make flake8 compliant"

This reverts commit 1691c98.

* Revert "[AIRFLOW-3160] Load latest_dagruns asynchronously (#4005)"

This reverts commit 0287cce.
wyndhblb pushed a commit to asappinc/incubator-airflow that referenced this pull request Nov 9, 2018
wyndhblb pushed a commit to asappinc/incubator-airflow that referenced this pull request Nov 9, 2018
…" (apache#4145)

* Revert "[AIRFLOW-3190] Make flake8 compliant"

This reverts commit 1691c98.

* Revert "[AIRFLOW-3160] Load latest_dagruns asynchronously (apache#4005)"

This reverts commit 0287cce.
tmiller-msft pushed a commit to cse-airflow/incubator-airflow that referenced this pull request Nov 27, 2018
tmiller-msft pushed a commit to cse-airflow/incubator-airflow that referenced this pull request Nov 27, 2018
…" (apache#4145)

* Revert "[AIRFLOW-3190] Make flake8 compliant"

This reverts commit 1691c98.

* Revert "[AIRFLOW-3160] Load latest_dagruns asynchronously (apache#4005)"

This reverts commit 0287cce.
elizabethhalper pushed a commit to cse-airflow/incubator-airflow that referenced this pull request Dec 7, 2018
elizabethhalper pushed a commit to cse-airflow/incubator-airflow that referenced this pull request Dec 7, 2018
…" (apache#4145)

* Revert "[AIRFLOW-3190] Make flake8 compliant"

This reverts commit 1691c98.

* Revert "[AIRFLOW-3160] Load latest_dagruns asynchronously (apache#4005)"

This reverts commit 0287cce.
aliceabe pushed a commit to aliceabe/incubator-airflow that referenced this pull request Jan 3, 2019
aliceabe pushed a commit to aliceabe/incubator-airflow that referenced this pull request Jan 3, 2019
…" (apache#4145)

* Revert "[AIRFLOW-3190] Make flake8 compliant"

This reverts commit 1691c98.

* Revert "[AIRFLOW-3160] Load latest_dagruns asynchronously (apache#4005)"

This reverts commit 0287cce.
@aoen
Copy link
Contributor Author

aoen commented May 29, 2019

@ashb Tests were passing for me before I merged this several times, and it looks like the old logs are gone. Did the issue disappear after the commit was reverted? Does look suspicious I admit. This has been running without issue in our prod. Let me try to run this 10 or so times in CI and see if it's still failing, if it is not I propose remerging, and at least getting a traceback next time it fails in master CI to help fix (I can monitor CI for a while to make sure).

Also sorry for the late reply, need to figure out why I'm not getting email notifications anymore...

@ashb
Copy link
Member

ashb commented May 30, 2019

:D I don't remember exactly what was going on anymore, sorry. It was probable this PR wasn't at fault

@aoen
Copy link
Contributor Author

aoen commented May 30, 2019

@ashb Got it, I think it was a reasonable course of action, mind taking a look at the new PR? It's a lot tinier since non-RBAC has been deprecated, and the testing framework has been made simpler: #5339

wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 29, 2019
wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 29, 2019
…" (apache#4145)

* Revert "[AIRFLOW-3190] Make flake8 compliant"

This reverts commit 1691c98.

* Revert "[AIRFLOW-3160] Load latest_dagruns asynchronously (apache#4005)"

This reverts commit 0287cce.
vshshjn7 pushed a commit to vshshjn7/airflow that referenced this pull request Aug 28, 2019
aoen pushed a commit to twitter-forks/airflow that referenced this pull request Sep 11, 2019
…synchronously, speed up front page load time apache#4005 (#7)

* fb64f2e: [TWTR][AIRFLOW-XXX] Twitter Airflow Customizations + Fixup job scheduling without explicit_defaults_for_timestamp

* reformat

* 6607e48(airflow:master): [AIRFLOW-3160] Load latest_dagruns asynchronously, speed up front page load time apache#4005

* flake8 fix
aoen pushed a commit to twitter-forks/airflow that referenced this pull request Sep 11, 2019
… Default Retries and fix a small DAG refresh bug (#8)

* fb64f2e: [TWTR][AIRFLOW-XXX] Twitter Airflow Customizations + Fixup job scheduling without explicit_defaults_for_timestamp

* reformat

* 6607e48(airflow:master): [AIRFLOW-3160] Load latest_dagruns asynchronously, speed up front page load time apache#4005

* a93d550:

* a93d550: (HEAD, twitter/1.10+twtr) [TWTR][[AIRFLOW-4939]] Add Default Retries and fix a small DAG refresh bug (#3) (2 weeks ago)

* flake8 fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants