Remove dag parsing from airflow db init command #22531

Merged: 1 commit, Mar 31, 2022

@SamWheating (Contributor) commented Mar 25, 2022

When running airflow db init, Airflow parses all of the DAGs in the configured DAG folder sequentially in a single process. When a large number of DAGs are present, this can significantly increase the time the db init command takes to run.

In my opinion, initializing the DB and populating it with data are separate tasks and shouldn't be combined into a single function. The background DAG processor is also much faster at parsing files and populating the DB due to using multiprocessing.

I propose splitting the bootstrapping of the DagBag out into a separate function (so as not to change the test setup / teardown process) and removing it from the db init and db reset commands.
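For illustration, here is a minimal sketch of the split proposed above. It does not use Airflow's actual code: `init_db`, `bootstrap_dagbag`, `parse_dag_file`, and the `dag` table layout are hypothetical stand-ins, and an in-memory SQLite database stands in for the metadata DB. The point is only that schema creation no longer touches any DAG files, while callers that still need a populated DB (for example, test setup) can call the separate function explicitly.

```python
import os
import sqlite3
from typing import Iterable


def init_db(conn: sqlite3.Connection) -> None:
    """Create the schema only. No DAG files are parsed here (hypothetical helper)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dag (dag_id TEXT PRIMARY KEY, fileloc TEXT)"
    )
    conn.commit()


def parse_dag_file(path: str) -> str:
    """Stand-in for real DAG parsing: derive a dag_id from the file name."""
    return os.path.splitext(os.path.basename(path))[0]


def bootstrap_dagbag(conn: sqlite3.Connection, dag_files: Iterable[str]) -> int:
    """Parse DAG files and store them in the DB. Called only where needed."""
    rows = [(parse_dag_file(path), path) for path in dag_files]
    conn.executemany(
        "INSERT OR REPLACE INTO dag (dag_id, fileloc) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

With this shape, a db init style command would call only `init_db`, and test fixtures that depend on DAG rows being present would call `bootstrap_dagbag` afterwards.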

Additionally, removing this parsing step might be required for AIP-43 anyway.

Let me know if I'm missing anything here, or if there's a reason for parsing DAGs at this point that I've overlooked.

@SamWheating force-pushed the sw-remove-dag-parsing-from-db-init branch from 1e72cc8 to 9fc83bc, March 25, 2022 17:32
@potiuk (Member) commented Mar 31, 2022

Nice one! @kaxil @ashb @ephraimbuddy -> WDYT?

@github-actions bot added the "full tests needed" label (We need to run full set of tests for this PR to merge), Mar 31, 2022
@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk merged commit 8079b4c into apache:main, Mar 31, 2022
@ephraimbuddy added the "type:misc/internal" label (Changelog: Misc changes that should appear in change log), Apr 11, 2022