Remove dag parsing from airflow db init command
#22531
Merged
When running `airflow db init`, Airflow parses all of the DAGs in the configured DAG folder sequentially in a single process. When a large number of DAGs are present, this can significantly increase the time the `db init` command takes to run.
In my opinion, initializing the DB and populating it with data are separate tasks and shouldn't be combined into a single function. The background DAG processor is also much faster at parsing files and populating the DB because it uses multiprocessing.
I propose splitting the bootstrapping of the DagBag out into a separate function (so as not to introduce any changes to the test setup / teardown process) and removing it from the `db init` and `db reset` commands.
Additionally, removing this parsing step might be required for AIP-43 anyway.
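To make the proposed split concrete, here is a minimal sketch of the shape I have in mind. It is not Airflow code: `FakeMetadataDb`, `init_db`, `bootstrap_dagbag`, and `parse_dag_file` are hypothetical stand-ins invented for illustration, and the "parsing" is a placeholder. The only point is that schema creation and DAG loading become two independent calls.

```python
# Illustrative sketch only. Every name below is a hypothetical stand-in,
# not a claim about Airflow's real functions or module layout.
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FakeMetadataDb:
    """In-memory stand-in for the metadata database."""
    tables: Set[str] = field(default_factory=set)
    serialized_dags: Dict[str, dict] = field(default_factory=dict)


def parse_dag_file(source: str) -> dict:
    """Placeholder for DAG parsing; just records the number of lines."""
    return {"line_count": len(source.splitlines())}


def init_db(db: FakeMetadataDb) -> None:
    """Create the schema only. No DAG files are read in this step."""
    db.tables.update({"dag", "dag_run", "task_instance"})


def bootstrap_dagbag(db: FakeMetadataDb, dag_files: Dict[str, str]) -> None:
    """Separate step: parse each DAG file and store the result in the DB."""
    for path, source in dag_files.items():
        db.serialized_dags[path] = parse_dag_file(source)
```

Under this assumption, the CLI commands would call only `init_db`, while test setup could keep calling `init_db` followed by `bootstrap_dagbag` so its behavior stays the same.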
Let me know if there's anything I'm missing here, or if there's an explanation for parsing DAGs here which I may have missed.