Add option to compress dag data #21332

Merged
merged 2 commits into from
Feb 15, 2022


pingzh
Contributor

@pingzh pingzh commented Feb 4, 2022

The uncompressed dag data size can be very large for large DAGs. In our prod db, the dag size can be up to 514MB.

This PR adds an optional feature to compress the DAG data. In our case, it reduces the size from 514MB to 44MB.

By default, compress_serialized_dags is False.
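
To illustrate the idea, here is a minimal sketch of compressing serialized DAG JSON before it is stored and decompressing it on read. It assumes zlib as the compression format, which the description above does not specify; the helper names `compress_dag_data` and `decompress_dag_data` and the sample payload are hypothetical and are not taken from the Airflow code base.

```python
import json
import zlib


def compress_dag_data(dag_data: dict) -> bytes:
    """Serialize a DAG dict to JSON and compress it (zlib is an assumption)."""
    payload = json.dumps(dag_data, sort_keys=True).encode("utf-8")
    return zlib.compress(payload)


def decompress_dag_data(blob: bytes) -> dict:
    """Reverse of compress_dag_data: decompress, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical payload: serialized DAGs tend to repeat the same keys
# for every task, which is why they compress well.
sample = {
    "dag_id": "example",
    "tasks": [
        {"task_id": f"task_{i}", "operator": "BashOperator"}
        for i in range(1000)
    ],
}

raw = json.dumps(sample, sort_keys=True).encode("utf-8")
blob = compress_dag_data(sample)

# The round trip must return the original structure unchanged.
assert decompress_dag_data(blob) == sample
print(f"uncompressed: {len(raw)} bytes, compressed: {len(blob)} bytes")
```

The actual ratio depends on the DAG contents, so the sizes printed here say nothing about the 514MB to 44MB figure reported above.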



@pingzh
Contributor Author

pingzh commented Feb 7, 2022

@potiuk @ashb @kaxil @XD-DENG could you please take a look at this PR?


The CI failure isn't related to this PR.


https://github.com/apache/airflow/runs/5098104683?check_suite_focus=true#step:11:281

Member

@ashb ashb left a comment


Couple of small changes/comments, one medium one about the dag dependencies view, and one big one about the right compression format to use.

Sounds broadly sensible though :)

airflow/config_templates/config.yml (outdated)
airflow/config_templates/config.yml (outdated)
airflow/models/serialized_dag.py
tests/models/test_serialized_dag.py (outdated)
@pingzh pingzh force-pushed the pingzh-compress-data branch 2 times, most recently from 61f0df6 to 416d829 Compare February 7, 2022 22:58
@github-actions

github-actions bot commented Feb 8, 2022

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the "full tests needed" label Feb 8, 2022
@pingzh
Contributor Author

pingzh commented Feb 8, 2022

Hi @ashb @potiuk, do you think we can merge this PR? The CI failure isn't related to this PR. Thanks.

@pingzh
Contributor Author

pingzh commented Feb 9, 2022

@ashb nice, after your rebase, the CI passed. hmm, very strange that my pushes did not pass the CI 🤦

@ashb
Member

ashb commented Feb 9, 2022

That's cos we fixed the broken main :)

Member

@kaxil kaxil left a comment


1 nit

@pingzh pingzh force-pushed the pingzh-compress-data branch from 3b0deda to a79aafc Compare February 14, 2022 19:04
@pingzh pingzh force-pushed the pingzh-compress-data branch from a79aafc to 20ebdc3 Compare February 14, 2022 19:41
@pingzh
Contributor Author

pingzh commented Feb 14, 2022

@kaxil addressed your feedback. Also, the static CI check failure isn't related to this PR. Can we merge this PR? Thanks.

@kaxil kaxil merged commit d07f140 into apache:main Feb 15, 2022
@pingzh pingzh deleted the pingzh-compress-data branch February 15, 2022 17:42
@jedcunningham jedcunningham added the "type:new-feature" label Feb 25, 2022
Labels
area:serialization, full tests needed, kind:documentation, type:new-feature
5 participants