Optimize caching installed packages in CI build #37315

Merged


@potiuk potiuk commented Feb 10, 2024

Some of the recent changes in handling conflicting dependencies broke the optimization of installing dependencies from the branch tip.

The optimization first installed packages from the branch tip so that they were pre-installed (and cached in a Docker layer), which meant the final installation step with pyproject.toml took very little time, even when pyproject.toml had changed.

The problem was that when the branch tip and constraints conflicted, the installation failed and no packages were installed in the "branch tip" layer, effectively removing the cache.

This change fixes it: when we install from the branch tip, we no longer use constraints, so the two can never conflict and the cache will never be empty. The cache can contain different versions of some packages, but the vast majority of packages should be the same as in constraints, so the following installation step should reuse the vast majority of already installed packages.
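
The effect on the final installation step can be illustrated with a small model. This is only a sketch, not the code from the PR: the package names and versions are hypothetical, and `packages_to_install` is an illustrative helper that assumes the final step only needs to install packages whose cached version differs from the constrained one.

```python
from typing import Dict, Set


def packages_to_install(cached: Dict[str, str], constraints: Dict[str, str]) -> Set[str]:
    """Return the packages the final, constrained step still has to install.

    A package can be reused only if the cached layer holds exactly the
    version that the constraints pin.
    """
    return {name for name, version in constraints.items() if cached.get(name) != version}


# Hypothetical pinned versions (stand-ins for a real constraints file).
constraints = {"alpha": "1.0", "beta": "2.3", "gamma": "0.9", "delta": "4.1"}

# Old behaviour on conflict: the constrained branch-tip install fails,
# so the cached layer contains no packages at all.
failed_layer: Dict[str, str] = {}

# New behaviour: the branch-tip install ignores constraints, so it succeeds;
# one package (beta) happens to resolve to a different version.
unconstrained_layer = {"alpha": "1.0", "beta": "2.4", "gamma": "0.9", "delta": "4.1"}

print(sorted(packages_to_install(failed_layer, constraints)))
print(sorted(packages_to_install(unconstrained_layer, constraints)))
```

In this model the empty layer forces every package to be installed again, while the unconstrained layer leaves only the one drifted package to replace.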



@potiuk potiuk requested a review from ashb as a code owner February 10, 2024 20:24
@boring-cyborg boring-cyborg bot added area:dev-tools area:production-image Production image improvements and fixes labels Feb 10, 2024

potiuk commented Feb 10, 2024

Found the reason why some CI image builds are taking 10 minutes instead of the expected 2-3 minutes.

@potiuk potiuk changed the title Optimize cachine installed packages in CI build Optimize caching installed packages in CI build Feb 10, 2024
@potiuk potiuk requested a review from eladkal February 10, 2024 22:17

potiuk commented Feb 11, 2024

Once we get that in and the cache is refreshed, builds with changed dependencies should take 5 minutes instead of > 20 minutes, BTW.

@hussein-awala
Member

It times out after 70 minutes in #37151. Is it related?

@potiuk potiuk merged commit 90a650d into apache:main Feb 11, 2024
82 checks passed
@potiuk potiuk deleted the optimize-installing-airflow-from-branch-tip branch February 11, 2024 13:24

potiuk commented Feb 11, 2024

> It times out after 70 minutes in #37151. Is it related?

Likely not - probably there are some conflicting dependencies there (likely pytest>8 conflicts with something). I can check it in a moment.

To check it, simply check out your PR and run `breeze ci-image build --upgrade-to-newer-dependencies --build-progress plain` locally, and you will see what's going on. I will do it in a moment and see what's going on.
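
For a long build log, the check described above could be scripted roughly as follows. This is a hedged sketch: the `breeze` command is the one quoted in the comment, but `find_conflict_lines`, `build_and_report`, and the marker strings are illustrative assumptions (the markers are substrings that pip's resolver typically prints on a failed resolution), not part of Breeze.

```python
import subprocess
from typing import List

# Command quoted from the comment above.
BUILD_COMMAND = [
    "breeze", "ci-image", "build",
    "--upgrade-to-newer-dependencies",
    "--build-progress", "plain",
]

# Assumption: substrings that appear in pip's output when dependency
# resolution fails. Adjust them to whatever your log actually contains.
CONFLICT_MARKERS = ("ResolutionImpossible", "conflicting dependencies")


def find_conflict_lines(log_text: str) -> List[str]:
    """Return the log lines that mention a dependency conflict."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in CONFLICT_MARKERS)
    ]


def build_and_report() -> List[str]:
    """Run the build (requires breeze to be installed) and return conflict lines.

    Output is captured rather than streamed, so nothing is shown until
    the build finishes.
    """
    result = subprocess.run(BUILD_COMMAND, capture_output=True, text=True, check=False)
    return find_conflict_lines(result.stdout + "\n" + result.stderr)
```

Calling `build_and_report()` from the checked-out PR branch returns only the lines worth reading first; running the command directly in a terminal remains the simplest way to watch the build live.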

@potiuk potiuk added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Feb 12, 2024
@potiuk potiuk added this to the Airflow 2.8.2 milestone Feb 12, 2024
potiuk added a commit that referenced this pull request Feb 13, 2024
(cherry picked from commit 90a650d)
ephraimbuddy pushed a commit that referenced this pull request Feb 22, 2024
(cherry picked from commit 90a650d)