Fix reproducibility of provider builds #43557

Merged
merged 1 commit into from
Oct 31, 2024

Conversation

potiuk
Member

@potiuk potiuk commented Oct 31, 2024

Four hours ago Flit released a new version (3.10.0) that bumped the version of the metadata used to generate providers. That revealed a bug in the reproducibility setup for our providers. Previously the providers had flit>=3.2,<4 as a build dependency, so the newer Flit version was picked up during the build and, since Flit bumped the metadata-version it produces, the generated packages were no longer binary reproducible.

This PR fixes it by pinning the flit version (all build dependencies should generally be pinned to provide reproducibility).
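
To illustrate the difference between a version range and a pin, here is a minimal Python sketch that flags build requirements which are not exact `==` pins. The helper name `unpinned` and the pinned version `3.9.0` are assumptions made only for this example; they are not taken from this PR, and the regular expression is a deliberately simple check rather than a full requirement parser.

```python
import re

# Matches "name==1.2.3" style requirements: an exact pin with no wildcard.
# This is a simplified pattern for illustration, not a full PEP 508 parser.
_PIN = re.compile(r"^\s*[A-Za-z0-9._-]+\s*==\s*[0-9][A-Za-z0-9.!+]*\s*$")


def unpinned(requires):
    """Return the requirement strings that are not exact '==' pins."""
    return [req for req in requires if not _PIN.match(req)]


# The range quoted in the description lets a newly released Flit into the build.
range_requires = ["flit>=3.2,<4"]
# A hypothetical pin; the version number is illustrative only.
pinned_requires = ["flit==3.9.0"]

print(unpinned(range_requires))   # the range is reported as unpinned
print(unpinned(pinned_requires))  # nothing is reported for the exact pin
```

A check along these lines could run before a build so that a range in the build dependencies is caught before it changes the output.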


@potiuk
Member Author

potiuk commented Oct 31, 2024

This is the day of the weirdest bugs ever. Flit released 3.10.0 today, which uncovered a problem in the reproducibility of package generation:

image

@potiuk potiuk force-pushed the fix-reproducibility-of-provider-builds branch from 5de52ff to ea73ccd Compare October 31, 2024 18:50
@ashb
Member

ashb commented Oct 31, 2024

it caused the packages generated to be binary non-reproducible.

What does this mean?

@potiuk
Member Author

potiuk commented Oct 31, 2024

it caused the packages generated to be binary non-reproducible.

What does this mean?

All the packages we release in Airflow are binary reproducible, which means that whoever builds them gets exactly the same, byte-for-byte identical packages (or that's how it was supposed to be, but there was this bug). This follows a property that is highly recommended in the ASF (and in the future likely mandatory, or at least expected unless you have a good reason not to provide it): produced artifacts should be binary reproducible, as this heavily improves security, especially for the software supply chain.
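
As a rough illustration of what "binary identical" means in practice, the sketch below builds a tiny in-memory zip archive (a stand-in for a real package, not the Airflow build process) and compares SHA-256 digests. The file name, the fixed timestamp, and the metadata version values `2.1` and `2.3` are assumptions chosen for the example.

```python
import hashlib
import io
import zipfile


def build_archive(metadata_version: str) -> bytes:
    """Build a tiny in-memory zip with a fixed timestamp (a stand-in for a wheel)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        # A fixed date_time keeps the archive bytes independent of the build time.
        info = zipfile.ZipInfo(
            "demo-1.0.dist-info/METADATA", date_time=(1980, 1, 1, 0, 0, 0)
        )
        zf.writestr(
            info,
            f"Metadata-Version: {metadata_version}\nName: demo\nVersion: 1.0\n",
        )
    return buf.getvalue()


def sha256(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


first = sha256(build_archive("2.1"))
second = sha256(build_archive("2.1"))   # same inputs, same tooling
changed = sha256(build_archive("2.3"))  # only the metadata version differs

print(first == second)   # identical inputs give identical bytes
print(first == changed)  # a one-line metadata change gives a different digest
```

The same idea applies to real artifacts: two people building from the same sources with the same pinned tooling should be able to compare digests and get a match.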

We implemented reproducible builds with the help of the Sovereign Tech Fund last year. And all our builds are reproducible (well, not really: there is this bug which made them not reproducible).

We added a reproducibility check as one of the gates to pass when the PMC votes on a release, some 5 months ago. This is actually how I found today that our provider package builds are not reproducible.

The new Apache Trusted Releases platform for releasing packages, which ASF infrastructure is working on (there will likely be a beta soon, and I hope we will be one of its first users), will have specific features designed around binary reproducibility that will make releases more secure. Binary reproducibility adds another layer of protection: for example, with build reproducibility you can vastly simplify checking that the hardware you build packages on, or the tooling you use to build them, has not been compromised.

Binary reproducibility is also a prerequisite (from the ASF policies point of view) for automating uploads of our artifacts to PyPI directly via GitHub Actions using Trusted Publishing (this is tracked in #41937 and hopefully we will have it soon). That is yet another level of distribution security that we are working on together with the ASF and PSF for Airflow and its software supply chain (Trusted Publishing is part of the "Airflow Beach Cleaning" I work on).

You can read more about build reproducibility and why it matters here:

https://reproducible-builds.org/

@potiuk potiuk merged commit 50dcb27 into apache:main Oct 31, 2024
82 checks passed
@potiuk potiuk deleted the fix-reproducibility-of-provider-builds branch October 31, 2024 20:42
ellisms pushed a commit to ellisms/airflow that referenced this pull request Nov 13, 2024