-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmake: compute and display the reproducible checksum by default #51954
Conversation
Temporary bugs, corner cases and obsolete toolchains aside, the Zephyr build is most of the time reproducible: zephyrproject-rtos#50205 and zephyrproject-rtos#14593. This means two different build machines using the same toolchain will always produce the same binary output. The one-line addition in this commit makes it trivial to verify that binary outputs are indeed the same by adding a single checksum line in the build logs: ``` [16/16] Linking C executable zephyr/zephyr.elf Memory region Used Size Region Size %age Used RAM: 53280 B 3 MB 1.69% IDT_LIST: 0 GB 2 KB 0.00% fdd2ddf2ad7d5da5bbd79b41cef...7b16ef549a8281111d8e205 zephyr.strip ``` This commit makes a non-measurable build time difference. Build reproducibility matters for (at least) two important reasons: - Security / supply chain attacks, see https://www.cisa.gov/sbom, zephyrproject-rtos#50205, https://reproducible-builds.org/ and many others. - Making sure build configurations are strictly identical when trying to reproduce elusive issues or when issuing releases. Displaying a reproducible checksum accelerates the investigation of temporary reproducibility issues like zephyrproject-rtos#48195. Signed-off-by: Marc Herbert <[email protected]>
Temporary bugs, corner cases and obsolete toolchains aside, the Zephyr build is reproducible most of the time: zephyrproject-rtos#50205 and zephyrproject-rtos#14593 This means two different build machines using the same toolchain will always produce the same binary output. The previous, one-line commit made it trivial to verify that binary outputs are indeed the same by adding this single line in the buid logs: ``` [16/16] Linking C executable zephyr/zephyr.elf Memory region Used Size Region Size %age Used RAM: 53280 B 3 MB 1.69% IDT_LIST: 0 GB 2 KB 0.00% fdd2ddf2ad7d5da5bbd79b41cef8d7...1a896b989a8281111d8e205 zephyr.strip ``` This commit enables that feature by default because build reproducibility matters for (at least) two important reasons: - Security / supply chain attacks, see https://www.cisa.gov/sbom, zephyrproject-rtos#50205, https://reproducible-builds.org/ and many others. - Making sure build configurations are strictly identical when trying to reproduce elusive issues or when issuing releases. It was of course already possible to _manually_ make this Kconfig change and manually compute this checksum. However this can be impossible when dealing with an automated build system that does not archive all _intermediate_ (zephyrproject-rtos#5009) files like `zephyr.elf`. Tweaking the build configuration can also be difficult and error-prone for people who are not Zephyr developers. Most automated CI systems preserve build logs by default. Displaying the reproducible checksum by default accelerates the discovery of reproducibility bugs like zephyrproject-rtos#48195. When measured with `west build -p -b qemu_x86 samples/hello_world/`, the additional `build/zephyr/zephyr.strip` disk space required is 43 kilobytes compared to a total of 11 Megabytes. Measuring a more realistic SOF example, `zephyr.strip` weighed 690 kb which was about 0.1% of a total `build/` directory weighing 65M. To measure the build time cost I ran `west build -p -b qemu_x86 samples/hello_world/` many times in a loop with and without this PR on my Linux workstation. Stripping and checksumming made literally no time difference compared to the "noise" observed when building the same configuration. This is not surprising considering how small `zephyr.strip`: so the extra cost is most likely dominated by process creation and the total number of processes created during a Zephyr build dwarfs the few extra processes required by this feature. More surprisingly, I measured incremental builds by running `touch kernel/timer.c; west build ...` in a loop and I could not observe any visible time difference either. Signed-off-by: Marc Herbert <[email protected]>
Woohoo! 🥳 Having this output within the build logs makes it easy for for various supply chain security activities to run cross-environment verification. (related) |
@@ -575,6 +575,7 @@ endif # BUILD_OUTPUT_UF2 | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marc-hb This change is very useful! Thanks.
Also for the incremental builds test, wouldn't make more sense to touch a header file like ./include/zephyr/kernel.h
for example.
@@ -575,6 +575,7 @@ endif # BUILD_OUTPUT_UF2 | |||
|
|||
config BUILD_OUTPUT_STRIPPED | |||
bool "Build a stripped binary" | |||
default y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not close a PR that are rejected and open a new PR with same changes, just hoping to get it approve a second time.
Please keep this change in #51737 and let's continue the discussion there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explained why in #51737 why I closed it and I linked the two both ways - not hiding anything.
In #51737 I made the mistake of stopping half-way, not checksumming the stripped binary. As a consequence all the (few) reviewers missed the point and focused on the wrong thing that I wrongly put in the PR title: the stripped file. The stripped file is a pure, low-level and mostly invisible implementation detail that does not matter. I don't care about the stripped file, it's only a means to an end. Finding reproducibility bugs like #48195 and all the others listed in #14593 is the only thing that matters.
@@ -1679,6 +1679,7 @@ if(CONFIG_BUILD_OUTPUT_STRIPPED) | |||
$<TARGET_PROPERTY:bintools,strip_flag_infile>${KERNEL_ELF_NAME} | |||
$<TARGET_PROPERTY:bintools,strip_flag_outfile>${KERNEL_STRIP_NAME} | |||
$<TARGET_PROPERTY:bintools,strip_flag_final> | |||
COMMAND ${CMAKE_COMMAND} -E sha256sum ${KERNEL_STRIP_NAME} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm sure this is a very useful feature, just like printing of memory usage.
But still, printing of memory usage is user configurable, so why should sha256 checksum be always on ?
(I do note it's tied to strip)
Lines 462 to 463 in 261025a
config OUTPUT_PRINT_MEMORY_USAGE | |
bool "Print memory usage to stdout" |
Should this be a general option that users can enable, so that any output file, bin
, hex,
s19
, etc can have this ?
As a matter of fact, bin
files are created per default, and a checksum on those would be more safe than using the elf file for checksumming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. "Strip" should not be synonymous with "compute & print checksum" and vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be getting maybe needlessly contentious?
From my perspective this is a simple feature, a good thing, and an industry best practice that probably should be on by default for general apps. It seems like the big complaints are just about extra build complexity (one added artifact and a hash step at the end)? Maybe we could default it to =n if CONFIG_ZTEST? That would cover about 80% of the CI overhead. Or alternatively have twister/ci set things up to suppress the step?
Broadly: this is a feature for deployed apps. We have a CI so that we can exercise features for real apps, we don't architect our apps to make things easier for CI. :)
So... just to clarify for folks not familiar: Reproducible/deterministic builds[1] are a security paradigm. The idea is that the source code determines the output product in a provable way, such that people can audit the resulting binaries to detect any modification. That is, it closes the "inside actor" hole where someone with access to build infrastructure can do things like insert backdoors. It's a good thing, even if us hackers at the other end of the source chain will never touch it nor care. And for traditional software, it's actually really difficult to do. But as it happens, Zephyr does it pretty well already. I don't really see much reason not to have this on for normal apps. Users who don't understand it won't be harmed, and might just be saved if they keep their build logs. The only thing to argue here in my mind is polish: it would be nice if this had a name other than "zephyr.strip". It might be nice to define a more formal form of the output vs. just stuffing it in a build log and sha256sum file. It would be nice to have a section in the docs detailing "what you have to do to secure your reproducible build", etc... But really, we should merge this. Potentially with a workaround to turn it off in CI if it turns out to be a measurable performance problem. [1] You sometimes see the term "hermetic builds" or "hermeticity" in the same context, which refers to the practice of ensuring no "outside" software gets run in the context of a build (by analogy to a "hermetic seal" in food packaging). |
+1 that is exactly what I talked with Marc |
I do not see any problem with enabling this (printing checksum) by default. The part that is problematic is how it currently works -- i.e. why does enabling "Stripping" and "computing/printing checksum" are really two orthogonal features and should be controlled by separate configs. |
@@ -575,6 +575,7 @@ endif # BUILD_OUTPUT_UF2 | |||
|
|||
config BUILD_OUTPUT_STRIPPED | |||
bool "Build a stripped binary" | |||
default y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per other commit, rejecting as this is not something that should ever be enabled by default
please note, i'm not against the requested functionality, as I believe that is very useful. I dislike the current proposal. We should not just print info into log files just because someone needs this information and knows where to look for it. Also we should not link this to stripped elf files. We already create bin files per default: Lines 491 to 493 in b052143
It would be more correct to do a sha on the bin file, compared to doing it on the stripped elf, although I believe a different approach should be taken. Instead of producing stripped elf files per default and print sha as build information which must be parsed in build log, then we should really try to align how we collect such important information and provide it to users. In my opinion, a much better approach is to extend the build meta output to not only contain revision information, but extend it to contain information from the produced artifacts. Lines 629 to 644 in b052143
Extending this meta file with checksum of generated files will truly provide users with information on how to reproduce a build. That would be checksum of artifacts like:
are all important information which can go into the meta file, along the current revisions. that will give a uniform way of obtaining information for reproducing a build and the ability to verify that the build was indeed identical to the original. The knowledge of this meta file is even in the docs, although we can definitely be better at exposing this feature. |
Thanks everyone for the reviews. I've been drawn into some more urgent task, should get back to this next week at the latest. |
The discussion in zephyrproject-rtos/zephyr#51954 may take time and we need reproducibility last year (who doesn't?) Signed-off-by: Marc Herbert <[email protected]>
The discussion in zephyrproject-rtos/zephyr#51954 may take time and we need reproducibility last year (who doesn't?) Signed-off-by: Marc Herbert <[email protected]>
The discussion in zephyrproject-rtos/zephyr#51954 may take time and we need reproducibility last year (who doesn't?) Signed-off-by: Marc Herbert <[email protected]> (cherry picked from commit 945adb8)
The discussion in zephyrproject-rtos/zephyr#51954 may take time and we need reproducibility last year (who doesn't?) Signed-off-by: Marc Herbert <[email protected]> (cherry picked from commit 945adb8)
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
Make CONFIG_BUILD_OUTPUT_STRIPPED mandatory so we can always compare builds. It makes practically zero build space and time difference and has huge reproductibility value, see discussion in zephyrproject-rtos/zephyr#51954 Signed-off-by: Marc Herbert <[email protected]>
Make CONFIG_BUILD_OUTPUT_STRIPPED mandatory so we can always compare builds. It makes practically zero build space and time difference and has huge reproductibility value, see discussion in zephyrproject-rtos/zephyr#51954 Signed-off-by: Marc Herbert <[email protected]>
In theory, reproducing someone else's environment is easy. In practice it's often a nightmare. Extensive real-world experience shows that build logs is the only thing you can really rely on: Unless maybe you work in some ideal place where the build is ruled with an iron fist because the CEO is a former automation engineer? :-D
In theory, they're conceptually different. In practice they're the same. Thanks to stripping (and other things), Linux and Windows builds have been successfully compared in SOF(Zephyr) CI for 2 years now: thesofproject/sof@35dda2a That automation has successfully found a few build reproducibility issues in Zephyr, see 2023 and 2024 issues at the bottom of this long list: #14593 CONFIG_BUILD_OUTPUT_STRIPPED has been a fantastic success. It just works. |
Two commits, second commit message copied below.
tl;dr
zephyr.strip
is only a means to an end. Alternative ways to achieve the same purpose are welcome... as long as they don't require hundreds times more lines of code. This PR adds 2 lines.If 0.5% additional
build/
disk space is considered too much then one alternative is to deletezephyr.strip
immediately after checksumming it. However the easiest way to automate #50205 is to comparezephyr.strip
files.An earlier and incomplete version of this was previously (and negatively) reviewed by a few people in
Temporary bugs, corner cases and obsolete toolchains aside, the Zephyr
build is reproducible most of the time: #50205 and #14593
This means two different build machines using the same toolchain will
always produce the same binary output. These 2 lines make
it trivial to verify that binary outputs are indeed the same by adding this
single line in the build logs:
The second commit enables that feature by default because build
reproducibility matters for (at least) two important reasons:
https://reproducible-builds.org/ and many others.
reproduce elusive issues or when issuing releases.
It was of course already possible to manually make this Kconfig change
and manually compute this checksum. However this can be impossible when
dealing with an automated build system that does not archive all
intermediate (#5009) files like
zephyr.elf
. Tweaking the buildconfiguration can also be difficult and error-prone for people who are
not Zephyr developers.
Most automated CI systems preserve build logs by default.
Displaying the reproducible checksum by default accelerates the
discovery of reproducibility bugs like #48195.
When measured with
west build -p -b qemu_x86 samples/hello_world/
, theadditional
build/zephyr/zephyr.strip
disk space required is 43kilobytes compared to a total of 11 Megabytes. Measuring a more
realistic SOF example,
zephyr.strip
weighed 690 kb which was about0.1% of a total
build/
directory weighing 65M.To measure the build time cost I ran "west build -p -b qemu_x86
samples/hello_world/" many times in a loop with and without this PR on
my Linux workstation. Stripping and checksumming made literally no time
difference compared to the "noise" observed when building the same
configuration. This is not surprising considering how small
zephyr.strip
is: so the extra cost is most likely dominated by processcreation and the total number of processes created during a Zephyr build
dwarfs the few extra processes required by this feature.
More surprisingly, I measured incremental builds by running
touch kernel/timer.c; west build ...
in a loop and I could notobserve any visible time difference either.