meta issue for reproducible builds (was: tests/determinism.sh) #14593

marc-hb · 2019-03-15T23:34:08Z

These tests are in a "work4me" state. They all proved valuable because they all found issues and all passed a fair number of times in some configuration(s)/environment(s). However, to either pass or find issues, they may require:

magic test parameters
fixes not merged or not submitted yet
additional tools like disorderedfs
additional test steps like comparing checksums across different environments

I'm sharing this very early draft for two purposes:

Provide more detailed reproduction information and transparency for related fixes. There's only so much test code that can be put in a commit message.
Gather feedback and ideas about what could be an acceptable, high-level test design so CI can hopefully catch the most basic regressions some day. Please don't spend time reviewing any implementation detail because I doubt the final test design (if any) will look similar to this prototype.

Here is a list of fixes that some past, present or future version of (some of) these tests has helped with:

diffoscope MR 29 Catch failures to disassemble and rescue all other differences
diffoscope issue 64 Remove elf.StaticLibFile: it's a serious design flaw

Just a prototype to get feedback. Already identifying many issues and testing their fixes. Signed-off-by: Marc Herbert <[email protected]>

zephyrbot · 2019-03-15T23:40:23Z

Found the following issues, please fix and resubmit:

Codeowners issues

New files added that are not covered in CODEOWNERS:

tests/determinism.sh

Please add one or more entries in the CODEOWNERS file to cover those files

checkpatch issues

-:244: ERROR:TRAILING_WHITESPACE: trailing whitespace
#244: FILE: tests/determinism.sh:238:
+    ninja -C "$bld"  # obj_list # kobj_types_h_target $

- total: 1 errors, 0 warnings, 264 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

NOTE: Whitespace errors detected.
      You may wish to use scripts/cleanpatch or scripts/cleanfile

Your patch has style problems, please review.

NOTE: Ignored message types: AVOID_EXTERNS BRACES CONFIG_EXPERIMENTAL CONST_STRUCT DATE_TIME FILE_PATH_CHANGES MINMAX NETWORKING_BLOCK_COMMENT_STYLE PRINTK_WITHOUT_KERN_LEVEL SPLIT_STRING VOLATILE

NOTE: If any of the errors are false positives, please report
      them to the maintainers.

nashif · 2019-03-16T02:05:52Z

"determinism" might be a bit of a confusing name; how about reproducible_builds or something closer to what this is about?

carlescufi

Just a general comment, without getting into the particulars of this change (which by the way looks pretty interesting as a concept to me). Can we write this in Python? This would be good for 2 reasons:

Most people contributing to Zephyr are more or less comfortable with Python at this point. Less so with bash.
It would work on Windows, which is a supported platform

marc-hb · 2019-03-18T17:00:47Z

Thanks @carlescufi, yes, Python shouldn't be a problem. Just a bit more verbose, that's all :-)

But even before looking at the programming language I would like ideas and directions about where this could/should fit in the grand testing architecture scheme of things. Like: still sitting on top of sanitycheck like this? Or as additional --feature(s) embedded inside sanitycheck itself instead? Other? If not embedded in sanitycheck then how could CI run the most basic of these tests and catch the most basic regressions? Additional github check?

nashif · 2019-03-28T18:58:36Z

> But even before looking at the programming language I would like ideas and directions about where this could/should fit in the grand testing architecture scheme of things. Like: still sitting on top of sanitycheck like this? Or as additional --feature(s) embedded inside sanitycheck itself instead? Other? If not embedded in sanitycheck then how could CI run the most basic of these tests and catch the most basic regressions? Additional github check?

this will run on its own in a nightly job, not per PR and not with check_compliance.

marc-hb · 2019-03-28T19:48:12Z

Thanks @nashif, this answers my main question.

Here's how I see the next steps:

Share here my most recent updates and fixes to this shell script
Rewrite it in tests/build_reproduction.py and submit a new PR
Archive this shell script PR.

nashif · 2019-09-20T02:43:15Z

are you planning for this to be merged?

marc-hb · 2019-09-20T03:14:20Z

No, I will push another one. This script has pretty much been entirely rewritten. I will just keep using this description for bookmarks.

EDIT 3 years later: I never shared the newer version. It's been bitrotting on a private branch somewhere. The high-level logic was still the same:

build once
change as many things as possible
build again
diff
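The four steps above can be sketched in Python, the language suggested earlier in this thread. This is only an illustration, not the unpublished script: the names `tree_digests`, `diff_trees` and `check_reproducible` are hypothetical, the `build` and `perturb` callables stand in for whatever actually runs the build and changes the environment, and SHA-256 is an arbitrary choice for comparing files.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(first: Path, second: Path) -> List[str]:
    """Return the sorted relative paths that are missing on one side or differ."""
    a, b = tree_digests(first), tree_digests(second)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))


def check_reproducible(build: Callable[[Path], None],
                       perturb: Callable[[], None]) -> List[str]:
    """Build once, change things, build again, diff.

    `build` (hypothetical) writes its outputs into the directory it is given;
    `perturb` (hypothetical) changes whatever should not affect the output.
    Using two different output directories is itself one such change.
    """
    with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
        build(Path(d1))
        perturb()
        build(Path(d2))
        return diff_trees(Path(d1), Path(d2))
```

An empty list means the two builds produced identical files; any path in the result is a candidate for closer inspection with a tool like diffoscope.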

Temporary bugs, corner cases and obsolete toolchains aside, the Zephyr build is most of the time reproducible: zephyrproject-rtos#50205 and zephyrproject-rtos#14593. This means two different build machines using the same toolchain will always produce the same binary output.

The one-line addition in this commit makes it trivial to verify that binary outputs are indeed the same by adding a single checksum line in the build logs:

```
[16/16] Linking C executable zephyr/zephyr.elf
Memory region         Used Size  Region Size  %age Used
             RAM:       53280 B         3 MB      1.69%
        IDT_LIST:          0 GB         2 KB      0.00%
fdd2ddf2ad7d5da5bbd79b41cef...7b16ef549a8281111d8e205  zephyr.strip
```

This commit makes a non-measurable build time difference.

Build reproducibility matters for (at least) two important reasons:

- Security / supply chain attacks, see https://www.cisa.gov/sbom, zephyrproject-rtos#50205, https://reproducible-builds.org/ and many others.
- Making sure build configurations are strictly identical when trying to reproduce elusive issues or when issuing releases.

Displaying a reproducible checksum accelerates the investigation of temporary reproducibility issues like zephyrproject-rtos#48195.

Signed-off-by: Marc Herbert <[email protected]>
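The checksum line in the log excerpt above has the familiar "hex digest, two spaces, file name" shape. A minimal Python sketch that produces such a line for any file follows. It assumes SHA-256, because the commit message does not name the hash algorithm, and `checksum_line` is a hypothetical helper name. Stripping is out of scope here: the file is hashed as-is.

```python
import hashlib
from pathlib import Path


def checksum_line(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return '<sha256 hex digest>  <file name>' for the given file.

    SHA-256 is an assumption; the file is read in chunks so that large
    build outputs do not have to fit in memory.
    """
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {path.name}"
```

Two machines that print the same line for the same output file can be compared from their build logs alone, without archiving the file itself.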

Temporary bugs, corner cases and obsolete toolchains aside, the Zephyr build is reproducible most of the time: zephyrproject-rtos#50205 and zephyrproject-rtos#14593. This means two different build machines using the same toolchain will always produce the same binary output.

The previous, one-line commit made it trivial to verify that binary outputs are indeed the same by adding this single line in the build logs:

```
[16/16] Linking C executable zephyr/zephyr.elf
Memory region         Used Size  Region Size  %age Used
             RAM:       53280 B         3 MB      1.69%
        IDT_LIST:          0 GB         2 KB      0.00%
fdd2ddf2ad7d5da5bbd79b41cef8d7...1a896b989a8281111d8e205  zephyr.strip
```

This commit enables that feature by default because build reproducibility matters for (at least) two important reasons:

- Security / supply chain attacks, see https://www.cisa.gov/sbom, zephyrproject-rtos#50205, https://reproducible-builds.org/ and many others.
- Making sure build configurations are strictly identical when trying to reproduce elusive issues or when issuing releases.

It was of course already possible to _manually_ make this Kconfig change and manually compute this checksum. However, this can be impossible when dealing with an automated build system that does not archive all _intermediate_ (zephyrproject-rtos#5009) files like `zephyr.elf`. Tweaking the build configuration can also be difficult and error-prone for people who are not Zephyr developers. Most automated CI systems preserve build logs by default. Displaying the reproducible checksum by default accelerates the discovery of reproducibility bugs like zephyrproject-rtos#48195.

When measured with `west build -p -b qemu_x86 samples/hello_world/`, the additional `build/zephyr/zephyr.strip` disk space required is 43 kilobytes compared to a total of 11 Megabytes. Measuring a more realistic SOF example, `zephyr.strip` weighed 690 kb which was about 0.1% of a total `build/` directory weighing 65M.

To measure the build time cost I ran `west build -p -b qemu_x86 samples/hello_world/` many times in a loop with and without this PR on my Linux workstation. Stripping and checksumming made literally no time difference compared to the "noise" observed when building the same configuration. This is not surprising considering how small `zephyr.strip` is: the extra cost is most likely dominated by process creation, and the total number of processes created during a Zephyr build dwarfs the few extra processes required by this feature. More surprisingly, I measured incremental builds by running `touch kernel/timer.c; west build ...` in a loop and I could not observe any visible time difference either.

Signed-off-by: Marc Herbert <[email protected]>
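The "run it many times in a loop and compare against the noise" measurement described above could be sketched as follows. This is not the procedure actually used for the numbers in the commit message; `time_command` and `summarize` are hypothetical names, and the standard deviation is just one simple way to quantify the run-to-run noise.

```python
import statistics
import subprocess
import time
from typing import List, Sequence


def time_command(cmd: Sequence[str], runs: int = 5) -> List[float]:
    """Run cmd `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True aborts the measurement if a run fails.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: Sequence[float]) -> str:
    """Report mean and spread so a difference can be compared to the noise."""
    mean = statistics.mean(durations)
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return f"mean {mean:.3f}s, stdev {spread:.3f}s over {len(durations)} runs"
```

For example, `summarize(time_command(["west", "build", "-p", "-b", "qemu_x86", "samples/hello_world/"], runs=20))` could be run once on each branch; a difference between the two means that is smaller than either standard deviation is indistinguishable from noise.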

tests/determinism.sh: very early prototype

19ce610

Just a prototype to get feedback. Already identifying many issues and testing their fixes. Signed-off-by: Marc Herbert <[email protected]>

marc-hb requested review from SebastianBoe, andrewboie, carlescufi, galak, nashif and andyross March 16, 2019 01:00

marc-hb requested a review from ulfalizer March 16, 2019 16:29

carlescufi reviewed Mar 18, 2019

marc-hb requested a review from cinlyooi-intel March 21, 2019 06:21

marc-hb added the Feature Request, area: Build System, TSC and Needs review labels Mar 21, 2019

marc-hb requested a review from mbolivar March 22, 2019 04:59

nashif removed the TSC label Mar 27, 2019

marc-hb mentioned this pull request Mar 28, 2019

West documentation for v1.14 #14983

Merged

nashif mentioned this pull request Mar 28, 2019

script: test framework to validate the reproducible builds #11523

Closed

marc-hb removed the Needs review label Apr 1, 2019

nashif removed the Feature Request label Apr 17, 2019

This was referenced Apr 30, 2019

cmake: clang: Support host's clang for non-MCU x86 targets on Linux #14077

Merged

Provide build number in include/generated/version.h #1333

Closed

marc-hb mentioned this pull request Jun 7, 2019

Path handling needs overhaul zephyrproject-rtos/west#273

Closed

marc-hb mentioned this pull request Jun 29, 2019

tests: new sanitycheck tag "emu_time"; increase some timeouts #17107

Closed

marc-hb mentioned this pull request Jul 16, 2019

Compile binutils with --enable-deterministic-archives zephyrproject-rtos/sdk-ng#81

Closed

nashif added the Stale PR label Sep 20, 2019

marc-hb closed this Sep 20, 2019

marc-hb changed the title ~~tests/determinism.sh: very early prototype~~ meta issue for reproducible builds (was: tests/determinism.sh) Oct 2, 2019

marc-hb mentioned this pull request Oct 2, 2019

Add support for non-recursive single-toolchain multi-image builds #13672

Closed

25 tasks

keith-zephyr mentioned this pull request Sep 13, 2022

Verify builds are reproducible in the CI #50205

Open

marc-hb mentioned this pull request Sep 21, 2022

Generated linker scripts break when ZEPHYR_BASE and ZEPHYR_MODULES share structure that contains symlinks #50284

Closed

marc-hb mentioned this pull request Nov 4, 2022

cmake: compute and display the reproducible checksum by default #51954

Closed

marc-hb mentioned this pull request Mar 20, 2024

list_hardware.py: sort rglob(SOC_YML) HWMv2 results #70132

Merged

marc-hb mentioned this pull request Jun 27, 2024

cmake: fix relative path calculate error #74710

Closed

marc-hb mentioned this pull request Dec 19, 2024

scripts/xtensa-build-zephyr.py: Allow for alternate toolchain versions thesofproject/sof#9736

Open
