-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
meta issue for reproducible builds (was: tests/determinism.sh) #14593
Conversation
Just a prototype to get feedback. Already identifying many issues and testing their fixes. Signed-off-by: Marc Herbert <[email protected]>
Found the following issues, please fix and resubmit: Codeowners issues
checkpatch issues
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a general comment, without getting into the particulars of this change (which by the way looks pretty interesting as a concept to me). Can we write this in Python? This would be good for 2 reasons:
- Most people contributing to Zephyr are more or less comfortable with Python at this point. Less so with bash.
- It would work on Windows, which is a supported platform
Thanks @carlescufi , yes python shouldn't be a problem. Just a bit more verbose that's all :-) But even before looking at the programming language I would like ideas and directions about where this could/should fit in the grand testing architecture scheme of things. Like: still sitting on top of sanitycheck like this? Or as additional --feature(s) embedded inside sanitycheck itself instead? Other? If not embedded in sanitycheck then how could CI run the most basic of these tests and catch the most basic regressions? Additional github check? |
this will run on its own in a nightly job, not per PR and not with check_compliance. |
Thanks @nashif , this answers my main question. Here's how I see the next steps:
|
are you planning for this to be merged? |
No, I will push another one. This script has pretty much been entirely rewritten. I will just keep using the description for bookmarks EDIT 3 years later: I never shared the newer version. It's been bitrotting on a private branch somewhere. The high-level logic was still the same:
|
Temporary bugs, corner cases and obsolete toolchains aside, the Zephyr build is most of the time reproducible: zephyrproject-rtos#50205 and zephyrproject-rtos#14593. This means two different build machines using the same toolchain will always produce the same binary output. The one-line addition in this commit makes it trivial to verify that binary outputs are indeed the same by adding a single checksum line in the build logs: ``` [16/16] Linking C executable zephyr/zephyr.elf Memory region Used Size Region Size %age Used RAM: 53280 B 3 MB 1.69% IDT_LIST: 0 GB 2 KB 0.00% fdd2ddf2ad7d5da5bbd79b41cef...7b16ef549a8281111d8e205 zephyr.strip ``` This commit makes a non-measurable build time difference. Build reproducibility matters for (at least) two important reasons: - Security / supply chain attacks, see https://www.cisa.gov/sbom, zephyrproject-rtos#50205, https://reproducible-builds.org/ and many others. - Making sure build configurations are strictly identical when trying to reproduce elusive issues or when issuing releases. Displaying a reproducible checksum accelerates the investigation of temporary reproducibility issues like zephyrproject-rtos#48195. Signed-off-by: Marc Herbert <[email protected]>
Temporary bugs, corner cases and obsolete toolchains aside, the Zephyr build is reproducible most of the time: zephyrproject-rtos#50205 and zephyrproject-rtos#14593 This means two different build machines using the same toolchain will always produce the same binary output. The previous, one-line commit made it trivial to verify that binary outputs are indeed the same by adding this single line in the buid logs: ``` [16/16] Linking C executable zephyr/zephyr.elf Memory region Used Size Region Size %age Used RAM: 53280 B 3 MB 1.69% IDT_LIST: 0 GB 2 KB 0.00% fdd2ddf2ad7d5da5bbd79b41cef8d7...1a896b989a8281111d8e205 zephyr.strip ``` This commit enables that feature by default because build reproducibility matters for (at least) two important reasons: - Security / supply chain attacks, see https://www.cisa.gov/sbom, zephyrproject-rtos#50205, https://reproducible-builds.org/ and many others. - Making sure build configurations are strictly identical when trying to reproduce elusive issues or when issuing releases. It was of course already possible to _manually_ make this Kconfig change and manually compute this checksum. However this can be impossible when dealing with an automated build system that does not archive all _intermediate_ (zephyrproject-rtos#5009) files like `zephyr.elf`. Tweaking the build configuration can also be difficult and error-prone for people who are not Zephyr developers. Most automated CI systems preserve build logs by default. Displaying the reproducible checksum by default accelerates the discovery of reproducibility bugs like zephyrproject-rtos#48195. When measured with `west build -p -b qemu_x86 samples/hello_world/`, the additional `build/zephyr/zephyr.strip` disk space required is 43 kilobytes compared to a total of 11 Megabytes. Measuring a more realistic SOF example, `zephyr.strip` weighed 690 kb which was about 0.1% of a total `build/` directory weighing 65M. To measure the build time cost I ran `west build -p -b qemu_x86 samples/hello_world/` many times in a loop with and without this PR on my Linux workstation. Stripping and checksumming made literally no time difference compared to the "noise" observed when building the same configuration. This is not surprising considering how small `zephyr.strip`: so the extra cost is most likely dominated by process creation and the total number of processes created during a Zephyr build dwarfs the few extra processes required by this feature. More surprisingly, I measured incremental builds by running `touch kernel/timer.c; west build ...` in a loop and I could not observe any visible time difference either. Signed-off-by: Marc Herbert <[email protected]>
These tests are in a "work4me" state. They all proved valuable because they all found issues and all passed a fair number of times in some configuration(s)/environment(s). However for either passing or finding issues they may require:
I'm sharing this very early draft for two purposes:
Provide more detailed reproduction information and transparency for related fixes. There's only so much test code that can be put in a commit message.
Gather feedback and ideas about what could be an acceptable, high-level test design so CI can hopefully catch the most basic regressions some day. Please don't spend time reviewing any implementation detail because I doubt the final test design (if any) will look similar to this prototype.
Here a list of fixes that some past, present or future version of (some of) these tests have helped with:
cmake: more deterministic git describe --abbrev=12 #13608
scripts/*syscalls.py: sort os.walk() for a more deterministic build #13446
Sort a few python dicts for deterministic builds, even with python 3.5 and earlier. #14119
Sort a few python dicts for deterministic builds, even with python 3.5 and earlier. #14119
git.cmake: let the environment override BUILD_VERSION #14594
host-gcc/target.cmake: stop discarding CMAKE_<tool>_FLAGS for x86 #14595
sanitycheck: don't generate the top-level Makefile in random order #14775
extensions.cmake: need unique strings, not random ones #14776
zephyr_library_compile_options(): de-duplicate #15094
file2hex.py: switch from gzip.compress() to GzipFile() #14804
file2hex.py: new --gzip-mtime option that defaults to zero + test #15043
cmake: atomic rename to fix toolchain cache creation race #15128
sanitycheck: order results.csv and discards.csv deterministically #15313
cmake: zephyr_cc_option(-fmacro-prefix-map=${ZEPHYR_BASE}=.) #15376
gen_kobject_list.py: better comments and --help. Zero code change. #15578
qemu_x86 's SeaBIOS clears the screen every time it runs #11717
extensions.cmake: don't leak absolute paths in snippets-*.ld comment #16368
sanitycheck: CONFIG_TEST_USERSPACE / userspace tag cleanup #16466
CMakeLists.txt: -fmacro-prefix-map=${CMAKE_SOURCE_DIR}=CMAKE_SOURCE_DIR #16536
BOOT_BANNER: show KERNEL_VERSION and BUILD_VERSION differently #16917
tests: minor FCB re-ordering not to leave a random flash.bin behind #16965
cmake: strip_nondeterminism_if_found() on all .a library files #17494
strip-nondeterminism MR4 ar.pm: Don't corrupt tables of symbols and long filenames
toolchain/zephyr: invoke ar with -D for deterministic .a files #17495
scripts: dts: Output paths relative to $ZEPHYR_BASE #19444
cmake: log ${CMAKE_VERSION} #53286
scripts/build: make struct_tags.json deterministic #66009
elf_parser.py: make dependency graph output deterministic #66014
list_hardware.py: sort rglob(SOC_YML) HWMv2 results #70132
diffoscope MR 29 Catch failures to disassemble and rescue all other differences
diffoscope issue 64 Remove elf.StaticLibFile: it's a serious design flaw