separateDebugInfo causes impurity (ca-derivations) #151475

Open
Mindavi opened this issue Dec 20, 2021 · 9 comments

Comments

@Mindavi
Contributor

Mindavi commented Dec 20, 2021

Describe the bug

Note that this bug is specific to content-addressed derivations. For input-addressed derivations it is a given that introducing an environment variable, even one that does nothing, produces a different output path.

Due to separateDebugInfo inserting a build-id based on the (input-addressed) paths that are used, enabling that option makes a package (that's built using the nixUnstable ca-derivations feature) non-reproducible based on the environment.

E.g. if an environment variable (that isn't used in the build) is set, this causes the build-id to change and thus also the content-addressed package.

One way to resolve this is to stop deriving the build-id from the contents (plus the path, presumably) and instead set it explicitly to a fixed value, perhaps a sha256 of only the contents. I'm not 100% sure how that should work, but it does not seem too hard to implement.
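As a rough illustration of the "hash of only the contents" idea, here is a minimal sketch. The function names are hypothetical and this is not what nixpkgs does. It assumes GNU ld, which accepts an explicit ID in the form --build-id=0x<hex>, and it hashes the link inputs rather than the final binary, which is a simplification.

```shell
# Hypothetical helper: derive a build ID from file contents only.
# File names and directory paths are deliberately kept out of the hash.
# Note: the order of the arguments still influences the result.
content_build_id() (
    for f in "$@"; do
        sha256sum < "$f"    # reading from stdin keeps the path out of the output
    done | sha256sum | cut -c1-40   # 40 hex digits = 20 bytes, the size of a SHA-1 build ID
)

# Print a compiler-driver flag carrying the explicit ID
# (assumption: the linker is GNU ld, which accepts --build-id=0x<hex>).
build_id_flag() {
    printf '%s%s\n' '-Wl,--build-id=0x' "$(content_build_id "$@")"
}
```

With this approach, two builds whose inputs have identical contents get the same ID even if they live under different paths or run with different unrelated environment variables.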

This is related to NixOS/nix#5220 as well. Hydra is probably doing something it shouldn't, but regardless this should be fixed too.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Enable the ca-derivations experimental feature.
  2. Download this zip with a reproducer: content-address-unreproducible-debug-info.zip
  3. nix-shell -p diffoscope --run "diffoscope $(nix-build 1.nix) $(nix-build 2.nix)"
  4. Observe different Build ID (and store paths).

Expected behavior

The build is reproducible and produces the same hash when unrelated environment variables are introduced, even with separateDebugInfo on (in content-addressed mode).

Additional context

Resolving this will improve reproducibility of important derivations, since the ones with separateDebugInfo enabled are typically very low-level core libraries.

Since the Build ID is dependent on the input-addressed path, any change in stdenv / dependencies may cause the build to give a different result, even though the only difference is the Build ID. I think this is undesirable.

Notify maintainers

@regnat -> so you're aware of this issue before rolling out ca-derivations in nixpkgs. Thanks for the great work you've done and are doing!

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 5.15.7, NixOS, 22.05 (Quokka)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.5.0pre20211206_d1aaa7e`
 - channels(root): `"nixos-22.05pre335103.6daa4a5c045, nixpkgs-22.05pre335103.6daa4a5c045"`
 - channels(rick): `""`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
@Artturin
Member

The build ID is for associating debug files with their active binaries. They are deterministically generated from the binaries themselves. So while it may look like a "root cause", I think you've got the causality reversed -- this is a symptom, and not a cause. As long as any variation exists in the binaries, the build ID will differ. Once the binaries are concretely reproducible, the build ID will become reproducible too.

https://www.mail-archive.com/[email protected]/msg00734.html

@Mindavi
Contributor Author

Mindavi commented Dec 21, 2021

Also this is where it's actually added: https://github.com/NixOS/nixpkgs/blob/master/pkgs/build-support/bintools-wrapper/ld-wrapper.sh#L233-L237.

I do think that the rewriting of store paths has to do with this, since they're 'input-addressed' before rewriting, and the linking (thus insertion of build-id) happens before rewriting store paths. Thus the linker takes the input-addressed store path that's embedded (and possibly the folder location too?) and uses that to generate the build-id.

Nix then goes on to rewrite the path to be content-addressed, but since the input-addressed path was used as input for the build-id, the 'build path' leaks into the build-id.
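The ordering problem described above can be imitated with a toy model. This is only a sketch: the "binary" is a text file, the store paths are made up, and sha1sum stands in for the linker's build-id computation. Both results are rewritten to the same made-up final path purely to isolate what remains different.

```shell
# Toy model, not real linking: the "binary" is a text file that embeds its output path.
fake_link() (
    out_path=$1
    file=$2
    printf 'code\nrpath=%s/lib\n' "$out_path" > "$file"
    # The build ID is computed while the input-addressed path is still embedded.
    id=$(sha1sum < "$file" | cut -c1-40)
    printf 'build-id=%s\n' "$id" >> "$file"
)

# Stand-in for Nix rewriting the temporary path to the final content-addressed path.
rewrite_path() (
    sed "s|$1|$2|g" "$3" > "$3.tmp" && mv "$3.tmp" "$3"
)

tmp=$(mktemp -d)
fake_link /nix/store/aaaa-demo "$tmp/default"
fake_link /nix/store/bbbb-demo "$tmp/tweaked"
rewrite_path /nix/store/aaaa-demo /nix/store/cccc-demo "$tmp/default"
rewrite_path /nix/store/bbbb-demo /nix/store/cccc-demo "$tmp/tweaked"

# After rewriting, the embedded paths agree; only the build-id line still differs.
diff "$tmp/default" "$tmp/tweaked" || true
```

In this model the path is gone from the file after rewriting, but its fingerprint survives in the build-id line, which matches the leak described above.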

@Artturin
Copy link
Member

@Mindavi
Contributor Author

Mindavi commented Dec 21, 2021

Hmm, I think it is not the build path but the output path that leaks into the build id (since it's in the rpath of the .so, at least). The output path is rewritten after the build id is generated. I wonder if setting the -ffile-prefix-map option would help here. I'll try it out.

I might need to add some more examples of where this is relevant and where it isn't (or try to explain it better).

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 19, 2022
@Mindavi
Contributor Author

Mindavi commented Jun 30, 2022

I haven't looked into this much since, but fixing it would still make ca-derivations a lot more useful. Hopefully we can find something.

I remember trying some things (based on suggestions here) and it not leading anywhere. But I didn't document any of it.

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 30, 2022
@trofi
Contributor

trofi commented Sep 13, 2022

My understanding of the issue: minor changes to a .nix expression cause the Build ID to change. An example package is glibc (a common dependency of almost everything on Linux). The Build ID is calculated from the output binary (before the binary is stripped or rewritten to its content-addressed path). When debug info is enabled (say, via separateDebugInfo), the binary that gets hashed frequently includes $out.

Example reproducer:

$ nix build --out-link ca-default --impure --expr 'with import ./. { config = { contentAddressedByDefault = true; }; }; glibc.out'
$ nix build --out-link ca-tweaked --impure --expr 'with import ./. { config = { contentAddressedByDefault = true; }; }; (glibc.overrideAttrs (oa: { UNUSED = "42";  })).out'
$ diffoscope ca-default/lib/librt.so.1 ca-tweaked/lib/librt.so.1
--- ca-default/lib/librt.so.1
+++ ca-tweaked/lib/librt.so.1
│┄ File has been modified after NT_GNU_BUILD_ID has been applied.
├── readelf --wide --notes {}
│ @@ -1,12 +1,12 @@
│
│  Displaying notes found in: .note.gnu.property
│    Owner                Data size     Description
│    GNU                  0x00000010    NT_GNU_PROPERTY_TYPE_0        Properties: x86 ISA needed: x86-64-baseline
│
│  Displaying notes found in: .note.gnu.build-id
│    Owner                Data size     Description
│ -  GNU                  0x00000014    NT_GNU_BUILD_ID (unique build ID bitstring)         Build ID: 99f5469d20ff97ca606870565f7804f786725ed2
│ +  GNU                  0x00000014    NT_GNU_BUILD_ID (unique build ID bitstring)         Build ID: f00a9c4a6154d73e2c955d821a8c573a67156fbf
│
│  Displaying notes found in: .note.ABI-tag
│    Owner                Data size     Description
│    GNU                  0x00000010    NT_GNU_ABI_TAG (ABI version tag)            OS: Linux, ABI: 2.6.32

If we compare debug info:

$ nix build --out-link ca-default --impure --expr 'with import ./. { config = { contentAddressedByDefault = true; }; }; glibc.debug'
$ nix build --out-link ca-tweaked --impure --expr 'with import ./. { config = { contentAddressedByDefault = true; }; }; (glibc.overrideAttrs (oa: { UNUSED = "42";  })).debug'

$ diffoscope ca-default-debug/lib/debug/librt.so.1 ca-tweaked-debug/lib/debug/librt.so.1
--- ca-default-debug/lib/debug/librt.so.1
+++ ca-tweaked-debug/lib/debug/librt.so.1
│┄ symlink
@@ -1 +1 @@
-destination: .build-id/99/f5469d20ff97ca606870565f7804f786725ed2.debug
+destination: .build-id/f0/0a9c4a6154d73e2c955d821a8c573a67156fbf.debug
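The symlink targets above follow the usual build-id layout: the first two hex digits become a directory and the rest becomes the file name. A small sketch of that mapping (the function name is hypothetical):

```shell
# Map a hex build ID to its relative debug-file path, as seen in the symlinks above.
build_id_to_debug_path() {
    id=$1
    rest=${id#??}           # everything after the first two hex digits
    prefix=${id%"$rest"}    # the first two hex digits
    printf '.build-id/%s/%s.debug\n' "$prefix" "$rest"
}
```

This is why a changed Build ID moves the debug file to a different path, so the two debug outputs cannot match even when the debug contents are nearly identical.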

$ diffoscope ca-default-debug/lib/debug/.build-id/99/f5469d20ff97ca606870565f7804f786725ed2.debug ca-tweaked-debug/lib/debug/.build-id/f0/0a9c4a6154d73e2c955d821a8c573a67156fbf.debug
--- ca-default-debug/lib/debug/.build-id/99/f5469d20ff97ca606870565f7804f786725ed2.debug
+++ ca-tweaked-debug/lib/debug/.build-id/f0/0a9c4a6154d73e2c955d821a8c573a67156fbf.debug
...
│ -  [   f5d]  GNU C11 8.3.0 -mtune=generic -march=x86-64 -g -ggdb -O2 -std=gnu11 -fno-strict-overflow -fgnu89-inline -fmerge-all-constants -frounding-math -fstack-protector-strong -fno-common -fmath-errno -fPIC -fcf-protection=full -fexceptions -fasynchronous-unwind-tables -frandom-seed=0glmzz47d1
│ +  [   ccd]  GNU C11 8.3.0 -mtune=generic -march=x86-64 -g -ggdb -O2 -std=gnu11 -fno-strict-overflow -fgnu89-inline -fmerge-all-constants -frounding-math -fstack-protector-strong -fno-common -fmath-errno -fPIC -fcf-protection=full -fexceptions -fasynchronous-unwind-tables -frandom-seed=fygjilgk60

Here we see that gcc is given a different random seed in each build, which causes a different symbol order in the binary.

I think this unexpected -frandom-seed comes from pkgs/build-support/setup-hooks/reproducible-builds.sh:

# Use the last part of the out path as hash input for the build.
# This should ensure that it is deterministic across rebuilds of the same
# derivation and not easily collide with other builds.
# We also truncate the hash so that it cannot cause reference cycles.
NIX_CFLAGS_COMPILE="${NIX_CFLAGS_COMPILE:-} -frandom-seed=$(
    outbase="${out##*/}"
    randomseed="${outbase:0:10}"
    echo $randomseed
)"
export NIX_CFLAGS_COMPILE

I would say it would be safer to set -frandom-seed= to a constant such as ${pname}-${version} (if those exist), or even to set it universally to a fixed string like -frandom-seed=not-that-random.
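To make the difference concrete, here is a sketch of both seed choices. seed_from_out mirrors the hook's logic in a POSIX-friendly form. seed_from_name is a hypothetical alternative, not existing nixpkgs code. In the usage line below, only the first ten characters of the store path (0glmzz47d1) come from the diff above; the rest of the path and the version are made up.

```shell
# Mirrors the hook: first 10 characters of the last path component of $out.
seed_from_out() {
    outbase=${1##*/}
    printf '%.10s\n' "$outbase"
}

# Hypothetical alternative: a seed that does not depend on the output path.
# Arguments: pname, version (both optional).
seed_from_name() {
    if [ -n "${1:-}" ] && [ -n "${2:-}" ]; then
        printf '%s-%s\n' "$1" "$2"
    else
        printf 'not-that-random\n'
    fi
}
```

For example, seed_from_out /nix/store/0glmzz47d1xxxxxxxxxxxxxxxxxxxxxx-glibc-2.35 yields the seed seen in the first debug file, and that seed changes whenever $out changes, while seed_from_name glibc 2.35 stays the same across such changes.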

@trofi
Contributor

trofi commented Sep 13, 2022

The following hack makes it slightly better (outputs are now almost identical), but Build ID still is not stable (presumably because it gets calculated before $out is overwritten with CA store path).

The hack:

--- a/pkgs/build-support/setup-hooks/reproducible-builds.sh
+++ b/pkgs/build-support/setup-hooks/reproducible-builds.sh
@@ -5,6 +5,7 @@
 NIX_CFLAGS_COMPILE="${NIX_CFLAGS_COMPILE:-} -frandom-seed=$(
     outbase="${out##*/}"
     randomseed="${outbase:0:10}"
+    randomseed=not-random-at-all
     echo $randomseed
 )"
 export NIX_CFLAGS_COMPILE
