Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdenv: default fixupPhase bails if setuid files got installed #300635

Closed
vcunat opened this issue Apr 1, 2024 · 6 comments
Closed

stdenv: default fixupPhase bails if setuid files got installed #300635

vcunat opened this issue Apr 1, 2024 · 6 comments
Assignees
Labels
0.kind: bug Something is broken 0.kind: regression Something that worked before working no longer 6.topic: stdenv Standard environment
Milestone

Comments

@vcunat
Copy link
Member

vcunat commented Apr 1, 2024

Example: https://hydra.nixos.org/build/254601361/nixlog/1/tail
Details:

++ chmod -R u+w /nix/store/wwxn8yz0376ailcnj18a0mbp9kf30r4z-lxc-5.0.3
chmod: changing permissions of '/nix/store/wwxn8yz0376ailcnj18a0mbp9kf30r4z-lxc-5.0.3/libexec/lxc/lxc-user-nic': Operation not permitted

The permissions

-rwsr-xr-x 1 nixbld1 nixbld 1271576 $date /nix/store/6g9fhcrc4iwcf4x4l1vsya15441n8lhi-lxc-5.0.3/libexec/lxc/lxc-user-nic

I'm baffled why this started, i.e. which commit triggered the change (can't see relevant change in setup hooks or coreutils), though it can surely get fixed even without knowing that. Introduced somewhere during staging* (PR #298548).

Note: obviously, nix will strip any set*id bits when registering the result into the nix store anyway, so the point is only to avoid the failure.

@vcunat vcunat added 0.kind: bug Something is broken 0.kind: regression Something that worked before working no longer 6.topic: stdenv Standard environment labels Apr 1, 2024
@vcunat
Copy link
Member Author

vcunat commented Apr 1, 2024

I assume it would work to do something like

--- a/pkgs/stdenv/generic/setup.sh
+++ b/pkgs/stdenv/generic/setup.sh
@@ -1424 +1424 @@ fixupPhase() {
-        if [ -e "${!output}" ]; then chmod -R u+w "${!output}"; fi
+        if [ -e "${!output}" ]; then chmod -R u+w,u-s,g-s "${!output}"; fi

@lf-
Copy link
Member

lf- commented Apr 2, 2024

oh hi, this is caused by nix having an extremely silly seccomp policy that blocks chmod with the sticky bit. the thing is though, it is blatantly possible to bypass (consider .... the open syscall, which it doesn't block at all), and has zero security purpose. we should honestly just delete this failed attempt at sandboxing.

https://github.com/NixOS/nix/blob/8b16cced18925aa612049d08d5e78eccbf0530e4/src/libstore/build/local-derivation-goal.cc#L1650-L1663

The policy is supposed to stop you from creating setuid files, which normal users are allowed to do and it will let other users become them. However, a) it doesn't work and b) these perms are supposed to get wiped after the build finishes anyway and c) doesn't present any obvious security exposure to me. (i would be ... overall suspicious of security holes in the sandbox around this stuff, but i don't think there is one here)

@vcunat
Copy link
Member Author

vcunat commented Apr 2, 2024

Let me copy from stdenv chat:

With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2, which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663. The fixupPhase then uses fchmodat, which fails.
With older kernel or glibc, setting the suid bit fails in the install phase, which is not treated as fatal, and then the fixup phase does not try to set it again.

@Ma27
Copy link
Member

Ma27 commented Apr 6, 2024

Ouch sorry for that.

So, the immediate fallout (failing builds) seems fixed. However, there's a certain danger of this coming up again in different scenarios (6.6 will be the default on 24.05), so I'd file a bug against the Nix repo (haven't seen any so far, but will double-check). I agree with @lf- that we can probably drop the entire filtering, but if that leads to too much discussion, I'll probably file a patch that adds fchmodat2 as short-term solution because I'd rather not have the core issue in 24.05.

@vcunat
Copy link
Member Author

vcunat commented Apr 10, 2024

So for now the patch above?
#303049

vcunat added a commit that referenced this issue Apr 12, 2024
See #300635.  Maybe in time we'll have a better solution.
Ma27 added a commit to Ma27/nix that referenced this issue Apr 14, 2024
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.

This change works by checking if seccomp is new enough to support
fchmodat2 (this is the case with the combination glibc 2.39,
libseccomp 2.5.5 & Linux 6.6 from nixos-unstable). Since the project's
flake uses 23.11, the code isn't compiled into the Nix built by this
flake. I tested the change by adding the diff below as patch to
`pkgs/tools/package-management/nix/common.nix` & then built a VM from
the following config using my dirty nixpkgs master:

    {
      vm = { pkgs, ... }: {
        virtualisation.writableStore = true;
        virtualisation.memorySize = 8192;
        virtualisation.diskSize = 12 * 1024;
        nix.package = pkgs.nixVersions.nix_2_21;
      };
    }

The original issue can be triggered via

    nix build -L github:nixos/nixpkgs/d6dc19adbda4fd92fe9a332327a8113eaa843894#lxc \
      --extra-experimental-features 'nix-command flakes'

however the problem disappears with this patch applied.

Closes NixOS#10424

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)
Ma27 added a commit to Ma27/nix that referenced this issue Apr 14, 2024
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.

This change works by checking if seccomp is new enough to support
fchmodat2 (this is the case with the combination glibc 2.39,
libseccomp 2.5.5 & Linux 6.6 from nixos-unstable). Since the project's
flake uses 23.11, the code isn't compiled into the Nix built by this
flake. I tested the change by adding the diff below as patch to
`pkgs/tools/package-management/nix/common.nix` & then built a VM from
the following config using my dirty nixpkgs master:

    {
      vm = { pkgs, ... }: {
        virtualisation.writableStore = true;
        virtualisation.memorySize = 8192;
        virtualisation.diskSize = 12 * 1024;
        nix.package = pkgs.nixVersions.nix_2_21;
      };
    }

The original issue can be triggered via

    nix build -L github:nixos/nixpkgs/d6dc19adbda4fd92fe9a332327a8113eaa843894#lxc \
      --extra-experimental-features 'nix-command flakes'

however the problem disappears with this patch applied.

Closes NixOS#10424

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)
Ma27 added a commit to Ma27/nix that referenced this issue Apr 15, 2024
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.

This change works by creating a syscall filter for the `fchmodat2`
syscall (number 452 on most systems). The problem is that glibc 2.39
and seccomp 2.5.5 are needed to have the correct syscall number available
via `__NR_fchmodat2` / `__SNR_fchmodat2`, but this flake is still on
nixpkgs 23.11. To have this change everywhere and not dependent on the
glibc this package is built against, I added a header
"fchmodat2-compat.hh" that sets the syscall number based on the
architecture. On most platforms its 452 according to glibc with a few
exceptions:

    $ rg --pcre2 'define __NR_fchmodat2 (?!452)'
    sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
    58:#define __NR_fchmodat2 1073742276

    sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
    67:#define __NR_fchmodat2 6452

    sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
    62:#define __NR_fchmodat2 5452

    sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
    70:#define __NR_fchmodat2 4452

    sysdeps/unix/sysv/linux/alpha/arch-syscall.h
    59:#define __NR_fchmodat2 562

I tested the change by adding the diff below as patch to
`pkgs/tools/package-management/nix/common.nix` & then built a VM from
the following config using my dirty nixpkgs master:

    {
      vm = { pkgs, ... }: {
        virtualisation.writableStore = true;
        virtualisation.memorySize = 8192;
        virtualisation.diskSize = 12 * 1024;
        nix.package = pkgs.nixVersions.nix_2_21;
      };
    }

The original issue can be triggered via

    nix build -L github:nixos/nixpkgs/d6dc19adbda4fd92fe9a332327a8113eaa843894#lxc \
      --extra-experimental-features 'nix-command flakes'

however the problem disappears with this patch applied.

Closes NixOS#10424

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)
Ma27 added a commit to Ma27/nix that referenced this issue Apr 15, 2024
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.

This change works by creating a syscall filter for the `fchmodat2`
syscall (number 452 on most systems). The problem is that glibc 2.39
and seccomp 2.5.5 are needed to have the correct syscall number available
via `__NR_fchmodat2` / `__SNR_fchmodat2`, but this flake is still on
nixpkgs 23.11. To have this change everywhere and not dependent on the
glibc this package is built against, I added a header
"fchmodat2-compat.hh" that sets the syscall number based on the
architecture. On most platforms its 452 according to glibc with a few
exceptions:

    $ rg --pcre2 'define __NR_fchmodat2 (?!452)'
    sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
    58:#define __NR_fchmodat2 1073742276

    sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
    67:#define __NR_fchmodat2 6452

    sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
    62:#define __NR_fchmodat2 5452

    sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
    70:#define __NR_fchmodat2 4452

    sysdeps/unix/sysv/linux/alpha/arch-syscall.h
    59:#define __NR_fchmodat2 562

I tested the change by adding the diff below as patch to
`pkgs/tools/package-management/nix/common.nix` & then built a VM from
the following config using my dirty nixpkgs master:

    {
      vm = { pkgs, ... }: {
        virtualisation.writableStore = true;
        virtualisation.memorySize = 8192;
        virtualisation.diskSize = 12 * 1024;
        nix.package = pkgs.nixVersions.nix_2_21;
      };
    }

The original issue can be triggered via

    nix build -L github:nixos/nixpkgs/d6dc19adbda4fd92fe9a332327a8113eaa843894#lxc \
      --extra-experimental-features 'nix-command flakes'

however the problem disappears with this patch applied.

Closes NixOS#10424

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)
Ma27 added a commit to Ma27/nix that referenced this issue Apr 15, 2024
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.

This change works by creating a syscall filter for the `fchmodat2`
syscall (number 452 on most systems). The problem is that glibc 2.39
and seccomp 2.5.5 are needed to have the correct syscall number available
via `__NR_fchmodat2` / `__SNR_fchmodat2`, but this flake is still on
nixpkgs 23.11. To have this change everywhere and not dependent on the
glibc this package is built against, I added a header
"fchmodat2-compat.hh" that sets the syscall number based on the
architecture. On most platforms its 452 according to glibc with a few
exceptions:

    $ rg --pcre2 'define __NR_fchmodat2 (?!452)'
    sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
    58:#define __NR_fchmodat2 1073742276

    sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
    67:#define __NR_fchmodat2 6452

    sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
    62:#define __NR_fchmodat2 5452

    sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
    70:#define __NR_fchmodat2 4452

    sysdeps/unix/sysv/linux/alpha/arch-syscall.h
    59:#define __NR_fchmodat2 562

I tested the change by adding the diff below as patch to
`pkgs/tools/package-management/nix/common.nix` & then built a VM from
the following config using my dirty nixpkgs master:

    {
      vm = { pkgs, ... }: {
        virtualisation.writableStore = true;
        virtualisation.memorySize = 8192;
        virtualisation.diskSize = 12 * 1024;
        nix.package = pkgs.nixVersions.nix_2_21;
      };
    }

The original issue can be triggered via

    nix build -L github:nixos/nixpkgs/d6dc19adbda4fd92fe9a332327a8113eaa843894#lxc \
      --extra-experimental-features 'nix-command flakes'

however the problem disappears with this patch applied.

Closes NixOS#10424

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)
Ma27 added a commit to Ma27/nix that referenced this issue Apr 18, 2024
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.

This change works by creating a syscall filter for the `fchmodat2`
syscall (number 452 on most systems). The problem is that glibc 2.39
and seccomp 2.5.5 are needed to have the correct syscall number available
via `__NR_fchmodat2` / `__SNR_fchmodat2`, but this flake is still on
nixpkgs 23.11. To have this change everywhere and not dependent on the
glibc this package is built against, I added a header
"fchmodat2-compat.hh" that sets the syscall number based on the
architecture. On most platforms its 452 according to glibc with a few
exceptions:

    $ rg --pcre2 'define __NR_fchmodat2 (?!452)'
    sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
    58:#define __NR_fchmodat2 1073742276

    sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
    67:#define __NR_fchmodat2 6452

    sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
    62:#define __NR_fchmodat2 5452

    sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
    70:#define __NR_fchmodat2 4452

    sysdeps/unix/sysv/linux/alpha/arch-syscall.h
    59:#define __NR_fchmodat2 562

I tested the change by adding the diff below as patch to
`pkgs/tools/package-management/nix/common.nix` & then built a VM from
the following config using my dirty nixpkgs master:

    {
      vm = { pkgs, ... }: {
        virtualisation.writableStore = true;
        virtualisation.memorySize = 8192;
        virtualisation.diskSize = 12 * 1024;
        nix.package = pkgs.nixVersions.nix_2_21;
      };
    }

The original issue can be triggered via

    nix build -L github:nixos/nixpkgs/d6dc19adbda4fd92fe9a332327a8113eaa843894#lxc \
      --extra-experimental-features 'nix-command flakes'

however the problem disappears with this patch applied.

Closes NixOS#10424

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)
@vcunat
Copy link
Member Author

vcunat commented Apr 19, 2024

The workaround is in nixpkgs master now, and nix master has the added seccomp rule as well, so I'll call this solved.

@vcunat vcunat closed this as completed Apr 19, 2024
Ma27 added a commit to Ma27/nix that referenced this issue Apr 25, 2024
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.

This change works by creating a syscall filter for the `fchmodat2`
syscall (number 452 on most systems). The problem is that glibc 2.39
and seccomp 2.5.5 are needed to have the correct syscall number available
via `__NR_fchmodat2` / `__SNR_fchmodat2`, but this flake is still on
nixpkgs 23.11. To have this change everywhere and not dependent on the
glibc this package is built against, I added a header
"fchmodat2-compat.hh" that sets the syscall number based on the
architecture. On most platforms its 452 according to glibc with a few
exceptions:

    $ rg --pcre2 'define __NR_fchmodat2 (?!452)'
    sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
    58:#define __NR_fchmodat2 1073742276

    sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
    67:#define __NR_fchmodat2 6452

    sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
    62:#define __NR_fchmodat2 5452

    sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
    70:#define __NR_fchmodat2 4452

    sysdeps/unix/sysv/linux/alpha/arch-syscall.h
    59:#define __NR_fchmodat2 562

I tested the change by adding the diff below as patch to
`pkgs/tools/package-management/nix/common.nix` & then built a VM from
the following config using my dirty nixpkgs master:

    {
      vm = { pkgs, ... }: {
        virtualisation.writableStore = true;
        virtualisation.memorySize = 8192;
        virtualisation.diskSize = 12 * 1024;
        nix.package = pkgs.nixVersions.nix_2_21;
      };
    }

The original issue can be triggered via

    nix build -L github:nixos/nixpkgs/d6dc19adbda4fd92fe9a332327a8113eaa843894#lxc \
      --extra-experimental-features 'nix-command flakes'

however the problem disappears with this patch applied.

Closes NixOS#10424

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)

(cherry picked from commit ba68045)
lf- pushed a commit to lix-project/lix that referenced this issue May 4, 2024
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.

This change works by creating a syscall filter for the `fchmodat2`
syscall (number 452 on most systems). The problem is that glibc 2.39
is needed to have the correct syscall number available via
`__NR_fchmodat2` / `__SNR_fchmodat2`, but this flake is still on
nixpkgs 23.11. To have this change everywhere and not dependent on the
glibc this package is built against, I added a header
"fchmodat2-compat.hh" that sets the syscall number based on the
architecture. On most platforms its 452 according to glibc with a few
exceptions:

    $ rg --pcre2 'define __NR_fchmodat2 (?!452)'
    sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
    58:#define __NR_fchmodat2 1073742276

    sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
    67:#define __NR_fchmodat2 6452

    sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
    62:#define __NR_fchmodat2 5452

    sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
    70:#define __NR_fchmodat2 4452

    sysdeps/unix/sysv/linux/alpha/arch-syscall.h
    59:#define __NR_fchmodat2 562

I added a small regression-test to the setuid integration-test that
attempts to set the suid bit on a file using the fchmodat2 syscall.
I confirmed that the test fails without the change in
local-derivation-goal.

Additionally, we require libseccomp 2.5.5 or greater now: as it turns
out, libseccomp maintains an internal syscall table and
validates each rule against it. This means that when using libseccomp
2.5.4 or older, one may pass `452` as syscall number against it, but
since it doesn't exist in the internal structure, `libseccomp` will refuse
to create a filter for that. This happens with nixpkgs-23.11, i.e. on
stable NixOS and when building Lix against the project's flake.

To work around that

* a backport of libseccomp 2.5.5 on upstream nixpkgs has been
  scheduled[3].

* the package now uses libseccomp 2.5.5 on its own already. This is to
  provide a quick fix since the correct fix for 23.11 is still a staging cycle
  away.

We still need the compat header though since `SCMP_SYS(fchmodat2)`
internally transforms this into `__SNR_fchmodat2` which points to
`__NR_fchmodat2` from glibc 2.39, so it wouldn't build on glibc 2.38.
The updated syscall table from libseccomp 2.5.5 is NOT used for that
step, but used later, so we need both, our compat header and their
syscall table 🤷

Relevant PRs in CppNix:

* NixOS/nix#10591
* NixOS/nix#10501

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)
[3] NixOS/nixpkgs#306070

(cherry picked from commit ba68045)
Change-Id: I6921ab5a363188c6bff617750d00bb517276b7fe
lf- pushed a commit to lix-project/lix that referenced this issue May 4, 2024
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.

This change works by creating a syscall filter for the `fchmodat2`
syscall (number 452 on most systems). The problem is that glibc 2.39
is needed to have the correct syscall number available via
`__NR_fchmodat2` / `__SNR_fchmodat2`, but this flake is still on
nixpkgs 23.11. To have this change everywhere and not dependent on the
glibc this package is built against, I added a header
"fchmodat2-compat.hh" that sets the syscall number based on the
architecture. On most platforms its 452 according to glibc with a few
exceptions:

    $ rg --pcre2 'define __NR_fchmodat2 (?!452)'
    sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
    58:#define __NR_fchmodat2 1073742276

    sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
    67:#define __NR_fchmodat2 6452

    sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
    62:#define __NR_fchmodat2 5452

    sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
    70:#define __NR_fchmodat2 4452

    sysdeps/unix/sysv/linux/alpha/arch-syscall.h
    59:#define __NR_fchmodat2 562

I added a small regression-test to the setuid integration-test that
attempts to set the suid bit on a file using the fchmodat2 syscall.
I confirmed that the test fails without the change in
local-derivation-goal.

Additionally, we require libseccomp 2.5.5 or greater now: as it turns
out, libseccomp maintains an internal syscall table and
validates each rule against it. This means that when using libseccomp
2.5.4 or older, one may pass `452` as syscall number against it, but
since it doesn't exist in the internal structure, `libseccomp` will refuse
to create a filter for that. This happens with nixpkgs-23.11, i.e. on
stable NixOS and when building Lix against the project's flake.

To work around that

* a backport of libseccomp 2.5.5 on upstream nixpkgs has been
  scheduled[3].

* the package now uses libseccomp 2.5.5 on its own already. This is to
  provide a quick fix since the correct fix for 23.11 is still a staging cycle
  away.

We still need the compat header though since `SCMP_SYS(fchmodat2)`
internally transforms this into `__SNR_fchmodat2` which points to
`__NR_fchmodat2` from glibc 2.39, so it wouldn't build on glibc 2.38.
The updated syscall table from libseccomp 2.5.5 is NOT used for that
step, but used later, so we need both, our compat header and their
syscall table 🤷

Relevant PRs in CppNix:

* NixOS/nix#10591
* NixOS/nix#10501

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)
[3] NixOS/nixpkgs#306070

(cherry picked from commit ba68045)
Change-Id: I6921ab5a363188c6bff617750d00bb517276b7fe
lf- pushed a commit to lix-project/lix that referenced this issue Jul 26, 2024
Previously, system call filtering (to prevent builders from storing files with
setuid/setgid permission bits or extended attributes) was performed using a
blocklist. While this looks simple at first, it actually carries significant
security and maintainability risks: after all, the kernel may add new syscalls
to achieve the same functionality one is trying to block, and it can even be
hard to actually add the syscall to the blocklist when building against a C
library that doesn't know about it yet. For a recent demonstration of this
happening in practice to Nix, see the introduction of fchmodat2 [0] [1].

The allowlist approach does not share the same drawback. While it does require
a rather large list of harmless syscalls to be maintained in the codebase,
failing to update this list (and roll out the update to all users) in time has
rather benign effects; at worst, very recent programs that already rely on new
syscalls will fail with an error the same way they would on a slightly older
kernel that doesn't support them yet. Most importantly, no unintended new ways
of performing dangerous operations will be silently allowed.

Another possible drawback is reduced system call performance due to the larger
filter created by the allowlist requiring more computation [2]. However, this
issue has not convincingly been demonstrated yet in practice, for example in
systemd or various browsers. To the contrary, it has been measured that the the
actual filter constructed here has approximately the same overhead as a very
simple filter blocking only one system call.

This commit tries to keep the behavior as close to unchanged as possible. The
system call list is in line with libseccomp 2.5.5 and glibc 2.39, which are the
latest versions at the point of writing. Since libseccomp 2.5.5 is already a
requirement and the distributions shipping this together with older versions of
glibc are mostly not a thing any more, this should not lead to more build
failures any more.

[0] NixOS/nixpkgs#300635
[1] NixOS/nix#10424
[2] flatpak/flatpak#4462 (comment)

Change-Id: I541be3ea9b249bcceddfed6a5a13ac10b11e16ad
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: bug Something is broken 0.kind: regression Something that worked before working no longer 6.topic: stdenv Standard environment
Projects
None yet
Development

No branches or pull requests

3 participants