Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/nginx: run nginx-config-reload as the nginx user and group #85820

Closed
wants to merge 1 commit into from

Conversation

flokli
Copy link
Contributor

@flokli flokli commented Apr 22, 2020

This unit currently runs ${execCommand} -t as root, which will cause
nginx to create files in /var/spool/nginx as the "nobody" user (which is
the default if no user/group is specified in the nginx config itself
(which is true in our case, as we run nginx as an unprivileged user and
don't give it a chance to create/chown files and folders for the nginx
unit at least.

Reported-By: @tazjin

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@ofborg ofborg bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Apr 22, 2020
@flokli flokli requested a review from nh2 April 22, 2020 23:09
@Izorkin
Copy link
Contributor

Izorkin commented Apr 23, 2020

Error reload:

systemd[1]: Starting nginx-config-reload.service...
sb398bf2mimvcnxn629wsc92v9a9h41v-unit-script-nginx-config-reload-start[5737]: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
sb398bf2mimvcnxn629wsc92v9a9h41v-unit-script-nginx-config-reload-start[5737]: 2020/04/23 10:40:39 [emerg] 5745#5745: bind() to 0.0.0.0:80 failed (13: Permission denied)
sb398bf2mimvcnxn629wsc92v9a9h41v-unit-script-nginx-config-reload-start[5737]: nginx: configuration file /etc/nginx/nginx.conf test failed
systemd[1]: nginx-config-reload.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: nginx-config-reload.service: Failed with result 'exit-code'.
systemd[1]: Failed to start nginx-config-reload.service.

If add:

        # Capabilities
        AmbientCapabilities = [ "CAP_NET_BIND_SERVICE" ];

Error:

systemd[1]: Starting nginx-config-reload.service...
kfxsv78ji9cjjkjn0kflbhf5hlz336kq-unit-script-nginx-config-reload-start[6831]: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
kfxsv78ji9cjjkjn0kflbhf5hlz336kq-unit-script-nginx-config-reload-start[6831]: nginx: configuration file /etc/nginx/nginx.conf test is successful
kfxsv78ji9cjjkjn0kflbhf5hlz336kq-unit-script-nginx-config-reload-start[6831]: Failed to reload nginx.service: Interactive authentication required.
kfxsv78ji9cjjkjn0kflbhf5hlz336kq-unit-script-nginx-config-reload-start[6831]: See system logs and 'systemctl status nginx.service' for details.
systemd[1]: nginx-config-reload.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: nginx-config-reload.service: Failed with result 'exit-code'.
systemd[1]: Failed to start nginx-config-reload.service.

@flokli flokli force-pushed the nginx-config-reload-user branch from 05acb0a to d1f1197 Compare April 23, 2020 08:27
@flokli
Copy link
Contributor Author

flokli commented Apr 23, 2020

I updated the unit. We now invoke systemctl reload nginx.service as root, and the check command (which creates the state files) as cfg.user. TBH, this is a scary unit in first place, and the nginx unit will be fixed to just handle proper reload on its own for master in a follow-up PR, but this is something also for backporting.

@flokli
Copy link
Contributor Author

flokli commented Apr 23, 2020

apparently, it seems like nginx -t also tries to bind to ports, which is a questionable thing to do while running as a separate user, and validating configuration before reloading the real server with the new configuration:

webserver # Apr 23 10:20:33 webserver nginx[1094]: 2020/04/23 10:20:33 [emerg] 1094#1094: bind() to 0.0.0.0:80 failed (13: Permission denied)

I don't really think it should do this, especially not while "the previous" nginx is still running. @Izorkin agreed to open a bugreport upstream.

@danbst
Copy link
Contributor

danbst commented Apr 25, 2020

@flokli would AmbientCapabilities = [ "CAP_NET_BIND_SERVICE" ]; fix the issue in mean time, as @Izorkin proposed?

@Izorkin
Copy link
Contributor

Izorkin commented Apr 26, 2020

@flokli check this variant:

diff --git a/nixos/modules/services/web-servers/nginx/default.nix b/nixos/modules/services/web-servers/nginx/default.nix
index 8d49dc66eb1..21bedc93df8 100644
--- a/nixos/modules/services/web-servers/nginx/default.nix
+++ b/nixos/modules/services/web-servers/nginx/default.nix
@@ -729,7 +729,12 @@ in
       script = ''
         if ${pkgs.systemd}/bin/systemctl -q is-active nginx.service ; then
           ${execCommand} -t && \
-            ${pkgs.systemd}/bin/systemctl reload nginx.service
+            ${pkgs.systemd}/bin/systemctl reload nginx.service && \
+            chown ${cfg.user}:${cfg.group} '${cfg.stateDir}/client_body_temp' && \
+            chown ${cfg.user}:${cfg.group} '${cfg.stateDir}/fastcgi_temp' && \
+            chown ${cfg.user}:${cfg.group} '${cfg.stateDir}/proxy_temp' && \
+            chown ${cfg.user}:${cfg.group} '${cfg.stateDir}/scgi_temp' && \
+            chown ${cfg.user}:${cfg.group} '${cfg.stateDir}/uwsgi_temp'
         fi
       '';
       serviceConfig.RemainAfterExit = true;

Ticket closed :( - https://trac.nginx.org/nginx/ticket/1960
Manual add custom patch?

@flokli
Copy link
Contributor Author

flokli commented Apr 27, 2020

@flokli would AmbientCapabilities = [ "CAP_NET_BIND_SERVICE" ]; fix the issue in mean time, as @Izorkin proposed?

This won't help. The nginx-config-reload unit runs nginx -t if the real nginx is already running. So even if it'd have CAP_NET_BIND_SERVICE, it still couldn't bind, as the socket is already in use by the real nginx process.

Also, on @Izorkin 's proposed alternative: This will still chown state files around, while the real nginx is still running.

I'm more in favor of not executing nginx -t at all, and just handling nginx reloads via the original systemctl reload nginx.service. What was the reason for this separate unit in first place?

@Izorkin
Copy link
Contributor

Izorkin commented Apr 28, 2020

Variant - add this patch:

diff --git a/src/core/ngx_connection.c b/src/core/ngx_connection.c
index 3368253..1cbfe46 100644
--- a/src/core/ngx_connection.c
+++ b/src/core/ngx_connection.c
@@ -601,21 +601,12 @@ ngx_open_listening_sockets(ngx_cycle_t *cycle)
             if (bind(s, ls[i].sockaddr, ls[i].socklen) == -1) {
                 err = ngx_socket_errno;

-                if (err != NGX_EADDRINUSE || !ngx_test_config) {
-                    ngx_log_error(NGX_LOG_EMERG, log, err,
-                                  "bind() to %V failed", &ls[i].addr_text);
-                }
-
                 if (ngx_close_socket(s) == -1) {
                     ngx_log_error(NGX_LOG_EMERG, log, ngx_socket_errno,
                                   ngx_close_socket_n " %V failed",
                                   &ls[i].addr_text);
                 }

-                if (err != NGX_EADDRINUSE) {
-                    return NGX_ERROR;
-                }
-
                 if (!ngx_test_config) {
                     failed = 1;
                 }

@flokli
Copy link
Contributor Author

flokli commented Apr 28, 2020

Variant - add this patch:

nginx would still chown state files around (which we could fix by setting user and group in the nginx config file as well), but this patch will also just prevent nginx from doing error reporting if it legitimately tries to bind to a socket.

What about just using systemctl reload nginx.service to reload nginx, and skip the check, which is more than a check anyways?

@aanderse
Copy link
Member

I don't think we should be shipping an nginx with strange patches to accommodate new features done of our developers would like to see upstream, but upstream has rejected. It's one thing to patch software for nix specific quirks, but this is entirely different.

@flokli flokli force-pushed the nginx-config-reload-user branch from d1f1197 to 4538beb Compare April 28, 2020 11:24
@flokli flokli marked this pull request as draft April 28, 2020 11:57
@flokli flokli force-pushed the nginx-config-reload-user branch from 4538beb to de6d696 Compare April 28, 2020 17:25
@flokli
Copy link
Contributor Author

flokli commented Apr 28, 2020

I pushed a new commit that handles the reload differently:

nixos/nginx: run nginx-config-reload as the nginx user and group

The "nginx-config-reload" script previously ran nginx -t as root,
causing nginx to create files in /var/spool/nginx as the "nobody" user
(which is the default if no user/group is specified in the nginx config
itself).

As further investigation showed 1, nginx -t does not only create
files, but also tries to bind to sockets - which is a terrible thing to
do to while just checking the configuration, which is why we removed the
-t command from the nginx-config-reload command entirely, effectively
aliasing it to systemctl reload nginx.service.

Instead of all that, we now invoke nginx -s reload in the ExecReload
command of nginx.service, which will move the unit into a failed
state (but not stop the webserver).

These failures are currently not properly propagated to the
switch-to-configuration script, but systemctl is-failed nginx will
return 0, and monitoring should detect this.

@danbst, @aanderse, can you take a look?

@flokli flokli marked this pull request as ready for review April 28, 2020 17:28
@Izorkin
Copy link
Contributor

Izorkin commented Apr 28, 2020

Worked in test server.

The "nginx-config-reload" script previously ran `nginx -t` as root,
causing nginx to create files in /var/spool/nginx as the "nobody" user
(which is the default if no user/group is specified in the nginx config
itself).

As further investigation showed [1], `nginx -t` does not only create
files, but also tries to bind to sockets - which is a terrible thing to
do to while just checking the configuration, which is why we removed the
-t command from the nginx-config-reload command entirely, effectively
aliasing it to systemctl reload nginx.service.

Instead of all that, we now invoke `nginx -s reload` in the `ExecReload`
command of `nginx.service`, which will move the unit into a failed
state (but not stop the webserver).

These failures are currently not properly propagated to the
switch-to-configuration script, but `systemctl is-failed nginx` will
return 0, and monitoring should detect this.

[1]: https://trac.nginx.org/nginx/ticket/1506

Reported-By: Vincent Ambo <[email protected]>
@flokli flokli force-pushed the nginx-config-reload-user branch from de6d696 to ef78ca2 Compare April 29, 2020 10:49
@nh2
Copy link
Contributor

nh2 commented May 1, 2020

Thanks for looking into this!

@flokli
Copy link
Contributor Author

flokli commented May 1, 2020

Fixing the reload behaviour needs a bit more work - I wrote down something in #49528 (comment).

@ajs124
Copy link
Member

ajs124 commented Jun 18, 2020

it still couldn't bind, as the socket is already in use by the real nginx process

This isn't true if you've enabled SO_REUSEPORT, which is sort of recommended for efficiency, etc. Although it being able to bind probably only makes things worse.

Thanks for looking into this, though!

@Mic92
Copy link
Member

Mic92 commented Aug 12, 2020

I have this alternative: #95249

@flokli
Copy link
Contributor Author

flokli commented Aug 12, 2020

Closing in favor of #95249.

@flokli flokli closed this Aug 12, 2020
@flokli flokli deleted the nginx-config-reload-user branch August 12, 2020 11:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants