Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixosTests.ldap is broken #86305

Closed
flokli opened this issue Apr 29, 2020 · 8 comments
Closed

nixosTests.ldap is broken #86305

flokli opened this issue Apr 29, 2020 · 8 comments
Labels
0.kind: bug Something is broken 6.topic: testing Tooling for automated testing of packages and modules

Comments

@flokli
Copy link
Contributor

flokli commented Apr 29, 2020

nixosTests.ldap fails to run - the login attempts fail to connect to the ldap server, exponentially backoff, and timeout eventually.

After some investigation, it seems openldap.service on the server fails to start (We don't wait for openldap.service it in the test, so it's not clearly visible):

invalid config directory /var/db/slapd, error 2
slaptest: bad configuration directory!

It could have been broken by 382b0aa.

I opened the openldap module and test, and saw a lot of scary stuff w.r.t. how the configuration is handled there. On top of that, the whole preStart is entirely overwritten by nixosTests.ldap - making me ask myself if what's intended to be done there should probably be moved into the module…

I'd also like to do some refactoring there to use RuntimeDirectory= and StateDirectory= to create these directories - but I need to understand the usecases, and maybe see some real-world setups.

Pinging some possible users: @Mic92, @fpletz @dasJ @tfc.

@flokli flokli added the 0.kind: bug Something is broken label Apr 29, 2020
@dasJ
Copy link
Member

dasJ commented Apr 29, 2020

Okay, we don't actually use the openldap module in our infrastructure, we have our own slapd module. But since I know some minor details about slapd, let me explain why I didn't bother using the upstream module.

First off, slapd supports multiple databases. While this seems like a minor detail, it's important for the OLC. slapd identifies the databases by their suffix (like dc=example,dc=com). If the suffix is the hardcoded value of cn=config, this database becomes the OLC (Online configuration). This database allows configuration changes without any downtime and deprecates the configuration file (iirc).
The OLC situation is the main difference, as we have a slapd.service and a slapd-olc.service which uses the standard LDAP tools and writes to the OLC without slapd going down.
The OLC is created with a custom prestart:

        set -e
        set -u
        configDir='/var/lib/${cfg.dataDirName}/{0}config'
        if ! [ -d "$configDir/cn=config" ]; then
          # We use PrivateTmp anyway
          configFile=/tmp/config
          touch "$configFile"
          chmod 600 "$configFile"

          if ! [ -r "${cfg.olcAdminPwFile}" ]; then
            echo "${cfg.olcAdminPwFile} is missing"
            exit 1
          fi
          pw="$(cat "${cfg.olcAdminPwFile}" | tr -d '[:space:]')"

          (
            echo "database config"
            echo "rootdn \"${cfg.olcAdminDN}\""
            echo "rootpw \"$pw\""
          ) > "$configFile"

          ${pkgs.openldap.out}/bin/slaptest -F "$configDir" -f "$configFile"
          rm -f "$configFile"
        fi

As you can see from this example, we store our databases in /var/lib (hence we are able to use StateDirectory and store it in a structure supporting multiple databases. Also, the preStart contains no slaptest for configuration tests (just for OLC creation) because syntax checks are actually performed by slapd when the OLC data is written using ldapmodify.
If you attempt to write an invalid ACL, for example, slapd fails with an implementation-specific error code.

For systemd option (apart from my regular confinement options), these may be of interest:

{
        ProtectSystem = "strict"; # No /var/db
        PrivateTmp = true; # For olc creation
        StateDirectory = map (db: "${cfg.dataDirName}/${db}") (remove "{-1}frontend" (attrNames (filterAttrs (k: v: v.needsDirectory) allDatabases))); # Ensure databases
        RuntimeDirectory = "slapd"; # For the ldapi socket

        User = "openldap"; # Do not start as root at all!
        Group = "openldap";

        CapabilityBoundingSet = "CAP_NET_BIND_SERVICE";
        AmbientCapabilities = "CAP_NET_BIND_SERVICE";
}

Hope this helps. While I doubt I will ever have the motivation to upstream my module, I can certainly give you the needed information and code snippets to create an upstream-worthy version of it ;)

@veprbl veprbl added the 6.topic: testing Tooling for automated testing of packages and modules label Apr 30, 2020
@flokli
Copy link
Contributor Author

flokli commented May 5, 2020

I need to pass on this one. I don't maintain an LDAP server by myself, and if both the test is broken and the module itself needs some serious hacks to get it to barely work, we might think about just dropping the module alltogether, and having the very few users using it implement it the way they deem fit for their usecase.

It's better than shipping a broken test and a module no-one knows how to get to work.

@flokli flokli mentioned this issue May 5, 2020
10 tasks
@flokli
Copy link
Contributor Author

flokli commented May 5, 2020

PR removing the broken test and adding the deprecation warning: #87004

@flokli
Copy link
Contributor Author

flokli commented May 5, 2020

Also cc @ju1m.

@Ekleog
Copy link
Member

Ekleog commented May 6, 2020

I use the slapd module with just the following options: enable, urlList, defaultSchemas = true, database = "bdb", suffix, rootdn, rootpw, dataDir, a few indices in extraDatabaseConfig and all the contents in declarativeContents.

While this definitely is not an advanced use case of it, it works without a hitch.

So, IMO, removing this module would be a net loss.

@flokli
Copy link
Contributor Author

flokli commented May 6, 2020

Well, #87004 is only about removing the test that already failed for months, and adding a deprecation notice to the module.

Without tests, we're flying blind and don't notice breakages, so I consider it appropriate to warn users about this fact, assuming they use it in production.

I can't judge on how advanced the NixOS tests are, but If you use the module and want it to keep working, having a simple, passing vm test that doesn't reach deep into module internals should be the best way forward here.

@flokli
Copy link
Contributor Author

flokli commented May 6, 2020

Let's move the discussion to #87004.

flokli added a commit to flokli/nixpkgs that referenced this issue May 6, 2020
This seems to have worked in 15f105d (5
months ago) but broke somewhere in the meantime.

The current module doesn't seem to be underdocumented and might need a
serious refactor. It requires quite some hacks to get it to work (see
NixOS#86305 (comment)),
or how the ldap.nix test used systemd.services.openldap.preStart and
made quite some assumptions on internals.

Mic92 agreed on being added as a maintainer for the module, as he uses
it a lot and can possibly fix eventual breakages. For the most basic
startup breakages, the remaining openldap.nix test might suffice.
@flokli
Copy link
Contributor Author

flokli commented May 6, 2020

With nixosTests.ldap removed (by #87004), we can close this.

@flokli flokli closed this as completed May 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: bug Something is broken 6.topic: testing Tooling for automated testing of packages and modules
Projects
None yet
Development

No branches or pull requests

4 participants