Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive nix #213

Closed
wants to merge 20 commits into from
Closed

Recursive nix #213

wants to merge 20 commits into from

Conversation

shlevy
Copy link
Member

@shlevy shlevy commented Feb 18, 2014

This PR makes it possible to run nix from within a nix build. This is useful for things like using nix as a low level build tool, and is usable in places where import-from-derivation isn't (see #13 for more details).

This is done by introducing a new remote mode for nix. If NIX_REMOTE is recursive, then the remote client (e.g. RemoteStore or guix) should do the following:

  1. Read NIX_REMOTE_RECURSIVE_PROTOCOL_VERSION as the ASCII representation of an unsigned int. This is the version of the remote protocol in use by the parent nix process (currently 0x101)
  2. Verify that the major version (using the same definition for major and minor as the daemon protocol) of the protocol that the parent process uses matches a major version the client understand
  3. Read NIX_REMOTE_SOCKET_FD as the ASCII representation of a file descriptor. This is a writable unix domain socket (note that this can't be used as the daemon socket directly as multiple processes may use the same fd).
  4. Write a message containing the version of the protocol the client will use as a little-endian unsigned integer in the main data and a connected full-duplex unix domain socket (e.g. one of a pair created by socketpair) in the ancillary data to the fd found in NIX_REMOTE_RECURSIVE_SOCKET_FD. The socket that the sent socket is connected to can then be treated as a freshly-connected socket for the daemon protocol.
  5. Read NIX_REMOTE_RECURSIVE_PATHS_FD as the ASCII representation of a file descriptor. This file descriptor should be used to tell the parent nix process of new inputs that it should scan for when looking for references in this build's outputs (which is needed since recursive nix can make store paths that aren't inputs to the build accessible to it, which is really the whole point). Any paths so reported are also guaranteed not to be garbage collected until after the reference scan. Before appending (the fd is opened with O_APPEND), you must take an fcntl write lock on the file. The file format is just a sequential series of null-terminated store paths. The RemoteStore implementation collects every newly-discovered path in PathSet throughout its lifetime, then in the destructor reports them all at once to avoid holding the lock for a long time, but that isn't necessary.

And then the daemon protocol is followed as usual. This required several changes on the part of the parent nix process:

  • Instead of per-process temproots files, we have per-process temproots directories. This allows a single nix process to keep the reported paths of several recursive nix invocations alive.
  • When started normally, in addition to listening on the traditional socket nix-daemon also creates a pair of unix sockets to receive recursive nix connections on. It passes the write end of the pair to all children LocalStore instances.
  • When starting a build, LocalStore sets NIX_STORE_DIR and friends to the values appropriate for this store instance, packs the settings into _NIX_OPTIONS (see Settings::pack and Settings::unpack), and sets NIX_REMOTE to recursive and puts the protocol version in NIX_REMOTE_RECURSIVE_PROTOCOL_VERSION. It also creates a new temproot (named after the derivation file for guaranteed uniqueness) and puts that into NIX_REMOTE_RECURSIVE_PATHS_FD. If this is the first build, it then checks if it was passed a socket upon construction. If it was, that means this is a child of a nix-daemon process and it passes that socket in NIX_REMOTE_RECURSIVE_SOCKET_FD. If it wasn't, then it creates a socket pair and a pipe, puts both ends of the socket and the read end of the pipe into NIX_RECURSIVE_FDS (space separated) and starts nix-daemon. When NIX_RECURSIVE_FDS is set, nix-daemon does not listen on its traditional socket and instead uses the passed fds to listen for recursive nix connections. The read end of the pipe is used by nix-daemon to know when the parent process has died so it knows to exit. The parent LocalStore then closes the read end of the pipe and socket pair, purposefully leaks the write end of the pipe (though it is O_CLOEXEC), and uses the write end of the socket pair as NIX_REMOTE_RECURSIVE_SOCKET_FD for all builds.
  • After a build, LocalStore reads the temproot for that build, adds the closure of each path written there (including outputs if any are derivations) to its temproots and to the PathSet used when scanning for references, and then deletes the temproot.

cc @civodul for guix

Fixes #13

@shlevy
Copy link
Member Author

shlevy commented Feb 18, 2014

Ah, forgot to mention, as a quick check I ran make installcheck 10 times against this and master, and the time to do so was the same within 1%.

@shlevy
Copy link
Member Author

shlevy commented Feb 19, 2014

Ah, just realized that this doesn't make the new paths available in chroot. I'll work on a fix to that (which will probably revert the temproots change in favor of a new way of reporting paths), but what's here is still valid.

@shlevy
Copy link
Member Author

shlevy commented Feb 26, 2014

I've rebased and added commits to make it so new paths are available in the chroot. The recursive paths reporting protocol has changed slightly: The client can only write in chunks of at most 4096 bytes at a time, and must read a single byte to confirm each write. Only after confirmation can the client assume any valid paths are available (due to chroot concerns).

@shlevy
Copy link
Member Author

shlevy commented Feb 26, 2014

cc @civodul for the slight protocol change.

@shlevy
Copy link
Member Author

shlevy commented Feb 26, 2014

Ack, still broken, chroot builds are done in a private mount namespace so bind mounts done in the parent aren't seen. Will fix later.

If NIX_REMOTE=recursive, the client checks
NIX_REMOTE_RECURSIVE_PROTOCOL_VERSION for an integer version. If it is
set and its major version matches the client major version (current
protocol version 0x101), it then reads a connected unix domain socket
file descriptor from NIX_REMOTE_RECURSIVE_FD. It sends a message
containing the client protocol version in the message data and a
connected unix domain socket in the ancillary data, and the other end of
that socket is then used for the normal daemon protocol.
With recursive nix, it is possible for a build to legally gain access to
paths that weren't specified as its inputs (indeed, that is the whole
point). In order for the nix process that started the build to be able
to scan for dependencies properly, it must be told when these new paths
are (potentially) accessed.

This amends version 0x101 of the recursive protocol to put the socket fd
into NIX_REMOTE_RECURSIVE_SOCKET_FD and to put another fd into
NIX_REMOTE_RECURSIVE_PATHS_FD. This new fd is assumed to point to a new
kind of temporary gc root, where the root is kept from being considered
invalid by the parent nix process staying alive and thus only a write
lock is needed to avoid a race between the gc reading the root file and
a process adding to it. Upcoming commits should clarify this if it's
confusing.
nix-daemon now supports listening on an already-connected socket. In the
normal case, nix-daemon will listen on its traditional socket (passed
through systemd or not) and makes a new socketpair for listening for the
new kind of connection. If, however, nix-daemon was started with
NIX_RECURSIVE_FDS set, it uses the two fds listed in that env var as the
two ends of the connected socket and doesn't listen on the traditional
socket at all. In a future commit, LocalStores that were *not* launched
from a nix-daemon child process will launch nix-daemon with that env var
properly set if they need to perform a build.

In either case, nix-daemon now has a connected socket to read from and
possibly an unconnected socket it's listening to to connect to. When the
connected socket is readable, it reads the client recursive version from
the message data and the client domain socket from the ancillary data to
form a new connection. If the listening socket is connectable, a
connection is formed as usual.

Finally, when a child process is launched, its LocalStore gets passed
the other side of the connected socket so it can use the existing
nix-daemon process for recursive nix instead of having to spawn a new
nix-daemon process every time (to be implemented in a future commit)
This will allow one process to keep several temproots files alive, as is
required for recursive nix.

The semantics are as follows:

* When creating its first temproot file, a process takes the gc lock,
  creates /nix/var/nix/temproots/{pid} directory (deleting it if it
  exits), takes a read lock on /nix/var/nix/temproots/{pid}/lock, and
  releases the gc lock
* Before adding a new temproot file (including the first one), the
  process upgrades its read lock on /nix/var/nix/temproots/{pid}/lock to
  a write lock. It can then make as many temproot files in
  /nix/var/nix/temproots/{pid}/lock as it likes, so long as they are not
  named "lock", and then it downgrades the write lock to a read lock
* Before adding a new root to a temproot file, the process takes a write
  lock on the file. Afterwards, it releases the lock. The locking
  process *never* holds a read lock on individual temproot files
* For each directory in /nix/var/nix/temproots, the gc first tries to
  get a write lock on /nix/var/nix/temproots/{dirname}/lock. If it
  succeeds, or if the lock file does not exist, the directory must be
  stale so it deletes it and moves on. If it fails, it waits until it
  can get a read lock on /nix/var/nix/temproots/{dirname}/lock (after
  which the owning process can't create more temproot files until
  released) and then iterates through every file not named "lock" in
  /nix/var/nix/tmproots/{dirname}, taking a read lock on the file then
  reading roots out of it as before. The gc does not release any of the
  read locks it takes here until collection is done.

For now, local-store uses /nix/var/nix/temproots/{pid}/main for its
temproots file, but future commits will also put recursive nix temproots
there.
This involves:

* Setting up the NIX_STORE_PATH etc. env vars to match the current
  process
* Packing settings into _NIX_OPTIONS
* Setting NIX_REMOTE=recursive and NIX_REMOTE_RECURSIVE_PROTOCOL_VERSION
  to the protocol version
* If this process is not itself a daemon worker and it has not started a
  recursive nix daemon instance yet, making a new unidirectional
  socketpair and a pipe, sending both ends of the socketpair and the
  read end of the pipe to a new nix-daemon process in NIX_RECURSIVE_FDS,
  keeping the write end of the pipe open (so that nix-daemon can know
  if/when we die), and using the write end of the new socketpair as the
  store's recursive nix daemon socket
* Setting NIX_REMOTE_RECURSIVE_SOCKET_FD to the store's recursive nix
  daemon socket (and keeping it alive past closeMostFDs)
* Creating a new temproots file with the same name as this drv's path
* Setting NIX_REMOTE_RECURSIVE_PATHS to the new temproots file (and
  keeping it alive past closeMostFDs)
* After the build (if not using the build hook, which for now must use
  nix-store --import if it wants to report references not in the inputs
  closure), read through the new temproots file, add the closure of each
  temproot (include outputs of derivations) to allPaths, and add each
  path as a temproot so we can delete the new temproots file

This required adding functions to gc.cc to create and delete arbitrary
temproots files (which addTempRoot uses to creat the main temproots
file). nix-daemon was updated to add the third fd in NIX_RECURSIVE_FDS
(the read side of the close notification pipe) to its select set so that
it can exit once its parent dies (closes the write side of the pipe).

Now recursive nix should work! Next up: Test case.
This allows the parent nix process to add any needed new paths to the
chroot. The reporting protocol is changed to have a cap of 4096 bytes
per write and to require reading a single confirmation byte after each
write.

Temp roots handling is reverted back to the pre-recursive-nix way, since
we no longer create a new temp roots file for each build.
Recursive path reporting is expensive, especially in the chroot case. So
now we only report valid paths and are less defensive about derivations
doing stupid things with paths they aren't sure are valid.
Builder processes can purposefully cause an exception to be thrown in
any number of ways, so don't let a rogue builder kill an entire store
process.
@shlevy
Copy link
Member Author

shlevy commented Feb 27, 2014

Rebased and pushed some fixes. Now this works with and without chroot enabled. Note that only valid recursive paths should be reported now, and that generally you only need to report a recursive path if you might reasonably later assume the path exists (so in fewer cases than before).

@shlevy
Copy link
Member Author

shlevy commented Feb 27, 2014

BTW @edolstra I think this is ready for merge now, though of course you'll probably want to wait until after 1.7

@edolstra
Copy link
Member

Probably best to do it after 1.7.

@lucabrunox
Copy link
Contributor

Any comment on my proposal at #13 ? Maybe a builder written in nix itself might make this obsolete on the long term.

@shlevy
Copy link
Member Author

shlevy commented Apr 8, 2014

Take the example use case of a package that comes with its own nix expressions. How does your suggestion solve the problem of needing to download and unpack the tarball before those expressions can be evaluated?

@lucabrunox
Copy link
Contributor

I don't know the implementation details behind nix builds. However, if you reference a package y from package x such that it requires the derivation to be evaluated (for example $y), currently that y gets built.

Now with a nix builder and a dynamic .nix expression:

builder = do {
  ...
  dynamic = callPackage $out/path/to/dynamic.nix { };
  ... use $dynamic ...
}

That's certainly possible to do, after all the builder is just like any other .nix expression, except it's triggered when the derivation is needed. However that imports the .nix in the nix environment the builder is running in, just like any other callPackage.

@charleso
Copy link

As discussed on IRC, I get the following on MacOSX whenever I run nix-build with __recursive

error: disabling reads from connecting side of daemon socketpair: Socket is not connected

Possibly unrelated, but I get the following with nix-daemon

error: disabling reads from fdRecursiveTo: Socket is not connected

@shlevy
Copy link
Member Author

shlevy commented Apr 27, 2014

@charleso What version of OSX? I haven't tested yet but this looks like a bug in shutdown(2), which may be fixed in a newer version already if you're not on the latest (either way this should work though of course).

@charleso
Copy link

OSX 10.9.2.

@shlevy
Copy link
Member Author

shlevy commented May 14, 2014

@edolstra I have to bring this up to date and fix some bugs, but before I do that can you comment on whether this is something you want/the general approach looks good?

@shlevy
Copy link
Member Author

shlevy commented May 20, 2014

@edolstra ping?

@edolstra
Copy link
Member

Do you have any examples of how this will be used?

@shlevy
Copy link
Member Author

shlevy commented May 20, 2014

Most of my experimenting also uses my recursive-derivations branch, which I haven't yet finished with the design of but basically allows you to define a derivation that acts as a stand-in for another derivation evaluated at build time. For example, these two expressions will evaluate to different .drvs at eval time but will at build time ultimately result in the same out path:

let
  nix-src = fetchgit {
    url = git://github.com/NixOS/nix.git;
    rev = "abcdef";
    sha256 = "12345678";
  };

  nix1 = (import (nix-src + "/build.nix")).all; #Assuming all is a derivation with $out/bin etc. instead of just a list as it is now, named `nix`
  nix2 = runCommand "nix" { __recursive = true; buildInputs = [ pkgs.nix ]; } ''
    ln -sv $(nix-instantiate -A all ${nix-src}/build.nix) $out
  '';
in {
  env1 = buildEnv { name = "nix-from-nix-make"; paths = [ nix1 ]; };
  env2 = buildEnv { name = "nix-from-nix-make"; paths = [ nix2 ]; };
}

Without the recursive derivations stuff, if you replace the nix-instantiate with a nix-build the above two envs will be functionally equivalent.

Besides nix-make, I'd also like to use this in conjunction with a to-be-written qemu backend to test nixops networks in hydra. If running nix commands just worked inside of a build, then you could just run nixops create and nixops deploy and have a network created.

@edolstra
Copy link
Member

How would the inner build.nix find its dependencies (e.g. Nixpkgs)?

@shlevy
Copy link
Member Author

shlevy commented May 20, 2014

You'd have to make it available as an input, forgot that bit. So you could add -I nixpkgs=${<nixpkgs>} to the nix-instantiate call. Note that with recursive derivations the fact that a nixpkgs change results in a hash change doesn't actually require a full rebuild, once the reevaluation happens the same .drv will be generated and replaced in the dependency chain.

@edolstra
Copy link
Member

The output path of the outer derivation will still change, right? And if that's used as a dependency by lots of other packages, you might still be looking at a lot of rebuilds.

@shlevy
Copy link
Member Author

shlevy commented May 20, 2014

Right, that's why the next step is the recursive-derivations branch. See shlevy/nix@84ab4d4 for more details (though as I said that branch isn't ready yet, still some things to tweak), but basically the idea is that you define a derivation that just symlinks another derivation into $out, and then at build time nix will replace that derivation in the dependency tree before moving a step up in the dependency chain.

An illustration might help: Suppose /nix/store/abcde-foo.drv has /nix/store/fghij-bar.drv!out as an input, an /nix/store/fghij-bar.drv has /nix/store/klmno-baz.drv!?dev as an input (note the ?). If asked to realise /nix/store/abcde-foo.drv!out, nix will take the following steps:

  1. Realize /nix/store/klmno-baz.drv!out
  2. Verify that /nix/store/klmno-baz.drv!out is a symlink to a .drv named baz, say /nix/store/pqrst-baz.drv
  3. Realize /nix/store/pqrst-baz.drv!dev
  4. Rewrite all references to /nix/store/klmno-baz.drv and its outputs to /nix/store/pqrst-baz.drv and its outputs in /nix/store/fghij-bar.drv, saving the result as a new .drv, say /nix/store/uvwxy-bar.drv
  5. Realize /nix/store/uvwxy-bar.drv!out
  6. Rewrite all references to /nix/store/fghij-bar.drv and its outputs to /nix/store/uvwxy-bar.drv and its outputs in /nix/store/abcde-foo.drv, saving the result as a new .drv, say /nix/store/z1234-foo.drv
  7. Realize /nix/store/z1234-foo.drv!out
  8. Tell the caller of realisePath about the rewrite so it can find the actually-realized output path

This way, even though the inner evaluation is done at realisation time (step 1), the dependency graph is rewritten as if it had been done as part of the initial evaluation, and so changes in nix expression dependencies that don't affect the path of the .drv linked into the output don't affect the ultimate output.

On top of this machinery, derivationStrict recognizes __recursive = true as a cue to add the ? to the context of each of the outputs so that the .drvs are written properly.

@shlevy
Copy link
Member Author

shlevy commented May 27, 2014

@edolstra ping?

@shlevy
Copy link
Member Author

shlevy commented Jun 10, 2014

@edolstra I would really like to bring this up-to-date and get recursive-derivations working as well, can you comment on whether or not this is something you'll consider merging if I do?

@shlevy
Copy link
Member Author

shlevy commented Aug 8, 2014

Guessing this is not wanted.

@shlevy shlevy closed this Aug 8, 2014
@charleso
Copy link

charleso commented Aug 8, 2014

:(

@cstrahan
Copy link
Contributor

cstrahan commented Aug 8, 2014

I'm bummed to see this PR end this way, given the effort that went into it, in terms of both code/commits and communication. I can't speak for @shlevy, but I know that I'd much rather be told that my ideas are poorly conceived (and the reasons why) than feel left hanging. From a marketing standpoint, such silence suggests that involvement in this project is risky business - you might invest your time in some new feature, only to either abandon it or decide to fork nix/nixpkgs (if you do the latter, you're now encumbered with maintaining the fork, and it just might turn out that it was a bad idea anyway - but you could hardly be blamed, as no one told you why).

@edolstra: I really, really want to see nix grow, because in purely subjective non-technical terms, it's absolutely fucking awesome. What can we - the community - do to improve communication?

@tailhook
Copy link
Contributor

tailhook commented Aug 8, 2014

I agree with @cstrahan. Probably better to discuss it in mailing list?

@copumpkin
Copy link
Member

Is it worth leaving this open until the discussion is resolved? Seems like a pity for this to get lost.

@copumpkin
Copy link
Member

This is something I really want. Can we reopen the PR? I don't expect @shlevy will want to take care of bringing it up to date, but I (or somebody else) might.

@copumpkin
Copy link
Member

I suppose I can take the same branch and open a new PR for it from my own fork. Would be nice to retain the comment history though.

@shlevy
Copy link
Member Author

shlevy commented Jan 31, 2015

@copumpkin I've added you as a collab on my fork so you can just reuse this PR. Please also check out my recursive-derivations work, and feel free to ask any questions.

@nmikhailov
Copy link

This is sad and unhealthy situation. Really demotivates to contribute.

@copumpkin
Copy link
Member

@nmikhailov I definitely agree about the organizational aspects of what you're saying, but on the specifics of this PR, not all is lost! Perhaps @shlevy can elaborate on what aspects of recursive nix still need to exist after the new primop, and someone (maybe me, except I never have free time) can do the work to adapt this to reflect it.

@nmikhailov
Copy link

@copumpkin Well that's nice to hear regardless of general organizational issues.

@expipiplus1
Copy link
Contributor

I've tried quite hard over the last few days to resurrect this branch and make it mergeable with 2017's HEAD. Unfortunately there seems to just be too much churn in remote-store.cc (among others) to make this an easy merge and I don't think that I can complete this in a reasonable time with any confidence in its correctness.

The best path forward might be to use this PR as an inspiration for a re-implementation.

zolodev pushed a commit to zolodev/nix that referenced this pull request Jan 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Feature request or proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Recursive Nix
9 participants