Devenv hangs forever #1364

Closed
artemlive opened this issue Aug 7, 2024 · 34 comments · Fixed by #1433
Labels
bug Something isn't working

Comments

@artemlive

artemlive commented Aug 7, 2024

Describe the bug
Hello!
I have a problem with devenv on macOS (Sonoma 14.6).
When I try to initialize devenv, it hangs at the devenv init step:

devenv init                                           
• Creating devenv.nix
• Creating devenv.yaml
• Creating .envrc
• Creating .gitignore
direnv: loading ~/ops-stuff/repos/github-management/.envrc
direnv: loading https://raw.githubusercontent.com/cachix/devenv/95f329d49a8a5289d31e0982652f7058a189bfca/direnvrc (sha256-d+8cBpDfDBj41inrADaJt+bDWhOktwslgoP5YiGJ1v0=)
direnv: using devenv
direnv: .envrc changed, reloading
direnv: ([direnv export zsh]) is taking a while to execute. Use CTRL-C to give up.||

It hangs forever unless you press Ctrl-C.
I checked what it runs under the hood via ps and ran the same command with extra verbosity.

/nix/store/z1zrijdcxzx5a8yf4prggamg6k41hcmi-nix-2.21-devenv/bin/nix --show-trace --extra-experimental-features nix-command --extra-experimental-features flakes --option warn-dirty false --keep-going --max-jobs 4 --option eval-cache false eval .#devenv.cachix --json -vvvvvvvv
evaluating file '<nix/derivation-internal.nix>'
using cache entry '{"_what":"gitLastModified","rev":"0a41d25e85caeb81211a5742b47a9a986edf35d2"}' -> '{"lastModified":1722604605}'
evaluating file '/.devenv.flake.nix'

It gets stuck at evaluating file '/.devenv.flake.nix'.
The same thing happens when I run any other command.
For example, if I run devenv info:

devenv info -v
• Running command: /nix/store/z1zrijdcxzx5a8yf4prggamg6k41hcmi-nix-2.21-devenv/bin/nix --show-trace --extra-experimental-features nix-command --extra-experimental-features flakes --option warn-dirty false --keep-going --max-jobs 4 --option eval-cache false flake metadata
• Running command: /nix/store/z1zrijdcxzx5a8yf4prggamg6k41hcmi-nix-2.21-devenv/bin/nix --show-trace --extra-experimental-features nix-command --extra-experimental-features flakes --option warn-dirty false --keep-going --max-jobs 4 --option eval-cache false eval --raw .#info

And it hangs forever.
Same thing if I run the command manually:

/nix/store/z1zrijdcxzx5a8yf4prggamg6k41hcmi-nix-2.21-devenv/bin/nix --show-trace --extra-experimental-features nix-command --extra-experimental-features flakes --option warn-dirty false --keep-going --max-jobs 4 --option eval-cache false eval --raw .#info -vvvvvvvvvvv
evaluating file '<nix/derivation-internal.nix>'
using cache entry '{"_what":"gitLastModified","rev":"030722876accf9f65d17616d6d435f982c0ceb2f"}' -> '{"lastModified":1722513473}'
evaluating file '/.devenv.flake.nix'

I tried reinstalling Nix and devenv, but it didn't help.
To reproduce
devenv init
Examples are above.

Version
OS: macOS Sonoma (14.6)

devenv version
devenv 1.0.8 (aarch64-darwin)

Nix version: 2.24.1

@artemlive artemlive added the bug Something isn't working label Aug 7, 2024
@artemlive
Author

artemlive commented Aug 7, 2024

It seems there's a problem with the latest versions of Nix.
I completely uninstalled Nix and installed the older 2.21.4 version via:
sh <(curl -L https://releases.nixos.org/nix/nix-2.21.4/install)
It no longer hangs and seems to be working.

@rfhayashi

I've also managed to reproduce the issue with Nix 2.24.0 and 2.24.1 on Linux (Debian). It worked with 2.23.3. This only happens with the multi-user installation; the single-user installation works fine.

@domenkozar
Member

Could someone open an issue at https://github.com/NixOS/nix/issues?

@domenkozar domenkozar pinned this issue Aug 7, 2024
@pan93412

pan93412 commented Aug 7, 2024

Seems like they have fixed this, hopefully: NixOS/nix#11258

@L-Ryland

> Seems like they have fixed this, hopefully: NixOS/nix#11258

It's still hanging 🫠

@levi-manoel

It's happening to me too.
[screenshot]

$ devenv version
devenv 1.0.8 (x86_64-linux)
$ nixos-version 
24.11pre-git (Vicuna)
$ nix --version
nix (Nix) 2.24.1

wiedzmin added a commit to wiedzmin/nixos-config that referenced this issue Aug 16, 2024
`latest` makes "devenv" hang forever for some reason, see
cachix/devenv#1364 for details,
while more dated versions either lack essential features,
including experimental ones, or works overall strangely.
@delehef

delehef commented Aug 19, 2024

Can confirm the same as above (w.r.t. versions) on a pretty vanilla Ubuntu 24.04.

Some additional info:

  • root has no issue using devenv on the exact same multi-user Nix installation, whereas it fails for non-privileged users;
  • according to gdb, nix hangs forever in a call to read in the libc6 bundled by Nix.

As @rfhayashi mentioned (thanks!), downgrading Nix to 2.23.3 fixes everything.

@MAHDTech

MAHDTech commented Aug 20, 2024

Looks like it might be resolved as of 2.24.3 today.

I had devenv hanging in 2.24.2 and 2.24.1 but since upgrading just now it's fixed for me.

Anyone else tried 2.24.3?

@delehef

delehef commented Aug 20, 2024

Just tried on a brand new Ubuntu 24.04 VM: 2.23.3 still works, but 2.24.3 still fails for me. Here is the gdb backtrace when hanging if it helps:

[screenshot: gdb backtrace]

@pan93412

> Looks like it might be resolved as of 2.24.3 today.
>
> I had devenv hanging in 2.24.2 and 2.24.1 but since upgrading just now it's fixed for me.
>
> Anyone else tried 2.24.3?

It doesn't work, sadly:

[screenshot]

@lf-

lf- commented Aug 20, 2024

> Just tried on a brand new Ubuntu 24.04 VM: 2.23.3 still works, but 2.24.3 still fails for me. Here is the gdb backtrace when hanging if it helps:

Can you extract the actual nix invocation that is hanging and/or attach gdb to that nix process and pull a backtrace of that instead? The one here only says it's waiting on the output of some nix command, but not which nix command or why it's stuck.

@delehef

delehef commented Aug 20, 2024

Nix command:

[nix-shell:~/asdf]$ devenv -v info
• Running command: /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/bin/nix --show-trace --extra-experimental-features nix-command --extra-experimental-features flakes --option warn-dirty false --keep-going --max-jobs 2 --option eval-cache false flake metadata
• Running command: /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/bin/nix --show-trace --extra-experimental-features nix-command --extra-experimental-features flakes --option warn-dirty false --keep-going --max-jobs 2 --option eval-cache false eval --raw .#info

GDB backtrace from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/bin/nix --show-trace --extra-experimental-features nix-command --extra-experimental-features flakes --option warn-dirty false --keep-going --max-jobs 2 --option eval-cache false eval --raw .#info:

(gdb) bt
#0  0x00007151eba253dc in read () from /nix/store/5adwdl39g3k9a2j0qadvirnliv4r7pwd-glibc-2.39-52/lib/libc.so.6
#1  0x00007151ebf9c943 in nix::FdSource::readUnbuffered(char*, unsigned long) ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixutil.so
#2  0x00007151ebf976b2 in nix::BufferedSource::read(char*, unsigned long) ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixutil.so
#3  0x00007151ebf99318 in nix::Source::operator()(char*, unsigned long) ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixutil.so
#4  0x00007151ec151ca7 in unsigned char nix::readNum<unsigned char>(nix::Source&) ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixstore.so
#5  0x00007151ec2263b0 in nix::WorkerProto::Serialise<std::optional<nix::TrustedFlag> >::read(nix::StoreDirConfig const&, nix::WorkerProto::ReadConn) () from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixstore.so
#6  0x00007151ec1ee75b in nix::RemoteStore::initConnection(nix::RemoteStore::Connection&) ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixstore.so
#7  0x00007151ec1fc230 in std::_Function_handler<nix::ref<nix::RemoteStore::Connection> (), nix::RemoteStore::ref(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) [clone .lto_priv.0] ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixstore.so
#8  0x00007151ec1f8e46 in nix::Pool<nix::RemoteStore::Connection>::get() ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixstore.so
#9  0x00007151ec1efd18 in nix::RemoteStore::getConnection() ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixstore.so
#10 0x00007151ec1efe6b in virtual thunk to nix::RemoteStore::setOptions() ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixstore.so
#11 0x00007151ec6d169c in nix::flake::lockFlake(nix::EvalState&, nix::FlakeRef const&, nix::flake::LockFlags const&) ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixexpr.so
#12 0x00007151ec392ff0 in nix::InstallableFlake::getLockedFlake() const ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixcmd.so
#13 0x00007151ec39448c in nix::InstallableFlake::getCursors(nix::EvalState&) ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixcmd.so
#14 0x00007151ec3910bb in nix::InstallableValue::getCursor(nix::EvalState&) ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixcmd.so
#15 0x00007151ec3906c8 in nix::InstallableFlake::toValue(nix::EvalState&) ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixcmd.so
#16 0x000062e16096ecd6 in CmdEval::run(nix::ref<nix::Store>, nix::ref<nix::InstallableValue>) ()
#17 0x00007151ec37e08c in nix::InstallableValueCommand::run(nix::ref<nix::Store>, nix::ref<nix::Installable>) ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixcmd.so
#18 0x00007151ec3a559c in nix::InstallableCommand::run(nix::ref<nix::Store>) ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixcmd.so
#19 0x00007151ec37f6c7 in nix::StoreCommand::run() ()
   from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixcmd.so
#20 0x000062e160994bdd in nix::mainWrapped(int, char**) ()
#21 0x00007151ec868fd5 in nix::handleExceptions(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>) () from /nix/store/x74x4sz8ayv0dqc001za5rxxr26l0lmv-nix-2.21-devenv/lib/libnixmain.so
#22 0x000062e1608de2bc in main ()

@pan93412

pan93412 commented Aug 20, 2024

readUnbuffered

I get the same backtrace on macOS:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x0000000194d799b4 libsystem_kernel.dylib`read + 8
    frame #1: 0x00000001054d7f30 libnixutil.dylib`nix::FdSource::readUnbuffered(char*, unsigned long) + 124
    frame #2: 0x00000001054d7e90 libnixutil.dylib`nix::BufferedSource::read(char*, unsigned long) + 184
    frame #3: 0x00000001054d7b20 libnixutil.dylib`nix::Source::operator()(char*, unsigned long) + 56
    frame #4: 0x0000000105768420 libnixstore.dylib`unsigned char nix::readNum<unsigned char>(nix::Source&) + 48
    frame #5: 0x000000010588fdb8 libnixstore.dylib`nix::WorkerProto::Serialise<std::__1::optional<nix::TrustedFlag>>::read(nix::StoreDirConfig const&, nix::WorkerProto::ReadConn) + 44
    frame #6: 0x0000000105842240 libnixstore.dylib`nix::RemoteStore::initConnection(nix::RemoteStore::Connection&) + 644
    frame #7: 0x0000000105850780 libnixstore.dylib`std::__1::__function::__func<nix::RemoteStore::RemoteStore(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>> const&)::$_0, std::__1::allocator<nix::RemoteStore::RemoteStore(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>> const&)::$_0>, nix::ref<nix::RemoteStore::Connection> ()>::operator()() + 40
    frame #8: 0x0000000105843f08 libnixstore.dylib`nix::Pool<nix::RemoteStore::Connection>::get() + 424
    frame #9: 0x0000000105844154 libnixstore.dylib`nix::RemoteStore::setOptions() + 52
    frame #10: 0x0000000104e60904 libnixexpr.dylib`nix::flake::lockFlake(nix::EvalState&, nix::FlakeRef const&, nix::flake::LockFlags const&) + 336
    frame #11: 0x000000010425d14c nix`FlakeCommand::lockFlake() + 88
    frame #12: 0x000000010425bd9c nix`virtual thunk to CmdFlakeUpdate::run(nix::ref<nix::Store>) + 128
    frame #13: 0x000000010529ce94 libnixcmd.dylib`nix::StoreCommand::run() + 64
    frame #14: 0x000000010425a94c nix`virtual thunk to CmdFlake::run() + 76
    frame #15: 0x000000010428b060 nix`nix::mainWrapped(int, char**) + 9044
    frame #16: 0x0000000104c2a344 libnixmain.dylib`nix::handleExceptions(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::function<void ()>) + 368
    frame #17: 0x0000000104292070 nix`main + 196
    frame #18: 0x0000000194a2f154 dyld`start + 2476

It seems the devenv-patched Nix can no longer connect to the remote store?

Frame 6: [screenshot]

Frame 5: [screenshot]

@lf-

lf- commented Aug 20, 2024

Hmmm, yes this might be a protocol violation. I would be surprised if it were a bug in the devenv patched nix version because why on earth would you patch the protocol code? One inference I can draw here is it's likely a protocol desynchronization. I can't remember the protocol context in which the trusted flag is grabbed since my work on the protocol in lix, but it may be a protocol versioning fuck-up somehow.

@delehef

delehef commented Aug 20, 2024

Funnily enough, I confirm that running as root, devenv still works perfectly fine.

@lf-

lf- commented Aug 20, 2024

> Funnily enough, running as root still does not hang, but now gives an HTTP 502 from GitHub when downloading the .tar.gz from cachix/devenv-nixpkgs.

Running as root is expected to not get this problem because it's not using the nix protocol, it scribbles over the store without using the daemon.

Ok so here's the part where you kind of cry: I would advise getting debug info from both the nix client and the nix daemon, then gdb both of them and run them in lock step. Nix protocol desyncs are a nightmare.

@lf-

lf- commented Aug 20, 2024

Is this expected to reproduce on the devenv repo itself, or what is the flake being tested?

@lf-

lf- commented Aug 20, 2024

Update: doesn't matter what the flake is, devenv-nix cannot connect to CppNix 2.24.3 at all.

Reproducer:

nix build -L 'github:nixos/nix/2.24.3#^*' -o nix
nix build -L 'github:domenkozar/nix/devenv-2.21^*' -o nix-devenv
NIX_DAEMON_SOCKET_PATH=$(pwd)/daemon-socket nix/bin/nix daemon --store $(pwd)/store

In another window:

NIX_DEBUG_INFO_DIRS=./nix-devenv-debug/lib/debug gdb --args nix-devenv/bin/nix --store unix://$(pwd)/daemon-socket store ping

This will just get stuck in the same way as suggested above. Seems like a serious regression to me. If it's useful, it seems to desync before daemonNixVersion gets correctly read:

107	        if (GET_PROTOCOL_MINOR(conn.daemonVersion) >= 35) {
(gdb) p conn.daemonNixVersion
$7 = std::optional<std::string> = {[contained value] = ""}

I will also add, I am 90% sure we backported this bug into Lix and since fixed it, and it might be a since-fixed Nix 2.21 bug.

@lf-

lf- commented Aug 20, 2024

Possibly NixOS/nix#9584? Ah no, that was never a bug in the first place. Each side just speaks the version the other end asks back at it, not necessarily the same version, sure, but I don't think that is this bug.

@lf-

lf- commented Aug 20, 2024

OK I have figured out what the problem is: for some reason the nix-devenv Nix is reporting protocol 38, which it absolutely is not compliant with, which is then causing protocol negotiation to explode. Either this is a poorly thought out patch in devenv-nix, or a failure on the part of CppNix to increment protocol versions when they make protocol changes.

I am going to go look at the devenv-nix patches and we are going to find out which one it is.

Consider the client side:

Thread 1 "nix" hit Breakpoint 2, nix::RemoteStore::initConnection (this=0x7b7bf0, conn=...) at src/libstore/remote-store.cc:67
67	{
(gdb) n
70	        conn.from.endOfFileError = "Nix daemon disconnected unexpectedly (maybe it crashed?)";
(gdb)
71	        conn.to << WORKER_MAGIC_1;
(gdb)
72	        conn.to.flush();
(gdb) info proc
process 2114582
cmdline = '/nix/store/kanbarzg39qkbq5shvj326c6yxsxkfbv-nix-devenv-2.21.0pre20240614_31b9700/bin/nix --store unix:///home/jade/lix/devenv-haunting/daemon-socket store ping'
cwd = '/home/jade/lix/devenv-haunting'
exe = '/nix/store/kanbarzg39qkbq5shvj326c6yxsxkfbv-nix-devenv-2.21.0pre20240614_31b9700/bin/nix'
(gdb) n
75	            TeeSource tee(conn.from, saved);
(gdb)
73	        StringSink saved;
(gdb)
76	            unsigned int magic = readInt(tee);
(gdb)
73	        StringSink saved;
(gdb)
75	            TeeSource tee(conn.from, saved);
(gdb)
73	        StringSink saved;
(gdb)
75	            TeeSource tee(conn.from, saved);
(gdb)
73	        StringSink saved;
(gdb)
76	            unsigned int magic = readInt(tee);
(gdb)
77	            if (magic != WORKER_MAGIC_2)
(gdb) n
87	        conn.from >> conn.daemonVersion;
(gdb)
88	        if (GET_PROTOCOL_MAJOR(conn.daemonVersion) != GET_PROTOCOL_MAJOR(PROTOCOL_VERSION))
(gdb) p conn.daemonVersion
$10 = 294
(gdb) p/x conn.daemonVersion
$11 = 0x126
(gdb) n
90	        if (GET_PROTOCOL_MINOR(conn.daemonVersion) < 10)
(gdb)
92	        conn.to << PROTOCOL_VERSION;
(gdb)
94	        if (GET_PROTOCOL_MINOR(conn.daemonVersion) >= 14) {
(gdb)
96	            conn.to << 0;
(gdb) n
364	    return sink;
(gdb) n
99	        if (GET_PROTOCOL_MINOR(conn.daemonVersion) >= 11)
(gdb) n
100	            conn.to << false; // obsolete reserveSpace
(gdb) n
102	        if (GET_PROTOCOL_MINOR(conn.daemonVersion) >= 33) {
(gdb) n
103	            conn.to.flush();
(gdb) n
104	            conn.daemonNixVersion = readString(conn.from);
(gdb) list -
89	            throw Error("Nix daemon protocol version not supported");
90	        if (GET_PROTOCOL_MINOR(conn.daemonVersion) < 10)
91	            throw Error("the Nix daemon version is too old");
92	        conn.to << PROTOCOL_VERSION;
93	
94	        if (GET_PROTOCOL_MINOR(conn.daemonVersion) >= 14) {
95	            // Obsolete CPU affinity.
96	            conn.to << 0;
97	        }
98
(gdb) list +
99	        if (GET_PROTOCOL_MINOR(conn.daemonVersion) >= 11)
100	            conn.to << false; // obsolete reserveSpace
101	
102	        if (GET_PROTOCOL_MINOR(conn.daemonVersion) >= 33) {
103	            conn.to.flush();
104	            conn.daemonNixVersion = readString(conn.from);
105	        }
106	
107	        if (GET_PROTOCOL_MINOR(conn.daemonVersion) >= 35) {
108	            conn.remoteTrustsUs = WorkerProto::Serialise<std::optional<TrustedFlag>>::read(*this, conn);

And the server side:

Thread 1 "nix" hit Breakpoint 1, nix::WorkerProto::BasicServerConnection::handshake (to=..., from=..., localVersion=localVersion@entry=294,
    supportedFeatures=std::set with 0 elements) at src/libstore/worker-protocol-connection.cc:191
191	    if (magic != WORKER_MAGIC_1)
(gdb) n
193	    to << WORKER_MAGIC_2 << localVersion;
(gdb)
194	    to.flush();
(gdb)
195	    auto clientVersion = readInt(from);
(gdb) n
200	    std::set<WorkerProto::Feature> clientFeatures;
(gdb) n
197	    auto protoVersion = std::min(clientVersion, localVersion);
(gdb) p clientVersion
$1 = 294
(gdb) p/x clientVersion
$2 = 0x126
(gdb) p/x localVersion
$3 = 0x126
(gdb) n
200	    std::set<WorkerProto::Feature> clientFeatures;
(gdb)
197	    auto protoVersion = std::min(clientVersion, localVersion);
(gdb)
200	    std::set<WorkerProto::Feature> clientFeatures;
(gdb)
201	    if (GET_PROTOCOL_MINOR(protoVersion) >= 38) {
(gdb) p 0x26
$4 = 38
(gdb) n
202	        clientFeatures = readStrings<std::set<WorkerProto::Feature>>(from);
(gdb) list -
187	    WorkerProto::Version localVersion,
188	    const std::set<WorkerProto::Feature> & supportedFeatures)
189	{
190	    unsigned int magic = readInt(from);
191	    if (magic != WORKER_MAGIC_1)
192	        throw Error("protocol mismatch");
193	    to << WORKER_MAGIC_2 << localVersion;
194	    to.flush();
195	    auto clientVersion = readInt(from);
196	
(gdb) list +
197	    auto protoVersion = std::min(clientVersion, localVersion);
198	
199	    /* Exchange features. */
200	    std::set<WorkerProto::Feature> clientFeatures;
201	    if (GET_PROTOCOL_MINOR(protoVersion) >= 38) {
202	        clientFeatures = readStrings<std::set<WorkerProto::Feature>>(from);
203	        to << supportedFeatures;
204	        to.flush();
205	    }
206	
(gdb)
legend: -> means C to S; <- means S to C

-> WORKER_MAGIC_1
S: (ok)
<- WORKER_MAGIC_2 localVersion (0x100 | 38)
C: (read magic) (read conn.daemonVersion)
-> PROTOCOL_VERSION (0x100 | 38, I assume)
S: (read clientVersion)
-> 0 (obsolete cpu affinity)
-> false (obsolete reserveSpace)
C: ATTEMPT <- daemonNixVersion (string)
S: ATTEMPT <- readStrings clientFeatures (WHAT?!)
S: ATTEMPT -> supportedFeatures

Server here *should* be eating the nonsense ints (cpu affinity, reserveSpace)
that the client is sending it, but is instead attempting to do a feature
negotiation.
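
The version word that both sides print in the gdb sessions above (294, or 0x126) can be unpacked by hand. A minimal Python sketch, assuming the high byte carries the major version and the low byte the minor, as the `0x100 | 38` notation in the legend suggests. The helper names are made up for illustration and are not Nix's API:

```python
def protocol_major(version: int) -> int:
    """High byte of the version word (assumed layout)."""
    return (version & 0xFF00) >> 8


def protocol_minor(version: int) -> int:
    """Low byte of the version word (assumed layout)."""
    return version & 0x00FF


daemon_version = 294  # conn.daemonVersion as printed in the client-side gdb session
print(hex(daemon_version), protocol_major(daemon_version), protocol_minor(daemon_version))
# prints: 0x126 1 38
```

Under that reading, both the devenv client and the 2.24.3 daemon advertise minor version 38, which is why the daemon takes the `>= 38` feature-exchange branch.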

@lf-

lf- commented Aug 20, 2024

This is a broken patch to devenv-nix: domenkozar/nix@537b7de is not safe to pick without the other protocol changes made prior to it, which seem to be missing as observed above (the client should be sending client features).
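
To make the desynchronization easier to follow, here is a toy replay of the message order from the legend in the previous comment. It is a simplified model, not Nix code: the magic numbers are placeholders, the wire framing (a count or length word followed by items) is an assumption, and the final step, where the server expects the two obsolete words, is inferred from the remark that the server "should be eating the nonsense ints". All function names are hypothetical.

```python
from collections import deque

MAGIC_1, MAGIC_2 = 0xA1, 0xA2  # placeholders, not the real magic numbers
VERSION = 0x100 | 38           # both sides advertise minor 38 in the gdb sessions


class WouldBlock(Exception):
    """A read on an empty channel; on a real socket this would block forever."""


def read_int(chan):
    if not chan:
        raise WouldBlock
    return chan.popleft()


def read_string(chan):
    # Assumed framing: a length word followed by that many character words.
    length = read_int(chan)
    return "".join(chr(read_int(chan)) for _ in range(length))


def read_string_set(chan):
    # Assumed framing: a count word followed by that many strings.
    count = read_int(chan)
    return {read_string(chan) for _ in range(count)}


def replay():
    c2s, s2c = deque(), deque()  # client-to-server and server-to-client channels
    log = []

    c2s.append(MAGIC_1)                      # -> WORKER_MAGIC_1
    assert read_int(c2s) == MAGIC_1
    s2c.extend([MAGIC_2, VERSION])           # <- WORKER_MAGIC_2 localVersion
    assert read_int(s2c) == MAGIC_2
    daemon_version = read_int(s2c)
    c2s.extend([VERSION, 0, 0])              # -> version, cpu affinity, reserveSpace

    # Server: negotiated minor is 38, so it expects a feature set next.
    client_version = read_int(c2s)
    if min(client_version, daemon_version) & 0xFF >= 38:
        features = read_string_set(c2s)      # consumes the cpu-affinity 0 as "count = 0"
        log.append(f"server read client features: {sorted(features)}")
        s2c.append(0)                        # <- supportedFeatures (empty set)

    # Client: knows nothing about features and reads daemonNixVersion next.
    log.append(f"client read daemonNixVersion: {read_string(s2c)!r}")

    try:
        read_int(s2c)                        # client: trusted flag
    except WouldBlock:
        log.append("client blocks reading the trusted flag")
    try:
        read_int(c2s)                        # server: first obsolete word (assumed)
        read_int(c2s)                        # server: second obsolete word (assumed)
    except WouldBlock:
        log.append("server blocks waiting for more client data")
    return log


for line in replay():
    print(line)
```

In this model the client ends up with an empty `daemonNixVersion` and then waits on the trusted flag, which would be consistent with the `""` value and the `Serialise<std::optional<TrustedFlag>>::read` frame seen earlier in the thread. It is one plausible reading of the traces, not a confirmed account of the daemon's behavior.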

@pan93412

pan93412 commented Aug 21, 2024

cc @domenkozar Would you mind investigating this issue? I've noticed that even the latest Nix 2.23 (and, I suspect, 2.24) does not support garbage collecting a closure, as reported in this issue. It might be safe to remove this feature temporarily.

@domenkozar
Member

Going to look into this today/tomorrow. Thanks a lot @lf-

@roberth
Contributor

roberth commented Aug 22, 2024

Alternative solution strategies are

  • rebase all customizations onto a released version of Nix, or onto a recent version of the lazy-trees branch
    • depending on the amount of customization, this might actually be an easy solution
    • only after making sure it doesn't exhibit the same problem
  • or manually audit that all protocol changes have been applied in full
  • use an old version of Nix as a daemon proxy. This is less efficient, but forces the protocol to a known-good version on both sides. This might be possible with the CLI as is, but if you need anything extra, here's an example of a daemon that forwards to any store, including another daemon (it runs in production and removes the "trusted" capability, which you don't need to do)

Due to how protocol versioning works, we cannot provide guarantees about interoperability with non-released versions of Nix, including development branches like lazy-trees.

This may work better for future protocol versions since recently:

However, that still doesn't mean that new operations don't change during development.

@rawkode
Contributor

rawkode commented Aug 23, 2024

Is there any workaround for us to get a working devenv while this is being fixed?

@domenkozar
Member

> Is there any workaround for us to get a working devenv while this is being fixed?

The workaround is to downgrade Nix to 2.23.3
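
For anyone unsure whether their installation falls in the range discussed here, the reports so far can be summarized in a small lookup. This is only a sketch built from comments in this thread (devenv 1.0.8 against a multi-user Nix daemon), not an authoritative compatibility list; the function names are hypothetical, and the input is assumed to look like the `nix --version` output shown earlier.

```python
import re

# Reports collected from this thread only. 2.24.3 had one "fixed for me" report,
# but two other users still saw the hang, so it is listed as hanging.
REPORTED_HANGING = {(2, 24, 0), (2, 24, 1), (2, 24, 2), (2, 24, 3)}
REPORTED_WORKING = {(2, 21, 4), (2, 23, 3)}


def parse_nix_version(output: str):
    """Extract (major, minor, patch) from text such as 'nix (Nix) 2.24.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in {output!r}")
    return tuple(int(part) for part in match.groups())


def thread_report(output: str) -> str:
    """Return what this thread reports for the given `nix --version` output."""
    version = parse_nix_version(output)
    if version in REPORTED_HANGING:
        return "reported hanging"
    if version in REPORTED_WORKING:
        return "reported working"
    return "no report in this thread"


print(thread_report("nix (Nix) 2.24.1"))  # prints: reported hanging
```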

@rawkode
Contributor

rawkode commented Aug 23, 2024

Silly me! I didn't even realise I could do that!

# NixOS
nix = {
  package = pkgs.nixVersions.nix_2_23;
};

Just in case anyone else wasn't aware either 👍🏻

@ShaneMurphy2

> Silly me! I didn't even realise I could do that!
>
> # NixOS
> nix = {
>   package = pkgs.nixVersions.nix_2_23;
> };
>
> Just in case anyone else wasn't aware either 👍🏻

Nix newbie here: does this only work on NixOS? Otherwise, do I have to re-run the installer and specify an older version somehow?

@rfhayashi

> Nix newbie here: does this only work on NixOS? Otherwise, do I have to re-run the installer and specify an older version somehow?

Yep, you have to uninstall (https://nix.dev/manual/nix/2.24/installation/uninstall) and install again using sh <(curl -L https://releases.nixos.org/nix/nix-2.23.3/install) --daemon

wiedzmin added a commit to wiedzmin/nixos-config that referenced this issue Aug 28, 2024
`latest` makes "devenv" hang forever for some reason, see
cachix/devenv#1364 for details,
while more dated versions either lack essential features,
including experimental ones, or works overall strangely.
@dabrowski-adam

>> Nix newbie here: does this only work on NixOS? Otherwise, do I have to re-run the installer and specify an older version somehow?
>
> Yep, you have to uninstall (https://nix.dev/manual/nix/2.24/installation/uninstall) and install again using sh <(curl -L https://releases.nixos.org/nix/nix-2.23.3/install) --daemon

Determinate Systems has also released a new version of their installer with downgraded Nix.

https://github.com/DeterminateSystems/nix-installer/releases/tag/v0.23.0

(Note: the older install can be removed using /nix/nix-installer uninstall)

This was referenced Sep 10, 2024
@domenkozar domenkozar unpinned this issue Sep 11, 2024
@pan93412

[screenshot]

Can confirm it has been fixed. Thanks @domenkozar !!

@Swoorup

Swoorup commented Sep 14, 2024

Still broken for me (darwin)

[screenshot]

Actually, never mind. The first init just took a very long time.

@RadxaYuntian

I'm on Arch and their latest Nix package as of today is causing this issue for me. I decided to install Lix instead and that works fine.
