
Hydra: nixos/release-20.03 and unstable fails to evaluate #79907

Closed
FRidh opened this issue Feb 12, 2020 · 45 comments
Labels: 0.kind: bug (Something is broken), 0.kind: regression (Something that worked before no longer works), 1.severity: blocker (This is preventing another PR or issue from being completed), 1.severity: channel blocker (Blocks a channel)

FRidh (Member) commented Feb 12, 2020

Describe the bug
The nixos/release-20.03 jobset fails to evaluate:

hydra-eval-jobs returned signal 9:
(no output)

I've tried several times to trigger an evaluation, yet every time it fails.

cc @Disasm @worldofpeace @grahamc @vcunat

@FRidh FRidh added 0.kind: bug Something is broken 1.severity: channel blocker Blocks a channel labels Feb 12, 2020
@FRidh FRidh added this to the 20.03 milestone Feb 12, 2020
vcunat (Member) commented Feb 12, 2020

The last few weeks felt like we're slowly making the big nixos eval too expensive (again). Maybe not just nixos, as I've seen an increase in out-of-memory failures also in jobs like tarball, but perhaps it was just a feeling, as I see no significant increase in these graphs: https://hydra.nixos.org/job/nixpkgs/trunk/metrics#tabs-charts

Disasm commented Feb 12, 2020

cc @disassembler

worldofpeace (Contributor)

I noticed this as well; we can't open ZHF until there's an eval on the jobset.

@worldofpeace worldofpeace pinned this issue Feb 12, 2020
grahamc (Member) commented Feb 12, 2020

The thinking from Eelco is the growth of NixOS tests is causing memory pressure problems. Each VM in the tests adds a few hundred MB of RAM consumption for hydra's evaluator.

worldofpeace (Contributor) commented Feb 12, 2020

These got added in 7a625e7 and bf49181.

grahamc (Member) commented Feb 12, 2020

It feels bad to be "within 5 tests" of being unable to move forward. :(

grahamc (Member) commented Feb 12, 2020

To clarify, @edolstra's short-term suggestion is to remove some of the tests. For example, those key map tests were commented out for a very long time. It would be sad to drop them again, but it may be the best short-term solution. Long-term, there is a branch for a more precise GC, and possibly some optimisation work which could be done in how NixOS is evaluated.

But I don't know if either of these longer-term options is possible today.

That said, I'm 100% not the right person for this problem, and possibly @LnL7, @samueldr, @fpletz, or @Ma27 have advice on how to tune hydra's evaluator.

flokli (Contributor) commented Feb 12, 2020

@grahamc does it just run out of memory, or why does it fail to evaluate?

LnL7 (Member) commented Feb 12, 2020

This was for different reasons, but I've been tracking the stdenv requisite size for quite a while now. It could be totally unrelated, but that size had a rather large jump recently.

[chart: linux-requisites (stdenv requisite size over time)]

Update: this was between fa74455 and d453c2f. Most likely the libidn2 change, at first glance.

flokli (Contributor) commented Feb 12, 2020

@LnL7 could we bisect around that?

nixos-discourse (bot)

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-20-03-feature-freeze/5655/32

LnL7 (Member) commented Feb 12, 2020

Does anybody know why this only occurs for 20.03 and not trunk-combined? Evaluation for those should be equivalent (except for stableBranch, but that is/should be purely metadata).

andir (Member) commented Feb 12, 2020

I've seen killed trunk-combined tasks earlier today while trying to trigger an eval.

LnL7 (Member) commented Feb 12, 2020

@flokli In 447edaa it looks like python (and not a minimal build) was introduced into the stdenv. A minimal python would bring it down from ~270 to ~240 MB.

flokli (Contributor) commented Feb 12, 2020

If that's the case, we might just want to remove that reference; I don't really see a reason why python should become part of glibc's runtime closure.

vcunat (Member) commented Feb 12, 2020

I'm not sure how the closure sizes are relevant to this thread, but I can't see a significant increase in the (runtime) closure size of the stdenv output path on x86_64-linux (and python is not there).

jonringer (Contributor) commented Feb 12, 2020

stdenv size didn't change much:

[13:37:37] jon@jon-workstation ~/projects/nixpkgs (master)
$ nix path-info -Sh ./result
/nix/store/5gc1hyqbxwfwcw7l1bs7gy6rw9zbnc09-stdenv-linux	 231.6M
[13:39:35] jon@jon-workstation ~/projects/nixpkgs (release-19.09)
$ nix path-info -Sh ./result
/nix/store/qghrkvk86f9llfkcr1bxsypqbw1a4qmw-stdenv-linux	 224.4M

and python is not in the runtime closure:

[13:40:47] jon@jon-workstation ~/projects/nixpkgs (master)
$ nix-store -q --tree ./result | grep python
[13:40:58] jon@jon-workstation ~/projects/nixpkgs (master)

LnL7 (Member) commented Feb 12, 2020

@vcunat It could be something totally different, but given that nixos instances evaluate pkgs multiple times, it's something that increases evaluation cost for each test.

jonringer (Contributor) commented Feb 12, 2020

I did notice that the hydra jobsets for "trunk" now take over 100 seconds to evaluate, where they used to be significantly lower when I first started watching hydra more than 6 months ago.

FRidh (Member, author) commented Feb 13, 2020

The evaluator dies with hydra-eval-jobs returned signal 9, but random builds also fail with signal 9. Would the evaluator kill remote jobs when it runs out of memory? Or could those be builds that happen to run on the evaluator?
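[Editor's note] For context on that "signal 9": it is SIGKILL, which is the signal the kernel OOM killer sends when a host exhausts memory, so processes killed this way get no chance to report anything (hence the "(no output)" above). A minimal sanity check of the numbering (the journalctl command in the comment is a typical way to confirm an OOM kill on the host, not something taken from this thread):

```python
import signal

# "returned signal 9" means the process was SIGKILLed -- the signal the
# kernel OOM killer uses. Confirm the numbering on Linux:
print(signal.SIGKILL == 9)  # prints True on Linux

# On the evaluator host, one would typically confirm an OOM kill with
# the kernel log, e.g.:
#   journalctl -k | grep -i 'out of memory'
```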

vcunat (Member) commented Feb 13, 2020

No, I believe there are no such connections.

@worldofpeace worldofpeace added the 1.severity: blocker This is preventing another PR or issue from being completed label Feb 13, 2020
vcunat added a commit that referenced this issue Feb 13, 2020
It's a temporary measure until we have better ways.  See #79907.
(Not a real revert, as the comment wouldn't make sense, etc.)
@vcunat vcunat mentioned this issue Feb 13, 2020
edolstra (Member)

Ouch, having glibc depend on python is really unfortunate.

vcunat (Member) commented Feb 13, 2020

It was an upstream decision to use python in the build process (a build-time-only dependency). I don't think we can do much about that. EDIT: using some minimal python could be nice, though.

flokli (Contributor) commented Feb 14, 2020

@vcunat you could probably switch that occurrence to python3Minimal, introduced in #66762, which should have a smaller build and runtime closure, if you don't rely on things like libreadline or ssl support.

vcunat (Member) commented Feb 14, 2020

OK, I submitted #80112, but I still can't see how it's relevant to this thread.

LnL7 (Member) commented Feb 14, 2020

Based on the gc stats from nix, the memory needed to evaluate e.g. hello increased from 26 MB to 29 MB with the glibc update (this has now doubled compared to 18.03, btw). By itself this isn't a big deal, since it's a flat cost per architecture. However, that's not the case for nixos instances, since each test imports its own instance of nixpkgs.
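[Editor's note] A rough back-of-the-envelope check shows these numbers are mutually consistent; the test count below is an assumption for illustration (nixpkgs had on the order of a few hundred NixOS tests at the time), not a figure from this thread:

```python
# Per-evaluation overhead from the glibc update, per LnL7's gc stats:
per_eval_increase_mb = 29 - 26  # ~3 MB

# Assumed test count (order of magnitude only). Each NixOS test imports
# its own instance of nixpkgs, so the per-eval overhead is paid per test.
num_tests = 300

total_increase_mb = per_eval_increase_mb * num_tests
print(total_increase_mb)  # 900 -- inside the observed 600 MB to 1.5 GB range
```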

I can't evaluate everything on my machine with the current settings, but evaluating just the tests seems to use between 600 MB and 1.5 GB more before reverting that commit. With the way evaluation currently works, that's a problem if it bumps up the memory usage enough to require a larger heap.

I don't know how much memory the hydra evaluator has available, but with GC_INITIAL_HEAP_SIZE=20G both 20.03 and older releases evaluated without issues. The larger heap size does result in higher average memory usage, however, which might be a problem for concurrent evaluations.
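[Editor's note] The workaround above amounts to an environment setting: GC_INITIAL_HEAP_SIZE is read by the Boehm garbage collector that Nix links against, and sizing the heap up front avoids repeated collect-and-grow cycles during a large evaluation. A sketch (the actual hydra-eval-jobs invocation and its arguments are elided):

```shell
# Give the Boehm collector a 20 GB initial heap so the evaluator does not
# repeatedly trigger collections while growing the heap mid-evaluation.
export GC_INITIAL_HEAP_SIZE=20G

# Then run the evaluator under that environment, e.g. (arguments elided):
#   hydra-eval-jobs ...
echo "GC_INITIAL_HEAP_SIZE=$GC_INITIAL_HEAP_SIZE"
```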

vcunat (Member) commented Feb 14, 2020

If I read it correctly, using python3Minimal recovers only a small fraction of this increase.

LnL7 (Member) commented Feb 14, 2020

Yeah, I'm not sure there's a good solution for this, other than trying to reduce the memory usage "enough", short of more fundamental changes.

I took a quick look at the evaluation for tests. This probably isn't the right place to change, and I think it would break tests that use overlays as well as multiple architectures. But something similar might work to reduce the overhead for tests quite significantly.

diff --git a/nixos/lib/build-vms.nix b/nixos/lib/build-vms.nix
index 1bad63b9194..8da2504bea9 100644
--- a/nixos/lib/build-vms.nix
+++ b/nixos/lib/build-vms.nix
@@ -36,6 +36,7 @@ rec {
       baseModules =  (import ../modules/module-list.nix) ++
         [ ../modules/virtualisation/qemu-vm.nix
           ../modules/testing/test-instrumentation.nix # !!! should only get added for automated test runs
+          { key = "nixpkgs-pkgs"; nixpkgs.pkgs = pkgs; }
           { key = "no-manual"; documentation.nixos.enable = false; }
           { key = "qemu"; system.build.qemu = qemu; }
           { key = "nodes"; _module.args.nodes = nodes; }
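[Editor's note] For readers unfamiliar with the option being injected: `nixpkgs.pkgs` makes a NixOS configuration reuse an already-evaluated package set instead of importing nixpkgs from scratch, which is where the memory saving comes from. Roughly (a hand-written illustration, not code from nixpkgs):

```nix
# Without nixpkgs.pkgs, each test node evaluates something like
#   pkgs = import <nixpkgs> { inherit system; config = ...; }
# i.e. a fresh copy of the whole package set per node, per test.
#
# With the patch, each node instead receives:
{ key = "nixpkgs-pkgs"; nixpkgs.pkgs = pkgs; }
# so all nodes share the single `pkgs` the test framework already
# evaluated -- at the cost of ignoring per-node overlays and nixpkgs
# config, which is the breakage concern raised above.
```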

vcunat (Member) commented Feb 14, 2020

> However that's not the case for nixos instances, since each test imports its own instance of nixpkgs.

My reading of that part is that pkgs is passed through and not re-imported.

The idea for VM tests seems intriguing. Overlays appear to be considered, at a quick glance.

vcunat (Member) commented Feb 14, 2020

I tried your patch with evaluation of just a pair of tests at once, and it decreased gc.totalBytes by ~22%.
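[Editor's note] For anyone reproducing this kind of measurement: gc.totalBytes is among the evaluator statistics Nix prints when run with NIX_SHOW_STATS=1, and the percentage is just the relative change between two runs. Illustrative arithmetic with invented example values (not measurements from this thread):

```python
# Invented values standing in for gc.totalBytes from two NIX_SHOW_STATS
# runs, before and after applying the patch:
total_bytes_before = 10_000_000_000
total_bytes_after = 7_800_000_000

# Relative decrease, as a percentage:
decrease_pct = (1 - total_bytes_after / total_bytes_before) * 100
print(round(decrease_pct))  # 22
```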

LnL7 (Member) commented Feb 14, 2020

Yeah, I linked the wrong thing.

> Overlays appear considered at a quick glance.

That looks promising; threading through pkgs for the correct system, instead of just pkgs (which is always x86_64-linux), to build-vms might be an option then. I won't have time to look into this further for a few days, however.

vcunat (Member) commented Feb 14, 2020

I don't know these parts of the code well, but I looked around and I still can't see any problem with that patch. I tried it on Hydra, but it's still getting killed: https://hydra.nixos.org/jobset/nixos/nixos-test-expensive-eval (2/2 eval attempts killed)

@grahamc grahamc changed the title Hydra: nixos/release-20.03 fails to evaluate Hydra: nixos/release-20.03 and unstable fails to evaluate Feb 14, 2020
@bjornfor bjornfor added the 0.kind: regression Something that worked before working no longer label Feb 15, 2020
vcunat (Member) commented Feb 15, 2020

When I restricted it to just x86_64-linux, it succeeded on the second attempt. I'm hopeful about using this approach for now. Note that 20.03 was also created just for x86_64-linux and couldn't get an evaluation even after cutting some tests in ceb90b0... at least until a while ago (not sure what's changed).

Therefore I still expect that the patch helped significantly; I'd still check the diff in test failures before using it for real.

grahamc (Member) commented Feb 15, 2020

It looks like Eelco has whipped up a miracle and got evaluations passing, and in less time too.

vcunat (Member) commented Feb 15, 2020

Bought a better server? :-) In any case, it will be nice to know how he managed it, as this is a never-ending problem. EDIT: I suspect it was some kind of cheating, as we no longer have the aggregate tested job, in either trunk-combined or release-20.03.

For long-term solutions of RAM consumption I have high hopes for NixOS/hydra#715

jonringer (Contributor)

> Evaluation 1570647 of jobset nixos:nixos-test-expensive-eval
>
> This evaluation was performed on 2020-02-15 00:59:23. Fetching the dependencies took 3s and evaluation took 1109s.

Oof, ~20 minutes for an eval. That's rough.

vcunat (Member) commented Feb 15, 2020

That seems a quite normal number, IIRC (for our big jobsets like trunk-combined).

vcunat (Member) commented Feb 17, 2020

OK, let me ask explicitly about that miracle: how are channels going to work when we have no tested job anymore? Perhaps I just don't understand the intentions.

edolstra (Member)

The tested job is back (it was never gone, but it did have an evaluation error). We'll need to backport 2de3caf and 8950429 to the 20.03 branch.

vcunat (Member) commented Feb 17, 2020

Great ❤️ I pushed 20.03 backports.

I believe the issue is fixed and shouldn't re-appear anytime soon. Possible TODOs:

  • backport to 19.09. It will probably keep evaluating without it, but we could make it cheaper (for the several remaining months). It surely doesn't apply cleanly, but it should be a mechanical change.
  • still consider the approach from LnL7; perhaps we can get even better performance thanks to that.

nixos-discourse (bot)

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-20-03-beta/5935/1

nixos-discourse (bot)

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/firefox-not-up-to-date/5941/2

vcunat (Member) commented Feb 18, 2020

No good; even the small channels are blocked now: NixOS/hydra#715 (comment)

@vcunat vcunat reopened this Feb 18, 2020
vcunat (Member) commented Feb 20, 2020

Resolved, and today all channels even got updated.

@vcunat vcunat closed this as completed Feb 20, 2020
nixos-discourse (bot)

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-20-03-beta/5935/7
