18.09 Zero Hydra Failures #45960

Closed
samueldr opened this issue Sep 2, 2018 · 88 comments

@samueldr
Member

samueldr commented Sep 2, 2018

Let's make Jellyfish the best release so far!

We have: the main jobset starting at 425 failures, x86_64-darwin at 829, and aarch64-linux at ~1120. The numbers may seem large, but one appropriate fix may resolve many failures at once.

How can I help???

The remaining packages will be marked as broken before the release (on the failing platforms), i.e. at the end of September. /cc @NixOS/nixpkgs-committers, but everyone can help out!

@samueldr samueldr added this to the 18.09 milestone Sep 2, 2018
@worldofpeace worldofpeace mentioned this issue Sep 3, 2018
9 tasks
danieldk added a commit to danieldk/nixpkgs that referenced this issue Sep 3, 2018
Keras expects keras_preprocessing 1.0.2 and 1.0.4. 1.0.3 and 1.0.5
are respectively in nixpkgs.

ZHF NixOS#45960
@symphorien symphorien mentioned this issue Sep 3, 2018
9 tasks
@xeji
Contributor

xeji commented Sep 3, 2018

Reverting ad47c38 will fix nixpkgs.perl*Packages.MouseXGetOpt

reverted in 9889c0f and 4c00a04

xeji pushed a commit that referenced this issue Sep 3, 2018
Keras expects keras_preprocessing 1.0.2 and 1.0.4. 1.0.3 and 1.0.5
are respectively in nixpkgs.

ZHF #45960

(cherry picked from commit e33be2a)
xeji pushed a commit that referenced this issue Sep 3, 2018
Keras expects keras_preprocessing 1.0.2 and 1.0.4. 1.0.3 and 1.0.5
are respectively in nixpkgs.

ZHF #45960
@dywedir dywedir mentioned this issue Sep 3, 2018
9 tasks
@samueldr
Member Author

samueldr commented Sep 4, 2018

Failures report as of right now.

Let's see how useful this is as a format. This was queried from the last finished eval; there were evals running while this was made.

@xeji
Contributor

xeji commented Sep 4, 2018

> This was queried from the last finished eval; there were evals running while this was made.

@volth the table shows an earlier eval where UNIVERSALref wasn't marked as broken yet (see the build logs). It is fine in the latest eval: https://hydra.nixos.org/eval/1477017#tabs-removed

> Or should each dependent have its own `meta.broken = versionAtLeast perl.version "5.26"`?

No, only the package that is broken itself.

@danieldk danieldk mentioned this issue Sep 4, 2018
9 tasks
@vcunat
Member

vcunat commented Sep 4, 2018

I'd add that broken and other checks are transitive during evaluation (implemented as exceptions).
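
To illustrate the point about transitivity, here is a minimal Python sketch (not Nix, and not how nixpkgs is actually implemented): a package marked broken raises an exception when evaluated, and that exception propagates through anything depending on it. The package set, the names `dependent-a` and `dependent-b`, and the helpers `evaluate` and `can_evaluate` are all hypothetical.

```python
class BrokenPackage(Exception):
    """Raised when evaluation reaches a package marked as broken."""


# Hypothetical miniature package set: name -> (broken flag, dependency names).
# Only the package that is itself broken carries the flag.
PACKAGES = {
    "UNIVERSALref": (True, []),
    "dependent-a": (False, ["UNIVERSALref"]),
    "dependent-b": (False, ["dependent-a"]),
    "hello": (False, []),
}


def evaluate(name, packages=PACKAGES):
    """Return the packages visited in dependency order, or raise BrokenPackage."""
    broken, deps = packages[name]
    if broken:
        raise BrokenPackage(name)
    visited = []
    for dep in deps:
        # An exception raised by a dependency propagates up unchanged.
        visited.extend(evaluate(dep, packages))
    visited.append(name)
    return visited


def can_evaluate(name, packages=PACKAGES):
    """True if the package and all of its dependencies evaluate."""
    try:
        evaluate(name, packages)
        return True
    except BrokenPackage:
        return False
```

In this model `can_evaluate("dependent-b")` is false even though `dependent-b` carries no flag of its own, which is why marking only the package that is itself broken is enough.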

@timokau
Member

timokau commented Sep 4, 2018

The sage failures are due to a mistake I made when adding pkg-config aliases to openblas and the recent numpy update. I fixed openblas in #46016 in staging. I don't know if that means it will also be merged into 18.09. I haven't gotten to backporting the numpy upgrade from sage upstream yet.

I really think it is a shame that hydra doesn't ping maintainers on failures anymore. Seems like an essential feature to miss.

@xeji
Contributor

xeji commented Sep 4, 2018

@vcunat @samueldr there are a number of changes currently in staging/staging-next that should go to 18.09 once they reach master - openblas, texlive 2018, a systemd bugfix, etc.
What's the workflow for these? Guess we'll need a staging-18.09 branch + Hydra job.

@vcunat
Member

vcunat commented Sep 4, 2018

Since the fork point the staging branch won't get to 18.09 anymore. Cherry-picking should be done if desired. For this one I did it in 6f8e07a.

@vcunat
Member

vcunat commented Sep 4, 2018

@xeji: we have staging-18.09 already, I think we'll add a jobset soon as well to make ZHF easier. (People don't want to rebuild everything when fixing other packages.) This openblas commit wasn't a mass rebuild according to my measurements, so I pushed it directly.

Reminder: we should choose the changes a bit more carefully than for usual staging, so that real stabilization happens :-)

@xeji
Contributor

xeji commented Sep 4, 2018

8fb90de (in staging-18.09) fixes a regression introduced with systemd 239, and makes the systemd test pass.
Update: Fix is in release-18.09 now, done.

@domenkozar
Member

I've released stack2nix 0.2.1 to address the version bump. Once the Haskell packages are synced, it should compile again.

@xeji
Contributor

xeji commented Sep 4, 2018

boost162 is fixed by c70ff28 and 11c2595 (in staging-18.09, mass rebuild)
Update: fixes are in release-18.09 now, done.

@NeQuissimus
Member

I removed the Copperhead kernel yesterday, so those kernel modules should no longer fail :) The whole kernel is unsupported and unmaintained.

Mic92 pushed a commit that referenced this issue Sep 24, 2018
Recent boost versions name their `python3` shared objects
`boost_python3x` rather than `boost_python3`.

See https://hydra.nixos.org/build/80712295
Addresses #45960

(cherry picked from commit 50f23da)
xeji pushed a commit that referenced this issue Sep 24, 2018
The dependency `distro` was missing.
See https://hydra.nixos.org/build/81330387

Addresses #45960

(cherry picked from commit baa7e52)
xeji pushed a commit that referenced this issue Sep 24, 2018
@xeji
Contributor

xeji commented Sep 24, 2018

sage is still broken on 18.09 (but not on master). ping @timokau

@timokau
Member

timokau commented Sep 27, 2018

@xeji thanks for the heads up. I cannot reproduce a failure with the most recent release-18.09, so that failure is probably outdated.

@vcunat
Member

vcunat commented Sep 29, 2018

@timokau: the newest evaluation failed three times (on different Hydra machines): https://hydra.nixos.org/build/81780926 There's always some "N doctests failed" at the end.

@qknight
Member

qknight commented Sep 29, 2018

You lack an implementation (library, egg) where fib is defined; see the error:

NameError: name 'fib' is not defined

**********************************************************************
File "/nix/store/9nix1fnsj56iqqcjavkar8gbjnic8djm-sage-src-8.3/src/sage/repl/ipython_extension.py", line 403, in sage.repl.ipython_extension.SageMagics.fortran
Failed example:
    fib
Exception raised:
    Traceback (most recent call last):
      File "/nix/store/0bg15dw9whi80i71ldkxkyh1v5ck74lv-python-2.7.15-env/lib/python2.7/site-packages/sage/doctest/forker.py", line 573, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/nix/store/0bg15dw9whi80i71ldkxkyh1v5ck74lv-python-2.7.15-env/lib/python2.7/site-packages/sage/doctest/forker.py", line 983, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.repl.ipython_extension.SageMagics.fortran[3]>", line 1, in <module>
        fib
    NameError: name 'fib' is not defined
**********************************************************************
File "/nix/store/9nix1fnsj56iqqcjavkar8gbjnic8djm-sage-src-8.3/src/sage/repl/ipython_extension.py", line 407, in sage.repl.ipython_extension.SageMagics.fortran
Failed example:
    fib(a, 10)
Exception raised:
    Traceback (most recent call last):
      File "/nix/store/0bg15dw9whi80i71ldkxkyh1v5ck74lv-python-2.7.15-env/lib/python2.7/site-packages/sage/doctest/forker.py", line 573, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/nix/store/0bg15dw9whi80i71ldkxkyh1v5ck74lv-python-2.7.15-env/lib/python2.7/site-packages/sage/doctest/forker.py", line 983, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.repl.ipython_extension.SageMagics.fortran[6]>", line 1, in <module>
        fib(a, Integer(10))
    NameError: name 'fib' is not defined
**********************************************************************
File "/nix/store/9nix1fnsj56iqqcjavkar8gbjnic8djm-sage-src-8.3/src/sage/repl/ipython_extension.py", line 408, in sage.repl.ipython_extension.SageMagics.fortran
Failed example:
    a
Expected:
    array([  0.,   1.,   1.,   2.,   3.,   5.,   8.,  13.,  21.,  34.])
Got:
    array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
**********************************************************************
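
For readers unfamiliar with this doctest, the expected output suggests that `fib(a, 10)` overwrites the array in place with the first ten Fibonacci numbers. Below is a pure-Python sketch of that behaviour, assuming this reading is right. The real example compiles Fortran; this `fib` is only a stand-in and uses a plain list instead of a numpy array.

```python
def fib(a, n):
    """Overwrite the first n entries of a with Fibonacci numbers, in place."""
    for i in range(n):
        if i == 0:
            a[i] = 0.0
        elif i == 1:
            a[i] = 1.0
        else:
            a[i] = a[i - 1] + a[i - 2]


# Start from 0.0 .. 9.0, which is what the "Got:" line shows.
a = [float(i) for i in range(10)]
untouched = list(a)  # what the array looks like if fib never runs

fib(a, 10)
print(a)
```

Since `fib` was never defined in the failing run, the array was left as 0 through 9, which matches the `Got:` line above. The third failure is therefore a consequence of the first two, not a separate bug.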

@timokau
Member

timokau commented Sep 29, 2018

@vcunat I still cannot reproduce that locally. The main error seems to be

File "/nix/store/9nix1fnsj56iqqcjavkar8gbjnic8djm-sage-src-8.3/src/sage/misc/inline_fortran.py", line 65, in sage.misc.inline_fortran.InlineFortran.eval
Failed example:
    fortran(code, globals())
Exception raised:
    Traceback (most recent call last):
      ...
      File "/nix/store/0bg15dw9whi80i71ldkxkyh1v5ck74lv-python-2.7.15-env/lib/python2.7/site-packages/sage/misc/inline_fortran.py", line 125, in eval
        raise RuntimeError("failed to compile Fortran code:\n" + log_string)
    RuntimeError: failed to compile Fortran code:
      ...
      File "/nix/store/0bg15dw9whi80i71ldkxkyh1v5ck74lv-python-2.7.15-env/lib/python2.7/threading.py", line 736, in start
        _start_new_thread(self.__bootstrap, ())
    thread.error: can't start new thread

With the rest being avalanche errors. Could that be some sort of hydra threading issue? Can anybody reproduce the error?

@samueldr
Member Author

samueldr commented Oct 1, 2018

Hmm, the last few times I wanted to generate a report there weren't good evals to check. Now there is one:

It looks like a chunk of the new failures on aarch64 would be fixed with #47564.

@vcunat any blockers for a release you think? I don't have anything in mind, since staging was merged and the last things I knew of were all on staging.

While not strictly ZHF related, the blockers don't seem to be severe enough to warrant blocking the release? Most are either older than this release cycle, or things that don't really block, but still would be great to finalize.

@matthewbauer
Member

It sounds alright to me. Someone had mentioned merging #42846 before the release, but since it's a nonessential module I would consider it nonblocking (although it might be worth backporting).

Make sure you look at the 18.09 milestone PRs though too: https://github.com/NixOS/nixpkgs/pulls?q=is%3Aopen+is%3Apr+milestone%3A18.09

@peterhoeg
Member

I consider #47577 a blocker - apologies for not raising it earlier.

@timokau
Member

timokau commented Oct 1, 2018

It looks like the sage build succeeded, curious.

As for the release, I suggest removing all the blocker labels and bumping the milestones, then giving the people involved at least one day to complain before going ahead with 18.09.

@samueldr
Member Author

samueldr commented Oct 1, 2018

Hey, what do you know: I found a blocker, and I should have known better and known it is one beforehand:

See #47602. While this doesn't block on the technical side, it is a definite blocker on the human side, with a side dish of bad user experience as their first bite into NixOS.

@timokau
Member

timokau commented Nov 7, 2018

In case anybody was on the edge of their seat because of the transient sage failure: it turns out it was caused by a numpy issue on machines with a high CPU count. Since I'm still seeing the issue in master, it's probably still there in release-18.09 too. I've opened a PR (#49888) to cherry-pick the numpy fix to release-18.09. For master it should be good enough to wait for the next numpy upgrade.

@cyounkins
Contributor

Looks like we had a huge bump at https://hydra.nixos.org/eval/1495549

Many are propagated from https://hydra.nixos.org/build/85961701 which failed with "Log limit exceeded" and "building of '/nix/store/5awxqywjwjldazlzls4jslgm1l828hb3-nbd-3.18' killed after writing more than 67108864 bytes of log output" despite not writing 6MB to the log in the web UI. Possible hydra issue? @grahamc
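
As a quick sanity check on the number in that message: 67108864 bytes is exactly 64 MiB, roughly ten times the 6 MB mentioned above. The snippet below is only arithmetic, and `to_mib` is a helper name made up for this example.

```python
LOG_LIMIT_BYTES = 67108864  # figure quoted in the Hydra error message


def to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


limit_mib = to_mib(LOG_LIMIT_BYTES)
print(f"{limit_mib:.0f} MiB")  # prints "64 MiB"
```

A log that shows under 6 MB in the web UI would be well below the quoted limit.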

@vcunat
Member

vcunat commented Dec 30, 2018

Apparently someone has restarted these builds and they succeeded. (Yes, I'm a bit late.)
