scikit: random segmentation faults in hydra #121988

Closed
nbren12 opened this issue May 7, 2021 · 8 comments
Labels
0.kind: bug Something is broken

Comments

@nbren12
Contributor

nbren12 commented May 7, 2021

Describe the bug

Hydra builds for scikitlearn intermittently fail with a segmentation fault on Darwin and Linux.

The issue happens 100% of the time for Darwin, but only rarely for Linux, and occurs about 72% of the way through the test suite. Since it's a segfault, the offending test is not shown. I suspect we'll need to enable verbose logging to figure out which test to disable.

To Reproduce

Unfortunately (?), the builds work locally in a sandbox so this seems specific to the hydra environment, which I have no idea how to debug.

Expected behavior

scikitlearn builds in hydra and is in the cache.

Notify maintainers

@rmcgibbo

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:
@nbren12 nbren12 added the 0.kind: bug Something is broken label May 7, 2021
@nbren12
Contributor Author

nbren12 commented May 7, 2021

From counting "." in the pytest logs, I think the segfault is in the 4313th test in test_common.py. The name of this test is

sklearn/tests/test_common.py::test_estimators[NuSVC()-check_estimators_fit_returns_self(readonly_memmap=True)]

readonly_memmap seems fishy 🐟. Exactly the kind of thing that would raise a segfault.
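
The dot-counting approach can be sketched in Python. This is only an illustration under assumptions: it presumes pytest's default (non-verbose) progress output, where each file's line holds one character per finished test, and the helper names `count_completed` and `suspect_test` are made up for this example. The list of test IDs would have to come from something like `pytest --collect-only -q`, and collection order is assumed to match execution order.

```python
import re

# Characters pytest prints per finished test in non-verbose mode
# (pass, skip, xfail, xpass, fail, error).
OUTCOME_CHARS = set(".sxXFE")
# Trailing progress indicator such as "[ 72%]".
PROGRESS_SUFFIX = re.compile(r"\s*\[\s*\d+%\]\s*$")


def count_completed(log_text: str, test_file: str) -> int:
    """Count tests of `test_file` that reported an outcome before the log ends."""
    total = 0
    in_file = False
    for raw in log_text.splitlines():
        line = PROGRESS_SUFFIX.sub("", raw).strip()
        if line.startswith(test_file + " "):
            # First progress line for this file: outcomes follow the path.
            in_file = True
            outcomes = line[len(test_file):].strip()
        elif in_file and line and set(line) <= OUTCOME_CHARS:
            # Wrapped continuation line made only of outcome characters.
            outcomes = line
        else:
            in_file = False
            continue
        total += sum(ch in OUTCOME_CHARS for ch in outcomes)
    return total


def suspect_test(collected_ids: list[str], completed: int) -> str:
    """Return the first collected test that never reported an outcome."""
    return collected_ids[completed]
```

If `count_completed` returns `n`, the suspect is the test at zero-based index `n` in the collected list, that is, the `(n + 1)`-th test of that file.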

@rmcgibbo
Contributor

rmcgibbo commented May 7, 2021

Good find with the '.' counting! Were you able to determine if it happens only on certain hydra machines? They might be old machines missing certain x86_64 ISA features?

@nbren12
Contributor Author

nbren12 commented May 7, 2021

I'm not too familiar with hydra. I really just figured out what URL to check for mac packages.

@rmcgibbo
Contributor

rmcgibbo commented May 7, 2021

This would be a great ZHF: #122042 for someone.

@tricktron
Member

Seems to be the same problem as scikit-learn/scikit-learn#17582.

@TomDLT wrote: May be related to misaligned arrays in joblib joblib/joblib#563
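
To illustrate what "misaligned" means here, the sketch below maps a file of float64 values read-only at a byte offset that is not a multiple of 8, so the array's data pointer is not 8-byte aligned. It is not a reproduction of the joblib issue or of the segfault; the file layout and the helper name `memmap_alignment` are assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def memmap_alignment(offset: int) -> tuple[int, bool]:
    """Map 4 float64 values stored `offset` bytes into a file, read-only.

    Returns (data address modulo 8, writeable flag).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"\0" * offset)  # padding that shifts the payload
            f.write(np.arange(4, dtype=np.float64).tobytes())
        arr = np.memmap(path, dtype=np.float64, mode="r",
                        offset=offset, shape=(4,))
        result = (arr.ctypes.data % 8, bool(arr.flags.writeable))
        del arr  # drop the mapping before the directory is removed
    return result
```

Comparing `memmap_alignment(0)` with `memmap_alignment(3)` shows an aligned and a misaligned read-only mapping respectively; code that assumes 8-byte alignment for doubles may misbehave on the second kind.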

@rmcgibbo
Contributor

It seems like an imminent fix for that issue in joblib is not in the cards, so I'd say that we should disable that test and include a comment that points to the relevant GitHub issues.
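
One way to disable such tests at the pytest level is a `-k` expression. The sketch below builds one and approximates its effect on test IDs with plain case-insensitive substring matching. It is an illustration under assumptions, not the change that was made in nixpkgs, and `deselect_expression` and `is_selected` are hypothetical helpers.

```python
def deselect_expression(*keywords: str) -> str:
    """Build a pytest -k expression that deselects tests matching all keywords."""
    return "not (" + " and ".join(keywords) + ")"


def is_selected(test_id: str, *keywords: str) -> bool:
    """Approximate -k matching: a test is dropped only if every keyword
    occurs in its ID (recent pytest versions match case-insensitively)."""
    lowered = test_id.lower()
    return not all(k.lower() in lowered for k in keywords)
```

For example, `deselect_expression("NuSVC", "memmap")` yields `not (NuSVC and memmap)`, which could be passed as `pytest -k "not (NuSVC and memmap)"`; other estimators' memmap checks stay selected.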

@tricktron
Member

@rmcgibbo I agree.

@nbren12
Contributor Author

nbren12 commented May 12, 2021

@rmcgibbo I just pushed a PR (#122687) fixing this.

@rmcgibbo rmcgibbo removed their assignment May 12, 2021
risicle pushed a commit to risicle/nixpkgs that referenced this issue May 25, 2021
* Disable all tests of the NuSVC estimator that use memmap'd data
* build in serial on darwin

Resolves NixOS#121988

(cherry picked from commit cb2891b)
jonringer pushed a commit that referenced this issue May 26, 2021
* Disable all tests of the NuSVC estimator that use memmap'd data
* build in serial on darwin

Resolves #121988

(cherry picked from commit cb2891b)

3 participants