Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop PPC builds #191

Closed
wants to merge 4 commits into from
Closed

Drop PPC builds #191

wants to merge 4 commits into from

Conversation

h-vetinari
Copy link
Member

As discussed in #187, the aarch/ppc builds were causing lots of trouble (not least that PPC has been silently failing the test suite for a while, see #186), take forever to build, and have an absurd ratio of maintenance-effort vs. usage numbers (see #171).

We disable them for the time being, in the hope that meson builds (#164) will have less issues. As mentioned in #187, we can create a v1.7 branch if we need to come back to these builds for the 1.7 series.

CC @conda-forge/core @rgommers for visibility.

Closes #187

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@rgommers
Copy link
Contributor

I'm a little surprised about the low number for aarch64, but we still have the 1.7.1 and older packages so if it helps to temporarily drop those builds, it seems fine to me.

recipe/meta.yaml Outdated
# for signature, see https://github.com/scipy/scipy/blob/v1.7.0/scipy/_lib/_testutils.py#L27
- python -c "import scipy; scipy.test(verbose=2, label={{ label }}, tests={{ tests }}, extra_argv=['--durations=50'])"
- python -c "import scipy; scipy.test(verbose=2, label='full', extra_argv=['--durations=50'])"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this leading to a large increase in CI job times?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine for everything but aarch...

@h-vetinari
Copy link
Member Author

I'm a little surprised about the low number for aarch64

Do you mean the download numbers? I took your 👍 from #187 as a sign of support for this, but (aside from the PPC builds not running the test suite; which I had managed to forget myself...), it's not that urgent to act ATM.

@rgommers
Copy link
Contributor

took your 👍 from #187 as a sign of support for this, but (aside from the PPC builds not running the test suite; which I had managed to forget myself...),

I looked more at PPC there; aarch64 we get a lot of questions about. It's 0.3% of downloads here, and I had expected it to be slightly higher. Anyway, I'm happy either way, also dropping aarch64 is okay as long as it's temporary. Although I would like to see that Mamba follow-up on gh-171.

@h-vetinari
Copy link
Member Author

I'm fine to restrict this PR to dropping only PPC for the moment. Would that sound better?

Regarding mamba, it's possible to use for debugging, but not for publishing packages. There's a CEP to use mamba in conda, I'm guessing that's a prerequisite for building with mamba in conda-forge (though I might be wrong on that) - in any case, I'm not aware of a tracking issue for this on conda-forge side.

@jaimergp
Copy link
Member

This was discussed in last core meeting! Maybe we'll see it soon :)

@jakirkham
Copy link
Member

cc @conda-forge/core (for awareness)

@h-vetinari
Copy link
Member Author

@rgommers, aside from mamba-builds not being allowed yet, #192 is still hitting the timeout on aarch, and travis is just failing at the most basic task (downloading sources from github, see conda-forge/conda-forge.github.io#1521).

I think a reasonable approach would be to keep aarch running on azure (mamba or not), but PPC has got to go under the current conditions (especially the state of travis + #186).

only skipping the known slowest upstream tests with label='fast'
@h-vetinari h-vetinari changed the title Drop PPC & aarch builds Drop PPC builds Oct 15, 2021
@h-vetinari
Copy link
Member Author

Interesting... Now we have compilation errors everywhere. It looks like conda-forge/xsimd-feedstock#75 broke scipy (and/or pythran):

/[...]/pythran/pythonic/include/numpy/conjugate.hpp:28:24: error: redefinition of 'template<class T, long unsigned int N> int {anonymous}::pythonic::numpy::wrapper::conjugate(const int&)'
28 |     xsimd::batch<T, N> conjugate(xsimd::batch<T, N> const &v)
   |                        ^~~~~~~~~

CC @conda-forge/xsimd @serge-sans-paille

@jaimergp
Copy link
Member

Check their Gitter. I think it's the same issue: https://gitter.im/QuantStack/Lobby?at=61692853ee6c260cf71ce724

@rgommers
Copy link
Contributor

I think a reasonable approach would be to keep aarch running on azure (mamba or not), but PPC has got to go under the current conditions (especially the state of travis + #186).

Sounds good to me. Thanks for giving mambabuild a go.

@h-vetinari h-vetinari closed this Oct 16, 2021
@h-vetinari h-vetinari reopened this Oct 16, 2021
@h-vetinari
Copy link
Member Author

Pity that even on azure we have to restrict the modules we test on aarch...

In any case, if there are no further comments, I'm going to merge this after 24h.

@jakirkham
Copy link
Member

Did you see this ( https://twitter.com/condaforge/status/1450912470810431490 )?

@h-vetinari
Copy link
Member Author

Did you see this ( https://twitter.com/condaforge/status/1450912470810431490 )?

I didn't, but I already tested mamba in #192, and it doesn't make a difference here, unfortunately.

@jakirkham
Copy link
Member

Is there some release we are trying to get out the door that these build issues are blocking? If not, I'd say let's wait until that situation arises. It's possible we have another solution before then

@h-vetinari
Copy link
Member Author

Is there some release we are trying to get out the door that these build issues are blocking? If not, I'd say let's wait until that situation arises.

There's no release right now, but spuriously green CI is too easy to forget - I had raised #186 here and scipy/scipy#14560 upstream, and then simply didn't have it in mind anymore (e.g. when merging #179). This means that with the current setup, we're constantly merging completely untested and possibly broken PPC artefacts, and I want to remove that footgun.

It's possible we have another solution before then

It's really not a big deal to re-add a CI provider. I'll be happy to do it once a solution comes around.

@jakirkham
Copy link
Member

It doesn't seem like anyone is going to forget this issue (on the contrary people are working very hard behind the scenes to address it). Also it doesn't sound like we have a compelling need for this right now. So let's hold off on this until either there is a pressing need. Sound good? 🙂

@h-vetinari
Copy link
Member Author

h-vetinari commented Oct 22, 2021

It doesn't seem like anyone is going to forget this issue

How often do you inspect nominally "green" CI runs if they really passed? Certainly the bots (e.g. with an automerge-label) don't. The PPC builds currently produced here are known to fail the test suite and should simply not be published. I've only not suggested marking the existing PPC builds as broken because there were no direct complaints, but that's not proof of the absence of issues.

on the contrary people are working very hard behind the scenes to address it

Even if travis were magically fixed, this feedstock still times out there.

Also it doesn't sound like we have a compelling need for this right now

I disagree, because we shouldn't be publishing any more likely-broken builds.

So let's hold off on this until either there is a pressing need. Sound good? 🙂

No, actually, it doesn't sound good to me. I really fail to see the difficulty of raising a PR once a solution becomes available.

We're in this situation because PPC builds silently broke in CI (which is scipy-specific, not applicable to all PPC- or travis-builds), and the reasonable thing is to remove them. And it's IMO equally reasonable that the requirement for PPC builds to be (re-)added is a PR with passing CI.

@h-vetinari
Copy link
Member Author

OK, giving this another 72h (so it doesn't fall on the weekend) after the recent discussions.

I don't want to antagonise anyone here, though I really feel a PR to later re-add PPC (which I'd even do myself once it's clear how!) is not an undue barrier or burden for anyone. I feel even more strongly about removing silently broken builds from CI.

@serge-sans-paille
Copy link

@h-vetinari is pythran responsible in anyway of the instability you mention?

@h-vetinari
Copy link
Member Author

h-vetinari commented Oct 23, 2021

@h-vetinari is pythran responsible in anyway of the instability you mention?

With near certainty, the answer is no. It's a bizarre interaction of the sparse matrix code with PPC (and emulation), that I haven't been able to chase down for a long time (I believe it's a similar problem to one also affects the cvxpy-stack which I co-maintain).

This caused >2000 failures in the scipy test-suite - well, at least before those errors migrated to errors during collection that ended up silently ignored due to some pytest hook-issue.

@isuruf
Copy link
Member

isuruf commented Oct 26, 2021

See #193

@h-vetinari
Copy link
Member Author

Obsoleted (hopefully) by #193

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Disable aarch & ppc builds until new build system arrives
7 participants