Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST, CI: add ARM64 Graviton 2 to CI #2956

Merged

Conversation

tylerjereddy
Copy link
Member

  • start testing MDAnalysis on ARM64 using
    new AWS Graviton 2 architecture available
    in Travis CI

  • testing on my fork shows only two failures
    in the MDAnalysis test suite using a fairly
    minimal set of dependencies; we can probably
    either fix those or skip them and open an issue

  • the test failures are:

  1. test_written_remarks_property
  2. TestEncoreClustering.test_clustering_three_ensembles_two_identical
  • run time for this CI entry is about
    23 minutes, which is solid for ARM; we save a lot
    of time using experimental upstream wheels in some
    cases and excluding (source builds of) dependencies
    where possible until the ARM64 binary wheel ecosystem
    matures a bit more

@codecov
Copy link

codecov bot commented Sep 27, 2020

Codecov Report

Merging #2956 into develop will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #2956   +/-   ##
========================================
  Coverage    93.05%   93.05%           
========================================
  Files          186      186           
  Lines        24611    24611           
  Branches      3188     3188           
========================================
  Hits         22902    22902           
  Misses        1661     1661           
  Partials        48       48           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update beb5232...1deaff1. Read the comment docs.

@tylerjereddy
Copy link
Member Author

Looks like subtle behavior differences in MDAnalysis/analysis/encore/clustering/src/ap.c on the platforms of the only test that really matters here (the other is already known to be a issue on other plaforms so we can probably xfail the remarks one on arm64 with the hypothesis case problems there).

Looking at the first few lines of a diff command on the debug output from that C file when running the test in the absence of scikit-learn on both platforms:

-6.07   -5.59   -5.17   -4.75   -4.33   -3.97   -3.56   -3.24 | -6.07   -5.59   -5.17   -4.75   -4.33   -3.97   -3.56   -3.24
tmp -0.20                                                     | tmp -0.21
tmp -0.20                                                     | tmp -0.21
-0.54   -0.46   -0.96   -1.42   -1.90   -2.33   -2.75   -3.14 | -0.54   -0.46   -0.96   -1.42   -1.90   -2.33   -2.75   -3.14
0.46    -0.46   -0.96   -1.42   -1.90   -2.33   -2.75   -3.14 | 0.46    -0.46   -0.96   -1.42   -1.90   -2.33   -2.75   -3.14
tmp -4.44                                                     | tmp -4.45
tmp -4.44                                                     | tmp -4.45

@tylerjereddy
Copy link
Member Author

tylerjereddy commented Oct 13, 2020

A bunch of float -> double changes in that source file allows the test to pass on an ARM64 gcc compile farm node; I'll see if I can reduce the size of the diff/changes.

@tylerjereddy
Copy link
Member Author

Perhaps also useful to know if we really need float for "performance." Using double is often preferable anyway. Perhaps I could add a platform-conditional remap in the worst case.

@pep8speaks
Copy link

pep8speaks commented Oct 14, 2020

Hello @tylerjereddy! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 268:80: E501 line too long (81 > 79 characters)
Line 271:80: E501 line too long (81 > 79 characters)

Line 357:21: E127 continuation line over-indented for visual indent

Comment last updated at 2020-10-18 23:56:41 UTC

Copy link
Member Author

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll see if this passes ARM64 CI now..

package/MDAnalysis/analysis/encore/clustering/src/ap.c Outdated Show resolved Hide resolved
package/MDAnalysis/analysis/encore/clustering/src/ap.c Outdated Show resolved Hide resolved
@pytest.mark.xfail((os.name == 'nt'
and sys.maxsize <= 2**32) or
platform.machine() == 'aarch64',
reason="occasional fail on 32-bit windows and ARM")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't actually fail on the gcc 115 gcc compile farm ARM64 machine, but the test is clearly already flaky on some other platforms--maybe depends on character encoding/support stuff?

@tylerjereddy
Copy link
Member Author

hmm, I still need to iterate more it seems; the partial switch to double was harmful for another encore test

@tylerjereddy
Copy link
Member Author

tylerjereddy commented Oct 18, 2020

If I use gcc 4.x on ARM64 the failure is different than the 9.x series version in use in CI (and on the compile farm machine):

E       AssertionError: Unexpected result: <ClusterCollection with 42 clusters>
E       assert 42 == 40
E        +  where 42 = len(<ClusterCollection with 42 clusters>)

The "bad" value changes to 36 with gcc 9.x.

* start testing MDAnalysis on ARM64 using
new AWS Graviton 2 architecture available
in Travis CI

* testing on my fork shows only two failures
in the MDAnalysis test suite using a fairly
minimal set of dependencies; we can probably
either fix those or skip them and open an issue

* the test failures are:
`test_written_remarks_property`
`TestEncoreClustering.test_clustering_three_ensembles_two_identical`

* run time for this CI entry is about
23 minutes, which is solid for ARM; we save a lot
of time using experimental upstream wheels in some
cases and excluding (source builds of) dependencies
where possible until the ARM64 binary wheel ecosystem
matures a bit more
* adjust `CAffinityPropagation()` C-level function to use `double`
instead of `float` in some strategic locations that allow the
`test_clustering_three_ensembles_two_identical()` test to pass
on gcc115 ARM64 machine

* mark `test_written_remarks_property()` as a known failure on
ARM64; it fails in Travis CI on that platform, but not on gcc115
ARM64 machine; that said, this test is already known to be flaky
on Windows 32-bit (may depend on character settings on machine?)
* reduce optimization level of MDAnalysis builds on
ARM64 to avoid problems with `ap.c` causing test failures
on that architecture

* revert any source changes to `ap.c`
@tylerjereddy tylerjereddy force-pushed the treddy_mda_arm_aws_graviton2_ci branch from da244e7 to 1deaff1 Compare October 18, 2020 23:56
@tylerjereddy
Copy link
Member Author

Selectively reducing compiler optimization level to -O1 for ARM64 machines allows the encore tests to pass in my hands. We'll see if the CI agrees.

@tylerjereddy
Copy link
Member Author

@lilyminium @richardjgowers Can you live with the ARM64 shims here? CI is green now; the new job still runs in ~23 minutes.

I think we'll be able to bump up the optimization level a bit---O1 was my second try, but -O2 might be possible. We can always make that improvement later though.

@IAlibay
Copy link
Member

IAlibay commented Oct 19, 2020

It's interesting that the tests still run in a reasonably short amount of time even at O1. Even though they output incorrect values, are they substantially faster at 02/03?

It would be good to eventually work out if we can isolate the opt flag that's causing this. If anything it would help us understand how/if the code can be rewritten in a safer manner.

@tylerjereddy
Copy link
Member Author

are they substantially faster at 02/03?

The test suite as a whole in CI is about the same runtime at -O1 on ARM64. You can see by comparing the time I reported for the ARM64 job when I first opened the PR at -O3 vs. current run time after realizing that aggressive optimization was causing problems.

@tylerjereddy
Copy link
Member Author

Some performance loss in the testsuite may also be made up by reduced build time with optimizations turned down. In any case, I think correctness and portability trump the hunger for performance here, especially since I've isolated it to a single architecture.

Maybe open an issue to perfect things on ARM64 eventually, but that will be a separate task I think--the build/iteration times on ARM64 can still be slow on some of the openly available machines, and we're really only using a small subset of dependencies until that ecosystem matures anyway.

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this looks like a good step towards having Arm compatibility, thanks @tylerjereddy !

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work here @tylerjereddy Is the only thing that needs changing before we claim "minimal ARM support" the use of the scipy dev wheels? (mainly thinking as to whether or not a CHANGELOG entry is warranted)

@tylerjereddy
Copy link
Member Author

Yeah, I released SciPy 1.5.3 over the weekend with "official" Linux ARM wheels so I'll try to clean that up in a follow-up

@tylerjereddy tylerjereddy merged commit daee516 into MDAnalysis:develop Oct 19, 2020
@tylerjereddy tylerjereddy deleted the treddy_mda_arm_aws_graviton2_ci branch October 19, 2020 14:57
IAlibay pushed a commit that referenced this pull request Oct 19, 2020
Fixes issue #1989, completes PR #2956

## Work done in this PR

* Switches CI to using recently released SciPy `1.5.3` with "official" Linux ARM64 wheels
over the weekend, instead of the previous "special"/unreleased wheels
* Add a `CHANGELOG` entry to reflect preliminary MDAnalysis support
for the Linux ARM64 platform (minimal dependencies)
lilyminium pushed a commit to lilyminium/mdanalysis that referenced this pull request Dec 8, 2020
* TST, CI: add ARM64 Graviton 2 to CI

* start testing MDAnalysis on ARM64 using
new AWS Graviton 2 architecture available
in Travis CI

* testing on my fork shows only two failures
in the MDAnalysis test suite using a fairly
minimal set of dependencies; we can probably
either fix those or skip them and open an issue

* the test failures are:
`test_written_remarks_property`
`TestEncoreClustering.test_clustering_three_ensembles_two_identical`

* run time for this CI entry is about
23 minutes, which is solid for ARM; we save a lot
of time using experimental upstream wheels in some
cases and excluding (source builds of) dependencies
where possible until the ARM64 binary wheel ecosystem
matures a bit more

* MAINT: ARM64 shims

* adjust `CAffinityPropagation()` C-level function to use `double`
instead of `float` in some strategic locations that allow the
`test_clustering_three_ensembles_two_identical()` test to pass
on gcc115 ARM64 machine

* mark `test_written_remarks_property()` as a known failure on
ARM64; it fails in Travis CI on that platform, but not on gcc115
ARM64 machine; that said, this test is already known to be flaky
on Windows 32-bit (may depend on character settings on machine?)

* MAINT: MR 2956 revisions

* reduce optimization level of MDAnalysis builds on
ARM64 to avoid problems with `ap.c` causing test failures
on that architecture

* revert any source changes to `ap.c`
lilyminium pushed a commit to lilyminium/mdanalysis that referenced this pull request Dec 9, 2020
* TST, CI: add ARM64 Graviton 2 to CI

* start testing MDAnalysis on ARM64 using
new AWS Graviton 2 architecture available
in Travis CI

* testing on my fork shows only two failures
in the MDAnalysis test suite using a fairly
minimal set of dependencies; we can probably
either fix those or skip them and open an issue

* the test failures are:
`test_written_remarks_property`
`TestEncoreClustering.test_clustering_three_ensembles_two_identical`

* run time for this CI entry is about
23 minutes, which is solid for ARM; we save a lot
of time using experimental upstream wheels in some
cases and excluding (source builds of) dependencies
where possible until the ARM64 binary wheel ecosystem
matures a bit more

* MAINT: ARM64 shims

* adjust `CAffinityPropagation()` C-level function to use `double`
instead of `float` in some strategic locations that allow the
`test_clustering_three_ensembles_two_identical()` test to pass
on gcc115 ARM64 machine

* mark `test_written_remarks_property()` as a known failure on
ARM64; it fails in Travis CI on that platform, but not on gcc115
ARM64 machine; that said, this test is already known to be flaky
on Windows 32-bit (may depend on character settings on machine?)

* MAINT: MR 2956 revisions

* reduce optimization level of MDAnalysis builds on
ARM64 to avoid problems with `ap.c` causing test failures
on that architecture

* revert any source changes to `ap.c`
lilyminium pushed a commit to lilyminium/mdanalysis that referenced this pull request Dec 14, 2020
* TST, CI: add ARM64 Graviton 2 to CI

* start testing MDAnalysis on ARM64 using
new AWS Graviton 2 architecture available
in Travis CI

* testing on my fork shows only two failures
in the MDAnalysis test suite using a fairly
minimal set of dependencies; we can probably
either fix those or skip them and open an issue

* the test failures are:
`test_written_remarks_property`
`TestEncoreClustering.test_clustering_three_ensembles_two_identical`

* run time for this CI entry is about
23 minutes, which is solid for ARM; we save a lot
of time using experimental upstream wheels in some
cases and excluding (source builds of) dependencies
where possible until the ARM64 binary wheel ecosystem
matures a bit more

* MAINT: ARM64 shims

* adjust `CAffinityPropagation()` C-level function to use `double`
instead of `float` in some strategic locations that allow the
`test_clustering_three_ensembles_two_identical()` test to pass
on gcc115 ARM64 machine

* mark `test_written_remarks_property()` as a known failure on
ARM64; it fails in Travis CI on that platform, but not on gcc115
ARM64 machine; that said, this test is already known to be flaky
on Windows 32-bit (may depend on character settings on machine?)

* MAINT: MR 2956 revisions

* reduce optimization level of MDAnalysis builds on
ARM64 to avoid problems with `ap.c` causing test failures
on that architecture

* revert any source changes to `ap.c`
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
* TST, CI: add ARM64 Graviton 2 to CI

* start testing MDAnalysis on ARM64 using
new AWS Graviton 2 architecture available
in Travis CI

* testing on my fork shows only two failures
in the MDAnalysis test suite using a fairly
minimal set of dependencies; we can probably
either fix those or skip them and open an issue

* the test failures are:
`test_written_remarks_property`
`TestEncoreClustering.test_clustering_three_ensembles_two_identical`

* run time for this CI entry is about
23 minutes, which is solid for ARM; we save a lot
of time using experimental upstream wheels in some
cases and excluding (source builds of) dependencies
where possible until the ARM64 binary wheel ecosystem
matures a bit more

* MAINT: ARM64 shims

* adjust `CAffinityPropagation()` C-level function to use `double`
instead of `float` in some strategic locations that allow the
`test_clustering_three_ensembles_two_identical()` test to pass
on gcc115 ARM64 machine

* mark `test_written_remarks_property()` as a known failure on
ARM64; it fails in Travis CI on that platform, but not on gcc115
ARM64 machine; that said, this test is already known to be flaky
on Windows 32-bit (may depend on character settings on machine?)

* MAINT: MR 2956 revisions

* reduce optimization level of MDAnalysis builds on
ARM64 to avoid problems with `ap.c` causing test failures
on that architecture

* revert any source changes to `ap.c`
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
Fixes issue MDAnalysis#1989, completes PR MDAnalysis#2956

## Work done in this PR

* Switches CI to using recently released SciPy `1.5.3` with "official" Linux ARM64 wheels
over the weekend, instead of the previous "special"/unreleased wheels
* Add a `CHANGELOG` entry to reflect preliminary MDAnalysis support
for the Linux ARM64 platform (minimal dependencies)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants