Tweak Travis to use GCE #28500

alexcrichton · 2015-09-18T19:58:09Z

Travis CI has new infrastructure using the Google Compute Engine which has both
faster CPUs and more memory, and we've been encouraged to switch as it should
help our build times! The only downside currently, however, is that IPv6 is
disabled, causing a number of standard library tests to fail.

Consequently this commit tweaks our travis config in a few ways:

* ccache is disabled as it's not working on GCE just yet
* Docker is used to run tests inside which reportedly will get IPv6 working
* A system LLVM installation is used instead of building LLVM itself. This is
  primarily done to reduce build times, but we want automation for this sort of
  behavior anyway and we can extend this in the future with building from source
  as well if needed.
* gcc-specific logic is removed as the docker image for Ubuntu gives us a
  recent-enough gcc by default.

rust-highfive · 2015-09-18T19:58:13Z

r? @pcwalton

(rust_highfive has picked a reviewer for you, use r? to override)

alexcrichton · 2015-09-18T19:58:38Z

(should hold off on merging until travis actually passes)

cc @joshk, gonna have this be the continuation of #28437

Gankra · 2015-09-18T20:44:23Z

cc me

Gankra · 2015-09-18T20:44:56Z

Maybe I'm misremembering, but I thought we preferred building our patched up LLVM for maximal similarity to the buildbots?

alexcrichton · 2015-09-18T21:25:39Z

Yeah that's nice to have, but we've also gotten more requests recently to ensure we work with a stock build of Travis, and we may want to start gating on Travis CI as well soon (in addition to buildbot). The gating should fix the "accidentally broken" problem and if possible we can also have a build on Travis which builds LLVM from source.

Looks like the travis build still had failures:


failures:

---- [run-pass] run-pass/core-run-destroy.rs stdout ----

error: test run failed!
status: exit code: 101
command: x86_64-unknown-linux-gnu/test/run-pass/core-run-destroy.stage2-x86_64-unknown-linux-gnu 
stdout:
------------------------------------------

running 1 test
test test_destroy_actually_kills ... FAILED

failures:

---- test_destroy_actually_kills stdout ----
    thread 'test_destroy_actually_kills' panicked at 'called `Result::unwrap()` on an `Err` value: Error { repr: Os { code: 13, message: "Permission denied" } }', src/libcore/result.rs:736



failures:
    test_destroy_actually_kills

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured


------------------------------------------
stderr:
------------------------------------------

------------------------------------------

thread '[run-pass] run-pass/core-run-destroy.rs' panicked at 'explicit panic', /build/src/compiletest/runtest.rs:1501


---- [run-pass] run-pass/issue-26468.rs stdout ----

error: test run failed!
status: exit code: 101
command: x86_64-unknown-linux-gnu/test/run-pass/issue-26468.stage2-x86_64-unknown-linux-gnu 
stdout:
------------------------------------------

------------------------------------------
stderr:
------------------------------------------
thread '<main>' panicked at 'assertion failed: `(left == right)` (left: `42`, right: `19`)', /build/src/test/run-pass/issue-26468.rs:37

------------------------------------------

thread '[run-pass] run-pass/issue-26468.rs' panicked at 'explicit panic', /build/src/compiletest/runtest.rs:1501



failures:
    [run-pass] run-pass/core-run-destroy.rs
    [run-pass] run-pass/issue-26468.rs

failures:

---- sys::process::tests::test_process_mask stdout ----
    thread 'sys::process::tests::test_process_mask' panicked at 'called `Result::unwrap()` on an `Err` value: Error { repr: Os { code: 13, message: "Permission denied" } }', src/libcore/result.rs:736



failures:
    net::udp::tests::bind_error
    process::tests::signal_reported_right
    sys::process::tests::test_process_mask

Retrying with some more diagnostics and hopefully some fixes.

alexcrichton · 2015-09-19T00:09:41Z

@joshk interestingly it looks like all our calls to the kill syscall are failing with EPERM. You wouldn't happen to have seen this before, would you?

alexcrichton · 2015-09-19T00:10:33Z

Oh wait travis-ci/travis-ci#4751 indicates that --privileged is a thing to docker, let's try that!
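For readers following along, here is a minimal sketch of how a test could report this kind of failure instead of panicking on `unwrap()`. It is not code from this PR: the `diagnose` helper is hypothetical, and the sketch assumes a `sleep` binary is on `PATH`.

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper (not part of the PR): turn an error from a process
/// operation into a hint about where to look.
fn diagnose(err: &io::Error) -> String {
    match err.kind() {
        io::ErrorKind::PermissionDenied => format!(
            "permission denied (raw os error: {:?}); the environment may be restricting this call",
            err.raw_os_error()
        ),
        _ => format!("unexpected error: {}", err),
    }
}

fn main() {
    // Assumption: a `sleep` binary is available, as on most Unix systems.
    let mut child = match Command::new("sleep").arg("5").spawn() {
        Ok(child) => child,
        Err(e) => {
            eprintln!("spawn failed: {}", diagnose(&e));
            return;
        }
    };
    // Report a failed kill instead of unwrapping, so the log shows the cause.
    if let Err(e) = child.kill() {
        eprintln!("kill failed: {}", diagnose(&e));
    }
    // Reap the child so it does not linger as a zombie.
    let _ = child.wait();
}
```

In an unrestricted environment this prints nothing; in a sandbox that blocks the call, the message points at the environment rather than at the test.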

joshk · 2015-09-20T21:54:46Z

Is it working better now?

joshk · 2015-09-20T21:55:10Z

Also, I see some things we can do to improve this further :)

alexcrichton · 2015-09-21T06:42:17Z

@joshk yeah so far looking so good, the IPv6 tests are passing in the docker container and the only remaining failure is because there's a known bug in LLVM 3.6 which causes our tests to fail, so I just need to figure out how to install LLVM 3.7 instead!

joshk · 2015-09-21T10:17:00Z

@alexcrichton I would suggest creating a new Docker image (e.g. something like rust-base) which has all the deps you need already preinstalled for testing purposes. This would mean all that needs to be done is for Travis to clone the repo, pull the Docker image, and then run the tests.

alexcrichton · 2015-09-21T22:20:36Z

Huzzah! That did the trick! So, to summarize the changes here to get the build working:

* I added a Dockerfile to src/etc which is used by the Travis bots to build an image inside of which all tests are run. This is built via docker build in a before_install step on Travis.
* I fixed a UDP test to work even if the tests are run as root (which happens here apparently)
* A few process-related tests were touched up in terms of style and what they do, but nothing major was changed. Without the --privileged flag to Docker we can't use the kill syscall, and I had to do some diagnosing to figure this out, but it ended up "just fixing" all the tests once this flag was passed. To be clear, I have no idea what this flag does beyond make our tests start passing.
* Turns out there were a few straggler commits in LLVM 3.7 we didn't pick up recently, so I updated our fork of LLVM to use the true upstream 3.7 release and tweaked one of our bindings as well.
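The UDP fix itself is not shown in this thread. As an illustration only: a test that expects a bind to fail for lack of permission stops being meaningful once the suite runs as root. One privilege-independent way to provoke a bind error is to bind to an address the host does not own. The address below is an assumption (192.0.2.1 is reserved for documentation by RFC 5737), and `bind_non_local` is a hypothetical name.

```rust
use std::io;
use std::net::UdpSocket;

/// Try to bind to an address this host is assumed not to own.
/// Assumption: 192.0.2.1 (RFC 5737 documentation range) is not configured locally.
fn bind_non_local() -> io::Result<UdpSocket> {
    UdpSocket::bind("192.0.2.1:9999")
}

fn main() {
    match bind_non_local() {
        // Would only happen if the host really has this address configured.
        Ok(_) => println!("unexpectedly bound to a non-local address"),
        // The failure does not depend on whether we are running as root.
        Err(e) => println!("bind failed as expected: {:?}", e.kind()),
    }
}
```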

So I'm pretty comfortable with this:

* I like the idea of us having a higher timeout by default here so we're not "sketchily getting past timeouts"
* I like docker to start building from an "absolutely clean slate" to keep our dependencies under control
* I like docker as it's relatively easy to reproduce failures locally
* I like having some form of CI for using system LLVM as we want this regardless
* This opens up the door to running a lot more tests on Travis (yay!)

So all in all, r? @brson

joshk · 2015-09-21T22:29:04Z

Wow, nice!

Two quick questions:

* Does using a higher -j value in make improve the build time at all?
* Can the build be split up so that you can use different jobs to run different parts of the test suite? (thus reducing the overall build time)

alexcrichton · 2015-09-21T23:09:49Z

> Does using a higher -j value in make improve the build time at all?

How many cores do these machines have? e.g. is there a recommended -j for us to use?

> Can the build be split up so that you can use different jobs to run different parts of the test suite? (thus reducing the overall build time)

The current way our makefiles are set up doesn't make this super easy, but logically this is totally plausible. We've got one ~30min build to produce a compiler followed by N test suites (which greatly vary in size), but all of the N test suites can be run in parallel (just gotta make sure they're all run).

joshk · 2015-09-21T23:15:48Z

The current instances use 2 CPUs, but playing around with a higher -j value might pay off, or might not.

As for the test suite break up, this is cool to know. This would mean, at least for now, that each Job would also need to produce a new compiler. We have plans to improve this, but this might be some work which we include in next year and work together on.

alexcrichton · 2015-09-21T23:20:27Z

Ah ok, I think that @gankro played around with different values of -j in the past and didn't see much benefit beyond 2 (but went with 4 to be safe). Also we'd totally be willing to tweak our makefiles to conform to whatever framework is in place to run tests in parallel, I have a feeling it'd definitely speed up our builds!
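A small sketch of the arithmetic behind picking a `-j` value, assuming the job count is simply the CPU count times an oversubscription factor. The `make_jobs_flag` helper and the factor of 2 are illustrative assumptions, not something the build scripts in this PR are known to do.

```rust
use std::thread;

/// Hypothetical helper: build the `-jN` flag for make from a CPU count and
/// an oversubscription factor, never going below one job.
fn make_jobs_flag(cpus: usize, factor: usize) -> String {
    let jobs = cpus.saturating_mul(factor).max(1);
    format!("-j{}", jobs)
}

fn main() {
    // Fall back to a single CPU if the count cannot be determined.
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("make {} check", make_jobs_flag(cpus, 2));
}
```

With the 2 CPUs mentioned above and a factor of 2, this yields `-j4`, the value the thread says was chosen "to be safe".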

joshk · 2015-09-21T23:24:17Z

You can use Env vars to break up your build into N many jobs. But if we can partner on some work, then we can look into build pipeline support for Travis, allowing you to build a compiler once instead of N times. And maybe we can partner this year on the ability to opt in for larger VMs?

alexcrichton · 2015-09-22T00:14:27Z

That sounds like it'd work for us! We'd love to put more of our testing on travis, so we'd basically be producing N different compiler configurations, each of which needs to run M test suites, so we could manually encode the NxM matrix into .travis.yml but for now we'll probably stick to just N lines :). If we could encode into our configuration, however, N+M steps (e.g. how to build a compiler and then how to test any compiler) that'd be awesome!

I'm sure we'd also be more than willing to help out wherever possible. Ideally we'd be able to move off buildbot completely, but we're probably a ways out from that!
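To make the N×M versus N+M comparison concrete, here is a rough sketch. The configuration and suite names are placeholders chosen for illustration, and the functions are hypothetical; only the counting follows from the discussion above (a flattened matrix rebuilds the compiler in every job, while a pipeline would build each configuration once).

```rust
/// One job in a flattened matrix: a compiler configuration paired with a test suite.
#[derive(Debug, Clone, PartialEq)]
struct Job {
    config: String,
    suite: String,
}

/// Enumerate every (configuration, suite) pair, as a manually encoded matrix would.
fn matrix_jobs(configs: &[&str], suites: &[&str]) -> Vec<Job> {
    configs
        .iter()
        .flat_map(|c| {
            suites.iter().map(move |s| Job {
                config: c.to_string(),
                suite: s.to_string(),
            })
        })
        .collect()
}

/// In a flattened matrix every job builds its own compiler: N * M builds.
fn matrix_compiler_builds(n: usize, m: usize) -> usize {
    n * m
}

/// With a build-then-test pipeline, N build steps and M test steps are described.
fn pipeline_step_definitions(n: usize, m: usize) -> usize {
    n + m
}

fn main() {
    // Placeholder names, not the project's actual configurations or suites.
    let configs = ["system-llvm", "debug"];
    let suites = ["run-pass", "compile-fail", "std"];
    let jobs = matrix_jobs(&configs, &suites);
    println!("matrix jobs: {}", jobs.len());
    println!(
        "compiler builds in matrix: {}",
        matrix_compiler_builds(configs.len(), suites.len())
    );
    println!(
        "pipeline step definitions: {}",
        pipeline_step_definitions(configs.len(), suites.len())
    );
}
```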

alexcrichton · 2015-09-22T05:35:38Z

@joshk hm interesting, looks like the recent build we added timed out (enabling debug assertions and debuginfo in the compiler itself). It also looks like the normal build took ~15min longer than usual (perhaps normal?), in light of that should we keep trying to optimize our build, or would it be possible to increase the timeout a bit more? It's totally reasonable to say a 3hr timeout is a bit unreasonable :)

joshk · 2015-09-22T13:30:57Z

I can increase it to 3hrs if you like, so you can test further, but this means, of the 5 jobs you can run at once, you could have the queue blocked for 3 hours at a time due to long running jobs.

alexcrichton · 2015-09-22T20:41:47Z

@joshk hm ok, if it's alright we'll take the higher timeout for now and probably investigate how to parallelize more or just run fewer tests on our end.

joshk · 2015-09-27T23:30:50Z

Done!

(sorry for the wait)

brson · 2015-09-28T20:20:31Z

r=me

alexcrichton · 2015-09-28T20:42:51Z

Hm, still waiting on a successful run from travis, haven't gotten one from the debug builder yet...

alexcrichton · 2015-09-29T01:19:40Z

OK, looks like we may not be able to run the debug builder on travis (just takes too long), so just updated back to only using the system LLVM + make check, and let's see how far we get!

alexcrichton · 2015-09-29T15:05:55Z

@bors: r+ 0f09984

bors · 2015-09-29T17:56:05Z

⌛ Testing commit 0f09984 with merge 759bf1c...

bors · 2015-09-29T19:28:27Z

💔 Test failed - auto-linux-64-x-android-t

tamird · 2015-09-29T19:32:34Z

http://buildbot.rust-lang.org/builders/auto-linux-64-x-android-t/builds/6536/steps/test/logs/stdio

failure was signal_reported_right

Weird, that test is marked #[cfg(all(unix, not(target_os="android")))]

alexcrichton · 2015-09-29T23:56:54Z

@bors: r=brson 27dd6dd

Oh that's because I switched it to cfg(unix), oops!

bors · 2015-09-30T01:21:56Z

⌛ Testing commit 27dd6dd with merge 15db6ec...

tamird · 2015-09-30T02:36:25Z

src/libstd/process.rs

    #[cfg(all(unix, not(target_os="android")))]
-    #[test]


@alexcrichton did you mean to remove this?

alexcrichton · 2015-10-19

Gah oops! I did indeed not mean to!
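A short sketch of why both attributes matter here. The `cfg` gate decides whether the function is compiled for a target at all, and `#[test]` is what makes the harness run it; with `#[test]` dropped, the function still compiles but silently never executes. The function names and the trivial test body are placeholders, not the real test from src/libstd/process.rs.

```rust
/// Mirrors the gate discussed above: true on Unix targets other than Android.
fn signal_test_enabled() -> bool {
    cfg!(all(unix, not(target_os = "android")))
}

// Compiled only where the gate holds, and run by the test harness only
// because `#[test]` is present. Widening the gate to `cfg(unix)` would also
// compile (and run) this on Android.
#[cfg(all(unix, not(target_os = "android")))]
#[test]
fn signal_reported_right_sketch() {
    // Placeholder body; the real test exercises process signals.
    assert!(signal_test_enabled());
}

fn main() {
    println!("signal test enabled on this target: {}", signal_test_enabled());
}
```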

bors · 2015-09-30T03:20:14Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-x-android-t, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-win-gnu-32-nopt-t, auto-win-gnu-32-opt, auto-win-gnu-64-nopt-t, auto-win-gnu-64-opt, auto-win-msvc-32-opt, auto-win-msvc-64-opt

killercup · 2015-09-30T13:41:24Z

src/etc/Dockerfile

+RUN apt-get -y --force-yes install llvm-3.7-tools
+
+RUN mkdir /build
+WORKDIR /build


@alexcrichton IIRC, each RUN creates a new file system layer. Combining the RUNs into one (just using &&) might speed up your docker build.

If this image is cached anywhere (which might make sense), you might also want to append some clean-up calls to the RUN calling apt-get to reduce the size. I'm no authority on this, but I've often seen stuff like apt-get autoremove -y && apt-get clean all && rm -rf /var/lib/apt/lists/*.

alexcrichton · 2015-09-30

Good point! Right now I don't think that this is anywhere near the limiting factor of our builds, however, so it may not be too bad one way or the other.

If building an image ends up taking too long in the future we'll probably want to just send it up to the hub and download it from there, but hopefully it won't be taking too too long!

rust-highfive assigned pcwalton Sep 18, 2015

alexcrichton mentioned this pull request Sep 18, 2015

lets try this on the new Travis setup #28437

Closed

alexcrichton force-pushed the docker-travis branch from 27da22e to c4c61e7 Compare September 18, 2015 21:25

alexcrichton force-pushed the docker-travis branch from c4c61e7 to a1e8c82 Compare September 19, 2015 00:10

alexcrichton force-pushed the docker-travis branch 3 times, most recently from cb4dd2d to a6b9fe8 Compare September 19, 2015 19:28

alexcrichton force-pushed the docker-travis branch from a6b9fe8 to 8fc41a3 Compare September 21, 2015 06:41

alexcrichton force-pushed the docker-travis branch 2 times, most recently from f02f8ff to b284636 Compare September 21, 2015 18:44

rust-highfive assigned brson and unassigned pcwalton Sep 21, 2015

edunham mentioned this pull request Sep 22, 2015

Automatically run grammar tests (grammar bot) #28592

Closed

alexcrichton force-pushed the docker-travis branch from c96a9ad to 0f09984 Compare September 29, 2015 01:19

alexcrichton force-pushed the docker-travis branch from 0f09984 to 27dd6dd Compare September 29, 2015 23:56

tamird reviewed Sep 30, 2015

bors merged commit 27dd6dd into rust-lang:master Sep 30, 2015

killercup reviewed Sep 30, 2015

wthrowe mentioned this pull request Oct 3, 2015

Build fails against LLVM 3.7.0 #28830

Closed

arcnmx mentioned this pull request Oct 19, 2015

Add missing #[test] attribute to test #29158

Merged

steveklabnik added a commit to steveklabnik/rust that referenced this pull request Oct 20, 2015

Rollup merge of rust-lang#29158 - arcnmx:process-test, r=alexcrichton

c393108

This test was mysteriously messed with as part of rust-lang#28500 r? @alexcrichton

steveklabnik added a commit to steveklabnik/rust that referenced this pull request Oct 20, 2015

Rollup merge of rust-lang#29158 - arcnmx:process-test, r=alexcrichton

b314f84

This test was mysteriously messed with as part of rust-lang#28500 r? @alexcrichton

alexcrichton deleted the docker-travis branch October 21, 2015 06:16

tamird Sep 30, 2015

alexcrichton Oct 19, 2015

killercup Sep 30, 2015

alexcrichton Sep 30, 2015