Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🏗 Retry tests before failing them on Travis #22210

Merged
merged 1 commit into from
May 9, 2019
Merged

🏗 Retry tests before failing them on Travis #22210

merged 1 commit into from
May 9, 2019

Conversation

rsimha
Copy link
Contributor

@rsimha rsimha commented May 9, 2019

During CI builds, we run ~15000 tests on Travis.

  • ~9500 unit tests (local)
  • ~3700 unit tests (sauce labs)
  • ~350 integration tests (local)
  • ~270 integration tests (local, single pass)
  • ~600 integration tests (sauce labs)
  • ~350 integration tests (sauce labs, beta browsers)

Often times, a single-digit number of tests will fail due to timing issues, and the Travis job needs to be retried to get a green build. This is frustrating for developers, and expensive due to wasted resources.

In this PR, we change the default number of test retries from 0 to 2 for Travis CI runs.

  • This will be a no-op for most tests, but will greatly reduce the incidence of one or two flaky tests bringing down an entire Travis CI build.
  • This will also be a no-op for genuine, repeatable test failures, for which we are guaranteed a failed build.

This should result in less random flakiness on Travis, and increase our trust in CI builds.

Partial fix for #14360

@rsimha
Copy link
Contributor Author

rsimha commented May 9, 2019

Tested this in both local and Travis modes by running an always failing test and making sure it runs once in local mode, and 3 times in Travis mode.

Copy link
Member

@danielrozenberg danielrozenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixed feelings about this... this doesn't reduce flakiness, it just hides it. Ideally, we should have some system in place to notify the build-cop when specific tests fail-retry-pass. By merging this PR, we're hiding the bad tests and risking shipping unstable code in the long run

@rsimha
Copy link
Contributor Author

rsimha commented May 9, 2019

I agree, mostly.

Right now, there is a huge cost to development because PR authors have to rerun jobs due to flaky tests that are unrelated to their code changes. This PR will reduce that cost.

Meanwhile, it's true that our current build-cop workflow does little to address the root cause of flaky tests other than to disable them. We have numerous tests that rely on external services. A more concerted effort to make all unit tests hermetic will go a long way in weeding out flakiness. Let's discuss this approach offline.

Copy link
Collaborator

@estherkim estherkim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: @danielrozenberg's comment, I think this is fine. It automates the retry mechanism that developers end up doing themselves - I can see this saving a lot of time.

@danielrozenberg
Copy link
Member

Still, I'm worried we're doing a set-it-and-forget-it on something that should not be forgotten

@rsimha
Copy link
Contributor Author

rsimha commented May 9, 2019

Still, I'm worried we're doing a set-it-and-forget-it on something that should not be forgotten

See this line from my previous reply.

A more concerted effort to make all unit tests hermetic will go a long way in weeding out flakiness. Let's discuss this approach offline.

@rsimha
Copy link
Contributor Author

rsimha commented May 9, 2019

Discussed offline, merging this for now while we search for ways to make our tests more hermetic and more isolated from one another.

@rsimha rsimha merged commit 4a58111 into ampproject:master May 9, 2019
@rsimha rsimha deleted the 2019-05-08-RetryOnTravis branch May 9, 2019 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants