Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rerun Failing Tests #882

Closed
mattwynne opened this issue Jul 2, 2015 · 33 comments
Closed

Rerun Failing Tests #882

mattwynne opened this issue Jul 2, 2015 · 33 comments
Labels
good first issue Good for newcomers 🚦 needs tests Needs tests ⚡ enhancement Request for new functionality
Milestone

Comments

@mattwynne
Copy link
Member

e.g. http://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html

@mattwynne mattwynne added 🚦 needs tests Needs tests ⚡ enhancement Request for new functionality good first issue Good for newcomers labels Jul 2, 2015
@mattwynne mattwynne added this to the 3.0 milestone Jul 2, 2015
@danascheider
Copy link
Contributor

Anybody working on this?

@richarda
Copy link
Contributor

richarda commented Sep 5, 2015

I started, but unfortunately I haven't been able to make time for it
recently. I'd be happy if someone else did, but if not, I'll check in next
week.

On Saturday, September 5, 2015, Dana Scheider [email protected]
wrote:

Anybody working on this?


Reply to this email directly or view it on GitHub
#882 (comment)
.

@danascheider
Copy link
Contributor

I think I can work on this. I'll take a look at your PR, @richarda, and go from there.

@danascheider
Copy link
Contributor

BTW, if anybody has any ideas on how to simulate a flaky test in the features, I'd be glad to hear them. It's trivial to make the feature under test fail or pass at random, but I'm still working out how to make it fail, then pass, as needed for the test. @richarda, I'm using pretty much your same feature file at this point. Any thoughts?

@mattwynne
Copy link
Member Author

You could use a global variable in the step def, a bit like we do here: https://github.com/cucumber/cucumber-ruby/blob/master/features/docs/cli/randomize.feature

@danascheider
Copy link
Contributor

Thanks, @mattwynne, I'll go ahead and do it that way.

@danascheider
Copy link
Contributor

Any objections to my implementing this as a formatter?

@brasmusson
Copy link
Contributor

I think the event-API should be used (which @mattwynne change the fail-fast formatter to use a296821, #903). It is intended both for internal clients and external clients like user defined filters. Today filters generally live in Cucumber::Filters, maybe we should create Cucumber::Listeners as a home for event listeners?

@mattwynne
Copy link
Member Author

A simplistic solution would be to make this both a filter and an event listener - listen for failed test cases, them pump them back through the filter API for re-evaluation.

Where this falls down is that, presumably, we'll want to be able to report a test-case as passed if it's passed within the specified number of re-runs. To do that we'd need to reach deeper into things. I think this might need changes within the core to do it properly.

@mattwynne
Copy link
Member Author

@danascheider I think the place to start is with a failing feature anyway.

@mattwynne
Copy link
Member Author

On 28 Sep 2015, at 07:47, Björn Rasmusson [email protected] wrote:
maybe we should create Cucumber::Listeners as a home for event listeners?

Honestly, I don’t particularly like this kind of “layering by category” as a way to organise things. It’s too solution-domain focused and can break cohesion. I can imagine lots of different uses for event listeners, and things that listen for events may have other responsibilities too. Let’s not pre-judge yet.

You’re now making me think I want to get rid of the Filters namespace!

@danascheider
Copy link
Contributor

I have failing features on my fork already if you want me to make a PR. I went ahead and wrote them with the same idea that @richarda used, which was that test cases that passed within the specified number of retries were considered passing, but the output noted that they were flakes by saying "1 passing, 1 flake". The disadvantage of that is that it's a little unclear whether it's talking about two test cases or one, so one alternative would be to consider flakes separately from either passes or failures, but still exit with status 0. I'm open to suggestions.

On Sep 28, 2015, at 1:15 AM, Matt Wynne [email protected] wrote:

On 28 Sep 2015, at 07:47, Björn Rasmusson [email protected] wrote:
maybe we should create Cucumber::Listeners as a home for event listeners?

Honestly, I don’t particularly like this kind of “layering by category” as a way to organise things. It’s too solution-domain focused and can break cohesion. I can imagine lots of different uses for event listeners, and things that listen for events may have other responsibilities too. Let’s not pre-judge yet.

You’re now making me think I want to get rid of the Filters namespace!

Reply to this email directly or view it on GitHub.

@danascheider
Copy link
Contributor

Incidentally, after working on this more, I also came to the conclusion that a formatter is not the best way to do this, and was also tending more toward a filter or event listener. I should have commented to that effect, sorry! And @mattwynne, it is starting to look to me as well that implementing this "right" would involve some changes to core, although probably not very dramatic ones. In any case, if you think of a better implementation that hasn't been discussed here, let me know and I'll be happy to incorporate it.

@brasmusson
Copy link
Contributor

One option about the summary output is to use the same principle as the --wip option. For instance if only one scenario was executed first failing and the passing when rerun, the summary output would be something like:

2 scenarios (1 passed, 1 failed)
<step and time report>

All failing scenarios did pass when rerun (rerun was not attempted more the x times)

where "x" is the parameter for the maximum number of rerun attempts.

@mattwynne
Copy link
Member Author

Deciding on the output is tricky. I think that the implementation will be enough to focus on for now, so I suggest that for this first iteration, we just stick with the current formatter output.

For example:

Background:
  Given a scenario "Flakey" that fails once, then passes
  And a scenario "Shakey" that fails twice, then passes
  And a scenario "Solid" that passes

Scenario:
  When I run `cucumber --retry 2`
  Then it should pass with:
    """
    3 scenarios (2 passed, 1 failed)
    """

@brasmusson
Copy link
Contributor

I also think that focusing on the implementation is enough for now. I think that (the core of) the runner should be ignorant of the fact that the same test case is executed again (and again), it should just get the next test case to be executed and run it. Some other part (maybe in cucumber-ruby-core, maybe in cucumber-ruby) should be responsible to detect failures and trigger reruns.

With some relation to the thoughts about the future of reporting (cucumber-attic/gherkin#12 (comment)), from which I get the idea of a primary (json) report, from which other report are created, I think that the first level of reporting should be very much WYSIWWE ("What You See Is What Was Executed"). In case of the example above I would expect the events/formatter-calls:

before_test_case name="Flakey"
after_test_case name="Flakey" result="Failed"
before_test_case name="Flakey"
after_test_case name="Flakey" result="Passed"
before_test_case name="Shakey"
after_test_case name="Shakey" result="Failed"
before_test_case name="Shakey"
after_test_case name="Shakey" result="Failed"
before_test_case name="Shakey"
after_test_case name="Shakey" result="Passed"
before_test_case name="Solid"
after_test_case name="Solid" result="Passed"

Maybe I interpret --retry 2 differently - as try two times more after the first run - than in the example above.

@danascheider
Copy link
Contributor

@brasmusson - I was interpreting --retry 2 the same way you do, as two extra tries, not counting the original run.

@mattwynne
Copy link
Member Author

You're right! I wasn't thinking straight. So it should be:

Background:
  Given a scenario "Flakey" that fails once, then passes
  And a scenario "Shakey" that fails twice, then passes
  And a scenario "Solid" that passes

Scenario:
  When I run `cucumber --retry 1`
  Then it should fail with:
    """
    3 scenarios (2 passed, 1 failed)
    """

Scenario:
  When I run `cucumber --retry 2`
  Then it should pass with:
    """
    3 scenarios (3 passed)
    """

I think you can do this with a filter that listens for events, and pumps failed test cases back into the pipeline within the retry limit.

Make sense @danascheider?

@mattwynne
Copy link
Member Author

Interesting UI here: https://github.com/NoRedInk/rspec-retry

They use what are effectively tags to mark specific scenarios for being rerun. That seems like it might be more useful than a one-size-fits-all CLI options. WDYT?

@danascheider
Copy link
Contributor

The problem I see there is that it would require the user to figure out, or at least guess, which features are likely to be flaky to begin with. I use cucumber-ruby to run integration tests against my Backbone.js app, so thinking of my use case there, literally any of the features are potentially flaky in the event the remote test API hiccups or there's a problem with the connection. Having to put a tag on all of them could be cumbersome. That said, I do think a tag is a good idea in cases like the wire protocol features here. Is there a reason we couldn't implement both?

@mattwynne
Copy link
Member Author

Is there a reason we couldn't implement both?

👍

@jnpoyser
Copy link

@mattwynne has pointed me in this direction from the cukes forums after making a similar request. I'd suggested the following, using an around filter:

Around do |scenario, block|
  begin
    $slowdown = 0
    block.call
  rescue
    $slowdown = 1
    block.call
  end
end

The idea here was to catch the exception of a failing test, and to rerun it - in this case, once. In my particular flaky test, setting a slowdown global helps them to run. If it were implemented using hooks it would give the dev the control over how to handle the failing test. In the use case for Maven it could inspect an environment variable from your CI platform and retry that number of times.

This is all well and good, but there's no exception to catch :-/

I don't know enough about the internals of cukes to make an informed suggestion. When I realised there's no exception to catch, I stopped. Would it be possible to pass another variable into the block, against which the success or failure of the scenario can be set?

@diabolo
Copy link

diabolo commented Feb 10, 2016

@jnpoyser Doesn't your failing expectation create the exception?

@jnpoyser
Copy link

@diabolo No. Here's an example

Feature: tests
  Scenario: tests
    Given a flipflop test
Given(/^a flipflop test$/) do
  expect($dunit).to be(true)
end
Around do |scenario, block|
  $dunit = false
  begin
    puts "First run"
    block.call
  rescue
    $dunit = true
    puts "Second run"
    block.call
  end
end

Rescue block is not executed

@mattwynne
Copy link
Member Author

@jnpoyser @diabolo no, when those Around hook blocks run, any exceptions from the scenario are caught by Cucumber and reported to the formatters. Around hooks are a bit weird.

Let's look forward and imagine how you might do this using the new --retry feature.

We could add a BeforeRetry event that fires just before a failed scenario is retried - then you could set your $slowdown variable inside that. WDYT?

@jnpoyser
Copy link

@mattwynne retry sounds great, and straightforward - if the beforeretry hook was present then it would solve this problem. It would be a nice-to-have to be able to list the re-tried tests at the end of a run, but that's not a deal breaker - especially as CI servers wouldn't know what to do with that.

@brasmusson
Copy link
Contributor

@mattwynne I get really uncomfortable by the notion of the receiver of an "event" being able to influence the "event" sender by setting a variable or something. Then you are talking about something else than events. If it wasn't for that "hook" in Cucumber means something that becomes a step in a test case, it would be a good name for what you are talking about here ("register a BeforeRetry hook that is called just before a failed scenario is retried").

I would also like to relate this to the discussion of Parallel scenario execution and the vision of a Cucumber IO using a set of Cucumber runners distributed across machines. The events you are referring to are all generated inside the runners, and in my mind the will be propagated to the Cucumber IO, so that formatters and other event recipients will not see any difference compared to non-parallel Cucumber execution. In that context an event recipient on the Cucumber IO side, for sure cannot expect to "set a variable inside" and expect that it will influence a runner on a different machine.

@Archenoth
Copy link

Archenoth commented Apr 15, 2016

I don't really have too much to add except for another possible use case that a --retry parameter and BeforeRetry hook would allow.

I stumbled on this issue when looking for ways that I could interactively retry individual scenarios from pry on my local box before pushing fixes.

Basically, what I have right now is an After hook that calls binding.pry (Which drops me into a debugger) when a test fails so I can inspect the failed test state manually:

After do |scenario|
  binding.pry if scenario.failed?
end

This works beautifully. I can play around with the thing being tested in pry at the exact moment it failed. I can also hotswap in possible fixes or new features (In the case of TDD) and play around in pry to see if that yields good results.

However, when I try to make a particular test pass, I often just want to run the same test again and again until it works while making changes to the thing I am testing. A BeforeRetry hook could be a very opportune place to put this binding.pry call. It would allow me to put a gigantic number in the --retry parameter, start my tests from the beginning, and just run any failing tests over until I make them work.

Basically I am trying to build bridges as I am crossing them.

@mattwynne
Copy link
Member Author

@Archenoth that's an interesting idea. Could we discuss that in a separate ticket?

@mattwynne
Copy link
Member Author

I'm closing this ticket. IMO --retry is now working. The summary output could be better, but that's something we need to figure out separately I think.

@mattwynne
Copy link
Member Author

Thanks for all your hard work on this one @danascheider @brasmusson and everyone who gave us feedback!

@westlakem
Copy link

@mattwynne Is there any way to expose the --retry value as an environment variable? that way I can do something like if $total_retries == ENV['MAX_RETRIES']?

@lock
Copy link

lock bot commented Oct 25, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Oct 25, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
good first issue Good for newcomers 🚦 needs tests Needs tests ⚡ enhancement Request for new functionality
Projects
None yet
Development

No branches or pull requests

8 participants