[Converge] Flaky Tests #32
I think the original flaky tests were fixed in io.js, but there are still one or two that give sporadic failures. The much larger, but not entirely related, issue is Jenkins messing up certain Windows tests.
Please no no no, I have strong objections to this on a philosophical level. Flaky tests should be fixed rather than accepted; that's why the whole thing was removed from io.js. I haven't checked, but has it been put back in in this repo? We have ongoing problems with the io.js CI because of Windows+Jenkins issues that we haven't resolved, but as far as I'm concerned this just means we need to work on it and figure out a fix, not accept it and move on.
I have a pretty strong opinion about this as well. Hiding them won't really solve anything.
I think there might have been a bit of miscommunication around the topic of 'flaky tests'. The point is not to accept nor to ignore that tests are failing. The point is to enable an automated merge job, like node-accept-pull-request. (In fact, I think it would be best to rename the issue title to something more along those lines.)

node-accept-pull-request automatically merges changes, after rebasing them on top of the target branch and passing tests. We have found that useful because it pretty much eliminates the chance of anyone introducing regressions in the test set that it runs on. Using this merge job has ensured that: a) nobody can forget to run tests on all platforms; b) nobody can make a mistake when pushing directly to the repo; c) changes are tested on the same final rebase that will be pushed to origin, so it's not possible for change B to be pushed between the time change A was tested and the time it is rebased/pushed.

We could argue whether this process is useful and whether there are good cases for not following it. But in the context of an automated merge job, support for flaky tests is a must. Perhaps it's the terminology 'flaky tests' that stirs confusion. You may think of it as mere configuration for node-accept-pull-request, to tell it which tests should not prevent an automated merge. Such configuration is versioned with the code, as it should be, but it's not used for anything other than the automated merge job.

Even in the automated merge job, failures for tests that are marked as flaky are not completely hidden. Builds with failures in flaky tests are marked as 'yellow' instead of 'blue', and the TAP test results show the tests that failed, with a 'todo' next to the outcome. When a test is marked as flaky, a high-priority issue should be filed to follow up on the test flakiness, so that the test can be re-enabled for the automated merge job.

In io.js you have been running CI on every PR, which is great.
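For concreteness, here is a sketch of what such versioned flaky-test configuration looks like in the status files read by the test runner (syntax inherited from the old V8 test runner; the test names and conditions below are hypothetical examples, not entries from the actual repo):

```
# test/parallel/parallel.status (hypothetical entries)

[$system==win32]
# Sporadic failures on Windows; tracked in a P-1 issue.
test-fs-watch            : PASS,FLAKY

[true]
# Flaky on all platforms.
test-http-pipeline-flood : PASS,FLAKY
```

A failure in a test marked `PASS,FLAKY` is reported with a 'todo' directive in the TAP output and turns the build yellow, rather than blocking the automated merge.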
I am pretty sure you have had cases where, say, John's PR was tested and some tests failed that you recognized as not being due to John's PR. Even if you fixed all the tests, this can always happen, as tests can be susceptible to environment or timing issues. When this happened, did you prevent John's PR from being merged? Of course not. You made the determination that the failure was independent of John's changes. Flaky test configuration allows node-accept-pull-request to do the same in an automated way.

Having this supported in an automated way allows the process to scale. We can add more tests and more platforms to verify every change that gets pushed to the main repo, and keep it manageable. Otherwise, we'll end up scribbling the same list of tests on a piece of paper. I think we should enhance this functionality, not remove it.

It should be noted that this support in our test runner is based on support that was there in an ancient version of the V8 test runner. Since then, the V8 test runner has greatly improved support for this type of configuration.
My suggestion would be to implement a node-accept-pull-request-like job in the converged CI system, without making it mandatory, but giving people a chance to try it and see its usefulness. Only if people find it useful would we adopt it in a more systematic way. I think that should be pretty uncontroversial. But there's no point in disabling support for flaky tests in the test runner: it does nothing different unless you pass --flaky-tests=dontcare as an option.
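To illustrate the point that the flag is opt-in (a sketch assuming the test runner invocation inherited from joyent/node; the exact suite name is a placeholder):

```
# Normal run: tests marked PASS,FLAKY are run and their failures
# count as failures, same as any other test.
python tools/test.py parallel

# Automated merge job: failures in tests marked PASS,FLAKY are
# reported as TODO in the TAP output instead of failing the run.
python tools/test.py --flaky-tests=dontcare parallel
```

So leaving the feature in the runner costs nothing for contributors who never pass the flag.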
We just had a quick talk about this in the build meeting. I think the thing "between us" in terms of mentality is the merge-pr job. Since the number of flaky tests we're experiencing is probably around 10, we should see what can be done about them; but to avoid holding things up we should indeed support a flaky flag, and make sure that resolving tests flagged as flaky is a high priority.
Perhaps this as a compromise: incorporate the flaky tests for day to day
Who is on the hook for flaky tests? If the answer is 'no one in particular', there is no real imperative for anyone to go in and fix them. I'd love to see automated merges a la Rust, by the way. But with rebase and fast-forward, of course; anything else is blasphemy.
@orangemocha perhaps you could elaborate on what we ended yesterday's call with, in regards to how your procedure currently works?
The current procedure is to file a P-1 issue and assign it to someone at the time that we mark a test as flaky. We should probably also set a milestone goal; I'm not sure we can do much better than that. If the flakiness has not been resolved by the time the milestone date hits, we can decide to hold up the milestone for it, or to postpone the issue. This is to say that unless a test is invalid, it's better to keep it around as flaky than to delete it. As I mentioned, flaky tests are run and their failures are shown (in yellow instead of red). Deleting a test really hides the failure.
@bnoordhuis: yes, that's how node-accept-pull-request does it by default. It also has a merge option, which was intended for branch merges.
Given that this is largely a build/CI issue, I'm inclined to defer this to the @nodejs/build WG for resolution.
Agreed, we'll be discussing this at our next meeting and possibly beyond. The current plan is to wait to experiment with the Jenkins merge job and understand the value of flaky test marking, and then attempt to move towards a compromise that satisfies everyone's concerns.
@rvagg @misterdjules ... do we need to keep this issue open?
@orangemocha and @rvagg know more about whether the flaky tests mechanism will be kept in the converged CI platform, so deferring to them.
Yes, I believe the current status is that we've capitulated and agreed that flaky tests will need to be in the converged repo in order to make the node-accept-pull-request Jenkins job work satisfactorily; in fact, we may even want it to come back into io.js for the same reason. I'll let @orangemocha update on progress there.
Update on progress:
In terms of process for marking tests as flaky, I think the current process that we have used in joyent/node is a reasonable starting point: we open a P-1 issue to investigate the flakiness and assign it to the next milestone, cc'ing the project collaborators. I will document this in the wiki by the time we announce the availability of node-accept-pull-request. If we find that this process is not sufficient, we might want to consider assigning to specific collaborators, either the test/area owner or, if that's not available, any collaborator in a round-robin fashion.

With regard to this issue, I think the original misalignment that it was meant to track has been resolved, so it should be fine to close it. 😄
Excellent! |
Flaky tests were removed in io.js. They've still been useful in node.js's CI. Need a final decision on this.
e8b2109 test: update flaky test definitions (nodejs/node-v0.x-archive@e8b21095670543ad6ee8e24e0a1b096b908e20b1)

cc @misterdjules @orangemocha @nodejs/tsc