Try reporting flaky tests to issues #34432
Conversation
Size Change: 0 B. Total Size: 1.04 MB.
Thanks for working on this! I mostly left questions below 😄
Is there any way we can test this, or shall we just merge it and test live?
I think the only way we can test it for now without merging this is to create your own fork and merge this PR onto your fork's default branch (trunk).
This is freaking cool! 🎉
Should we only retry tests and report flaky tests when CI runs on trunk? Two reasons:
- It aligns more closely with the discussion in #31682 (Retry flaky e2e tests at most 2 times), where the consensus seems to be that we should build a report of flaky tests before we automatically retry tests on PRs.
- It lets us incrementally ramp up this new bit of infrastructure. A staged rollout, if you will. This would reduce the likelihood that e.g. the job accidentally spams the repo with GitHub issues after it is deployed.
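For illustration, a minimal sketch of what such a trunk-only gate could look like inside the reporting action, assuming it runs on the workflow_run event and uses @actions/github; the check itself is an assumption, not code from this PR.

```js
// Hypothetical staged-rollout gate: only report flaky tests for runs on trunk.
const { context } = require( '@actions/github' );

const headBranch = context.payload.workflow_run?.head_branch;
if ( headBranch !== 'trunk' ) {
	// Skip reporting for runs triggered by PR branches during the rollout.
	process.exit( 0 );
}
```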
body =
	renderIssueDescription( { meta, testTitle, testPath } ) +
	body.slice(
If you wanted to really lean into the "HTML comments are obviously the best place to store data, why are you looking at me like that" way of life, then you could use wp.blocks.serialize() and wp.blocks.parse() instead of string concatenation 😂
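To make the joke concrete: the idea is that structured data can round-trip through an HTML comment in the issue body, which is what block serialization does with its delimiters. Here is a hand-rolled sketch of that pattern; the FLAKY_TEST marker and helper names are made up for illustration and are not part of the PR or of wp.blocks.

```js
// Embed metadata for one flaky test as JSON inside an HTML comment, and read
// it back later. This mimics the block-delimiter idea without wp.blocks.
const embedMeta = ( meta ) =>
	`<!-- FLAKY_TEST ${ JSON.stringify( meta ) } -->`;

const extractMeta = ( issueBody ) => {
	const match = issueBody.match( /<!-- FLAKY_TEST (.*?) -->/ );
	return match ? JSON.parse( match[ 1 ] ) : null;
};

// Usage:
// const body = embedMeta( { testTitle, testPath } ) + '\n' + errorLogs;
// const meta = extractMeta( body );
```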
Force-pushed from 27a38d8 to f11f308.
This is such a good idea, thank you for working on that! ❤️
Amazing work here 👏
I did also have a concern that because PRs are a work in progress, a failing test could be in a PR, but the test could be fixed in the PR and never merged to trunk. An issue wouldn't make sense for that. That said, it would probably be a rare occurrence with the way this uses retries to detect flakiness. On a different note, there's also an alternative model to this proposal: instead, any test failure on trunk could be reported as an issue. The retries could still happen in PRs, but only failures on trunk would be reported.
This is also a cool idea, but I'm afraid of what happens if there's actually a bug in trunk. For instance, if some commit in WordPress core causes all e2e tests in gutenberg to fail all of a sudden, the action will then create an issue for each of them. I think auto-retrying is a safer way to determine if a test is flaky or not, at least until we can make sure our e2e tests are stable enough.
Force-pushed from f11f308 to fb2ecb2.
 * - If a test fails, it will automatically re-run at most 2 times.
 * - If it passes after retrying (within the 2 retries), then it's marked as **flaky**
 *   but displayed as **passed** in the original test suite.
 * - If it fails all 3 times, then it's a **failed** test.
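For context, here is a minimal sketch of how this retry-then-flag behavior can be wired up in Jest. jest.retryTimes() requires the jest-circus runner; the reporter below is illustrative, not the PR's actual implementation.

```js
// jest.setup.js – retry each failing e2e test up to 2 more times, so a test
// runs at most 3 times in total before it is reported as failed.
jest.retryTimes( 2 );
```

```js
// flaky-tests-reporter.js – a custom Jest reporter that flags tests which only
// passed after being retried, i.e. the "flaky but passed" case described above.
class FlakyTestsReporter {
	onTestResult( test, testResult ) {
		for ( const result of testResult.testResults ) {
			// `invocations` is greater than 1 when a test was retried; a final
			// status of 'passed' then means it failed at least once: flaky.
			if ( result.status === 'passed' && ( result.invocations ?? 1 ) > 1 ) {
				console.log( `Flaky test detected: ${ result.fullName }` );
			}
		}
	}
}

module.exports = FlakyTestsReporter;
```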
The interesting part of this proposal is that when the test fails but then passes with retries we open an issue, but when the test fails all the time we do nothing. I would worry more about the latter – clearly a regression 😄
If the test fails all the time it's not considered flaky, it's just doing its job 😄
Tests that fail consistently on CI usually get noticed and fixed pretty quickly, but tests that fail 1 out of 10 times less so.
@tellthemachines, all correct and we are on the same page here. I'm just saying that it could also be beneficial to automatically open a bug report when we detect a regression, to make it easier to coordinate the flow for fixing it. In the case of a regression, it could also leave a message on WordPress Slack, like it happens in the #core channel when the CI job on trunk fails.
There's already the "check" itself on each commit reporting the failure of the tests. Yep, we could add some notifications or a webhook if we desire, but I don't think we need to open an "issue" for them. If the tests suddenly start to fail on trunk then I think we can assume that something went wrong in our pipeline. Excluding flaky tests, the failures should have something to do with third-party libraries (a core or plugin update, for example). In that case, we should immediately start to fix them; opening an issue doesn't seem to help much other than creating additional noise 🤔.
And after all, it's not in the scope of this PR 😅. Adding a webhook to notify about failing tests on Slack is also pretty trivial with the help of the official webhook APIs of both GitHub and Slack.
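For the record, a Slack notification along those lines could be as small as the sketch below, assuming a Slack incoming-webhook URL stored as a repository secret and Node 18+ (global fetch); none of this is part of the PR.

```js
// Post a message to a Slack channel via an incoming webhook.
// SLACK_WEBHOOK_URL is a hypothetical repository secret, not something the PR adds.
async function notifySlack( text ) {
	await fetch( process.env.SLACK_WEBHOOK_URL, {
		method: 'POST',
		headers: { 'Content-Type': 'application/json' },
		body: JSON.stringify( { text } ),
	} );
}

// e.g. notifySlack( '❌ e2e tests are failing on trunk – see the latest workflow run.' );
```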
Generally I really love this idea and think we should ship it and see how it works in practice on the repo. We can revert if it introduces too much noise initially. One thing that could be tried is posting comments on a single issue as part of the initial iteration to help with fine-tuning frequency and content, but I appreciate that might result in a lot of extra code.
We discussed this in our weekly core-js meeting. I believe we're in agreement with giving this a try. There are still some minor issues we need to figure out before merging this though.
And here are my thoughts:
If you go the route of creating issues, one tweak that might make this nicer to follow up on is to have the bot leave a reaction on an issue that's already created. That way we can sort based on reactions (using the GitHub issue UI) to determine high-flakiness tests (and prioritize which get addressed first). I do think there needs to be some process for regularly addressing flaky tests surfaced by this new logic (either by removing or rewriting the test), because by its very nature a flaky test is unreliable and thus can give a false impression of code coverage that doesn't actually exist. So while I still love the idea, the only concern I have is that issues created by this process will pile up and not be addressed.
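For reference, the suggested bookkeeping could in principle use the reactions endpoint of the GitHub API, roughly as sketched below (octokit usage is illustrative); note that a reply further down points out a single bot user can only add each reaction type once per issue.

```js
// Leave an 👀 reaction on an already-created flaky-test issue so issues can be
// sorted by reaction count in the GitHub UI. Names here are illustrative.
const { Octokit } = require( '@octokit/rest' );

async function bumpFlakyIssue( token, owner, repo, issueNumber ) {
	const octokit = new Octokit( { auth: token } );
	await octokit.rest.reactions.createForIssue( {
		owner,
		repo,
		issue_number: issueNumber,
		content: 'eyes',
	} );
}
```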
Having this run only on trunk first seems to remove one of the main benefits of this work, which is to improve the experience for contributors, where flaky tests contribute to unexpected failures in PRs whose changes are unrelated to the test.
I've thought about this, but immediately dropped it. I don't think there's an API to add multiple identical reactions to an issue from the same (bot) user. I initially wanted to append new errors as comments to the issue and lock the issue, but I guess that makes the issue even less visible.
We already have this problem now, so I don't think it'd be a new thing. It's just that flaky tests are more visible now, but are also blocking contributors at the same time. The way we usually temporarily fix this, if we haven't come up with a proper solution for the flaky test, is to simply skip it. It's the last resort and is very invisible to us (a less invisible way to overcome this is to... create an issue, which is exactly what this PR does).
Yep, agree. The end goal is always to run on all PRs. Running it on trunk first is just a trial to see if it works before rolling out to all the contributors. We could also argue that it's not necessary and we can always just revert this PR, though. Hence I'm open to all suggestions.
I concur with all your thoughts here. Let's do it! 🙂
Force-pushed from fb2ecb2 to e5499d3.
Let's try it! 👍
Description
Why
As an exploration of #33809, this PR is an experiment in using GitHub issues as our flaky-tests dashboard. It has been discussed before in #31682. The goal is to auto-retry flaky e2e tests so that they don't confuse contributors or block their progress. But just retrying is not enough: the flaky failures would be hidden by the retry mechanism and could potentially mask real bugs. Instead, we report them to a GitHub issue with the details of each failure, so that we can monitor them in the long term.
What does it look like
Here's an example issue I created in my fork.
It includes the title of the test, the path of the test, the full error logs, and the estimated flaky rate.
The issue is labeled with flaky-test so that we can filter it in the issues tab. Each log has a link pointing to the original failing action run.
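To make the reporting step concrete, here is a rough sketch of how a flaky-test-labeled issue could be created or updated through the GitHub API. It uses @octokit/rest, and the title format and update strategy are illustrative assumptions rather than the PR's literal code.

```js
const { Octokit } = require( '@octokit/rest' );

async function reportFlakyTest( token, owner, repo, testTitle, body ) {
	const octokit = new Octokit( { auth: token } );

	// Look for an open issue already tracking this flaky test.
	const { data: issues } = await octokit.rest.issues.listForRepo( {
		owner,
		repo,
		state: 'open',
		labels: 'flaky-test',
	} );
	const existing = issues.find( ( issue ) => issue.title.includes( testTitle ) );

	if ( existing ) {
		// Append the newly observed failure by rewriting the issue body.
		await octokit.rest.issues.update( {
			owner,
			repo,
			issue_number: existing.number,
			body,
		} );
		return existing;
	}

	// No existing issue: create one and label it so it shows up in the filter.
	const { data: created } = await octokit.rest.issues.create( {
		owner,
		repo,
		title: `[Flaky Test] ${ testTitle }`,
		body,
		labels: [ 'flaky-test' ],
	} );
	return created;
}
```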
How does it work
A brief overview of the flow is as follows:
1. Use jest.retryTimes( 2 ) to retry a failing test at most 2 times (so, counting the original run, it runs at most 3 times) before it passes.
2. Store the results of any detected flaky tests as artifacts in the e2e test workflow.
3. Trigger the flaky-tests workflow and run the report-flaky-tests action.
4. In report-flaky-tests, use the GitHub API to download the stored artifact, run some pre-formatting, and post the aggregated result to an issue.
Step 4 needs the repo token to call some of the GitHub API endpoints. For security reasons, we can't expose the token in our e2e workflow in step 2, hence the separate workflow and the artifacts. This is documented in a blog post by GitHub.
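As an illustration of step 4, downloading the artifact that the e2e workflow uploaded could look roughly like the sketch below; the artifact name and function names are assumptions for illustration, not the PR's actual implementation.

```js
const { Octokit } = require( '@octokit/rest' );

async function downloadFlakyTestArtifact( token, owner, repo, runId ) {
	const octokit = new Octokit( { auth: token } );

	// List the artifacts uploaded by the e2e workflow run that triggered us.
	const { data } = await octokit.rest.actions.listWorkflowRunArtifacts( {
		owner,
		repo,
		run_id: runId,
	} );
	const artifact = data.artifacts.find(
		( { name } ) => name === 'flaky-tests-report' // hypothetical artifact name
	);
	if ( ! artifact ) {
		return null;
	}

	// Download the artifact as a zip; the caller is responsible for unzipping
	// and parsing the JSON report inside it.
	const { data: zip } = await octokit.rest.actions.downloadArtifact( {
		owner,
		repo,
		artifact_id: artifact.id,
		archive_format: 'zip',
	} );
	return Buffer.from( zip );
}
```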
The new workflow depends on the workflow_run event, which only works if the workflow file is in the default branch (trunk), so this PR won't take effect until it's merged.
The flaky rate is calculated by counting the number of failures against the total number of commits since the first recorded commit. It's only recorded on trunk, and it doesn't count manual reruns, so it's just an estimate.
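In other words, the estimate boils down to something like the following (variable names are illustrative):

```js
// Estimated flaky rate: recorded failures divided by the number of trunk
// commits since the failure was first recorded. Manual reruns are not counted.
function estimateFlakyRate( failureCount, commitsSinceFirstRecord ) {
	// e.g. 3 recorded failures over 50 trunk commits ≈ 0.06 (6%)
	return failureCount / commitsSinceFirstRecord;
}
```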
Types of changes
New feature
Checklist:
I've updated all React Native files affected by any refactorings/renamings in this PR (please manually search all *.native.js files for terms that need renaming or removal).