
Try reporting flaky tests to issues #34432

Merged: kevin940726 merged 4 commits into trunk from try/flaky-tests-issue-report on Sep 13, 2021


kevin940726 (Member) commented Sep 1, 2021

Description

Why

As an exploration of #33809, this PR experiments with using GitHub issues as our flaky tests dashboard. The idea was previously discussed in #31682. The goal is to automatically retry flaky e2e tests so that they don't confuse contributors or block their progress. Retrying alone is not enough, though: the retry system would hide the flaky failure, which could be a real bug. Instead, we report each flaky test to a GitHub issue with the details of the failure, so that we can monitor them in the long term.

What does it look like

Here's an example issue I created in my fork.

The issue contains the title of the test, the path of the test, the full error logs, and the estimated flaky rate.

The issue is labeled flaky-test so that we can filter for it in the issues tab. Each log links to the original failing action run.

How does it work

A brief overview of the flow is as follows:

  1. We use jest.retryTimes(2) to retry a failing test at most 2 times (3 runs in total, including the original run) until it passes.
  2. Using Jest's reporter API, we record each failure and mark the test as flaky if it passes after retrying. Then we store each flaky test result in a file and upload the files as a GitHub artifact.
  3. We trigger the flaky-tests workflow, which runs the report-flaky-tests action.
  4. Inside report-flaky-tests, we use the GitHub API to download the stored artifact, pre-format the results, and post the aggregated result to an issue.
  5. Issues are identified by the test title: if an issue with the same title already exists, the action updates it instead of creating a new one.
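The create-or-update decision in steps 4 and 5 can be sketched as a pure function. This is a minimal sketch, not the action's actual code: the `[Flaky test]` title prefix, the `{ number, title }` shape of `existingIssues`, and the `planIssueAction`/`issueTitleFor` names are hypothetical, and the real action queries the GitHub API instead of an in-memory list.

```javascript
// Hypothetical title format, for illustration only.
function issueTitleFor(testTitle) {
  return `[Flaky test] ${testTitle}`;
}

// Decide whether to create a new issue or update an existing one.
// `existingIssues` is assumed to be a list of { number, title } objects,
// e.g. the open issues that carry the flaky-test label.
function planIssueAction(existingIssues, testTitle) {
  const title = issueTitleFor(testTitle);
  const match = existingIssues.find((issue) => issue.title === title);
  if (match) {
    return { action: 'update', issueNumber: match.number, title };
  }
  return { action: 'create', title };
}

const openIssues = [{ number: 101, title: '[Flaky test] inserts a paragraph' }];
console.log(planIssueAction(openIssues, 'inserts a paragraph').action); // update
console.log(planIssueAction(openIssues, 'saves a draft').action); // create
```

Matching on the exact title keeps the lookup simple, but it also means renaming a test would start a fresh issue.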

Step 4 needs the repo token to call some of the GitHub APIs. For security reasons, we can't expose the token in the e2e workflow from step 2, which is why we need a separate workflow and the artifacts. This pattern is documented in a blog post by GitHub.
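The split between the two workflows can be pictured as a file-based handoff: the e2e workflow, which has no repo token, writes flaky results to files that are uploaded as an artifact, and the reporting workflow reads them back after downloading. A minimal sketch using only Node's built-in `fs`, `os`, and `path` modules; the one-JSON-file-per-test layout, the `testTitle`/`testPath`/`errors` fields, and the sample data are hypothetical, and the artifact upload and download steps themselves are left out.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// e2e workflow side (no repo token): write each flaky result to a file
// in a directory that a later step uploads as an artifact.
function writeFlakyResults(dir, results) {
  fs.mkdirSync(dir, { recursive: true });
  for (const result of results) {
    // Hypothetical naming scheme: one JSON file per flaky test.
    const fileName = `${encodeURIComponent(result.testTitle)}.json`;
    fs.writeFileSync(path.join(dir, fileName), JSON.stringify(result));
  }
}

// Reporting workflow side (has the token): read the downloaded artifact
// back into memory before formatting it and posting it to issues.
function readFlakyResults(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith('.json'))
    .map((name) => JSON.parse(fs.readFileSync(path.join(dir, name), 'utf8')));
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'flaky-'));
writeFlakyResults(dir, [
  {
    testTitle: 'inserts a paragraph',
    testPath: 'packages/e2e-tests/specs/example.test.js',
    errors: ['Timeout'],
  },
]);
console.log(readFlakyResults(dir).length); // 1
```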

The new workflow depends on the workflow_run event, which only works if the workflow file is on the default branch (trunk), so this PR won't work until it's merged.

The flaky rate is estimated from the number of failures and the total number of commits since the first recorded commit. Failures are only recorded on trunk, and manual reruns aren't counted, so the rate is only an estimate.
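The description doesn't spell out the exact formula, so the following is only one plausible reading: failures divided by the number of trunk commits since (and including) the first recorded commit. `estimateFlakyRate` and the oldest-first `trunkCommits` list are hypothetical.

```javascript
// Estimate a flaky rate as failures per trunk commit, counted from the
// first commit at which the test was recorded as flaky.
function estimateFlakyRate(failureCount, trunkCommits, firstRecordedSha) {
  const start = trunkCommits.indexOf(firstRecordedSha);
  if (start === -1) {
    throw new Error(`Unknown commit: ${firstRecordedSha}`);
  }
  // Include the first recorded commit itself in the total.
  const totalCommits = trunkCommits.length - start;
  return failureCount / totalCommits;
}

// trunkCommits is ordered oldest -> newest.
const commits = ['a1', 'b2', 'c3', 'd4', 'e5'];
console.log(estimateFlakyRate(2, commits, 'b2')); // 0.5
```

Because manual reruns aren't counted and only trunk is sampled, a value like this undercounts how often a test actually flakes.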

Types of changes

New feature

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • I've tested my changes with keyboard and screen readers.
  • My code has proper inline documentation.
  • I've included developer documentation if appropriate.
  • I've updated all React Native files affected by any refactorings/renamings in this PR (please manually search all *.native.js files for terms that need renaming or removal).

kevin940726 added the [Type] Automated Testing label Sep 1, 2021
github-actions bot commented Sep 1, 2021

Size Change: 0 B

Total Size: 1.04 MB

Unchanged files:
Filename Size
build/a11y/index.min.js 931 B
build/admin-manifest/index.min.js 1.09 kB
build/annotations/index.min.js 2.7 kB
build/api-fetch/index.min.js 2.19 kB
build/autop/index.min.js 2.08 kB
build/blob/index.min.js 459 B
build/block-directory/index.min.js 6.2 kB
build/block-directory/style-rtl.css 1.01 kB
build/block-directory/style.css 1.01 kB
build/block-editor/default-editor-styles-rtl.css 378 B
build/block-editor/default-editor-styles.css 378 B
build/block-editor/index.min.js 120 kB
build/block-editor/style-rtl.css 13.8 kB
build/block-editor/style.css 13.8 kB
build/block-library/blocks/archives/editor-rtl.css 61 B
build/block-library/blocks/archives/editor.css 60 B
build/block-library/blocks/archives/style-rtl.css 65 B
build/block-library/blocks/archives/style.css 65 B
build/block-library/blocks/audio/editor-rtl.css 58 B
build/block-library/blocks/audio/editor.css 58 B
build/block-library/blocks/audio/style-rtl.css 111 B
build/block-library/blocks/audio/style.css 111 B
build/block-library/blocks/audio/theme-rtl.css 125 B
build/block-library/blocks/audio/theme.css 125 B
build/block-library/blocks/block/editor-rtl.css 161 B
build/block-library/blocks/block/editor.css 161 B
build/block-library/blocks/button/editor-rtl.css 474 B
build/block-library/blocks/button/editor.css 474 B
build/block-library/blocks/button/style-rtl.css 600 B
build/block-library/blocks/button/style.css 600 B
build/block-library/blocks/buttons/editor-rtl.css 315 B
build/block-library/blocks/buttons/editor.css 315 B
build/block-library/blocks/buttons/style-rtl.css 370 B
build/block-library/blocks/buttons/style.css 370 B
build/block-library/blocks/calendar/style-rtl.css 207 B
build/block-library/blocks/calendar/style.css 207 B
build/block-library/blocks/categories/editor-rtl.css 84 B
build/block-library/blocks/categories/editor.css 83 B
build/block-library/blocks/categories/style-rtl.css 79 B
build/block-library/blocks/categories/style.css 79 B
build/block-library/blocks/code/style-rtl.css 90 B
build/block-library/blocks/code/style.css 90 B
build/block-library/blocks/code/theme-rtl.css 131 B
build/block-library/blocks/code/theme.css 131 B
build/block-library/blocks/columns/editor-rtl.css 206 B
build/block-library/blocks/columns/editor.css 205 B
build/block-library/blocks/columns/style-rtl.css 497 B
build/block-library/blocks/columns/style.css 496 B
build/block-library/blocks/cover/editor-rtl.css 666 B
build/block-library/blocks/cover/editor.css 670 B
build/block-library/blocks/cover/style-rtl.css 1.23 kB
build/block-library/blocks/cover/style.css 1.23 kB
build/block-library/blocks/embed/editor-rtl.css 488 B
build/block-library/blocks/embed/editor.css 488 B
build/block-library/blocks/embed/style-rtl.css 417 B
build/block-library/blocks/embed/style.css 417 B
build/block-library/blocks/embed/theme-rtl.css 124 B
build/block-library/blocks/embed/theme.css 124 B
build/block-library/blocks/file/editor-rtl.css 300 B
build/block-library/blocks/file/editor.css 300 B
build/block-library/blocks/file/style-rtl.css 255 B
build/block-library/blocks/file/style.css 255 B
build/block-library/blocks/file/view.min.js 322 B
build/block-library/blocks/freeform/editor-rtl.css 2.44 kB
build/block-library/blocks/freeform/editor.css 2.44 kB
build/block-library/blocks/gallery/editor-rtl.css 927 B
build/block-library/blocks/gallery/editor.css 934 B
build/block-library/blocks/gallery/style-rtl.css 1.6 kB
build/block-library/blocks/gallery/style.css 1.59 kB
build/block-library/blocks/gallery/theme-rtl.css 122 B
build/block-library/blocks/gallery/theme.css 122 B
build/block-library/blocks/group/editor-rtl.css 159 B
build/block-library/blocks/group/editor.css 159 B
build/block-library/blocks/group/style-rtl.css 57 B
build/block-library/blocks/group/style.css 57 B
build/block-library/blocks/group/theme-rtl.css 70 B
build/block-library/blocks/group/theme.css 70 B
build/block-library/blocks/heading/style-rtl.css 114 B
build/block-library/blocks/heading/style.css 114 B
build/block-library/blocks/home-link/style-rtl.css 247 B
build/block-library/blocks/home-link/style.css 247 B
build/block-library/blocks/html/editor-rtl.css 283 B
build/block-library/blocks/html/editor.css 284 B
build/block-library/blocks/image/editor-rtl.css 728 B
build/block-library/blocks/image/editor.css 728 B
build/block-library/blocks/image/style-rtl.css 482 B
build/block-library/blocks/image/style.css 487 B
build/block-library/blocks/image/theme-rtl.css 124 B
build/block-library/blocks/image/theme.css 124 B
build/block-library/blocks/latest-comments/style-rtl.css 284 B
build/block-library/blocks/latest-comments/style.css 284 B
build/block-library/blocks/latest-posts/editor-rtl.css 137 B
build/block-library/blocks/latest-posts/editor.css 137 B
build/block-library/blocks/latest-posts/style-rtl.css 528 B
build/block-library/blocks/latest-posts/style.css 527 B
build/block-library/blocks/list/style-rtl.css 94 B
build/block-library/blocks/list/style.css 94 B
build/block-library/blocks/media-text/editor-rtl.css 266 B
build/block-library/blocks/media-text/editor.css 263 B
build/block-library/blocks/media-text/style-rtl.css 488 B
build/block-library/blocks/media-text/style.css 485 B
build/block-library/blocks/more/editor-rtl.css 431 B
build/block-library/blocks/more/editor.css 431 B
build/block-library/blocks/navigation-link/editor-rtl.css 489 B
build/block-library/blocks/navigation-link/editor.css 488 B
build/block-library/blocks/navigation-link/style-rtl.css 94 B
build/block-library/blocks/navigation-link/style.css 94 B
build/block-library/blocks/navigation/editor-rtl.css 1.72 kB
build/block-library/blocks/navigation/editor.css 1.72 kB
build/block-library/blocks/navigation/style-rtl.css 1.42 kB
build/block-library/blocks/navigation/style.css 1.41 kB
build/block-library/blocks/navigation/view.min.js 2.52 kB
build/block-library/blocks/nextpage/editor-rtl.css 395 B
build/block-library/blocks/nextpage/editor.css 395 B
build/block-library/blocks/page-list/editor-rtl.css 310 B
build/block-library/blocks/page-list/editor.css 310 B
build/block-library/blocks/page-list/style-rtl.css 241 B
build/block-library/blocks/page-list/style.css 241 B
build/block-library/blocks/paragraph/editor-rtl.css 157 B
build/block-library/blocks/paragraph/editor.css 157 B
build/block-library/blocks/paragraph/style-rtl.css 261 B
build/block-library/blocks/paragraph/style.css 261 B
build/block-library/blocks/post-author/editor-rtl.css 210 B
build/block-library/blocks/post-author/editor.css 210 B
build/block-library/blocks/post-author/style-rtl.css 182 B
build/block-library/blocks/post-author/style.css 181 B
build/block-library/blocks/post-comments-form/style-rtl.css 140 B
build/block-library/blocks/post-comments-form/style.css 140 B
build/block-library/blocks/post-comments/style-rtl.css 360 B
build/block-library/blocks/post-comments/style.css 359 B
build/block-library/blocks/post-content/editor-rtl.css 138 B
build/block-library/blocks/post-content/editor.css 138 B
build/block-library/blocks/post-excerpt/editor-rtl.css 73 B
build/block-library/blocks/post-excerpt/editor.css 73 B
build/block-library/blocks/post-excerpt/style-rtl.css 69 B
build/block-library/blocks/post-excerpt/style.css 69 B
build/block-library/blocks/post-featured-image/editor-rtl.css 398 B
build/block-library/blocks/post-featured-image/editor.css 398 B
build/block-library/blocks/post-featured-image/style-rtl.css 143 B
build/block-library/blocks/post-featured-image/style.css 143 B
build/block-library/blocks/post-template/editor-rtl.css 99 B
build/block-library/blocks/post-template/editor.css 98 B
build/block-library/blocks/post-template/style-rtl.css 378 B
build/block-library/blocks/post-template/style.css 379 B
build/block-library/blocks/post-terms/style-rtl.css 73 B
build/block-library/blocks/post-terms/style.css 73 B
build/block-library/blocks/post-title/style-rtl.css 60 B
build/block-library/blocks/post-title/style.css 60 B
build/block-library/blocks/preformatted/style-rtl.css 103 B
build/block-library/blocks/preformatted/style.css 103 B
build/block-library/blocks/pullquote/editor-rtl.css 198 B
build/block-library/blocks/pullquote/editor.css 198 B
build/block-library/blocks/pullquote/style-rtl.css 378 B
build/block-library/blocks/pullquote/style.css 378 B
build/block-library/blocks/pullquote/theme-rtl.css 167 B
build/block-library/blocks/pullquote/theme.css 167 B
build/block-library/blocks/query-pagination-numbers/editor-rtl.css 122 B
build/block-library/blocks/query-pagination-numbers/editor.css 121 B
build/block-library/blocks/query-pagination/editor-rtl.css 270 B
build/block-library/blocks/query-pagination/editor.css 262 B
build/block-library/blocks/query-pagination/style-rtl.css 239 B
build/block-library/blocks/query-pagination/style.css 236 B
build/block-library/blocks/query-title/editor-rtl.css 85 B
build/block-library/blocks/query-title/editor.css 85 B
build/block-library/blocks/query/editor-rtl.css 131 B
build/block-library/blocks/query/editor.css 132 B
build/block-library/blocks/quote/style-rtl.css 187 B
build/block-library/blocks/quote/style.css 187 B
build/block-library/blocks/quote/theme-rtl.css 220 B
build/block-library/blocks/quote/theme.css 222 B
build/block-library/blocks/rss/editor-rtl.css 202 B
build/block-library/blocks/rss/editor.css 204 B
build/block-library/blocks/rss/style-rtl.css 289 B
build/block-library/blocks/rss/style.css 288 B
build/block-library/blocks/search/editor-rtl.css 165 B
build/block-library/blocks/search/editor.css 165 B
build/block-library/blocks/search/style-rtl.css 374 B
build/block-library/blocks/search/style.css 375 B
build/block-library/blocks/search/theme-rtl.css 64 B
build/block-library/blocks/search/theme.css 64 B
build/block-library/blocks/separator/editor-rtl.css 99 B
build/block-library/blocks/separator/editor.css 99 B
build/block-library/blocks/separator/style-rtl.css 250 B
build/block-library/blocks/separator/style.css 250 B
build/block-library/blocks/separator/theme-rtl.css 172 B
build/block-library/blocks/separator/theme.css 172 B
build/block-library/blocks/shortcode/editor-rtl.css 474 B
build/block-library/blocks/shortcode/editor.css 474 B
build/block-library/blocks/site-logo/editor-rtl.css 462 B
build/block-library/blocks/site-logo/editor.css 464 B
build/block-library/blocks/site-logo/style-rtl.css 153 B
build/block-library/blocks/site-logo/style.css 153 B
build/block-library/blocks/site-tagline/editor-rtl.css 86 B
build/block-library/blocks/site-tagline/editor.css 86 B
build/block-library/blocks/site-title/editor-rtl.css 84 B
build/block-library/blocks/site-title/editor.css 84 B
build/block-library/blocks/social-link/editor-rtl.css 165 B
build/block-library/blocks/social-link/editor.css 165 B
build/block-library/blocks/social-links/editor-rtl.css 812 B
build/block-library/blocks/social-links/editor.css 811 B
build/block-library/blocks/social-links/style-rtl.css 1.3 kB
build/block-library/blocks/social-links/style.css 1.3 kB
build/block-library/blocks/spacer/editor-rtl.css 307 B
build/block-library/blocks/spacer/editor.css 307 B
build/block-library/blocks/spacer/style-rtl.css 48 B
build/block-library/blocks/spacer/style.css 48 B
build/block-library/blocks/table/editor-rtl.css 471 B
build/block-library/blocks/table/editor.css 472 B
build/block-library/blocks/table/style-rtl.css 481 B
build/block-library/blocks/table/style.css 481 B
build/block-library/blocks/table/theme-rtl.css 188 B
build/block-library/blocks/table/theme.css 188 B
build/block-library/blocks/tag-cloud/style-rtl.css 146 B
build/block-library/blocks/tag-cloud/style.css 146 B
build/block-library/blocks/template-part/editor-rtl.css 636 B
build/block-library/blocks/template-part/editor.css 635 B
build/block-library/blocks/template-part/theme-rtl.css 101 B
build/block-library/blocks/template-part/theme.css 101 B
build/block-library/blocks/term-description/editor-rtl.css 90 B
build/block-library/blocks/term-description/editor.css 90 B
build/block-library/blocks/text-columns/editor-rtl.css 95 B
build/block-library/blocks/text-columns/editor.css 95 B
build/block-library/blocks/text-columns/style-rtl.css 166 B
build/block-library/blocks/text-columns/style.css 166 B
build/block-library/blocks/verse/style-rtl.css 87 B
build/block-library/blocks/verse/style.css 87 B
build/block-library/blocks/video/editor-rtl.css 571 B
build/block-library/blocks/video/editor.css 572 B
build/block-library/blocks/video/style-rtl.css 173 B
build/block-library/blocks/video/style.css 173 B
build/block-library/blocks/video/theme-rtl.css 124 B
build/block-library/blocks/video/theme.css 124 B
build/block-library/common-rtl.css 853 B
build/block-library/common.css 849 B
build/block-library/editor-rtl.css 9.54 kB
build/block-library/editor.css 9.52 kB
build/block-library/index.min.js 151 kB
build/block-library/reset-rtl.css 527 B
build/block-library/reset.css 527 B
build/block-library/style-rtl.css 10.2 kB
build/block-library/style.css 10.2 kB
build/block-library/theme-rtl.css 658 B
build/block-library/theme.css 663 B
build/block-serialization-default-parser/index.min.js 1.09 kB
build/block-serialization-spec-parser/index.min.js 2.79 kB
build/blocks/index.min.js 46.9 kB
build/components/index.min.js 209 kB
build/components/style-rtl.css 15.8 kB
build/components/style.css 15.8 kB
build/compose/index.min.js 10.2 kB
build/core-data/index.min.js 12.3 kB
build/customize-widgets/index.min.js 11.1 kB
build/customize-widgets/style-rtl.css 1.5 kB
build/customize-widgets/style.css 1.49 kB
build/data-controls/index.min.js 614 B
build/data/index.min.js 7.1 kB
build/date/index.min.js 31.5 kB
build/deprecated/index.min.js 428 B
build/dom-ready/index.min.js 304 B
build/dom/index.min.js 4.53 kB
build/edit-navigation/index.min.js 14.9 kB
build/edit-navigation/style-rtl.css 3.37 kB
build/edit-navigation/style.css 3.37 kB
build/edit-post/classic-rtl.css 492 B
build/edit-post/classic.css 494 B
build/edit-post/index.min.js 29 kB
build/edit-post/style-rtl.css 7.2 kB
build/edit-post/style.css 7.2 kB
build/edit-site/index.min.js 26.4 kB
build/edit-site/style-rtl.css 5.07 kB
build/edit-site/style.css 5.07 kB
build/edit-widgets/index.min.js 16.1 kB
build/edit-widgets/style-rtl.css 4.06 kB
build/edit-widgets/style.css 4.06 kB
build/editor/index.min.js 37.7 kB
build/editor/style-rtl.css 3.74 kB
build/editor/style.css 3.73 kB
build/element/index.min.js 3.17 kB
build/escape-html/index.min.js 517 B
build/format-library/index.min.js 5.36 kB
build/format-library/style-rtl.css 668 B
build/format-library/style.css 669 B
build/hooks/index.min.js 1.55 kB
build/html-entities/index.min.js 424 B
build/i18n/index.min.js 3.6 kB
build/is-shallow-equal/index.min.js 501 B
build/keyboard-shortcuts/index.min.js 1.49 kB
build/keycodes/index.min.js 1.25 kB
build/list-reusable-blocks/index.min.js 1.85 kB
build/list-reusable-blocks/style-rtl.css 838 B
build/list-reusable-blocks/style.css 838 B
build/media-utils/index.min.js 2.88 kB
build/notices/index.min.js 845 B
build/nux/index.min.js 2.03 kB
build/nux/style-rtl.css 747 B
build/nux/style.css 743 B
build/plugins/index.min.js 1.83 kB
build/primitives/index.min.js 921 B
build/priority-queue/index.min.js 582 B
build/react-i18n/index.min.js 671 B
build/redux-routine/index.min.js 2.63 kB
build/reusable-blocks/index.min.js 2.28 kB
build/reusable-blocks/style-rtl.css 256 B
build/reusable-blocks/style.css 256 B
build/rich-text/index.min.js 10.6 kB
build/server-side-render/index.min.js 1.32 kB
build/shortcode/index.min.js 1.48 kB
build/token-list/index.min.js 562 B
build/url/index.min.js 1.74 kB
build/viewport/index.min.js 1.02 kB
build/warning/index.min.js 248 B
build/widgets/index.min.js 7.27 kB
build/widgets/style-rtl.css 1.17 kB
build/widgets/style.css 1.18 kB
build/wordcount/index.min.js 1.04 kB

compressed-size-action

tellthemachines (Contributor) left a comment

Thanks for working on this! I mostly left questions below 😄

Is there any way we can test this, or shall we just merge it and test live?

Resolved review threads:
  • packages/e2e-tests/config/setup-test-framework.js (outdated)
  • .github/report-flaky-tests/index.js (outdated)
  • .github/workflows/end2end-test.yml
  • packages/e2e-tests/config/flaky-tests-reporter.js (outdated)
kevin940726 (Member, Author)

I think the only way to test it without merging is to create your own fork and merge this PR into your fork's trunk. You might also want to create an intentionally flaky test and skip all the other tests for faster feedback.

noisysocks (Member) left a comment

This is freaking cool! 🎉

Should we only retry tests and report flaky tests when CI runs on trunk? Two reasons:

  • It aligns more closely with the discussion in Retry flaky e2e tests at most 2 times #31682 where the consensus seems to be that we should build a report of flaky tests before we automatically retry tests on PRs.
  • It lets us incrementally ramp up this new bit of infrastructure. A staged rollout, if you will. This would reduce the likelihood that e.g. the job accidentally spams the repo with GitHub issues after it is deployed.

Resolved review threads:
  • packages/e2e-tests/jest.config.js (outdated)
  • packages/e2e-tests/config/setup-test-framework.js (outdated)
  • .gitignore (outdated)
  • .github/workflows/flaky-tests.yml (outdated)
  • .github/report-flaky-tests/index.js (outdated)
  • .github/report-flaky-tests/index.js
  • .github/report-flaky-tests/index.js
  • .github/report-flaky-tests/index.js

body =
renderIssueDescription( { meta, testTitle, testPath } ) +
body.slice(
Member

If you wanted to really lean into the "HTML comments are obviously the best place to store data, why are you looking at me like that" way of life, then you could use wp.blocks.serialize() and wp.blocks.parse() instead of string concatenation 😂
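The excerpt above assembles the issue body by string concatenation, and the review comment suggests its metadata lives in HTML comments. A minimal sketch of that storage trick, assuming a hypothetical `<!-- flaky-test-meta:... -->` marker and hypothetical `withMeta`/`readMeta` helpers; the action's real marker and helpers (such as `renderIssueDescription`) may work differently. It sticks to plain strings rather than `wp.blocks.serialize()`/`wp.blocks.parse()`.

```javascript
// Hypothetical marker. GitHub doesn't render HTML comments in issue bodies,
// so data stored this way is hidden from readers but readable by the bot.
const META_START = '<!-- flaky-test-meta:';
const META_END = ' -->';

// Prepend a metadata comment to a human-readable issue body.
// Keep `meta` to simple values (numbers, SHAs) so the JSON can't contain "-->".
function withMeta(meta, visibleBody) {
  return `${META_START}${JSON.stringify(meta)}${META_END}\n${visibleBody}`;
}

// Read the metadata back from an issue body; returns null if none is found.
function readMeta(body) {
  const start = body.indexOf(META_START);
  if (start === -1) return null;
  const jsonStart = start + META_START.length;
  const end = body.indexOf(META_END, jsonStart);
  if (end === -1) return null;
  return JSON.parse(body.slice(jsonStart, end));
}

const body = withMeta(
  { failedTimes: 3, firstCommit: 'abc1234' },
  '**Test title:** inserts a paragraph'
);
console.log(readMeta(body).failedTimes); // 3
```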

Resolved review thread: .github/report-flaky-tests/index.js
kevin940726 force-pushed the try/flaky-tests-issue-report branch from 27a38d8 to f11f308 September 1, 2021 08:34
adamziel (Contributor) commented Sep 1, 2021

This is such a good idea, thank you for working on that! ❤️

talldan (Contributor) commented Sep 1, 2021

Amazing work here 👏

Should we only retry tests and report flaky tests when CI runs on trunk?

I also had a concern: because PRs are a work in progress, a failing test could appear in a PR, but the test could be fixed in that PR and never merged to trunk. An issue wouldn't make sense for that. That said, it would probably be a rare occurrence with the way this uses retries to detect flakiness.

On a different note, there's also an alternative model to this proposal: any test failure in trunk could be reported in an issue without being retried, because by definition any failure in trunk is undesired.

The retries could still happen in PRs, but trunk is instead used as the source of truth.

kevin940726 (Member, Author)

On a different note, there's also an alternative model to this proposal, instead any test failure in trunk could be reported in an issue and wouldn't need to be retried, because by definition any failure in trunk is undesired.

The retries could still happen in PRs, but trunk is instead used as the source of truth.

This is also a cool idea, but I'm afraid that if there's an actual bug in trunk, the result would be catastrophic 😅.

For instance, if some commit in WordPress core suddenly causes all e2e tests in Gutenberg to fail, the action would create an issue for each of them. I think auto-retrying is a safer way to determine whether a test is flaky, at least until we can make sure our trunk is stable and predictable enough 😂.

* - If a test fails, it is automatically re-run at most 2 times.
* - If it passes after retrying (within those 2 retries), it's marked as **flaky**
* but displayed as **passed** in the original test suite.
* - If it fails all 3 times, it's a **failed** test.
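The rules in this doc comment can be expressed as a small pure function. A minimal sketch: `classifyTest` and the `'passed'`/`'failed'` outcome strings are hypothetical, and the real reporter reads attempt results through Jest's reporter API rather than from an array.

```javascript
// Mirrors jest.retryTimes(2): one original run plus at most 2 retries.
const MAX_RETRIES = 2;

// Classify a test from the ordered outcomes of its attempts, where each
// outcome is 'passed' or 'failed' and the original run comes first.
// Assumes retrying stops as soon as an attempt passes.
function classifyTest(attempts) {
  const maxAttempts = MAX_RETRIES + 1;
  if (attempts.length === 0 || attempts.length > maxAttempts) {
    throw new RangeError(`Expected between 1 and ${maxAttempts} attempts`);
  }
  const firstPass = attempts.indexOf('passed');
  if (firstPass === -1) return 'failed'; // no attempt passed
  if (firstPass === 0) return 'passed'; // passed on the original run
  return 'flaky'; // failed at least once, then passed on a retry
}

console.log(classifyTest(['passed'])); // passed
console.log(classifyTest(['failed', 'passed'])); // flaky
console.log(classifyTest(['failed', 'failed', 'failed'])); // failed
```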
gziolo (Member), Sep 7, 2021

The interesting part in this proposal is that when the test fails but then passes with re-tries we open an issue but when the test fails all the time we do nothing. I would worry more about the latter – clearly a regression 😄

Contributor

If the test fails all the time it's not considered flaky, it's just doing its job 😄

Tests that fail consistently on CI usually get noticed and fixed pretty quickly, but tests that fail 1 out of 10 times less so.

gziolo (Member), Sep 8, 2021

@tellthemachines, all correct, and we are on the same page here. I'm just saying that it could also be beneficial to automatically open a bug report when we detect a regression, to make it easier to coordinate the fix. In case of a regression, it could also leave a message on WordPress Slack, like what happens in the #core channel when the CI job on trunk fails.

Member (Author)

The "check" on each commit already reports test failures. Yep, we could add some notifications or a webhook if we want, but I don't think we need to open an "issue" for them. If the tests suddenly start to fail on trunk, then I think we can assume that something went wrong in our pipeline. Excluding flaky tests, the failures are likely related to third-party dependencies (core or plugin updates, for example). In that case, we should start fixing them immediately; opening an issue doesn't seem to help much other than creating additional noise 🤔.

And after all it's not in the scope of this PR 😅. Adding a webhook to notify failing tests on Slack is also pretty trivial with the help of the official webhook API of both GitHub and Slack.

nerrad (Contributor) commented Sep 7, 2021

Generally I really love this idea and think we should ship it and see how it works in practice on the repo. We can revert it if it introduces too much noise initially. One thing we could try in the initial iteration is posting comments on a single issue, to help fine-tune frequency and content, but I appreciate that might require a lot of extra code.

kevin940726 (Member, Author)

We discussed this in our weekly core-js meeting. I believe we're in agreement with giving this a try.

There are still some minor issues we need to figure out before merging this though.

  1. Should we run this on all PRs and trunk or only on trunk at first?
  2. What should be the name of the flaky test label?
  3. As proposed in Try reporting flaky tests to issues #34432 (comment), should we only post comments in one issue instead?

And here are my thoughts:

  1. Let's only run this on trunk at first and iterate to all PRs when ready.
  2. I'm thinking [Type] Flaky Test.
  3. That would require a lot of code changes but should be possible. I don't think it's worth it, though. As mentioned in the original discussion in the meeting, I believe every flaky test is valuable and worth having its own issue. We can always revert this PR at any time (even several times) until we fix the noise issues, if there are any.
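Point 1 (running on trunk first) could be implemented as a guard at the start of the reporting action. A minimal sketch, assuming the guard inspects the `workflow_run` event payload, whose `workflow_run.event` and `workflow_run.head_branch` fields describe what triggered the e2e run; `shouldReportFlakyTests` and `REPORTED_BRANCHES` are hypothetical names, and the merged PR may gate this differently (for example in the workflow file itself).

```javascript
// Branches whose flaky results are reported during the initial rollout.
// Widening this set (or dropping the check) would extend reporting to PRs.
const REPORTED_BRANCHES = new Set(['trunk']);

// `event` is the parsed workflow_run event payload, e.g. the JSON file
// that GITHUB_EVENT_PATH points to inside the reporting workflow.
function shouldReportFlakyTests(event) {
  const run = event && event.workflow_run;
  if (!run) return false;
  // Require a push so that a pull request whose head branch merely
  // happens to be named "trunk" (e.g. from a fork) is not reported.
  return run.event === 'push' && REPORTED_BRANCHES.has(run.head_branch);
}

console.log(shouldReportFlakyTests({ workflow_run: { event: 'push', head_branch: 'trunk' } })); // true
console.log(shouldReportFlakyTests({ workflow_run: { event: 'pull_request', head_branch: 'try/some-fix' } })); // false
```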

nerrad (Contributor) commented Sep 8, 2021

If you go the route of creating issues, one tweak that might make this nicer to follow up on is to have the bot leave a reaction on an issue that's already created. That way we can sort based on reactions (using the GitHub issue UI) to determine high flakiness tests (and prioritize which get addressed first).

I do think there needs to be some process for regularly addressing flaky tests surfaced by this new logic (either by removing or rewriting the test), because by its very nature a flaky test is unreliable and can give a false impression of code coverage that doesn't actually exist. So while I still love the idea, my only concern is that issues created by this process will pile up and not be addressed.

nerrad (Contributor) commented Sep 8, 2021

  1. Should we run this on all PRs and trunk or only on trunk at first?

Having this run only on trunk first seems to remove one of the main benefits of this work, which is to improve the experience for contributors when flaky tests cause unexpected failures in PRs whose changes are unrelated to the test.

kevin940726 (Member, Author)

one tweak that might make this nicer to follow up on is to have the bot leave a reaction on an issue that's already created. That way we can sort based on reactions (using the GitHub issue UI) to determine high flakiness tests (and prioritize which get addressed first).

I've thought about this, but dropped it right away. I don't think there's an API that lets the same (bot) user add the same reaction to an issue multiple times. I initially wanted to append new errors as comments on the issue and lock it, but I guess that would make the issue even less visible.

the only concern I have is issues created by this process will pile up and not be addressed.

We already have this problem now, so I don't think it'd be a new thing. Flaky tests are just more visible now, but they're also blocking contributors at the same time. When we haven't come up with a proper fix for a flaky test, we usually work around it temporarily by simply skipping it. That's the last resort and makes the test nearly invisible to us (a more visible way to handle this is to... create an issue, which is exactly what this PR does).

Having this run only on trunk first seems to remove one of the main benefits of this work, which is to improve the experience for contributors when flaky tests cause unexpected failures in PRs whose changes are unrelated to the test.

Yep, agree. The end goal is always to run on all PRs. Running it on trunk is just a trial to see if it works before rolling out to all the contributors. We could also argue that it's not necessary and we can always just revert this PR though. Hence I'm open to all suggestions.

noisysocks (Member)

And here are my thoughts:

I concur with all your thoughts here—let's do it! 🙂

kevin940726 force-pushed the try/flaky-tests-issue-report branch from fb2ecb2 to e5499d3 September 9, 2021 09:15
kevin940726 marked this pull request as ready for review September 9, 2021 09:16
noisysocks (Member) left a comment

Let's try it! 👍

kevin940726 merged commit 23b75cc into trunk Sep 13, 2021
kevin940726 deleted the try/flaky-tests-issue-report branch September 13, 2021 08:59
github-actions bot added this to the Gutenberg 11.6 milestone Sep 13, 2021
Labels
[Type] Automated Testing Testing infrastructure changes impacting the execution of end-to-end (E2E) and/or unit tests.

8 participants