Add screenshot testing #1068
Conversation
Force-pushed from d347ebd to 6ef74f1
.eslintrc (outdated)
@@ -7,6 +7,7 @@ globals:
 browser: true
 context: true
 jestPuppeteer: true
+BASE_URL: true
This was missing from .eslintrc, thus causing the linter error. Sorry!
@@ -16,3 +16,6 @@ s3/
 node_modules/
 npm-debug.log
 *tgz
+
+### IDE ###
+.vscode/*
Ignoring .vscode settings.
puppeteer.setup.js (outdated)
@@ -7,6 +10,8 @@ import { setDefaultOptions } from 'expect-puppeteer';
 jest.setTimeout(30 * 1000);
 setDefaultOptions({ timeout: 3 * 1000 });
+
+jest.retryTimes(4);
Add 4 retries, for a total of 5 test runs before Jest reports a failed test. This is useful for avoiding minor flakiness when doing UI testing.
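As a rough illustration (a hypothetical helper, not Auspice or Jest source), `jest.retryTimes(4)` behaves like a wrapper that re-runs a failing test up to 4 extra times, so each test executes at most 5 times before the failure surfaces:

```javascript
// Hypothetical sketch: 1 initial run plus `retries` retries, stopping
// at the first passing run; otherwise the last error is re-thrown.
async function runWithRetries(testFn, retries) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await testFn(attempt); // stop at the first passing run
    } catch (err) {
      lastError = err; // remember the failure and retry
    }
  }
  throw lastError; // every run failed: surface the last error
}

// Demo: a flaky check that only succeeds on its third run.
let flakyCalls = 0;
runWithRetries(async () => {
  flakyCalls += 1;
  if (flakyCalls < 3) throw new Error("flaky");
  return "passed";
}, 4).then((result) => console.log(`${result} after ${flakyCalls} runs`));
// prints "passed after 3 runs"
```

The trade-off bcoe raises below follows directly from this shape: a genuinely broken test still runs 5 times before it fails, so high retry counts stretch out the time-to-failure.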
I was finding it relatively hard to force an error (I was trying to do so by mucking with CSS to cause a noticeable visual change), I think partially because retries were set high enough that it was taking quite some time to exit. When deciding on your retry/timeout settings, would you mind doing the same, just to confirm that an expected failure occurs in a reasonable period of time and bubbles up appropriately?
test/integration/helpers.js (outdated)
@@ -0,0 +1,32 @@
+const WAIT_BETWEEN_SCREENSHOTS_MS = 100;
+
+export async function waitUntilScreenStable() {
This function ensures the screen's pixels are no longer changing before we run toMatchImageSnapshot().
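The idea can be sketched roughly like this (illustrative names only, not the PR's actual implementation; `capture` stands in for grabbing a screenshot, and string equality stands in for comparing image buffers):

```javascript
// Sketch: keep sampling the screen until two consecutive samples are
// identical, i.e. the pixels have stopped changing.
const WAIT_BETWEEN_SCREENSHOTS_MS = 100;

async function waitUntilStable(capture, maxTries = 50) {
  let previous = await capture();
  for (let i = 0; i < maxTries; i += 1) {
    await new Promise((resolve) => setTimeout(resolve, WAIT_BETWEEN_SCREENSHOTS_MS));
    const current = await capture();
    if (current === previous) return current; // pixels stopped changing
    previous = current;
  }
  throw new Error("screen never stabilised");
}

// Demo with a fake capture that settles after three samples.
let frameCount = 0;
const fakeCapture = async () => (frameCount < 3 ? `frame-${frameCount++}` : "settled");
waitUntilStable(fakeCapture).then((frame) => console.log(frame));
// prints "settled"
```

With a real page, `capture` would be a Puppeteer screenshot and the comparison would diff pixel buffers, but the control flow is the same.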
This is really nice. I had a chance to test it out and the code seemed good (although I am new to these kinds of tests), and I'm excited about all the things we can do here. Do you know why the images I'm getting may have failed? (They look the same as your screenshots, to my eyes at least.)

(I think it's fine to use the "live" zika build as this isn't changing currently, but one day soon I'll make a bunch of "test" datasets available which we can guarantee won't change.) Also, could I ask you to add some documentation?
@tihuan @jameshadfield I'm excited to see this approach in action. I've talked to peers who swear by image snapshots for integration tests (I know this approach is used heavily at Facebook, as an example).

Thought for improvement: for CI/CD my only concern is, can we get it so a user contributing to the project can easily see the visual difference they've caused? Reading up on Jest Image Snapshot, it sounds like we could upload the image somewhere custom. What if we did something slick like leaving the image as a comment on a failing PR? I can dig in to how practical this would be.

DEV_DOCS.md vs. CONTRIBUTING.md: @jameshadfield this is absolutely a nit, so feel free to ignore me 😆 but it's fairly common to have a file called CONTRIBUTING.md. Would you be open to renaming DEV_DOCS.md?
test/integration/zika.test.js (outdated)
})
describe("region", () => {
it("matches the screenshot", async () => {
await toMatchImageSnapshot("region", async () => {
I like this approach: you're ensuring that specific components on the page match visually (this is quite nice).
Force-pushed from 6ef74f1 to 91d057d
Thanks for the quick feedback on this, everyone 🤩🙏!!

Thanks for the heads-up! I just rebased this.

Ah thanks for the reminder, I will add some docs today!

For the "image diffing sometimes fails but our eyes can't tell the difference" thing: I did read something yesterday about how different machines, graphics cards, etc. can cause subtle pixel diffs that only machines can tell apart, which is part of the reason why I added retries. We might need to tweak the threshold for failing, like 3%, 4%, 5% or something? Depends on how much margin of error we're comfortable with here! First time doing image diffing, so I'm not sure what other heuristic configs people use to prevent flakiness 😆

@bcoe Love the idea of uploading the image diff and showing that to people! Depending on how Auspice is set up, we will need to hook it up to an S3 bucket and manage permissions etc. to get the GH Action to upload it and send the image link. That'd be super convenient and awesome!!
Just checked the failed integration test output and we're getting almost an 18% diff, so it's probably a real failure. From what @jameshadfield was saying, it sounds like the "live" zika dataset is probably different from the seeded one, so that could be why? If that's the case, should we add seed datasets in this PR and run the tests against them for consistency? Thanks all 😄 !
Uhh, interesting: I just added a 5% failure threshold, and the tests magically passed now -.- Given that it was almost an 18% diff, I'm not sure what made it pass! HMM
Until recently we had a contributing.md, but PR #978 moved these to DEV_DOCS so that all projects can pull in a generic "nextstrain" contributing.md. Happy to discuss directions in another issue, but for this PR let's add to dev_docs.

Re: CI tests, I see a multi-step process:
Zika hasn't been updated since last year, so that's not the case. But more generally, we should add a "test set" of JSONs to the S3 bucket which nextstrain draws from, so that datasets don't change underneath us. This can be done in a future PR. (Please don't add dataset JSONs to this repo; they will add too much weight!)

I think the only thing for this PR is understanding why the image diff percentages are varying. I don't think there's anything stochastic about how we render auspice. I'll also look into this today.
Oh great! Love the idea of adding test-set JSONs to S3 instead of the repo 🎉

Yeah, another possibility is the screen-stability check itself. I think I might take a different approach: test the snapshot against the expected snapshot up to N times, or until a timeout happens, say 10 seconds, so we don't need to rely on waitUntilScreenStable.

Will update the PR later today! Thank you!
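The deadline-based approach described above could be sketched like this (illustrative only, not the PR's code; `compare` stands in for a `toMatchImageSnapshot()` call that throws on mismatch):

```javascript
// Sketch: retry a throwing comparison until it passes or a deadline
// elapses, instead of waiting for the screen to stabilise first.
async function retryUntilTimeout(compare, timeoutMs, intervalMs = 200) {
  const deadline = Date.now() + timeoutMs;
  let lastError;
  do {
    try {
      return await compare(); // e.g. an image-snapshot assertion
    } catch (err) {
      lastError = err; // comparison failed: wait, then try again
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  } while (Date.now() < deadline);
  throw lastError; // still failing at the deadline: real mismatch
}

// Demo: a comparison that starts passing after ~300ms.
const start = Date.now();
retryUntilTimeout(async () => {
  if (Date.now() - start < 300) throw new Error("images differ");
  return "match";
}, 10 * 1000).then((r) => console.log(r));
// prints "match"
```

The nice property here is that a real regression fails deterministically after the timeout, while transient rendering lag just delays the pass by a few intervals.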
Force-pushed from 8004f3f to 9e3202c
OK, I updated the code to use the new approach, and the GH Action tests have been passing consistently! Can anyone try running it locally too, as a sanity check? If that works, then the last thing I need to do is update the docs. Thanks all 😄 !
Left a couple of comments from my own testing, @tihuan 👌 I'm excited to see if I can figure out how to make screenshots upload to the PR with GitHub Actions as follow-on work.
test/integration/zika.test.js (outdated)
 * https://github.com/americanexpress/jest-image-snapshot#%EF%B8%8F-api
 */
const SNAPSHOT_CONFIG = {
  failureThreshold: 5,
Similar to my prior comment, I was finding I was able to cause quite a few visual oddities in the page before I triggered an error with the 5% threshold. To decide on the failureThreshold and blur values, I think a good approach might be to:

- purposely remove a couple of elements from the page that aren't tested (make sure the missing elements trigger a failure in a reasonable amount of time).
- muck with CSS to make the page look noticeably visually different, and make sure we get a failure in a reasonable amount of time 👌
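One hedged starting point for those values (option names are taken from the jest-image-snapshot README; note that, as I understand that library, with failureThresholdType "percent" the threshold is a ratio, so 0.05 means 5%, and a whole number like 5 would be far too permissive — worth double-checking against the installed version):

```javascript
// Assumed jest-image-snapshot options; verify against the README of
// the installed version before relying on this.
const SNAPSHOT_CONFIG = {
  failureThreshold: 0.05,          // allow up to 5% of pixels to differ
  failureThresholdType: "percent", // interpret the threshold as a ratio
  blur: 2                          // blur both images before comparing,
                                   // smoothing over anti-aliasing noise
};

console.log(SNAPSHOT_CONFIG);
```

If a 5%-as-ratio config still lets the "break a few elements" experiments pass, tightening the threshold or lowering the blur would be the next knobs to try.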
Thanks for the excellent feedback, @bcoe! Will play with the failureThreshold and blur values.
Force-pushed from b4c6ce9 to 89f4896
Hi all! I removed the failureThreshold, and dug a little deeper into why it was passing before.

Also, thanks to @bcoe's awesome image-diff comment service (super convenient, thank you! ✨👏), it seems like the 17% diff is caused by rendering differences between macOS and Linux: fonts, default browser element styling, etc.

And looking at this article, they suggest running image diffing in a Docker image for a consistent environment, so that could increase the scope of this PR quite a bit. Or maybe we can somehow come up with a creative temp solution to reap enough of the benefits of screenshot testing now?
@tihuan, GitHub Actions actually supports a macOS environment, which might sidestep the Linux-vs-macOS rendering differences.

Another thing worth checking: have any of the datasets changed significantly since your initial snapshots? You might try pulling a fresh set of data. I like @jameshadfield's idea of having a "test set"; it would help eliminate false positives in the future.
I like that idea, but I would make sure that those test assets are versioned; otherwise tests may start failing without any changes to the code, which is probably not the intention.
You may want to enable git lfs so that, as more commits and image snapshots are added over time, git does not slow to a crawl.
Force-pushed from c62c0bd to 22f2bfd
Thanks so much for all the amazing suggestions, @bcoe and @anescobar1991!

Changing the GH Action to a macOS runner sounds promising. Will look into adding git lfs as well.

CC: @jameshadfield for the macOS question 😄 Thanks!
@jameshadfield has the real say on this matter (as I've just swooped in and started volunteering 😆).
How did testing go with purposefully breaking a few elements? One thought: could we test that we could have caught this real regression reported by @jameshadfield? 👆 If we could catch this breakage, it would be a great indicator that our comparison is sensitive enough.
Force-pushed from 22f2bfd to 86a3453
@bcoe Sounds great! Yeah, I also feel like getting some snapshots working now is better than nothing, but that's definitely up to @jameshadfield 😆

Re: intentionally breaking the app to make sure the snapshot works: since I removed the failureThreshold, even small visual changes should now be caught.
Hey @tihuan, @bcoe et al., this all looks great. I'm happy with where this branch has gotten to: it introduces integration tests, gives examples, and provides easy-to-follow documentation. Since they work in CI, they are immediately valuable for finding regressions on future PRs 🎉

The tests do appear to be very browser-specific. I'm using macOS 10.14.6 and the screenshot tests fail for me, with all the differences appearing to be related to text rendering. So that's to say I think the testing approach is fine, and the browser-specificity can be tackled in a separate PR.

Following your comments above, before adding further tests, I think we should tackle the following in separate PRs:
One last question before merge (edit: I think we just do the first option here): the 4 screenshots already add ~1MB to the repo. This isn't a deal-breaker, but we can't keep going in this direction. So, what are your thoughts on what we should do here?

Is there anything else to consider before merging this?

Other notes:
Hi all! GOOD NEWS: there's a way around this. I just tried pushing up a commit to my fork and got an error, so:

Install git lfs.

That should do 🤞

Re: Docker, yeah I think that's the only sure way to get OS/browser-consistent snapshot testing, although I'm not super familiar with Docker, so it would be helpful to see if other people want to take on that task! Thanks all!
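For reference, a typical git lfs setup looks something like this (the `*.png` pattern is an assumption; adjust it to wherever the snapshot images actually live in the repo):

```shell
# Install git-lfs (package manager varies by platform).
brew install git-lfs          # or: apt-get install git-lfs
git lfs install               # enable the lfs hooks for this machine
git lfs track "*.png"         # store snapshot images via lfs
git add .gitattributes        # the tracking rule lives in this file
```

After this, pushes of new snapshot images go through lfs automatically, keeping the repository itself lightweight.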
See additions to dev-docs for rationale. Having this in master should allow us to rebase #1068 and therefore have git-lfs used from day zero.
Thanks so much @tihuan, @bcoe and @anescobar1991. This has now been merged via #1084, which was a rebase of this onto master so that we use git-lfs from the start.
Description of proposed changes

This PR adds jest-image-snapshot and jest.retryTimes() to do basic screenshot diffing for the website's functionality.

Related issue(s)

Related to #917

Testing

What steps should be taken to test the changes you've proposed? If you added or changed behavior in the codebase, did you update the tests, or do you need help with this?

Please help test this PR locally by:

1. Make sure you have the default zika.json data via: curl http://data.nextstrain.org/zika.json --compressed -o data/zika.json
2. Run npm run dev
3. In another terminal tab, run npm run integration-test; all snapshot tests should pass.

Thank you for contributing to Nextstrain!