
Add E2E testing #917

Open · jameshadfield opened this issue Mar 4, 2020 · 32 comments
Labels: please take this issue, proposal (Proposals that warrant further discussion)

Comments

@jameshadfield (Member)

Thoughts, implementations and advice welcome!

@jameshadfield added the "proposal (Proposals that warrant further discussion)" and "please take this issue" labels on Mar 4, 2020
@dnprock (Contributor) commented Mar 4, 2020

Testing frameworks in node.js are fragmented. I think we should separate unit tests and integration (feature) tests, and use a different framework for each test suite.

Currently, I think:

  • Enzyme unit tests for React client components
  • Jasmine unit tests for server components
  • zombie.js for integration (feature) tests

@jameshadfield (Member, Author)

I don't have the expertise here to comment, but am happy to have them separated.

In general, I think feature (e2e?) tests are needed. Over the years we've added amazing capabilities to auspice, and we have hundreds of datasets which make different use of these varied features. As such, manually testing a decent proportion of the datasets at each PR has become untenable, and this leads to bugs which often take a while to discover 😦 Creating a dev environment which can give us confidence in this area would be wonderful for nextstrain.

@dnprock (Contributor) commented Mar 5, 2020

I'm happy to help out here. Can you write down the main test cases that you want to have coverage for? Especially the use cases you think are most important.

For example:

Rendering

  1. Minimal dataset rendering.
  2. Complex dataset rendering.

Tree Layout

  3. Change Color By option.
  4. Change Tree Layout option.

Map Layout

  5. Change Geographic resolution option.

@hydrosquall (Contributor) commented Mar 5, 2020

For integration testing of general frontend logic, I've had good experiences with Cypress (https://www.cypress.io/), especially in contrast to previous experiences setting up Selenium/PhantomJS-based testing. It integrates well with CI, so a logical first step might be to add a Cypress test step to the existing Travis pipeline.

For visual regression testing, I've heard that others have had good experiences with percy.io, but I haven't personally used it. It doesn't sound like this is necessary, but mentioning it just in case it's what you had in mind.

In order to test that the app works with these different datasets - would you be comfortable with having the fixture data files hardcoded into this repository? If so, it would be helpful to know which datasets form a representative set, with a good mixture of properties, for a test suite to exercise.

@dnprock (Contributor) commented Mar 5, 2020

cypress.io looks good. It seems like the framework to get started with now.

@tihuan (Contributor) commented Mar 5, 2020

Cypress's bottleneck is test recording and parallelization, which are bundled together. So if we run out of test quota, CI/CD tests will stop working.

Our team recently (within the last 3 months) evaluated Cypress and Puppeteer, and went with Puppeteer to avoid that hard bottleneck. It also paves the way for an easier upgrade to Playwright for cross-browser testing in Chrome, Firefox, and Safari down the road!

For unit testing React, Jest + Enzyme will do. Integration tests can be Jest + Puppeteer.

Under the hood, Cypress uses Mocha instead of Jasmine, whereas Jest already uses Jasmine, so that could streamline vendor library dependencies too.

Resources:

  1. Puppeteer
  2. Jest-Puppeteer
  3. Playwright - the original Puppeteer team moved from Google to Microsoft to create this
  4. Jest
  5. Enzyme

@dnprock (Contributor) commented Mar 6, 2020

@tihuan thanks for the info, very useful. I always find the node.js ecosystem a mess :). In these situations, I'd delegate the decision to whoever takes the lead. I'm happy to help out with writing some tests.

@tihuan (Contributor) commented Mar 6, 2020

Thanks for the quick reply, @dnprock 😄

Yeah, not sure who's gonna lead the testing framework setup, but I can help prototype integration tests with jest + puppeteer if needed!

@hydrosquall (Contributor) commented Mar 6, 2020 via email

@tihuan (Contributor) commented Mar 6, 2020

Hey @hydrosquall thanks for adding the extra info!

Cypress ties the dashboard and parallel tests together, since that's pretty much their money maker from what I read here: cypress-io/cypress#2520. So having no dashboard (or running out of quota) will block parallel tests. In our trial, the tests stopped working altogether once the quota ran out.

There's a homegrown solution to self-host the dashboard, but it seems to still be in alpha:

https://github.com/agoldis/sorry-cypress

The free OSS plan is promising! Not sure if its limit of 5 users would be enough for the project though

And thanks for bringing up that Cypress supports Firefox and Edge 👍 Puppeteer also supports Firefox and Edge, given that Edge is now Chromium-based too (https://blogs.windows.com/msedgedev/2019/11/04/edge-chromium-release-candidate-get-ready).

Both Cypress and Puppeteer are easy to set up, so maybe it boils down to which approach is more maintainable in the long run for this project, in terms of contributor interest, experience, free-plan user limits, etc.?

@MargotLepizzera

Hey, quickly chiming in to mention that Datadog has an E2E testing product, called browser tests, and an OSS plan as well.
If you think it'd be a good fit for your project, I'd be happy to work on having that approved internally :)

@tihuan (Contributor) commented Mar 6, 2020

Ah yes! Thanks for mentioning Datadog's browser tests and flagging the OSS plan, @MargotLepizzera 🥇

My project uses Datadog to ping our app every 5 minutes to make sure it's functional, and the integration with Slack is super neat ✨

For writing integration tests, I'm thinking we might need a version-controlled solution, so the test code can be checked into this repo and run locally and on every PR too.

A coworker just shared this article: https://medium.com/prodperfect/how-to-build-e2e-test-cases-b0cb3e3c97bf

tl;dr: integration tests should mostly cover critical user paths (a small set of all the tests), while unit tests and component tests cover the larger area of the code base. This way we don't end up writing everything as integration tests, which would make the test suite take hours to run!

@tihuan (Contributor) commented Mar 7, 2020

I just created a puppeteer branch with a POC integration test that clicks the Play button on the Zika dataset:

https://github.com/nextstrain/auspice/compare/master...tihuan:jest-test-setup?expand=1

To run the integration test:

  1. Please make sure you have the zika dataset
  2. Run auspice view --datasetDir data in one terminal tab
  3. In another terminal tab, run npm install && npm run integration-test

You should get:

(screenshot: output of the passing integration test)

If you want to see the integration test run in a browser, you can run: HEADLESS=false SLOWMO=10 npm run integration-test

Please let me know if you encounter any issues 😄 Thank you!
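
For context, a minimal sketch of what a jest-puppeteer test along these lines could look like. The port, path, and button label are assumptions rather than the exact code in the linked branch, and jest-puppeteer is assumed to be configured so that `page` is available globally:

```js
// integration/zika.test.js -- illustrative sketch only
describe("zika dataset", () => {
  it("loads and lets you press Play", async () => {
    // `auspice view` serves the app on localhost:4000 by default; the /zika
    // path assumes the zika dataset is present in the dataset directory
    await page.goto("http://localhost:4000/zika", { waitUntil: "networkidle0" });

    // Find a button whose visible label contains "Play"; XPath is used because
    // plain CSS selectors cannot match on text content.
    const [playButton] = await page.$x("//button[contains(., 'Play')]");
    expect(playButton).toBeDefined();
    await playButton.click();
  }, 30000); // generous timeout: the first page load can be slow
});
```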

@jameshadfield (Member, Author)

Thanks all for your input. I'll defer to @colinmegill to guide which testing framework we go with, as I think this is a very important decision for the project, but I wanted to give a summary of the kind of tests we desperately need. Currently, our testing consists of me loading a number of datasets and trying to run all the functionality I believe the PR may have affected. Things fall through the cracks this way, especially as we have more developers and more features.

Example of where testing would have been super helpful this week

We had a PR last week which fixed an open issue where the labels in the legend would overflow and look bad. The PR limited the number of characters in the legend name so they didn't overflow. Great! I tested it on a few examples, it worked, I merged it. But this broke the legend display for numeric legend traits, which I didn't test 😢. These are the kind of things I'd love testing to automate out of my life!

List of test actions it would be wonderful to automate

For (e.g.) 10 different datasets, with the acknowledgement that some params will be specific to each one. This list isn't exhaustive (but is equally too exhaustive to begin with); a rough sketch of how these actions could be parameterised across datasets follows the list.

  • Load dataset & check it renders. Note that there are changes over time -- e.g. as we update the nCoV dataset the tree will gain more tips, the map will gain more locations. We could get around this by having a set of datasets which don't change and which the tests are based on.

  • Change colorBy to X, then to Y

  • Change geo Resolution to X

  • Move between all 4 tree layouts

  • Toggle between divergence & temporal tree views

  • Hover over different entries in the legend and observe that the tips change size

  • Click on different branches & observe that the tree zooms & the map changes.

  • Use the dataset selector to change to a different dataset

  • Enable a second tree via the sidebar dropdown (and repeat above tests)

  • Click on a bar in the entropy panel and see tree coloring update

  • Load a dataset with queries defined via URL (e.g. ?c=country) and check they modulate state appropriately

  • Animate the dataset & check that it looks as it should (I presume an entry like this must have a number of defined checks for puppeteer?)
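
A rough sketch of how such a dataset × action matrix might be organised with jest + puppeteer. The dataset names, port, and selectors below are placeholders, and stable, pinned copies of each dataset are assumed:

```js
// integration/datasets.test.js -- sketch only; names and selectors are placeholders
const datasets = ["zika", "mers", "ncov_2020-03-04"];

describe.each(datasets)("dataset: %s", (dataset) => {
  it("renders a tree panel", async () => {
    await page.goto(`http://localhost:4000/${dataset}`, { waitUntil: "networkidle0" });
    // a more specific selector would pin down the tree SVG itself
    await page.waitForSelector("svg");
  });

  it("applies a colorBy from the URL query", async () => {
    await page.goto(`http://localhost:4000/${dataset}?c=country`, {
      waitUntil: "networkidle0",
    });
    // assertions on the legend / tip colours would go here
    await page.waitForSelector("svg");
  });
});
```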

@colinmegill (Collaborator)

I will +1 puppeteer! Looks great @tihuan :)

@tihuan (Contributor) commented Mar 12, 2020

Sorry for getting back to this late, got a workload burst this week 🚀

Thanks for checking out the PR, @colinmegill 🤩🙏

Love the list you provided too, @jameshadfield! It sounds like it will involve SVG testing, which I don't have much experience with and which will require some research to do right.

@colinmegill how do we do that in cellxgene?

Are other contributors okay with Puppeteer? Does the PR look good to everyone?

Thanks all!

@dnprock (Contributor) commented Mar 12, 2020

@tihuan I see some changes in the formatting of the .eslintrc file. That may conflict with other devs already on the project.

Otherwise, the test PR looks good. I'm happy to help out.

@tihuan (Contributor) commented Mar 12, 2020

Hey @dnprock ! Thanks for the quick review! That's a really good point!

I just reverted the linter changes to the .eslintrc file and created a PR.

PTAL again 🙏 Thanks a lot!

@jameshadfield (Member, Author) commented Mar 19, 2020

Update: Auspice now has jest / puppeteer support thanks to @tihuan's #943 🎉

@tihuan & @colinmegill do you recommend closing this issue and creating new issues for each of #917 (comment) or is there more general e2e discussion to happen here first?

@tihuan (Contributor) commented Mar 20, 2020

Yay thanks everyone for the discussion and reviews! 😍🙏🎉

I think it makes sense to break down #917 (comment) into individual issues, so people can tackle them in parallel

In general, I would recommend people use https://github.com/smooth-code/jest-puppeteer/blob/master/packages/expect-puppeteer/README.md#api where possible, instead of going straight to Puppeteer's own API when writing tests, unless expect-puppeteer doesn't have the functionality you need. This is because jest-puppeteer has a more user-friendly API than raw Puppeteer (src)!
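
To illustrate the difference, here is a small side-by-side sketch; the selectors and the "country" label are hypothetical:

```js
// Raw Puppeteer: you wait for the element yourself, then act on it.
it("opens the colour-by dropdown (raw Puppeteer, hypothetical selector)", async () => {
  await page.waitForSelector("#color-by-dropdown");
  await page.click("#color-by-dropdown");
});

// expect-puppeteer: the matcher waits/retries internally, can match on visible
// text, and a failure produces a readable assertion message.
it("changes the colour-by (expect-puppeteer, hypothetical selectors)", async () => {
  await expect(page).toClick("#color-by-dropdown");
  await expect(page).toClick("div", { text: "country" });
});
```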

Once we have individual issues for each test case, we could also tag them with a priority, so people know which ones to pick up first?

Thanks all!

@colinmegill (Collaborator)

Agree with all the above, thanks so much @tihuan! And thanks for referencing those jest-puppeteer docs!

@jameshadfield (Member, Author)

👍 @colinmegill / @tihuan if you have time, could you turn one of those example tests into an issue, as you're better placed to know how much detail is required? (And please reference those recommendations in the docs so that the test implementations are consistent.)

@tihuan (Contributor) commented Mar 20, 2020

Thanks both! Yeah, I'll pick a test you described above, turn it into an issue, and see how people like it! ✍️

@tihuan (Contributor) commented Mar 26, 2020

Hi all! Quick update on writing an issue for a test 😄

I'm working on "Change colorBy to X, then to Y" now.

Given that it asserts SVG changes, I want to try using https://github.com/americanexpress/jest-image-snapshot to diff images. Another option is https://percy.io/, but I'm not sure they have a free OSS plan, so I'm testing with jest-image-snapshot first.

The expected outcome of the issue is to provide a list of assertions + image diffs we want the test to perform, along with the suggested jest-puppeteer API to use to click around, so whoever picks up the ticket will be able to do so.

Ideally, I'd like to do a spike on jest-image-snapshot first, write a test with it, and get it merged, so it can be an example for other tests to follow. What do people think of that approach?
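
For reference, the basic shape of a screenshot assertion with jest-image-snapshot plus Puppeteer would be something like the sketch below. The URL and threshold values are placeholder assumptions to be tuned, and the jest-puppeteer global `page` is assumed:

```js
const { toMatchImageSnapshot } = require("jest-image-snapshot");

// register the custom matcher once per test file (or in a jest setup file)
expect.extend({ toMatchImageSnapshot });

it("colorBy=country renders the same as the stored screenshot", async () => {
  await page.goto("http://localhost:4000/zika?c=country", { waitUntil: "networkidle0" });
  const image = await page.screenshot(); // returns a Buffer
  expect(image).toMatchImageSnapshot({
    // tolerate tiny anti-aliasing differences; exact numbers would need tuning
    failureThreshold: 0.01,
    failureThresholdType: "percent",
  });
});
```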

Extra discussion:

  1. Should we add tests and the linter as part of the TravisCI PR/branch checks?

In my work project, our TravisCI setup runs a few different checks in parallel:

(screenshot: parallel TravisCI checks on a PR)

  2. Since we have ESLint set up already, should we add a pre-commit hook to run ESLint, so it autofixes or fails a commit when the linter is not happy? I see a few commits in the history that run ESLint retroactively, so I thought we could automate that too! The tool my work project uses is https://github.com/typicode/husky

I'd love to hear your thoughts on all three questions!

Thanks all!

@jameshadfield (Member, Author)

Hi @tihuan -- thanks for commenting, it's exciting that this is progressing!

> Given that it's asserting SVG changes

This, to me, is the big question. The datasets which auspice sources (via npm run get-data) change over time, as new genomes become available, so we'd have to provide a set of stable datasets in order to generate the same SVG over time. But this isn't hard. @tihuan I'd recommend using an ncov dataset with the date in the filename for the current test.

What happens if, for instance, we change the font size of the axis text label? Presumably this breaks any test with the tree in view. Is there a way to programmatically regenerate the images needed for all the tests?

Worth me asking at this point if there are other approaches than comparing image snapshots which we should consider? E.g. ensuring that a certain SVG element is in the DOM & has the correct colour after a colour-by change (I have no idea what's best practice at the moment).

> write a test for it, get it merged, so it can be an example for other tests to follow.

Agreed, with the caveat that we should pick the right approach here as (hopefully) many tests will be forthcoming which use it. Really like your thoughts on detailing a suggested API for others to follow ⭐️

> Should we add tests and linter as part of the TravisCI PR/branch check?

Seems sensible. Recent work means we're now at 0 errors & warnings (perhaps the first time this has happened!). Enforcing this via husky seems great -- I haven't used autofixes, so not sure about failing the commit vs autofixing.

@tihuan (Contributor) commented Mar 27, 2020

Morning @jameshadfield! Oh, thank YOU and your team for running the show and giving us such a great tool to use. Especially right now!

I'm just pitching in where I can and trying to be helpful :-)

> Hi @tihuan -- thanks for commenting, it's exciting that this is progressing!
>
> > Given that it's asserting SVG changes
>
> This, to me, is the big question. The datasets which auspice sources (via npm run get-data) change over time, as new genomes become available, so we'd have to provide a set of stable datasets in order to generate the same SVG over time. But this isn't hard. @tihuan I'd recommend using an ncov dataset with the date in the filename for the current test.
>
> What happens if, for instance, we change the font size of the axis text label? Presumably this breaks any test with the tree in view. Is there a way to programmatically regenerate the images needed for all the tests?

Yes! The image-diffing library uses the same -u flag as updating regular jest snapshots (see the docs), so it's easy to update the files when needed.

I'm thinking we can create a new command in package.json to run tests and update snapshots, so people don't need to pass -u manually when they expect snapshots and screenshots to change for their commit.

> Worth me asking at this point if there are other approaches than comparing image snapshots which we should consider? E.g. ensuring that a certain SVG element is in the DOM & has the correct colour after a colour-by change (I have no idea what's best practice at the moment).

Great question! We can also use unit tests to snapshot the DOM structure via jest snapshot testing and Enzyme (from Airbnb). That will do what you described, i.e. assert that the SVG elements are rendered with the right attributes, inner text, etc., as expected.
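
For anyone unfamiliar with that pattern, an Enzyme + jest snapshot test has roughly this shape. The component path and props are hypothetical, enzyme-to-json is just one common way to serialise the wrapper, and the Enzyme adapter for the project's React version is assumed to be configured in the jest setup:

```js
import React from "react";
import { shallow } from "enzyme";
import toJson from "enzyme-to-json";
// hypothetical component path and props, purely for illustration
import Legend from "../../src/components/tree/legend";

it("renders the legend for a discrete trait", () => {
  const wrapper = shallow(<Legend colorBy="country" />);
  // the serialised DOM structure is written to __snapshots__ on the first run
  // and diffed against the stored snapshot on subsequent runs
  expect(toJson(wrapper)).toMatchSnapshot();
});
```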

Personally, I think we should do both: unit testing at the component level, by feeding the component different combinations of props and snapshotting the expected DOM output, and also UI testing with screenshots to make sure the whole user flow works in the app as expected.

Additionally, since UI/E2E tests are much slower than unit tests, they should only cover critical user flows / happy paths that should block a prod deployment when any of them breaks. They're meant to spot-check things that should never break, while unit tests can cover far more edge cases to ensure any function/component works as intended given any input.

Quick definition of the unit, integration, and UI tests I'm referring to, since different people sometimes define these terms differently 😆 e.g. referring to UI tests as integration tests and vice versa. So I just wanted to make sure we're aligned on what those terms mean!

> > write a test for it, get it merged, so it can be an example for other tests to follow.
>
> Agreed, with the caveat that we should pick the right approach here as (hopefully) many tests will be forthcoming which use it. Really like your thoughts on detailing a suggested API for others to follow ⭐️

That sounds great! Yeah, defining patterns that will be used at scale involves some risk, but we can always adjust as we go when team members discover new patterns that work better.

I think the discussion should continue, so we get more ideas and figure out best practices for testing data visualization. Hopefully we can get more people to chime in!

In the meantime I'll work on a prototype and get a PR out for screenshot testing, so we have a concrete example that we can use to discuss pros and cons

> > Should we add tests and linter as part of the TravisCI PR/branch check?
>
> Seems sensible. Recent work means we're now at 0 errors & warnings (perhaps the first time this has happened!). Enforcing this via husky seems great -- I haven't used autofixes, so not sure about failing the commit vs autofixing.

HURRAY!! That's a super exciting win 🎉🤩👏!

Yeah, ESLint comes with rules that can be autofixed, so as long as we pass the --fix flag, ESLint will attempt to fix the fixable errors for us and fail the linting only when it encounters errors that it can't autofix.

In addition to ESLint, to give the code base a consistent coding style and format, we can use prettier, which has been wildly popular since its introduction 3 years ago. I definitely highly recommend it!
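
One possible shape for that pre-commit setup, sketched with husky's v4-era JS config plus lint-staged. The file names, globs, and commands below are assumptions, not an existing configuration in this repo:

```js
// .huskyrc.js -- register the git hook (husky v4-style config)
module.exports = {
  hooks: {
    "pre-commit": "lint-staged",
  },
};
```

```js
// lint-staged.config.js -- run the tools only on the files staged for the commit
module.exports = {
  "*.js": ["eslint --fix"],
  "*.{json,css,md}": ["prettier --write"],
};
```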

@jameshadfield (Member, Author)

> In the meantime I'll work on a prototype and get a PR out for screenshot testing, so we have a concrete example that we can use to discuss pros and cons

💯 -- thanks!

@tihuan (Contributor) commented Apr 9, 2020

Hi all! Sorry for the lack of updates this past week, I've been swamped with work 🌊🏊‍♂️

I've currently got screenshot diffing working, but there are a few things I want to add before submitting a PR:

  1. Add jest.retryTimes(4), so our test suite will automatically retry a failing test up to 4 times in case of unexplainable flaky tests. This will require adding jest-circus as the test runner (doc).

It's super easy to set up; I've just been working nights to get a major project on track right now 😆

  2. Add a helper function await waitUntilScreenSteady() to address an issue I discovered while doing screenshot testing: when d3 updates the tree, there's a tiny window where some edges disappear/reappear, so the screenshot diff sometimes fails because of that.

So I was thinking the helper function could take a screenshot periodically, say every 100ms, and compare consecutive screenshots until they are identical (else retry). That way we know the page is steady and ready for the real screenshot-diffing test.
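
A sketch of that helper, using plain Buffer comparison of Puppeteer screenshots; the interval, retry cap, and error handling are assumptions to be tuned:

```js
// Take screenshots `intervalMs` apart until two consecutive ones are byte-identical,
// i.e. nothing on the page is still animating; give up after `maxTries` attempts.
async function waitUntilScreenSteady(page, { intervalMs = 100, maxTries = 50 } = {}) {
  let previous = await page.screenshot(); // returns a Buffer
  for (let i = 0; i < maxTries; i += 1) {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    const current = await page.screenshot();
    if (current.equals(previous)) return; // pixels unchanged between shots
    previous = current;
  }
  throw new Error(`Screen was still changing after ${maxTries} screenshots`);
}

module.exports = { waitUntilScreenSteady };
```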

I'm hoping to get my work project on track this week, then get back to this testing stuff on Sunday.

Hope that's okay! Thank you 😊 🙏

@tihuan (Contributor) commented Apr 16, 2020

Just submitted a PR for adding screenshot testing!

PTAL all 😄 🙏 !

#1068

Thank you!

@bcoe (Contributor) commented Apr 19, 2020

CC: @nellyk @anescobar1991 (two of the maintainers of jest-image-snapshot). As @tihuan works towards adding screenshot testing in #1068, do you have any advice for initial configuration options based on your experience?

@anescobar1991 commented Apr 19, 2020

First of all thanks to all of you for working on a project like this one that is helping us in our fight against covid19 (and other malicious pathogens)!

What I'd recommend is executing the tests in a Docker container in order to ensure repeatability across operating systems. Otherwise you will likely have issues where snapshots look one way on a developer's machine (say, running macOS) but slightly different on CI, causing failures. Many times things like fonts render differently across operating systems, causing these issues. This seems heavy-handed (and is), but over the years we have found that it is really the only way to have minimally brittle integration tests.

That, and make sure all data is mocked, to avoid dependencies on things you don't have control over.

Other than that the things mentioned in #917 (comment) make sense!

Let me know if you face any issues!

@rleir (Contributor) commented May 9, 2020

Hi all, there is another linter (or rather, a static source analyzer?) called Deepscan, with a free open-source plan. Currently it shows 24 issues in Auspice. My PRs have been fixes for Deepscan warnings, and Deepscan makes it easy by recommending a fix. Those 24 issues might be harmless, but there could be one in 24 which actually breaks something.

  • JavaScript/TypeScript analysis
  • Auto sync with GitHub
  • Simple grade system for code quality
  • Three months of historical trends

They don't mention React in this list, but Deepscan seems to understand it. Cheers -- Rick
