feat(ci): CI/CD workflows and composite actions to check test results (#1533) #1537

Closed


@dgrebb dgrebb commented Jan 16, 2024

Overview

This PR implements new test check workflows, incorporates Composite Actions for efficient workflow coding, and sets up caching for Playwright binaries, extending the existing caching mechanism for Puppeteer.

Key Updates:

  • New test check workflows for schema validation and expected output, enhancing PR change detection.
  • Integration of Composite Actions for reusable workflow segments.
  • Playwright bitmap references added for smoke tests, and inclusion of test/__fixtures__ for report.json snapshots.
  • Workflow runs currently triggered by workflow_dispatch, with potential for pull_request activation upon code modification.


Potentially closes #1533.

Details

Composite Actions

Located in .github/actions/[action]/action.yml, these actions simplify workflow setup, execution, and validation. Future plans include converting existing workflows to utilize these actions.

Workflow Testing

The workflows focus on executing the backstop test command and validating report.json against predefined fixtures. The checks catch portions of the command that were accidentally disabled and detect changes in report property names.

General Workflow Steps

  1. Execute an npm test script.
  2. Pre-filter report.json properties for shape consistency.
  3. Compare the filtered report.json with the corresponding fixture (./test/__fixtures__/[npm-script-name].json).
  4. Summarize results with a Pass/Fail determination.
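The filter-and-compare steps can be sketched in Node.js. This is only an illustration of the idea, not the workflow code from this PR (which uses jq and Bash); `toShape` and `checkReport` are hypothetical names, and the inline objects stand in for a real report.json and fixture.

```javascript
const { isDeepStrictEqual } = require('node:util');

// Blank every scalar property value so only the structure remains.
// Objects and arrays are kept and walked recursively.
function toShape(value) {
  if (Array.isArray(value)) return value.map(toShape);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, child]) => [
        key,
        child !== null && typeof child === 'object' ? toShape(child) : '',
      ])
    );
  }
  return value; // scalars that are not property values are left alone
}

// Hypothetical helper: compare the shape of a report with the shape of a fixture.
function checkReport(report, fixture) {
  const pass = isDeepStrictEqual(toShape(report), toShape(fixture));
  return { pass, summary: pass ? 'Pass' : 'Fail' };
}

// Inline data standing in for report.json and its fixture.
const fixture = { testSuite: 'BackstopJS', tests: [{ pair: { label: 'a' }, status: 'pass' }] };
const report = { testSuite: 'BackstopJS', tests: [{ pair: { label: 'b' }, status: 'fail' }] };

console.log(checkReport(report, fixture).summary); // prints "Pass": values differ, shape does not
```

Because values are blanked before the comparison, a run only fails when a property is added, removed, or renamed.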

Smoke and Integration Test Specifics

  • Smoke Tests: Address discrepancies between local and GitHub runs by filtering report.json before comparison.
  • Integration Tests: Focus on the final report.json generated by backstop test, using a Bash script to select the latest report.

Workflow Files

  • integration-test-check.yml: Runs integration tests and assesses report.json.
  • sanity-test-checks.yml: Executes and compares sanity-test and sanity-test-playwright.
  • smoke-test-checks.yml: Handles both smoke-test and smoke-test-playwright.
  • Docker-related workflows follow a similar structure for sanity and smoke tests.

Playwright Binaries Caching

Improves efficiency by caching Playwright installations on GitHub Actions, using OS and version for cache identification.

Conclusion

These updates aim to streamline testing processes and improve reliability. Feedback on the inclusion or modification of these features is welcome.

Cheers!


Notes and Further Details

Composite Actions are repeatable pieces of code that can take inputs and produce outputs. They live in .github/actions/[action]/action.yml, and are a great way to keep setup, execution, and validation patterns DRY.

I haven't changed any existing workflows to use composite actions yet, but am happy to do so.

New workflows are set to run only by workflow_dispatch for now. They can be enabled for pull_request if desired, but need a code change. Let me know if interested and I'll add a commit :)

What are they Testing?

Suppose a portion of the backstop test command was accidentally commented out. For example:

module.exports = {
  execute: function (config) {
    const executeCommand = require('./index');
    // if (shouldRunDocker(config)) {
    //   return runDocker(config, 'test')
    //     .finally(() => {
    //       if (config.openReport && config.report && config.report.indexOf('browser') > -1) {
    //         executeCommand('_openReport', config);
    //       }
    //     });
    // } else {
    //   return createBitmaps(config, false).then(function () {
    //     // return executeCommand('_report', config);
    //   });
    // }
  }
};

Running npm run sanity-test does not catch this in command output:

COMMAND | Executing core for "test"
COMMAND | Resolved already:test
COMMAND | Command "test" successfully executed in [0.001s]

However, by expecting a test/configs/backstop_data/bitmaps_test/[TIMESTAMP]/report.json we can catch a failed test run by diff (explained in detail later):

image

As another example, suppose someone changes a report property name:

image

Both are forced examples, but provide a glimpse into what's possible.
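To make the renamed-property case concrete, here is a rough Node.js sketch that lists which key paths are missing from or unexpected in a report. It is not part of this PR; `keyPaths` and `diffKeys` are hypothetical helpers, and the lowercase `mismatchThreshold` is a made-up rename for illustration.

```javascript
// Collect dotted key paths for every property in a JSON value.
// Array elements share the "[]" segment, so list length does not matter.
function keyPaths(value, prefix = '', out = new Set()) {
  if (Array.isArray(value)) {
    for (const item of value) keyPaths(item, `${prefix}[]`, out);
  } else if (value !== null && typeof value === 'object') {
    for (const [key, child] of Object.entries(value)) {
      const path = prefix ? `${prefix}.${key}` : key;
      out.add(path);
      keyPaths(child, path, out);
    }
  }
  return out;
}

// Report paths present in only one of the two documents.
function diffKeys(fixture, report) {
  const expected = keyPaths(fixture);
  const actual = keyPaths(report);
  return {
    missing: [...expected].filter((p) => !actual.has(p)).sort(),
    unexpected: [...actual].filter((p) => !expected.has(p)).sort(),
  };
}

const fixture = { tests: [{ pair: { label: '', misMatchThreshold: '' }, status: '' }] };
const renamed = { tests: [{ pair: { label: '', mismatchThreshold: '' }, status: '' }] };

console.log(diffKeys(fixture, renamed));
// missing: tests[].pair.misMatchThreshold, unexpected: tests[].pair.mismatchThreshold
```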

Smoke Test Caveat

I've seen a few smoke tests pass on GitHub but fail locally. For now, test comparisons first filter the report.json objects, deleting properties we know will have different shapes (or not exist at all in a pass):

jq 'walk(
      if type == "object"
      then with_entries(.value |= if type == "object" or type == "array" then . else "" end)
      else .
      end)
    | del(.tests[].pair.diff, .tests[].pair.diffImage)' \
  test/__fixtures__/smoke-test.json

Line breaks added for readability

diffImage doesn't exist on passing tests, so it's removed before analyzing report.json. As you previously mentioned, smoke tests are somewhat unreliable. The "misMatchThreshold": 0.1 setting could also be bumped a bit to be more forgiving.
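For readers less familiar with jq, the `del(...)` part of the filter is roughly equivalent to the Node.js sketch below. `stripVolatile` is a hypothetical name and the sample reports are invented for illustration; the workflows in this PR use jq, not this code.

```javascript
// Return a copy of the report with pair.diff and pair.diffImage removed from
// every test, mirroring del(.tests[].pair.diff, .tests[].pair.diffImage).
function stripVolatile(report) {
  const copy = JSON.parse(JSON.stringify(report)); // deep copy; the input is not mutated
  for (const test of copy.tests ?? []) {
    if (test.pair) {
      delete test.pair.diff;
      delete test.pair.diffImage;
    }
  }
  return copy;
}

// Invented sample data: a passing test has no diffImage, a failing one does.
const passing = {
  tests: [{ pair: { label: 'home', diff: { misMatchPercentage: '0.00' } }, status: 'pass' }],
};
const failing = {
  tests: [{
    pair: { label: 'home', diff: { misMatchPercentage: '4.20' }, diffImage: 'failed_diff_home.png' },
    status: 'fail',
  }],
};

console.log(Object.keys(stripVolatile(passing).tests[0].pair)); // [ 'label' ]
console.log(Object.keys(stripVolatile(failing).tests[0].pair)); // [ 'label' ]
```

After stripping, passing and failing runs expose the same set of pair properties, so the shape comparison no longer depends on the outcome of an individual smoke test.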

We can take a look at polishing smoke tests at some point, but this gets the job done! Below is a snapshot of the failing diff before filtering with jq.

image

Integration Caveat

The integration-test script generates two reports: one when running backstop reference, and the other after backstop test. We apply a fancy bash one-liner to find the most recently modified directory, and only diff the final report.
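The "most recently modified directory" lookup could look like this in Node.js. It is a sketch of the same idea as the Bash one-liner, not a port of it; `latestDirectory` is a hypothetical helper, and the directory passed to it is whatever folder holds the timestamped report directories.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Return the path of the most recently modified subdirectory of `root`,
// or null when `root` contains no directories.
function latestDirectory(root) {
  const dirs = fs
    .readdirSync(root, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => {
      const full = path.join(root, entry.name);
      return { full, mtimeMs: fs.statSync(full).mtimeMs };
    })
    .sort((a, b) => b.mtimeMs - a.mtimeMs); // newest first
  return dirs.length > 0 ? dirs[0].full : null;
}

// Usage (assumes `reportsRoot` points at the folder of timestamped directories):
// const reportPath = path.join(latestDirectory(reportsRoot), 'report.json');
```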

report.json Filtration Details

First and foremost, during a test check, property values throughout the report, including those in each .tests[].pair object, are set to empty strings. Some values will never match 1:1 because of system runtime differences, browser changes over time, etc. Only the shape of the data is tested in these new workflows.

jq traverses the report.json object and sets every property value that is not an array or object to an empty string (""). The same rule applies to properties nested within those objects and arrays.

This gives us a way to test the general "shape" of the data we expect backstop test to produce by comparing it with the corresponding JSON files in test/__fixtures__/.

That ends up looking like this, which is the shape tested in integration and sanity "check" workflows introduced in this PR:

{
  "testSuite": "",
  "tests": [
    {
      "pair": {
        "reference": "",
        "test": "",
        "selector": "",
        "fileName": "",
        "label": "",
        "requireSameDimensions": "",
        "misMatchThreshold": "",
        "url": "",
        "referenceUrl": "",
        "expect": "",
        "viewportLabel": "",
        "diff": {
          "isSameDimensions": "",
          "dimensionDifference": {
            "width": "",
            "height": ""
          },
          "rawMisMatchPercentage": "",
          "misMatchPercentage": "",
          "analysisTime": ""
        },
        "diffImage": ""
      },
      "status": ""
    },
    {
      "pair": {
        "reference": "",
        "test": "",
        "selector": "",
        "fileName": "",
        "label": "",
        "requireSameDimensions": "",
        "misMatchThreshold": "",
        "url": "",
        "referenceUrl": "",
        "expect": "",
        "viewportLabel": "",
        "diff": {
          "isSameDimensions": "",
          "dimensionDifference": {
            "width": "",
            "height": ""
          },
          "rawMisMatchPercentage": "",
          "misMatchPercentage": "",
          "analysisTime": ""
        },
        "diffImage": ""
      },
      "status": ""
    }
  ],
  "id": ""
}

Happy to discuss in detail :)

Workflows

integration-test-check.yml

This runs npm run integration-test, then checks the resulting report.json, which is produced by the last step in the project's integration test: backstop test.

The GitHub workflow results in a pass/fail based on shape alone. The unfiltered A/B fixture/CI diff is included in the workflow's summary for further analysis.

Note: All workflow summaries contain the unfiltered diff under the "Unfiltered Diff" heading. There will always be timestamp directory-name differences in the "test" property, which further illustrates why property/value filtering is needed.
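As a rough illustration of how a summary with an "Unfiltered Diff" section might be produced, here is a Node.js sketch. It assumes the standard `GITHUB_STEP_SUMMARY` file that GitHub Actions provides; `naiveLineDiff`, `buildSummary`, and `publishSummary` are hypothetical helpers, and the positional line comparison is a simplified stand-in for whatever diff tool the workflows actually use.

```javascript
const fs = require('node:fs');

// Positional comparison: line i of one text against line i of the other.
// A stand-in for a real diff tool; it misaligns after an inserted line.
function naiveLineDiff(expected, actual) {
  const a = expected.split('\n');
  const b = actual.split('\n');
  const lines = [];
  for (let i = 0; i < Math.max(a.length, b.length); i += 1) {
    if (a[i] === b[i]) {
      lines.push(`  ${a[i]}`);
    } else {
      if (a[i] !== undefined) lines.push(`- ${a[i]}`);
      if (b[i] !== undefined) lines.push(`+ ${b[i]}`);
    }
  }
  return lines.join('\n');
}

// Build a Markdown summary: a Pass/Fail heading followed by the raw diff.
function buildSummary(name, pass, fixtureText, reportText) {
  const fence = '`'.repeat(3); // builds the Markdown code fence at runtime
  return [
    `## ${name}: ${pass ? 'Pass' : 'Fail'}`,
    '',
    '### Unfiltered Diff',
    '',
    `${fence}diff`,
    naiveLineDiff(fixtureText, reportText),
    fence,
    '',
  ].join('\n');
}

// In GitHub Actions, GITHUB_STEP_SUMMARY is a file path; elsewhere we print.
function publishSummary(markdown) {
  const target = process.env.GITHUB_STEP_SUMMARY;
  if (target) fs.appendFileSync(target, markdown);
  else console.log(markdown);
}

// Illustrative values only: the timestamped directory in "test" differs per run.
const fixtureText = '{\n  "test": "../bitmaps_test/<timestamp-a>/home.png"\n}';
const reportText = '{\n  "test": "../bitmaps_test/<timestamp-b>/home.png"\n}';
publishSummary(buildSummary('sanity-test', true, fixtureText, reportText));
```

The example passes on shape while the raw diff still shows the timestamp difference in "test".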

image

sanity-test-checks.yml

Runs both sanity-test and sanity-test-playwright, then compares each resulting report.json with its corresponding fixture.

smoke-test-checks.yml

Runs both smoke-test and smoke-test-playwright, then compares each resulting report.json with its corresponding fixture.

docker-sanity-test-checks.yml and docker-smoke-test-checks.yml

Same, but via Docker.

Playwright Binaries Caching

Playwright takes a long time to install on every run, so I found a way to cache the binaries in GitHub Actions, using the OS and version as the cache name (and lookup key):

image
Located here in the GitHub UI: https://github.com/dgrebb/BackstopJS/actions/caches
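The idea of keying the cache on OS and version can be sketched as a small function. The key format `playwright-<os>-<version>` and the helper name `playwrightCacheKey` are assumptions made for illustration; the actual key used by the workflows in this PR may differ.

```javascript
// Hypothetical key format: playwright-<os>-<version>.
// `pkg` is a parsed package.json-like object; a leading range prefix such as
// "^" or "~" is stripped so the key only changes when the version does.
function playwrightCacheKey(runnerOs, pkg) {
  const range = pkg.dependencies?.playwright ?? pkg.devDependencies?.playwright;
  if (!range) {
    throw new Error('playwright is not listed as a dependency');
  }
  const version = range.replace(/^[^\d]*/, '');
  return `playwright-${runnerOs}-${version}`;
}

// Illustrative input; the version number here is made up.
console.log(playwrightCacheKey('Linux', { dependencies: { playwright: '^1.40.1' } }));
// prints "playwright-Linux-1.40.1"
```

A key built this way is reused across runs on the same OS until the Playwright version changes, at which point the lookup misses and the binaries are installed and cached again.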

@dgrebb dgrebb force-pushed the feat/1533-github-actions-report-validation branch from 8fe92eb to b77da6d Compare January 26, 2024 19:09
@dgrebb dgrebb force-pushed the feat/1533-github-actions-report-validation branch from b77da6d to 5ab976a Compare January 26, 2024 19:11
@dgrebb dgrebb closed this Jan 29, 2024