review options for accessibility linting (versus documentation) #6314

Closed
cfm opened this issue Mar 2, 2022 · 12 comments · Fixed by #6745
Labels: a11y (Issues related to accessibility), good first issue, hackathon

Comments

@cfm
Member

cfm commented Mar 2, 2022

Require WCAG compliance along with other linting enforced automatically in CI. Here's an overview of how to do so and the W3C's list of such tools, of which Pa11y and AccessLint look especially promising.

Originally posted by @cfm in #5743 (comment)

@skyvine
Contributor

skyvine commented Jan 20, 2023

Machine Testing

Following the conversation in #6725, I took a look at pa11y/pa11y-ci and how it could be used for testing in this project. It turns out that pa11y is a wrapper around Puppeteer (to manage the browser instance) and HTML CodeSniffer (to run tests on the page received by the browser), which provides some benefits over using them directly:

  • Management of browser setup/teardown is transparent (as long as Chrome is installed)
  • Configuring the browser & the tests can happen in one place (the pa11y config file)
  • A variety of built-in output formats, including json for programmatic processing
  • Built-in system for running user actions before running tests
  • Filtering error reports
    • Ignore specific error codes
    • Control reporting of notices & warnings (neither is considered actionable by the pa11y maintainers)

Unfortunately, pa11y is hardcoded to Chrome. It might be possible to convert it to Firefox (see additional notes below); however, even then, using pa11y to manage Puppeteer to manage a separate Firefox instance would add complexity to the test module, with both pa11y and the existing Selenium wrappers implementing similar functionality. Since HTML CodeSniffer is a client-side script, injecting it with Selenium would result in a less complex system overall.

On the other hand, HTML CodeSniffer does not have a built-in method for reporting results (it just saves them in memory), so reporting would need to be handled separately. Reporting seems simple: all we would need to do is save the results into a cookie, and I see at least one example of cookie-reading already in the test suite; that code is minimal and readable.
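
As a rough sketch (assuming HTML CodeSniffer's documented HTMLCS.process()/HTMLCS.getMessages() entry points, and a Selenium driver already on the page under test), the injection-plus-cookie idea could look something like this:

```python
import json
from urllib.parse import unquote

# Sketch only: inject the built HTMLCS.js bundle with Selenium, run the sniffs,
# and read the results back through a cookie. `driver` is a Selenium WebDriver
# already pointed at the page under test; `htmlcs_source` is the HTMLCS.js text.
def sniff_page(driver, htmlcs_source):
    driver.execute_script(htmlcs_source)  # defines window.HTMLCS on the page
    driver.execute_script(
        """
        HTMLCS.process('WCAG2AA', document, function () {
            // Keep only the fields we need; cookies have a size limit.
            var messages = HTMLCS.getMessages().map(function (m) {
                return {type: m.type, code: m.code, msg: m.msg};
            });
            document.cookie = 'htmlcs=' + encodeURIComponent(JSON.stringify(messages));
        });
        """
    )
    cookie = driver.get_cookie("htmlcs")  # may need a short wait/poll in practice
    return json.loads(unquote(cookie["value"]))
```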

Example json output of a pa11y run on the source login page
> pa11y http://127.0.0.1:8080 --reporter json
[{
  "code": "WCAG2AA.Principle1.Guideline1_4.1_4_3.G18.Fail",
  "type": "error",
  "typeCode": 1,
  "message": "This element has insufficient contrast at this conformance level. Expected a contrast ratio of at least 4.5: 1, but text in this element has a contrast ratio of 2.65: 1. Recommendation:   change text colour to #747394.",
  "context": "<p id=\"footer-fpf\">\n    SecureDrop is a project of...</p>",
  "selector": "#footer-fpf",
  "runner": "htmlcs",
  "runnerExtras": {}
}]
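
For programmatic processing, that JSON output could be consumed along these lines (a sketch that assumes only the fields shown in the example above, not code from any diff):

```python
import json
import subprocess

# Run pa11y with the JSON reporter and fail if any error-level messages remain.
result = subprocess.run(
    ["pa11y", "http://127.0.0.1:8080", "--reporter", "json"],
    capture_output=True,
    text=True,
)
messages = json.loads(result.stdout or "[]")
errors = [m for m in messages if m["type"] == "error"]
for m in errors:
    print(f'{m["code"]}: {m["message"]} ({m["selector"]})')
assert not errors, f"{len(errors)} accessibility error(s) found"
```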
pa11y being hardcoded to Chrome

pa11y uses Puppeteer; Puppeteer does support Firefox, as mentioned in its documentation. It might be relatively straightforward to tell pa11y to use Firefox, but it's not immediately obvious to me how to do so. pa11y does forward launch options to Puppeteer without any processing:

https://github.com/pa11y/pa11y-ci/blob/master/lib/pa11y-ci.js#L39

// This function does all the setup and actually runs Pa11y
// against the passed in URLs. It accepts options in the form
// of an object and returns a Promise
function pa11yCi(urls, options) {
    // eslint-disable-next-line no-async-promise-executor
    return new Promise(async resolve => {
        // Create a test browser to assign to tests
        let testBrowser;

        // Issue #128: on specific URLs, Chrome will sometimes fail to load
        //  the page, or crash during or after loading. This attempts to
        //  relaunch the browser just once before bailing out, in case Chrome
        //  crashed at startup. This is just an attempt at mitigating it,
        //  it won't fix #128 completely or even at all
        try {
            testBrowser = await puppeteer.launch(
                options.chromeLaunchConfig
            );
        } catch (error) {
            testBrowser = await puppeteer.launch(
                options.chromeLaunchConfig
            );
        }

So I tried the most obvious thing and added "product": "firefox" to the configuration file:

{
  "defaults": {
    "chromeLaunchConfig": {
      "product": "firefox"
    }
  }
}

It wasn't clear how to verify whether Firefox was used, so I just straced the command. Based on some of the filenames that appeared in the output, I think it was still using Chrome:

access("/usr/local/lib/node_modules/pa11y/node_modules/puppeteer/.local-chromium/linux-869685", F_OK) = 0
statx(AT_FDCWD, "/tmp/puppeteer_dev_chrome_profile-xpNUZU", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0700, stx_size=4096, ...}) = 0
rmdir("/tmp/puppeteer_dev_chrome_profile-xpNUZU/Default") = 0

I don't know whether this is a precise or reliable way of checking which engine is being used, but it seems like a reasonable approximation. Given the reasons listed above for using htmlcs directly, and since I don't know whether anybody else has already looked into this particular question, I haven't pursued it further.

Human Review

I have also put some thought into managing the requirements that need human review, although my ability to weigh the options here is limited by my unfamiliarity with project resources & management. I have two initial thoughts on this.

Periodically review entire website

We could have an issue template with a checklist of all of the items that require human review and create it automatically. This might happen based on the passage of time (for example, once per quarter) or based on release schedules (for example, requiring this review before a major release). This option makes it easier to control the amount of resources the project spends on this review, but might leave gaps where accessibility issues go unnoticed.

Require accessibility review whenever a relevant file changes

We could also require the accessibility checklist to be filled out each time there are changes to HTML or CSS. This makes sure that there will be no gaps, but could cause a significant drain on resources. In the most absurd situation, we could end up telling people to re-verify that images have alt text when all they're doing is clarifying a sentence in a paragraph. One could argue that this should be trivial to confirm, because the author knows they haven't added any images, but this kind of heavy-handed bureaucracy is likely to frustrate contributors and make it less likely that they pay attention to accessibility issues that are actually relevant to their change.

@skyvine
Contributor

skyvine commented Jan 22, 2023

I sketched out what the 2 different methods (pa11y as a subprocess vs html codesniffer/htmlcs injection) would look like with some code that does the bare minimum to reproduce the error message from the manual pa11y run in the test suite. The code is extremely hacky, but gives a good idea of what the major immediate challenges for each approach are. Output from make test TESTFILES=tests/functional/accessibility.py is also linked.

Using pa11y

The two things that stand out to me here are installing pa11y in the container and using Firefox (or Tor Browser, if possible) instead of Chrome. I am currently handling the former by running apt install and npm install in the test running script, but IIUC it should be possible to just update the Docker image to include the dependency. I've never worked on that side of Docker before, though.

The latter is trickier. There is an option in pa11y to pass in a pre-constructed browser instance, so theoretically we should be able to have a small wrapper script and use pa11y as a library (the htmlcs version also has a small wrapper script to run the sniffs). I don't see any indication that Puppeteer can use Tor Browser specifically, but I see that some tests currently in the suite use the Firefox driver. I'm not sure what the criteria for that are.

test-pa11y-subprocess.txt

Using HTML CodeSniffer (htmlcs)

The thing that stands out to me here is managing the htmlcs build. This scratch test just tosses the built javascript into the repo, but I think it would be ideal to place this in the docker image, similar to the assessment of pa11y above.

Human Review

One more note on this topic: it looks like the notice level in htmlcs lists at least some (possibly all) of the items that require human review, so we could leverage this instead of manually maintaining the list of items.

test-htmlcs.txt

@cfm
Member Author

cfm commented Jan 23, 2023

Exciting results, @saffronsnail! Since Tor Browser mostly inherits Firefox's accessibility properties—and since the Source Interface is JavaScript-free to encourage the use of Tor Browser's strict mode, which otherwise modifies those properties—I think using the Firefox driver where necessary should be both adequate and already a dramatic improvement over not having this linting at all!

Note that, like #6072, we'll probably want these findings to be informational (non-failing) in CI to begin with—until we're confident enough in the tool's findings, and our ability to address them, to make them a hard requirement on subsequent pull requests.

@skyvine
Contributor

skyvine commented Jan 24, 2023

Makes sense; it looks like I can make sure that the accessibility tests aren't immediately fatal by adding || true, as in the final commit for that series. IIUC, we want two separate additions: one that reports the error results from the linter in the test suite and will eventually have full status in make test, and one that reports the warnings and notices that require human review, perhaps as make accessibility-review. I expect the former to have fewer decision points in terms of integration, since it's just adding to the existing test output, but the latter seems novel. I'm imagining that by default it would print summary statistics at the top, then individual information for each page, along these lines:

SUMMARY
page1.html: 22 notice, 11 warning
page2.html: 3 notice, 12 warning

---
page1.html

Notice 1: make sure to check that the baz has been barred.
Notice 2: are all the gizmos on top of the appropriate gadget?
<etc>

---
page2.html
<etc>

Regarding the browser discussion, it sounds like you're leaning towards using pa11y? I can definitely implement it that way; I'm just a little surprised, because the HTML CodeSniffer version works with the tor_browser_driver fixture, so that seems like a more reliable test even if Firefox is generally the same. That said, I'm not sure whether the fact that it works with the tor_browser_driver raises any flags on your end, because it indicates that the driver being tested has JavaScript enabled.

@skyvine
Contributor

skyvine commented Jan 25, 2023

Here's what I did today:

  • Added a dataclass to represent the messages from pa11y; they are imported into this data structure so that everything else is loosely coupled.
  • Refactored the bulk of the existing code into a run_accessibility_test function. My expectation is that individual tests will focus on navigating to the correct page; then they can just toss the URL into this helper to do the right thing.
  • Added 2 new make targets, accessibility-test and accessibility-review, for running the tests or printing the warnings & notices (through the roundabout method of causing test failures)
    • Internally, this adds similarly-named command-line flags to the pytest invocation, added in the same place as the --page-layouts flag
    • Also updated securedrop/dockerfiles/focal/python3/Dockerfile to install npm & pa11y
  • Added a new accessibility-tests target to .circleci/config.yml; this was based on other targets in the file. I'm new to CircleCI and not sure how to verify this addition without actually pushing.

The make accessibility-review target differs a bit from the output I described in the previous comment, because it's not going to summarize the information on all of the pages at the same time. Basically, the test and review plumbing is shared, with the only difference being which messages are checked by run_accessibility_test.
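
Roughly, the shape of that plumbing is something like this (a simplified sketch rather than the exact code in the diff; run_pa11y_json is a hypothetical helper wrapping the pa11y subprocess call):

```python
from dataclasses import dataclass

@dataclass
class AccessibilityMessage:
    # Mirrors the fields of pa11y's JSON output shown earlier in this thread.
    code: str
    type: str      # "error", "warning", or "notice"
    message: str
    selector: str

def run_accessibility_test(url, review_mode=False):
    # `review_mode` stands in for the accessibility-review command-line flag:
    # it reports warnings/notices instead of failing on errors.
    messages = [
        AccessibilityMessage(
            code=m["code"], type=m["type"], message=m["message"], selector=m["selector"]
        )
        for m in run_pa11y_json(url)  # hypothetical helper wrapping the pa11y subprocess
    ]
    wanted = {"warning", "notice"} if review_mode else {"error"}
    findings = [m for m in messages if m.type in wanted]
    assert not findings, "\n".join(
        f"{m.code}: {m.message} ({m.selector})" for m in findings
    )
```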

You can view a diff of this version here. Is posting a diff sufficient or should I open a new pull request at this point?

@skyvine
Contributor

skyvine commented Jan 25, 2023

So, it turns out that using pa11y is more work than I expected. While pa11y lets me use actions to navigate through a page, creating everything needed to crawl through all the pages would be non-trivial, and it looks like the app navigators already implement most if not all of this. I'm switching back to HTML CodeSniffer for now; let me know if there's an issue with that (as a reminder, this is the underlying tool that pa11y uses to check for accessibility issues, so it will give us the same results, just with different mechanics for processing them).

The base of the change seems to be in a solid place right now. All that's really left to do is create the code that crawls through the different pages to make sure everything is tested. Fingers crossed I'll have something submittable by next Monday. You can view the current diff here.

@skyvine
Contributor

skyvine commented Jan 26, 2023

I think I have all of the source pages tested. It went a little faster than I expected; the journalist interface looks more complicated, though. I haven't done a comprehensive review of the error messages, but all the ones I've seen have had to do with insufficient contrast. I expect the number of changes to be much smaller than the number of errors: for example, all of the blue buttons have a contrast error, and I assume the blue button background color is defined in one place, so adjusting it there will fix the error everywhere.

I have a few outstanding questions; if you have the chance to look at them now, great, and if not, I'll include them in the body of the pull request when the commits are in a good state. Here is the current diff.

  1. We trust apt but verify the Tor download. It looks like npm will verify signatures if they are provided, but html_codesniffer does not have the requisite field. How concerning is this? The HTML CodeSniffer GitHub releases also do not include a signature. I don't see a good solution here, but at a minimum I could check that the sha256sum is what we expect (see the sketch after this list).
  2. Is there a systematic way to verify that I've hit all the pages/page variants that need to be tested?
  3. When does sd_servers_with_clean_state need to be used (instead of plain sd_servers)? IIUC, session-specific changes (e.g., logging in) do not need the clean state, but persistent changes (such as creating a new source) do.
  4. The source tests have separate functions for testing the page after each action. For example, there is a test for the source page immediately after logging in, and a separate test for the source page immediately after submitting a message. These pages are largely the same, with the addition of an element after the message is submitted. I'm erring on the side of keeping them as separate tests, so that it is easy to see if an action causes the responsible element to appear, but it could be reasonable to combine some of them, such as the two previously mentioned. Let me know if you would prefer these tests be combined where possible. Some still need to remain separate, for example the page when there are no replies and the page when there are replies, because each contains at least one element that the other does not.
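
For question 1, the minimal checksum check could look roughly like this (the tarball URL is the npm registry entry for 2.5.1; the expected digest below is a placeholder, not a verified value):

```python
import hashlib
import urllib.request

# Sketch of pinning the html_codesniffer tarball by checksum.
URL = "https://registry.npmjs.org/html_codesniffer/-/html_codesniffer-2.5.1.tgz"
EXPECTED_SHA256 = "<pinned digest goes here>"  # placeholder, to be filled in once verified

data = urllib.request.urlopen(URL).read()
digest = hashlib.sha256(data).hexdigest()
if digest != EXPECTED_SHA256:
    raise RuntimeError(f"html_codesniffer tarball digest mismatch: {digest}")
```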

@skyvine
Contributor

skyvine commented Jan 28, 2023

I have a decent chunk of the journalist tests done. They're a little flaky because sometimes the page doesn't load before nav_helper.wait_for times out, but I'm hoping that's just due to the limited power of my machine. If not, I suspect that tricks similar to what securedrop/tests/functional/conftest.py::create_source_and_submission does could help with the load times.

Also, since this is getting to be a pretty hefty change, I split it up into a few different diffs:

  1. Add accessibility testing infrastructure (Note that, currently, the sniff_accessibility_issue helper passes regardless of HTML CodeSniffer output, to make it easier to identify tests that fail due to incorrect code during development)
  2. Add accessibility tests for source interface
  3. Add journalist tests for login page and source list page
  4. Add journalist tests for source message list
  5. Add journalist tests for admin interface

Outstanding questions unchanged from previous comment.

@skyvine
Contributor

skyvine commented Jan 30, 2023

Tests for the admin interface are in place. I still need to review the diffs to make sure everything is organized correctly, add docstrings, and check that the test times are reasonable. Diffs:

  1. Add accessibility testing infrastructure (Note that, currently, the sniff_accessibility_issue helper passes regardless of HTML CodeSniffer output, to make it easier to identify tests that fail due to incorrect code during development)
  2. Add accessibility tests for source interface
  3. Add journalist tests for login page and source list page
  4. Add journalist tests for source message list
  5. Add journalist tests for admin interface
  6. Add journalist tests for 2FA reset forms

@cfm
Member Author

cfm commented Jan 31, 2023

This looks like terrific work, @saffronsnail. Thank you for approaching it so rigorously! I've answered your recent questions inline (below).

#6314 (comment):

Is posting a diff sufficient or should I open a new pull request at this point?

We use pull requests to review all code changes, so please do open a pull request against this repository. Feel free to assign the review to me.

#6314 (comment):

Thanks for flagging this. Can you clarify why you say that the html_codesniffer npm package is missing the requisite field? I read https://docs.npmjs.com/about-registry-signatures to say that all "[p]ackages published to the public npm registry are signed", while only third-party registries must provide their own signatures for npm to check. That's corroborated by:

user@sd-dev:~$ curl -s https://registry.npmjs.org/html_codesniffer | jq '.versions["2.5.1"].dist'
{
  "integrity": "sha512-vcz0yAaX/OaV6sdNHuT9alBOKkSxYb8h5Yq26dUqgi7XmCgGUSa7U9PiY1PBXQFMjKv1wVPs5/QzHlGuxPDUGg==",
  "shasum": "d76d124b8f5cd0e58b3c1b142fd095a40573ea28",
  "tarball": "https://registry.npmjs.org/html_codesniffer/-/html_codesniffer-2.5.1.tgz",
  "fileCount": 136,
  "unpackedSize": 1271493,
  "npm-signature": "-----BEGIN PGP SIGNATURE-----\r\nVersion: OpenPGP.js v3.0.4\r\nComment: https://openpgpjs.org\r\n\r\nwsFcBAEBCAAQBQJelPYMCRA9TVsSAnZWagAAfOAP/j0GG0g+7bEMvAjp/asy\nlHpqDNMJtVmvl4kXF3t6cWdq70ZpKpLD+xpuqbZH8D/iF5DtlTJ0Ekfsopr8\njKwb5zIf8HWg8qwSvwnFJdxkNy6rhWR4Ph4o0twHnPWeaLw5GkiW2iZAmigi\nMAA2OOS041FoVXC84eE+p5mGaYYiM1eSf4i9ikmOm/GBKExHrDjGPEP6idUo\nRoKE8dTKHS6GsnXGuoHvEX1NW0DR1WTuPn/BleYnqR1Yx+WbLaxWqaiZU9xr\n82d7Xpoqki7BPAS+z9KnZ5Z/3sc1pRMW8MVtRlwzi6s7JwKjOSKuEFFi5XNL\nSP71yr08J3tLr5J4twI4DgUhT0y9LuxUsei+6o4cc9DL2knaVpW7IdS5QppS\nP7NnyZ4KWTtTVvz1x2slpPNYogfBHQFLVOCflpIj1dB1fGNaDus9zusfgCPQ\nV17suyuu3+OI/+Mnrfm1l/bZaUQ7xvJ3syH1OoDZyzr8cspmN6fxcCANLipw\nCTwkhTG7yLrLc+3955Y+W9D01HDdnYsVymZZdQhwVDlYe2KOuSDTUYQrBSgj\n+rFZiql1M5xDZrRJC2Aq97GtqrR36k/R4t9GN3GzBWFM/DZU3bIFCYDCFgH8\nz+2NmmGFTaz4rkBsYHmsla0PoBCKE3C1CNDE6o4cD3SzQ+wvcDMFrnI9aTMk\n1Fp/\r\n=6T1P\r\n-----END PGP SIGNATURE-----\r\n",
  "signatures": [
    {
      "keyid": "SHA256:jl3bwswu80PjjokCgh0o2w5c2U4LhQAE57gj9cz1kzA",
      "sig": "MEYCIQDCW/oPd0XThYnTS+76sK72UB3q9uEEYGTH4rNQonkWLQIhAKxqe8D9bWdnetw7p9pIOWtcJPYBVkELB369grEGOO14"
    }
  ]
}

In any event, I'll make sure to flag this to our security engineers during review, since this change involves adding a new toolchain, not just pulling in a new dependency.

  • Is there a systematic way to verify that I've hit all the pages/page variants that need to be tested?
  • The source tests have separate functions for testing the page after each action. For example, there is a test for the source page immediately after logging in, and a separate test for the source page immediately after submitting a message. These pages are largely the same, with the addition of an element after the message is submitted. I'm erring on the side of keeping them as separate tests, so that it is easy to see if an action causes the responsible element to appear, but it could be reasonable to combine some of them, such as the two previously mentioned. Let me know if you would prefer these tests be combined where possible. Some still need to remain separate, for example the page when there are no replies and the page when there are replies, because each contains at least one element that the other does not.

There's no canonical list of views, states, or conditions under test, and that seems like the main disadvantage of establishing a separate accessibility test suite: it isn't just tacking the accessibility evaluator onto each of the existing functional tests. If tacking it on isn't possible (e.g., by something like subclassing TestJournalist, calling super() in each of its methods, and appending your sniff_accessibility_issues()), then please feel free to start small, with a single test per view. We can always add more incrementally once we've proven this pattern.

(My thinking with the above suggestion is that it would be better for long-term maintenance to get a good-enough first pass of accessibility validation "for free", whenever someone adds a new functional test, than to make authors of view-level changes also responsible for accessibility as well as unit-level and functional tests. This is in the spirit of linting—it's automatic, comprehensive, and intended above all to prevent trivial regressions—more than full functional testing, and I recognize that it may not be doable in this approach or with the tools available.)
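
Concretely, I'm imagining something along these lines; a sketch only, where TestJournalist and sniff_accessibility_issues are just the names used in this discussion and may not match the actual tree:

```python
# Sketch of getting accessibility checks "for free" from an existing functional
# test class: re-run each existing test, then evaluate the page it ends on.
class TestJournalistAccessibility(TestJournalist):
    def test_login_page(self, *args, **kwargs):
        super().test_login_page(*args, **kwargs)  # drive the view exactly as before
        sniff_accessibility_issues(self.driver)   # then evaluate the rendered page
```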

  • When does sd_servers_with_clean_state need to be used (instead of plain sd_servers)? IIUC, session-specific changes (eg, logging in) do not need the clean state, but persistent changes (such as creating a new source) do.

Yes, stateless and stateful changes, respectively, per #6382:

This fixture is session-scoped so the apps are only spawned once during the test session, and
shared between the different unit tests. If your test needs to modify the state of the apps
(example: a source submits a message), use the sd_servers_with_clean_state fixture, which is
slower.
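
In other words, roughly (test names and helpers here are placeholders; only the fixture names are real):

```python
def test_login_page_accessibility(sd_servers, tor_browser_driver):
    # Logging in only changes session state, so the shared (faster) fixture is fine.
    log_in_as_journalist(tor_browser_driver, sd_servers)  # hypothetical helper
    sniff_accessibility_issues(tor_browser_driver)

def test_submission_accessibility(sd_servers_with_clean_state, tor_browser_driver):
    # Submitting a message creates a persistent source, so use the clean-state fixture.
    submit_source_message(tor_browser_driver, sd_servers_with_clean_state)  # hypothetical helper
    sniff_accessibility_issues(tor_browser_driver)
```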

@skyvine
Contributor

skyvine commented Feb 1, 2023

@cfm Thanks for the info! I think most of the changes in the first diff are applicable to the more minimal approach you're looking for; I'll open a pull request once I have something less intrusive. I wasn't sure if it made sense to open a pull request as soon as I had an interesting diff or if it was better to wait until it was ready to go; it sounds like the former.

Thanks for flagging this. Can you clarify why you say that the html_codesniffer npm package is missing the requisite field? I read https://docs.npmjs.com/about-registry-signatures to say that all "[p]ackages published to the public npm registry are signed", while only third-party registries must provide their own signatures for npm to check.

It looks like I misread the documentation; the "third party registry" context was pretty important there.

There's no canonical list of views, states, or conditions under test, and it seems like this is the main disadvantage of establishing a separate accessibility test suite: it's not just tacking on the accessibility evaluator to each of the existing functional tests.

It looks like the pagelayout tests you mentioned earlier would be a good place to "sneak this in", since they go through, at a minimum, a reasonable subset of existing pages. What would you think about adding a call to sniff_accessibility_issues to these functions, immediately after they call save_screenshot_and_html? This seems particularly convenient because these tests are already non-fatal, and the accessibility sniffs seem similar in spirit to checking the page rendering of different languages (making sure that the pages "look OK" for users with different backgrounds).
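
Concretely, each pagelayout test would gain one line right after the existing screenshot call, something like this (names here only approximate the existing tests):

```python
def test_index_page_layout(tor_browser_driver, sd_servers):
    # ... existing navigation for this layout test ...
    save_screenshot_and_html(tor_browser_driver, "index")
    # Proposed addition: run the accessibility sniffs on the same rendered page.
    sniff_accessibility_issues(tor_browser_driver)
```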

@skyvine skyvine mentioned this issue Feb 2, 2023
@cfm
Member Author

cfm commented Feb 8, 2023 via email
