act-rules-format.bs

<pre class='metadata'>
Title: Accessibility Conformance Testing Rules Format
Shortname: ACT-Format
URL: https://w3c.github.io/wcag-act/act-rules-format.html
Previous Version: https://w3c.github.io/wcag-act/archive_act-format/2016-12-17.html
Level: 1.0
Status: ED
Group: act-framework
Editor: Wilco Fiers, Deque Systems
Editor: Maureen Kraft, IBM Corp.
Abstract: The Accessibility Conformance Testing (ACT) Rules Format 1.0 defines a format for writing accessibility test rules. These rules can be carried out fully-automatically, semi-automatically, and manually. This common format allows any party involved in accessibility testing to document and share their testing procedures in a robust and understandable manner. This enables transparency and harmonization of testing methods, including methods implemented by accessibility test tools.
Markup Shorthands: markdown yes
</pre>

Introduction {#intro}
=====================

There are currently many test procedures and tools available which aid their users in testing web content for conformance to accessibility standards such as the [Web Content Accessibility Guidelines (WCAG)](https://www.w3.org/WAI/standards-guidelines/wcag/) [[WCAG]]. As the web develops in both size and complexity, these procedures and tools are essential for managing the accessibility of resources available on the web.

This format is intended to enable a consistent interpretation of how to test for [accessibility requirements](#structure-accessibility-requirements) and promote consistent results of accessibility tests. It is intended to be used to describe both manual accessibility tests as well as automated tests performed by accessibility testing tools (ATTs).

Describing how to test certain accessibility requirements will result in accessibility tests that are transparent, with test results that are reproducible. The Accessibility Conformance Testing (ACT) Rules Format defines the requirements for these test descriptions, known as Accessibility Conformance Testing Rules (ACT Rules).


Scope {#scope}
==============

The ACT Rules Format defined in this specification is focused on the description of rules that can be used in testing content created using web technologies, such as [Hypertext Markup Language](https://www.w3.org/TR/html/) [[HTML]], [Cascading Style Sheets](https://www.w3.org/TR/CSS/) [[CSS2]], [Accessible Rich Internet Applications](https://www.w3.org/WAI/standards-guidelines/aria/) [[WAI-ARIA]], [Scaleable Vector Graphics](https://www.w3.org/TR/SVG/) [[SVG2]] and more, including digital publishing. The ACT Rules Format, however, is designed to be technology agnostic, meaning that it can conceivably be used to describe test rules for other technologies.

The ACT Rules Format can be used to describe ACT Rules dedicated to testing the [accessibility requirements](#structure-accessibility-requirements) defined in Web Content Accessibility Guidelines [[WCAG]], which are specifically designed for web content. Other accessibility requirements applicable to web technologies can also be testable with ACT Rules. For example, ACT Rules could be developed to test the conformance of web-based user agents to the [User Agent Accessibility Guidelines](https://www.w3.org/WAI/standards-guidelines/uaag/) [[UAAG20]]. However, the ACT Rules Format may not always be suitable to describe tests for other types of accessibility requirements.

Because ACT Rules rarely test the entirety of an accessibility requirement, passing ACT rules does not necessarily mean that an accessibility requirement is met. It is important to understand that **ACT Rules test non-conformance** to accessibility requirements. In some cases conformance can be inferred from the absence of failures. Unlike WCAG sufficient techniques, ACT Rules should not be used for conformance claims unless the rule explicitly states it can be used that way. See [Rule Aggregation](#output-aggregation) for details.


ACT Rule Types {#rule-types}
============================

In accessibility, there are often different technical solutions to make the same type of content accessible. These solutions could be tested in a single rule; however, such a rule tends to be quite complex, making it difficult to understand and maintain. The ACT Rules Format solves this by providing two types of rules:

- <dfn>Atomic rules</dfn> describe how to test a specific type of solution. It contains a precise definition of what elements, nodes or other parts of a web page are to be tested, and when those elements are considered to fail the rule. These rules should be kept small and atomic. Meaning that atomic rules test a single "failure condition", and do so without using results from other rules.

- <dfn>Composed rules</dfn> describe how results from atomic rules should be used to decide if an accessibility requirement was failed. If there are multiple failures necessary for an accessibility requirement to fail, each "type" of failure should be written as separate atomic rules and be aggregated by a composed rule.

The separation between atomic rules and composed rules creates a division of responsibility. Atomic rules test if web content is correctly implemented in a particular solution. Composed rules test if a combination of `pass` outcomes from atomic rules is sufficient for an accessibility requirement not to `fail`. Not all atomic rules have to be part of a composed rule. Atomic rules only have to be part of a composed rule if failing that atomic rule does not directly fail [Accessibility Requirements](#structure-accessibility-requirements). An atomic rule MAY be part of multiple composed rules.

A composed rule defines how the outcomes from its atomic rules are aggregated into a single outcome for each applicable test target. Note that atomic rules in a composed rule MAY have different applicability. Because of this, not every element applicable within the composed rule is tested by every atomic rule. Atomic rules MAY also be disabled during a test run due to accessibility support concerns. See [Accessibility Support](#acc-support) for details.

<div class="example">
  <p>Composed rule: Header cells in HTML tables (<a href="https://www.w3.org/WAI/WCAG21/quickref/#info-and-relationships" target="_blank">WCAG 2 Success Criterion 1.3.1</a>).</p>
  <p>This rule uses results from the following atomic rules:</p>
  <ul>
    <li>Header indicated through implicit scope</li>
    <li>Header indicated through the `scope` attribute</li>
    <li>Header indicated by using the `headers` attribute</li>
    <li>Header indicated by using ARIA labels</li>
  </ul>
  <p>If any one of these rules passes, the table cell can pass the composed rule.</p>
</div>


ACT Rule Structure {#act-rule-structure}
===============================

An ACT Rule MUST consist of the following items

* Descriptive Title
* [Rule Identifier](#rule-identifier)
* [Rule Description](#structure-description)
* [Accessibility Requirements](#structure-accessibility-requirements), if any
* [Aspects Under Test](#input-aspects) (for atomic rules) OR [Atomic Rules List](#atomic-rules-list) (for composed rules)
* [Test Definition](#test-def) (for atomic rules) OR [Aggregation Definition](#aggregation-definition) (for composed rules)
* [Limitations, Assumptions or Exceptions](#structure-limitations-assumptions-exceptions), if any
* [Accessibility Support](#acc-support) (optional)

Rule Identifier {#rule-identifier}
==================================

An ACT Rule MUST have a unique identifier that can be any unique text value, such as plain text, URL or a database identifier.


Rule Description {#structure-description}
=========================================

An ACT Rule MUST have a description that is in plain language and provides a brief explanation of what the rule does.


Accessibility Requirements {#structure-accessibility-requirements}
==================================================================

<div class=note>
  **Editor's note:** The ACT Task Force is looking for feedback about the use of the term "pass" in relation to rules. While rules can "pass", their corresponding accessibility requirement can fail. A section has been added to make this explicit in the rule, but we would like to know if this is sufficient.
</div>

Accessibility requirements are just that: A requirement that a particular web page must conform to for it to be considered accessible. This can (and usually does) include the WCAG success criteria. Often organizations have additional requirements which may come from different sources, such as local laws, internal standards, etc. These too are considered accessibility requirements and can be tested using the ACT Rules Format. What the precise requirements are for any test is beyond the scope of the ACT Rules Format.

[=Atomic rules=] and [=composed rules=] SHOULD identify the accessibility requirements that fail when the outcome of a rule is Fail. An ACT Rule is a complete or partial test for one or more accessibility requirements. This means that most ACT rules test only part of an accessibility requirement, but it MUST NOT test more than the accessibility requirement it lists.

Because ACT Rules often have a smaller scope than the accessibility requirement they test, passing a rule does not necessarily mean that the accessibility requirement has passed. ACT Rules MUST indicate when they can not be used to determine that the accessibility requirement passed.

Outcomes from an ACT Rule SHOULD be consistent with the accessibility requirement, e.g. a rule only returns the outcome Fail when the content fails the accessibility requirement. This means that the rule maps to the accessibility requirement, as opposed to it merely being related to the requirement, thematically or otherwise. Because of this, atomic rules used in composed rules often do not map to any accessibility requirement. Failing the composed rule fails the accessibility requirement, but failing any of its atomic rules may not. In such cases the atomic rules MUST NOT list the accessibility requirements. These could be provided as background information instead.


Aspects Under Test (Atomic rules only) {#input-aspects}
=======================================================

An aspect is a distinct part of the [test subject](#output-test-subject) or its underlying implementation. For example, rendering a particular piece of content to an end user involves multiple different technologies, some or all of which may be of interest to an ACT Rule. Some rules need to operate directly on the [Hypertext Transfer Protocol](https://tools.ietf.org/html/rfc7230) [[http11]] messages exchanged between a server and a client, while others need to operate on the [Document Object Model](https://dom.spec.whatwg.org) [[DOM]] tree exposed by a web browser.

[=Atomic rules=] MUST list the aspects used in the [Test Definition](#test-def). Some rules may need to operate on several aspects simultaneously, such as both the HTTP messages and the DOM tree. Since there is no Test Definition in Composed rules, there SHOULD NOT be an aspects under test list for composed rules.

An atomic rule MUST include a description of all the aspects under test by the rule. Each aspect MUST be discrete with no overlap between the aspects. Some aspects are already well defined within the context of web content, such as HTTP messages, DOM tree, and CSS styling [[CSS2]], and do not warrant a detailed description. Other aspects may not be well defined or even specific to web content. In these cases, an ACT Rule SHOULD include either a detailed description of the aspect in question or a reference to that description.


Common Aspects {#input-aspects-common}
--------------------------------------

### HTTP Messages ### {#input-aspects-http}

The HTTP messages [[http11]] exchanged between a client and a server as part of requesting a web page may be of interest to ACT Rules. For example, analyzing HTTP messages to perform validation of HTTP headers or unparsed HTML [[HTML]] and CSS [[CSS2]].

### DOM Tree ### {#input-aspects-dom}

The DOM [[DOM]] tree constructed from parsing HTML [[HTML]], and optionally executing DOM manipulating JavaScript, may be of interest to ACT Rules to test the structure of web pages. In the DOM tree, information about individual elements of a web page, and their relations, becomes available.

The means by which the DOM tree is constructed, be it by a web browser or not, is not of importance as long as the construction follows the [Document Object Model](https://dom.spec.whatwg.org) [[DOM]].

### CSS Styling ### {#input-aspects-css}

The computed CSS [[CSS2]] styling resulting from parsing CSS and applying it to the DOM [[DOM]] may be of interest to ACT Rules that wish to test the web page as presented to the user. Through CSS styling, information about the position, the foreground and background colors, the visibility, and more, of elements becomes available.

The means by which the CSS styling is computed, be it by a web browser or not, is not of importance as long as the computation follows any applicable specifications that might exist, such as the [CSS Object Model](https://www.w3.org/TR/cssom/) [[CSSOM]].

### Accessibility Tree ### {#input-aspects-accessibility}

The accessibility tree constructed from extracting information from both the DOM [[DOM]] tree and the CSS [[CSS2]] styling may be of interest to ACT Rules. This can be used to test the web page as presented to assistive technologies such as screen readers. Through the accessibility tree, information about the semantic roles, accessible names and descriptions, and more, of elements becomes available.

The means by which the accessibility tree is constructed, be it by a web browser or not, is not of importance as long as the construction follows any applicable specifications that might exist, such as the [Core Accessibility API Mappings 1.1](https://www.w3.org/TR/core-aam-1.1/) [[CORE-AAM-1.1]].

### Language ### {#input-aspects-text}

Language, either written or spoken, contained in nodes of the DOM [[DOM]] or accessibility trees may be of interest to ACT Rules that intend to test things like complexity or intention of the language. For example, an ACT Rule might test that paragraphs of text within the DOM tree do not exceed a certain readability score or that the text alternative of an image provides a sufficient description.

The means by which the language is assessed, whether by a person or a machine, is not of importance as long as the assessment meets the criteria defined in [[wcag2-tech-req#humantestable]] [[WCAG]].


Test Definition (Atomic rules only) {#test-def}
===============================================

The test definition of [=atomic rules=] describes "what" (parts of) the [test subject](#output-test-subject) should be tested (the test target), and what the requirements for those [test targets](#output-test-target) are. Instead of a [test description](#structure-description), a [=composed rule=] has an [aggregation definition](#aggregation-definition).

Applicability {#test-applicability}
-----------------------------------

The applicability section is a required part of an [=atomic rule=]. It MUST contain a precise description of the parts of the [test subject](#output-test-subject) to which the rule applies. For example, specific nodes in the DOM [[DOM]] tree, or tags that are incorrectly closed in an HTML [[HTML]] document. These are known as the [test targets](#output-test-target). The applicability MUST only use information provided through [test aspects](#input-aspects) of the same rule. No other information should be used in the applicability.

Applicability MUST be described objectively, unambiguously and in plain language. When a formal syntax, such as a [CSS selector](https://www.w3.org/TR/selectors-3/) [[css3-selectors]] or [XML Path Language](https://www.w3.org/TR/xpath/) [[XPATH]], can be used, that (part of the) applicability MAY use that syntax in addition to the plain language description. While testing, if nothing within the [test subject](#output-test-subject) matches the applicability of the rule, the outcome is `inapplicable`.

An objective description is one that can be resolved without uncertainty in a given technology. Examples of objective properties in HTML are element names, their computed role, the spacing between two elements, etc. Subjective properties on the other hand, are concepts like decorative, navigation mechanism and pre-recorded. Even concepts like headings and images can be misunderstood. For example, describing that the rule examines the tag name, the accessibility role, or the element's purpose on the web page. The latter of which is almost impossible to define objectively. When used in applicability, these concepts MUST have an objective definition. This definition can be part of a larger glossary shared between rules.

<div class=example>
  <p>The applicability of an atomic rule testing <a href="https://www.w3.org/WAI/WCAG21/quickref/#audio-control" target="_blank">WCAG 2.1 Audio Control:</a></p>
  <blockquote>
    <p>Any <code>video</code> or <code>audio</code> element(s) with the <code>autoplay</code> attribute, as well as any <code>object</code> element(s) that is used for automatically playing video or audio when the web page loads.</p>
  </blockquote>
</div>


Expectations {#test-expectations}
---------------------------------

An [=atomic rule=] MUST contain one or more expectations. An expectation is an assertion that must be true about each [test target](#output-test-target) (see [Applicability](#test-applicability)). Each expectation MUST be distinct, unambiguous, and be written in plain language. Unlike in applicability, a certain level of subjectivity is allowed in expectations. Meaning that the expectation has only one possible meaning, but that meaning isn't fully quantifiable.

When all expectations are true for a test target, the test target `passed` the rule. If one or more expectations are false, the test target `failed` the rule. If the atomic rule is used in a [=composed rule=], the composed rule may be `passed` when the atomic rule `failed`, depending on the [aggregation definition](#aggregation-definition) of the composed rule.

<div class=example>
  <p>A rule for labels of HTML `input` elements may have the following expectations:</p>
  <ol>
    <li>The test target has an accessible name (as described in [Accessible Name and Description: Computation and API Mappings 1.1](https://www.w3.org/TR/accname-aam-1.1/#mapping_additional_nd_te)). [[accname-aam-1.1]]</li>
    <li>The accessible name describes the purpose of the test target.</li>
  </ol>
</div>

An atomic rule expectation MUST only use information available in the [test aspects](#input-aspects), from the applicability, and other expectations of the same rule. No other information can be used in the expectation. So for instance, an expectation could be "Expectation 1 is true and ...", but it can't be "Rule XYZ passed and ...". This ensures the rule is encapsulated.


Atomic Rules List (Composed rules only) {#atomic-rules-list}
==========================================================

A [=composed rule=] uses results from [=atomic rules=] and aggregates them so that for each [test target](#output-test-target) a single [outcome](#output-outcome) can be determined. All atomic rules used in the [aggregation definition](#aggregation-definition) MUST be listed in the composed rule. The atomic rules list describes the input for composed rules, similar to how [aspects under test](#input-aspects) describe the input for atomic rules.


Aggregation Definition (Composed rules only) {#aggregation-definition}
======================================================================

[=Composed rules=] MUST describe how results from [=atomic rules=] should be aggregated to determine a single `pass` or `fail` result. This is done in the aggregation definition. Only composed rules contain an aggregation definition, since atomic rules are encapsulated and do not use results from other rules.

<p class="note" role="note">
"Editor's note: We are considering merging this definition with test definition. We are looking for feedback."
</p>

Aggregation Applicability {#aggregation-applicability}
------------------------------------------

The applicability of a [=composed rule=] is defined as the union of all the applicability sections of its [=atomic rules=]. Because of this, applicability of a composed rule can be inferred from the atomic rules it aggregates. Since the applicability can be inferred, rule authors MAY omit applicability from the aggregation definition. This can be useful if it is difficult to express the combined applicability in plain language.

<div class=example>
  <p>A composed rule about `img` elements aggregates results from atomic rules that have the following applicability:</p>
  <ul>
    <li><strong>Atomic rule1</strong>: All `img` element with an `alt` attribute</li>
    <li><strong>Atomic rule1</strong>: All `img` element without an `alt` attribute</li>
  </ul>

  <p>The applicability of the composed rule combines the applicability of both atomic rules. This becomes:</p>
  <blockquote>
    <p>All `img` elements.</p>
  </blockquote>
</div>


Aggregation Expectations {#aggregation-expectations}
----------------------------------------

A [=composed rule=] MUST contain one or more expectations. An expectation is an assertion, written in plain language, that must be true about the outcomes of [=atomic rules=] listed in [aggregated rules](#aggregation-definition). A composed rule expectation MUST NOT use information from [test aspects](#input-aspects) or from the [test applicability](#test-applicability).

When all expectations are true for a test target, the test target `passed` the rule. If one or more expectations is false, the test target `failed` the rule. This works the same way for atomic rules.

<div class="example">
  <p>A composed rule for <a href="https://www.w3.org/WAI/WCAG21/quickref/#audio-description-or-media-alternative-prerecorded" target="_blank">WCAG 2.1 Audio Description or Media Alternative</a> aggregates three atomic rules. The expectation of the composed rule is as follows:</p>
  <blockquote>
    <p>For each test target, the outcome of one of the following rules is `passed`:</p>
    <ul>
      <li>Video elements have an audio description</li>
      <li>Video elements have a media alternative</li>
      <li>Video elements have a media alternative</li>
    </ul>
  </blockquote>
</div>


Limitations, Assumptions or Exceptions {#structure-limitations-assumptions-exceptions}
======================================================================================

An ACT Rule MUST list any limitations, assumptions or any exceptions for the test, the test environment, technologies being used or the subject being tested. For example, a rule that would partially test <a href="https://www.w3.org/WAI/WCAG20/quickref/#visual-audio-contrast-contrast" target="_blank">WCAG 2.0 Success Criterion 1.4.3 Contrast (Minimum)</a> based on the inspection of CSS properties could state that it is only applicable to HTML text content stylable with CSS, and that the rule does not support images of text.

Sometimes there are multiple plausible ways that an accessibility requirement can be interpreted. For instance, it is not immediately obvious if emoji characters should be considered "text" or "non-text content" under WCAG 2.0. Whatever the interpretation is, this MUST be documented in the rule.


Accessibility Support {#acc-support}
====================================

ACT Rules are designed to test the conformance of content using web technologies to accessibility requirements. However, not every feature of a web technology is implemented in all assistive technologies or user agents that a website may need to support. The concept of [accessibility supported](https://www.w3.org/TR/WCAG20/#accessibility-supporteddef) use of a Web technology is defined in WCAG [[WCAG]]. Because of this, ACT Rules are not necessarily applicable in all test scenarios. For instance, a web page that has to work in assistive technologies that have no WAI-ARIA [[WAI-ARIA]] support, wouldn’t be tested with an ACT Rule that relies on WAI-ARIA support, since this could lead to false positive results.

Even within a [=composed rule=], some [=atomic rules=] may not always be applicable. This leaves one fewer solution for passing that particular composed rule. To support users of ACT Rules in properly defining the [accessibility support baseline](https://www.w3.org/TR/WCAG-EM/#step1c) in their test scenarios, an ACT Rule SHOULD include a warning if there are significant accessibility support concerns known about a rule.


ACT Data Format (Output Data) {#output}
=======================================

With ACT Rules, it is important that data coming from different sources can be compared. By having a shared vocabulary, accessibility data that is produced by different auditors can be compared and, where necessary, aggregated. Therefore, every ACT Rule MUST express the output in a format that has all of the features described in the ACT Data Format.

Rules are tested in two steps. Firstly, the applicability is used to find a list of [Test Targets](#output-test-target) (elements, tags or other "components") within the web page or other [test subject](#output-test-subject). Then each test target is tested to see if all of the [expectations](#test-expectations) are true. This will give the outcome for each test target. For contextual information, the output data must also include [test subject](#output-test-subject) and the [rule identifier](#rule-identifier).

This will mean that every time a rule is executed on a page, it will return a set with zero or more results, each of which MUST have at least the following properties:

- [Rule Identifier](#rule-identifier) (test)
- [Test Subject](#output-test-subject) (Web page)
- [Test Target](#output-test-target) (pointer)
- [Outcome](#output-outcome) (`Passed`, `Failed`, or `Inapplicable`)

<div class=example>
Output data using EARL and JSON-LD. (See [Evaluation and Report Language](https://www.w3.org/WAI/standards-guidelines/earl/) [[EARL10-Schema]] and [Java Script Object Notation (JSON)](https://www.json.org).)

```javascript
{
  "@context": "https://raw.githubusercontent.com/w3c/wcag-act/master/earl-act.json",
  "@type": "Assertion",
  "subject": "https://example.org/",
  "test": "auto-wcag:rules/SC1-1-1-css-image.html",
  "result": {
    "outcome": "Failed",
    "pointer": "html > body > h1:first-child"
  }
}
```
</div>


Test Subject {#output-test-subject}
-----------------------------------

When a single URL can be used to reference the web page, or other test subject, this URL MUST be used. In scenarios where more complex actions are required to obtain the test subject (in the state that it is to be tested), it is left to ATT developers to determine which method is best used to express the test subject.


Test Target {#output-test-target}
---------------------------------

When representing the test target in the output data, it is often impractical or impossible to serialize the test target as a whole. Instead of this, a pointer can be used to indicate where the test target exists within the web page or other [test subject](#output-test-subject). There are a variety of pointer methods available, such as those defined in [Pointer Methods in RDF 1.0](https://www.w3.org/TR/Pointers-in-RDF/) [[Pointers-in-RDF]].

The pointer method used in the output data of an ACT Rule MUST include the pointer method used in [Test cases](#quality-test-cases).


Outcome {#output-outcome}
-------------------------

The definition of a rule MUST always result in one of the following outcomes:

- **Passed**: All [expectations](#test-expectations) for the [Test Target](#output-test-target) were true
- **Failed**: One or more expectations for the Test Target was false
- **Inapplicable**: There were no Test Targets in the [Test Subject](#output-test-subject)

<div class="note">
  While *inapplicable* is a valid result for ACT Rules, it may not be a valid result for all [accessibility requirements](#structure-accessibility-requirements). Notably the success criteria of WCAG 2.0 and WCAG 2.1 can only be evaluated to true (passed) or false (failed). This translation of results is part of [output aggregation](#output-aggregation)
</div>

<div class="note">
  In addition to `Passed` `Failed` and `Inapplicable`, [[EARL10-Schema]] also defined an `Incomplete` outcome. While this should never be the outcome of a rule when applied in its entirety, it often happens that rules are only partially executed. For example, when applicability was automated, but the expectations have to be evaluated manually. Such "interim" results can be expressed with the "Incomplete" outcome.
</div>


Rule Quality Assurance {#quality}
=================================

Test Cases (Atomic rules only) {#quality-test-cases}
----------------------------------------------------

Test cases are (snippets of) content that can be used to validate the implementation of an [=atomic rule=]. They consist of two pieces of data, a snippet of each [test aspect](#input-aspects) for a rule, and the [expected result](#output) that should come from that rule. Test cases serve two functions, firstly as example scenarios for readers to understand when a rule passes, when it fails, and when it is inapplicable. But also for developers and users of automated accessibility test tools to validate that a rule is correctly implemented.

When executing a test, the test aspect(s), for instance an HTML code snippet, is evaluated by applying the rule's test definition. The result is then compared to the expected result of the test case. The expected result consists of a list of [test targets](#output-test-target) and the expected [outcome](#output-outcome) (Passed, Failed, Inapplicable) of the evaluation.


Accuracy Benchmarking {#quality-accuracy}
-----------------------------------------

The web is ever changing, and technologies are used in such diverse and creative ways that it is impossible to predict in advance, all the ways that accessibility issues can occur and all the ways they can be solved for. When writing ACT Rules, it is almost inevitable that exceptions will be overlooked during the design of a rule, or that new technologies will emerge that introduce new exceptions.

This makes it important to be able to regularly test if the rule has the accuracy that is expected of it. This can be done by benchmark testing. In benchmark testing, the accuracy of a rule is measured by comparing its results to those obtained through accessibility expert testing.

The accuracy of a rule is the average between the false positives and false negatives, which are in turn calculated as follows:

- **False positives**: This is the percentage of [test targets](#output-test-target), that were failed by the rule, but were not failed by an accessibility expert.

- **False negatives**: This is the percentage of test targets, that were passed by the rule, but were failed by an accessibility expert.

There are several ways this can be done. For instance, accessibility test tools can implement a feature which lets users indicate that a result is in error, or pages that for which accessibility results are known, can be tested using ATT, and the results are compared. To compare results from ACT Rules to those of expert evaluations, [data aggregation](#output-aggregation) may be necessary.


Rule Aggregation {#output-aggregation}
======================================

As described in section [[#output]] a rule will return a list of results, each of which contain 1) the [Rule ID](#rule-identifier), 2) the [test subject](#output-test-subject), 3) the [test target](#output-test-target), and 4) an [outcome](#output-outcome) (Passed, Failed, Inapplicable). Data expressed this way has a great deal of detail, as it gives multiple pass / fail results for each rule.

Most expert evaluations do not report results at this level of detail. Often reports are limited to giving a single outcome (Passed, Failed, Inapplicable) per page, for each success criteria (or other accessibility requirement). To compare the data, results from rules can be combined, so that they are at the same level.

When all rules pass, that does not mean that all [accessibility requirements](#structure-accessibility-requirements) are met. Only if the rules can test 100% of what should be tested, can this claim be made. Otherwise the outcome for a criterion is inconclusive.

**Example:** An expert evaluates a success criterion to fail on a specific page. When testing that page using ACT Rules, there are two rules that map to this criterion. The first rule returns no results. The second rule finds 2 test targets that pass, and a 3rd test target that fails.

In this example, the first rule is inapplicable (0 results), and the second rule has failed (1 fail, 2 pass). Combining this inapplicable and fail, means the success criterion has failed.

See [[#appendix-data-example]] on how this could be expressed using JSON-LD and EARL.


Update Management {#quality-updates}
====================================

Change Log {#quality-updates-changes}
-------------------------------------

It is important to keep track of changes to the ACT rules so that users of the rules can understand if changes in test results are due to changes in the rules used when performing the tests, rather than changes in the content itself. All changes to an ACT Rule that can change the [outcome](#output-outcome) of a test MUST be recorded in a change log. The change log can either be part of the rule document itself or be referenced from it.

Each new release of an ACT Rule MUST be identifiable with either a date or a version number. Additionally, a reference to the previous version of that rule MUST be available. For extensive changes, a new rule SHOULD be created and the old rule SHOULD be deprecated.

**Example:** An example of when a new rule should be created is when a rule that tests for the use of a `blink` element changes to instead look for any animated style changes. This potentially adds several new failures that were previously out of scope. Using that same rule as an example, adding an exception to allow `blink` elements positioned off screen should be done by updating the existing rule.


Issues List {#quality-updates-issues}
-------------------------------------

An ACT Rule MAY include an issues list. When included, the issues list MUST be used to record cases in which the ACT Rule might return a failure where it should have returned a pass or vice versa. There are several reasons why this might occur, including:

- Certain scenarios or the use of technologies that are very rare and were not included in the rule for that reason.
- Certain accessibility features are impossible to test within the test environment. For instance, they might only be testable by accessing the accessibility API, require screen capturing, etc.
- The scenario did not exist (due to changing technologies) or was overlooked during the initial design of the rule.

The issues list serves two purposes. For users of ACT Rules, the issues list may give insight into why an inaccurate result occurred, as well as provide confidence in the result of that rule. For the designer of the rule, the issues list is also useful to plan future updates to the rule. In a new version of the rule, resolved issues would be moved to the change log.


Appendix 1: Aggregation examples, using JSON-LD and EARL {#appendix-data-example}
=================================================================================

**Example:**

```javascript
{
  "@context": "https://raw.githubusercontent.com/w3c/wcag-act/master/earl-act.json",
  "@type": "Assertion",
  "subject": "https://example.org/",
  "test": "auto-wcag:SC1-1-1-css-image.html",
  "result": {
    "outcome": "Failed",
    "source": [{
      "test": "auto-wcag:SC1-1-1-css-image.html",
      "result": {
        "outcome": "Failed",
        "pointer": "html > body > h1:first-child"
      }
    }, {
      "test": "auto-wcag:SC1-1-1-css-image.html",
      "result": {
        "outcome": "Passed",
        "pointer": "html > body > h1:nth-child(2)"
      }
    }]
  }
}
```

**Example:** Aggregate rules to a WCAG success criterion

```javascript
{
  "@context": "https://raw.githubusercontent.com/w3c/wcag-act/master/earl-act.json",
  "@type": "Assertion",
  "subject": "https://example.org/",
  "test": {
    "@id": "wcag20:#text-equiv-all",
    "title": "1.1.1 Non-text Content"
  },
  "result": {
    "outcome": "Failed",
    "source": [{
      "test": "auto-wcag:SC1-1-1-css-image.html",
      "result": {
        "outcome": "Failed",
        "pointer": "html > body > h1:first-child"
      }
    }, {
      "test": "auto-wcag:SC1-1-1-longdesc.html",
      "result": {
        "outcome": "Passed",
        "pointer": "html > body > img:nth-child(2)"
      }
    }]
  }
}
```

**Example:** Aggregate a list of results to a result for the website

```javascript
{
  "@context": "https://raw.githubusercontent.com/w3c/wcag-act/master/earl-act.json",
  "@type": "Assertion",
  "subject": {
    "@type": ["WebSite", "TestSubject"],
    "@value": "https://example.org/"
  }
  "test": "http://www.w3.org/WAI/WCAG2A-Conformance",
  "result": {
    "outcome": "Failed",
    "source": [{
      "test": "wcag20:text-equiv-all",
      "result": {
        "outcome": "Failed",
        "source": [ … ]
      }
    }, {
      "test": "wcag20:media-equiv-av-only-alt",
      "result": {
        "outcome": "Passed",
        "source": [ … ]
      }
    }, {
      "test": "wcag20:media-equiv-captions",
      "result" : {
        "outcome": "Inapplicable",
        "source": [ … ]
      }
    }, … ]
  }
}
```