[Accessibility] Stripping Markdown should not remove HTML tags (related to #3615) #3360

compulim · 2020-07-23T01:42:39Z

Screenshots

Version

4.9.2

Describe the bug

tl;dr a Markdown activity with content "Hello, World!", the screen reader will narrate it as "Hello" only.

When a Markdown-based message activity arrived and it contains HTML tags, the content of the HTML tags are not narrated by screen reader.

The screen reader-only text prefers plain text instead of Markdown. Since the activity schema does not support alt-text, Web Chat use remark/strip-markdown to strip out Markdown syntax into plain text.

The strip-markdown engine do not parse the content of HTML and simply remove everything in it.

Steps to reproduce

Turn on screen reader
Connect to a bot that will reply with a message activity:
- type set to "message"
- text set to "Hello, World!"
- (Optional) textFormat set to markdown

Expected behavior

The screen reader should read "Hello, World!" or indicate there is a HTML part that cannot be narrated.

Today, it narrating "Hello" only.

Additional context

Customer impact

As reported by the Omnichannel team, PVA send everything enclose with  tag, essentially muting screen reader.

Potential solutions

There are multiple solutions, not limited to:

Find a better Markdown-stripping engine
- How many HTML tags do we want to parse?
- What if the HTML is <dl><dt>Name</dt><dd>John Doe</dd></dl>, how should it present in plain text?
- May not solve the root cause
Instead of using <ScreenReaderText>, just use the normal HTML version instead
- I think we fixed that in 4.9.2
- But we have special screen reader only "live region", that one might still using or need to use strip-markdown
Preprocess the plain text for "live region": it will be narrated as "Hello, (Rich content)"
- The end-user should still browser to that rich content and listen to the original
Request for schema change and allow bot devs to send alt text if they are sending Markdown
- It is a bit weird to send 2 text formats, just like email, it has plain text and HTML. And if they are not in-sync, the problem is not very discoverable
- I think 80% of bots won't send alt text

[Bug]

The text was updated successfully, but these errors were encountered:

corinagum · 2020-09-09T21:19:23Z

@compulim in your second bullet under Potential Solutions, you said "I think we fixed that in 4.9.2". What do you mean? The <ScreenReaderText /> component is still in use. Do you mean the following:

Should we consider setting the dangerouslySetInnerHTML property in <ScreenReaderText /> in order to honor the markdown/html for the screenreader? I see we're already using it for BasicToast and TextContent. We already have the internalMarkdownIt ability to specify block rules...

I've been unable to find a markdown stripper that would fulfill our needs here.
I really don't think a schema change is practical, considering the high bar we always have for these requests to go through.

I'm investigating this topic and I will edit this message with notes.

Thoughts?

compulim · 2020-09-09T21:55:22Z

For the "we fixed that in 4.9.2", IIRC:

We separated transcript into 2 parts: live region and interactive/navigable. Both of them show similar set of activities
- Live region: activity only show up for ~1s, then removed (hack for screen readers). This activity will have Markdown stripped using Remark. This is done through <ScreenReaderActivity> component. Goal: "end-user can listen to incoming activity without touching any inputs with good fidelity. Perfect fidelity is not a requirement."
- Interactive: the activity here is not under live region, but navigable using CAPSLOCK + UP or similar combinations. The activity do not have Markdown stripped and we use dangerouslySetInnerHTML, given the Markdown is already sanitized. Goal: "end-user can use screen reader to navigate <table>, with perfect fidelity."

No dangerouslySetInnerHTML unless we sanitize the content. And I don't prefer using dSIH for every <ScreenReaderText>. Because I think what we are doing in the "navigable activity" part is great.

internalMarkdownIt is for localization only, i.e. "Send failed. Retry" text. And we don't sanitize them for performance reason. The text sent to internalMarkdownIt are always controlled by us, i.e. must be clean.

There are 2 cases here: live region and interactive. Right now, interactive is fine.

The problem is the activity in live region. When an activity come in as Markdown of Hello, it become empty after stripped. Thus, the live region announce nothing.

But when the end-user navigate to that (interactive version of) activity, they can hear "Hello".

corinagum · 2021-01-13T00:19:50Z

Related to #3568 #3615

-- Same dev should work on all related issues.

compulim · 2021-04-28T18:12:02Z

@corinagum do you think we can use speak property instead of stripping Markdown syntax from activity.text? Particularly, I mean:

const altText = stripXML(activity.speak || activity.text);

I am thinking, speak property should be either XML (i.e. SSML) or plain text. It is much easier to strip XML tags than dealing with Markdown.

corinagum · 2021-04-28T18:16:19Z

Wow, this is a good idea. I really like it and it's a simple solution.

The only drawback I see is that customers would be unhappy if activity.speak is different from activity.text, which we have seen instances of recently. We should discuss workaround for this.

compulim · 2021-05-27T22:56:15Z

IMO, if the bot developer set speak and text fields differently, the scenario will be same as if they "setting the alt attribute of <img> to something not representing the image itself", e.g. <img alt="An image shows a dog barking." src="cat-meow.jpg" />.

However, they should be able to set speak field to "" (empty string). And we will treat it like aria-hidden="true" or role="presentation". This is to comply with Direct Line spec:

A3030: The speak field MAY contain an empty string to indicate no speech should be generated.

We can still keep the "strip Markdown" feature with best-effort:

Use useRenderMarkdown hook to render the Markdown into HTML (as string)
Use DOMParser().parseFromString() to parse the HTML string into HTMLDocument
- Works on IE11, but not React Native
Walk the HTMLDocument, flatten and concatenate all text nodes

What do you think?

corinagum · 2021-06-01T21:36:50Z

IMO, if the bot developer set speak and text fields differently, the scenario will be same as if they "setting the alt attribute of <img> to something not representing the image itself", e.g. <img alt="An image shows a dog barking." src="cat-meow.jpg" />.

Agree. It's more a matter of what the user expects, where alt is the 'semantic ideal' haha

However, they should be able to set speak field to "" (empty string). And we will treat it like aria-hidden="true" or role="presentation". This is to comply with Direct Line spec:

Related to #3780, FYI. Web Chat is currently non-compliant.

A3030: The speak field MAY contain an empty string to indicate no speech should be generated.

We can still keep the "strip Markdown" feature with best-effort:

Use useRenderMarkdown hook to render the Markdown into HTML (as string)

Use DOMParser().parseFromString() to parse the HTML string into HTMLDocument

Works on IE11, but not React Native

Walk the HTMLDocument, flatten and concatenate all text nodes

What do you think?

Looks good. Not relevant quite yet, but how to deal with for RN?

compulim · 2021-06-01T22:10:40Z

I think it's not too difficult to find a DOMParser that works in RN. I have seen HTML parser in pure Node.js in the past.

Yes, #3780 need to be fixed differently away from this as we are not using speak field now.

compulim added bug Indicates an unexpected problem or an unintended behavior. area-accessibility external-omnichannel labels Jul 23, 2020

compulim changed the title ~~[Accessibility] Stripping Markdown will remove HTML tags~~ [Accessibility] Stripping Markdown should not remove HTML tags Jul 24, 2020

compulim added the R11 label Jul 31, 2020

compulim assigned compulim and unassigned compulim Jul 31, 2020

compulim added this to the R11 milestone Aug 10, 2020

compulim self-assigned this Sep 18, 2020

corinagum removed the R11 label Sep 30, 2020

cwhitten added the p1 Painful if we don't fix, won't block releasing label Oct 14, 2020

cwhitten modified the milestones: R11, R12 Oct 20, 2020

corinagum mentioned this issue Oct 30, 2020

Outdated dependency CVE-2020-7753 #3568

Closed

corinagum added the needs-team-attention label Jan 13, 2021

corinagum changed the title ~~[Accessibility] Stripping Markdown should not remove HTML tags~~ [Accessibility] Stripping Markdown should not remove HTML tags related to #3615 Jan 13, 2021

corinagum changed the title ~~[Accessibility] Stripping Markdown should not remove HTML tags related to #3615~~ [Accessibility] Stripping Markdown should not remove HTML tags (related to #3615) Jan 13, 2021

cwhitten removed this from the R12 milestone Jan 15, 2021

corinagum added backlog Out of scope for the current iteration but it will be evaluated in a future release. front-burner needs-scheduling labels Mar 10, 2021

compulim mentioned this issue May 27, 2021

Bump deps and remove remark and strip-markdown #3917

Merged

11 tasks

corinagum closed this as completed in #3917 Jun 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Accessibility] Stripping Markdown should not remove HTML tags (related to #3615) #3360

[Accessibility] Stripping Markdown should not remove HTML tags (related to #3615) #3360

compulim commented Jul 23, 2020 •

edited

Loading

corinagum commented Sep 9, 2020

compulim commented Sep 9, 2020

corinagum commented Jan 13, 2021 •

edited

Loading

compulim commented Apr 28, 2021

corinagum commented Apr 28, 2021 •

edited

Loading

compulim commented May 27, 2021 •

edited

Loading

corinagum commented Jun 1, 2021

compulim commented Jun 1, 2021

[Accessibility] Stripping Markdown should not remove HTML tags (related to #3615) #3360

[Accessibility] Stripping Markdown should not remove HTML tags (related to #3615) #3360

Comments

compulim commented Jul 23, 2020 • edited Loading

Screenshots

Version

Describe the bug

Steps to reproduce

Expected behavior

Additional context

Customer impact

Potential solutions

corinagum commented Sep 9, 2020

compulim commented Sep 9, 2020

corinagum commented Jan 13, 2021 • edited Loading

compulim commented Apr 28, 2021

corinagum commented Apr 28, 2021 • edited Loading

compulim commented May 27, 2021 • edited Loading

corinagum commented Jun 1, 2021

compulim commented Jun 1, 2021

compulim commented Jul 23, 2020 •

edited

Loading

corinagum commented Jan 13, 2021 •

edited

Loading

corinagum commented Apr 28, 2021 •

edited

Loading

compulim commented May 27, 2021 •

edited

Loading