Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix abbreviation bugs in CTA and LegislativeList #291

Merged
merged 3 commits into from
Oct 12, 2023
Merged

Conversation

jkempster34
Copy link
Contributor

@jkempster34 jkempster34 commented Oct 9, 2023

https://trello.com/c/I15QapuL

Parse call to action as part of the main conversion

Kramdown handles abbreviations and footnotes natively.

Call to action and legislative list are custom Govspeak components built as
extensions to Kramdown. These sections are parsed separately in preprocessing,
they are ignored in any subsequent parsing because Kramdown ignores HTML.

Because of this, we have added custom handling of abbreviations and footnotes
specifically for these components 1. This custom implementation means that
abbreviations defined anywhere in the document will be applied to call to
action and legislative list components. Otherwise, only abbreviations defined
within these components would be applied.

This custom implementation has been shown to contain bugs through several
Zendesk tickets. The main issues reported are:

  1. Acronyms are inserted into the produced HTML, even if undesirable. For
    example, a link <[email protected]> with the acronym *[email]: Electronic mail is a method of transmitting and receiving messages would produce
    <p>href="mailto:<abbr title="Electronic mail is a method of transmitting and receiving messages">email</abbr>@example.com"<abbr title="Electronic mail is a method of transmitting and receiving messages">email</abbr>@example.com</p>
    rather than the expected link.

  2. Because the add_acronym_alt_text method runs through the text for each
    acronym, acronyms can be inserted into other acronyms creating invalid HTML.

If we change tack, and instead allow Kramdown to parse these components as part
of the main conversion to HTML (instead of in preprocessing), Kramdown will
handle abbreviations and footnotes correctly, fixing both of these issues (as well
as some other undocumented issues).

The call to action component requires a surrounding div with the
call-to-action class. Because Kramdown syntax is normally not processed inside
a HTML tag, we must toggle the parse_block_html option 2.

There is a small regression in that the $CTA block must be preceded by a
blank line (demonstrated in test/govspeak_test.rb:428), but this seems okay
as it's standard markdown.

Some of the tests can likely be removed as we are now testing the functionality
of the library, however, for now these represent that there are no
regressions.

Parse legislative list as part of the main conversion

Kramdown handles abbreviations and footnotes natively.

Call to action and legislative list are custom Govspeak components built as
extensions to Kramdown. These sections are parsed separately in preprocessing,
they are ignored in any subsequent parsing because Kramdown ignores HTML.

Because of this, we have added custom handling of abbreviations and footnotes
specifically for these components 1. This custom implementation means that
abbreviations defined anywhere in the document will be applied to call to
action and legislative list components. Otherwise, only abbreviations defined
within these components would be applied.

This custom implementation has been shown to contain bugs through several
Zendesk tickets. The main issues reported are:

  1. Acronyms are inserted into the produced HTML, even if undesirable. For
    example, a link <[email protected]> with the acronym *[email]: Electronic mail is a method of transmitting and receiving messages would produce
    <p>href="mailto:<abbr title="Electronic mail is a method of transmitting and receiving messages">email</abbr>@example.com"<abbr title="Electronic mail is a method of transmitting and receiving messages">email</abbr>@example.com</p>
    rather than the expected link.

  2. Because the add_acronym_alt_text method runs through the text for each
    acronym, acronyms can be inserted into other acronyms creating invalid HTML.

If we change tack, and instead allow Kramdown to parse these components as part
of the main conversion to HTML (instead of in preprocessing), Kramdown will
handle abbreviations and footnotes correctly, fixing both of these issues (as
well as some other undocumented issues).

The legislative list component currently works by overriding Kramdowns list
extension 2 and disabling ordered lists 3. So that we can parse this
component as part of the main conversion, this applies two options:
parse_block_html 4, so that markdown inside the legislative-list-wrapper
div is parsed by Kramdown, and a new custom option of
ordered_lists_disabled, so that we can control flow inside our
Parser::Govuk class (subclass of the Kramdown parser). It is possible to
override parse_list instead of parse_block_html, but this means that the
outermost list will not be parsed (by the time parse_list is called the
outermost list is already being parsed)

The only difference between this new iteration of the legislative list and the
previous iteration, is that we now wrap the component in a
legislative-list-wrapper div. We could remove this in postprocessing, but
this requires additional modification of the Kramdown produced HTML which seems
unnecessary.

This also allows us to remove all remaining code which replicates the footnote
and acronym functions of Kramdown.

Some of the tests can likely be removed as we are now testing the functionality
of the library, however, for now these represent that there are no
regressions.

@jkempster34 jkempster34 changed the title Fix abbr bugs Fix abbreviation bugs in CTA and LegislativeList Oct 9, 2023
@jkempster34 jkempster34 force-pushed the fix-abbr-bugs branch 4 times, most recently from f577306 to 580599e Compare October 9, 2023 13:56
@mike29736 mike29736 self-assigned this Oct 9, 2023
Copy link
Contributor

@mike29736 mike29736 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit messages are great, thanks.

This sort of change merits a minor version bump and changelog entry, right? Especially with the change in usage requirements for $CTA.

@jkempster34
Copy link
Contributor Author

The commit messages are great, thanks.

This sort of change merits a minor version bump and changelog entry, right? Especially with the change in usage requirements for $CTA.

I think so yeah, I'm not sure of the release process for Govspeak, I'll have to look back through the history to figure it out

Kramdown handles abbreviations and footnotes natively.

Call to action and legislative list are custom Govspeak components built as
extensions to Kramdown. These sections are parsed separately in preprocessing,
they are ignored in any subsequent parsing because Kramdown ignores HTML.

Because of this, we have added custom handling of abbreviations and footnotes
specifically for these components [1]. This custom implementation means that
abbreviations defined anywhere in the document will be applied to call to
action and legislative list components. Otherwise, only abbreviations defined
within these components would be applied.

This custom implementation has been shown to contain bugs through several
Zendesk tickets. The main issues reported are:

1. Acronyms are inserted into the produced HTML, even if undesirable. For
example, a link `<[email protected]>` with the acronym `*[email]: Electronic
mail is a method of transmitting and receiving messages` would produce
`<p>href="mailto:<abbr title="Electronic mail is a method of transmitting and
receiving messages">email</abbr>@example.com"<abbr title="Electronic mail is a
method of transmitting and receiving messages">email</abbr>@example.com</p>`
rather than the expected link.

2. Because the `add_acronym_alt_text` method runs through the text for each
acronym, acronyms can be inserted into other acronyms creating invalid HTML.

If we change tack, and instead allow Kramdown to parse these components as part
of the main conversion to HTML (instead of in preprocessing), Kramdown will
handle abbreviations and footnotes correctly, fixing both of these issues (as well
as some other undocumented issues). 

The call to action component requires a surrounding `div` with the
`call-to-action` class. Because Kramdown syntax is normally not processed inside
a HTML tag, we must toggle the `parse_block_html` option [2]. 

There is a small regression in that the `$CTA` block must be preceded by a
blank line (demonstrated in `test/govspeak_test.rb:428`), but this seems okay
as it's standard markdown.

Some of the tests can likely be removed as we are now testing the functionality
of the library, however, for now these represent that there are no
regressions.

[1]: #285
[2]: https://kramdown.gettalong.org/quickref.html#html-elements
Kramdown handles abbreviations and footnotes natively.

Call to action and legislative list are custom Govspeak components built as
extensions to Kramdown. These sections are parsed separately in preprocessing,
they are ignored in any subsequent parsing because Kramdown ignores HTML.

Because of this, we have added custom handling of abbreviations and footnotes
specifically for these components [1]. This custom implementation means that
abbreviations defined anywhere in the document will be applied to call to
action and legislative list components. Otherwise, only abbreviations defined
within these components would be applied.

This custom implementation has been shown to contain bugs through several
Zendesk tickets. The main issues reported are:

1. Acronyms are inserted into the produced HTML, even if undesirable. For
example, a link `<[email protected]>` with the acronym `*[email]: Electronic
mail is a method of transmitting and receiving messages` would produce
`<p>href="mailto:<abbr title="Electronic mail is a method of transmitting and
receiving messages">email</abbr>@example.com"<abbr title="Electronic mail is a
method of transmitting and receiving messages">email</abbr>@example.com</p>`
rather than the expected link.

2. Because the `add_acronym_alt_text` method runs through the text for each
acronym, acronyms can be inserted into other acronyms creating invalid HTML.

If we change tack, and instead allow Kramdown to parse these components as part
of the main conversion to HTML (instead of in preprocessing), Kramdown will
handle abbreviations and footnotes correctly, fixing both of these issues (as
well as some other undocumented issues). 

The legislative list component currently works by overriding Kramdowns `list`
extension [2] and disabling ordered lists [3]. So that we can parse this
component as part of the main conversion, this applies two options:
`parse_block_html` [4], so that markdown inside the `legislative-list-wrapper`
`div` is parsed by Kramdown, and a new custom option of
`ordered_lists_disabled`, so that we can control flow inside our
`Parser::Govuk` class (subclass of the Kramdown parser). It is possible to
override `parse_list` instead of `parse_block_html`, but this means that the
outermost list will not be parsed (by the time `parse_list` is called the
outermost list is already being parsed)

The only difference between this new iteration of the legislative list and the
previous iteration, is that we now wrap the component in a
`legislative-list-wrapper` `div`. We could remove this in postprocessing, but
this requires additional modification of the Kramdown produced HTML which seems
unnecessary.

This also allows us to remove all remaining code which replicates the footnote
and acronym functions of Kramdown.

Some of the tests can likely be removed as we are now testing the functionality
of the library, however, for now these represent that there are no
regressions.

[1]: #285
[2]:
https://github.com/gettalong/kramdown/blob/bd678ecb59f70778fdb3b08bdcd39e2ab7379b45/lib/kramdown/parser/kramdown/list.rb#L54
[3]: #25
[4]: https://kramdown.gettalong.org/quickref.html#html-elements
@jkempster34 jkempster34 merged commit 2044bf6 into main Oct 12, 2023
5 checks passed
@jkempster34 jkempster34 deleted the fix-abbr-bugs branch October 12, 2023 09:20
@jkempster34 jkempster34 mentioned this pull request Oct 12, 2023
jkempster34 added a commit that referenced this pull request Oct 18, 2023
In #291, we made a change to how the call to action and legislative list
components are parsed. This involved the use of the `parse_block_html` option,
which enables parsing in HTML blocks until it is toggled off.

Since this change, we have received Zendesk tickets regarding the use of the
image component inside a call to action. Because we toggle off the parsing of
HTML blocks entirely within call to actions, the HTML of the image is not
parsed correctly.

To fix this, we can enable parsing of the surrounding divs of the call to
action and legislative list components only. This means that any inner blocks
(such as the `figure` tag surrounding an image) are parsed using their default
mechanism [1].

[1]: (https://kramdown.gettalong.org/syntax.html#html-blocks).
jkempster34 added a commit that referenced this pull request Oct 18, 2023
In #291, we made a change to how the call to action and legislative list
components are parsed. This involved the use of the `parse_block_html` option,
which enables parsing in HTML blocks until it is toggled off.

Since this change, we have received Zendesk tickets regarding the use of the
image component inside a call to action. Because we toggle off the parsing of
HTML blocks entirely within call to actions, the HTML of the image is not
parsed correctly.

To fix this, we can enable parsing of the surrounding divs of the call to
action and legislative list components only. This means that any inner blocks
(such as the `figure` tag surrounding an image) are parsed using their default
mechanism [1].

[1]: https://kramdown.gettalong.org/syntax.html#html-blocks
jkempster34 added a commit that referenced this pull request Oct 18, 2023
In #291, we made a change to how the call to action and legislative list
components are parsed. This involved the use of the `parse_block_html` option,
which enables parsing in HTML blocks until it is toggled off.

Since this change, we have received Zendesk tickets regarding the use of the
image component inside a call to action. Because we toggle off the parsing of
HTML blocks entirely within call to actions, the HTML of the image is not
parsed correctly.

To fix this, we can enable parsing of the surrounding divs of the call to
action and legislative list components only. This means that any inner blocks
(such as the `figure` tag surrounding an image) are parsed using their default
mechanism [1].

[1]: https://kramdown.gettalong.org/syntax.html#html-blocks
jkempster34 added a commit that referenced this pull request Oct 18, 2023
In #291, we made a change to how the call to action and legislative list
components are parsed. This involved the use of the `parse_block_html` option,
which enables parsing in HTML blocks until it is toggled off.

Since this change, we have received Zendesk tickets regarding the use of the
image component inside a call to action. Because we toggle off the parsing of
HTML blocks entirely within call to actions, the HTML of the image is not
parsed correctly.

To fix this, we can enable parsing of the surrounding divs of the call to
action and legislative list components only. This means that any inner blocks
(such as the `figure` tag surrounding an image) are parsed using their default
mechanism [1].

[1]: https://kramdown.gettalong.org/syntax.html#html-blocks
jkempster34 added a commit that referenced this pull request Oct 18, 2023
In #291, we made a change to how the call to action and legislative list
components are parsed. This involved the use of the `parse_block_html` option,
which enables parsing in HTML blocks until it is toggled off.

Since this change, we have received Zendesk tickets regarding the use of the
image component inside a call to action. Because we toggle off the parsing of
HTML blocks entirely within call to actions, the HTML of the image is not
parsed correctly.

To fix this, we can enable parsing of the surrounding divs of the call to
action and legislative list components only. This means that any inner blocks
(such as the `figure` tag surrounding an image) are parsed using their default
mechanism [1].

[1]: https://kramdown.gettalong.org/syntax.html#html-blocks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants