Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds link validation for clay-paragraph, clay-subheader #1505

Merged
merged 18 commits into from
May 27, 2021

Conversation

mattoberle
Copy link
Contributor

@mattoberle mattoberle commented Nov 19, 2020

Updated PR Description

This pull request implements link validation for QuillJS <a> tags embeded in the clay-paragraph and clay-subheader components. Validation is performed by overriding the default QuillJS Link class with a custom SafeLink sub class.

SafeLink sets class="kiln-link-invalid" on the <a> tag whenever a link does not meet any of the following criteria:

  • The link is an anchor link (eg. #section-2).
  • The link is relative to the site root (eg. /section).
  • The link is a mailto link (eg. mailto:[email protected]).
  • The link is in the format {protocol}://{domain}/*.

Kiln will block publishing when a page contains invalid links.
Invalid links are styled in red to make them easily distinguishable.

In addition to flagging invalid links, this pull request also removes links applied to single punctuation and white-space characters. QuillJS requires users to select the full hyperlinked text to remove a link; this makes it very easy to leave a space or punctuation character hyperlinked. Tracking down those single-character links is like looking for a needle in a haystack.

Old PR Description

This pull request contains several iterations of link validation.

The first commit demonstrates how we can subclass formats/link to create a Quill Blot that self-removes when invalid. This approach works but felt like a hack, it was ultimately scrapped.

The current state of the PR demonstrates how we can utilize lib/validators/built-in/invalid-links.js to surface validation errors to end users (while retaining their original input). The validators approach closely mirrors what we do in lib/validators/edit-links.js and feels most at-home in Kiln.

I have some reservations about the way edit-links.js and invalid-links.js work.
The fact that both validation functions are using regular expressions to pull out anchor tag values from partial HTML blobs feels fragile. It also feels redundant to iterate over the state twice to validate links.

Note:

Tests executed with jest pass on this branch, although we seem to have accumulated quite a few files that don't pass Kiln's eslint rules. The linting is what fails in CircleCI but I don't want to muddy this PR with a bunch of scattered style updates. This is fixed.

Note 2:

Tests do not pass with the Node8 container runtime, but do pass with Node10+.
I've opened #1507 which will adjust the runtime used for tests (to reflect the currently supported Node versions).
This is merged.

Future Suggestions

These suggestions are likely outside of the scope of this pull request but would result in improvements to Kiln:

One-iteration Validators

It seems wasteful to have each built-in validator iterate over the entire state.
A more efficient pattern would be to compose all of the validators together and iterate over the state once.

Behaviors

@reubenson pointed out that Kiln has a concept of behaviors that may allow components to opt-in/out of validation.

@mattoberle mattoberle changed the title [wip] provides proof-of-concept for invalid link removal Adds link validation for clay-paragraph, clay-subheader Nov 20, 2020
@mattoberle mattoberle marked this pull request as ready for review November 20, 2020 19:29
@mattoberle mattoberle force-pushed the quill-remove-invalid-links branch 2 times, most recently from f076658 to c52f104 Compare December 7, 2020 22:16
@coveralls
Copy link

coveralls commented Dec 7, 2020

Coverage Status

Coverage increased (+0.08%) to 34.93% when pulling 7d35a0c on quill-remove-invalid-links into 2adb7bb on master.

@mattoberle mattoberle force-pushed the quill-remove-invalid-links branch from 7a1faad to 77b60ae Compare January 13, 2021 16:21
@mattoberle mattoberle force-pushed the quill-remove-invalid-links branch from a1bef19 to d4e1cd7 Compare April 9, 2021 22:36
mattoberle and others added 16 commits May 10, 2021 10:52
This commit provides a working sub-class of Link that prevents the
creation of invalid anchor tags.

The validation is accomplished in the `SafeLink.optimize` method.
I was unable to get validation working in the `SafeLink.create` step
due to some actions Parchment takes after `create` to build the linked
list.

We need to determine how to organize custom Quill classes and what the
best practice is for loading them.

As an alternative, we could utilize the `SafeLink.create` method to set
an attribute on the anchor tag that marks the link as invalid (and
provide visual feedback via CSS? plus use that attribute as
validation?). I'll admit, I'm pretty out of my element when it comes to
front-end work so I would need to run that by the experts.
The tests expected a '…' character in the preview text.
This commit adds that character.
We don't need to run link validation on every component + every
property on the page. Our primary concern is links created with Quill
which we can target using generic Clay components.

The limit on this filter can be lifted if the use case presents itself,
but it's better to keep this number of affected components small to
start.
Since we are transpiling Kiln on release we probably don't need to
guarantee the ability to build with Node10 (which is EOL in a few
months). Removing that build compatability requirement will allow us to
use `string.prototype.matchAll` (needed for link validation).

The alternative would be to add another dependency or come up with our
own implementation in the short term.

All current major browsers support `.matchAll` so it's not worth the
hassle IMO.
In Firefox this will throw an Error:

```
new URL("https://www. example .com ")
```

In Chrome the same action will result in:

```
https://www.%20example%20.com
```

As a result the try/catch pattern used for validation rarely flagged
links on Chrome.

The approach in this commit uses a lenient regular expression to
validate URLs. A URL just needs a protocol, a domain, and optionally a
path. The domain and protocol have a limited character set.
Validation that disallows same-site links might be too strict for
legacy content. Although protocol-less links cannot be created through
Kiln, Clay users can easily create `href="/section/page"`
links by importing content through the API.

This commit allows same-site links to avoid situations where old pages
cannot be re-published through the UI.

This commit does **not** allow same-site links that are not anchored
with a leading "/".
This commit replaces the dubious regex extraction of <a href=""> values
with a reliable method based on the document query selector.

Co-authored-by: jz <[email protected]>
This commit addresses two UX issues with Kiln link validation.

**Invalid Link Styling**

Prior to this commit the only visual "hint" identifying invalid links
was the list of invalid components in the Kiln sidebar. By highlighting
links in red these errors will be more obvious to users.

:warning: I probably didn't organize this css change properly.

**Link Removal**

QuillJS requires the _full_ text of a link to be highlighted in order to
remove it. This can lead to little bits of punctuation or whitespace
retaining their hyperlink status. Identifying these single-character
links is especially frustrating if link validation is preventing a page
from being published.

This commit will automatically strip `<a>` tags that contain only a
single non-alphanumeric character as text.
Using `new URL` was too unpredictable across environments.
We were seeing tests fail in CircleCI, but pass locally (in `node`).

Paired with variability in browser behavior, I'm not sure how we can
adequately test a `new URL` implementation without a selenium test suite
(or something similar).

The regular expression method accepts:

* Anchor links
* Relative links to the site root
* Any link with `<protocol>:<path>`

More generally, the expression rejects anything containing spaces.
The invalid link tests were still expecting the link-parsing to be
performed during the validation function, but that responsibility was
moved to QuillJS. This commit adjusts the tests to look for the new
class.

:warning: This means we don't have any actual tests for link validation
in our custom link class.
@mattoberle mattoberle force-pushed the quill-remove-invalid-links branch from 82bf908 to 6def82f Compare May 10, 2021 14:54
@mattoberle mattoberle force-pushed the quill-remove-invalid-links branch from adc8177 to 7d35a0c Compare May 11, 2021 19:10
@jpope19 jpope19 merged commit 1409ec9 into master May 27, 2021
@jpope19 jpope19 deleted the quill-remove-invalid-links branch May 27, 2021 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants