Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flawed getUniqAttr Parsing May Corrupt HTML Output #202

Open
mmichaelis opened this issue Oct 24, 2023 · 3 comments
Open

Flawed getUniqAttr Parsing May Corrupt HTML Output #202

mmichaelis opened this issue Oct 24, 2023 · 3 comments
Assignees
Labels
bug Something isn't working P1 Moderate Issue

Comments

@mmichaelis
Copy link

/**
* Gets value from
* @example
* getUniqAttr({ 'foo': true, 'bar': bar' }) => 'bar'
* @param attrs
* @returns {string}
*/
const getUniqAttr = (attrs) => keysReduce(
attrs,
(res, key) => (attrs[key] === key ? attrs[key] : null),
null,
);

Without understanding the details, the description may benefit from some enhanced description (see below). Given my assumptions and tests are correct, I will refer to a possibly even dangerous flaw in getUniqAttr handling, which can be summarized as: You can fake unique attributes within BBCode.

Suggestion for Description Enhancement

/**
 * Given a record of string to some value, this method will
 * retrieve the last entry in the record and return its key
 * when it is equal to its value.
 *
 * Such entries typically represent so-called _unique attributes_
 * after parsing, so that `[url=someUrl]` gets parsed to an
 * attributes object like: `{ someUrl: "someUrl" }`.
 *
 * @example
 * getUniqAttr({ 'foo': true, 'bar': bar' }) => 'bar'
 * @example
 * getUniqAttr({ 'bar': bar', 'foo': true }) => null
 * @param attrs - record of strings to attribute values
 * @returns {string|null} `null`, if no unique attribute could be determined
 */

The Flaw

BBCode Actual HTML Expected HTML (Suggestion)
[url fakeUnique=fakeUnique]T[/url] <a href="fakeUnique">T</a> <a href="T" fakeUnique="fakeUnique">T</a>
[url=https://example.org/ fakeUnique=fakeUnique]T[/url] <a href="fakeUnique">T</a> <a href="https://example.org/" fakeUnique="fakeUnique">T</a>
[url=https://example.org/ hidden]T[/url] <a href="hidden">T</a> <a href="T" hidden="hidden">T</a>
[url=https://example.org/ hidden]T[/url] <a href="hidden">T</a> <a href="T" hidden="hidden">T</a>
[table=onclick][tr][td]T[/td][/tr][/table] <table onclick="onclick"><tr><td>T</td></tr></table> undecided
[table onclick=onclick][tr][td]T[/td][/tr][/table] <table onclick="onclick"><tr><td>T</td></tr></table> undecided

Stumbled across this while trying to add a sanitizer that forbids on* attributes to be created during processing.

Thus, the attribute found as being "unique" may not always have been "unique" within the original BBCode.

Perhaps one possible option would be using a Symbol() as key for the unique attribute. But I did not dive into parsing, if this is even feasible.

@JiLiZART
Copy link
Owner

JiLiZART commented Oct 25, 2023

Main goal of getUniqAttr is to handle situations where attribute name is a tag name for bbcodes without any attributes but with attribute value like [url=example.com]T[/url] any other variations of usage are falsy.
Maybe it's a bad decidion for handle this as { 'example.com': example.com } maybe better will be handling this as { url: 'example.com' }, but it breaking change.

Missunderstandings will be fixed when I done TypeScript support

@JiLiZART JiLiZART self-assigned this Oct 25, 2023
@JiLiZART JiLiZART added bug Something isn't working P1 Moderate Issue labels Oct 25, 2023
@mmichaelis
Copy link
Author

Maybe it's a bad decidion for handle this as { 'example.com': example.com } maybe better will be handling this as { url: 'example.com' }, but it breaking change.

Yes, it will be a breaking change (and we will for sure stumble across this when the behavior changes; hopefully by some integration tests that will trigger us on update).

Introducing Symbols (which may be the best option here regarding non-conflicting behavior), an alternative representation may be some characters, that will never make it to an attribute name (at least when parsing BBCode). A space character may be the best option here, or a closing bracket:

  • unique
  • uni]que

To smoothen updates, all API clients should be advised to only access unique attributes via dedicated API instead of applying their own guesses on the behavior. Possibly even a challenge for us, as we try to prevent any on* handlers to sneak into the attributes.

mmichaelis added a commit to CoreMedia/ckeditor-plugins that referenced this issue Oct 25, 2023
Adding an integration test for behavior influenced by:

* JiLiZART/BBob#202

We decided (better: will decide with subsequent commits) to accept
the possible flaw as corner case, that a URL tag like

```
[url=onclick]T[/url]
```

will still get the `onclick` value removed, as it is denoted in
as `onclick=onclick` in attributes (concept of unique attributes as
it is now in BBob).

If these tests fail, we have to revisit corresponding code and see,
if we need or can adapt something.
@mmichaelis
Copy link
Author

This issue becomes more severe, when you want to have tags, that only handle "unique attributes" differently, but keep all others as is. Thus, if you want to stick to the default mapping, for all not explicitly mapped tags/elements.

Example input without extra mapping configured:

[unknown=unique fakeUnique=fakeUnique]T[/unknown]

will become:

<unknown unique="unique" fakeUnique="fakeUnique">T</unknown>

which, in case of providing an alternative mapping for [url] where you "keep all other but unique attributes" will trigger the following mapping:

[url=https://example.org/ fakeUnique=fakeUnique]T[/url]

to

<a https://example.org/=https://example.org/ href="fakeUnique">T</a>

Follow-Up Options:

The following follow-up options may influence a possible solution to this issue.

Option 1) As soon as this issue is fixed, I would suggest adapting attrsToString to even ignore unique attributes, which are left after processing:

const attrsToString = (values) => {

Thus, the [unknown] example above, should rather be mapped to:

<unknown fakeUnique="fakeUnique">T</unknown>

Doing so now, without fixing this issue, may cause other subsequent bugs and breaking behavior.

Option 2) Or perhaps a very different approach, where you may leave attrsToString untouched: Choose a unique attribute representation that uses as "namespace" prefix. Example: bbob:unique. Without explicit mapping, the [unknown] example would then render to:

<unknown fakeUnique="fakeUnique" bbob:unique="unique">T</unknown>

Option 3) Or, not to use namespaces, but data-attributes instead, which then again represents a valid HTML attribute:

<unknown fakeUnique="fakeUnique" data-bbob-unique="unique">T</unknown>

(accessible in JavaScript via el.dataset.bbobUnique).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P1 Moderate Issue
Projects
None yet
Development

No branches or pull requests

2 participants