Bug: Vertical Bar Character as class name throws an Exception #1998
Comments
Yes. Inspired by mathiasbynens/CSS.escape, I wrote the following code:
/*
 Given a CSS identifier (such as a tag, ID, or class), escape any CSS special characters that would otherwise not be
 valid in a selector.
 */
public static String escapeCssIdentifier(String in) {
    // ESC is not defined in this snippet; it is assumed to be the backslash escape character.
    int[] codePoints = in.codePoints().toArray();
    int length = codePoints.length;
    // Guard against empty input, for which codePoints[0] would throw.
    if (length == 0) {
        return in;
    }
    int firstCodePoint = codePoints[0];
    // If the character is the first character and is a `-` (U+002D), and
    // there is no second character, […]
    if (length == 1 && firstCodePoint == 0x002D) {
        return ESC + in;
    }
    // Borrow the builder only after the early returns, so it is always released.
    StringBuilder result = StringUtil.borrowBuilder();
    for (int index = 0; index < length; index++) {
        int codePoint = codePoints[index];
        // Note: there’s no need to special-case astral symbols, surrogate
        // pairs, or lone surrogates.
        // If the character is NULL (U+0000), then the REPLACEMENT CHARACTER
        // (U+FFFD).
        if (codePoint == 0x0000) {
            result.append('\uFFFD');
            continue;
        }
        if (
            // If the character is in the range [\1-\1F] (U+0001 to U+001F) or is
            // U+007F, […]
            (codePoint >= 0x0001 && codePoint <= 0x001F) || codePoint == 0x007F ||
            // If the character is the first character and is in the range [0-9]
            // (U+0030 to U+0039), […]
            (index == 0 && codePoint >= 0x0030 && codePoint <= 0x0039) ||
            // If the character is the second character and is in the range [0-9]
            // (U+0030 to U+0039) and the first character is a `-` (U+002D), […]
            (
                index == 1 &&
                codePoint >= 0x0030 && codePoint <= 0x0039 &&
                firstCodePoint == 0x002D
            )
        ) {
            // https://drafts.csswg.org/cssom/#escape-a-character-as-code-point
            result.append(ESC).append(Integer.toHexString(codePoint)).append(' ');
            continue;
        }
        // If the character is not handled by one of the above rules and is
        // greater than or equal to U+0080, is `-` (U+002D) or `_` (U+005F), or
        // is in one of the ranges [0-9] (U+0030 to U+0039), [A-Z] (U+0041 to
        // U+005A), or [a-z] (U+0061 to U+007A), […]
        if (
            codePoint >= 0x0080 ||
            codePoint == 0x002D ||
            codePoint == 0x005F ||
            (codePoint >= 0x0030 && codePoint <= 0x0039) ||
            (codePoint >= 0x0041 && codePoint <= 0x005A) ||
            (codePoint >= 0x0061 && codePoint <= 0x007A)
        ) {
            // the character itself
            result.appendCodePoint(codePoint);
            continue;
        }
        // Otherwise, the escaped character.
        // https://drafts.csswg.org/cssom/#escape-a-character
        result.append(ESC).appendCodePoint(codePoint);
    }
    return StringUtil.releaseBuilder(result);
}
then
It seems a bit troublesome to make it work on Android, and I've found a few more related issues.
See,
Was fixed in #2146.
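To make the escaping rules in the snippet above concrete, here is a minimal standalone sketch that applies the same rules with only the JDK. It is not jsoup's actual implementation: the class name `CssEscapeSketch` is hypothetical, and `ESC` is assumed to be a backslash, since the snippet does not define it.

```java
// Hypothetical standalone sketch of the CSS.escape-style rules; not jsoup code.
final class CssEscapeSketch {
    // Assumption: the escape character is a backslash.
    private static final char ESC = '\\';

    public static String escapeCssIdentifier(String in) {
        if (in.isEmpty()) return in;          // nothing to escape
        if (in.equals("-")) return ESC + in;  // a lone hyphen must be escaped

        int[] cps = in.codePoints().toArray();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < cps.length; i++) {
            int cp = cps[i];
            boolean digit = cp >= '0' && cp <= '9';
            if (cp == 0) {
                // NULL becomes the replacement character.
                out.append('\uFFFD');
            } else if (cp <= 0x1F || cp == 0x7F
                    || (i == 0 && digit)
                    || (i == 1 && digit && cps[0] == '-')) {
                // Control characters and leading digits: escape as a hex code point plus a space.
                out.append(ESC).append(Integer.toHexString(cp)).append(' ');
            } else if (cp >= 0x80 || cp == '-' || cp == '_' || digit
                    || (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')) {
                // Identifier characters pass through unchanged.
                out.appendCodePoint(cp);
            } else {
                // Everything else (including the vertical bar) gets a backslash prefix.
                out.append(ESC).appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeCssIdentifier("|"));    // \|
        System.out.println(escapeCssIdentifier("a|b"));  // a\|b
        System.out.println(escapeCssIdentifier("1abc")); // \31 abc
        System.out.println(escapeCssIdentifier("-"));    // \-
    }
}
```

With these rules, a class name consisting only of a vertical bar would be written as `.\|` in a selector, which matches the form mentioned in the issue description.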
Consider the following HTML:
This HTML is valid according to https://validator.w3.org/nu/#textarea
If I try to get the CSS selector of the button, a SelectorParseException is thrown:
Stack trace (latest stable jsoup version, 1.16.1):
A real-world example of a class name that consists only of a vertical bar character can be found at https://www.mueller.at/ (search for .\| in the browser DevTools to find an example node).
If the vertical bar character appears inside a class name (e.g. a|b), a different error is thrown: