Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escaped css selector names in SVGs break the parser #91

Open
Watercycle opened this issue Jun 28, 2023 · 1 comment
Open

Escaped css selector names in SVGs break the parser #91

Watercycle opened this issue Jun 28, 2023 · 1 comment

Comments

@Watercycle
Copy link

Watercycle commented Jun 28, 2023

Version: 0.16.0

Given:

$oddSvg = <<<SVG
<svg>
    <defs>
        <style>
            .\37 15ca94c-fc50-4dc6-8356-e55b8cb855fa { fill: #1d526b; }
        </style>
    </defs>
</svg>
SVG;

(new enshrined\svgSanitize\Sanitizer())->sanitize($oddSvg); // => ""

Expected: There should be no changes.
Actual: It returns an empty string.

Context:
Apparently some SVG generators use UUID class names. According to the CSS spec class selectors can lead with escaped digits (TIL).


I'm not sure I'll have time to look into the solution; but, wanted to file this so others know. Ideally any tools starting its CSS selectors with numbers should be thrown to the wolves.

@verdy-p
Copy link

verdy-p commented Sep 8, 2024

More on this:
https://stackoverflow.com/questions/34456630/escaping-characters-in-css-class-selectors
But there are some good reasons to use escapes in CSS even if the DOM does not need or allow them (e.g. you can't name an HTML class with a space, but it is allowed to use colons :, dots . or braces {} in HTML class names without using any escape, but you need to escape them in CSS selectors... This cases occurs sometimes during conversion (when it is not always safe to simply change the HTML or DOM to replace them by '-' or '_' without creating collisions). The most common usage case is with colons and dots, used in many naming schemes for various languages which use these namespace separators distinctly from hyphens or underscores for basic names within the same namespace (CSS still does not have the concept of namespaces for selectors: adding escapes in CSS allow such integration and compatibility with many other language requirements)

However you should not use any escape in CSS selector names if you dont need to. In your example
.\37 15ca94c-fc50-4dc6-8356-e55b8cb855fa { fill: #1d526b; }
should be rewritten simply as:
.715ca94c-fc50-4dc6-8356-e55b8cb855fa { fill: #1d526b; }
This second form should be the one returned by the sanitizer (technically your example escape is valid, but overkill, both are strictly equivalent; escapes should only be used to escape characters reserved for the CSS syntax itself for selectors or for delimiting sets of properties, notably .,;*+>[]{}. It could as well return it in maximally compacted equivalent form:
.715ca94c-fc50-4dc6-8356-e55b8cb855fa{fill:#1d526b}
whithout any redundant whitespaces or newlines, or any semicolon terminator for the last property before the closing brace, just like it could also drop all comments. The sanitizer may also return it in "pretty printed" form with indentation whitespace and newlines, or whitespaces around braces or between a property name and its given value (possibly indented to the longest property name and optionally breaking lines with continuation for long values, if possible, to avoid long lines for human editors), with an extra option to preserve the contents of comments inserted between tokens. "Pretty printing" the compact form is entirely a preference and can be automated, as the same is true for compacting such redundant presentation (if one just wants to reduce the resident memory usage, or the storage/transmission footprint, even if generic data compression scheme is used, this compaction helps optimizing resource usage for later loading and parsing CSS documents, notably in clients with limited resources, low amount of memory, slower CPUs and with limited multithreading capabilities).

However I wonder if the escaping is needed for some CSS parsers whose limited lexer could interpret the start as a decimal floatting floatting point followed by a unspecified unit (this should not be the case here, a number or number with unit is not valid at that place in a CSS document, such number or measure token should only occur within properties or in parameters (between parentheses) of a named functional selector. The only tokens allowed there are whitespaces, comments, media rules, a CSS selection operator (dot, hash, opening square bracket), a "wildcard" (*) pseudo-element selector, or an element name: the leading dot can only be the CSS class selection operator and must be separated from other lexical tokens what follows. A good CSS parser should use a different lexer (or a lexer with a different starting state) that do not neceasserily return the same tokens from arbitrary contextual positions (this is needed anyway in any minimal lexer for correctly parsing comments, included their embedded whitespaces, to determine the breaking positions of significant tokens).


Also let's remember that CSS is independant of HTML. It can be used as a stylesheet language for other presentational language than HTML, it could be used as well to style components rendered in an application using its own internal naming rules. So CSS allows names with escapes for selectors that will in find match nothing in the document or the application because the valid CSS selector uses characters or substrings not permitted in the naming model for the document or application. You can perfectly use CSS to style an application written in C, C++, C#, PHP or Java, or with any graphics library for a game engine. You could use CSS to style the rendering output of a debugger, where you want to expose all its allowed internal names and do not want to create any dangerous collisions or ambiguities.

As well, CSS is also not the only existing styling language: such CSS may result of the conversion from another styling language (possibly automatically and dynamically at runtime), for example in a renderer that would convert an existing application to render it on the web (or other non-web private networks or communication links), using CSS and very huge choice of presentation languages (not just HTML) or client-side frameworks (e.g. in Javascript, and with custom networking protocols and various security mechanisms). It's not silly to imagine a standalone game engine or CAD design software running purely locally, but whose presentation will be designed and styled easily with CSS, where the application would embed a CSS adaptation layer.

CSS escapes are there to allow such things, even if they first seem "silly" for styling an HTML document. And no reason at all for authors (or automated generators) of such thing to be "thrown to the wolfes", as nothing is violated and everything here is perfectly defined in the relevant standard. If this effectibvely broke a rule, the parser should generate an exception or error, but here it stays silent and that looks fine for me.

However I wonder if the fact that your exemple returns an empty string is just the fact that the SVG only contains a def element with an unused style element; there's no reference at all to any part of the actual SVG content, which is then completely empty (and even the "svg" element itself becomes superfluous: a totally empty SVG document is still valid and safe and will render exactly the same as the "missing" svg root element is just implied and there's no other attributes for it (this SVG has an implicit empty bounding box, is fully transparent, does not need any clipping and will render identically on any placement; if this SVG was embedded anywhere in another document, it would not change at all its aspect; it may still have a DOM with an implicit svg root element and no subelement, its rendering would do absolutely nothing).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants