-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New methods for excluding elements with specific missing or empty attributes #45
base: master
Are you sure you want to change the base?
Conversation
… missing or emtpy attributes +Removed AutoCloseableHtmlStreamRenderer for Java SE 6 compatibility
…ttributes or elements that do NOT match a pattern
Thank you very, very much for submitting a patch with your report. More from Mike soon! Aloha,Jim Manico
|
/** | ||
* Disallows the given element from appearing without the given attribute. | ||
*/ | ||
public HtmlPolicyBuilder disallowWithoutAttribute(String elementName, final String attributeName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, we can say
myPolicyBuilder.allowAttributes("src", ...).onElements("img")
and if I understand your goal, that is problematic because it allows but does not require src="..."
on <img ...>
.
I'd prefer
myPolicyBuilder.withAttributes("src", ...).required().onElements("img")
which allows mixing required()
into the existing flow by which elements are associated with attributes instead of creating a new API.
This change seems to be missing any tests. Could you at least check that it works with the kind of policy you want to work. |
What PDF engine is this? If you don't maintain it then it might be worth filing a bug with them as well. Should this change also require |
Hello Mike, sorry for the wait. Our goal was indeed to be able to require certain attributes when their element would make no sense or as in this case even cause harm without them, which was not possible with the current allowAttributes()-implementation. Your proposed mimic of a chainable required() seems indeed a cleaner approach. Yet my understanding of the project code is currently not sufficient to change the implementation to that within reasonable effort. I will however try and provide a test case to show the validity of the current implementation shortly. Maybe then you could refactor it to the required()-mimic as you find the time. We use Apache FOP to export PDFs from our web application. As you mention, it might actually be useful to require the attribute in the standard sanitizer so others do not run into that kind of issue unsuspectingly. At least no valid use case comes to mind for an img without a src that would forbid it from being required. Best regards, |
Hello Mike, I added a TestCase for the disallowWithoutAttribute() functionality that represents the constellation that caused us problems (arbitrary alt, missing src). Best regards, |
It looks like this is mostly resolved. @mikesamuel - can we merge in the test case or resolve this pull request somehow? It does not look like major work is needed to resolve this.. |
OWASP#206) * Do not lcase element or attribute names that match SVG or MathML names exactly > Currently all names are converted to lowercase which is ok when > you're using it for HTML only, but if there is an SVG image nested > inside the HTML it breaks. For example, when `viewBox` attribute is > converted to `viewbox` the image is not displayed correctly. This commit splits *HtmlLexer*.*canonicalName* into variants which preserve items on whitelists derived from the SVG and MathML specifications, and adjusts callers of *canonicalName* to use the appropriate variant. Fixes OWASP#182 * add unittests for mixed-case SVG names
Bump @mikesamuel ? |
Sorry for ghosting. I'll take a look shortly. |
{dis,}allowWithoutAttributes were reworked recently. https://github.com/OWASP/java-html-sanitizer/blob/main/change_log.md says
which affected java-html-sanitizer/src/main/java/org/owasp/html/HtmlPolicyBuilder.java Lines 314 to 343 in ca40697
IIUC, what you need is a way to drop specific elements that are missing some core attributes. The following passes for me, which uses an ElementPolicy to reject String input = "<img src=foo.png><img alt=bar>";
PolicyFactory imgOkWithSrc = new HtmlPolicyBuilder()
.allowElements(
((elementName, attrs) -> (attrs.indexOf("src") & 1) == 0 ? elementName : null),
"img"
)
.allowAttributes("alt", "src").onElements("img")
.toFactory();
assertEquals("<img src=\"foo.png\" />", imgOkWithSrc.sanitize(input)); |
Bumps [junit](https://github.com/junit-team/junit4) from 4.12 to 4.13.1. - [Release notes](https://github.com/junit-team/junit4/releases) - [Changelog](https://github.com/junit-team/junit4/blob/main/doc/ReleaseNotes4.12.md) - [Commits](junit-team/junit4@r4.12...r4.13.1) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* allowAtributes("style") * Global style test
This may still be overridden with `-Dguava-version=...`.
This addresses a vulnerability where policies that allow `<style>` elements with text in `<option>` elements are vulnerable to XSS as disclosed in https://docs.google.com/document/d/11SoX296sMS0XoQiQbpxc5pNxSdbJKDJkm5BDv0zrX50/edit?usp=sharing This changes behavior for rendering of `<style>` element text so may change behavior. Specifically, `<style>` element text that includes the strings `-->` or `]]>` will no longer sanitize.
Rather than mucking with `<style>` tag content in all cases, this is a more tailored fix to the recent vulnerability that just closes `<style>` elements when we realize they're in a dodgy parsing context.
As described in issue OWASP#254 `¶` is a full complete character reference when decoding text node content, but not when decoding attribute content which causes problems for URL attribute values like /test?param1=foo¶m2=bar As shown via JS test code in that issue, a small set of next characters prevent a character reference name match from being considered complete. This commit: - modifies the decode functions to take an extra parameter `boolean inAttribute`, and modifies the Trie traversal loops to not store a longest match so far based on that parameter and some next character tests - modifies the HTML lexer to pass that attribute appropriately - for backwards compat, leaves the old APIs in place but `@deprecated` - adds unit tests for the decode functions - adds a unit test for the specific input from the issue This change should make us more conformant with observed browser behaviour so is not expected to cause compatibility problems for existing users. Fixes OWASP#254
…P#266) CssTokens code assumed that consumeIdentOrUrlOrFunctions always returned a token type and consumed characters. This commit audits all uses of that function and checks that they make progress.
We recently came across a problem where an img-element with missing src-attribute caused our PDF engine to break. As the current version of the html sanitizer only contains a method for disallowing elements without any attributes, we suggest adding methods to target a specific missing/empty attribute.
As a convenience we also included a method to check for a non-matching regex pattern.