Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New methods for excluding elements with specific missing or empty attributes #45

Open
wants to merge 30 commits into
base: master
Choose a base branch
from

Conversation

forum-is
Copy link

@forum-is forum-is commented Oct 9, 2015

We recently came across a problem where an img-element with missing src-attribute caused our PDF engine to break. As the current version of the html sanitizer only contains a method for disallowing elements without any attributes, we suggest adding methods to target a specific missing/empty attribute.
As a convenience we also included a method to check for a non-matching regex pattern.

Sebastian Uecker added 3 commits October 9, 2015 15:48
… missing or emtpy attributes

+Removed AutoCloseableHtmlStreamRenderer for Java SE 6 compatibility
…ttributes or elements that do NOT match a pattern
@jmanico
Copy link
Member

jmanico commented Oct 9, 2015

Thank you very, very much for submitting a patch with your report. More from Mike soon!

Aloha,

Jim Manico
@manicode

On Oct 9, 2015, at 4:53 PM, FORUM Gesellschaft für Informationssicherheit mbH [email protected] wrote:

We recently came across a problem where an img-element with missing src-attribute caused our PDF engine to break. As the current version of the html sanitizer only contains a method for disallowing elements without any attributes, we suggest adding methods to target a specific missing/empty attribute.
As a convenience we also included a method to check for a non-matching regex pattern.

You can view, comment on, or merge this pull request online at:

#45

Commit Summary

+Added HtmlPolicyBuilder methods for excluding elements with specific missing or emtpy attributes
Reverted changes
added methods for excluding elements with specific empty or missing attributes or elements that do NOT match a pattern
File Changes

M src/main/java/org/owasp/html/HtmlPolicyBuilder.java (49)
Patch Links:

https://github.com/OWASP/java-html-sanitizer/pull/45.patch
https://github.com/OWASP/java-html-sanitizer/pull/45.diff

Reply to this email directly or view it on GitHub.

/**
* Disallows the given element from appearing without the given attribute.
*/
public HtmlPolicyBuilder disallowWithoutAttribute(String elementName, final String attributeName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, we can say

    myPolicyBuilder.allowAttributes("src", ...).onElements("img")

and if I understand your goal, that is problematic because it allows but does not require src="..." on <img ...>.

I'd prefer

    myPolicyBuilder.withAttributes("src", ...).required().onElements("img")

which allows mixing required() into the existing flow by which elements are associated with attributes instead of creating a new API.

@mikesamuel
Copy link
Contributor

This change seems to be missing any tests. Could you at least check that it works with the kind of policy you want to work.

@mikesamuel
Copy link
Contributor

We recently came across a problem where an img-element with missing src-attribute caused our PDF engine to break. As the current version of the html sanitizer only contains a method for disallowing elements without any attributes, we suggest adding methods to target a specific missing/empty attribute.
As a convenience we also included a method to check for a non-matching regex pattern.

What PDF engine is this? If you don't maintain it then it might be worth filing a bug with them as well.

Should this change also require src="..." on <img> in Sanitizers.IMAGES?

@forum-is
Copy link
Author

Hello Mike,

sorry for the wait.

Our goal was indeed to be able to require certain attributes when their element would make no sense or as in this case even cause harm without them, which was not possible with the current allowAttributes()-implementation.

Your proposed mimic of a chainable required() seems indeed a cleaner approach. Yet my understanding of the project code is currently not sufficient to change the implementation to that within reasonable effort. I will however try and provide a test case to show the validity of the current implementation shortly. Maybe then you could refactor it to the required()-mimic as you find the time.

We use Apache FOP to export PDFs from our web application.

As you mention, it might actually be useful to require the attribute in the standard sanitizer so others do not run into that kind of issue unsuspectingly. At least no valid use case comes to mind for an img without a src that would forbid it from being required.

Best regards,
Sebastian

@forum-is
Copy link
Author

forum-is commented Nov 9, 2015

Hello Mike,

I added a TestCase for the disallowWithoutAttribute() functionality that represents the constellation that caused us problems (arbitrary alt, missing src).

Best regards,
Sebastian

@jmanico
Copy link
Member

jmanico commented Jul 17, 2016

It looks like this is mostly resolved. @mikesamuel - can we merge in the test case or resolve this pull request somehow? It does not look like major work is needed to resolve this..

OWASP#206)

* Do not lcase element or attribute names that match SVG or MathML names exactly

> Currently all names are converted to lowercase which is ok when
> you're using it for HTML only, but if there is an SVG image nested
> inside the HTML it breaks.  For example, when `viewBox` attribute is
> converted to `viewbox` the image is not displayed correctly.

This commit splits *HtmlLexer*.*canonicalName* into variants which preserve
items on whitelists derived from the SVG and MathML specifications, and
adjusts callers of *canonicalName* to use the appropriate variant.

Fixes OWASP#182

* add unittests for mixed-case SVG names
@jmanico
Copy link
Member

jmanico commented Sep 17, 2020

Bump @mikesamuel ?

@mikesamuel mikesamuel self-assigned this Sep 18, 2020
@mikesamuel
Copy link
Contributor

Sorry for ghosting. I'll take a look shortly.

@mikesamuel
Copy link
Contributor

{dis,}allowWithoutAttributes were reworked recently. https://github.com/OWASP/java-html-sanitizer/blob/main/change_log.md says

Release 20200615.1

  • Change .and when combining two policies to respect explicit skipIfEmpty decisions.

which affected

/**
* Assuming the given elements are allowed, allows them to appear without
* attributes.
*
* @see #DEFAULT_SKIP_TAG_MAP_IF_EMPTY_ATTR
* @see #disallowWithoutAttributes
*/
public HtmlPolicyBuilder allowWithoutAttributes(String... elementNames) {
invalidateCompiledState();
for (String elementName : elementNames) {
elementName = HtmlLexer.canonicalElementName(elementName);
skipIssueTagMap.put(elementName, HtmlTagSkipType.DO_NOT_SKIP);
}
return this;
}
/**
* Disallows the given elements from appearing without attributes.
*
* @see #DEFAULT_SKIP_TAG_MAP_IF_EMPTY_ATTR
* @see #allowWithoutAttributes
*/
public HtmlPolicyBuilder disallowWithoutAttributes(String... elementNames) {
invalidateCompiledState();
for (String elementName : elementNames) {
elementName = HtmlLexer.canonicalElementName(elementName);
skipIssueTagMap.put(elementName, HtmlTagSkipType.SKIP);
}
return this;
}

IIUC, what you need is a way to drop specific elements that are missing some core attributes.

The following passes for me, which uses an ElementPolicy to reject <img> tags that don't have "src" in a key position. It may spuriously drop <img alt="src" src="foo.png"> but works as a POC.

    String input = "<img src=foo.png><img alt=bar>";
    PolicyFactory imgOkWithSrc = new HtmlPolicyBuilder()
            .allowElements(
                    ((elementName, attrs) -> (attrs.indexOf("src") & 1) == 0 ? elementName : null),
                    "img"
            )
            .allowAttributes("alt", "src").onElements("img")
            .toFactory();
    assertEquals("<img src=\"foo.png\" />", imgOkWithSrc.sanitize(input));

dependabot bot and others added 5 commits December 7, 2020 14:30
Bumps [junit](https://github.com/junit-team/junit4) from 4.12 to 4.13.1.
- [Release notes](https://github.com/junit-team/junit4/releases)
- [Changelog](https://github.com/junit-team/junit4/blob/main/doc/ReleaseNotes4.12.md)
- [Commits](junit-team/junit4@r4.12...r4.13.1)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* allowAtributes("style")

* Global style test
This may still be overridden with `-Dguava-version=...`.
mikesamuel and others added 13 commits October 18, 2021 09:23
This addresses a vulnerability where policies that allow `<style>`
elements with text in `<option>` elements are vulnerable to XSS as
disclosed in

https://docs.google.com/document/d/11SoX296sMS0XoQiQbpxc5pNxSdbJKDJkm5BDv0zrX50/edit?usp=sharing

This changes behavior for rendering of `<style>` element text so may
change behavior.

Specifically, `<style>` element text that includes the strings `-->`
or `]]>` will no longer sanitize.
Rather than mucking with `<style>` tag content in all cases, this is a more
tailored fix to the recent vulnerability that just closes `<style>` elements
when we realize they're in a dodgy parsing context.
As described in issue OWASP#254 `&para` is a full complete character
reference when decoding text node content, but not when
decoding attribute content which causes problems for URL attribute
values like

    /test?param1=foo&param2=bar

As shown via JS test code in that issue, a small set of
next characters prevent a character reference name match
from being considered complete.

This commit:
- modifies the decode functions to take an extra parameter
  `boolean inAttribute`, and modifies the Trie traversal
  loops to not store a longest match so far based on that
  parameter and some next character tests
- modifies the HTML lexer to pass that attribute appropriately
- for backwards compat, leaves the old APIs in place but `@deprecated`
- adds unit tests for the decode functions
- adds a unit test for the specific input from the issue

This change should make us more conformant with observed
browser behaviour so is not expected to cause compatibility
problems for existing users.

Fixes OWASP#254
…P#266)

CssTokens code assumed that consumeIdentOrUrlOrFunctions always
returned a token type and consumed characters.

This commit audits all uses of that function and checks that
they make progress.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants