Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3 failing test examples that previously worked with AntiSamy 1.7.3. #388

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kwwall
Copy link
Contributor

@kwwall kwwall commented Nov 11, 2023

Run 'mvn test' and observe the 3 failing tests. The question is not how to fix this, but is there an explanation of why this is the case and can you perhaps conjecture as to what sort of tainted input would lead to this.

See GitHub issue #389 for details regarding expectations.

…rent for some tainted input strings than for previous version of AntiSamy 1.7.3.
@davewichers
Copy link
Collaborator

@spassarop - Are we going to do anything about this? Or did you already, like add notes to the README or something?

@spassarop
Copy link
Collaborator

I did: #389 (comment)

The parser keeps changing and we are dependent on their updates :(

@rbri
Copy link
Contributor

rbri commented Jan 16, 2024

@kwwall had a look at your test cases and made another fix for neko. The tag name parsing is now much closer to the spec/real browsers and your test cases are now test cases for neko also.

see HtmlUnit/htmlunit-neko@55053e4

additionally a new neko 3.11.0-SNAPSHOT is available if you like to test.

@davewichers
Copy link
Collaborator

@spassarop - I tested the new 3.11.0-SNAPSHOT and AntiSamy fails 3 tests like so:

[ERROR] Errors:
[ERROR] AntiSamyTest.issue37:1002 » Scan org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
[ERROR] AntiSamyTest.scriptAttacks:154 » Scan org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
[ERROR] AntiSamyTest.testSmuggledTagsInStyleContentCase1:2526 » Scan org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
[ERROR] Tests run: 108, Failures: 0, Errors: 3, Skipped: 0

Can you determine if our test cases needed to be changed to pass somehow? Or is a fix needed in the updates to NekoHtml?

@rbri - Maybe you want to look into this too.

@rbri
Copy link
Contributor

rbri commented Jan 18, 2024

ok, can reproduce this but i have to think about.....

@spassarop
Copy link
Collaborator

spassarop commented Jan 20, 2024 via email

@rbri
Copy link
Contributor

rbri commented Jan 21, 2024

SNAPSHOT is updated again

Current status

  • have added unit tests for Dom/Fragment/Sax parser - so far we had no unit test for the pasers; hope this will make side effects of changes more visible
    image
  • major change - the scanner now supports many more chars as valid for tag names - this is in sync with the spec and the current browsers
    • this change was also the root of the exception in the dom parser - this is fixed (and unit tested)
  • some more fixes for bugs found on the way (https://github.com/HtmlUnit/htmlunit/blob/master/src/changes/changes.xml)

@rbri
Copy link
Contributor

rbri commented Jan 21, 2024

@kwwall - your tests are still failing but i used your test case to improve the tag name scanning/parsing
It will be great if you can have a look at this - i have no real idea if the cleanup done on top of the dom tree are now correct.

testAntiSamyRegressionCDATAWithJavascriptURL

<style/>b<![cdata[</style><a href=javascript:alert(1)>test

Now you got

b&lt;![cdata[test

the parsed dom tree is now like in real browsers
image
I guess this is related to the major improvements done for the last version to handle script/comment/cdata sections in sync with the browsers/spec.

testOnfocusAfterStyleClosing

<select<style/>k<input<</>input/onfocus=alert(1)>

Now you got

kinput/onfocus=alert(1)&gt;

the parsed dom tree is now like in real browsers
image
The changed output here is a result of the changed tag name parsing - like in real browsers 'select<style' and 'input<<' are now parsed as valid tag names.

testScriptTagAfterStyleClosing

<select<style/>W<xmp<script>alert(1)</script>

Now you got

Walert(1)

the parsed dom tree is now like in real browsers
image
The changed output here is a result of the changed tag name parsing - like in real browsers 'select<style' and 'xmp<script' are now parsed as valid tag names.

@rbri
Copy link
Contributor

rbri commented Jan 21, 2024

The parser keeps changing and we are dependent on their updates :(

At least from my site this is also the plan for the future - there is still a lot to do to be more and more in sync with the browser parsers.

@kwwall
Copy link
Contributor Author

kwwall commented Jan 21, 2024

@rbri - I looked at this and I'm fine with it as it is. I've already noted in our ESAPI release notes that sanitized results may be different than they previously work. As long as it sanitizes safely, it's all good and it does that. I probably will rewrite these 3 JUnit tests to just confirm that the dangerous portions have been properly cleansed. I explained my motivation for noting noting in issue #389 in #389 (comment).

@davewichers
Copy link
Collaborator

@kwwall @spassarop - what is the status of this PR? Close it, work still to be done, ???

@kwwall
Copy link
Contributor Author

kwwall commented Jun 18, 2024

@davewichers - Let me retest with the latest ESAPI release and get back to you. But I think it had been fixed and just didn't get closed. Will let you know by noon tomorrow.

@kwwall
Copy link
Contributor Author

kwwall commented Jun 19, 2024

@davewichers and @spassarop - Okay, I looked at this, and from ESAPI's perspective, we just adjusted our JUnit tests in ESAPI's at:

https://github.com/ESAPI/esapi-java-legacy/blob/develop/src/test/java/org/owasp/esapi/reference/validation/HTMLValidationRuleCleanTest.java

to agree with whatever 1.7.5 was producing. So in that specific sense, I think you can close this PR with a suitable comment. (Although, I guess I thought you may want to incorporate these test cases into your own AntiSamy JUnit regression tests. Maybe @spassarop has already done that; I haven't bothered to check.)

However, I also think this points to a larger problem and that is how do we test sanitized output? The way that I generally try to do so is that I run the tainted output through AntiSamy's DOM parser and look at the output. Something like

   System.out.println( (new AntiSamy).scan(tainted, esapi_policy, AntiSamy.DOM).getCleanHTML() );

and look at the output and then use it to write a JUnit test. When possible, I try to write the JUnit test to see if the tainted payload is still present, but one can't always do that (e.g., looking for the absence of onfocus=alert(1) in the one example would be the obvious thing to do, but that string was never removed since we've added that specific test), so in those cases, I just eyeball the cleansed output (and maybe pop it into a browser to test it as well), and if it's safe, I just use the entire cleansed string as the expected value to compare it against. (That is in fact how we discovered those 3 tests that changed that we're discussing here.)

I really don't know what to do for such cases though. And I expect that this affects a lot more than just ESAPI and its Validator.getValidSafeHTML() methods. Certainly (well, hopefully) there are others who have similar, but different regression tests. You really need that if you've done any customization at all of the AntiSamy policy file, which I expect to be somewhat common.

But given that the the cleansed AntiSamy output for any specific input is not constant (and I'm not debating that there are sensible reasons for that), it seems that this problem will be popping up now and again. In fact, over the history of AntiSamy, it's probably popped up a lot more than we've observed because even between AntiSamy and ESAPI teams, we likely are only testing some of the obvious edge cases.

I'm not sure what to do about this larger problem though. This GitHub PR and the associated issue doesn't really do that larger problem justice. I guess you can add a blurb to your README that this just should be expected now and again on an ongoing basis, and ask people report specific edge cases that they encounter when it changes between AntiSamy releases so you can note it in your release notes, but more importantly, use it to improve your regression test cases. But beyond that, I have no suggestions. Maybe you want to consider that (noting it in the README) before closing this GitHub PR. I don't know. It's your call. I only reported this because I figured that you'd want to know, but maybe you don't care and only want things reported when they possibly could result in a vulnerability,

@davewichers
Copy link
Collaborator

@spassarop - do you want to propose an update to the README and then we can merge that and close this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants