Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify behaviour of ServletResponse.setCharacterEncoding(null) #377

Closed
markt-asf opened this issue Nov 27, 2020 · 6 comments
Closed

Clarify behaviour of ServletResponse.setCharacterEncoding(null) #377

markt-asf opened this issue Nov 27, 2020 · 6 comments

Comments

@markt-asf
Copy link
Contributor

The Javadoc indicates that only IANA character set names are allowed but it does not define the expected behaviour for any values that are not allowed - including null.

Generally, I'd like to see UnsupportedEncodingException but that would not be a backwards compatible change so I don't see how we could do that.

The question for null arose from Tomcat bug 64938. I can see potential use cases for allowing null - even if there are alternative solutions for some (all?) of those use cases.

I propose we modify the Javadoc for ServletResponse.setCharacterEncoding(null) along the following lines:

  • Calling with null means behave as if no character encoding has been explicitly set
  • Invalid values will be silently ignored
  • The container should generate an appropriate log message if an invalid value is ignored
@stuartwdouglas
Copy link
Contributor

IMHO:

  • null should clear the charset (same as your suggestion)
  • invalid values should still be set as normal
  • invalid values will return in an error when attempting to construct a Writer (but not an OutputStream)
  • We should not be logging anything in this case. If you have an app where the user can supply the charset then an attacker could send a lot of requests that would result in an invalid charset, which would flood the logs causing excessive IO and potential disk space exhaustion.

My thinking for allowing the invalid values is that you may have old/broken/weird clients that need a charset that is not recognised by the JVM. You should still be able to send it, but you would need to do the encoding manually and write it to the stream rather than using the writer.

@markt-asf
Copy link
Contributor Author

I think we can declare consensus for null.

Invalid I'm less sure about. The Javadoc requires an IANA registered character set although I'm fairly sure Java doesn't implement all of them so the user could still provide a valid but unrecognised value. Tomcat 7.0.x behaves the way you describe. Tomcat 8.5.x behaves in line with my proposal. We have had releases with that variation for ~5 years now and I don't recall any problems. I can see the flexibility of your approach but I like the additional validation of only allowing valid character sets. I want both :). More thought required.

As an aside, I'm not concerned about the security implications you mentioned as apps should be validating any client provided data before using it.

@gregw
Copy link
Contributor

gregw commented Dec 6, 2020

I think we need to be a little more explicit with regards to null. It should clear even non explicitly set charsets, eg a charset from a locale. eg:

    response.setLocale(java.util.Locale.JAPAN);
    response.setCharacterEncoding(null);

will result in no charset being set. Ditto for charsets obtained from setting the content type as in:

    response.setContentType("text/plain;charset=utf16");
    response.setCharacterEncoding(null);

Also, what is the behaviour with regards to a response default character encoding set in the web.xml? Does null revert to that default or does it make it as if there is no default their either?

Currently in Jetty we have the following enum:

    private enum EncodingFrom
    {
        NOT_SET, 
        INFERRED,   // eg utf-8 for json mimetype
        SET_LOCALE, 
        SET_CONTENT_TYPE, 
        SET_CHARACTER_ENCODING
    }

Currently we treat a setCharacterEncoding(null) as a reversion to NOT_SET thus when a writer it obtained we will first try to infer a charset and if that does not find one, we use the context default.

I think that is reasonable behaviour, but it could be that setCharacterEncoding(null) could required a SET_NULL value in the enum, which would force no character encoding and an exception if getWriter is called. I prefer the former behaviour.

I also agree that setCharacterEncoding should not throw, but we can throw from getWriter if we don't know the charset

@markt-asf
Copy link
Contributor Author

I think I have enough to work on some updated text. The summary is:

  • calling with null clears any value set explictly or implcitly via a call to a method of this class but does not
    affect any default value set via the ServletContext nor the deployment descriptor
  • calling with an invalid value has no effect until getWriter() is called at which point an UnsupportedEncodingException exception is thrown
  • containers may choose to log invalid character sets passed to this method
    If there is consensus on the above summary, I'll work on some updated text.

@gregw
Copy link
Contributor

gregw commented Dec 10, 2020

@markt-asf sounds good and maybe check setLocale(null) at the same time to let it use the same verbage!

@stuartwdouglas
Copy link
Contributor

I am happy with these changes, they match what Undertow currently does (with the exception of logging).

markt-asf added a commit to markt-asf/servlet-api that referenced this issue Jan 18, 2021
markt-asf added a commit to markt-asf/servlet-api that referenced this issue Jan 18, 2021
stuartwdouglas added a commit that referenced this issue Feb 4, 2021
Fix #377 - clarify setCharacterEncoding(null) and related
markt-asf added a commit to markt-asf/jakartaee-tck that referenced this issue Dec 3, 2021
markt-asf added a commit to markt-asf/jakartaee-tck that referenced this issue Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants