-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
handle invalid accept-charset in requests - default to utf-8 #584
Merged
Merged
Changes from 2 commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
38cceb3
add tests for accept-charset
pjfanning 479dfa1
default to utf-8 if charset is invalid
pjfanning 3ccd053
xml tests (1 broken still)
pjfanning c8514e7
add charsetWithUtf8Failover function
pjfanning 1ed3fd7
Update MarshallingSpec.scala
pjfanning 6c4aaf7
scala 2.12 compile issue
pjfanning File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this is a comment on the current behavior, and not on your change, but this seems strange to me: it looks like the default string marshaller will return the string passed to it as
text/plain
claiming it is in whatever charset that was requested by the user, without doing anything to make sure the string is actually in that charset.This is only correct if you assume the implementation correctly took into account the charset, which seems extremely unlikely. I wonder if it wouldn't make more sense to leave out the optional
charset
parameter to the return content type in this case. If people have specific requirements (which seems somewhat unlikely) they can specify the charset explicitly.This would be a behavior change, but it seems like a reasonable one if we note it in the release notes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before the change to the main source code in this PR, this test actually fails - the response is an error response without my main source change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Thanks for having a look at this bug / feature request. I recently sent in a duplicate issue (#583) unaware of this one. I worked on a similar PR the past two days, and have now found and looked through your changes - that would seem to fix it nicely! Great.
For your consideration, I'd like to suggest adding in an extra unit-test in "MarshallingSpec.scala" with something along the lines of:
Some refactoring thoughts; Perhaps the try-catch in "PredefinedToEntityMarshallers.scala" could be more subtly placed in "HttpCharset.scala" as:
and then have the "PredefinedToEntityMarshallers.scala" do:
Looking forward to seeing your changes merged.
Thanks again and best regards,
Michael
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @michaelf-crwd - I have added a new HttpCharsets function similar to what you suggested (c8514e7).
I will look later at adding extra tests like the one you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raboof would you be able to take a look at this again? I don't think this change breaks anything and makes the pekko-http support for
Accept-Charset
a bit more tolerant.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raboof, WDYM with that? Java strings are basically UTF-16 encoded, so they are known to be unicode. What else could go wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I meant is, when the route does
complete(someUnicodeString)
, and the marshalling logic results in aContent-Type: text/plain; charset=ISO-8859-1
header, that response seems wrong: the header promises the body will be in ISO-8859-1, but the body is actually in unicode. I think it would make more sense to make sure we produce aContent-Type: text/plain; charset=UTF-8
header unless the developer explicitly marked the response as being in a different charset.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or will the marshalling logic actually convert the unicode input into bytes using the charset from the content type? in that case of course it's all good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it does, sorry about the noise 🤦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍