Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding updates #234

Closed
wants to merge 3 commits into from
Closed

Conversation

Connorhd
Copy link
Contributor

Should resolve #224

Adds an encoding option and sets the body encoding based on this order of priorities:

  1. encoding option
  2. charset in content-type
  3. UTF-8 if mime-type starts with 'text/' (I believe the spec says ISO-8859-1, but I think UTF-8 is more commonly assumed)
  4. Binary

@headers = HTTP::Headers.coerce(headers || {})
@encoding = encoding

if connection_or_body.is_a? HTTP::Connection
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This didn't seem great, but I couldn't see a cleaner way to do it nicely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the above, moving this to a hash means just setting it based on key rather than having to do content type detection.

@zanker
Copy link
Contributor

zanker commented May 31, 2015

This seems fine. But assumption 3 seems risky, but then again the entire HTTP spec is a mess. Someone else want to weigh in?

@ixti
Copy link
Member

ixti commented May 31, 2015

Although HTTP spec is a mess, if it has declared defaults we should stick
with it. So i would like to see #3 being configurable with default value
set to what RFC says (if it does).
On 1 Jun 2015 01:27, "Zachary Anker" [email protected] wrote:

This seems fine. But assumption 3 seems risky, but then again the entire
HTTP spec is a mess. Someone else want to weigh in?


Reply to this email directly or view it on GitHub
#234 (comment).

@Connorhd
Copy link
Contributor Author

I think http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1 is the relevant RFC, from what I can google I'm not sure anyone actually honors this though. I think most clients do some kind of character detection (which could also be added to http.rb, but I wanted to avoid it in this change if possible).

@zanker
Copy link
Contributor

zanker commented Jun 1, 2015

Looking at Net::HTTP, it appears they just read directly to the string via socket.read unless there's compression then it forces ASCII 8BIT.

@Connorhd
Copy link
Contributor Author

Connorhd commented Jun 1, 2015

Well assuming its binary seems like a good last resort, but I think we can do a bit better. At least step 1 and 2 above seem pretty solid to me.

I could just remove 3 if that makes everyone more comfortable, or I'm happy to make it use ISO-8859-1 if thats preferred.

@tarcieri
Copy link
Member

tarcieri commented Jun 1, 2015

I'd be fine with #1 and #2. Hopefully good UTF-8 citizens declare they're using such.

Maybe leave #3 out for now and we can see how it works in practice and follow up on it if it seems necessary?

@Connorhd
Copy link
Contributor Author

Connorhd commented Jun 1, 2015

Ok, I've removed 3 for now, I think its a good point that utf-8 things are probably more likely to declare their encoding anyway.

@headers = HTTP::Headers.coerce(opts.fetch(:headers, {}))
@encoding = opts.fetch(:encoding, nil)

if opts.include?(:connection)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this magic. Response should receive only :body. And this Body streamer should be left where it is now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have done that but to set the encoding in the body I need to have the response object. Do you have a better suggestion for how to organize that?

@ixti
Copy link
Member

ixti commented Jun 1, 2015

LGTM (with fixing response class)

@ixti ixti modified the milestones: v0.9, v1.0 Jul 22, 2015
@ixti ixti modified the milestones: v1.0, v2.0.0 Nov 1, 2015
@tarcieri
Copy link
Member

@Connorhd looks like this needs a rebase. Want to finish it up before we release 1.0?

@ixti
Copy link
Member

ixti commented Dec 13, 2015

I'm working on this already :D

@ixti
Copy link
Member

ixti commented Dec 13, 2015

In other words this will be in 1.0 for sure :D

@ixti ixti mentioned this pull request Dec 13, 2015
@ixti
Copy link
Member

ixti commented Dec 13, 2015

After thinking a bit more, all further cleanups can be made in future patch releases, so I'm gonna merge #273 (this rebased PR) as is.

@ixti ixti closed this in #273 Dec 13, 2015
@ixti ixti removed this from the v1.1 milestone Mar 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Determine string encoding based on Content-Type
4 participants