
httparse should skip invalid headers #61

Closed
Coder-256 opened this issue Sep 10, 2019 · 11 comments

@Coder-256

I am trying to use reqwest to parse a response from a server I don't control. reqwest uses hyper which uses httparse for parsing HTTP/1.x headers. Anyway, this server has a weird bug where it consistently returns a single corrupted header line in an otherwise completely valid response (the header contains unescaped non-token characters). Specifically, for some reason it tries to send the DOCTYPE as a header. The bug is unlikely to be fixed (this is old software), but it isn't really a problem because the page displays fine in all major browsers.

It seems that all major browsers simply ignore invalid header lines. However, httparse returns an error that aborts the entire parsing process. IMO this is a problem and should be fixed.

Here's a screenshot from Chrome that shows the invalid header being ignored:

[Screenshots: Chrome DevTools showing the response with the malformed header ignored, and the original raw response.]

In fact, Chrome's behavior is commented as: "skip malformed header".

Although technically changing this could be breaking, I can't imagine that any code would rely on response parsing failing in this particular case.

Here are the relevant lines:

httparse/src/lib.rs

Lines 594 to 595 in 6f696f5

} else if !is_header_name_token(b) {
return Err(Error::HeaderName);

httparse/src/lib.rs

Lines 613 to 614 in 6f696f5

} else if !is_header_name_token(b) {
return Err(Error::HeaderName);

expect!(bytes.next() == b'\n' => Err(Error::HeaderValue));


I think all of these could be fixed by consuming bytes until the next newline and then continuing the 'headers loop. I would open a PR, but I first want to check that you agree this change should be made.
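The proposed skip-to-newline behavior could look roughly like this. This is a self-contained sketch, not httparse's actual code: `parse_headers_lenient` and the simplified token check are illustrative names, and real httparse parses in place rather than allocating.

```rust
// HTTP token characters per RFC 7230: ALPHA / DIGIT / selected symbols.
fn is_header_name_token(b: u8) -> bool {
    matches!(b, b'!' | b'#' | b'$' | b'%' | b'&' | b'\'' | b'*' | b'+' | b'-' | b'.'
        | b'^' | b'_' | b'`' | b'|' | b'~' | b'0'..=b'9' | b'a'..=b'z' | b'A'..=b'Z')
}

/// Split raw header bytes into (name, value) pairs, silently dropping any
/// line with a malformed name (mimicking browser behavior) instead of
/// returning Err(Error::HeaderName).
fn parse_headers_lenient(raw: &[u8]) -> Vec<(String, String)> {
    let mut out = Vec::new();
    'headers: for line in raw.split(|&b| b == b'\n') {
        let line = line.strip_suffix(b"\r").unwrap_or(line);
        if line.is_empty() {
            continue;
        }
        let colon = match line.iter().position(|&b| b == b':') {
            Some(i) => i,
            None => continue 'headers, // malformed: no colon, skip the line
        };
        let (name, value) = (&line[..colon], &line[colon + 1..]);
        if name.is_empty() || !name.iter().all(|&b| is_header_name_token(b)) {
            continue 'headers; // malformed name: skip instead of erroring
        }
        out.push((
            String::from_utf8_lossy(name).into_owned(),
            String::from_utf8_lossy(value).trim().to_string(),
        ));
    }
    out
}

fn main() {
    let raw = b"Server: Apache/2.4\r\n<!DOCTYPE html PUBLIC \"...\">\r\nTransfer-Encoding: chunked\r\n";
    let headers = parse_headers_lenient(raw);
    // The DOCTYPE line is dropped; the two valid headers survive.
    assert_eq!(headers.len(), 2);
    assert_eq!(headers[0].0, "Server");
    assert_eq!(headers[1].0, "Transfer-Encoding");
}
```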

@seanmonstar
Owner

Hm, I'm surprised at the results, but then again, I shouldn't be; the HTTP/1 spec has been ignored at every turn :D

Curious, do you know what other "libraries" do, like curl or golang?

@Coder-256
Author

Coder-256 commented Sep 10, 2019

I haven't tested this, but it looks like golang errors on headers missing a colon but otherwise just parses them as valid headers.

I also have confirmed that curl actually parses it as a valid header:

$ curl "$URL" -s -o /dev/null -D -
HTTP/1.1 200 OK
Date: Tue, 10 Sep 2019 18:50:02 GMT
Server: Apache/2.4.xx (Debian)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Transfer-Encoding: chunked

I took a quick look, but its code is very dense. The header parsing logic is around Curl_http_readwrite_headers if you want to look further. It should be noted that curl doesn't actually do anything with headers except for important ones (e.g. Transfer-Encoding). In fact, AFAIK the most it will do in terms of parsing arbitrary headers is give you the entire message header with -D like above.

It was a good idea to check these libraries; I didn't realize that they would handle this differently. I guess there are three behaviors to choose from:

  • Ignore invalid headers (like Chrome, Firefox, etc.)
  • Pass on invalid headers (like curl and golang)
  • Error on invalid headers (like httparse)

I'll need to think a bit more about which would be better. What do you think?

Edit: After thinking about it a bit more, I think that especially since this project is used by apps like servo, it should definitely mimic Chrome/Firefox. This is a real example of a page that wouldn't load even though it works in major browsers.

@Coder-256
Author

Coder-256 commented Oct 9, 2019

@seanmonstar Have you had a chance to look into this? I'll start working on a pull request, I just wanted to check if you agree that this is the correct behavior.

@seanmonstar
Owner

Oh my, I completely forgot about this issue, sorry!

I suppose it makes sense to do what browsers do. I have a nagging worry, though: if we ignore invalid headers, is there a point where we're parsing things that really aren't HTTP and saying they are?

@Coder-256
Author

Well if the status line is intact, including HTTP version and all, I think it's definitely supposed to be HTTP. The headers are only parsed if the status line is valid.

I'll start making the PR.

@nox
Contributor

nox commented Apr 3, 2021

FWIW this behaviour was completely killed from Squid last year.

squid-cache/squid@b453677

@nox
Contributor

nox commented Apr 4, 2022

@Coder-256 Did you ever make a patch for this?

Found another website affected by this.

curl -I https://dbegoals.azdot.gov/
00000000  58 e2 80 90 46 72 61 6d  65 e2 80 90 4f 70 74 69  |X...Frame...Opti|
00000010  6f 6e 73 3a 20 44 45 4e  59                       |ons: DENY|

The hyphens aren't ASCII in the first X-Frame-Options entry.
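The `e2 80 90` byte sequence in the dump is U+2010 (the Unicode HYPHEN), not ASCII `-` (0x2d). A short sketch of why a strict token check rejects it (the inline closure is a simplified stand-in for httparse's check):

```rust
fn main() {
    // "X‐Frame‐Options" with U+2010 hyphens, as sent by the broken server.
    let name = "X\u{2010}Frame\u{2010}Options";
    // U+2010 encodes to the three bytes seen in the hexdump.
    assert_eq!(&name.as_bytes()[1..4], &[0xe2, 0x80, 0x90]);
    // No byte >= 0x80 is an HTTP token character, so a strict parser
    // fails on the first 0xe2 byte of the header name.
    let is_token = |b: u8| b.is_ascii_alphanumeric() || b"!#$%&'*+-.^_`|~".contains(&b);
    assert!(!name.bytes().all(is_token));
}
```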

@Coder-256
Author

@nox I completely forgot! Also, incredible catch; U+2010 strikes again. Anyway, after looking at this further, I am convinced that simply skipping invalid headers is the right option. In this case, Chromium and Firefox both ignore the invalid header, while it looks like Safari actually treats the header name as a valid Latin-1-encoded header for whatever reason; none of them fail parsing entirely, as httparse does. I'll open a PR.

@nox
Contributor

nox commented Apr 22, 2022

#114 will let you ignore the invalid header.

@Coder-256
Author

@nox Thank you very much, lgtm! Sorry I didn’t get to this sooner.

@nox
Contributor

nox commented Apr 25, 2022

@Coder-256 No worries.
