Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v12.x] http: fix monkey-patching of http_parser #30222

Closed
wants to merge 1 commit into from

Conversation

Jimbly
Copy link
Contributor

@Jimbly Jimbly commented Nov 2, 2019

Fixes inability to monkey-patch the HTTP Parser on Node v12.x. This is not an issue on Node v13+ since the parser was renamed back to the old (already monkey-patchable) http_parser name, so I made this PR to the v12.x branch, but I'm unclear as to whether or not that's the right move. However the test in master could also use updating so this doesn't break again (I'll send a separate PR for that).

Fixes: creationix/http-parser-js#65

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

Fixes inability to monkey-patch the HTTP Parser on Node v12.x.
This is not an issue on Node v13+ since the parser was renamed
back to the old (already monkey-patchable) `http_parser` name,
however the test in master could also use updating.

Fixes: creationix/http-parser-js#65
@addaleax addaleax changed the title http: fix monkey-patching of http_parser [v12.x] http: fix monkey-patching of http_parser Nov 2, 2019
Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this okay, but understand that this kind of monkey-patching is generally not supported or encouraged…

@Jimbly
Copy link
Contributor Author

Jimbly commented Nov 2, 2019

Understood! However, since Node's HTTP parser historically throws process-killing exceptions, or at least fails to parse, when encountering slightly malformed HTTP, it's pretty much a requirement if you want to do any website scraping or indexing using Node (and still take advantage of the robust NPM ecosystem that obviously calls directly into http and such). No one expects Node's internal modules to follow any kind of stable API, because that'd be unreasonable to support, but I think as long as the http parser is a stand-alone module, it should be replaceable.

@addaleax
Copy link
Member

addaleax commented Nov 2, 2019

@Jimbly Right – but in that case it would be much nicer to know what the parser fails to parse, and figure out a way to make that usable through the supported API.

@Jimbly
Copy link
Contributor Author

Jimbly commented Nov 2, 2019

There's lots of stuff that the current parser fails to parse, but it fails to parse it because it is technically invalid HTTP. For security and stability reasons, I think it's probably important that Node's HTTP parser only accepts actually valid HTTP (at least when parsing requests - people almost entirely leverage http-parser-js when dealing with invalid responses) - alternative is to add a lenient mode to it, but having a fast valid parser as the core is way more important, especially if a non-spec-conforming, lenient parser can live in user-land.

@addaleax
Copy link
Member

addaleax commented Nov 2, 2019

@Jimbly Yeah, that lenient mode would probably be a good idea.

Or to put it another way: Monkey-patching is simply not a viable long-term solution for this problem. This PR is a short-term workaround, and code will be broken when people hook into private, internal APIs.

@BridgeAR
Copy link
Member

BridgeAR commented Nov 2, 2019

@Jimbly I agree with @addaleax that adding a lenient mode would probably be best. It would be nice to just switch to that parser in case it's required for individual http calls. That should make it pretty straight forward for everyone who has to deal with broken http code.

@Jimbly
Copy link
Contributor Author

Jimbly commented Nov 2, 2019

code will be broken when people hook into private, internal APIs

Of course! http-parser-js has broken with every major node release in the past almost a decade, but that's to be expected. Luckily the HTTP parser has a super-narrow API, so it' never been more than an hour or so of work (plus, occasionally, many hours dealing with making PRs to Node ; ).

If someone wants to add a lenient mode to the parser in core node, I can happily gather up which of the tests fail on Node (basically anything added to that tree without a commit of "update for node v4", etc). But I certainly don't have the bandwidth to tackle adding it myself, nor would I want to go anywhere near trying to add complexity to code presumably in Node's critical performance path.

@addaleax
Copy link
Member

addaleax commented Nov 2, 2019

If someone wants to add a lenient mode to the parser in core node, I can happily gather up which of the tests fail on Node (basically anything added to that tree without a commit of "update for node v4", etc).

I think that would be a huge help, if you’re willing to do that.

But I certainly don't have the bandwidth to tackle adding it myself, nor would I want to go anywhere near trying to add complexity to code presumably in Node's critical performance path.

Yeah, that’s fine :)

@addaleax addaleax added the http_parser Issues and PRs related to the HTTP Parser dependency or the http_parser binding. label Nov 2, 2019
@Jimbly
Copy link
Contributor Author

Jimbly commented Nov 2, 2019

@addaleax Here ya go!: https://github.com/creationix/http-parser-js/blob/master/tests/README.md

Seems the new Node HTTP parser gracefully handles a lot more things. At least, none of the tests are crashing the Node process anymore, and a few of the old tests now mostly work (bad headers turning into empty strings in the headers list, but I'm sure that's fine). Only two real issues (at least, that we have tests for - but usually we try to add a test if a user reports something) - dealing with ambiguous chunked encoding + content-length and users opting in to allowing headers larger than the old 80KB limit (now 8KB limit in Node v12+, it seems).

@addaleax
Copy link
Member

addaleax commented Nov 2, 2019

Thank you! I’m not sure, but @indutny – would you be willing/able to help with some of the changes that would have to go into llhttp for a lenient mode? I’d probably be able to do that, but you already know your way around that codebase. (There’s at least one TODO commment in http.ts that indicates that a lenient mode might be a reasonable idea for the parser.)

@bdurrani
Copy link

bdurrani commented Nov 3, 2019

I dont mean to hijack this thread, but another case @Jimbly 's parser handles is http responses where the content-length header is duplicated, with the same value. I didn't see this case mentioned explicitly in the test notes above, so I thought I would mention it.

Here is what the curl output looks like for such a case

curl http://localhost:3000 -v
* Rebuilt URL to: http://localhost:3000/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 3000 (#0)
> GET / HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.58.0
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Content-Type: text/html
< Content-Length: 27
< Content-Length: 27
<
* Closing connection 0

Browsers seem to handle this fine. Node's http parser crashes with a code: HPE_UNEXPECTED_CONTENT_LENGTH and with the (correct) reason: 'Duplicate Content-Length'

I believe this case should be reasonable with the actual content length values are correct.

A lenient mode would be greatly appreciated. I would love to help with this effort, if that is ok. I can imagine this mode is something I would enable on a case-by-case basis, not enabled the whole time.

@Jimbly
Copy link
Contributor Author

Jimbly commented Nov 3, 2019

@bdurrani You're not hijacking at all, at least, no more than it already was, but that's mostly my fault =). I forgot about that case! I've added it to the http-parser-js test suite, and added it to the failing tests list. Someone should probably go back through the closed issues and see what else was fixed without ever being added to the tests =). Just note though, HPE_UNEXPECTED_CONTENT_LENGTH is from the legacy parser (Node v11 and lower, I think), however the new parser also errors on this, and just gives a generic "Parse Error" in this case.

@indutny
Copy link
Member

indutny commented Nov 3, 2019

@addaleax I'll be happy to help you. Feel free to send me questions over email, IRC, or on a github issue.

@nodejs-github-bot
Copy link
Collaborator

MylesBorins pushed a commit that referenced this pull request Jan 8, 2020
Fixes inability to monkey-patch the HTTP Parser on Node v12.x.
This is not an issue on Node v13+ since the parser was renamed
back to the old (already monkey-patchable) `http_parser` name,
however the test in master could also use updating.

Fixes: creationix/http-parser-js#65

PR-URL: #30222
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Michaël Zasso <[email protected]>
@MylesBorins
Copy link
Contributor

landed in 54635f5

@MylesBorins MylesBorins closed this Jan 8, 2020
BethGriggs pushed a commit that referenced this pull request Feb 6, 2020
Fixes inability to monkey-patch the HTTP Parser on Node v12.x.
This is not an issue on Node v13+ since the parser was renamed
back to the old (already monkey-patchable) `http_parser` name,
however the test in master could also use updating.

Fixes: creationix/http-parser-js#65

PR-URL: #30222
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Michaël Zasso <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Feb 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http_parser Issues and PRs related to the HTTP Parser dependency or the http_parser binding.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants