Revise http parsing #30

Closed · leppert wants to merge 5 commits

Conversation

@leppert (Contributor) commented Feb 22, 2023

This PR addresses #23 using a method hinted at by one of the llhttp authors in this issue, which avoids relying too heavily on Node's internals. We could instead use the parser directly, either by importing it via _http_common or by using the WASM bindings, but this seemed like the approach with the least implementation and maintenance overhead.

If we were to import the parser directly, http-parser-js provides example code that is nearly compatible with llhttp, though the event callbacks now take slightly different parameters (for example, kOnHeadersComplete now receives its properties as individual arguments rather than as a single object). A more definitive implementation can be found in _http_server (for requests) and _http_client (for responses); these pull from a pool of parsers that we might also tap into. For the sake of completeness, the last (and worst) option is to include llhttp via the internal bindings, like so:

// this script must be run using node's --expose-internals flag
import binding from 'internal/test/binding'
const { HTTPParser } = binding.internalBinding('http_parser')
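For reference, here is a minimal sketch of the _http_common route mentioned above (as opposed to the --expose-internals binding). Treat it as illustrative only; the initialize() signature and the callback arguments have shifted between Node versions:

// no flags required; _http_common re-exports the llhttp-backed HTTPParser
const { HTTPParser, methods } = require('_http_common')

const parser = new HTTPParser()
// second argument is an async-resource object; further arguments (max header
// size, lenient flags) are optional and version-dependent
parser.initialize(HTTPParser.REQUEST, {})

// llhttp-style callback: individual arguments rather than a single info object
parser[HTTPParser.kOnHeadersComplete] = (versionMajor, versionMinor, rawHeaders, methodIndex, url) => {
  // rawHeaders is a flat [name, value, name, value, ...] array
  console.log(methods[methodIndex], url, rawHeaders)
}

parser.execute(Buffer.from('GET /hello HTTP/1.1\r\nHost: example.com\r\n\r\n'))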

In implementing this PR, I've chosen to make session.request and session.response return promises (example), since the parsing method itself is asynchronous. I'm on the fence about this, though; we could instead await only within the proxy, so that those properties have already been assigned by the time the inject functions are called.
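To make the promise-based shape concrete, a consumer might look roughly like this (hypothetical code; the property names follow the PR description):

// resolves once the request headers have been parsed
async function logRequest(session) {
  const request = await session.request
  console.log(request.method, request.url, request.headers)
}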

Another implementation note is the use of "mirror" sockets that subscribe to and re-emit data events from their parent (_src|dst) sockets. I had initially thought I could simply feed the parent sockets into node's http.createServer and http.request functions, but doing so appears to hijack those sockets and causes the proxy to hang.
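The mirror-socket idea can be sketched roughly like this (a simplification, not the PR's actual code):

const { PassThrough } = require('stream')

// Re-emit the parent socket's data on a separate stream so the HTTP parser can
// consume it without piping or otherwise hijacking the original socket.
function mirrorSocket(parentSocket) {
  const mirrored = new PassThrough()
  parentSocket.on('data', (chunk) => mirrored.write(chunk))
  parentSocket.on('end', () => mirrored.end())
  return mirrored
}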

One challenge in all of this (and something I suspect is an issue in the library today) is accounting for chunked headers. Using node's parsers as implemented here means the parsed headers are only available once the parser triggers kOnHeadersComplete, but the spec allows headers to arrive in chunks, and you can see that node buffers them in the kOnHeaders handler. This means that transparent-proxy could fail or hang here when partial headers are sent.
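Node's own handler hints at the fix: buffer whatever arrives via kOnHeaders and merge it with what kOnHeadersComplete reports. A minimal sketch, simplified from what _http_common does:

const { HTTPParser } = require('_http_common')

const parser = new HTTPParser()
parser.initialize(HTTPParser.REQUEST, {})

// header pairs that arrive before the header block is complete land here
const partialHeaders = []
parser[HTTPParser.kOnHeaders] = (headers, url) => {
  partialHeaders.push(...headers)
}

parser[HTTPParser.kOnHeadersComplete] = (versionMajor, versionMinor, headers) => {
  // when headers span multiple packets, the binding flushes them through
  // kOnHeaders and may pass nothing here, so merge in the buffered pairs
  const allHeaders = partialHeaders.concat(headers || [])
  console.log(allHeaders)
}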

Another side effect of this PR is that session.request|response no longer concatenate their bodies. The idea is that the user would instead subscribe to data events (e.g. session.request.on('data', () => { ... })), as one would with node's http.request or http.Server. The data is also, of course, available via transparent-proxy's inject hooks. I think there's benefit in the familiarity of node's existing style for this, but I understand if the concatenated body ends up being important; we could always add it as a property on the IncomingMessage object used to represent node's parsed requests and responses. That said, I'm unsure what access to a partially concatenated body would gain the user.
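For example, collecting a full body under the event-based shape might look like this (hypothetical consumer code, assuming the promise-returning session.request described earlier):

async function collectRequestBody(session) {
  const request = await session.request
  const chunks = []
  request.on('data', (chunk) => chunks.push(chunk))
  request.on('end', () => console.log(Buffer.concat(chunks).toString()))
}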

All that said, feel free to consider this a first pass. Feedback welcomed.

Still need to account for chunked responses. I also need to understand why `setResponse` only works when the `request` is instantiated in `setRequestSocket`, which necessitates the ugly `this._setResponseResolver` workaround. It seems like this should work when instantiated in `setResponse`, where it could call the promise's `resolve()` directly, but it does not.
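For context, `this._setResponseResolver` follows the usual deferred-resolver pattern, roughly like the sketch below (illustrative only, not the PR's exact code):

class Session {
  constructor() {
    // create the promise up front and stash its resolve() so another method
    // can fulfil it later, once the response has actually been parsed
    this.response = new Promise((resolve) => {
      this._setResponseResolver = resolve
    })
  }

  setResponse(parsedResponse) {
    this._setResponseResolver(parsedResponse)
  }
}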
@gr3p1p3 (Owner) commented Feb 24, 2023

I checked this out and tried `npm test`. It didn't work. What do I have to do to make this work?

@leppert (Contributor, Author) commented Feb 25, 2023

Sorry, I should have checked the tests before submitting. It was failing due to what you addressed in db1ad52, i.e. there was no mechanism to prevent parsing attempts on encrypted messages.

I'd be happy to rebase against that, but it looks like you might be taking a different route on the parsing based on recent commits. Let me know if it's worth doing and I'll ping you once updated.

@leppert (Contributor, Author) commented Feb 25, 2023

@gr3p1p3 I went ahead and pushed a version that checks for CONNECT requests and predicates the parsing on the session being https or the request being a CONNECT. Tests look like they pass, but I'll give it another look in the morning.
