
Streaming request body parsing #41

Closed
wants to merge 66 commits

Conversation

WyriHaximus
Member

@WyriHaximus WyriHaximus commented Oct 1, 2015

This PR is the follow-up to #13. It started out as multipart streaming but ended up making all body parsing streaming.

The parsers emit a post event with the key and value of each POST variable, and a file event for each uploaded file found in the request. On the request object, getFiles is gone due to the streaming nature of the parsers. getPost is still there, but it won't contain everything until the entire request has been parsed.
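
For illustration, a minimal sketch of consuming these events, assuming they surface on the request object and that the file object offers a getFilename() accessor (both are assumptions based on this description, not final API):

$request->on('post', function ($key, $value) {
    // each POST field is emitted as soon as it has been parsed
    echo $key . ' => ' . $value . PHP_EOL;
});
$request->on('file', function ($file) {
    // uploaded files are emitted as they are found in the body;
    // getFilename() is a hypothetical accessor on the file object
    echo 'received upload: ' . $file->getFilename() . PHP_EOL;
});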

Todo:

  • Normal body streaming
  • Multipart body streaming
  • Form URL Encoded body streaming
  • File object for files in the request
  • Make sure all parsers behave the same
  • Make the form parsers optional
  • Add buffered sink like helpers for request post fields and files
  • Update readme with an example

@WyriHaximus WyriHaximus mentioned this pull request Oct 1, 2015
@nazar-pc

nazar-pc commented Oct 1, 2015

I saw gPsr; are the request/response objects PSR-7 compatible now?

@WyriHaximus
Member Author

Not yet, those are also in the works though. But this is a step in that direction. The gPsr you're referring to is only used to parse request headers.

@clue
Member

clue commented Mar 19, 2016

The parsers emit a post event with the key and value of each POST variable, and a file event for each uploaded file found in the request.

The HTTP message body can potentially have any size (even single fields can be huge); do we really want to store this automatically?

As an alternative, nodejs only exposes the body stream and leaves it up to the consumer to pass this stream to the correct parser (for example http://stackoverflow.com/questions/4295782/how-do-you-extract-post-data-in-node-js).
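
For comparison, a rough PHP equivalent of that Node.js pattern, assuming the request exposes the raw body through data/end events like a readable stream:

$body = '';
$request->on('data', function ($chunk) use (&$body) {
    // the consumer buffers the raw body itself...
    $body .= $chunk;
});
$request->on('end', function () use (&$body) {
    // ...and picks the appropriate parser once the body is complete
    parse_str($body, $fields);
    var_dump($fields);
});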

@WyriHaximus
Member Author

The HTTP message body can potentially have any size (even single fields can be huge); do we really want to store this automatically?

No, I want to go Node's way, where a string or stream is emitted. Those can then be gathered using the buffered sink. I'm not sure yet about the stream part: it would result in less overhead for small fields but more code on the implementing side.

@clue
Member

clue commented Mar 19, 2016

My personal vote here would be small, independent, composable parts instead of built-in convenience.

Composable parts enable convenience on a higher level, such as this draft API:

$http->on('request', function (Request $request, Response $response) use ($formParser) {
    $formParser->parseDeferredStream($request)->then(function ($fields) use ($response) {
        $response->end('hello ' . $fields['name']);
    }, function ($error) use ($response) {
        $response->writeHead(400);
    });
});

Once we look into PSR-7 support, we could probably build a convenient middleware around this concept in order to make this available to each request handler.
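
Purely as illustration of that idea, a hypothetical middleware sketch for once PSR-7 requests land; every name here is invented except withParsedBody(), which is actual PSR-7:

$middleware = function (ServerRequestInterface $request, callable $next) use ($formParser) {
    return $formParser->parseDeferredStream($request)->then(
        function ($fields) use ($request, $next) {
            // hand the request on to the next handler with the parsed fields attached
            return $next($request->withParsedBody($fields));
        }
    );
};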

@WyriHaximus
Member Author

That looks good. I'm assuming that is just listening to form events emitted from $request. Or are you suggesting that $request just emits data and the form parser listens to that, handling the parsing? I like that even more 👍. It should be fairly easy to do as well; I'll make sure both parsers behave the same before refactoring into that.

@WyriHaximus WyriHaximus mentioned this pull request Mar 20, 2016
@clue
Member

clue commented Mar 20, 2016

suggesting that $request just emits data and the form parser listens to that, handling the parsing? I like that even more 👍

This exactly 👍

Should be fairly easy to do as well […]

Yeah, I too suppose this should be easier than auto-wiring all parsers 👍
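
A sketch of that agreed wiring, assuming the parser consumes raw body chunks through a feed() style method (the names here are assumptions, not final API):

$parser = FormParserFactory::create($request); // picks a parser based on the Content-Type header
$parser->on('post', function ($key, $value) {
    // handle each field as it is parsed
});
$request->on('data', function ($chunk) use ($parser) {
    $parser->feed($chunk); // the parser consumes raw body chunks as they arrive
});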

@WyriHaximus
Member Author

Thanks for clarifying that @clue, working on that refactor right now 👍

@WyriHaximus
Member Author

@clue what I'm working out now would have approximately this API:

$http->on('request', function (Request $request, Response $response) {
    FormParserFactory::create($request)->deferredStream()->then(function ($fields) use ($response) {
        $response->end('hello ' . $fields['name']);
    }, function ($error) use ($response) {
        $response->writeHead(400);
    });
});

I'm currently decoupling all that auto-wiring code.

@WyriHaximus WyriHaximus added this to the v0.4.2 milestone Mar 21, 2016
@WyriHaximus WyriHaximus self-assigned this Mar 21, 2016
@WyriHaximus
Member Author

The last few commits remove the tight wiring between the form parsers and the request parser. They also add a form parser factory. The next step is adding methods like deferredStream to them.

@WyriHaximus
Member Author

Yeah, GitHub won't even let me merge it from the site 😝. This PR is my top ReactPHP priority at the moment; I'd prefer to get reactphp/http-client#58 in ASAP so I can fully focus on this right here.

I'll discuss with @clue how exactly we're going to cut it up, but since most of the discussion has already taken place here we can move relatively quickly.

@clue clue modified the milestones: v0.5, v0.4.2 Sep 13, 2016
@mu578

mu578 commented Sep 24, 2016

Hello gents,

if I may, some remarks about the adopted design:

Input:
-- Any request may have query parameters: HEAD, PUT, POST, DELETE and GET... (1)
-- It would make sense to me to have an onEvent taking the HTTP verb as a filter, not "request".
-- You are inlining the buffer of uploaded files (applies to BODY and PUT). (2)
-- What happens if the Content-Length header is missing, or set to an offset out of range? Bad things.

  • (1) Thus, you are forcing the user to juggle between one object representation and another to get everything.
  • (2) This bars you from handling large files and is also a security threat. In my opinion, it would be better to redirect the raw input to a FIFO file session, then carefully extract and parse it per chunk from there; everything stays in this safe space, even if the filesystem is involved. The fully in-memory solution is a utopia.

Suggestions:
You should only pre-parse in memory the part offsets from the FIFO file space, then hand out the fully parsed result on demand; files should be carefully extracted per chunk-unit buffer (say 1024 or 2048 bytes) to their destination if the user requests it, not inlined in memory.
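
A minimal sketch of that chunked extraction: copy an uploaded part from the spooled request body to its destination in fixed-size chunks instead of holding it in memory (the paths, $partOffset and $partLength are illustrative assumptions):

$chunkSize = 2048;
$in  = fopen('/tmp/request-body.spool', 'rb');    // raw input redirected to disk
$out = fopen('/var/uploads/destination.bin', 'wb');
fseek($in, $partOffset);                          // part offset found during pre-parsing
$remaining = $partLength;
while ($remaining > 0 && !feof($in)) {
    $chunk = fread($in, min($chunkSize, $remaining));
    fwrite($out, $chunk);
    $remaining -= strlen($chunk);
}
fclose($in);
fclose($out);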

Output:
The same concepts apply to the response object (streamable, not buffered), i.e. handling file downloads, ranged streams or large response bodies without starving the machine.

Regards.

@WyriHaximus
Member Author

I've been spending some time splitting this PR up into smaller ones, which resulted in the following pull requests:
#62: File object
#69: Streaming body parsers foundation
#70: Content length buffered sink
#71: Parser: urlencoded (depends on #69 and #70)
#72: Parser: multipart (depends on #69 and #62)
#73: Streaming parser bufferedsink (depends on #69 and #62)

-----------

#62: Uploaded file object that doesn't make sense on its own, but provides something needed by #72 and #73.
#69: Provides the base factory and the NoBody and RawBody parsers to get started.
#70: Mainly a buffered sink like the one from react/stream, but it cuts off at a set length or when the stream ends (see the sketch after this list).
#71: Urlencoded parser that requires the base (#69) and the sink (#70).
#72: Multipart parser that requires the file object (#62) and the parser factory (#69); it will be updated once the factory is in and then added to the factory.
#73: Buffered sink that takes a streaming parser and resolves its promise when the parser is done parsing the incoming request.
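
A rough sketch of what the sink in #70 does, assuming a react/stream-style readable stream and the react/promise Deferred (the real class name and API may differ):

use React\Promise\Deferred;

function bufferUpTo($stream, $maxLength)
{
    $deferred = new Deferred();
    $buffer = '';
    $stream->on('data', function ($data) use (&$buffer, $maxLength, $deferred, $stream) {
        $buffer .= $data;
        if (strlen($buffer) >= $maxLength) {
            $stream->close();                                   // cut off at the set length
            $deferred->resolve(substr($buffer, 0, $maxLength));
        }
    });
    $stream->on('end', function () use (&$buffer, $deferred) {
        $deferred->resolve($buffer);                            // or resolve when the stream ends
    });
    return $deferred->promise();
}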

-----------

Order of merging (all PRs will be squashed on merge, keeping the history clear): #69, #62, #70, #73, #71, #72

-----------

I'll go over @moe123's comment carefully and see where adjustment is necessary. One of the things I've already done because of it is make all the parsers cancelable.

@andig
Contributor

andig commented Nov 30, 2016

Checking back on how close we are to merging 😮. From what I understand, the PRs should be fine up to #62 and #69, with an open comment on #70 and open tasks on #71, #72 and #73.

Is there anything userland can do to get closer to merging, apart from friendly nagging?

@WyriHaximus
Member Author

WyriHaximus commented Nov 30, 2016

@andig yes, #62 and #69 are done as far as I'm concerned, unless @jsor or @clue think otherwise. I'd like to get them in soon; I'll ping them on IRC tonight and see how they look at it.

Once that is in, I'll start working on completing the other PRs. One of the issues I came across is that the urlencoded parser (#71) is going to be interesting, as I can't use built-in PHP functions to do the parsing without buffering.
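
To illustrate the problem: parse_str() needs the complete string, so a streaming parser has to track the '&' boundaries itself and only decode pairs it has fully received. A hypothetical sketch, not the PR's actual code:

$pending = '';
$fields  = [];
$feed = function ($chunk) use (&$pending, &$fields) {
    $pending .= $chunk;
    while (($pos = strpos($pending, '&')) !== false) {
        $pair    = substr($pending, 0, $pos);
        $pending = substr($pending, $pos + 1);
        list($key, $value) = array_pad(explode('=', $pair, 2), 2, '');
        $fields[urldecode($key)] = urldecode($value);
    }
    // whatever remains in $pending is an incomplete pair, decoded once the stream ends
};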

@andig
Contributor

andig commented Dec 2, 2016

One of the issues I came across is that the urlencoded parser (#71) is going to be interesting, as I can't use built-in PHP functions to do the parsing without buffering.

Can't we assume, for the time being, that buffering for this case is OK? I.e. you either have a POST blob which doesn't need decoding, or you have urlencoded data that will most likely not exceed a certain size.

@WyriHaximus
Member Author

One of the issues I came across is that the urlencoded parser (#71) is going to be interesting, as I can't use built-in PHP functions to do the parsing without buffering.

Can't we assume, for the time being, that buffering for this case is OK? I.e. you either have a POST blob which doesn't need decoding, or you have urlencoded data that will most likely not exceed a certain size.

We could do that. I've set up several milestones that allow us to release this in parts: for example, first getting the foundation out in 0.5.0, then multipart in 0.5.1, and then the urlencoded parser in 0.5.2.

@clue clue mentioned this pull request Feb 10, 2017
@clue clue modified the milestones: v0.5.0, v0.8.0 Feb 14, 2017
@bweston92

Any update?

@WyriHaximus
Member Author

WyriHaximus commented Mar 13, 2017

@bweston92 See this issue for our roadmap: #120

@@ -28,27 +37,23 @@ public function feed($data)

 // Extract the header from the buffer
 // in case the content isn't complete
-list($headers, $this->buffer) = explode("\r\n\r\n", $this->buffer, 2);
+list($headers, $buffer) = explode("\r\n\r\n", $this->buffer, 2);

This might result in a large string operation. Better to use the result of the earlier strpos() and check it against $this->maxSize first. You might also want to use substr(), as you already have the position then.
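
A sketch of that suggestion, assuming $this->maxSize holds the header size limit:

$pos = strpos($this->buffer, "\r\n\r\n");
if ($pos === false) {
    return; // headers not complete yet, wait for more data
}
if ($pos > $this->maxSize) {
    throw new \OverflowException('Maximum header size exceeded');
}
$headers = substr($this->buffer, 0, $pos);
$buffer  = substr($this->buffer, $pos + 4); // skip past the CRLFCRLF separator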

@jsor jsor closed this in #226 Oct 2, 2017
@clue clue removed this from the v0.8.0 milestone Nov 25, 2017
@clue clue deleted the streaming-multipart branch April 14, 2019 21:14