
Streaming request body parsing #41

Closed
wants to merge 66 commits

Conversation

WyriHaximus
Member

@WyriHaximus WyriHaximus commented Oct 1, 2015

This PR is the follow-up to #13. It started out as multipart streaming but ended up making all body parsing streaming.

The parsers emit a post event with the key and value of each POST variable, and a file event for each uploaded file found in the request. On the request object, getFiles is gone due to the streaming nature of the parsers. getPost is still there, but it won't contain everything until the entire request has been parsed.
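
For illustration, a minimal sketch of consuming these events, assuming they surface on the request object and that the file object offers a getFilename() accessor (both are assumptions based on this description, not final API):

$request->on('post', function ($key, $value) {
    // each POST field is emitted as soon as it has been parsed
    echo $key . ' => ' . $value . PHP_EOL;
});
$request->on('file', function ($file) {
    // uploaded files are emitted as they are found in the body;
    // getFilename() is a hypothetical accessor on the file object
    echo 'received upload: ' . $file->getFilename() . PHP_EOL;
});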

Todo:

  • Normal body streaming
  • Multipart body streaming
  • Form URL Encoded body streaming
  • File object for files in the request
  • Make sure all parsers behave the same
  • Make the form parsers optional
  • Add buffered sink like helpers for request post fields and files
  • Update readme with an example

@WyriHaximus WyriHaximus mentioned this pull request Oct 1, 2015
@nazar-pc

nazar-pc commented Oct 1, 2015

I saw gPsr; are the request/response objects PSR-7 compatible now?

@WyriHaximus
Member Author

Not yet, those are also in the works though. But this is a step in that direction. The gPsr you're referring to is only used to parse request headers.

@clue
Member

clue commented Mar 19, 2016

The parsers emit a post event with the key and value of each POST variable, and a file event for each uploaded file found in the request.

The HTTP message body can potentially have any size (even single fields can be huge); do we really want to store this automatically?

As an alternative, nodejs only exposes the body stream and leaves it up to the consumer to pass this stream to the correct parser (for example http://stackoverflow.com/questions/4295782/how-do-you-extract-post-data-in-node-js).
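
For comparison, a rough PHP equivalent of that Node.js pattern, assuming the request exposes the raw body through data/end events like a readable stream:

$body = '';
$request->on('data', function ($chunk) use (&$body) {
    // the consumer buffers the raw body itself...
    $body .= $chunk;
});
$request->on('end', function () use (&$body) {
    // ...and picks the appropriate parser once the body is complete
    parse_str($body, $fields);
    var_dump($fields);
});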

@WyriHaximus
Member Author

The HTTP message body can potentially have any size (even single fields can be huge); do we really want to store this automatically?

No, I want to go Node's way, where a string or stream is emitted. Those can then be gathered using the buffered sink. I'm not sure yet about the stream part: it would result in less overhead for small fields but more code on the implementing side.

@clue
Member

clue commented Mar 19, 2016

My personal vote here would be small, independent, composable parts instead of built-in convenience.

Composable parts enable convenience on a higher level, such as this draft API:

$http->on('request', function (Request $request, Response $response) use ($formParser) {
    $formParser->parseDeferredStream($request)->then(function ($fields) use ($response) {
        $response->end('hello ' . $fields['name']);
    }, function ($error) use ($response) {
        $response->writeHead(400);
    });
});

Once we look into PSR-7 support, we could probably build a convenient middleware around this concept in order to make this available to each request handler.
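
Purely as illustration of that idea, a hypothetical middleware sketch for once PSR-7 requests land; every name here is invented except withParsedBody(), which is actual PSR-7:

$middleware = function (ServerRequestInterface $request, callable $next) use ($formParser) {
    return $formParser->parseDeferredStream($request)->then(
        function ($fields) use ($request, $next) {
            // hand the request on to the next handler with the parsed fields attached
            return $next($request->withParsedBody($fields));
        }
    );
};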

@WyriHaximus
Member Author

That looks good. I'm assuming that is just listening to form events emitted from $request. Or are you suggesting that $request just emits data and the form parser listens to that, handling the parsing? I like that even more 👍. It should be fairly easy to do as well; I'll make sure both parsers behave the same before refactoring into that.

@WyriHaximus WyriHaximus mentioned this pull request Mar 20, 2016
@clue
Member

clue commented Mar 20, 2016

suggesting that $request just emits data and the form parser listens to that, handling the parsing? I like that even more 👍

This exactly 👍

Should be fairly easy to do as well […]

Yeah, I too suppose this should be easier than auto-wiring all parsers 👍
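
A sketch of that agreed wiring, assuming the parser consumes raw body chunks through a feed() style method (the names here are assumptions, not final API):

$parser = FormParserFactory::create($request); // picks a parser based on the Content-Type header
$parser->on('post', function ($key, $value) {
    // handle each field as it is parsed
});
$request->on('data', function ($chunk) use ($parser) {
    $parser->feed($chunk); // the parser consumes raw body chunks as they arrive
});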

@WyriHaximus
Member Author

Thanks for clarifying that @clue, working on that refactor right now 👍

@WyriHaximus
Member Author

@clue what I'm working out now would have approximately this API:

$http->on('request', function (Request $request, Response $response) {
    FormParserFactory::create($request)->deferredStream()->then(function ($fields) use ($response) {
        $response->end('hello ' . $fields['name']);
    }, function ($error) use ($response) {
        $response->writeHead(400);
    });
});

I'm currently decoupling all that auto-wiring code.

@WyriHaximus WyriHaximus added this to the v0.4.2 milestone Mar 21, 2016
@WyriHaximus WyriHaximus self-assigned this Mar 21, 2016
@WyriHaximus
Member Author

The last few commits remove the tight wiring between the form parsers and the request parser. They also add a form parser factory. The next step is adding methods like deferredStream to them.

@WyriHaximus
Member Author

Yeah, GitHub won't even let me merge it from the site 😝. This PR is my top ReactPHP priority at the moment; I'd prefer to get reactphp/http-client#58 in ASAP so I can fully focus on this right here.

I'll discuss with @clue how exactly we're going to cut it up, but since most of the discussion has already taken place here we can move relatively quickly.

@clue clue modified the milestones: v0.5, v0.4.2 Sep 13, 2016
@mu578

mu578 commented Sep 24, 2016

Hello gents,

if I may, some remarks about the adopted design:

Input:
-- Any request may have query parameters: HEAD, PUT, POST, DELETE and GET... (1)
-- It would make sense to me to have an onEvent taking the HTTP verb as a filter, not "request".
-- You are inlining the buffer of uploaded files (applies to BODY and PUT). (2)
-- What happens if the Content-Length header is missing, or set to an offset out of range? Bad things.

  • (1) Thus, you are forcing the user to juggle between one object representation and another to get everything.
  • (2) This bars you from handling large files and is also a security threat. In my opinion, it would be better to redirect the raw input to a FIFO file session, then carefully extract and parse it per chunk from there; everything stays in this safe space, even if the filesystem is involved. The fully in-memory solution is a utopia.

Suggestions:
You should only pre-parse in memory the part offsets from the FIFO file space, then hand out the fully parsed result on demand; files should be carefully extracted per chunk-unit buffer (say 1024 or 2048 bytes) to their destination if the user requests it, not inlined in memory.
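
A minimal sketch of that chunked extraction: copy an uploaded part from the spooled request body to its destination in fixed-size chunks instead of holding it in memory (the paths, $partOffset and $partLength are illustrative assumptions):

$chunkSize = 2048;
$in  = fopen('/tmp/request-body.spool', 'rb');    // raw input redirected to disk
$out = fopen('/var/uploads/destination.bin', 'wb');
fseek($in, $partOffset);                          // part offset found during pre-parsing
$remaining = $partLength;
while ($remaining > 0 && !feof($in)) {
    $chunk = fread($in, min($chunkSize, $remaining));
    fwrite($out, $chunk);
    $remaining -= strlen($chunk);
}
fclose($in);
fclose($out);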

Output:
The same concepts apply to the response object (streamable, not buffered), i.e. handling file downloads, ranged streams or large response bodies without starving the machine.

Regards.

@WyriHaximus
Member Author

I've been spending some time splitting this PR up into smaller ones, which resulted in the following pull requests:
#62: File object
#69: Streaming body parsers foundation
#70: Content length buffered sink
#71: Parser: urlencoded (depends on #69 and #70)
#72: Parser: multipart (depends on #69 and #62)
#73: Streaming parser bufferedsink (depends on #69 and #62)

-----------

#62: Uploaded file object that doesn't make sense on its own, but provides something needed by #72 and #73.
#69: Provides the base factory and the NoBody and RawBody parsers to get started.
#70: Mainly a buffered sink like the one from react/stream, but it cuts off at a set length or when the stream ends (see the sketch after this list).
#71: Urlencoded parser that requires the base (#69) and the sink (#70).
#72: Multipart parser that requires the file object (#62) and the parser factory (#69); it will be updated once the factory is in and then added to the factory.
#73: Buffered sink that takes a streaming parser and resolves its promise when the parser is done parsing the incoming request.
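
A rough sketch of what the sink in #70 does, assuming a react/stream-style readable stream and the react/promise Deferred (the real class name and API may differ):

use React\Promise\Deferred;

function bufferUpTo($stream, $maxLength)
{
    $deferred = new Deferred();
    $buffer = '';
    $stream->on('data', function ($data) use (&$buffer, $maxLength, $deferred, $stream) {
        $buffer .= $data;
        if (strlen($buffer) >= $maxLength) {
            $stream->close();                                   // cut off at the set length
            $deferred->resolve(substr($buffer, 0, $maxLength));
        }
    });
    $stream->on('end', function () use (&$buffer, $deferred) {
        $deferred->resolve($buffer);                            // or resolve when the stream ends
    });
    return $deferred->promise();
}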

-----------

Order of merging (all PRs will be squashed on merge, keeping the history clear): #69, #62, #70, #73, #71, #72

-----------

I'll go over @moe123's comment carefully and see where adjustment is necessary. One of the things I've already done because of it is make all the parsers cancelable.

@andig
Contributor

andig commented Nov 30, 2016

Checking back on how close we are to merging 😮. From what I understand, the PRs should be fine up to #62 and #69, with an open comment on #70 and open tasks on #71, #72 and #73.

Is there anything userland can do to get closer to merging, apart from friendly nagging?

@WyriHaximus
Member Author

WyriHaximus commented Nov 30, 2016

@andig yes, #62 and #69 are done as far as I'm concerned, unless @jsor or @clue think otherwise. I'd like to get them in soon; I'll ping them on IRC tonight and see how they look at it.

Once that is in, I'll start working on completing the other PRs. One of the issues I came across is that the urlencoded parser (#71) is going to be interesting, as I can't use built-in PHP functions to do the parsing without buffering.
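
To illustrate the problem: parse_str() needs the complete string, so a streaming parser has to track the '&' boundaries itself and only decode pairs it has fully received. A hypothetical sketch, not the PR's actual code:

$pending = '';
$fields  = [];
$feed = function ($chunk) use (&$pending, &$fields) {
    $pending .= $chunk;
    while (($pos = strpos($pending, '&')) !== false) {
        $pair    = substr($pending, 0, $pos);
        $pending = substr($pending, $pos + 1);
        list($key, $value) = array_pad(explode('=', $pair, 2), 2, '');
        $fields[urldecode($key)] = urldecode($value);
    }
    // whatever remains in $pending is an incomplete pair, decoded once the stream ends
};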

@andig
Contributor

andig commented Dec 2, 2016

One of the issues I came across is that the urlencoded parser (#71) is going to be interesting, as I can't use built-in PHP functions to do the parsing without buffering.

Can't we assume, for the time being, that buffering for this case is OK? I.e. you either have a POST blob which doesn't need decoding, or you have urlencoded data that will most likely not exceed a certain size.

@WyriHaximus
Member Author

One of the issues I came across is that the urlencoded parser (#71) is going to be interesting, as I can't use built-in PHP functions to do the parsing without buffering.

Can't we assume, for the time being, that buffering for this case is OK? I.e. you either have a POST blob which doesn't need decoding, or you have urlencoded data that will most likely not exceed a certain size.

We could do that. I've set up several milestones that allow us to release this in parts: for example, first getting the foundation out in 0.5.0, then multipart in 0.5.1, and then the urlencoded parser in 0.5.2.

@clue clue mentioned this pull request Feb 10, 2017
@clue clue modified the milestones: v0.5.0, v0.8.0 Feb 14, 2017
@bweston92

Any update?

@WyriHaximus
Member Author

WyriHaximus commented Mar 13, 2017

@bweston92 See this issue for our roadmap: #120

@@ -28,27 +37,23 @@ public function feed($data)

 // Extract the header from the buffer
 // in case the content isn't complete
-list($headers, $this->buffer) = explode("\r\n\r\n", $this->buffer, 2);
+list($headers, $buffer) = explode("\r\n\r\n", $this->buffer, 2);

This might result in a large string operation. Better to use the result of the earlier strpos() and check it against $this->maxSize first. You might also want to use substr(), as you already have the position then.
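
A sketch of that suggestion, assuming $this->maxSize holds the header size limit:

$pos = strpos($this->buffer, "\r\n\r\n");
if ($pos === false) {
    return; // headers not complete yet, wait for more data
}
if ($pos > $this->maxSize) {
    throw new \OverflowException('Maximum header size exceeded');
}
$headers = substr($this->buffer, 0, $pos);
$buffer  = substr($this->buffer, $pos + 4); // skip past the CRLFCRLF separator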

@jsor jsor closed this in #226 Oct 2, 2017
@clue clue removed this from the v0.8.0 milestone Nov 25, 2017
@clue clue deleted the streaming-multipart branch April 14, 2019 21:14