Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HttpParser] New component #6

Closed
igorw opened this issue May 10, 2012 · 17 comments
Closed

[HttpParser] New component #6

igorw opened this issue May 10, 2012 · 17 comments

Comments

@igorw
Copy link
Contributor

igorw commented May 10, 2012

Port the mongrel http parser to PHP. Or maybe the node one.

Also provide adapters for PHP extensions:

@mtdowling
Copy link

I'm thinking of doing something similar for Guzzle, but not as error-tolerant as these appear to be. Basically I want to move all of the parses in Guzzle (cookies, URLs, requests, responses) into the Guzzle\Parser namespace and provide interfaces so that people can use different parsing implementations (e.g. pecl_http). You would likely get a speed boost by using a native C implementation.

For the sake of speed, the parsers I implement for Guzzle in PHP will be similar to what's already in Guzzle, so it might not be exactly what you're looking for.

So-- I'm still thinking about adding these interfaces and making the default Guzzle implementations, but if you implement parsers in react, then I can ship Guzzle with a built in adapter for react adapters (I could call it the Inception adapter :) ).

I'm also evaluating the possibility of subtree splits for Guzzle so that it would be easier to use different components. I'd love any pointers you have on how you did that for react.

@igorw
Copy link
Contributor Author

igorw commented May 11, 2012

The subtree split script is in the repo: https://github.com/react-php/react/blob/master/subtree-split.sh. All it requires is git-subtree, which I installed through homebrew (osx package manager).

@mtdowling
Copy link

Sweet! Thanks!

@mtdowling
Copy link

After quite a bit of refactoring, I added support for different parsing implementations to Guzzle. There are parsers for messages, cookies, and URLs (default is PHP's parse_url()). There's support for a pecl_http message parser, and it's actually ~4.5x faster than the pure PHP parser.

https://github.com/guzzle/guzzle/tree/master/src/Guzzle/Http/Parser

Parsers are registered with Guzzle\Http\Parser\ParserRegistry and lazy loaded as requested. The ParserRegistry uses the PHP based versions by default. If a user is looking for performance, they can globally switch out any of the parsing implementations using this class.

What do you think? Could it be improved? assuming I could subtree split the Guzzle\Http\Parsers namespace, is this something you could use?

@cboden
Copy link
Member

cboden commented May 13, 2012

Impressive work. I took it for a brief spin and I liked it. I'll see about incorporating it into the React HTTP lib in the next few days.

It's unfortunate the PECL lib is ~4.5x faster, because it's inaccurate:

Given a header string of Sec-WebSocket-Version: 13
MessageParser returns ["Sec-WebSocket-Version"] => "13"
PeclHttpMessageParser returns ["Sec-Websocket-Version"] => "13"

@nrk
Copy link
Member

nrk commented May 13, 2012

@cboden it's unclear to me the reason why the PECL lib manipulates them, but aren't HTTP header field names case-insensitive anyway? I believe the culprit is the _http_pretty_key function at line 73 in http_api.c

@mtdowling
Copy link

You're right, but as nrk said, it shouldn't matter. I had to refactor a bit of code and a lot of tests in Guzzle so that headers are completely case-insensitive. Here's another example of the differences between the two parsers:

HEAD / HTT/1.1
X-Foo: foo
x-foo: baz
X-FOO: Bar
X-FOO: Jaz

With Guzzle:

X-Foo => foo
x-foo => baz
X-FOO => Bar, Jaz

With Pecl:

X-Foo: foo, baz, Bar, Jaz

@cboden
Copy link
Member

cboden commented May 13, 2012

@nrk Excellent point. When I wrote the WebSocket framing methods RFC6455 always wrote the examples as Sec-WebSocket-Header, but glancing over the spec again I see:

Please note that according to RFC2616, all header field names in
both HTTP requests and HTTP responses are case-insensitive.

That being said, I have no idea why the PECL lib would deliberately change the casing in the way it did.

I'll update my code to be case insensitive. Thanks for the heads up.

@nrk
Copy link
Member

nrk commented May 13, 2012

@cboden on a second thought after seeing @mtdowling's reply, I believe they did it to group repeated field names with potentially different casing into one using a comma-separated list of values (which is indeed allowed by the specs). The fact that they used an ucfirst approach for each part of a field name is probably just for the sake of the look it, as suggested by the name of the _http_pretty_key function.

@cboden
Copy link
Member

cboden commented May 14, 2012

I started implementing @mtdowling's new Parser changes...as before, I was impressed with the PHP work by Michael...and disappointed by the PECL extension. Dropping in MessageParser, I updated a few LOC which improved the parsing speed and called less classes, now that it was only dealing with arrays being passed to the HttpFoundation Request - all unit tests worked as before. When I switched to PECL...well, here is what happened:

Given

GET / HTTP/1.1
Host: example.com:80
Connection: close

RANDOM DATA
 

It returned:

stdClass Object
(
    [type] => 0
    [httpVersion] => 0
    [headers] => Array
        (
        )

    [body] => 
    [parentMessage] => stdClass Object
        (
            [type] => 1
            [httpVersion] => 1.1
            [requestMethod] => GET
            [requestUrl] => /
            [headers] => Array
                (
                    [Host] => example.com:80
                    [Connection] => close
                )

            [body] => 
            [parentMessage] => 
        )

)

I figure with some tweaking we could check if type is 0 and replace it with parentMessage in the PeclHttpMessageParser...but where is the body? The PECL libraries intended use seems to be for a consumer, like Guzzle, although the method names and documentation would suggest otherwise.


While I'm ranting about pecl_http, Michael makes an excellent point about why the case change, to amalgamate the same headers together. My beef is that it's not documented and if they're going to change it, I would say just do all lower case...but that's just me (still angry no docs though).


Anyway, I like the new parser work, I'm happy to help contribute to it to get it to a stable point. A subtree split would be awesome. @igorw can weigh in, in about a week (busy prepping for conference).


Final score:

@mtdowling
Copy link

Glad you got to take it for a spin, and I'd be happy to have any feedback or contributions.

As for the pecl_http issue: It might be that pecl_http is not properly handling "\n" line endings (vs the expected \r\n).

I'm finishing some heavy refactoring in Guzzle this week. After this refactoring, I expect the API to be very stable, so at that point, it would make sense to look into creating subtree splits for the following:

  • Common
  • Http
  • Http\Parser
  • Service

@cboden
Copy link
Member

cboden commented May 15, 2012

The unit test was written like $message = "GET / HTTP/1.1\r\nHeader: Value\r\n\r\nBody here\r\n\r\n"; and PECL still failed.

@mtdowling
Copy link

Wow, that sucks. pecl_http seems to work for Guzzle, but it looks like it isn't something you can use. Maybe we still need an adapter for some of the HTTP parsers listed by @igorw.

@robocoder
Copy link

FYI section 4.2 of rfc2616 is where the spec states header field names are case-insensitive, may be folded using a comma separator, or continued (ie multi-line).

Note: the cookie spec has an exception. Folding does not apply to Set-Cookie.

@igorw
Copy link
Contributor Author

igorw commented May 22, 2012

The problem is that the case is sometimes violated (e.g. opera with WebSockets), so having control over this is kind of a big deal, especially when producing http messages.

Great work with the parser component! A subtree split would be great. Then we can also contribute a mongrel parser patch into it.

@clue
Copy link
Member

clue commented Jul 6, 2016

Also consider checking out #233.

@clue
Copy link
Member

clue commented Feb 8, 2017

See reactphp/http#14 and reactphp/http#19

@clue clue closed this as completed Feb 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants