Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http parser non strict mode #2332

Merged
merged 7 commits into from
Oct 17, 2017
Merged

http parser non strict mode #2332

merged 7 commits into from
Oct 17, 2017

Conversation

popravich
Copy link
Member

@popravich popravich commented Oct 17, 2017

What do these changes do?

Build http-parser in non-strict mode.

By default http-parser was built in strict mode.
This was causing requests having non-ascii URLs in request line to be
rejected with InvalidURLErrors.
Although it is considered to be invalid HTTP request most servers handle such requests.
So it is decided to build http-parser in non-strict mode by default.

(more discussion here on gitter)

This PR allows to control http-parse mode by forwarding HTTP_PARSER_STRICT environment
variable to extension compiler.

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • Add a new news fragment into the changes folder

@popravich popravich changed the title WIP: http parser non strict mode http parser non strict mode Oct 17, 2017
@popravich popravich requested a review from asvetlov October 17, 2017 09:54
@asvetlov
Copy link
Member

Do you want to hop on the bus at last second?
Ok, I'll wait for a while. The change looks pretty important.

@popravich
Copy link
Member Author

Well, I just don't know where in docs should I put a note on this or a news in changes is enough?..

@asvetlov
Copy link
Member

Hmm, strict mode requires compilation from sources, isn't it?
We will not ship wheels in strict mode, right?

@popravich
Copy link
Member Author

True

@popravich popravich force-pushed the http_parser_non_strict_mode branch from 9e2b7e7 to 50fa505 Compare October 17, 2017 10:18
@popravich
Copy link
Member Author

Dropped unneeded control flags.

asvetlov
asvetlov previously approved these changes Oct 17, 2017
@@ -50,7 +50,7 @@ cov-dev: .develop
@echo "Run without extensions"
@AIOHTTP_NO_EXTENSIONS=1 py.test --cov=aiohttp tests
@py.test --cov=aiohttp --cov-report=term --cov-report=html --cov-append tests
@echo "open file://`pwd`/coverage/index.html"
@echo "open file://`pwd`/coverage/index.html"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!
I was wondering why I don't see the line in console output :)

@@ -0,0 +1 @@
Build http-parser extension in non-strict mode by default.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want non-strict mode by default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, does this switch implies that we have to ship two wheels with strict and non strict parsers?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not?
We are strict on sending HTTP data but relaxed on receiving the data from foreign servers.
Basically we do the same in very many places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According the sources, this check does a bit more than just allow non-ascii characters in URL. If I read correctly, users may send "GET / OLOLO/1.1" requests without worry about, servers may return malformed status codes, etc. I don't think we want to support this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GET / OLOLO/1.1 does not work — have just checked this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, then I read it wrong way. What exact it controls then? Can't found any documentation about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After brief overview I've found the following behavior changes for non-strict mode.

It allows

  1. spaces, tabs and linefeeds in headers
  2. utf-8 in URLs
  3. underscore in hostname
  4. \9 (ht) and \12 (np) symbols.

Also looks like it has relaxed checks for \r\n (\r and \n are allowed).
"GET / OLOLO/1.1" if forbidden but "GET / HLOL/1.1" is allowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asvetlov Thanks!

Are we ok with 1, 3 and 4?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes. It doesn't raise security issues IMHO

@kxepal
Copy link
Member

kxepal commented Oct 17, 2017

What's the behavior of pure Python parser btw?

@asvetlov
Copy link
Member

IMHO it works in relaxed mode.
@popravich ?

@popravich
Copy link
Member Author

pure-python parser works in relaxed mode

@popravich
Copy link
Member Author

Update the test to use all parsers

@codecov-io
Copy link

codecov-io commented Oct 17, 2017

Codecov Report

Merging #2332 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2332   +/-   ##
=======================================
  Coverage   97.23%   97.23%           
=======================================
  Files          39       39           
  Lines        8224     8224           
  Branches     1442     1442           
=======================================
  Hits         7997     7997           
  Misses         98       98           
  Partials      129      129

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f30e10e...bae044f. Read the comment docs.

@asvetlov
Copy link
Member

@fafhrd91 what is your opinion? Is it ok to ship C parser in relaxed mode?

@fafhrd91
Copy link
Member

I think it is fine

@asvetlov asvetlov merged commit 96412ae into master Oct 17, 2017
@asvetlov asvetlov deleted the http_parser_non_strict_mode branch October 17, 2017 15:00
@asvetlov
Copy link
Member

Thanks @popravich

@lock
Copy link

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants