gh-69152: Add _proxy_response_headers attribute to HTTPConnection #26152
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA). CLA Missing. Our records indicate the following people have not signed the CLA: @OneMoreZanuda. For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. If you have recently signed the CLA, please wait at least one business day. You can check yourself to see if the CLA has been received. Thanks again for the contribution, we look forward to reviewing it!
This PR is stale because it has been open for 30 days with no activity.
Looks ok
@vadmium, could you take a look at this?
@MaxwellDupre, thanks for the review!
Lib/http/client.py
Outdated
@@ -943,21 +944,15 @@ def _tunnel(self):
response = self.response_class(self.sock, method=self._method)
(version, code, message) = response._read_status()

self._proxy_response_headers = parse_headers(response.fp)
Similar to how it is done in lines 337-341
Hi @orsenthil! Could you take a look at this request? I see that you have worked a lot with the http/client.py module, and I also found that you were a reviewer for another PR with changes to http/client.py. Or maybe you can recommend someone who could look at this code?
@nametkin, sure. I will.
Lib/http/client.py
Outdated
line = response.fp.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise LineTooLong("header line")
if not line:
The condition if not line: is not captured by the parse_headers call.
In the other direction, parse_headers performs an additional _MAXHEADERS check.
So, this isn't a strict 1:1 replacement.
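To make the _MAXHEADERS difference concrete, here is a minimal sketch that feeds synthetic header blocks to http.client.parse_headers. The helper name try_parse and the X-Field-N header names are made up for illustration, and http.client._MAXHEADERS is a private constant, so its value should be treated as an implementation detail.

```python
import http.client
import io


def try_parse(num_headers):
    """Parse a synthetic block of num_headers fields with parse_headers.

    Returns the number of parsed fields, or the HTTPException that was raised.
    (Hypothetical helper, for illustration only.)
    """
    lines = [b"X-Field-%d: v\r\n" % i for i in range(num_headers)]
    fp = io.BytesIO(b"".join(lines) + b"\r\n")
    try:
        return len(http.client.parse_headers(fp).keys())
    except http.client.HTTPException as exc:
        return exc


# A small block parses normally.
small = try_parse(3)

# A block well above the private limit is rejected by parse_headers,
# a check the removed readline loop in _tunnel did not perform.
too_many = try_parse(http.client._MAXHEADERS * 2)
```

So the replacement is stricter about header count than the loop it removes, which is the non-equivalence noted above.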
We can add the if not line condition to the _read_headers method for equivalence.
However, given that the _tunnel method already prints headers at line 960 with print('header:', line.decode()), I am trying to understand what additional benefit this patch brings.
The discussion in #69152 seems to be about adding additional states to the _tunnel method, and adding debug responses to those states.
@orsenthil, thanks for the review!
Regarding the condition if not line: it seems to me that this is now a useless part of the code. If I'm not mistaken, the only value of line for which this condition is true is line = b'' (because the readline method can't return None or an empty str). And line is compared against b'' a little further down.
I looked at the history: these lines were added when there was no equality check against the empty bytes object below. That is, at some point they made sense, but after this commit they ceased to be necessary.
But we can add this check to the _read_headers method as a precaution. Do you think it should be done?
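The readline behaviour described above can be checked with a small sketch. It uses io.BytesIO as a stand-in for response.fp (an assumption made for illustration; the real object is a buffered socket file), and the terminators tuple mirrors the kind of membership test that the comment says already covers the b'' case.

```python
import io

# Stand-in for response.fp: one header line, then end of stream.
fp = io.BytesIO(b"X-Demo: 1\r\n")

first = fp.readline()   # the header line itself
at_eof = fp.readline()  # nothing left to read

# At end of stream a binary readline() yields an empty bytes object,
# never None, so "if not line" is true exactly when line == b"".
terminators = (b"\r\n", b"\n", b"")
covered_by_later_check = at_eof in terminators
```

If that holds for the real socket file as well, the separate if not line branch is redundant with the later membership test.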
- As for the benefits of the changes being made. Firstly, headers are needed in the situations described in the comment Add tunnel CONNECT response headers to httplib / http.client #69152 (comment). Secondly, headers currently reach the debug logs only if we get past this line: https://github.com/python/cpython/blob/main/Lib/http/client.py#L948.
But if we did not pass the authorization headers the proxy needs, we get an OSError without information about what happened, and we cannot find out what authentication data is required. With my PR, I would like to take the first step towards solving the problems described here: Add support for digest authentication with an HTTP proxy psf/requests#2526, Allow custom authentication (in particular NTLM) to proxies psf/requests#1582, http proxy negotiate/gssapi authentication? requests/requests-kerberos#83
Maybe it would then be possible to raise a custom error instead of OSError, one that can be caught in library code (urllib3, requests) so that authentication data can be prepared automatically from the information contained in _proxy_response_headers. Similar to how it is done in https://github.com/requests/requests-kerberos/pull/149/files.
Maybe you know other ways to solve these problems; I will be glad of any ideas and suggestions.
I did not understand how @vadmium's proposal would solve the lack of any information about what went wrong when creating the tunnel.
We might at least want to make parse_headers equivalent to the lines of code being removed.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
@orsenthil, please look at the two comments above (in conversation). Can we leave the PR in its current state? Or just in case, we should add |
The change looks good to me. Thanks, @nametkin
@gpshead - fyi.
@@ -944,21 +945,16 @@ def _tunnel(self):
try:
(version, code, message) = response._read_status()

self._proxy_response_headers = parse_headers(response.fp)
This always does complicated parsing of the data, even though these headers are not used in 99.999% of use cases.
It'd be better to just call _read_headers(response.fp) and store that raw data privately, and add a public get_proxy_response_headers() method that triggers the parsing into data structures.
The debuglevel > 0 print loop should just loop over the raw unparsed headers.
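The suggestion above can be sketched roughly as follows. This is not the code that was merged: LazyProxyHeaders, store and get are hypothetical names, and the sketch uses the public http.client.parse_headers instead of the private _read_headers helper mentioned in the comment.

```python
import http.client
import io


class LazyProxyHeaders:
    """Hypothetical holder: keep raw header lines, parse only on request."""

    def __init__(self):
        self._raw_lines = None  # None means no CONNECT response was read

    def store(self, fp):
        # Cheap step done on every tunnel setup: just collect the raw lines
        # up to and including the blank line (or end of stream).
        lines = []
        while True:
            line = fp.readline()
            lines.append(line)
            if line in (b"\r\n", b"\n", b""):
                break
        self._raw_lines = lines

    def get(self):
        # Expensive step, done only when a caller actually asks for headers.
        if self._raw_lines is None:
            return None
        return http.client.parse_headers(io.BytesIO(b"".join(self._raw_lines)))


holder = LazyProxyHeaders()
holder.store(io.BytesIO(b"Proxy-Agent: demo\r\nVia: 1.1 proxy\r\n\r\n"))
parsed = holder.get()
```

A debug print loop could iterate over the stored raw lines directly, so the common path never pays for building the message object.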
…on (python#26152) Add _proxy_response_headers attribute to HTTPConnection (python#26152). Co-authored-by: Senthil Kumaran
…ss (#104248) Add http.client.HTTPConnection method get_proxy_response_headers() - this is a followup to #26152 which added it as a non-public attribute. This way we don't pre-compute a headers dictionary that most users will never access. The new method is properly public and documented and triggers full proxy header parsing into a dict only when actually called. Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Gregory P. Smith
It would be nice to be able to access the proxy response headers after tunneling through the proxy. Currently these headers can only be obtained in debug mode.
This is necessary both when the connection succeeds and when establishing it fails (for example, in order to see the type of authentication required). Later, with access to the Proxy-Authenticate header, it will be possible to simplify authentication on the proxy server in the dependent libraries urllib3 and requests. There are problems with this now (Digest Proxy Auth, NTLM Proxy Auth, Kerberos Proxy Auth).
My proposed version is based on the idea proposed by Thomas Belhalfaoui in issue 24964, and also on the approach used in the http.rb library (here).
https://bugs.python.org/issue24964
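As an illustration of the failure case mentioned in the description, here is a small sketch of how a caller could read the offered authentication schemes once the proxy's headers are available. The 407 response below is canned sample data, not output from a real proxy, and the parsing of the status line is deliberately simplified.

```python
import http.client
import io

# Canned CONNECT response from a proxy that demands authentication
# (sample data, assumed for illustration).
raw = (
    b"HTTP/1.1 407 Proxy Authentication Required\r\n"
    b"Proxy-Authenticate: Negotiate\r\n"
    b'Proxy-Authenticate: Basic realm="proxy"\r\n'
    b"\r\n"
)

fp = io.BytesIO(raw)

# Simplified status-line handling: version, numeric code, reason phrase.
status_line = fp.readline().decode("iso-8859-1").rstrip("\r\n")
version, code, reason = status_line.split(None, 2)

# Parse the remaining header block into an HTTPMessage.
headers = http.client.parse_headers(fp)

# Each Proxy-Authenticate value starts with the scheme name.
schemes = [
    value.split(None, 1)[0]
    for value in headers.get_all("Proxy-Authenticate", [])
]
```

With the schemes in hand, a library such as urllib3 or requests could pick a matching handler instead of failing with an uninformative OSError.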