gh-69152: Add _proxy_response_headers attribute to HTTPConnection #26152

Merged
merged 7 commits into from
May 5, 2023

Conversation

nametkin
Contributor

@nametkin nametkin commented May 15, 2021

It would be useful to be able to access the proxy's response headers after tunneling through the proxy. Currently these headers can only be obtained in debug mode.
This is needed both when the connection succeeds and when establishing it fails (for example, to see which type of authentication is required). With access to the Proxy-Authenticate header, it would then be possible to simplify proxy authentication in the dependent libraries urllib3 and requests, which currently have problems here (Digest Proxy Auth, NTLM Proxy Auth, Kerberos Proxy Auth).

My proposed version is based on the idea suggested by Thomas Belhalfaoui in issue 24964, and on the approach used in the http.rb library.

https://bugs.python.org/issue24964
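
To make the motivation concrete, here is a minimal sketch of what parsed proxy response headers could look like. It feeds a made-up header block (the values are assumptions for illustration, not taken from a real proxy) to `http.client.parse_headers`, the same helper the patch calls inside `_tunnel`:

```python
import io
from http.client import parse_headers

# Header block a proxy might send after the status line of a 407 reply.
# The values are invented for this example.
raw = (
    b"Proxy-Authenticate: Negotiate\r\n"
    b'Proxy-Authenticate: Basic realm="proxy"\r\n'
    b"Content-Length: 0\r\n"
    b"\r\n"
)

# parse_headers reads lines up to the blank line and returns an HTTPMessage.
headers = parse_headers(io.BytesIO(raw))

# get() returns the first matching header, get_all() returns every one,
# which is what a client needs to pick an authentication scheme.
first_scheme = headers.get("Proxy-Authenticate")
all_schemes = headers.get_all("Proxy-Authenticate")
print(first_scheme, all_schemes)
```

A client library could inspect `all_schemes` to decide whether to retry the CONNECT request with Basic, Digest, NTLM, or Negotiate credentials.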

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@OneMoreZanuda

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@github-actions

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jun 16, 2021
Contributor

@MaxwellDupre MaxwellDupre left a comment

Looks ok

@nametkin
Contributor Author

@vadmium, could you take a look at this?

@nametkin
Contributor Author

@MaxwellDupre, thanks for the review!

@github-actions github-actions bot removed the stale Stale PR or inactive for long period of time. label Aug 8, 2022
@nametkin nametkin force-pushed the add_proxy_response_headers branch from efa9212 to 3258e0a Compare October 11, 2022 09:05
@cpython-cla-bot

cpython-cla-bot bot commented Oct 11, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

@nametkin nametkin force-pushed the add_proxy_response_headers branch 2 times, most recently from 15e31c8 to ef3244d Compare April 23, 2023 11:02
@@ -943,21 +944,15 @@ def _tunnel(self):
        response = self.response_class(self.sock, method=self._method)
        (version, code, message) = response._read_status()

        self._proxy_response_headers = parse_headers(response.fp)
Contributor Author

Similar to how it is done in lines 337-341

@nametkin
Contributor Author

Hi @orsenthil ! Could you take a look at this request? I see that you have worked a lot with module http/client.py. And I also found out that you were a reviewer for another PR with changes to the http/client.py. Or maybe you can recommend someone who could look at this code?

@orsenthil
Member

@nametkin, sure. I will.

@orsenthil orsenthil self-assigned this Apr 23, 2023
@arhadthedev arhadthedev changed the title bpo-24964: Add _proxy_response_headers attribute to HTTPConnection gh-69152: Add _proxy_response_headers attribute to HTTPConnection Apr 24, 2023
@nametkin nametkin force-pushed the add_proxy_response_headers branch 6 times, most recently from 8d1ea4d to 3e0cdf9 Compare April 28, 2023 21:56
@nametkin nametkin force-pushed the add_proxy_response_headers branch from 3e0cdf9 to ec7f577 Compare April 29, 2023 17:20
            line = response.fp.readline(_MAXLINE + 1)
            if len(line) > _MAXLINE:
                raise LineTooLong("header line")
            if not line:
Member

This condition if not line: isn't captured in parse_headers call.
Additional check of _MAXHEADERS is verified.
So, this isn't a strict 1:1 replacement.

Member

We can add the if not line condition to the _read_headers method for equivalence.

However, given that the _tunnel method already prints headers at line 960 with print('header:', line.decode()), I am trying to understand what additional benefit this patch brings.

The discussion in #69152 seems to be about adding additional states to the _tunnel method, and adding the debug response to those states.

Contributor Author

@orsenthil, thanks for the review!
Regarding the if not line condition: it seems to me that this is now a redundant part of the code. If I'm not mistaken, the only value of line for which this condition is true is line = b'' (because readline can't return None or an empty str here), and the comparison of line with b'' is already done a little further down.

I looked at the history: these lines were added when the check against the empty bytes object below did not yet exist. So at some point they made sense, but after this commit they ceased to be necessary.

But we can add this check to the _read_headers method as a precaution. Do you think it should be done?
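
The argument above can be checked with a small stand-alone sketch. The function below is a simplified, hypothetical reader (the names `read_header_lines` and `MAXLINE` are made up for this example and it omits the header-count limit), run against in-memory streams rather than a real socket. It assumes a blocking binary stream, where `readline()` returns `b""` at end of stream:

```python
import io

MAXLINE = 65536  # assumed limit, playing the role of _MAXLINE


def read_header_lines(fp):
    """Collect raw header lines until a blank line or end of stream."""
    lines = []
    while True:
        line = fp.readline(MAXLINE + 1)
        if len(line) > MAXLINE:
            raise ValueError("header line too long")
        lines.append(line)
        # b"" is what readline() yields at EOF here, so this membership
        # test also covers the case a separate `if not line` would catch.
        if line in (b"\r\n", b"\n", b""):
            break
    return lines


# A well-formed header block ends with a blank line.
complete = read_header_lines(io.BytesIO(b"Via: 1.1 proxy\r\n\r\n"))
# A stream that hits EOF without the blank line still terminates the loop.
truncated = read_header_lines(io.BytesIO(b"Via: 1.1 proxy\r\n"))
print(complete, truncated)
```

In both cases the loop ends without a dedicated `if not line` branch, which is the point made in the comment above.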

Contributor Author

@nametkin nametkin Apr 30, 2023

As for the benefits of these changes: first, the headers are needed in the situations described in the comment on "Add tunnel CONNECT response headers to httplib / http.client" #69152 (comment). Second, the headers currently reach the debug logs only if execution gets past this line: https://github.com/python/cpython/blob/main/Lib/http/client.py#L948.
If we did not send the authorization headers the proxy requires, we get an OSError with no information about what happened, and we cannot find out which authentication data is required. With my PR, I would like to take the first step towards solving the problems described in "Add support for digest authentication with an HTTP proxy" psf/requests#2526, "Allow custom authentication (in particular NTLM) to proxies" psf/requests#1582, and "http proxy negotiate/gssapi authentication?" requests/requests-kerberos#83.

Maybe it would then be possible to raise not OSError but a custom error that can be caught in library code (urllib3, requests), which could then automatically prepare the authentication data based on the information in _proxy_response_headers, similar to how it is done in https://github.com/requests/requests-kerberos/pull/149/files.

If you know other ways to solve these problems, I will be glad to hear any ideas and suggestions.
I did not understand how @vadmium's proposal would solve the lack of any information about what went wrong when creating the tunnel.

Member

@orsenthil orsenthil left a comment

We might at least want to make parse_headers equivalent to the lines of code being removed.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@nametkin
Contributor Author

nametkin commented May 5, 2023

@orsenthil, please look at the two comments above (in the conversation). Can we leave the PR in its current state, or should we add the if not line condition just in case?

@arhadthedev arhadthedev added the stdlib Python modules in the Lib dir label May 5, 2023
@arhadthedev
Member

@nametkin Could you sign the new CLA by clicking the "not signed" button in the cpython-cla-bot's message, please? The message is above, posted Oct 11, 2022.

@nametkin
Contributor Author

nametkin commented May 5, 2023

@nametkin Could you sign the new CLA by clicking not signed button in the cpython-cla-bot's message, please? The message is above, posted Oct 11, 2022.

Done.

@arhadthedev arhadthedev requested a review from orsenthil May 5, 2023 07:31
Member

@orsenthil orsenthil left a comment

The change looks good to me. Thanks, @nametkin

@orsenthil
Member

@gpshead - fyi.

@orsenthil orsenthil enabled auto-merge (squash) May 5, 2023 18:29
@orsenthil orsenthil merged commit 1afe0e0 into python:main May 5, 2023
@nametkin nametkin deleted the add_proxy_response_headers branch May 5, 2023 19:02
@@ -944,21 +945,16 @@ def _tunnel(self):
        try:
            (version, code, message) = response._read_status()

            self._proxy_response_headers = parse_headers(response.fp)
Member

This always does complicated parsing of the data despite these headers not being used in 99.999% of use cases.

It'd be better to just call _read_headers(response.fp) and store that raw data privately, and add a get_proxy_response_headers() public method that triggers the parsing into data structures.

The debuglevel > 0 print loop should just loop over the raw unparsed headers.
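
The lazy approach suggested here can be sketched as follows. This is an illustration of the idea only, not the code that was merged: the class `LazyProxyHeaders` and its attribute names are hypothetical. It keeps the raw header lines and runs the email parser only when the headers are first requested:

```python
import email.parser
import http.client


class LazyProxyHeaders:
    """Hypothetical holder: store raw lines cheaply, parse on demand."""

    def __init__(self, raw_lines):
        self._raw_lines = raw_lines  # list of bytes, as read from the socket
        self._parsed = None          # filled in on first access

    def get(self):
        if self._parsed is None:
            # HTTP header bytes are decoded as ISO-8859-1 before parsing.
            text = b"".join(self._raw_lines).decode("iso-8859-1")
            parser = email.parser.Parser(_class=http.client.HTTPMessage)
            self._parsed = dict(parser.parsestr(text).items())
        return self._parsed


holder = LazyProxyHeaders([b"Via: 1.1 proxy\r\n", b"\r\n"])
parsed_before_access = holder._parsed  # still None: nothing parsed yet
first = holder.get()                   # triggers the parsing
second = holder.get()                  # reuses the cached dict
print(first)
```

Storing only the raw lines keeps the common path (tunnels whose headers nobody inspects) free of parsing cost, while a debug print loop can still iterate over `_raw_lines` directly.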

carljm added a commit to carljm/cpython that referenced this pull request May 5, 2023
* main:
  pythongh-99113: Add PyInterpreterConfig.own_gil (pythongh-104204)
  pythongh-104146: Remove unused var 'parser_body_declarations' from clinic.py (python#104214)
  pythongh-99113: Add Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (pythongh-104205)
  pythongh-104108: Add the Py_mod_multiple_interpreters Module Def Slot (pythongh-104148)
  pythongh-99113: Share the GIL via PyInterpreterState.ceval.gil (pythongh-104203)
  pythonGH-100479: Add `pathlib.PurePath.with_segments()` (pythonGH-103975)
  pythongh-69152: Add _proxy_response_headers attribute to HTTPConnection (python#26152)
  pythongh-103533: Use PEP 669 APIs for cprofile (pythonGH-103534)
  pythonGH-96803: Add three C-API functions to make _PyInterpreterFrame less opaque for users of PEP 523. (pythonGH-96849)
jbower-fb pushed a commit to jbower-fb/cpython-jbowerfb that referenced this pull request May 8, 2023
…on (python#26152)

Add _proxy_response_headers attribute to HTTPConnection (python#26152)

---------

Co-authored-by: Senthil Kumaran <[email protected]>
gpshead added a commit that referenced this pull request May 16, 2023
…ss (#104248)

Add http.client.HTTPConnection method get_proxy_response_headers() - this is a followup to #26152 which added it as a non-public attribute. This way we don't pre-compute a headers dictionary that most users will never access. The new method is properly public and documented and triggers full proxy header parsing into a dict only when actually called.

---------

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <[email protected]>
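
For readers who want to try the follow-up method, here is a hedged usage sketch. `get_proxy_response_headers()` is the public method named in the commit message above; the helper `proxy_headers_or_none`, the host names, and the port are assumptions made for this example. The `getattr` guard is there because older Python versions lack the method. No network traffic happens in this snippet, since the connection is never opened:

```python
import http.client


def proxy_headers_or_none(conn):
    """Return the CONNECT response headers, or None if unavailable.

    Hypothetical helper: falls back to None on Python versions that
    do not provide get_proxy_response_headers().
    """
    getter = getattr(conn, "get_proxy_response_headers", None)
    return getter() if getter is not None else None


# Placeholder proxy and target; creating the object does not connect.
conn = http.client.HTTPConnection("proxy.example", 8080)
conn.set_tunnel("www.example.org", 443)

# No CONNECT request has been sent yet, so there are no headers to report.
headers_before_connect = proxy_headers_or_none(conn)
print(headers_before_connect)
```

After a real `conn.connect()` through a proxy, the same call would be expected to return a dict of the proxy's response headers, parsed only at that point.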