Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve behavior of HTTP redirects #182

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from
Draft

Conversation

pyrooka
Copy link
Member

@pyrooka pyrooka commented Nov 22, 2023

This commit modifies the Python core so that it will include "safe" headers
when performing a cross-site redirect where both the original and redirected
hosts are within IBM's "cloud.ibm.com" domain.

This commit modifies the Python core so that it will include "safe" headers
when performing a cross-site redirect where both the original and redirected
hosts are within IBM's "cloud.ibm.com" domain.

Signed-off-by: Norbert Biczo <[email protected]>
Signed-off-by: Norbert Biczo <[email protected]>
@pyrooka pyrooka marked this pull request as draft November 23, 2023 15:35
@pyrooka
Copy link
Member Author

pyrooka commented Nov 23, 2023

I've converted this PR to to draft, to prevent merging it before the necessary CISO approvals.

Copy link
Member

@padamstx padamstx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good start, but I think there are some missing pieces to the implementation.


# If both the original and the redirected URL are under the `.cloud.ibm.com` domain,
# copy the safe headers that are used for authentication purposes,
if self.service_url.endswith('.cloud.ibm.com') and urlparse(next_request.url).netloc.endswith(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.service_url.endswith('.cloud.ibm.com') will most definitely not work in the general case. To see why, consider a service url like https://myhost.ibm.com/api or https://myhost.ibm.com:8080. I think you'll need to parse self.service_url and extract the hostname from it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good point and I already parse the url in the second part of the expression... :)


# If we reached the max number of redirects and the last response is still a redirect
# stop processing the response and return an error to the user.
if redirects_count == MAX_REDIRECTS and response.is_redirect:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do the limit check immediately after incrementing redirect_count above? I think it could be cleaner as you would just need to check:
if redirects_count >= MAX_REDIRECTS:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! I need to excuse myself so: initially I implemented the error handling in a different way then I forgot to refactor that check.. 😄

@@ -788,6 +788,166 @@ def test_retry_config_external():
assert retry_err.value.reason == error


@responses.activate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add some additional tests:

  • redirect to same host (both inside cloud.ibm.com)
  • redirect to same host (both outside cloud.ibm.com)

Also, because you are completely overriding the default "follow redirects" behavior, I think you'll need to add some tests for redirect scenarios involving POST and a representative sample of redirect status codes. There are some subtle gotcha's when implementing redirects and we need to make sure that we are providing equivalent function as compared to the default "follow redirects" behavior.
I had to do this in the Java implementation as well. For example, for 301/302 status codes, a POST should be re-written as a GET with no body (it may fail in the redirected server, but that's what the HTTP semantics should be). On the other hand, 307/308 status codes should NOT cause a POST to be re-written as a GET.
You should study the code that you are overriding and try to match it as closely as possible other than adding support for the *.cloud.ibm.com stuff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are already there those scenarios. Copied from the tests

def test_redirect_ibm_to_ibm_success():
    url_from = 'http://region1.cloud.ibm.com/'
    url_to = 'http://region2.cloud.ibm.com/'
    ...

and

def test_redirect_not_ibm_to_not_ibm_fail():
    url_from = 'http://region1.notcloud.ibm.com/'
    url_to = 'http://region2.notcloud.ibm.com/'
    ...

Are you talking about something else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, because you are completely overriding the default "follow redirects" behavior...

That's not the case. Still the requests package handles the redirections, the "allow_redirect": False" option only prevents to fire the next request automatically, but it's still being processed and resolved appropriately. From the session.py in the requests package:

# If redirects aren't being followed, store the response on the Request for Response.next().
        if not allow_redirects:
            try:
                r._next = next(
                    self.resolve_redirects(r, request, yield_requests=True, **kwargs)
                )
            except StopIteration:
                pass

Let's touch base later, to make sure we are on the same page! :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you talking about something else?

  1. redirect to same host (both inside cloud.ibm.com):
    from: region1.cloud.ibm.com
    to: region1.cloud.ibm.com

  2. redirect to same host (both outside cloud.ibm.com):
    from: region1.notcloud.ibm.com
    to: region1.notcloud.ibm.com

Redirecting to the same host SHOULD allow the safe headers to be included in the redirected request, regardless of whether the host is inside or outside the cloud.ibm.com domain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's touch base later, to make sure we are on the same page!

Yes, let's discuss tomorrow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redirecting to the same host SHOULD allow the safe headers to be included in the redirected request, regardless of whether the host is inside or outside the cloud.ibm.com domain.

Ok it's my bad, I used the domain and host words interchangeably(which is incorrect, I know :))...

Signed-off-by: Norbert Biczo <[email protected]>
@pyrooka
Copy link
Member Author

pyrooka commented Dec 4, 2023

@padamstx I've made the tests more robust, but it may be overkill...:) let me know what you think!

Copy link
Member

@padamstx padamstx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional testcases look good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants