[Bug]: redirect in route.fetch may jump to a wrong url #30903

Closed
tqobqbq opened this issue May 18, 2024 · 5 comments · Fixed by #30904

tqobqbq commented May 18, 2024

Version

1.43.0

Steps to reproduce

  1. Clone my repo at https://github.com/tqobqbq/playwright_bug_test.git
  2. python test.py
  3. You should see the error come up

Expected behavior

I expect 'https://www.suruga-ya.jp/search?search_word=%E3%83%9E' to redirect to 'https://www.suruga-ya.jp/search?category=&search_word=%E3%83%9E'.
If I set a route function like this:
def route_continue(route): return route.continue_()
or set no route function at all, I get the correct behavior.

Actual behavior

However, if I set a route function like this:
def route_fetch(route): return route.fulfill(response=route.fetch(max_redirects=0))
the page redirects to the wrong URL: 'https://www.suruga-ya.jp/search?category=&search_word=%C3%A3%C2%83%C2%9E'

Additional context

I think the error may stem from how route.fetch parses the URL in response.headers['location'].

Environment

- Operating System: [Windows 11]
- CPU: [arm64]
- Browser: [All, Chromium, Firefox, WebKit]
- Python Version: [3.9]
- Other info:

mxschmitt (Member) commented

The issue seems related to nodejs/node#17390 (comment): the Location header contains UTF-8 bytes, while the spec, and our implementation, only support US-ASCII.

curl -o /dev/null -iv --raw "https://www.suruga-ya.jp/search?search_word=マ"

Since all browsers seem to support UTF-8, we should do that as well. Will evaluate with the team if we should post-process the headers and try to parse them as UTF-8 as well.
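
The wrong URL from the report is consistent with the header bytes being decoded as Latin-1 and then percent-encoded again as UTF-8. The following is a minimal sketch of that round trip using only Python's standard library; the Latin-1 decoding step is an assumption about where things go wrong, not something confirmed in this thread, and the helper names are hypothetical.

```python
from urllib.parse import quote, unquote_to_bytes


def reencode_via_latin1(percent_encoded: str) -> str:
    """Decode the raw bytes as Latin-1 (assumed faulty step), then percent-encode as UTF-8."""
    raw = unquote_to_bytes(percent_encoded)   # b'\xe3\x83\x9e' for '%E3%83%9E'
    misdecoded = raw.decode("latin-1")        # three separate characters instead of one
    return quote(misdecoded)                  # quote() encodes str as UTF-8 by default


def reencode_via_utf8(percent_encoded: str) -> str:
    """Decode the raw bytes as UTF-8, then percent-encode as UTF-8 (lossless round trip)."""
    raw = unquote_to_bytes(percent_encoded)
    return quote(raw.decode("utf-8"))


def repair_latin1_mojibake(text: str) -> str:
    """Undo a Latin-1 misdecoding by recovering the bytes and decoding them as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


print(reencode_via_latin1("%E3%83%9E"))  # %C3%A3%C2%83%C2%9E (the wrong URL from the report)
print(reencode_via_utf8("%E3%83%9E"))    # %E3%83%9E (the expected URL)
```

The first output matches the search_word value in the wrong URL above, and the second matches the expected one, which suggests that post-processing the header value as UTF-8 would restore the correct redirect target.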

mxschmitt transferred this issue from microsoft/playwright-python on May 20, 2024
mxschmitt self-assigned this on May 20, 2024

tqobqbq (Author) commented May 21, 2024

@mxschmitt Another possible bug, maybe unrelated but also involving route.fetch, shows up in my test:

from playwright.sync_api import sync_playwright


def test_route2():
    with sync_playwright() as p:
        browser = p.chromium.launch(
            # headless=False
        )

        def route_fetch(route):
            # Fetch the resource through Playwright, then hand the
            # resulting response to the browser.
            request = route.request
            response = route.fetch()
            print('route.request.url in route_fetch:', request.url)
            print('response.url in route_fetch:', response.url, 'status:', response.status, '\n')
            return route.fulfill(response=response)

        def print_response(response):
            # Log only responses for the search endpoint, as the browser sees them.
            if response.url.startswith('https://www.suruga-ya.jp/search'):
                request = response.request
                print('response.request.url in print_response:', request.url)
                print('response.request.redirected_from in print_response:', request.redirected_from)
                print('response.url in print_response:', response.url, 'status:', response.status)

        url = 'https://www.suruga-ya.jp/search?search_word=%E3%83%9E'

        page = browser.new_page()
        page.on('response', print_response)

        page.route('https://www.suruga-ya.jp/search*', route_fetch)
        page.goto(url, wait_until='commit')


if __name__ == '__main__':
    test_route2()

The output is:
[image: screenshot of the script output, not preserved]

Inside route.fetch(), the request for 'search?search_word=%E3%83%9E' is redirected to 'search?category=&search_word=%C3%A3%C2%83%C2%9E'.
But in page.on('response', print_response), both response.url and response.request.url are still 'search?search_word=%E3%83%9E', and response.request.redirected_from is None.

mxschmitt (Member) commented

This is expected as of today. route.fetch() fetches the resource internally and resolves the redirects itself. When you pass the resulting APIResponse to fulfill, the browser knows nothing about those redirects, since it only receives the response you hand it. Hence redirected_from and url are both unchanged. They are only populated when the browser processes the redirects itself.

Does this create any issues for you?

tqobqbq (Author) commented May 21, 2024

I want to get the redirect information from within route.fetch. Is there a way to capture it, like response.history in Python's requests module?

mxschmitt (Member) commented

There is not as of today. Feel free to file a separate feature request if you need it.
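
Until such a feature exists, one possible workaround is to follow redirects one hop at a time and record each step. The sketch below is not a Playwright API: `collect_redirect_chain`, `fetch_once`, and `fake_fetch_once` are hypothetical names, and the fetcher is a stand-in so the example runs without a network. It assumes a fetcher that returns the status code and lower-cased response headers for a single request without following redirects.

```python
from typing import Callable, Dict, List, Tuple
from urllib.parse import urljoin

# Hypothetical fetcher type: takes a URL, returns (status, headers) for ONE request.
FetchOnce = Callable[[str], Tuple[int, Dict[str, str]]]

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def collect_redirect_chain(fetch_once: FetchOnce, url: str, max_hops: int = 20) -> List[Tuple[int, str]]:
    """Follow redirects manually and return every (status, url) pair visited."""
    chain: List[Tuple[int, str]] = []
    current = url
    for _ in range(max_hops + 1):
        status, headers = fetch_once(current)
        chain.append((status, current))
        location = headers.get("location")
        if status not in REDIRECT_STATUSES or not location:
            return chain
        # Location may be relative, so resolve it against the current URL.
        current = urljoin(current, location)
    raise RuntimeError(f"more than {max_hops} redirects starting from {url}")


def fake_fetch_once(url: str) -> Tuple[int, Dict[str, str]]:
    """Stand-in fetcher for illustration: one redirect, then a final 200."""
    if url == "https://example.test/search?search_word=a":
        return 302, {"location": "/search?category=&search_word=a"}
    return 200, {}


for status, visited in collect_redirect_chain(fake_fetch_once, "https://example.test/search?search_word=a"):
    print(status, visited)
```

With Playwright, `fetch_once` could presumably wrap a request made with max_redirects=0 (the same parameter used with route.fetch in the original report) and return the response's status and headers. That wiring is an assumption and untested here. Note that a Location value read this way would be subject to the same header-decoding problem described earlier in this thread.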
