[Bug]: redirect in route.fetch may jump to a wrong url #30903

tqobqbq · 2024-05-18T14:10:00Z

Version

1.43.0

Steps to reproduce

Clone my repo at https://github.com/tqobqbq/playwright_bug_test.git
python test.py
You should see the error come up

Expected behavior

I expect 'https://www.suruga-ya.jp/search?search_word=%E3%83%9E' to redirect to 'https://www.suruga-ya.jp/search?category=&search_word=%E3%83%9E'.
If I set this route function:
def route_continue(route): return route.continue_()
or set no route function at all, I get the expected behavior.

Actual behavior

However, if I set this route function:
def route_fetch(route): return route.fulfill(response=route.fetch(max_redirects=0))
the page redirects to the wrong URL 'https://www.suruga-ya.jp/search?category=&search_word=%C3%A3%C2%83%C2%9E'.

Additional context

I think the error may stem from how route.fetch parses the URL in response.headers['location'].

Environment

- Operating System: [Windows 11]
- CPU: [arm64]
- Browser: [All, Chromium, Firefox, WebKit]
- Python Version: [3.9]
- Other info:

mxschmitt · 2024-05-20T08:39:27Z

The issue seems related to nodejs/node#17390 (comment): the Location header contains UTF-8 values, while the spec and our implementation only support US-ASCII.

curl -o /dev/null -iv --raw "https://www.suruga-ya.jp/search?search_word=マ"

Since all browsers seem to support UTF-8, we should too. I will evaluate with the team whether we should post-process the headers and try to parse them as UTF-8.
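The wrong URL from the report can be reproduced offline. This is only a sketch of the suspected mechanism: it assumes the UTF-8 bytes of the Location header are read as Latin-1 (one character per byte) and then percent-encoded again as UTF-8. The function name `misdecoded_redirect_query` is hypothetical and not part of Playwright.

```python
from urllib.parse import quote, unquote


def misdecoded_redirect_query(percent_encoded: str) -> str:
    """Simulate reading UTF-8 header bytes as Latin-1, then re-encoding them."""
    text = unquote(percent_encoded)           # decode the original query value
    wire_bytes = text.encode("utf-8")         # bytes the server puts in Location
    misread = wire_bytes.decode("latin-1")    # assumption: one character per byte
    return quote(misread)                     # percent-encode again as UTF-8


print(misdecoded_redirect_query("%E3%83%9E"))
```

Under that assumption, the result matches the search_word value in the wrong URL reported above, which is consistent with the header being decoded with the wrong charset.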

tqobqbq · 2024-05-21T14:43:53Z

@mxschmitt Another issue, possibly unrelated but still about route.fetch, shows up in my test:

from playwright.sync_api import sync_playwright


def test_route2():
    with sync_playwright() as p:
        browser = p.chromium.launch(
            # headless=False
        )

        def route_fetch(route):
            # Fetch the resource ourselves (redirects are resolved internally),
            # then hand the final response back to the browser.
            request = route.request
            response = route.fetch()
            print('route.request.url in route_fetch:', request.url)
            print('response.url in route_fetch:', response.url, 'status:', response.status, '\n')
            return route.fulfill(response=response)

        def print_response(response):
            # Log what the browser itself reports for the intercepted request.
            if response.url.startswith('https://www.suruga-ya.jp/search'):
                request = response.request
                print('response.request.url in print_response:', request.url)
                print('response.request.redirected_from in print_response:', request.redirected_from)
                print('response.url in print_response:', response.url, 'status:', response.status)

        url = 'https://www.suruga-ya.jp/search?search_word=%E3%83%9E'

        page = browser.new_page()
        page.on('response', print_response)

        page.route('https://www.suruga-ya.jp/search*', route_fetch)
        page.goto(url, wait_until='commit')
        browser.close()


if __name__ == '__main__':
    test_route2()

The output shows that:

Inside route.fetch(), the request for 'search?search_word=%E3%83%9E' is redirected to 'search?category=&search_word=%C3%A3%C2%83%C2%9E'.
But in the page.on('response', print_response) handler, both response.url and response.request.url are still 'search?search_word=%E3%83%9E', and response.request.redirected_from is None.

mxschmitt · 2024-05-21T14:55:44Z

This is expected as of today. route.fetch() fetches the resource internally and resolves the redirects. When you pass the resulting APIResponse to fulfill, the browser knows nothing about those redirects, since you hand it the final response directly. Hence redirected_from and url are both unchanged; they are only populated when the browser processes the redirects itself.

Does this create any issues for you?

tqobqbq · 2024-05-21T15:22:43Z

I want to get the redirect information from route.fetch. Is there something like response.history in the Python requests module to capture it?

mxschmitt · 2024-05-21T15:30:55Z

There is not as of today. Feel free to file a separate feature request if you need it.
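One possible workaround, offered only as a sketch, is to follow the redirects one hop at a time and record each hop yourself. The helper `follow_redirects` and its `fetch_once` parameter are hypothetical names, not a Playwright API. The sketch assumes the response object exposes `status` and a `headers` dict with lower-cased names, and it ignores details such as the method change after a 303. The demo at the bottom uses fake responses so it runs without a browser or network.

```python
from types import SimpleNamespace
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def follow_redirects(fetch_once, url, limit=20):
    """Follow redirects manually; return (final_response, history)."""
    history = []  # (status, url) for every redirecting hop
    for _ in range(limit + 1):
        response = fetch_once(url)
        location = response.headers.get("location")
        if response.status not in REDIRECT_STATUSES or not location:
            return response, history
        history.append((response.status, url))
        url = urljoin(url, location)  # Location may be relative
    raise RuntimeError(f"more than {limit} redirects")


# Offline demo with fake responses (hypothetical URLs).
fake_pages = {
    "https://example.test/a": SimpleNamespace(status=302, headers={"location": "/b?x=1"}),
    "https://example.test/b?x=1": SimpleNamespace(status=200, headers={}),
}
final, history = follow_redirects(fake_pages.__getitem__, "https://example.test/a")
print(final.status, history)
```

Inside a route handler, `fetch_once` could plausibly be `lambda u: route.fetch(url=u, max_redirects=0)`, with the final response passed to route.fulfill. On versions without the fix from #30904, a non-ASCII Location header may still arrive mis-decoded.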

mxschmitt transferred this issue from microsoft/playwright-python May 20, 2024

mxschmitt mentioned this issue May 20, 2024

fix(fetch): allow UTF-8 in Location header #30904

Merged

mxschmitt added the v1.45 label May 20, 2024

mxschmitt self-assigned this May 20, 2024

mxschmitt closed this as completed in #30904 May 21, 2024
