Chrome headless hangs - fix to work with new behaviour of newer Chrome versions #890

Closed
kensoh opened this issue Dec 9, 2020 · 11 comments

@kensoh
Member

kensoh commented Dec 9, 2020

see simple flow

tagui https://raw.githubusercontent.com/kelaberetiv/TagUI/master/flows/samples/1_google.tag -headless

run output log

START - automation started - Wed Dec 09 2020 15:06:29 GMT+0800 (+08)
https://www.google.com/ - Google

type q as latest movies[enter]

tagui chrome log

[tagui] START  - listening for inputs

[tagui] INPUT  - 
[1] {"id":1,"method":"Page.setDownloadBehavior","params":{"behavior":"allow","downloadPath":"/Users/kensoh/Desktop"}}
[tagui] OUTPUT - 
[1] {"id":1,"result":{}}

[tagui] INPUT  - 
[2] {"id":2,"method":"Page.navigate","params":{"url":"https://www.google.com/"}}
[tagui] OUTPUT - 
[2] {"id":2,"result":{"frameId":"2FDE9605F9AFA624135BFFBC7AD2F0D2","loaderId":"78128FB53CE1DBA80D8DFC2D2B83D7A9"}}

[tagui] INPUT  - 
[3] {"id":3,"method":"Runtime.evaluate","params":{"expression":"document.title"}}
[tagui] OUTPUT - 
[3] {"id":3,"result":{"result":{"type":"string","value":"Google"}}}

[tagui] INPUT  - 
[4] {"id":4,"method":"Runtime.evaluate","params":{"expression":"document.querySelectorAll('q').length"}}
[tagui] OUTPUT - 
[4] {"id":4,"result":{"result":{"type":"number","value":0,"description":"0"}}}

[tagui] INPUT  - 
[5] {"id":5,"method":"Runtime.evaluate","params":{"expression":"document.evaluate('//*[@id=\"q\"]',document,null,XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,null).snapshotLength"}}
[tagui] OUTPUT - 
[5] {"id":5,"result":{"result":{"type":"number","value":0,"description":"0"}}}

[tagui] INPUT  - 
[6] {"id":6,"method":"Runtime.evaluate","params":{"expression":"document.evaluate('//*[contains(@id,\"q\")]',document,null,XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,null).snapshotLength"}}
@kensoh kensoh added the bug label Dec 9, 2020
@kensoh
Member Author

kensoh commented Dec 9, 2020

Adding on, the same flow runs at normal speed in visible Chrome mode, instead of hanging while waiting very long for a reply.

This is likely due to some change in behaviour of headless Chrome in newer Chrome releases, as it previously worked.

@kensoh
Member Author

kensoh commented Dec 9, 2020

On debugging the websocket connection, the error below keeps appearing non-stop -

WebSocket\ConnectionException: Empty read; connection dead?  Stream state: {"timed_out":true,"blocked":true,"eof":false,"stream_type":"tcp_socket\/ssl","mode":"r+","unread_bytes":0,"seekable":false}
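
The stream state in this message is JSON, so it can be decoded to tell a read timeout apart from a closed connection. Below is a minimal Python sketch of that idea, not part of TagUI (which reads the socket in PHP). The helper names `parse_stream_state` and `classify` are hypothetical, and the three labels are an assumed reading of the `timed_out` and `eof` fields.

```python
import json

def parse_stream_state(message):
    """Return the stream-state dict embedded in an error message, or None."""
    marker = "Stream state: "
    start = message.find(marker)
    if start == -1:
        return None
    # raw_decode stops at the end of the first JSON value, so trailing text
    # such as " in /path/Base.php:269" is ignored.
    state, _ = json.JSONDecoder().raw_decode(message[start + len(marker):])
    return state

def classify(state):
    """Assumed labels: 'closed' if eof is set, 'timeout' if only timed_out is set."""
    if state.get("eof"):
        return "closed"
    if state.get("timed_out"):
        return "timeout"
    return "unknown"

# The error text from the comment above, with backslashes escaped for Python.
error = ('WebSocket\\ConnectionException: Empty read; connection dead?  '
         'Stream state: {"timed_out":true,"blocked":true,"eof":false,'
         '"stream_type":"tcp_socket\\/ssl","mode":"r+","unread_bytes":0,'
         '"seekable":false}')

print(classify(parse_stream_state(error)))  # prints: timeout
```

Under this reading, the message above is a read timeout on a connection that is still open, which fits a reply that is slow rather than lost.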

@kensoh
Member Author

kensoh commented Dec 11, 2020

Running the code below to test that headless Chrome works -

https://medium.com/@lagenar/using-headless-chrome-via-the-websockets-interface-5f498fb67e0f

import json
import time
import subprocess
import requests
from websocket import create_connection

def start_browser(browser_path, debugging_port):
    options = ['--headless', '--disable-gpu',  # no leading space inside a flag
               '--remote-debugging-port={}'.format(debugging_port)]
    browser_proc = subprocess.Popen([browser_path] + options)
    wait_seconds = 10.0
    sleep_step = 0.25
    while wait_seconds > 0:
        try:
            url = 'http://127.0.0.1:{}/json'.format(debugging_port)
            resp = requests.get(url).json()
            ws_url = resp[0]['webSocketDebuggerUrl']
            return browser_proc, create_connection(ws_url)
        except requests.exceptions.ConnectionError:
            time.sleep(sleep_step)
            wait_seconds -= sleep_step
    raise Exception('Unable to connect to chrome')

request_id = 0

def run_command(conn, method, **kwargs):
    global request_id
    request_id += 1
    command = {'method': method,
               'id': request_id,
               'params': kwargs}
    conn.send(json.dumps(command))
    while True:
        msg = json.loads(conn.recv())
        if msg.get('id') == request_id:
            return msg

gnews_url = 'https://news.google.com/news/?ned=us&hl=en'
chrome_path = '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
browser, conn = start_browser(chrome_path, 9222)
run_command(conn, 'Page.navigate', url=gnews_url)
time.sleep(5) # let it load
js = """
var sel = 'h3 > a';
var headings = document.querySelectorAll(sel);
headings = [].slice.call(headings).map((link)=>{return link.innerText});
JSON.stringify(headings);
"""
result = run_command(conn, 'Runtime.evaluate', expression=js)

headings = json.loads(result['result']['result']['value'])
for heading in headings:
    print(heading)
browser.terminate()

output from running the above code -

[1211/104953.088401:ERROR:xattr.cc(63)] setxattr org.chromium.crashpad.database.initialized on file /var/folders/4c/21d_62nx5tnf_t9l0fl9jych0000gn/T/: Operation not permitted (1)
[1211/104953.091128:ERROR:file_io.cc(90)] ReadExactly: expected 8, observed 0
[1211/104953.093504:ERROR:xattr.cc(63)] setxattr org.chromium.crashpad.database.initialized on file /var/folders/4c/21d_62nx5tnf_t9l0fl9jych0000gn/T/: Operation not permitted (1)
[1211/104953.205417:ERROR:socket_posix.cc(148)] bind() failed: Address already in use (48)

DevTools listening on ws://[::1]:9222/devtools/browser/f0b4d7d2-2331-4b2a-8e2f-4d60e8e12e0d
With time running out, Trump and GOP allies turn up pressure on Supreme Court in election assault
Second stimulus check updates: McConnell says no GOP support for emerging COVID-19 relief deal
Biden's pick of Denis McDonough for VA sparks pushback from veterans
Hopes dwindle for Northern Lights over parts of the US tonight
Body cam footage shows raid on former Florida Covid data scientist's home
Republican NH House Speaker Dies Of COVID-19
'I literally lost it': Kim Kardashian reacts to Brandon Bernard's scheduled execution, details last phone call
Majority of House GOP support lawsuit aimed at overturning election - Business Insider
Hoped for northern lights in New England a 'big miss,' U.S. space forecaster says
Hillary Clinton says Republicans who 'humor' Trump's election fraud claims 'have no spines'
Inhofe slams Trump administration on Western Sahara policy
Trump administration reportedly sanctioning Turkey over S-400 - Business Insider
Chinese citizen journalist detained for reporting on Wuhan coronavirus outbreak "may not survive"
Spain Evicts Francisco Franco's Heirs From Late Dictator's Summer Palace

@kensoh
Member Author

kensoh commented Dec 11, 2020

The following simple TagUI script, based on the above, works in normal mode,

https://news.google.com/news/?ned=us&hl=en
wait
dom return document.evaluate('//h3/a',document,null,XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,null).snapshotItem(0).innerText
echo `dom_result`

but throws the same error using headless mode -

WebSocket\ConnectionException: Empty read; connection dead?  Stream state: {"timed_out":false,"blocked":true,"eof":true,"stream_type":"tcp_socket\/ssl","mode":"r+","unread_bytes":0,"seekable":false} in /Users/kensoh/Cloud Drive/Marketing/Website/api/tagui/src/ws/Base.php:269
Stack trace:
#0 /Users/kensoh/Cloud Drive/Marketing/Website/api/tagui/src/ws/Base.php(143): WebSocket\Base->read(2)
#1 /Users/kensoh/Cloud Drive/Marketing/Website/api/tagui/src/ws/Base.php(135): WebSocket\Base->receive_fragment()
#2 /Users/kensoh/Cloud Drive/Marketing/Website/api/tagui/src/tagui_chrome.php(48): WebSocket\Base->receive()
#3 {main}

@kensoh
Member Author

kensoh commented Dec 11, 2020

consider the following simple script

https://news.google.com/news/?ned=us&hl=en
chrome_step('Runtime.evaluate',{expression: 'document.title'});
chrome_step('Runtime.evaluate',{expression: 'document.title'});
chrome_step('Runtime.evaluate',{expression: 'document.title'});
chrome_step('Runtime.evaluate',{expression: 'document.title'});
chrome_step('Runtime.evaluate',{expression: 'document.title'});
chrome_step('Runtime.evaluate',{expression: 'document.title'});
chrome_step('Runtime.evaluate',{expression: 'document.title'});
chrome_step('Runtime.evaluate',{expression: 'document.title'});
chrome_step('Runtime.evaluate',{expression: 'document.title'});

the same simple calls can lead to persistent timeouts after the first few calls

[tagui] START  - listening for inputs

[tagui] INPUT  - 
[1] {"id":1,"method":"Page.setDownloadBehavior","params":{"behavior":"allow","downloadPath":"/Users/kensoh/Desktop"}}
TEST - {"id":1,"result":{}}

[tagui] OUTPUT - 
[1] {"id":1,"result":{}}

[tagui] INPUT  - 
[2] {"id":2,"method":"Page.navigate","params":{"url":"https://news.google.com/news/?ned=us&hl=en"}}
TEST - {"id":2,"result":{"frameId":"AAD7ABF6A619E2C079F45FAEC16EE19C","loaderId":"10C8D76494D16867EE469A872C4FE419"}}

[tagui] OUTPUT - 
[2] {"id":2,"result":{"frameId":"AAD7ABF6A619E2C079F45FAEC16EE19C","loaderId":"10C8D76494D16867EE469A872C4FE419"}}

[tagui] INPUT  - 
[3] {"id":3,"method":"Runtime.evaluate","params":{"expression":"document.title"}}
TEST - {"id":3,"result":{"result":{"type":"string","value":"Google News"}}}

[tagui] OUTPUT - 
[3] {"id":3,"result":{"result":{"type":"string","value":"Google News"}}}

[tagui] INPUT  - 
[4] {"id":4,"method":"Runtime.evaluate","params":{"expression":"document.title"}}
TEST - {"id":4,"result":{"result":{"type":"string","value":"Google News"}}}

[tagui] OUTPUT - 
[4] {"id":4,"result":{"result":{"type":"string","value":"Google News"}}}

[tagui] INPUT  - 
[5] {"id":5,"method":"Runtime.evaluate","params":{"expression":"document.title"}}
TEST - WebSocket\ConnectionException: Empty read; connection dead?  Stream state: {"timed_out":true,"blocked":true,"eof":false,"stream_type":"tcp_socket\/ssl","mode":"r+","unread_bytes":0,"seekable":false} in /Users/kensoh/Cloud Drive/Marketing/Website/api/tagui/src/ws/Base.php:269
Stack trace:
#0 /Users/kensoh/Cloud Drive/Marketing/Website/api/tagui/src/ws/Base.php(143): WebSocket\Base->read(2)
#1 /Users/kensoh/Cloud Drive/Marketing/Website/api/tagui/src/ws/Base.php(135): WebSocket\Base->receive_fragment()
#2 /Users/kensoh/Cloud Drive/Marketing/Website/api/tagui/src/tagui_chrome.php(48): WebSocket\Base->receive()
#3 {main}

TEST - {"id":5,"result":{"result":{"type":"string","value":"Google News"}}}

[tagui] OUTPUT - 
[5] {"id":5,"result":{"result":{"type":"string","value":"Google News"}}}

[tagui] INPUT  - 
[6] {"id":6,"method":"Runtime.evaluate","params":{"expression":"document.title"}}
TEST - WebSocket\ConnectionException: Empty read; connection dead?  Stream state: {"timed_out":true,"blocked":true,"eof":false,"stream_type":"tcp_socket\/ssl","mode":"r+","unread_bytes":0,"seekable":false} in /Users/kensoh/Cloud Drive/Marketing/Website/api/tagui/src/ws/Base.php:269
Stack trace:
#0 /Users/kensoh/Cloud Drive/Marketing/Website/api/tagui/src/ws/Base.php(143): WebSocket\Base->read(2)
#1 /Users/kensoh/Cloud Drive/Marketing/Website/api/tagui/src/ws/Base.php(135): WebSocket\Base->receive_fragment()
#2 /Users/kensoh/Cloud Drive/Marketing/Website/api/tagui/src/tagui_chrome.php(48): WebSocket\Base->receive()
#3 {main}
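
The log above shows the reply for id 5 arriving after the read had already timed out. One way a client could tolerate that is to treat a timeout as "not yet" and keep reading until the matching id appears. The Python sketch below illustrates this under stated assumptions and is not TagUI's implementation. `wait_for_reply` and `FakeConnection` are hypothetical names, and `TimeoutError` stands in for whatever exception the WebSocket library in use raises on a read timeout.

```python
import json

def wait_for_reply(conn, request_id, max_timeouts=5):
    """Read messages until the reply with request_id arrives.

    conn is any object whose recv() returns a JSON string, or raises
    TimeoutError when nothing arrived in time (an assumption; adapt the
    exception type to the WebSocket library in use).
    """
    timeouts = 0
    while True:
        try:
            msg = json.loads(conn.recv())
        except TimeoutError:
            timeouts += 1
            if timeouts > max_timeouts:
                raise  # give up after too many empty reads
            continue  # the reply may still arrive on a later read
        if msg.get("id") == request_id:
            return msg
        # Anything else is an event or a reply to another request; skip it.

class FakeConnection:
    """Hypothetical stand-in for a socket that replays a scripted sequence."""
    def __init__(self, script):
        self.script = list(script)

    def recv(self):
        item = self.script.pop(0)
        if item is TimeoutError:
            raise TimeoutError("no data yet")
        return item

conn = FakeConnection([
    TimeoutError,  # first read times out, as in the log
    '{"method":"Page.loadEventFired","params":{}}',  # unrelated event
    '{"id":5,"result":{"result":{"type":"string","value":"Google News"}}}',
])
reply = wait_for_reply(conn, 5)
print(reply["result"]["result"]["value"])  # prints: Google News
```

This only hides slow replies. It does not address why headless Chrome replies slowly in the first place.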

@kensoh
Member Author

kensoh commented Dec 11, 2020

Adding waits to throttle the requests still produces the same timeout messages, though the responses do arrive after some time -

https://news.google.com/news/?ned=us&hl=en
wait 10 seconds
chrome_step('Runtime.evaluate',{expression: 'document.title'});
wait 10 seconds
chrome_step('Runtime.evaluate',{expression: 'document.title'});
wait 10 seconds
chrome_step('Runtime.evaluate',{expression: 'document.title'});
wait 10 seconds
chrome_step('Runtime.evaluate',{expression: 'document.title'});

@kensoh
Member Author

kensoh commented Dec 11, 2020

Some clues found. Error happens in headless mode when the user profile directory is a relative path -

--user-data-dir=chrome/tagui_user_profile

However, the above setting in tagui/src/tagui works in normal visible mode.

And changing the relative path to an absolute path makes it work in headless mode.

I.e. there is some difference in behaviour for newer versions of Chrome in headless mode.
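
As an illustration of the workaround, here is a minimal Python sketch that resolves the profile directory to an absolute path before building the Chrome flags. It is not the actual change made in tagui/src/tagui. The function name `chrome_args` and its defaults are hypothetical, while the flags themselves are the ones already used in this thread.

```python
import os

def chrome_args(user_data_dir, debugging_port=9222, headless=True):
    """Build Chrome command-line flags with an absolute --user-data-dir."""
    # os.path.abspath resolves a relative path against the current working
    # directory, so the result depends on where the launcher is started.
    profile = os.path.abspath(user_data_dir)
    args = ['--user-data-dir={}'.format(profile),
            '--remote-debugging-port={}'.format(debugging_port)]
    if headless:
        args.append('--headless')
    return args

print(chrome_args('chrome/tagui_user_profile'))
```

The resulting list can be appended to the browser path and passed to `subprocess.Popen`, in the same way as `start_browser` does in the earlier test script.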

More references on using full path name -

@kensoh
Member Author

kensoh commented Dec 12, 2020

Adding below what I shared with the maintainer of Chrome Remote Interface (another project using the DevTools Protocol) -

It seems like a situation unique to my implementation for TagUI. In headless mode, providing --user-data-dir= with a relative path no longer works, though it worked for the past 2 years. When I change the relative path into a full path, it works in headless mode. In visible mode, it works whether a relative or absolute path is provided.

Something has probably changed in how headless Chrome behaves when the path provided is relative. I'll close this issue because I don't think it happens outside of the TagUI implementation. I tried replicating the issue using Python websocket but could not. So the fix has to be an updated implementation for TagUI headless Chrome.

@kensoh
Member Author

kensoh commented Dec 12, 2020

The above commit fixes headless Chrome on macOS and Linux. To check the status on Windows and see if a fix is required.

kensoh added a commit that referenced this issue Dec 13, 2020
@kensoh kensoh changed the title from "Some issue with Chrome headless - seems to take super long to reply back to TagUI" to "Chrome headless hangs - fixed to work with behaviour of newer Chrome versions" Dec 13, 2020
@kensoh kensoh changed the title from "Chrome headless hangs - fixed to work with behaviour of newer Chrome versions" to "Chrome headless hangs - fix to work with new behaviour of newer Chrome versions" Dec 13, 2020
@kensoh
Member Author

kensoh commented Dec 13, 2020

The above commits fix headless Chrome on Windows, so headless mode now works on all OSes.

Users can download the latest copy of TagUI from the link below and unzip it, then drag the folders under tagui\src over the existing installation to overwrite it - https://github.com/kelaberetiv/TagUI/archive/master.zip

In the next release, this fix will be part of the packaged zip files.

@kensoh
Member Author

kensoh commented Dec 19, 2020

Closing issue since the latest packaged release TagUI v6.14 is out.

Release notes - https://github.com/kelaberetiv/TagUI/releases/tag/v6.14.0
To download v6.14 - https://tagui.readthedocs.io/en/latest/setup.html
Documentation - https://tagui.readthedocs.io/en/latest/index.html
