
[🐛 Bug]: se:forwardCdp is returning the internal docker IP which causes Dynamic distributed selenium grid to not work correctly #11910

Closed
PinhoL opened this issue Apr 17, 2023 · 16 comments


PinhoL commented Apr 17, 2023

What happened?

I have a Selenium Grid set up across different VMs where I can run tests successfully. I use WebdriverIO to run the tests, and everything works fine.
These are the se: capabilities returned:

[0-0]   'se:bidiEnabled': false,
[0-0]   'se:cdp': 'ws://VIP-MACHINE-1:4444/session/09a56f82da1526f0e8fc61d1e17d020c/se/cdp',
[0-0]   'se:cdpVersion': '111.0.5563.110',
[0-0]   'se:vnc': 'ws://VIP-MACHINE-1:4444/session/09a56f82da1526f0e8fc61d1e17d020c/se/vnc',
[0-0]   'se:vncEnabled': true,
[0-0]   'se:vncLocalAddress': 'ws://172.17.0.2:7900',

(I start my chrome container with the --grid-url http://VIP-MACHINE-1:4444 flag)

I'm now trying to implement a dynamic Selenium Grid 4 across different VMs. I was able to create it successfully; however, the tests can't run because they're trying to connect to an unreachable IP.
With this dynamic approach, I get the following se: capabilities returned:

[0-0]   'se:bidiEnabled': false,
[0-0]   'se:cdp': 'ws://VIP-MACHINE-1:4444/session/be2c51a8a4be7f22cc82d13525aac4d8/se/cdp',
[0-0]   'se:cdpVersion': '111.0.5563.146',
[0-0]   'se:forwardCdp': 'ws://172.17.0.6:6211/session/be2c51a8a4be7f22cc82d13525aac4d8/se/fwd',
[0-0]   'se:vnc': 'ws://VIP-MACHINE-1:4444/session/be2c51a8a4be7f22cc82d13525aac4d8/se/vnc',
[0-0]   'se:vncEnabled': true,
[0-0]   'se:vncLocalAddress': 'ws://172.17.0.6:7900',

From the code used in WDIO, https://github.com/webdriverio/webdriverio/blob/main/packages/webdriverio/src/commands/browser/getPuppeteer.ts#L60, I see that it looks at the se:cdp endpoint and tries to connect to it. However, I think the se:forwardCdp endpoint is somehow overriding it (which makes sense, because the session is not present on the node-docker container but on the new container it creates). I have this log on my container:

14:19:33.088 INFO [ProxyNodeWebsockets.apply] - Found endpoint where CDP connection needs to be forwarded
14:19:33.090 INFO [ProxyNodeWebsockets.createWsEndPoint] - Establishing connection to ws://172.17.0.6:6211/session/be2c51a8a4be7f22cc82d13525aac4d8/se/fwd

In sum, this is problematic because we're not able to connect to 172.17.0.6, since that's the internal container IP. I found this comment by @diemol, #9202 (comment), where you were perhaps fixing the same problem? Maybe we need a flag for the forwardCdp parameter? (Ideally not, because I think the address should be the same as se:cdp.)
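
For context, this is roughly what the client ends up doing with those capabilities (a simplified sketch with puppeteer-core, not the actual WDIO implementation; the function and variable names are mine):

// Simplified sketch (not the actual WDIO code) of how a client connects
// Puppeteer to the Grid: it takes the public CDP endpoint advertised in the
// session capabilities and opens a websocket to it.
const puppeteer = require('puppeteer-core');

async function attachPuppeteer(capabilities) {
    // 'se:cdp' is the public endpoint the Grid advertises; 'se:forwardCdp' is
    // the internal address the node-docker uses to reach the child container.
    const cdpEndpoint = capabilities['se:cdp'];

    // This only works if the Grid transparently forwards the websocket to the
    // child container behind the scenes.
    return puppeteer.connect({ browserWSEndpoint: cdpEndpoint });
}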

How can we reproduce the issue?

Started the Hub on Box 1:

docker run -p 4442-4444:4442-4444 selenium/hub:4.8.3-20230404

Started the Node on Box 2:

docker run --net=host -p 5555:5555 \
    -e SE_EVENT_BUS_HOST=<IP-VM1> \
    -e SE_NODE_GRID_URL=http://<IP-VM1>:4444 \
    -e SE_NODE_HOST=<IP-VM2> \
    -e SE_NODE_PORT=5555 \
    -e START_XVFB=false \
    -e SE_EVENT_BUS_PUBLISH_PORT=4442 \
    -e SE_EVENT_BUS_SUBSCRIBE_PORT=4443 \
    -e SE_NODE_OVERRIDE_MAX_SESSIONS=true \
    -v ${PWD}/config.toml:/opt/bin/config.toml \
    -v ${PWD}/assets:/opt/selenium/assets \
    -v /var/run/docker.sock:/var/run/docker.sock \
    selenium/node-docker:4.8.3-20230404

I ran a simple test in WDIO:

describe("webdriver.io page", () => {
    it("should have the right title", async () => {
        const pageMock = await browser.mock("https://google.com/");
        pageMock.respond("https://webdriver.io");
        await browser.url("https://google.com");
        console.log(await browser.getTitle()); // returns "WebdriverIO · Next-gen browser and mobile automation test framework for Node.js"
    });
});

Basically, anything that reaches this point, https://github.com/webdriverio/webdriverio/blob/main/packages/webdriverio/src/commands/browser/getPuppeteer.ts#L60, which makes Puppeteer connect to the Grid websocket.



Relevant log output

NA

Operating System

CentOS 7

Selenium version

NA

What are the browser(s) and version(s) where you see this issue?

Node-Docker

What are the browser driver(s) and version(s) where you see this issue?

NA

Are you using Selenium Grid?

4.8.3-20230404

@github-actions

@PinhoL, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

@PinhoL PinhoL changed the title [🐛 Bug]: se:forwardCdp is not overridden by the --grid-url flag [🐛 Bug]: se:forwardCdp is returning the internal docker IP Apr 17, 2023
@PinhoL PinhoL changed the title [🐛 Bug]: se:forwardCdp is returning the internal docker IP [🐛 Bug]: se:forwardCdp is returning the internal docker IP which causes distributed selenium grid to not work correctly Apr 17, 2023
@PinhoL PinhoL changed the title [🐛 Bug]: se:forwardCdp is returning the internal docker IP which causes distributed selenium grid to not work correctly [🐛 Bug]: se:forwardCdp is returning the internal docker IP which causes Dynamic distributed selenium grid to not work correctly Apr 17, 2023

diemol commented Apr 17, 2023

se:forwardCdp is used internally by the Grid. It has to be that internal IP because it is what the node-docker can use to forward the connection to the child container. You cannot connect to it directly; it is an internal Docker IP.

I'm not sure what the issue is, because the client should connect to the given "public" URL and the node-docker takes care of redirecting and forwarding. So far, it has been working well with the Selenium bindings for quite a while.


PinhoL commented Apr 17, 2023

Hello @diemol thanks a lot for the answer.

From my understanding, Puppeteer needs an endpoint to connect to. Previously, when #9202 was implemented, we had a hub-node connection, and passing --grid-url solved the issue. Now the Grid is forwarding the connection to the new node using the container's internal IP, right?

On my side, what is happening is that Puppeteer tries to connect to the URL passed via the --grid-url flag, but it fails because it eventually has to connect to the newly created node via the se:forwardCdp endpoint, which is unreachable.

Something like:

this.puppeteer = await puppeteer.connect({browserWSEndpoint: "VIP-VM1"})

gives me the following in the node-docker logs:

[ProxyNodeWebsockets.apply] - Found endpoint where CDP connection needs to be forwarded
[ProxyNodeWebsockets.createWsEndPoint] - Establishing connection to ws://172.17.0.6:6211/session/be2c51a8a4be7f22cc82d13525aac4d8/se/fwd

This makes me believe that the connection to that websocket cannot be made because the Puppeteer instance is not running on that Docker container. It is a little similar to what I believe was happening in #9202 (comment), where the connection could not be made because the IP returned was the internal IP. If the Grid automatically redirects the websocket when se:forwardCdp is present, and I can't override it, then I think Puppeteer can never connect to that spawned container.
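
To take WDIO and Puppeteer out of the equation, a minimal probe against the session's public se:cdp endpoint should show the same behaviour (this is just a sketch using the ws npm package; the host and session id are placeholders):

// Minimal probe (sketch, assumes the 'ws' npm package) against the public CDP
// endpoint returned in 'se:cdp'. Host and session id are placeholders.
const WebSocket = require('ws');

const cdpUrl = 'ws://VIP-HUB:4444/session/<session-id>/se/cdp';

const socket = new WebSocket(cdpUrl);
socket.on('open', () => {
    console.log('CDP websocket opened: forwarding to the child container works');
    socket.close();
});
socket.on('error', (err) => {
    // On my dynamic grid setup this is where the socket hangs up, because the
    // node-docker cannot reach the se:forwardCdp address behind it.
    console.error('CDP websocket failed:', err.message);
});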


diemol commented Apr 17, 2023

That is strange, because when the Selenium bindings use CDP features over websockets, things work normally: the client connects to the Grid URL and the internal forwarding does not affect things at all.

I cannot say what Puppeteer is doing, so I am not sure what needs to be done. If you provide a way to reproduce it with Selenium, I'd be happy to look at it. Otherwise, I'd be happy to review a PR that helps Puppeteer work properly with the Grid.


PinhoL commented Apr 18, 2023

I've created a small example, https://github.com/PinhoL/dynamic-grid, that runs a test against the dynamic grid to demonstrate this error. Just run npm i, change https://github.com/PinhoL/dynamic-grid/blob/main/wdio.conf.js#L31 to the hub IP/VIP, and then run npm run test.

My grid setup is not public, so unfortunately I can't provide the endpoint; however, these are the commands I used to spin up my grid:

Hub:

docker run -p 4442-4444:4442-4444 selenium/hub:4.8.3-20230404

Node:

docker run --net=host -p 5555:5555 \
    -e SE_EVENT_BUS_HOST=VIP-HUB \
    -e SE_NODE_GRID_URL=http://VIP-HUB:4444 \
    -e SE_NODE_HOST=VM2-IP \
    -e SE_NODE_PORT=5555 \
    -e START_XVFB=false \
    -e SE_EVENT_BUS_PUBLISH_PORT=4442 \
    -e SE_EVENT_BUS_SUBSCRIBE_PORT=4443 \
    -e SE_NODE_OVERRIDE_MAX_SESSIONS=true \
    -v ${PWD}/config.toml:/opt/bin/config.toml \
    -v ${PWD}/assets:/opt/selenium/assets \
    -v /var/run/docker.sock:/var/run/docker.sock \
    selenium/node-docker:4.8.3-20230404

And my config.toml:

[docker]
configs = [
    "selenium/standalone-chrome:4.8.3-20230404", '{"browserName": "chrome"}'
]
url = "http://127.0.0.1:2375"
video-image = "selenium/video:ffmpeg-4.3.1-20230404"
assets-path = "/opt/selenium/assets"

[server]
host = "IP"
port = "4444"

Some notes:
This is where Puppeteer comes into action: https://github.com/webdriverio/webdriverio/blob/main/packages/webdriverio/src/commands/browser/getPuppeteer.ts#L62
It connects directly to the returned se:cdp endpoint. This is what the config holds when it reaches that point:

[0-0] CAPS  {
[0-0]   acceptInsecureCerts: true,
[0-0]   browserName: 'chrome',
[0-0]   browserVersion: '111.0.5563.146',
[0-0]   chrome: {
[0-0]     chromedriverVersion: '111.0.5563.64 (c710e93d5b63b7095afe8c2c17df34408078439d-refs/branch-heads/5563@{#995})',
[0-0]     userDataDir: '/tmp/.com.google.Chrome.xN5vcw'
[0-0]   },
[0-0]   'goog:chromeOptions': { debuggerAddress: 'localhost:46813' },
[0-0]   networkConnectionEnabled: false,
[0-0]   pageLoadStrategy: 'normal',
[0-0]   platformName: 'LINUX',
[0-0]   proxy: {},
[0-0]   'se:bidiEnabled': false,
[0-0]   'se:cdp': 'ws://VIP-HUB:4444/session/835ea671d818a2dcee991ce6c5351e5b/se/cdp',
[0-0]   'se:cdpVersion': '111.0.5563.146',
[0-0]   'se:forwardCdp': 'ws://172.17.0.2:26637/session/835ea671d818a2dcee991ce6c5351e5b/se/fwd',
[0-0]   'se:vnc': 'ws://VIP-HUB:4444/session/835ea671d818a2dcee991ce6c5351e5b/se/vnc',
[0-0]   'se:vncEnabled': true,
[0-0]   'se:vncLocalAddress': 'ws://172.17.0.2:7900',
[0-0]   setWindowRect: true,
[0-0]   strictFileInteractability: false,
[0-0]   timeouts: { implicit: 0, pageLoad: 300000, script: 30000 },
[0-0]   unhandledPromptBehavior: 'dismiss and notify',
[0-0]   'wdio:devtoolsOptions': { headless: true },
[0-0]   'webauthn:extension:credBlob': true,
[0-0]   'webauthn:extension:largeBlob': true,
[0-0]   'webauthn:extension:minPinLength': true,
[0-0]   'webauthn:extension:prf': true,
[0-0]   'webauthn:virtualAuthenticators': true
[0-0] }

And at that instruction, which is native to the puppeteer-core project, I get this log on the node-docker:

09:19:18.386 INFO [LocalNode.newSession] - Session created by the Node. Id: 835ea671d818a2dcee991ce6c5351e5b, Caps: Capabilities {acceptInsecureCerts: true, browserName: chrome, browserVersion: 111.0.5563.146, chrome: {chromedriverVersion: 111.0.5563.64 (c710e93d5b63..., userDataDir: /tmp/.com.google.Chrome.xN5vcw}, goog:chromeOptions: {debuggerAddress: localhost:46813}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: LINUX, proxy: Proxy(), se:bidiEnabled: false, se:cdp: ws://VIP-HUB..., se:cdpVersion: 111.0.5563.146, se:forwardCdp: ws://172.17.0.2:26637/sessi..., se:vnc: ws://VIP-HUB..., se:vncEnabled: true, se:vncLocalAddress: ws://172.17.0.2:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, wdio:devtoolsOptions: {headless: true}, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
09:19:18.526 INFO [ProxyNodeWebsockets.apply] - Found endpoint where CDP connection needs to be forwarded
09:19:18.527 INFO [ProxyNodeWebsockets.createWsEndPoint] - Establishing connection to ws://172.17.0.2:26637/session/835ea671d818a2dcee991ce6c5351e5b/se/fwd
09:19:18.532 WARN [DefaultChannelPipeline.onUnhandledInboundException] - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.lang.NullPointerException
	at org.openqa.selenium.netty.server.RequestConverter.channelRead0(RequestConverter.java:131)
	at org.openqa.selenium.netty.server.RequestConverter.channelRead0(RequestConverter.java:55)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:102)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:102)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93)
	at org.openqa.selenium.netty.server.WebSocketUpgradeHandler.channelRead(WebSocketUpgradeHandler.java:100)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93)
	at io.netty.handler.codec.http.websocketx.extensions.WebSocketServerExtensionHandler.channelRead(WebSocketServerExtensionHandler.java:109)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93)
	at io.netty.handler.codec.http.HttpServerKeepAliveHandler.channelRead(HttpServerKeepAliveHandler.java:64)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)


PinhoL commented Apr 18, 2023

This is an example of a log from the container spawned by the node-docker:

2023-04-18 10:20:50,477 INFO Included extra file "/etc/supervisor/conf.d/selenium.conf" during parsing
2023-04-18 10:20:50,481 INFO RPC interface 'supervisor' initialized
2023-04-18 10:20:50,481 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2023-04-18 10:20:50,482 INFO supervisord started with pid 9
2023-04-18 10:20:51,485 INFO spawned: 'xvfb' with pid 11
2023-04-18 10:20:51,488 INFO spawned: 'vnc' with pid 12
2023-04-18 10:20:51,491 INFO spawned: 'novnc' with pid 13
2023-04-18 10:20:51,494 INFO spawned: 'selenium-standalone' with pid 14
Appending Selenium options: --log-level CONFIG
2023-04-18 10:20:51,504 INFO success: xvfb entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2023-04-18 10:20:51,505 INFO success: vnc entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2023-04-18 10:20:51,505 INFO success: novnc entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2023-04-18 10:20:51,505 INFO success: selenium-standalone entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
Selenium Grid Standalone configuration:
[network]
relax-checks = true

[node]
grid-url = "http://VIP-HUB:4444"
drain-after-session-count = 0

[[node.driver-configuration]]
display-name = "chrome"
stereotype = '{"browserName": "chrome", "browserVersion": "111.0", "platformName": "Linux"}'

Starting Selenium Grid Standalone...
Tracing is disabled
10:20:52.127 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
10:20:52.134 INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
10:20:52.986 INFO [NodeOptions.getSessionFactories] - Detected 8 available processors
10:20:52.987 WARN [NodeOptions.getSessionFactories] - Overriding max recommended number of 8 concurrent sessions. Session stability and reliability might suffer!
10:20:52.988 WARN [NodeOptions.getSessionFactories] - One browser session is recommended per available processor. Safari is always limited to 1 session per host.
10:20:52.988 WARN [NodeOptions.getSessionFactories] - Overriding this value for Internet Explorer is not recommended. Issues related to parallel testing with Internet Explored won't be accepted.
10:20:52.988 WARN [NodeOptions.getSessionFactories] - Double check if enabling 'override-max-sessions' is really needed
10:20:53.048 INFO [NodeOptions.report] - Adding chrome for {"browserVersion": "111.0","se:noVncPort": 7900,"browserName": "chrome","platformName": "LINUX","se:vncEnabled": true} 2 times (Host)
10:20:53.069 INFO [Node.<init>] - Binding additional locator mechanisms: relative
10:20:53.094 INFO [GridModel.setAvailability] - Switching Node 6add8b8b-eadf-49dd-aa9c-317b8877d936 (uri: http://172.17.0.2:4444) from DOWN to UP
10:20:53.094 INFO [LocalDistributor.add] - Added node 6add8b8b-eadf-49dd-aa9c-317b8877d936 at http://172.17.0.2:4444. Health check every 120s
10:20:53.296 INFO [Standalone.execute] - Started Selenium Standalone 4.8.3 (revision b19b418e60): http://172.17.0.2:4444
10:20:53.710 INFO [LocalDistributor.newSession] - Session request received by the Distributor:
 [Capabilities {acceptInsecureCerts: true, browserName: chrome, wdio:devtoolsOptions: {headless: true}}]
Starting ChromeDriver 111.0.5563.64 (c710e93d5b63b7095afe8c2c17df34408078439d-refs/branch-heads/5563@{#995}) on port 30429
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
[1681813253.755][SEVERE]: CreatePlatformSocket() failed: Address family not supported by protocol (97)
[1681813254.456][SEVERE]: CreatePlatformSocket() failed: Address family not supported by protocol (97)
10:20:54.734 INFO [LocalNode.newSession] - Session created by the Node. Id: 2607831b996107509e414150b35c3337, Caps: Capabilities {acceptInsecureCerts: true, browserName: chrome, browserVersion: 111.0.5563.146, chrome: {chromedriverVersion: 111.0.5563.64 (c710e93d5b63..., userDataDir: /tmp/.com.google.Chrome.qeta2x}, goog:chromeOptions: {debuggerAddress: localhost:35846}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: LINUX, proxy: Proxy(), se:cdp: http://localhost:35846, se:cdpVersion: 111.0.5563.146, se:vncEnabled: true, se:vncLocalAddress: ws://172.17.0.2:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
10:20:54.743 INFO [LocalDistributor.newSession] - Session created by the Distributor. Id: 2607831b996107509e414150b35c3337
 Caps: Capabilities {acceptInsecureCerts: true, browserName: chrome, browserVersion: 111.0.5563.146, chrome: {chromedriverVersion: 111.0.5563.64 (c710e93d5b63..., userDataDir: /tmp/.com.google.Chrome.qeta2x}, goog:chromeOptions: {debuggerAddress: localhost:35846}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: LINUX, proxy: Proxy(), se:bidiEnabled: false, se:cdp: ws://VIP-HUB..., se:cdpVersion: 111.0.5563.146, se:vnc: ws://VIP-HUB..., se:vncEnabled: true, se:vncLocalAddress: ws://172.17.0.2:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, wdio:devtoolsOptions: {headless: true}, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
10:20:55.406 INFO [LocalSessionMap.lambda$new$0] - Deleted session from local Session Map, Id: 2607831b996107509e414150b35c3337
10:20:55.408 INFO [GridModel.release] - Releasing slot for session id 2607831b996107509e414150b35c3337
10:20:55.410 INFO [SessionSlot.stop] - Stopping session 2607831b996107509e414150b35c3337
Trapped SIGTERM/SIGINT/x so shutting down supervisord...
2023-04-18 10:20:56,413 WARN received SIGTERM indicating exit request
2023-04-18 10:20:56,413 INFO waiting for xvfb, vnc, novnc, selenium-standalone to die
2023-04-18 10:20:56,748 INFO stopped: selenium-standalone (terminated by SIGTERM)
2023-04-18 10:20:57,750 INFO stopped: novnc (terminated by SIGTERM)


PinhoL commented Apr 18, 2023

Another test I did:

Having this session:
(screenshot of the active session)

And querying that websocket:
(screenshot of the websocket query)

I get the socket error.

However, if I do the same on my "old" setup (hub -> chrome), I can successfully connect to it:
(screenshot of the successful connection)

Could it be that the --net=host parameter I use to start my node-docker is impacting all this, because the internal Docker IP gets messed up?
While running a test, I also tested this scenario:

[root@VM2 pinhol]# docker ps
CONTAINER ID   IMAGE                                 COMMAND                  CREATED         STATUS         PORTS                               NAMES
fa7f56873908   1ac7ccac8ba0                          "/opt/bin/entry_poin…"   4 seconds ago   Up 3 seconds   5900/tcp, 0.0.0.0:23141->4444/tcp   dreamy_wozniak
86e82dbc655c   selenium/node-docker:4.8.3-20230404   "/opt/bin/entry_poin…"   9 minutes ago   Up 9 minutes                                       happy_thompson
[root@VM2 pinhol]# curl -vvv localhost:23141
* About to connect() to localhost port 23141 (#0)
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 23141 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:23141
> Accept: */*
>
< HTTP/1.1 302 Found
< content-length: 0
< Location: /ui
<
* Connection #0 to host localhost left intact

So that port (the one exposed by the child container) is accessible through localhost, but the node is using this address to connect:

[root@VM2 pinhol]# curl -vvv 172.17.0.2:23141
* About to connect() to 172.17.0.2 port 23141 (#0)
*   Trying 172.17.0.2...
* Connection refused
* Failed connect to 172.17.0.2:23141; Connection refused
* Closing connection 0
curl: (7) Failed connect to 172.17.0.2:23141; Connection refused

And this one I can't access. Maybe it's because I run the node-docker with --net=host while the child container is not attached to the host network; is this where it gets messed up?

EDIT: I confirmed that the node-docker container is connected to the host Docker network, while the child is connected to the bridge network. Now, if you confirm that this is most likely the problem here, another question arises: should the child container be on the same network as the parent? (I think it should, because we can have all sorts of network configurations and we always want the spawned container to be on the same network as its parent, since the parent is the one connected to the hub.)
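
To make that comparison easy to repeat, the two curl checks above can also be expressed as a small Node script (just a sketch using Node's built-in net module, with the port taken from my docker ps output):

// Reachability check (sketch, Node.js built-ins only) mirroring the curl tests
// above: the child container's mapped port answers on localhost but is refused
// on the bridge IP when the node-docker itself runs with --net=host.
const net = require('net');

function probe(host, port) {
    return new Promise((resolve) => {
        const socket = net.connect({ host, port, timeout: 2000 });
        socket.on('connect', () => { socket.end(); resolve(`${host}:${port} reachable`); });
        socket.on('error', (err) => resolve(`${host}:${port} failed: ${err.code}`));
        socket.on('timeout', () => { socket.destroy(); resolve(`${host}:${port} timed out`); });
    });
}

(async () => {
    console.log(await probe('127.0.0.1', 23141));  // works
    console.log(await probe('172.17.0.2', 23141)); // connection refused
})();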

@diemol diemol added this to the 4.10 milestone Apr 19, 2023

diemol commented Apr 25, 2023

I was able to reproduce the issue using Java Selenium and your test as well.

For some reason that I do not completely understand, when the WebSocket was being closed, Chrome was responding with -1 as the statusCode. That seems incorrect on the browser side, but 🤷

I've added code to handle that situation, and it is working now. This should be released in ~1 week.

@diemol diemol closed this as completed in 1ea3134 Apr 25, 2023

PinhoL commented Apr 26, 2023

That's great @diemol thanks a lot! 🙇


PinhoL commented May 15, 2023

Hey @diemol, hope you're well. I've tested today with the newest version (4.9.1 hub/node), but the problem persists. When you did your tests, did you test with the --net=host parameter?


diemol commented May 15, 2023

The failure was present even without the --net=host. Does it work when you do not specify --net=host?


PinhoL commented May 15, 2023

Yes, without that parameter it works. It seems that if there's a mix-up of networks (the node container is started on host but the actual Chrome container is started on bridge), then the CDP connection is not created and the socket hangs up. I'm not too well versed in this, but doesn't the address 172.17.0.2 point to different things across container networks?


diemol commented May 16, 2023

I can see it does not work when using host as a network. Why do you need to use host?


diemol commented May 16, 2023

I was making changes to the code to see whether supporting host was easy or not, and I ended up making a mess because several places need changes. Plus, the way the video recording works would mean that we need to change the image structure and variables for that.

I am sorry to say, but for now I do not think we are going to support the host network. Maybe we will end up moving video recording to the main image, and that will make things easier.


PinhoL commented May 18, 2023

Thanks for taking the time to look into this, @diemol!
We use host networking to optimize performance: we need to handle a large range of ports (we're running lots of instances in parallel and it's much easier to expose the host port range than to manage it on the container itself), and host networking does not require network address translation (NAT).

I understand that this might be a big change and it's not easy to ship. I'll keep my eyes on this, and if it is eventually released in the future, I'll be able to use it. Again, thanks for taking the time to look into it.


github-actions bot commented Dec 9, 2023

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 9, 2023