Describe the bug

The current substrate client implementation in impl.go has several critical issues:

- No real connection pooling
- Thread safety issues
- No proactive health checking
- No failover mechanism

Current behavior:
While the client can initialize a connection from multiple URLs, it only uses one of them to establish a single active connection. During operation, it lacks a failover system to switch to a new URL if the initial connection goes down. Instead, it reconnects only to the same URL when it comes back online. If the initially selected RPC node becomes unavailable, the client is unable to send transactions until that specific node is restored.
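To make the missing pieces concrete, the sketch below shows the kind of behavior being asked for: the active connection is guarded by a mutex, a failure rotates to the next configured URL instead of waiting for the original node, and a background loop proactively health-checks the connection. This is not taken from impl.go; the `Conn` interface and `dial` function are illustrative stand-ins for whatever websocket client the real implementation wraps.

```go
// Package failoverpool is an illustrative sketch only; none of these names
// come from impl.go.
package failoverpool

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Conn stands in for whatever RPC connection impl.go actually wraps.
type Conn interface {
	Ping() error
	Close() error
}

// Client keeps every configured URL and rotates to the next one whenever the
// active connection stops answering, instead of reconnecting to the same node.
type Client struct {
	mu     sync.Mutex // guards active and next (the thread-safety concern)
	urls   []string
	next   int
	dial   func(url string) (Conn, error)
	active Conn
}

func New(dial func(string) (Conn, error), urls ...string) (*Client, error) {
	if len(urls) == 0 {
		return nil, errors.New("at least one RPC URL is required")
	}
	return &Client{urls: urls, dial: dial}, nil
}

// Conn returns a healthy connection, failing over through the configured URLs
// until one of them answers.
func (c *Client) Conn() (Conn, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.active != nil && c.active.Ping() == nil {
		return c.active, nil
	}
	if c.active != nil {
		_ = c.active.Close() // drop the broken connection
		c.active = nil
	}
	for i := 0; i < len(c.urls); i++ {
		url := c.urls[c.next]
		c.next = (c.next + 1) % len(c.urls)
		conn, err := c.dial(url)
		if err != nil {
			fmt.Printf("connecting to %s failed: %v\n", url, err)
			continue
		}
		c.active = conn
		return conn, nil
	}
	return nil, errors.New("all RPC endpoints are unreachable")
}

// HealthLoop pings periodically so a dead endpoint is replaced before the
// next transaction needs the connection.
func (c *Client) HealthLoop(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-stop:
			return
		case <-t.C:
			if _, err := c.Conn(); err != nil {
				fmt.Println("health check:", err)
			}
		}
	}
}
```

A real implementation would reuse the existing connection type from impl.go rather than introducing a new interface; the point of the sketch is only the rotate-on-failure and periodic-health-check behavior.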
To Reproduce
I've included steps here to replicate a recent issue we encountered where some RPC nodes went offline, leading to missed uptime reports from many ZOS nodes. The same steps can be used to verify the fix once the connection failover feature has been implemented; they are listed under "Steps to reproduce the behavior" below.
sameh-farouk changed the title from "Go Substrate Client - Connection Pool Implementation" to "Go Substrate Client - Connection Failover Implementation" on Nov 25, 2024.
Verified: This feature was deployed to Devnet as part of ZOS. A failover occurred after the operations team was requested to stop one RPC node on Devnet. After the connection was closed, the system switched to another URL on the first use.
Steps to reproduce the behavior:
1. cd /Projects/tfchain/clients
2. Create main.go with a small program that connects through the client using several RPC URLs and makes a call periodically (a stand-in sketch follows this list), then run it.
3. Check the "connecting to..." log message to identify the chosen URL/port.
4. Stop that node using docker stop {name} and observe that the client fails to switch to a different URL.
5. Start it again with docker start {name} and notice that the client reconnects only after the original node is online again.
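The code block referenced in step 2 did not come through in this copy of the issue. The program below is only a hedged stand-in: it assumes the client package exposes something like NewManager(urls ...string) and Manager.Substrate(), that the import path shown is roughly right, and that Time() exists as a cheap read call; adjust all of these to match what impl.go actually provides. The endpoint URLs are examples only.

```go
// main.go — an illustrative stand-in for the missing reproduction snippet.
// Assumptions (may differ from the real client): the package is imported as
// "substrate", it exposes NewManager(urls...) and Manager.Substrate(), and
// Time() is just a placeholder for any cheap read call.
package main

import (
	"log"
	"time"

	// Assumed import path; adjust to wherever the client package lives.
	substrate "github.com/threefoldtech/tfchain/clients/tfchain-client-go"
)

func main() {
	// Two (or more) RPC endpoints; the whole point of the test is that the
	// client should be able to fall back to the second one.
	mgr := substrate.NewManager(
		"wss://tfchain.dev.grid.tf/ws",
		"wss://tfchain.02.dev.grid.tf/ws",
	)

	sub, err := mgr.Substrate() // watch the "connecting to..." log line here
	if err != nil {
		log.Fatalln("failed to connect:", err)
	}

	for {
		// Periodic traffic so the behaviour after stopping the chosen node
		// is observable in the logs.
		if t, err := sub.Time(); err != nil {
			log.Println("call failed:", err)
		} else {
			log.Println("chain time:", t)
		}
		time.Sleep(10 * time.Second)
	}
}
```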