Hi,

We have 6 ScyllaDB nodes in the cluster. Recently, we pulled one of the servers out for maintenance. We expected requests to be sent to the 5 active nodes and to skip the inactive node under the RoundRobin policy. However, requests are still being sent to the inactive node.
Let's assume that we have:
node_1 - active
node_2 - active
node_3 - active
node_4 - inactive
node_5 - active
node_6 - active
In the configuration, we set node_1, node_2, node_3, node_5, and node_6 as the contact points.
However, during the initialization step the driver queries all peers via `SELECT * FROM system.peers`. The inactive node is returned and initialized with the `Up` status.

Code: cdrs-tokio/cdrs-tokio/src/cluster/metadata_builder.rs, line 37 at 4ed2ea3
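To make the effect concrete, here is a minimal, self-contained sketch (the types and names are hypothetical, not the actual cdrs-tokio API) of what happens when every peer discovered from `system.peers` is optimistically marked `Up`:

```rust
// Hypothetical types, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeState {
    Up,
    Down,
    Unknown,
}

#[derive(Debug)]
struct Node {
    address: &'static str,
    state: NodeState,
}

// Build the node list from every row returned by `SELECT * FROM system.peers`.
// Marking every peer as `Up` here is what lets a node that is physically
// offline enter the rotation.
fn build_metadata(peers: &[&'static str]) -> Vec<Node> {
    peers
        .iter()
        .map(|addr| Node {
            address: addr,
            state: NodeState::Up, // optimistic default, as described above
        })
        .collect()
}

fn main() {
    // node_4 is offline, but it still appears in system.peers.
    let peers = ["node_1", "node_2", "node_3", "node_4", "node_5", "node_6"];
    let nodes = build_metadata(&peers);

    // A plain round-robin over all `Up` nodes still yields node_4,
    // so 1 in 6 requests goes to the dead node until its status changes.
    for (i, node) in nodes.iter().cycle().take(12).enumerate() {
        println!("request {i} -> {} ({:?})", node.address, node.state);
    }
}
```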
Then the round-robin policy keeps trying to send requests to this node for some period (randomly around 10-120 minutes). All of those requests failed until, after some number of minutes, the problem went away on its own (I assume a status-changed event was eventually fired).
In my opinion, a newly added node should not start with the Up status.

I have forked the repo and changed the initial status to Unknown: master...CEmocca:cdrs-tokio:master

This solves the issue in my case, but I'm unsure whether it is the right way to do it in general. I think the status event listener should handle this case, as sketched below.
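Roughly what I have in mind (illustrative names, not the real cdrs-tokio types): peers discovered from `system.peers` start as Unknown, and the status event listener promotes them once the server confirms they are reachable:

```rust
// Hypothetical types, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeState {
    Up,
    Down,
    Unknown,
}

struct Node {
    address: String,
    state: NodeState,
}

impl Node {
    // Peers discovered from system.peers start as Unknown instead of Up.
    fn discovered(address: &str) -> Self {
        Node {
            address: address.into(),
            state: NodeState::Unknown,
        }
    }
}

// The status event listener flips the state once the server tells us
// the node is actually reachable (or gone).
fn on_status_event(nodes: &mut [Node], address: &str, up: bool) {
    if let Some(node) = nodes.iter_mut().find(|n| n.address == address) {
        node.state = if up { NodeState::Up } else { NodeState::Down };
    }
}

fn main() {
    let mut nodes: Vec<Node> = ["node_4", "node_5"]
        .iter()
        .map(|a| Node::discovered(a))
        .collect();

    // node_5 announces itself as up; node_4 stays Unknown and is skipped
    // by the load balancer until it proves otherwise.
    on_status_event(&mut nodes, "node_5", true);

    for node in &nodes {
        println!("{} -> {:?}", node.address, node.state);
    }
}
```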
What do you think?
Nodes with unknown status are ignored by default, so we don't incur the penalty of sending requests to potentially downed ones. They are set to Up after a topology event is received, proving they are indeed up. In essence, this mechanism prevents the effect you are seeing from occurring all the time throughout the lifetime of the client, at the cost of it occurring once on startup.
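For illustration, a hedged sketch of that trade-off (illustrative names, not the real cdrs-tokio implementation): the balancer only hands out nodes already confirmed as Up, so an Unknown peer is skipped until a topology event arrives:

```rust
// Hypothetical types, for illustration only.
#[derive(Clone, Copy, PartialEq)]
enum NodeState {
    Up,
    Unknown,
}

struct Node {
    address: &'static str,
    state: NodeState,
}

// Round-robin restricted to nodes already confirmed as Up.
fn query_plan(nodes: &[Node], start: usize) -> Vec<&'static str> {
    nodes
        .iter()
        .cycle()
        .skip(start % nodes.len())
        .take(nodes.len())
        .filter(|n| n.state == NodeState::Up)
        .map(|n| n.address)
        .collect()
}

fn main() {
    let nodes = [
        Node { address: "node_1", state: NodeState::Up },
        // Freshly discovered peer: not used until a topology event arrives.
        Node { address: "node_4", state: NodeState::Unknown },
        Node { address: "node_5", state: NodeState::Up },
    ];

    // node_4 never shows up in the plan while it is Unknown.
    println!("{:?}", query_plan(&nodes, 0));
}
```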