
Request is sent to inactive node #195

Open
CEmocca opened this issue Jan 7, 2025 · 1 comment

Comments


CEmocca commented Jan 7, 2025

Hi,

We have 6 ScyllaDB nodes in the cluster. Recently, we pulled one of the servers out for maintenance. We expected requests to be sent to the 5 active nodes, skipping the inactive one under the RoundRobin policy. However, requests are still being sent to the inactive node.

Let's assume that we have

  • node_1 - active
  • node_2 - active
  • node_3 - active
  • node_4 - inactive
  • node_5 - active
  • node_6 - active

In the configuration, we set node_1, node_2, node_3, node_5, and node_6 as contact points.
However, during initialization the driver queries all peers via SELECT * FROM system.peers. The inactive node is returned and initialized with the Up status.

Then, the round-robin policy keeps trying to send requests to this node for some period (randomly around 10-120 minutes). All requests to it failed until, after x minutes, the problem went away. (I assume a status-changed event was fired.)
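To illustrate the behavior: a round robin that cycles over all known nodes by index, without consulting their status, will keep routing a share of requests to node_4 as long as it is marked Up. This is a minimal standalone sketch with hypothetical types, not the actual cdrs-tokio load-balancing code:

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status { Up, Down }

struct Node { name: &'static str, status: Status }

struct RoundRobin { next: usize }

impl RoundRobin {
    // Picks the next node purely by index, without checking its status.
    fn pick<'a>(&mut self, nodes: &'a [Node]) -> &'a Node {
        let node = &nodes[self.next % nodes.len()];
        self.next += 1;
        node
    }
}

// The six-node cluster from the report; node_4 is actually down,
// but the driver initialized it as Up.
fn cluster() -> Vec<Node> {
    vec![
        Node { name: "node_1", status: Status::Up },
        Node { name: "node_2", status: Status::Up },
        Node { name: "node_3", status: Status::Up },
        Node { name: "node_4", status: Status::Up }, // really inactive
        Node { name: "node_5", status: Status::Up },
        Node { name: "node_6", status: Status::Up },
    ]
}

fn main() {
    let nodes = cluster();
    let mut rr = RoundRobin { next: 0 };
    let picks: Vec<&'static str> = (0..12).map(|_| rr.pick(&nodes).name).collect();
    // Every 6th request lands on the inactive node_4 until a status event corrects it.
    let node_4_hits = picks.iter().filter(|n| **n == "node_4").count();
    println!("node_4 received {} of {} requests", node_4_hits, picks.len());
}
```

Until the driver learns the node is down, roughly 1/6 of requests fail, which matches the intermittent errors seen for 10-120 minutes.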

Debug log

2025-01-07T14:10:59.267467Z DEBUG ThreadId(03) cdrs_tokio::cluster::control_connection: 95: Establishing new control connection...
2025-01-07T14:10:59.267705Z DEBUG ThreadId(03) cdrs_tokio::cluster::topology::node: 233: Establishing new connection to node...
2025-01-07T14:10:59.374566Z DEBUG ThreadId(11) cdrs_tokio::cluster::control_connection: 121: Established new control connection.
2025-01-07T14:10:59.496255Z DEBUG ThreadId(11) cdrs_tokio::cluster::metadata_builder: 27: Copying contact point. node_info=NodeInfo { host_id: 23db827d-4f95-4d67-b642-7409d0076ab9, broadcast_rpc_address: node_1:9042, broadcast_address: None, datacenter: "th", rack: "rack2" }
2025-01-07T14:10:59.496332Z DEBUG ThreadId(11) cdrs_tokio::cluster::metadata_builder: 27: Copying contact point. node_info=NodeInfo { host_id: 290c2a93-9d24-4dab-9f2a-32a7bbeb3804, broadcast_rpc_address: node_2:9042, broadcast_address: None, datacenter: "th", rack: "rack3" }
2025-01-07T14:10:59.496355Z DEBUG ThreadId(11) cdrs_tokio::cluster::metadata_builder: 27: Copying contact point. node_info=NodeInfo { host_id: bea093f2-a031-421e-8e57-4b09d4855d98, broadcast_rpc_address: node_3:9042, broadcast_address: None, datacenter: "th", rack: "rack3" }
2025-01-07T14:10:59.496376Z DEBUG ThreadId(11) cdrs_tokio::cluster::metadata_builder: 30: Adding new node. node_info=NodeInfo { host_id: c82e11de-ac57-4257-848c-6e0bfee14bfd, broadcast_rpc_address: node_4:9042, broadcast_address: None, datacenter: "th", rack: "rack2" }
2025-01-07T14:10:59.496397Z DEBUG ThreadId(11) cdrs_tokio::cluster::metadata_builder: 27: Copying contact point. node_info=NodeInfo { host_id: 022c3be3-c7e0-43c3-869d-5a7df5a1e805, broadcast_rpc_address: node_5:9042, broadcast_address: None, datacenter: "th", rack: "rack1" }
2025-01-07T14:10:59.496455Z DEBUG ThreadId(11) cdrs_tokio::cluster::metadata_builder: 27: Copying contact point. node_info=NodeInfo { host_id: 9ca4d73c-7fb0-443b-908b-ba155dc0e0cc, broadcast_rpc_address: node_6:9042, broadcast_address: None, datacenter: "th", rack: "rack1" }

In my opinion, a newly added node should not start with the Up status.
I have forked the repo and changed the initial status to Unknown:
master...CEmocca:cdrs-tokio:master

This solves the issue in my case, but I'm unsure whether it is the right fix in general. I think the status event listener should handle this case.

What do you think?

krojew (Owner) commented Jan 10, 2025

Nodes with unknown status are ignored by default, so we don't incur the penalty of sending requests to potentially downed ones. They are set to up after a topology event is received, proving they are indeed up. In essence, this mechanism prevents the effect you are seeing from occurring all the time through the lifetime of the client, at the cost of occurring once on startup.
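The mechanism described above can be sketched like this (hypothetical helper names, not the real cdrs-tokio API): the query plan only includes nodes proven Up, and an Unknown node is promoted once a topology/status event confirms it is alive:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeState { Up, Unknown }

struct Node { address: &'static str, state: NodeState }

// Only nodes proven Up are eligible to receive requests;
// Unknown nodes are ignored to avoid hitting potentially down ones.
fn query_plan(nodes: &[Node]) -> Vec<&'static str> {
    nodes
        .iter()
        .filter(|n| n.state == NodeState::Up)
        .map(|n| n.address)
        .collect()
}

// A "node up" status/topology event promotes the node to Up.
fn on_node_up(nodes: &mut [Node], address: &str) {
    if let Some(n) = nodes.iter_mut().find(|n| n.address == address) {
        n.state = NodeState::Up;
    }
}

fn main() {
    let mut nodes = vec![
        Node { address: "node_1:9042", state: NodeState::Up },
        Node { address: "node_4:9042", state: NodeState::Unknown },
    ];
    // The Unknown node is skipped until an event proves it is up.
    println!("before event: {:?}", query_plan(&nodes));
    on_node_up(&mut nodes, "node_4:9042");
    println!("after event:  {:?}", query_plan(&nodes));
}
```

The trade-off stated above falls out of this design: a genuinely up node pays a one-time delay at startup until its event arrives, instead of the client repeatedly sending requests to down nodes throughout its lifetime.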
