Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v2.0.0] Token-aware routing #131

Open
kw217 opened this issue Mar 31, 2017 · 10 comments
Open

[v2.0.0] Token-aware routing #131

kw217 opened this issue Mar 31, 2017 · 10 comments

Comments

@kw217
Copy link

kw217 commented Mar 31, 2017

Please could CDRS implement token-aware routing? This is important for performance, because it reduces network hops and also reduces load on the Cassandra cluster.

(How difficult would this be to add?)

@harrydevnull
Copy link
Contributor

harrydevnull commented Apr 2, 2017

If I am not mistaken that would be fairly complex task.
the simplest algorithm which i can think of is

  1. select * from system.peers;
  2. cache the result
  3. for the incoming query; based on the primary keys of that particular table; compute the murmur hash (assuming it murmur hash on the server side)
  4. identify the host that from the system.peers result and route appropriately.

cc-ing @tupshin @AlexPikalov what do you think?

@AlexPikalov
Copy link
Owner

@harrydevnull @kw217 I apologise these days I have very limited access to internet. Next week I'll take a look on this closely.

@AlexPikalov AlexPikalov added this to the v2.0.0 milestone Nov 24, 2017
@AlexPikalov AlexPikalov changed the title Token-aware routing [v2.0.0] Token-aware routing Nov 26, 2017
@artem-v-shamsutdinov
Copy link

Hello,

I'm also interested in Token-aware routing. Would like to implement high traffic query service in Rust but not having this feature actually makes Go (which has it) more effective. Is this likely to be implemented sometime in 2020?

Sincerely,

Artem

@AlexPikalov
Copy link
Owner

Hello @Russoturisto,

Yes, it must be a task with one of highest priority in 2020.

@artem-v-shamsutdinov
Copy link

artem-v-shamsutdinov commented Dec 29, 2019

Did a bit of digging in ScyllaDB go driver. I have now idea what I'm talking about, but hopefully this will help:

Murmur3 hash appears to be used: https://github.com/gocql/gocql/blob/master/token.go
Here is the Go implementation of the policy: https://github.com/gocql/gocql/blob/master/policies.go

On a separate note, here is the ScyllaDB protocol extension that connects to the right shard (not just node):
https://github.com/scylladb/scylla/blob/master/docs/protocol-extensions.md

And the pull request that makes it happen (again in Go):
apache/cassandra-gocql-driver#1211

I don't have enough Rust + Cassandra/Scylla knowledge to help right now but will try next autumn (8 month from now) if nothing happens by then. Either way I'm implementing my read services in Rust (I'll have an LRU cache and can't afford the extra heap space required by Go garbage collection). Hope this helps!

Artem

@AlexPikalov
Copy link
Owner

Hi @Russoturisto ,
really appreciate the research you've done. 🥇

@psarna
Copy link

psarna commented Jan 3, 2020

Hi, I'm from ScyllaDB, but this post is from my private spare time, so no warranty :)
I stumbled upon this issue while lurking for places to learn Rust from, and I have some more context, maybe it will help someone. I refer to Scylla java driver in the links, as it's rather clear to read and it has all these features implemented already.

First of all, in order to have token-aware policy, the driver indeed needs to store information about the cluster. In the java driver, the policy uses cluster metadata, which is kept up to date with an additional control connection. The connection is used to fetch info from system.peers and other system tables, it also accepts push notifications from Scylla - e.g. when a server is detected to be down, nodes can send a notification about it to all connected drivers. Once a control connection would be implemented, it should be used to maintain and update cluster metadata - a cached version of what the driver thinks the cluster state is. The metadata provides vital information for token-aware load balancing:

  • type of the partitioner
  • token ranges each node is responsible for

With this, for each query, we can extract its partition key (if it's provided), and compute its token - e.g. if the cluster uses Murmur3 partitioner, we compute a murmur3 hash of the key's data. Then, since the driver knows which tokens are most likely owned by which nodes, it can pick a correct one.

Also, note that token-aware policy needs a child policy to fall back to - e.g. if we don't have enough information to compute the correct node. It's also possible to have different strategies for load-balancing inside token-aware policy, more info here: https://github.com/scylladb/java-driver/blob/3.7.1-scylla/driver-core/src/main/java/com/datastax/driver/core/policies/TokenAwarePolicy.java#L62

As for a very important Scylla-specific optimisation - shard awareness - we store more information about every node (e.g. the number of its shards), and also try to create a separate connection for every shard. Then, given a partition key, we can leverage this information to compute which shard it belongs to in a specific node, and send the request to the correct shard, which results in better performance.

So, in short, in order to have token-aware routing the first important thing is to have a way of fetching information about the cluster from the cluster itself. Then, once we know which partitioner is used and which nodes own which tokens, we can compute appropriate tokens and route queries in a more optimized way. I see that Rust already has several crates that offer murmur3 hashing, so that's convenient :)

@psarna
Copy link

psarna commented Jan 3, 2020

Also, as an exercise for myself, I wrote a quite useless (at least for now) snippet that reads Scylla sharding info from a node and prints it: psarna@1b25aff. Perhaps it can be used one day as a template for implementing shard awareness in cdrs on top of token awareness. It's also literally my first code in Rust, so don't judge :)

@AlexPikalov
Copy link
Owner

AlexPikalov commented Jan 3, 2020

Hi @psarna,
Thanks for this great insight and for your code snippet! I think it may be a very useful basis for the future routing. Perhaps your code can be included into the CDRS as a some feature. So if somebody needs it he or she could use CDRS with such feature enabled even now. For other users, it won't be included into the bundle.
So, feel free to submit a MR with a new Rust-feature and your code. If you need any help with conditional compiling I'll gladly help you.

@psarna
Copy link

psarna commented Jan 6, 2020

Thanks. I'm a little short on time, but I'll try to figure out Rust's conditional compiling by myself, and perhaps one day I'll push something more substantial. And if token-aware routing makes it into cdrs one day, I could definitely help integrating Scylla's shard awareness on top of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants