
Allow randomization of read candidates using Redis Cluster #834

Closed
nborole opened this issue Aug 22, 2018 · 22 comments
Labels: type: feature
Milestone: 5.2.0

Comments

@nborole commented Aug 22, 2018

Bug Report

Current Behavior

I am trying to connect to a Redis cluster with 5 slave nodes and 1 master. All read queries are redirected to only one slave for the whole lifetime of the application.

Input Code

  • Java:

// connection pool creation:
GenericObjectPool<StatefulRedisClusterConnection<String, String>> connectionPool;
GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
poolConfig.setMinIdle(8);
poolConfig.setMaxIdle(8);
poolConfig.setMaxTotal(16);
poolConfig.setMinEvictableIdleTimeMillis(1000 * 30);
poolConfig.setSoftMinEvictableIdleTimeMillis(1000 * 30);
poolConfig.setMaxWaitMillis(0);
connectionPool = ConnectionPoolSupport.createGenericObjectPool(() -> {
    logger.info("Requesting new StatefulRedisClusterConnection " + System.currentTimeMillis());
    return redisClient.connect();
}, poolConfig);

// borrowing a connection and directing reads to slaves:
public StatefulRedisClusterConnection<String, String> getConnection() throws Exception {
    StatefulRedisClusterConnection<String, String> connection = connectionPool.borrowObject();
    connection.setReadFrom(ReadFrom.SLAVE);
    return connection;
}

Expected behavior/code

It should distribute read queries across all slaves.

Environment

  • Lettuce version(s): 5.1.0.M1
  • Redis version: 3.2.10


@nborole nborole changed the title Always redirecting read to single slave node to read in clustered Redis. Always redirecting read to single slave node in clustered Redis. Aug 22, 2018
@mp911de (Collaborator) commented Aug 22, 2018

Lettuce selects a slave per slot hash and uses the selected slave for read operations until the next topology refresh. This is by design to reduce CPU overhead.
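
For context, a small sketch (mine, not from the issue) of the slot calculation involved; SlotHash is Lettuce's CRC16-based slot calculator:

import io.lettuce.core.cluster.SlotHash;

public class SlotHashExample {

    public static void main(String[] args) {
        // Keys hash to one of 16384 slots; all reads for a given slot go to the
        // same cached slave until the next topology refresh.
        System.out.println(SlotHash.getSlot("user:1000")); // a value in [0, 16383]
        System.out.println(SlotHash.getSlot("user:2000"));
    }
}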

@stillerrr

Under high concurrency, reading from slave nodes can improve throughput when a master node in the cluster has multiple slaves and read requests can be shared across them. But Lettuce currently selects just one of the slaves for read requests; it does not balance reads among the slaves, so one slave is busy while the others are idle, and throughput cannot improve much.

@mp911de (Collaborator) commented Oct 25, 2018

You can mitigate this behavior by enabling periodic topology refresh. Each time the topology is refreshed, Lettuce discards the slot hash connection cache and determines a different slave. This also works if there were no changes to the topology.
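
For reference, a minimal sketch of enabling periodic topology refresh (assuming a reachable seed node at redis://localhost:7000; adjust for your deployment):

import java.time.Duration;

import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;

public class PeriodicRefreshExample {

    public static void main(String[] args) {
        RedisClusterClient redisClient = RedisClusterClient.create("redis://localhost:7000");

        // Re-fetch the topology every 60 seconds; each refresh also discards the
        // cached read candidates per slot, so a different slave may be selected.
        ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(Duration.ofSeconds(60))
                .build();

        redisClient.setOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(refreshOptions)
                .build());
    }
}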

@stillerrr

How does Lettuce do the refresh, with the "CLUSTER SLOTS" command? If so, it also increases the load on the host, and the "CLUSTER SLOTS" command takes a lot of time. I think it would be better to provide an interface so we can implement this ourselves.

@mp911de (Collaborator) commented Oct 26, 2018

Sorry if I wasn't precise enough: Lettuce calculates the slot-hash itself and then determines read candidates for the hash slot by obtaining the node selection from ReadFrom. The read candidates are cached for the particular slot until a topology refresh happens. Subsequent read requests use the cached node selection instead of invoking ReadFrom again.
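
To make the mechanism concrete, here is a hypothetical custom ReadFrom that shuffles replica candidates (illustration only; note that, per the caching described above, select(...) is only consulted when the read candidates for a slot are computed, so this alone does not spread load between refreshes):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import io.lettuce.core.ReadFrom;
import io.lettuce.core.models.role.RedisInstance;
import io.lettuce.core.models.role.RedisNodeDescription;

class ShuffledSlaveReadFrom extends ReadFrom {

    @Override
    public List<RedisNodeDescription> select(Nodes nodes) {
        List<RedisNodeDescription> replicas = new ArrayList<>();
        for (RedisNodeDescription node : nodes) {
            if (node.getRole() == RedisInstance.Role.SLAVE) {
                replicas.add(node);
            }
        }
        Collections.shuffle(replicas); // randomize the candidate order
        return replicas;
    }
}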

@stillerrr

In our project the TPS is very high, and we want to distribute read requests across the slave nodes. As you said, read requests go to the same node until the topology refresh and no requests go to the other nodes; a single read node cannot support such a high TPS. How about something like the code below, which picks a random node from the candidates in PooledClusterConnectionProvider:

if (cached) {

    return CompletableFuture.allOf(selectedReaderCandidates).thenCompose(v -> {

        // pick a random candidate first
        int index = (int) (Math.random() * selectedReaderCandidates.length);
        if (selectedReaderCandidates[index].join().isOpen()) {
            return selectedReaderCandidates[index];
        }

        // fall back to the next candidate
        index = (index + 1) % selectedReaderCandidates.length;
        if (selectedReaderCandidates[index].join().isOpen()) {
            return selectedReaderCandidates[index];
        }

        // otherwise take the first open candidate, or the first one as a last resort
        for (CompletableFuture<StatefulRedisConnection<K, V>> candidate : selectedReaderCandidates) {
            if (candidate.join().isOpen()) {
                return candidate;
            }
        }
        return selectedReaderCandidates[0];
    });
}

@mp911de (Collaborator) commented Nov 7, 2018

That's a neat approach to apply randomization without creating objects. Currently, what's missing is the ability to configure this behavior. ReadFrom contradicts randomization to some extent, as a specific node-type preference is no longer honored once we apply randomization.

We could introduce a boolean randomize() method on ReadFrom: this would basically signal the provider to do what the code above does (without the second randomized fallback).
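
For illustration, a hypothetical sketch of that proposal (not the final API; the change that eventually shipped models this as order-sensitivity, see the commit further below):

import java.util.List;

import io.lettuce.core.models.role.RedisNodeDescription;

// Hypothetical sketch only; the names and shape are assumptions, not the shipped API.
public abstract class ReadFrom {

    public abstract List<RedisNodeDescription> select(Nodes nodes);

    // Proposed: if true, the caller may pick a random open node from select(...)
    // instead of always using the first candidate.
    public boolean randomize() {
        return false;
    }

    public interface Nodes extends Iterable<RedisNodeDescription> {
    }
}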

@mp911de mp911de added this to the 5.2.0 milestone Jan 2, 2019
@mp911de (Collaborator) commented Jan 2, 2019

I pushed a change to the feature branch for this ticket, and the branch build has produced snapshot artifacts with version 5.2.0.gh-834-SNAPSHOT. Please upgrade to this version and run your tests to evaluate how the change affects your throughput/Redis utilization.

Make sure to include our snapshot repository:

<dependencies>
  <dependency>
    <groupId>io.lettuce</groupId>
    <artifactId>lettuce-core</artifactId>
    <version>5.2.0.gh-834-SNAPSHOT</version>
  </dependency>
</dependencies>

<repositories>
  <repository>
    <id>sonatype-snapshots</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

@stillerrr

If we take the random branch, the connections are not cached. Does that affect performance?

@mp911de (Collaborator) commented Jan 4, 2019

> If we take the random branch, the connections are not cached. Does that affect performance?

Connections are held in AsyncConnectionProvider. The thing that is not cached is the candidate list. I think there is a penalty associated with this, but that is the price to pay, as we can no longer apply certain optimizations.

@stillerrr

After testing, read latency increases by maybe 5%. How about adding the random selection to the cached connections too, like the code I provided before?

@mp911de mp911de changed the title Always redirecting read to single slave node in clustered Redis. Allow randomization of read candidates using Redis Cluster Jan 9, 2019
@mp911de (Collaborator) commented Jan 9, 2019

Thanks a lot for testing the changes. I reinstated caching of read candidates, which restricts randomization to some extent.

@stillerrr

Good job; in my test, latency is reduced by 4%.

One more thing: I noticed that the Lettuce cluster client fetches the "CLUSTER NODES" info and the "CLIENT LIST" info when it refreshes the topology. Why does Lettuce need the "CLIENT LIST" info?

We ran into a problem: when a cluster is big, with hundreds of nodes, and there are hundreds of Lettuce clients, the "CLUSTER NODES" and "CLIENT LIST" output becomes large and the commands take a lot of time; the Redis instances' CPU usage rises and latency increases a lot.

The default refreshPeriod is 60s, which is so short that in our situation the TPS of the CLUSTER and CLIENT commands becomes very high.

@mp911de (Collaborator) commented Jan 11, 2019

Lettuce uses the client count to determine the cluster node connection ordering (sorted by the least number of clients) when selecting a default node. The default node is used for key-less commands (such as Pub/Sub).

mp911de added a commit that referenced this issue Jan 11, 2019
ReadFrom can now express whether it is order-sensitive or non-sensitive. This allows calling code to apply optimizations such as caching for order-sensitive implementations or randomization for non-sensitive implementations.

ReadFrom.ANY is an implementation that allows reads from any node without considering the actual node ordering.
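
A minimal sketch of opting into the new behavior (assuming Lettuce 5.2+ and a seed node at redis://localhost:7000):

import io.lettuce.core.ReadFrom;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

public class AnyNodeReadsExample {

    public static void main(String[] args) {
        RedisClusterClient clusterClient = RedisClusterClient.create("redis://localhost:7000");
        StatefulRedisClusterConnection<String, String> connection = clusterClient.connect();

        // ReadFrom.ANY is not order-sensitive, so read candidates may be randomized.
        connection.setReadFrom(ReadFrom.ANY);
        String value = connection.sync().get("key");
    }
}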
@mp911de (Collaborator) commented Jan 11, 2019

That change is in place now.

@mp911de mp911de closed this as completed Jan 11, 2019
@mp911de (Collaborator) commented Jan 11, 2019

@StillerRen How about increasing the refresh period and enabling adaptive timeouts (that's a different issue than this one, though).
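
A sketch of that combination, assuming "adaptive timeouts" refers to Lettuce's adaptive refresh triggers (my assumption):

import java.time.Duration;

import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;

public class RelaxedRefreshExample {

    public static void main(String[] args) {
        RedisClusterClient redisClient = RedisClusterClient.create("redis://localhost:7000");

        ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(Duration.ofMinutes(10))  // fewer CLUSTER NODES/CLIENT LIST calls
                .enableAllAdaptiveRefreshTriggers()             // still refresh on MOVED/ASK/reconnect
                .build();

        redisClient.setOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(refreshOptions)
                .build());
    }
}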

@stillerrr

Just giving advice: anyone using Lettuce like this needs to pay attention to increasing the refresh period, or set the periodic refresh to "false". No need to change the code. Thanks :)

@smoomrik

@mp911de As I see it, support for distributing reads across replicas will be added in 5.2.x. When is this going to be released?

@mp911de (Collaborator) commented Jun 19, 2019

@smoomrik

@mp911de Thanks for the quick reply. We are really waiting for this to be released.

@adityak93

@mp911de @smoomrik AWS has recently introduced reader endpoints for Redis which manages distribution of reads among the slaves. You can check more details here: https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-elasticache-launches-reader-endpoint-for-redis/

Might be worth checking out. Using it should avoid having to configure individual replica endpoints in the application, giving more flexibility.
I am not sure about the exact underlying implementation on the AWS side and am waiting for more clarification. Initial tests we ran in a test environment (1 master, 2 replicas) showed a roughly equal distribution of load between the 2 replicas.

@vaibhavflip

Hey @adityak93, we have been using the reader endpoint, but it is not distributing traffic equally among the read replicas for us. Sometimes, even when other read replicas are under heavy load, a few read replicas have barely any requests coming their way. Can you tell us whether this is consistent with your observations, or whether you used any additional configuration apart from read-preferred and periodic topology refresh?
