
Allow randomization of read candidates using Redis Cluster #834

Closed
nborole opened this issue Aug 22, 2018 · 22 comments
Labels: type: feature
Milestone: 5.2.0

Comments

@nborole commented Aug 22, 2018

Bug Report

Current Behavior

I am trying to connect to a Redis cluster with 5 slave nodes and 1 master. All read queries are redirected to only one slave for the whole lifetime of the application.

Input Code

  • Java:

// connection pool creation:
GenericObjectPool<StatefulRedisClusterConnection<String, String>> connectionPool;
GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
poolConfig.setMinIdle(8);
poolConfig.setMaxIdle(8);
poolConfig.setMaxTotal(16);
poolConfig.setMinEvictableIdleTimeMillis(1000 * 30);
poolConfig.setSoftMinEvictableIdleTimeMillis(1000 * 30);
poolConfig.setMaxWaitMillis(0);
connectionPool = ConnectionPoolSupport.createGenericObjectPool(() -> {
    logger.info("Requesting new StatefulRedisClusterConnection " + System.currentTimeMillis());
    return redisClient.connect();
}, poolConfig);

// borrowing a connection and directing reads to slaves:
public StatefulRedisClusterConnection<String, String> getConnection() throws Exception {
    StatefulRedisClusterConnection<String, String> connection = connectionPool.borrowObject();
    connection.setReadFrom(ReadFrom.SLAVE);
    return connection;
}

Expected behavior/code

It should distribute read queries across all slaves.

Environment

  • Lettuce version(s): 5.1.0.M1
  • Redis version: 3.2.10


@nborole nborole changed the title Always redirecting read to single slave node to read in clustered Redis. Always redirecting read to single slave node in clustered Redis. Aug 22, 2018
@mp911de (Collaborator) commented Aug 22, 2018

Lettuce selects a slave per slot hash and uses the selected slave for read operations until the next topology refresh. This is by design to reduce CPU overhead.
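
For context, a small sketch (mine, not from the issue) of the slot calculation involved; SlotHash is Lettuce's CRC16-based slot calculator:

import io.lettuce.core.cluster.SlotHash;

public class SlotHashExample {

    public static void main(String[] args) {
        // Keys hash to one of 16384 slots; all reads for a given slot go to the
        // same cached slave until the next topology refresh.
        System.out.println(SlotHash.getSlot("user:1000")); // a value in [0, 16383]
        System.out.println(SlotHash.getSlot("user:2000"));
    }
}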

@stillerrr

Under high concurrency, reading from slave nodes can improve throughput when a master node in the cluster has multiple slaves and read requests can be shared across them. But Lettuce currently selects just one of the slaves for read requests; it does not balance reads among the slaves, so one slave is busy while the others are idle, and throughput cannot improve much.

@mp911de (Collaborator) commented Oct 25, 2018

You can mitigate this behavior by enabling periodic topology refresh. Each time the topology is refreshed, Lettuce discards the slot hash connection cache and determines a different slave. This also works if there were no changes to the topology.
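
For reference, a minimal sketch of enabling periodic topology refresh (assuming a reachable seed node at redis://localhost:7000; adjust for your deployment):

import java.time.Duration;

import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;

public class PeriodicRefreshExample {

    public static void main(String[] args) {
        RedisClusterClient redisClient = RedisClusterClient.create("redis://localhost:7000");

        // Re-fetch the topology every 60 seconds; each refresh also discards the
        // cached read candidates per slot, so a different slave may be selected.
        ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(Duration.ofSeconds(60))
                .build();

        redisClient.setOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(refreshOptions)
                .build());
    }
}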

@stillerrr

How does Lettuce do the refresh, with the "CLUSTER SLOTS" command? If so, it also increases the load on the host, and the "CLUSTER SLOTS" command takes a lot of time. I think it would be better to provide an interface so we can implement this ourselves.

@mp911de (Collaborator) commented Oct 26, 2018

Sorry if I wasn't precise enough: Lettuce calculates the slot-hash itself and then determines read candidates for the hash slot by obtaining the node selection from ReadFrom. The read candidates are cached for the particular slot until a topology refresh happens. Subsequent read requests use the cached node selection instead of invoking ReadFrom again.
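
To make the mechanism concrete, here is a hypothetical custom ReadFrom that shuffles replica candidates (illustration only; note that, per the caching described above, select(...) is only consulted when the read candidates for a slot are computed, so this alone does not spread load between refreshes):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import io.lettuce.core.ReadFrom;
import io.lettuce.core.models.role.RedisInstance;
import io.lettuce.core.models.role.RedisNodeDescription;

class ShuffledSlaveReadFrom extends ReadFrom {

    @Override
    public List<RedisNodeDescription> select(Nodes nodes) {
        List<RedisNodeDescription> replicas = new ArrayList<>();
        for (RedisNodeDescription node : nodes) {
            if (node.getRole() == RedisInstance.Role.SLAVE) {
                replicas.add(node);
            }
        }
        Collections.shuffle(replicas); // randomize the candidate order
        return replicas;
    }
}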

@stillerrr

In our project the TPS is very high, and we want to distribute read requests across the slave nodes. As you said, read requests go to the same node until the topology refresh and no requests go to the other nodes; a single read node cannot support such a high TPS. How about something like the code below, which picks a random node from the candidates in PooledClusterConnectionProvider:

if (cached) {

    return CompletableFuture.allOf(selectedReaderCandidates).thenCompose(v -> {

        // pick a random candidate first
        int index = (int) (Math.random() * selectedReaderCandidates.length);
        if (selectedReaderCandidates[index].join().isOpen()) {
            return selectedReaderCandidates[index];
        }

        // fall back to the next candidate
        index = (index + 1) % selectedReaderCandidates.length;
        if (selectedReaderCandidates[index].join().isOpen()) {
            return selectedReaderCandidates[index];
        }

        // otherwise take the first open candidate, or the first one as a last resort
        for (CompletableFuture<StatefulRedisConnection<K, V>> candidate : selectedReaderCandidates) {
            if (candidate.join().isOpen()) {
                return candidate;
            }
        }
        return selectedReaderCandidates[0];
    });
}

@mp911de (Collaborator) commented Nov 7, 2018

That's a neat approach to apply randomization without creating objects. Currently, what's missing is the ability to configure this behavior. ReadFrom contradicts randomization to some extent, as a specific node-type preference is no longer honored once we apply randomization.

We could introduce a boolean randomize() method on ReadFrom: this would basically signal the provider to do what the code above does (without the second randomized fallback).
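
For illustration, a hypothetical sketch of that proposal (not the final API; the change that eventually shipped models this as order-sensitivity, see the commit further below):

import java.util.List;

import io.lettuce.core.models.role.RedisNodeDescription;

// Hypothetical sketch only; the names and shape are assumptions, not the shipped API.
public abstract class ReadFrom {

    public abstract List<RedisNodeDescription> select(Nodes nodes);

    // Proposed: if true, the caller may pick a random open node from select(...)
    // instead of always using the first candidate.
    public boolean randomize() {
        return false;
    }

    public interface Nodes extends Iterable<RedisNodeDescription> {
    }
}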

@mp911de mp911de added this to the 5.2.0 milestone Jan 2, 2019
@mp911de (Collaborator) commented Jan 2, 2019

I pushed a change to the feature branch for this ticket, and the branch build has produced snapshot artifacts with version 5.2.0.gh-834-SNAPSHOT. Please upgrade to this version and run your tests to evaluate how the change affects your throughput/Redis utilization.

Make sure to include our snapshot repository:

<dependencies>
  <dependency>
    <groupId>io.lettuce</groupId>
    <artifactId>lettuce-core</artifactId>
    <version>5.2.0.gh-834-SNAPSHOT</version>
  </dependency>
</dependencies>

<repositories>
  <repository>
    <id>sonatype-snapshots</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

@stillerrr

If we take the random branch, the connections are not cached. Does that affect performance?

@mp911de (Collaborator) commented Jan 4, 2019

> If we take the random branch, the connections are not cached. Does that affect performance?

Connections are held in AsyncConnectionProvider. The thing that is not cached is the candidate list. I think there is a penalty associated with this, but that is the price to pay, as we can no longer apply certain optimizations.

@stillerrr

After testing, read latency increases by maybe 5%. How about adding the random selection to the cached connections too, like the code I provided before?

@mp911de mp911de changed the title Always redirecting read to single slave node in clustered Redis. Allow randomization of read candidates using Redis Cluster Jan 9, 2019
@mp911de (Collaborator) commented Jan 9, 2019

Thanks a lot for testing the changes. I reinstated caching of read candidates, which restricts randomization to some extent.

@stillerrr

Good job; in my test, latency is reduced by 4%.

One more thing: I noticed that the Lettuce cluster client fetches the "CLUSTER NODES" info and the "CLIENT LIST" info when it refreshes the topology. Why does Lettuce need the "CLIENT LIST" info?

We ran into a problem: when a cluster is big, with hundreds of nodes, and there are hundreds of Lettuce clients, the "CLUSTER NODES" and "CLIENT LIST" output becomes large and the commands take a lot of time; the Redis instances' CPU usage rises and latency increases a lot.

The default refreshPeriod is 60s, which is so short that in our situation the TPS of the CLUSTER and CLIENT commands becomes very high.

@mp911de (Collaborator) commented Jan 11, 2019

Lettuce uses the client count to determine the cluster node connection ordering (sorted by the least number of clients) when selecting a default node. The default node is used for key-less commands (such as Pub/Sub).

mp911de added a commit that referenced this issue Jan 11, 2019
ReadFrom can now express whether it is order-sensitive or non-sensitive. This allows calling code to apply optimizations such as caching for order-sensitive implementations or randomization for non-sensitive implementations.

ReadFrom.ANY is an implementation that allows reads from any node without considering the actual node ordering.
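
A minimal sketch of opting into the new behavior (assuming Lettuce 5.2+ and a seed node at redis://localhost:7000):

import io.lettuce.core.ReadFrom;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

public class AnyNodeReadsExample {

    public static void main(String[] args) {
        RedisClusterClient clusterClient = RedisClusterClient.create("redis://localhost:7000");
        StatefulRedisClusterConnection<String, String> connection = clusterClient.connect();

        // ReadFrom.ANY is not order-sensitive, so read candidates may be randomized.
        connection.setReadFrom(ReadFrom.ANY);
        String value = connection.sync().get("key");
    }
}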
@mp911de (Collaborator) commented Jan 11, 2019

That change is in place now.

@mp911de mp911de closed this as completed Jan 11, 2019
@mp911de (Collaborator) commented Jan 11, 2019

@StillerRen How about increasing the refresh period and enabling adaptive timeouts (that's a different issue than this one, though).
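
A sketch of that combination, assuming "adaptive timeouts" refers to Lettuce's adaptive refresh triggers (my assumption):

import java.time.Duration;

import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;

public class RelaxedRefreshExample {

    public static void main(String[] args) {
        RedisClusterClient redisClient = RedisClusterClient.create("redis://localhost:7000");

        ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(Duration.ofMinutes(10))  // fewer CLUSTER NODES/CLIENT LIST calls
                .enableAllAdaptiveRefreshTriggers()             // still refresh on MOVED/ASK/reconnect
                .build();

        redisClient.setOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(refreshOptions)
                .build());
    }
}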

@stillerrr

Just giving advice: anyone using Lettuce like this needs to pay attention to increasing the refresh period, or set the periodic refresh to "false". No need to change the code. Thanks :)

@smoomrik

@mp911de As I see it, support for distributing reads across replicas will be added in 5.2.x. When is this going to be released?

@mp911de (Collaborator) commented Jun 19, 2019

@smoomrik

@mp911de Thanks for the quick reply. We are really waiting for this to be released.

@adityak93

@mp911de @smoomrik AWS has recently introduced reader endpoints for Redis which manages distribution of reads among the slaves. You can check more details here: https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-elasticache-launches-reader-endpoint-for-redis/

Might be worth checking out. Using it should avoid having to configure individual replica endpoints in the application, giving more flexibility.
I am not sure about the exact underlying implementation on the AWS side and am waiting for more clarification. Initial tests we ran in a test environment (1 master, 2 replicas) showed a roughly equal distribution of load between the 2 replicas.

@vaibhavflip

Hey @adityak93, we have been using the reader endpoint, but it is not distributing traffic equally among the read replicas for us. Sometimes, even when other read replicas are under heavy load, a few read replicas have barely any requests coming their way. Can you tell us whether this is consistent with your observations, or whether you used any additional configuration apart from read-preferred and periodic topology refresh?
