Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MULTI is dispatched to slave nodes using SLAVE readFrom #779

Closed
Yipei opened this issue May 19, 2018 · 12 comments
Closed

MULTI is dispatched to slave nodes using SLAVE readFrom #779

Yipei opened this issue May 19, 2018 · 12 comments
Labels
type: bug A general bug
Milestone

Comments

@Yipei
Copy link

Yipei commented May 19, 2018

Hi there,

We are experiencing an issue on multi/exec transactions with master/slave configurations (readFrom is set as SLAVE). We are using spring-data-redis and selected lettuce as the redis framework.

Here is the code we wrote to catch the issue

RedisConnection connection= RedisConnectionUtils.getConnection(stringRedisTemplate.getConnectionFactory(), true);
      try {
                connection.watch(key.getBytes());
                connection.multi();                  
                connection.hashCommands().hMSet(key.getBytes(), hash);
                connection.exec();
        } catch (Exception e) {
        } finally {
            RedisConnectionUtils.releaseConnection(connection, stringRedisTemplate.getConnectionFactory());
        }

We set a master and slave to test it out
hosts:
- 127.0.0.1:6379 (master)
- 127.0.0.1:6380

Then we hit the exception
nested exception is io.lettuce.core.RedisCommandExecutionException: ERR EXEC without MULTI,
And we found the the watch(), multi(), exec() are distributed between the master and slave.

MASTER
[0 127.0.0.1:54159] "WATCH" "xxxx"
[0 127.0.0.1:54159] "HMSET" "xxxx"

SLAVE
[0 127.0.0.1:54157] "PING" "NODES"
[0 127.0.0.1:54160] "MULTI"

This could explain why the "ERR EXEC without MULTI" happens. But is this a bug or there is a way to work around? We also tried with stringRedisTemplate.execute(new SessionCallback<Boolean>(){...} but there are no difference.

Right now What I can think about is to have another ConnectionFactory with readFrom -> MASTER, and move all transactional commands to its maintained connections. But this looks like a ugly fix... Please advice.

I also saw a comment in another ticket : "Lettuce connections require single-threaded/synchronized access when using transactions. If two or more threads call concurrently MULTI/EXEC methods this will destroy the connection state. I don't think there is anything we could do here." Wondering the "single-threaded/synchronized" requirement mentioned there is to the RedisConnection or to the shared native connection?

Thanks

@mp911de mp911de added the type: bug A general bug label May 19, 2018
@mp911de
Copy link
Collaborator

mp911de commented May 19, 2018

There are two issues here:

The first one is that MULTI is considered a Read-Only command, that's why it's dispatched to a slave node. We should remove MULTI from Read-Only commands that's why this ticket is labelled as bug.

The second issue is that when using transactions and Master/Slave with non-master reads, you are likely to experience issues because there is no distributed transaction support. Read-only commands are sent to slaves and write commands hit the master.

Consider the following sequence of commands:

  1. MULTI
  2. INCR
  3. GET
  4. EXEC

In this case, all commands except GET would be sent to the master and GET would happen outside the transaction.
There's no possibility to pin a transaction to a particular node as the transactional commands cannot indicate what is going to happen next. That being said, do not use transactions on Master/Slave connections. If you have to use transactions, open a direct connection to the particular node and operate on that one.

@Yipei
Copy link
Author

Yipei commented May 19, 2018

Hi @mp911de

Thank you for the explanations, it's very helpful. As you mentioned "Read-only commands are sent to slaves and write commands hit the master." Does it mean if the readFrom set as MASTER, this might work? Or we still need to guarantee the single-thread/synchronized when handling the connection?

If you don't mind, could you please help explain how the LettuceConnection works with the native connection to redis node? I saw multiple LettuceConnection could share the native connection, When you say to do the "single-thread/synchronized" on each connection, did you mean LettuceConnection which fetched from ConnectionFactory, or the native single-thread connection? I saw there is a setting lettceConnectionFactory.setShareNativeConnection(false); Do we also need set it to be false to avoid possible concurrent transactional commands on one connection, if we still like to use ConnectionFactory to provide connection for transactional calls (When set readFrom to MASTER)?

We could try to create a separate connection directly to the master node. But since Lettuce already handles well on the auto failover of master-slaves, is there a way we could get the current master ip/port or connection info from ConnectionFactory (or another component)?

Thanks

@mp911de
Copy link
Collaborator

mp911de commented May 21, 2018

Does it mean if the readFrom set as MASTER, this might work

Yes, this will work

Or we still need to guarantee the single-thread/synchronized when handling the connection

Using transactions and/or blocking commands (BRPOP and such) requires a synchronized connection access (potentially connection pooling). If you don't use transactions or blocking commands, then use a single connection.

If you don't mind, could you please help explain how the LettuceConnection

That's a specific implementation of Spring Data Redis. Spring Data takes care of the hairy bits for you meaning, that transactions, pipelining, and blocking commands get issued on dedicated connections.

ShareNativeConnection defaults to true to to maximize efficiency. Lettuce is thread-safe, so a single connection may be shared across multiple threads unless you change the connection state or block the connection. That's what Spring Data is adapting to. The only thing you need pay attention to is disabling slave reads when you want to use transactions.

If you want to use Master/Slave with slave reads, then spin up another LettuceConnectionFactory (and share ClientResources amongst these to share thread pools) that isn't used for transactions.

Does this make sense?

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label May 21, 2018
@mp911de
Copy link
Collaborator

mp911de commented Jun 5, 2018

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.

@mp911de mp911de closed this as completed Jun 5, 2018
@asgs
Copy link

asgs commented Jun 13, 2018

@mp911de We're facing the same issue OP brought up. We were using Jedis up until now when we realized we were not utilizing the two slaves our (single) Master has connected to. So to improve the utilization, we switched to Lettuce client and it's been working great except for this one use-case where we want to batch a few writes and deletes inside a transaction.

So the upshot of this discussion is, if we want to execute transactions on a master-slave deployment, use a separate instance of LettuceClusterPool with readFrom set to MASTER (or MASTER_PREFERRED) while the other LettuceClusterPool instance with readFrom set to SLAVE (or SLAVE_PREFERRED) can be used for the rest of the non-transactional Redis operations. Please confirm if this understanding is correct.

Also, a couple of more related questions -

  1. when MASTER_PREFERRED is used and the master goes down or becomes unavailable, will the same problem arise? Shouldn't the transaction (MULTI/EXEC) fail because slaves are read-only?
  2. You mentioned that Multi is currently treated as a Read-only command. We're using 4.4.4.Final version and after setting the MASTER_PREFERRED option, this issue didn't happen. So, is it because the entire transaction was executed on the Master node as it was available at that time?

@mp911de mp911de reopened this Jun 13, 2018
@mp911de
Copy link
Collaborator

mp911de commented Jun 13, 2018

@asgs to answer your questions:

Please confirm if this understanding is correct.
Yes, using separate pools (connections) will prevent the issue. The read-only connections are not required to be pooled, a single connection will work, too.

when MASTER_PREFERRED is used and the master goes down or becomes unavailable, will the same problem arise? Shouldn't the transaction (MULTI/EXEC) fail because slaves are read-only?
Yes, the problem will arise.

So, is it because the entire transaction was executed on the Master node as it was available at that time?
Exactly.

Happy to continue the discussion on this issue. Another idea came to my mind:

Since transactions are broken as soon non-master nodes are touched, we could state, that transactions work only on masters. This assumption can help us to get transactions working with Master/Slave in the sense that we bind transactions to the master node.

As soon as WATCH/MULTI commands are executed, we'll dispatch all subsequent commands, that are issued, to the master node regardless of the ReadFrom. Once we see EXEC or DISCARD (which terminates the transactional state), we can again dispatch commands according to the ReadFrom setting.

Does this make sense?

@asgs
Copy link

asgs commented Jun 13, 2018

@mp911de thank you! we're on the same page.

yes, we plan to use only one connection in either mode (transactional and non-transactional).

@mp911de
Copy link
Collaborator

mp911de commented Jun 13, 2018

Please make sure, when you use transactions to not share the same connection across multiple threads (use pooling in such case). Otherwise, you will run into race conditions. You can share a single connection if you're not using transactional commands.

@asgs
Copy link

asgs commented Jun 13, 2018

@mp911de, when we create the LettuceClusterConnectionFactory we pass in a LettuceClusterPool which is constructed with a GenericObjectPoolConfig instance which has max connections set to 8. does this help when using this connection factory for transactions? Also, this connection factory instance is set onto RedisTemplate which is what we use to perform transactions.

Does Lettuce and/or Spring Data Redis print the Redis connection Id? I could see CommandHandler printing DEBUG statements under the thread names of the format lettuce-nioEventLoop-x-y or on the main thread like below.

2018-06-13 20:16:16.457 DEBUG [main] com.lambdaworks.redis.protocol.CommandHandler - [channel=0xdeef79c8, /x.y.z:33462 -> host/a.b.c:6379, chid=0x11] write() writeAndFlush command TransactionalCommand [type=ZREM, output=IntegerOutput [output=null, error='null'], commandType=com.lambdaworks.redis.protocol.TransactionalCommand]

2018-06-13 20:16:16.457 DEBUG [lettuce-nioEventLoop-9-2] com.lambdaworks.redis.protocol.CommandHandler - [channel=0xdeef79c8, /x.y.z:33462 -> host/a.b.c:6379, chid=0x11] write(ctx, TransactionalCommand [type=PEXPIRE, output=BooleanOutput [output=null, error='null'], commandType=com.lambdaworks.redis.protocol.TransactionalCommand], promise)

@mp911de
Copy link
Collaborator

mp911de commented Jun 14, 2018

No, Spring Data Redis does not report a connection Id. The ChannelId you see is a netty-internal id, the chid is the Id of the CommandHandler for easier command/process correlation.

Happy to improve correlation any further, but let's take this into a new ticket if you feel there is something we can actually do.

@mp911de mp911de removed the status: waiting-for-feedback We need additional information before we can continue label Jun 14, 2018
@mp911de mp911de added this to the Backlog milestone Jun 16, 2018
@mp911de mp911de changed the title multi()/exec() is not working correctly with master/slave config and SLAVE readFrom MULTI is dispatched to slave nodes using SLAVE readFrom Jun 21, 2018
mp911de added a commit that referenced this issue Jun 21, 2018
MULTI is no longer considered a read-only command to prevent accidental dispatch to slave nodes.
mp911de added a commit that referenced this issue Jun 21, 2018
MULTI is no longer considered a read-only command to prevent accidental dispatch to slave nodes.
mp911de added a commit that referenced this issue Jun 21, 2018
MULTI is no longer considered a read-only command to prevent accidental dispatch to slave nodes.
mp911de added a commit that referenced this issue Jun 21, 2018
MULTI is no longer considered a read-only command to prevent accidental dispatch to slave nodes.
@mp911de mp911de modified the milestones: Backlog, Lettuce 4.4.6 Jun 21, 2018
@mp911de
Copy link
Collaborator

mp911de commented Jun 21, 2018

That's fixed now, we no longer consider MULTI to be a read-only command.

@mp911de mp911de closed this as completed Jun 21, 2018
@mp911de
Copy link
Collaborator

mp911de commented Jun 21, 2018

See #800 for binding transactions in master/slave to the master node.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug A general bug
Projects
None yet
Development

No branches or pull requests

3 participants