Application-level exceptions in Pub/Sub notifications mess up pub sub decoding state and cause timeouts #997
Comments
Can you provide a test case to reproduce the issue? Spinning up a massive amount of messages does not help to reliably diagnose the issue. |
The code snippet that I have shown above should be enough to reproduce the
issue. If you are not able to reproduce it using the above code, I will surely
write a test case in PubSubCommandHandlerUnitTests
…On Sun 10 Mar, 2019, 4:25 PM Mark Paluch, ***@***.***> wrote:
Can you provide a test case to reproduce the issue? Spinning up a massive
amount of messages does not help to reliably diagnose the issue.
PubSubCommandHandlerUnitTests
<https://github.com/lettuce-io/lettuce-core/blob/0e0d62ada7cfdcd981e8a366973b896a12c3571c/src/test/java/io/lettuce/core/pubsub/PubSubCommandHandlerUnitTests.java#L57>
is a good starting point for isolated Pub/Sub testing.
|
It took me 3 days to pinpoint this issue, so I can understand your concern. Now the same test code takes 4 to 5 runs to reproduce RedisCommandTimeoutException. |
Thanks. As a general rule of thumb, raise this kind of ticket earlier so we can assist you with problem analysis. |
With the code snippet below, you should be able to reproduce the issue.

    List<RedisURI> rList = new ArrayList<>();
    rList.add(new RedisURI("127.0.0.1", 7000, Duration.ofSeconds(15)));
    rList.add(new RedisURI("127.0.0.1", 7001, Duration.ofSeconds(15)));
    rList.add(new RedisURI("127.0.0.1", 7002, Duration.ofSeconds(15)));
    RedisClusterClient clusterClient = RedisClusterClient.create(rList);
    StatefulRedisClusterPubSubConnection<String, String> redisSub = clusterClient.connectPubSub();
    StatefulRedisClusterConnection<String, String> redisPub = clusterClient.connect();
    CountDownLatch subCl = new CountDownLatch(1);
    redisSub.addListener(new RedisPubSubAdapter<String, String>() {
        private void sendMsg() {
            try {
                if (subCl.getCount() > 0) {
                    subCl.countDown();
                    /*
                     Sleep so that the UNSUBSCRIBE response from redis-server reaches the TCP
                     receive queue and is in turn read by netty in the next event loop
                    */
                    Thread.sleep(2000);
                }
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            // Application-level exception thrown from inside the listener callback
            throw new NullPointerException();
        }
        @Override
        public void message(String channel, String message) {
            sendMsg();
        }
        @Override
        public void message(String pattern, String channel, String message) {
            sendMsg();
        }
    });
    AtomicInteger count = new AtomicInteger();
    String channel = "test_sub_0";
    // Subscribe
    redisSub.sync().subscribe(channel);
    // Publish
    redisPub.async().publish(channel, "text c -" + count.getAndIncrement());
    redisPub.async().publish(channel, "text c -" + count.getAndIncrement());
    subCl.await(); // Wait for the first message
    redisSub.sync().unsubscribe(channel); |
Thanks a lot for your support. I took the code from your unit test commit and integrated it into Lettuce. It took me quite a while to understand the issue. The fix should be adding try/catch blocks around notification and around listener invocation with appropriate logging. |
Pub/Sub listener callbacks are now guarded against exceptions bubbling up into channel processing. Instead, exceptions are logged. Listener notification stops on the first exception. These guards prevent exceptions from interrupting the state update flow, which could previously leave the decoding state machine in an invalid state.
That's fixed now. |
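The guard described in the commit message above can be illustrated roughly as follows. This is a minimal sketch, not the actual Lettuce change: `GuardedNotifier`, `MessageListener`, and the `decodedMessages` counter are hypothetical stand-ins, and the counter only represents "state that must advance even when a listener throws".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a Pub/Sub handler; these are not Lettuce's classes.
class GuardedNotifier {

    private static final Logger LOG = Logger.getLogger(GuardedNotifier.class.getName());

    /** Hypothetical listener type, loosely modelled on a Pub/Sub message callback. */
    interface MessageListener {
        void message(String channel, String message);
    }

    private final List<MessageListener> listeners = new ArrayList<>();

    // Assumption: stands in for decoder state that must be updated for every message.
    private int decodedMessages;

    void addListener(MessageListener listener) {
        listeners.add(listener);
    }

    int decodedMessages() {
        return decodedMessages;
    }

    /** Notifies listeners; the first exception is logged and ends notification for this message. */
    void onMessage(String channel, String message) {
        try {
            for (MessageListener listener : listeners) {
                listener.message(channel, message);
            }
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Listener failed for channel " + channel, e);
        }
        // The state update runs whether or not a listener threw.
        decodedMessages++;
    }

    public static void main(String[] args) {
        GuardedNotifier notifier = new GuardedNotifier();
        notifier.addListener((channel, message) -> {
            throw new NullPointerException();
        });
        notifier.onMessage("test_sub_0", "text c -0");
        notifier.onMessage("test_sub_0", "text c -1");
        // Both messages were processed despite the failing listener.
        System.out.println(notifier.decodedMessages());
    }
}
```

Placing the try/catch around the whole loop, rather than around each listener, matches the "stops on the first exception" behaviour described above; catching per listener would be a different design choice.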
Current Behavior
If a client exception occurs while notifying a subscriber, and the next response from the Redis server is an UNSUBSCRIBE response, Lettuce discards this UNSUBSCRIBE data from the buffer, resulting in a RedisCommandTimeoutException.
Input Code
Expected behavior/code
Must not get RedisCommandTimeoutException
Environment
Possible Solution
The issue occurs because the old output is not cleared, which causes canDecode of PubSubCommandHandler to return false, thereby discarding all the read bytes in the buffer.
Placing every occurrence of output = new PubSubOutput<>(codec); in the PubSubCommandHandler.decode method inside a finally block could solve this, but I am not sure whether it is the right solution.
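The idea of resetting the output in a finally block could look something like the sketch below. It is a deliberately simplified model, not Lettuce code: `ResettingDecoder`, `Output`, and the string "frames" are hypothetical, and the `canDecode` rule (refuse to decode while a completed output lingers) is an assumption made only to mimic the reported symptom.

```java
import java.util.function.Consumer;

// Hypothetical, simplified decoder; it only mimics the shape of the reported problem.
class ResettingDecoder {

    /** Hypothetical stand-in for a per-message output holder. */
    static final class Output {
        private String value;

        void set(String value) {
            this.value = value;
        }

        boolean isComplete() {
            return value != null;
        }

        String get() {
            return value;
        }
    }

    private Output output = new Output();

    /** Assumption: decoding is refused while a stale, completed output is still in place. */
    boolean canDecode() {
        return !output.isComplete();
    }

    /** Decodes one frame and notifies the listener; returns false if the frame was discarded. */
    boolean decode(String frame, Consumer<String> listener) {
        if (!canDecode()) {
            return false;
        }
        try {
            output.set(frame);
            listener.accept(output.get()); // may throw an application-level exception
        } finally {
            // Always start the next frame with a fresh output, even if the listener threw.
            output = new Output();
        }
        return true;
    }
}
```

In this model the listener's exception still propagates to the caller of `decode`, but the next frame (for example an UNSUBSCRIBE response) is no longer discarded because the stale output has been replaced.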