Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test pingNotAllowedInSubscriptionState #2322

Closed
sancar opened this issue Feb 13, 2023 · 7 comments
Closed

Flaky test pingNotAllowedInSubscriptionState #2322

sancar opened this issue Feb 13, 2023 · 7 comments
Labels
status: waiting-for-feedback We need additional information before we can continue

Comments

@sancar
Copy link

sancar commented Feb 13, 2023

Bug Report

Current Behavior

Test pingNotAllowedInSubscriptionState fails randomly in our tests against Upstash Redis. After investigating if our implementation of Redis has a problem, we have concluded that the problem in lettuce side.

Test failure happens in the following assertion

  assertThatThrownBy(() -> TestFutures.getOrTimeout(pubsub.echo("ping"))).isInstanceOf(RedisException.class)
                .hasMessageContaining("not allowed");

Cause Of the Problem

The test fails because of a race condition between two threads. More specifically:
The future for subscribe is completed after we are updating the internal state of PubsubEndpoint(adding a new channel to the channels set). Because of that, in some unlucky timing, the test (PubSubCommandHandler#notifyPushListeners) fails.

  1. subscribe future is completed.
  2. new channel is not put into the channels map in PubsubEndpoint
  3. Test continues to send an echo to the server.
  4. Client side for PubSubEndpint#isSubscribed returns false because channels and patterns are empty.
  5. Server runs the command and does not return an error because the client is talking with resp3 and all commands are allowed.
  6. Even if we were talking in resp2 test would fail because of the error message of the exception.

Possible Solution

Remove local checks for allowed commands and rely on the exception from redis.
This way, we can both remove the bug and simplify the library.
More details in pr that I have prepared for the proposed solution:
#2314
This solution is eliminated because the local checks are needed see #579

@tishun
Copy link
Collaborator

tishun commented Jun 28, 2024

Is this issue still relevant?

@tishun tishun added the status: waiting-for-feedback We need additional information before we can continue label Jun 28, 2024
@sancar
Copy link
Author

sancar commented Jun 29, 2024

Hi @tishun,

Is this a question for me? I'll need to read the description again(I probably already forgot)and check the lettuce code to see if there have been any changes. If you're one of the maintainers, I'd appreciate it if you could handle that. I'll try to look into it if I get some free time next week, but I can't make any promises.

In our test suite (which includes a copy of your test suite) against our Redis implementation, we decided to skip that test and moved on.

@tishun
Copy link
Collaborator

tishun commented Jun 30, 2024

Hi @tishun,

Is this a question for me?

Hey, sorry for not addressing you directly, yes, it was meant for you.

If you're one of the maintainers, I'd appreciate it if you could handle that.

Trying to wrap my head around this. What you are saying is that there is some delay between the time you subscribe to a channel and the time the logic that disables sending commands kicks in. As a result sometimes you would get the RedisException and some times you wont. Is that mostly correct?

@sancar
Copy link
Author

sancar commented Jul 1, 2024

Yes, it sounds correct. If the codebase is still same, channels map in PubsubEndpoint is updated in the background in the background after the subscribe future is completed. Depending on whether that works first , or the test does pubsub.echo("ping") first(which checks PubSubEndpint#isSubscribed internally), the exception behavior changes.

@tishun
Copy link
Collaborator

tishun commented Jul 2, 2024

Yes, it sounds correct. If the codebase is still same, channels map in PubsubEndpoint is updated in the background in the background after the subscribe future is completed. Depending on whether that works first , or the test does pubsub.echo("ping") first(which checks PubSubEndpint#isSubscribed internally), the exception behavior changes.

Well since your report quite some things have changed in the code including the test case itself.
#2594 changed how RESP3 is handled, and even before that #1601 changed the way this test works.

I suggest that you try and update your code with the latest changes and see if you have issues with them.
Our CI/CD runs this test daily and we haven't caught any issues.

@sancar
Copy link
Author

sancar commented Jul 2, 2024

Ok, thanks for the update. We will do so.

@tishun
Copy link
Collaborator

tishun commented Jul 17, 2024

Let's close this for now, if you experience any more issues please reopen

@tishun tishun closed this as not planned Won't fix, can't repro, duplicate, stale Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: waiting-for-feedback We need additional information before we can continue
Projects
None yet
Development

No branches or pull requests

2 participants