Hanging/Freezing persistent subscription test on net48 on CI #103

Closed
shaan1337 opened this issue Jan 20, 2021 · 4 comments

@shaan1337
Member

The test EventStore.Client.Bugs.Issue_1125.persistent_subscription_delivers_all_events hangs on CI on the net48 build in this PR: #93

It was thus skipped temporarily in this commit until we figure out the root cause: 92804a7

@shaan1337
Member Author

shaan1337 commented Feb 3, 2021

Progress updates:

  • Managed to consistently reproduce the issue on Ubuntu 18.04 by adding GRPC_TRACE=all and GRPC_VERBOSITY=debug to the environment, which slows down execution and makes the issue much more likely to occur. To analyze the gRPC traces, I also had to redirect standard output and standard error to a file with a dup2 system call, but I'm not sure whether that is necessary to make the issue reproducible.

  • A single iteration of the test is enough to reproduce the issue (the test itself runs 50).

  • Analyzing the backtraces with lldb shows that the Mono Finalizer thread is waiting on a futex. I'm not 100% sure this is the cause, but it's very likely given the information below: the Dispose calls are probably being made by the Finalizer thread. The fact that the test passes successfully and only then hangs also supports this hypothesis.

  • The issue occurs when we call Dispose() on AsyncDuplexStreamingCall. This call type currently seems to be used only by persistent subscriptions.

  • Removing _call?.Dispose(); here stops the hanging:

  • It is also related to the dispose calls on the AsyncDuplexStreamingCall in the interceptors - replacing any of the response.Dispose calls with an empty action stops the hanging:

    response.GetStatus, response.GetTrailers, response.Dispose);
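
The reproduction setup from the first progress item can be sketched as a small shell helper. This is only an illustration: the function name `run_with_grpc_trace` and the log file argument are hypothetical, and plain shell redirection stands in for the dup2 call mentioned above (whether that is equivalent for triggering the hang is an assumption). GRPC_TRACE and GRPC_VERBOSITY are the environment variables named in the comment.

```shell
#!/bin/sh
# Hypothetical helper (not from the issue): run a command with verbose gRPC
# tracing enabled and capture stdout and stderr in a single log file.
run_with_grpc_trace() {
    log_file="$1"
    shift
    # For an external command, these assignments are exported to that
    # command's environment only and do not persist in the calling shell.
    GRPC_TRACE=all GRPC_VERBOSITY=debug "$@" >"$log_file" 2>&1
}
```

It could be invoked as `run_with_grpc_trace grpc-trace.log <test command>`, where the test command is whatever runs the net48 test on the machine in question.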

Summary:

It appears that multiple simultaneous calls to Dispose() on AsyncDuplexStreamingCall cause a deadlock due to a race condition.

I've attempted a minimal reproduction (without EventStore-specific code) but I've been unsuccessful so far.

@shaan1337
Member Author

Managed to get a minimal repro of the issue: https://github.com/shaan1337/async-duplex-streaming-call-hang
Opened an issue on gRPC repo: grpc/grpc#25488

@tambeau

tambeau commented Jan 13, 2022

PR #93 is closed.

@timothycoleman
Contributor

In the end this was dealt with in #172 by dropping the (short-lived) support for net48.
