Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash due to race condition in TCPServerDispatcher #2085

Closed
jrfeenst opened this issue Jan 11, 2018 · 5 comments · Fixed by #2088
Closed

Crash due to race condition in TCPServerDispatcher #2085

jrfeenst opened this issue Jan 11, 2018 · 5 comments · Fixed by #2088
Assignees
Labels

Comments

@jrfeenst
Copy link

Expected behavior

Clean server shutdown with no crash.

Actual behavior

#0 0x00007fc7d030a0c1 in pthread_kill () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007fc7c0da4ed1 in ?? () from .so
#2
#3 0x0000000009078380 in ?? ()
#4 0x00007fc7c2ff429f in Poco::PooledThread::run() () from bin/glnxa64/libPocoFoundation.so.48
#5 0x00007fc7c2feff86 in Poco::ThreadImpl::runnableEntry(void*) () from bin/glnxa64/libPocoFoundation.so.48
#6 0x00007fc7d0305064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7 0x00007fc7d1ba462d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Steps to reproduce the problem

This is a very rare issue that has been causing sporadic crashes in our build and test environment. I believe the issue is reproducible when there is a server.stop() call concurrent with a new socket connection.

I believe the problem is that a new socket connection enqueues the TCPServerDispatcher instance without changing the reference count. That means that if you shutdown the server before the TCPServerDispatcher::run method starts the ref count would go to zero before the AutoPtr in run() has a chance to increment the ref count.

I think that incrementing the ref count before enqueuing TCPServerDispatcher (and decrementing at the end of run()) would resolve the issue and ensure that the run() method can be safely called during server shutdown.

POCO version

1.7.9

Compiler and version

gcc 6.3.0-GLIBC2.11

Operating system and version

debian 8

Other relevant information

@aleks-f
Copy link
Member

aleks-f commented Jan 11, 2018

Possibly fixed with #1965

@aleks-f aleks-f added the bug label Jan 12, 2018
@aleks-f aleks-f self-assigned this Jan 12, 2018
@obiltschnig obiltschnig added this to the Release 1.8.2 milestone Jan 12, 2018
@jrfeenst
Copy link
Author

I don't think that #1965 fixes this issue. It seems like it would still be possible to enqueue the TCPServerDispatcher and there is nothing to prevent it from getting destructed before the thread starts running it. It seems like there is an attempt to keep the Runnable object alive by adding a self reference inside the run() method (guard local), but that would only work once the run method starts executing. It's still possible to destruct it before the thread starts executing run().

@Burgch
Copy link
Contributor

Burgch commented Jan 12, 2018

After thinking this through, I don't think there's an easy solution. It would be possible to call Poco::Net::TCPServerDispatcher::duplicate from within Poco::Net::TCPServerDispatcher::enqueue, and then in Poco::Net::TCPServerDispatcher::run() acquire the Poco::AutoPtr guard with shared set to false to prevent calling duplicate again, but then this dispatcher will be leaked if the thread is stopped before it starts running. Perhaps that's an acceptable tradeoff in exchange for avoiding crashes?

@aleks-f
Copy link
Member

aleks-f commented May 29, 2022

@obiltschnig

why is this stil opened? i thought it was fixed (as much as it can be) with this

@obiltschnig
Copy link
Member

I don't know.

@aleks-f aleks-f closed this as completed May 30, 2022
@aleks-f aleks-f removed the pending label Jul 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants