Crash due to race condition in TCPServerDispatcher #2085

jrfeenst · 2018-01-11T23:23:18Z

Expected behavior

Clean server shutdown with no crash.

Actual behavior

#0 0x00007fc7d030a0c1 in pthread_kill () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007fc7c0da4ed1 in ?? () from .so
#2
#3 0x0000000009078380 in ?? ()
#4 0x00007fc7c2ff429f in Poco::PooledThread::run() () from bin/glnxa64/libPocoFoundation.so.48
#5 0x00007fc7c2feff86 in Poco::ThreadImpl::runnableEntry(void*) () from bin/glnxa64/libPocoFoundation.so.48
#6 0x00007fc7d0305064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7 0x00007fc7d1ba462d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Steps to reproduce the problem

This is a very rare issue that has been causing sporadic crashes in our build and test environment. I believe the issue is reproducible when there is a server.stop() call concurrent with a new socket connection.

I believe the problem is that a new socket connection enqueues the TCPServerDispatcher instance without changing the reference count. That means that if you shutdown the server before the TCPServerDispatcher::run method starts the ref count would go to zero before the AutoPtr in run() has a chance to increment the ref count.

I think that incrementing the ref count before enqueuing TCPServerDispatcher (and decrementing at the end of run()) would resolve the issue and ensure that the run() method can be safely called during server shutdown.

POCO version

1.7.9

Compiler and version

gcc 6.3.0-GLIBC2.11

Operating system and version

debian 8

Other relevant information

aleks-f · 2018-01-11T23:59:09Z

Possibly fixed with #1965

jrfeenst · 2018-01-12T12:05:08Z

I don't think that #1965 fixes this issue. It seems like it would still be possible to enqueue the TCPServerDispatcher and there is nothing to prevent it from getting destructed before the thread starts running it. It seems like there is an attempt to keep the Runnable object alive by adding a self reference inside the run() method (guard local), but that would only work once the run method starts executing. It's still possible to destruct it before the thread starts executing run().

Burgch · 2018-01-12T14:11:01Z

After thinking this through, I don't think there's an easy solution. It would be possible to call Poco::Net::TCPServerDispatcher::duplicate from within Poco::Net::TCPServerDispatcher::enqueue, and then in Poco::Net::TCPServerDispatcher::run() acquire the Poco::AutoPtr guard with shared set to false to prevent calling duplicate again, but then this dispatcher will be leaked if the thread is stopped before it starts running. Perhaps that's an acceptable tradeoff in exchange for avoiding crashes?

aleks-f · 2022-05-29T01:20:13Z

@obiltschnig

why is this stil opened? i thought it was fixed (as much as it can be) with this

obiltschnig · 2022-05-29T17:41:45Z

I don't know.

aleks-f added the bug label Jan 12, 2018

aleks-f self-assigned this Jan 12, 2018

obiltschnig added this to the Release 1.8.2 milestone Jan 12, 2018

Burgch mentioned this issue Jan 12, 2018

Fix race condition in TCPServerDispatcher.cpp #2088

Merged

obiltschnig modified the milestones: Release 1.8.2, Release 1.9.0 Feb 17, 2018

obiltschnig closed this as completed in #2088 Mar 5, 2018

obiltschnig modified the milestones: Release 1.9.0, Release 1.9.1 Mar 7, 2018

obiltschnig reopened this Mar 7, 2018

Burgch mentioned this issue Mar 13, 2018

Poco 1.9.1 version of #2088 - Fix for TCPServerDispatcher race/crash #2219

Merged

aleks-f mentioned this issue May 9, 2018

Bug: Segfault in ThreadPool.cpp / 1.9.0 / linux #2325

Closed

Mardalemer mentioned this issue Jul 9, 2019

Race in TCPServerDispatcher::stop #2746

Closed

obiltschnig modified the milestones: Release 1.10.0, Release 1.11.0 Jan 21, 2020

aleks-f added the pending label May 29, 2022

aleks-f closed this as completed May 30, 2022

aleks-f removed the pending label Jul 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Crash due to race condition in TCPServerDispatcher #2085

Crash due to race condition in TCPServerDispatcher #2085

jrfeenst commented Jan 11, 2018

aleks-f commented Jan 11, 2018

jrfeenst commented Jan 12, 2018

Burgch commented Jan 12, 2018

aleks-f commented May 29, 2022

obiltschnig commented May 29, 2022

Crash due to race condition in TCPServerDispatcher #2085

Crash due to race condition in TCPServerDispatcher #2085

Comments

jrfeenst commented Jan 11, 2018

Expected behavior

Actual behavior

Steps to reproduce the problem

POCO version

Compiler and version

Operating system and version

Other relevant information

aleks-f commented Jan 11, 2018

jrfeenst commented Jan 12, 2018

Burgch commented Jan 12, 2018

aleks-f commented May 29, 2022

obiltschnig commented May 29, 2022