CPLWorkerThreadPool: regression in master #10825

Closed
rouault opened this issue Sep 17, 2024 · 2 comments · Fixed by #10826

rouault commented Sep 17, 2024

@abellgithub I believe this is related to your recent changes

pytest autotest/utilities/test_gdal_viewshed.py --capture=no -ra -vv on a debug build randomly stalls or crashes for me, with traces like:

gdal_viewshed_path = '/home/even/gdal/gdal/build_cmake/apps/gdal_viewshed', tmp_path = PosixPath('/tmp/pytest-of-even/pytest-648/test_gdal_viewshed0')
viewshed_input = '/tmp/pytest-of-even/pytest-648/test_gdal_viewshed0/test_gdal_viewshed_in.tif'

    def test_gdal_viewshed(gdal_viewshed_path, tmp_path, viewshed_input):
    
        viewshed_out = str(tmp_path / "test_gdal_viewshed_out.tif")
    
        _, err = gdaltest.runexternal_out_and_err(
            gdal_viewshed_path
            + " -oz {} -ox {} -oy {} {} {}".format(
                oz[0], ox[0], oy[0], viewshed_input, viewshed_out
            )
        )
>       assert err is None or err == ""
E       assert ("ERROR 7: Assertion `psWorkerThread->bMarkedAsWaiting' failed in file `/home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp', line 218\n\nERROR ret code = -6" is None or "ERROR 7: Assertion `psWorkerThread->bMarkedAsWaiting' failed in file `/home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp', line 218\n\nERROR ret code = -6" == ''
E         + ERROR 7: Assertion `psWorkerThread->bMarkedAsWaiting' failed in file `/home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp', line 218
E         + 
E         + ERROR ret code = -6)


rouault commented Sep 17, 2024

I also get a segmentation fault without assertion when running under gdb:

```
$ gdb --args /home/even/gdal/gdal/build_cmake/apps/gdal_viewshed -oz 100 -ox 621528 -oy 4817617 /tmp/pytest-of-even/pytest-660/test_gdal_viewshed0/test_gdal_viewshed_in.tif /tmp/pytest-of-even/pytest-660/test_gdal_viewshed0/test_gdal_viewshed_out.tif
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.2) 9.2
[...]
(gdb) r
Starting program: /home/even/gdal/gdal/build_cmake/apps/gdal_viewshed -oz 100 -ox 621528 -oy 4817617 /tmp/pytest-of-even/pytest-660/test_gdal_viewshed0/test_gdal_viewshed_in.tif /tmp/pytest-of-even/pytest-660/test_gdal_viewshed0/test_gdal_viewshed_out.tif
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe1fad700 (LWP 235296)]
[New Thread 0x7fffe17ac700 (LWP 235297)]
[New Thread 0x7fffe0fab700 (LWP 235299)]
[New Thread 0x7fffdbfff700 (LWP 235301)]
0...10...20...30...40...50[Thread 0x7fffe1fad700 (LWP 235296) exited]

Thread 3 "gdal_viewshed" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe17ac700 (LWP 235297)]
0x0000555555b44598 in ?? ()
(gdb) bt
#0 0x0000555555b44598 in ?? ()
#1 0x00007ffff6422b9a in std::function<void ()>::operator()() const (this=0x7fffe17a9e00) at /usr/include/c++/9/bits/std_function.h:688
#2 0x00007ffff641fbe4 in CPLWorkerThreadPool::WorkerThreadFunction (user_data=0x555555707c20) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:122
#3 0x00007ffff638af44 in CPLStdCallThreadJacket (pData=0x555555b51670) at /home/even/gdal/gdal/port/cpl_multiproc.cpp:2014
#4 0x00007fffe6762609 in start_thread (arg=) at pthread_create.c:477
#5 0x00007ffff5690353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) thread apply all bt

Thread 5 (Thread 0x7fffdbfff700 (LWP 235301)):
#0 futex_wait_cancelable (private=, expected=0, futex_word=0x7fffc80019ec) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fffc8001998, cond=0x7fffc80019c0) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x7fffc80019c0, mutex=0x7fffc8001998) at pthread_cond_wait.c:647
#3 0x00007ffff599de30 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff6421828 in std::condition_variable::wait<CPLJobQueue::WaitCompletion(int)::<lambda()> >(std::unique_lock<std::mutex> &, CPLJobQueue::<lambda()>) (this=0x7fffc80019c0, __lock=..., __p=...) at /usr/include/c++/9/condition_variable:101
#5 0x00007ffff64213c8 in CPLJobQueue::WaitCompletion (this=0x7fffc8001990, nMaxRemainingJobs=0) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:636
#6 0x00007ffff669f721 in gdal::viewshed::ViewshedExecutor::processLine (this=0x7fffffffc8c0, nLine=72, vLastLineVal=std::vector of length 103, capacity 103 = {...}) at /home/even/gdal/gdal/alg/viewshed/viewshed_executor.cpp:629
#7 0x00007ffff669fa38 in gdal::viewshed::ViewshedExecutor::<lambda()>::operator()(void) const (__closure=0x555555a726c0) at /home/even/gdal/gdal/alg/viewshed/viewshed_executor.cpp:680
#8 0x00007ffff66a088b in std::_Function_handler<void(), gdal::viewshed::ViewshedExecutor::run()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/9/bits/std_function.h:300
#9 0x00007ffff6422b9a in std::function<void ()>::operator()() const (this=0x55555576ba48) at /usr/include/c++/9/bits/std_function.h:688
#10 0x00007ffff64211d5 in CPLJobQueue::<lambda()>::operator()(void) const (__closure=0x55555576ba40) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:618
#11 0x00007ffff6421d0c in std::_Function_handler<void(), CPLJobQueue::SubmitJob(std::function<void()>)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/9/bits/std_function.h:300
#12 0x00007ffff6422b9a in std::function<void ()>::operator()() const (this=0x7fffdbffce00) at /usr/include/c++/9/bits/std_function.h:688
#13 0x00007ffff641fbe4 in CPLWorkerThreadPool::WorkerThreadFunction (user_data=0x555555b3d330) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:122
#14 0x00007ffff638af44 in CPLStdCallThreadJacket (pData=0x555555703800) at /home/even/gdal/gdal/port/cpl_multiproc.cpp:2014
#15 0x00007fffe6762609 in start_thread (arg=) at pthread_create.c:477
#16 0x00007ffff5690353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fffe0fab700 (LWP 235299)):
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff6422b9a in std::function<void ()>::operator()() const (this=0x7fffe0fa8e00) at /usr/include/c++/9/bits/std_function.h:688
#2 0x00007ffff641fbe4 in CPLWorkerThreadPool::WorkerThreadFunction (user_data=0x555555754ae0) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:122
#3 0x00007ffff638af44 in CPLStdCallThreadJacket (pData=0x555555a72790) at /home/even/gdal/gdal/port/cpl_multiproc.cpp:2014
#4 0x00007fffe6762609 in start_thread (arg=) at pthread_create.c:477
#5 0x00007ffff5690353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fffe17ac700 (LWP 235297)):
#0 0x0000555555b44598 in ?? ()
#1 0x00007ffff6422b9a in std::function<void ()>::operator()() const (this=0x7fffe17a9e00) at /usr/include/c++/9/bits/std_function.h:688
#2 0x00007ffff641fbe4 in CPLWorkerThreadPool::WorkerThreadFunction (user_data=0x555555707c20) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:122
#3 0x00007ffff638af44 in CPLStdCallThreadJacket (pData=0x555555b51670) at /home/even/gdal/gdal/port/cpl_multiproc.cpp:2014
#4 0x00007fffe6762609 in start_thread (arg=) at pthread_create.c:477
#5 0x00007ffff5690353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7fffe1fb37c0 (LWP 234501)):
#0 futex_wait_cancelable (private=, expected=0, futex_word=0x555555b8e238) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x555555b8e1e8, cond=0x555555b8e210) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x555555b8e210, mutex=0x555555b8e1e8) at pthread_cond_wait.c:647
#3 0x00007ffff599de30 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff6421828 in std::condition_variable::wait<CPLJobQueue::WaitCompletion(int)::<lambda()> >(std::unique_lock<std::mutex> &, CPLJobQueue::<lambda()>) (this=0x555555b8e210, __lock=..., __p=...) at /usr/include/c++/9/condition_variable:101
#5 0x00007ffff64213c8 in CPLJobQueue::WaitCompletion (this=0x555555b8e1e0, nMaxRemainingJobs=0) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:636
#6 0x00007ffff642103f in CPLJobQueue::~CPLJobQueue (this=0x555555b8e1e0, __in_chrg=) at /home/even/gdal/gdal/port/cpl_worker_thread_pool.cpp:572
#7 0x00007ffff6666eae in std::default_delete<CPLJobQueue>::operator() (this=0x7fffffffc6e8, __ptr=0x555555b8e1e0) at /usr/include/c++/9/bits/unique_ptr.h:81
#8 0x00007ffff6665dea in std::unique_ptr<CPLJobQueue, std::default_delete<CPLJobQueue> >::~unique_ptr (this=0x7fffffffc6e8, __in_chrg=) at /usr/include/c++/9/bits/unique_ptr.h:292
#9 0x00007ffff669fd32 in gdal::viewshed::ViewshedExecutor::run (this=0x7fffffffc8c0) at /home/even/gdal/gdal/alg/viewshed/viewshed_executor.cpp:660
--Type <RET> for more, q to quit, c to continue without paging--
#10 0x00007ffff669c902 in gdal::viewshed::Viewshed::run (this=0x7fffffffcf40, band=0x5555556a4fc0, pfnProgress=0x7ffff641c1b4 <GDALTermProgress(double, char const*, void*)>, pProgressArg=0x0) at /home/even/gdal/gdal/alg/viewshed/viewshed.cpp:353
#11 0x000055555558d8b8 in main (argc=9, argv=0x555555707020) at /home/even/gdal/gdal/apps/gdal_viewshed.cpp:362

```


rouault commented Sep 17, 2024

This is on a debug build configured with -DCMAKE_CXX_FLAGS_DEBUG=-DDEBUG, so that CPLAssert() is turned on.

It seems to be specific to gcc 9.4 on Ubuntu 20.04. I can't reproduce it with gcc 13.2 on Ubuntu 24.04.

rouault self-assigned this Sep 17, 2024
rouault added a commit to rouault/gdal that referenced this issue Sep 17, 2024
…tool=helgrind autotest/cpp/gdal_unit_test --gtest_filter=test_cpl.CPLWorkerThreadPool' happy

Related to OSGeo#10825
rouault added a commit to rouault/gdal that referenced this issue Sep 17, 2024
…n of NotifyQueue symbols, leading to mutex corruption

Fixes OSGeo#10825
rouault added a commit to rouault/gdal that referenced this issue Sep 18, 2024
…tool=helgrind autotest/cpp/gdal_unit_test --gtest_filter=test_cpl.CPLWorkerThreadPool' happy (master only)

Related to OSGeo#10825
rouault added a commit to rouault/gdal that referenced this issue Sep 18, 2024
…n of NotifyQueue symbols, leading to mutex corruption

Fixes OSGeo#10825