Use reader lock where possible #473
Comments
More links (about unlock semantics in the case of a POSIX rwlock): |
The major operations that could benefit from introducing an RW lock are:
However, GC_do_blocking writes the SP to the GC context of the current thread, so a store-release barrier seems to be needed (so that changes made to a GC thread's context are visible to the thread which acquires the writer lock to do garbage collection, which involves reading all GC thread contexts). I've added 3 macros to support the reader variant of the GC lock:
|
About the READER_UNLOCK_RELEASE() implementation: does pthread_rwlock_unlock (when releasing a reader lock) contain a release barrier? Same question for ReleaseSRWLockShared. |
To support this, I'm planning to add GC_call_with_reader_lock(fn, arg, int release), where release != 0 means that READER_UNLOCK_RELEASE() is called instead of READER_UNLOCK(). |
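For reference, here is a sketch of the interface being discussed. The macro names are the ones used in this thread; the redirections shown are the simplest possible ones (UNCOND_READER_LOCK/UNCOND_READER_UNLOCK appear in the commits referenced below), and the actual gc_locks.h definitions are conditional and may differ in detail.

    /* Acquire / release the allocator lock in the shared (reader) mode;
       in the simplest configuration these just redirect to the
       unconditional variants (or to the exclusive LOCK/UNLOCK when no
       real reader lock is available). */
    #define READER_LOCK()   UNCOND_READER_LOCK()
    #define READER_UNLOCK() UNCOND_READER_UNLOCK()

    /* Same as READER_UNLOCK() but additionally guarantees (at least) a
       store-release barrier, so that stores done inside the reader
       critical section become visible to the next writer-lock holder. */
    #define READER_UNLOCK_RELEASE() READER_UNLOCK()

    /* Planned public entry point: fn is called with the reader lock
       held; release != 0 requests READER_UNLOCK_RELEASE() on exit. */
    GC_API void * GC_CALL GC_call_with_reader_lock(GC_fn_type fn,
                                                   void *client_data,
                                                   int release);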
I haven't looked at the potential bdwgc use case here carefully. In general
a rwlock reader unlock should "synchronize with", i.e. make memory
operations visible to a subsequent writer lock acquisition, and the same
with a writer unlock and a subsequent reader lock. C++ is clear about this;
Posix less so, but I think it basically still says that. I'm pretty sure
that Android bionic gets it right. I think this mostly has to be the case
for anything to work correctly. However it doesn't seem completely
inconceivable that somebody would mishandle writes in a reader critical
section, which is your case.
Note that the reader unlock operation needs to have at least partial
release semantics to ensure that the reader critical section can't see
writes from a later writer critical section. But hardware tends to have
fences that are restricted to loads or stores, so it may still be possible
to get this wrong by not ordering writes. But I think it would be hard
to defend such an interpretation in light of current specifications. Thus I
would be inclined to assume that rwlock unlock operations have the release
semantics you need.
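To make the requirement concrete, here is a minimal sketch (hypothetical, simplified names; plain C with pthreads) of the bdwgc scenario in question: a store made inside a reader critical section must become visible to a later writer-lock holder, so the reader unlock has to carry release semantics.

    #include <pthread.h>

    extern pthread_rwlock_t gc_lock;        /* stand-in for the allocator lock */
    struct gc_thread { void *stack_ptr; };  /* simplified GC thread context    */

    void do_blocking_enter(struct gc_thread *me, void *sp)
    {
      pthread_rwlock_rdlock(&gc_lock);
      me->stack_ptr = sp;               /* a STORE inside a reader section     */
      pthread_rwlock_unlock(&gc_lock);  /* must publish the store (release)    */
    }

    void collector(struct gc_thread *t)
    {
      pthread_rwlock_wrlock(&gc_lock);  /* acquiring the writer lock must make */
      void *sp = t->stack_ptr;          /* the earlier store visible here      */
      pthread_rwlock_unlock(&gc_lock);
      (void)sp;
    }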
A general concern about the use of rwlocks is that individual lock/unlock
operations tend to be no cheaper, and maybe slightly more expensive than
simple mutex operations. And readers will still contend on the cache line
containing the lock. So the question is whether the critical sections are
long enough that actual reader concurrency buys enough to make this
worthwhile.
Hans
On Fri, Sep 9, 2022 at 12:15 AM Ivan Maidanski wrote:
@hboehm <https://github.com/hboehm>, @bcardiff
<https://github.com/bcardiff>, do you know about the release semantics of
unlocking the reader lock (so that data changes are visible after acquiring
the exclusive lock)? See my posts above.
|
I think the concern here would be with an implementation that implements
read-unlock with only a "load-store fence", i.e. a fence that orders only
loads with respect to stores, and doesn't order stores with respect to
stores. Such a fence exists on a few architectures (SPARC, RISC-V), but not
in most programming languages.
But, I think it would take a very creative interpretation of the Posix spec
to allow that, since it doesn't say anything about restricting reader
critical sections to loads. In fact, it's pretty clear from the spec that
if you have only two threads, one of which uses read locks and the other
uses only write locks, then a rwlock should behave like a mutex. But that
would be false for such an implementation that uses a load-store fence for
the reader. I'm more and more convinced that if pthread_rwlock_unlock
doesn't have the release semantics Ivan wants, that should just be reported
as a bug.
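For illustration, here is a hypothetical C11 sketch of the "creative" implementation being ruled out above, next to the expected one. The counter name is made up; the point is only the difference in memory ordering.

    #include <stdatomic.h>

    extern atomic_int reader_count;

    void broken_read_unlock(void)
    {
      /* An acquire fence orders prior LOADS against later loads/stores,
         but says nothing about prior STORES, so a later writer-lock
         holder may miss stores made inside the reader section. */
      atomic_thread_fence(memory_order_acquire);
      atomic_fetch_sub_explicit(&reader_count, 1, memory_order_relaxed);
    }

    void correct_read_unlock(void)
    {
      /* Release ordering covers prior loads AND stores. */
      atomic_fetch_sub_explicit(&reader_count, 1, memory_order_release);
    }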
On Mon, Sep 12, 2022 at 1:14 PM Brian J. Cardiff wrote:
In Crystal, the RWLock uses mostly sequential consistent semantics (as
explained in
https://llvm.org/docs/LangRef.html#atomic-memory-ordering-constraints)
for its implementation. This has proven better for us than acquiring a GC
lock.
For reference the implementation is at
https://github.com/crystal-lang/crystal/blob/master/src/crystal/rw_lock.cr.
The relevant implementations of the Atomic counter are
https://github.com/crystal-lang/crystal/blob/master/src/atomic.cr#L50-L70
and
https://github.com/crystal-lang/crystal/blob/master/src/atomic.cr#L182-L195
. Two things might be worth mentioning: write_unlock uses a
lazy_set (i.e. a normal write) at
https://github.com/crystal-lang/crystal/blob/master/src/crystal/rw_lock.cr#L35
and that we yield the CPU while waiting on the reader and writer locks at
https://github.com/crystal-lang/crystal/blob/master/src/crystal/rw_lock.cr#L8-L10
and
https://github.com/crystal-lang/crystal/blob/master/src/crystal/rw_lock.cr#L24-L32
. That yielding is arch-dependent and is implemented by
https://github.com/crystal-lang/crystal/blob/master/src/intrinsics.cr#L163-L169
.
Maybe @ggiraldez <https://github.com/ggiraldez> or @ysbaddaden
<https://github.com/ysbaddaden> can offer some other insight from their
time implementing this.
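For readers without the Crystal sources at hand, a rough C11 analogue of the counter-based, mostly sequentially-consistent scheme described above might look like the following. This is a hypothetical simplification, not a translation of the actual Crystal code; sched_yield() stands in for the arch-specific yield intrinsic, and default (seq_cst) atomics mirror the orderings mentioned above.

    #include <sched.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct {
      atomic_int  readers;   /* number of active readers          */
      atomic_bool writer;    /* set while a writer holds the lock */
    } rw_lock;

    static void rw_read_lock(rw_lock *l) {
      for (;;) {
        while (atomic_load(&l->writer)) sched_yield();
        atomic_fetch_add(&l->readers, 1);     /* seq_cst by default    */
        if (!atomic_load(&l->writer)) break;  /* no writer sneaked in  */
        atomic_fetch_sub(&l->readers, 1);     /* back off and retry    */
      }
    }

    static void rw_read_unlock(rw_lock *l) {
      atomic_fetch_sub(&l->readers, 1);  /* seq_cst: publishes our stores */
    }

    static void rw_write_lock(rw_lock *l) {
      bool expected = false;
      while (!atomic_compare_exchange_weak(&l->writer, &expected, true)) {
        expected = false;
        sched_yield();
      }
      while (atomic_load(&l->readers) != 0) sched_yield();  /* drain readers */
    }

    static void rw_write_unlock(rw_lock *l) {
      /* The Crystal code uses a plain ("lazy") store here; a release
         store is the conservative C equivalent. */
      atomic_store_explicit(&l->writer, false, memory_order_release);
    }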
|
(refactoring) Issue #473 (bdwgc).
* alloc.c (GC_get_disable_automatic_collection, GC_get_stop_func, GC_get_start_callback, GC_get_on_collection_event): Use READER_LOCK/UNLOCK() instead of exclusive LOCK/UNLOCK().
* dyn_load.c [DARWIN_DEBUG && !NO_DEBUGGING] (GC_dyld_image_add, GC_dyld_image_remove): Likewise.
* finalize.c (GC_get_toggleref_func, GC_get_await_finalize_proc, GC_get_interrupt_finalizers): Likewise.
* mark.c (GC_get_on_mark_stack_empty, GC_print_trace): Likewise.
* misc.c (GC_get_heap_usage_safe, GC_get_prof_stats, GC_get_warn_proc, GC_get_abort_func, GC_dump, GC_get_memory_use, GC_get_oom_fn, GC_get_on_heap_resize, GC_get_finalizer_notifier): Likewise.
* pthread_stop_world.c [GC_ENABLE_SUSPEND_THREAD] (GC_is_thread_suspended): Likewise.
* pthread_support.c (GC_self_thread, GC_get_my_stackbottom, GC_get_on_thread_event, GC_get_sp_corrector): Likewise.
* ptr_chck.c (GC_get_same_obj_print_proc, GC_get_is_valid_displacement_print_proc, GC_get_is_visible_print_proc): Likewise.
* headers.c (GC_next_block, GC_prev_block): Replace I_HOLD_LOCK() with I_HOLD_READER_LOCK().
* mark_rts.c (GC_roots_present): Likewise.
* pthread_support.c (GC_count_threads, GC_segment_is_thread_stack, GC_greatest_stack_base_below): Likewise.
* reclaim.c (GC_enumerate_reachable_objects_inner): Likewise.
* win32_threads.c [GC_PTHREADS] (GC_lookup_by_pthread): Likewise.
* include/gc/gc.h (GC_get_oom_fn, GC_get_on_heap_resize, GC_get_on_collection_event, GC_get_on_thread_event, GC_get_finalizer_notifier, GC_get_same_obj_print_proc, GC_get_disable_automatic_collection, GC_get_stop_func, GC_get_heap_usage_safe, GC_get_memory_use, GC_get_toggleref_func, GC_get_await_finalize_proc, GC_get_interrupt_finalizers, GC_get_warn_proc, GC_get_abort_func, GC_get_my_stackbottom, GC_dump): Update comment that the allocator lock is acquired in the reader mode.
* include/gc/gc.h [GC_THREADS] (GC_thread_is_registered, GC_get_sp_corrector): Likewise.
* include/gc/gc_inline.h (GC_print_free_list): Likewise.
* include/gc/gc_mark.h (GC_get_start_callback, GC_get_on_mark_stack_empty): Likewise.
* include/gc/javaxfc.h [GC_THREADS] (GC_is_thread_suspended): Likewise.
* include/gc/gc.h (GC_get_heap_size, GC_REVEAL_POINTER): Update comment that it is enough for the caller to hold the allocator lock in the reader mode.
* include/gc/gc_mark.h (GC_iterate_free_hblks, GC_apply_to_all_blocks, GC_is_black_listed, GC_is_marked, GC_enumerate_reachable_objects_inner): Likewise.
* misc.c [THREADS] (GC_get_prof_stats_unsafe): Likewise.
* pthread_support.c (GC_lookup_thread): Likewise.
* reclaim.c (GC_print_free_list): Likewise.
* include/private/gc_locks.h: Mention I_HOLD_READER_LOCK() in comment about I_[DONT_]HOLD_LOCK.
* include/private/gc_locks.h [!READER_LOCK] (READER_LOCK, READER_UNLOCK): Define macro (redirect to [UN]LOCK).
* include/private/gc_locks.h [!READER_LOCK && GC_ASSERTIONS] (I_HOLD_READER_LOCK): Define macro (redirect to I_HOLD_LOCK); add comment.
* misc.c (GC_set_stackbottom): Use UNUSED_ARG() instead of cast to void.
* pthread_support.c [GC_PTHREADS] (GC_pthread_join, GC_pthread_detach): Use READER_LOCK/UNLOCK() instead of exclusive LOCK/UNLOCK() to call GC_lookup_by_pthread().
* win32_threads.c [GC_PTHREADS] (GC_win32_cache_self_pthread): Add assertion that the lock is held (in the exclusive mode).
* win32_threads.c (GC_get_next_stack): Likewise.
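As a rough illustration of the getter pattern this commit applies (the function and variable names here are hypothetical placeholders, not verbatim bdwgc source):

    /* A read-only accessor only needs the allocator lock in the
       reader (shared) mode. */
    GC_API int GC_CALL GC_get_some_setting(void)
    {
      int value;

      READER_LOCK();            /* previously: LOCK()   */
      value = GC_some_setting;  /* placeholder for the protected datum */
      READER_UNLOCK();          /* previously: UNLOCK() */
      return value;
    }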
Issue #473 (bdwgc). This is similar to GC_call_with_alloc_lock() but acquires the allocator lock in the reader (shared) mode, if supported.
* cord/cordxtra.c [!CORD_USE_GCC_ATOMIC && !CORD_DONT_USE_CALL_WITH_READER_LOCK] (CORD_lf_func): Use GC_call_with_reader_lock() instead of GC_call_with_alloc_lock(); add comment.
* include/gc/gc.h (GC_get_gc_no, GC_get_full_freq, GC_get_non_gc_bytes, GC_get_free_space_divisor, GC_get_max_retries, GC_get_time_limit, GC_get_prof_stats_unsafe, GC_get_size_map_at): Update comment to refer to GC_call_with_reader_lock instead of GC_call_with_alloc_lock; reformat comment.
* misc.c (GC_get_find_leak): Likewise.
* include/gc/gc.h (GC_call_with_alloc_lock): Document.
* include/gc/gc.h (GC_call_with_reader_lock): New public function declaration.
* misc.c (GC_call_with_reader_lock): Implement.
* tests/gctest.c (inc_int_counter): Move GC_incr_bytes_allocd() and GC_incr_bytes_freed() calls to run_one_test().
* tests/gctest.c (run_one_test): Call GC_alloc_lock/unlock() around GC_incr_bytes_allocd/freed() calls.
* tests/gctest.c (reachable_objs_counter): Rename pcounter argument to plocalcnt.
* tests/gctest.c (count_reachable_objs): New static function.
* tests/gctest.c (check_heap_stats): Call GC_call_with_reader_lock(count_reachable_objs) instead of GC_enumerate_reachable_objects_inner(reachable_objs_counter) and GC_alloc_lock/unlock().
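A possible client-side use of the new entry point, loosely mirroring the gctest change described in this commit. Helper names are hypothetical and include paths may vary by installation; the callback type is the one used by GC_enumerate_reachable_objects_inner().

    #include <stdio.h>
    #include "gc.h"
    #include "gc/gc_mark.h"   /* GC_enumerate_reachable_objects_inner() */

    struct counter { size_t n; };

    static void GC_CALLBACK count_one(void *obj, size_t bytes, void *client_data)
    {
      (void)obj; (void)bytes;
      ((struct counter *)client_data)->n++;
    }

    static void *GC_CALLBACK count_reachable(void *data)
    {
      /* Runs with the allocator lock held in the reader (shared) mode,
         which is sufficient for GC_enumerate_reachable_objects_inner(). */
      GC_enumerate_reachable_objects_inner(count_one, data);
      return NULL;
    }

    void print_reachable_count(void)
    {
      struct counter c = { 0 };

      (void)GC_call_with_reader_lock(count_reachable, &c,
                                     0 /* no release barrier requested */);
      printf("%u reachable objects\n", (unsigned)c.n);
    }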
Issue #473 (bdwgc).
* include/gc/gc_mark.h (GC_is_tmp_root): Update comment.
* include/private/gc_locks.h [READER_LOCK] (HAS_REAL_READER_LOCK): Define.
* mark_rts.c [!NO_DEBUGGING] (GC_is_tmp_root): Initialize last_root_set to 0 instead of MAX_ROOT_SETS; add comments; fetch last_root_set into i local variable; replace LOCK/UNLOCK() with READER_LOCK/UNLOCK(); move i local variable to the outermost scope in the function.
* mark_rts.c [!NO_DEBUGGING && HAS_REAL_READER_LOCK] (GC_is_tmp_root): Define last_root_set as volatile.
* mark_rts.c [!NO_DEBUGGING && HAS_REAL_READER_LOCK && (AO_HAVE_load || AO_HAVE_store)] (GC_is_tmp_root): Change type of last_root_set from int to AO_t; use AO_load() and AO_store().
Issue #473 (bdwgc). The feature is enabled by passing -D USE_RWLOCK to CFLAGS_EXTRA. Both Win32 threads and pthreads are supported.
* include/private/gc_locks.h [USE_RWLOCK && GC_PTHREADS] (USE_PTHREAD_LOCKS): Define.
* include/private/gc_locks.h [USE_RWLOCK && GC_PTHREADS && !NO_PTHREAD_TRYLOCK] (NO_PTHREAD_TRYLOCK): Likewise.
* include/private/gc_locks.h [(GC_WIN32_THREADS || USE_PTHREAD_LOCKS) && USE_RWLOCK] (UNCOND_READER_LOCK, UNCOND_READER_UNLOCK): Likewise.
* include/private/gc_locks.h [USE_RWLOCK && GC_PTHREADS] (USE_SPIN_LOCK): Undefine.
* include/private/gc_locks.h [GC_WIN32_THREADS && !USE_PTHREAD_LOCKS && USE_RWLOCK] (GC_allocate_ml): Use SRWLOCK as the type of the variable.
* win32_threads.c [!USE_PTHREAD_LOCKS && USE_RWLOCK] (GC_allocate_ml): Likewise.
* include/private/gc_locks.h [GC_WIN32_THREADS && !USE_PTHREAD_LOCKS && USE_RWLOCK] (UNCOND_LOCK, UNCOND_UNLOCK): Define to use AcquireSRWLockExclusive and ReleaseSRWLockExclusive, respectively.
* include/private/gc_locks.h [!SN_TARGET_PSP2 && (!THREAD_LOCAL_ALLOC || USE_SPIN_LOCK) && !USE_PTHREAD_LOCKS && !THREAD_SANITIZER] (USE_SPIN_LOCK): Do not define if USE_RWLOCK.
* include/private/gc_locks.h [USE_PTHREAD_LOCKS && USE_RWLOCK] (GC_allocate_ml): Use pthread_rwlock_t as the type of the variable.
* pthread_support.c [!USE_SPIN_LOCK && USE_PTHREAD_LOCKS && USE_RWLOCK] (GC_allocate_ml): Likewise.
* include/private/gc_locks.h [USE_PTHREAD_LOCKS && USE_RWLOCK] (UNCOND_UNLOCK): Define to use pthread_rwlock_unlock.
* include/private/gc_locks.h [USE_PTHREAD_LOCKS && USE_RWLOCK && !GC_ASSERTIONS] (UNCOND_LOCK): Define to use pthread_rwlock_wrlock.
* include/private/gc_locks.h [UNCOND_LOCK && !LOCK && UNCOND_READER_LOCK] (READER_LOCK, READER_UNLOCK): Define to use UNCOND_READER_LOCK and UNCOND_READER_UNLOCK, respectively.
* include/private/gc_locks.h [READER_LOCK] (I_HOLD_READER_LOCK): Define to TRUE; add TODO item.
* misc.c [GC_DEFN_ALLOCATE_ML] (GC_allocate_ml): Do not define if USE_RWLOCK.
* misc.c [GC_WIN32_THREADS && !GC_PTHREADS && USE_RWLOCK] (GC_init): Call InitializeSRWLock() instead of InitializeCriticalSection().
* misc.c [GC_WIN32_THREADS && (MSWIN32 || MSWINCE) && !GC_PTHREADS] (GC_deinit): Do not call DeleteCriticalSection(&GC_allocate_ml) if USE_RWLOCK.
* pthread_support.c [CAN_HANDLE_FORK && USE_PTHREAD_LOCKS && !GC_WIN32_THREADS && USE_RWLOCK] (fork_child_proc): Call pthread_rwlock_destroy and pthread_rwlock_init instead of pthread_mutex_destroy and pthread_mutex_init, respectively.
* pthread_support.c [!USE_SPIN_LOCK && USE_PTHREAD_LOCKS && !NO_PTHREAD_TRYLOCK && GC_ASSERTIONS && USE_RWLOCK] (GC_lock): Use pthread_rwlock_wrlock() instead of pthread_mutex_lock().
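Condensed from the entries above, the USE_RWLOCK mapping onto the native primitives looks roughly like this. It is a simplified sketch, not the verbatim gc_locks.h definitions (which carry more configuration conditions); it assumes <windows.h> or <pthread.h> has been included as appropriate.

    #ifdef USE_RWLOCK
    # ifdef GC_WIN32_THREADS
        extern SRWLOCK GC_allocate_ml;
    #   define UNCOND_LOCK()          AcquireSRWLockExclusive(&GC_allocate_ml)
    #   define UNCOND_UNLOCK()        ReleaseSRWLockExclusive(&GC_allocate_ml)
    #   define UNCOND_READER_LOCK()   AcquireSRWLockShared(&GC_allocate_ml)
    #   define UNCOND_READER_UNLOCK() ReleaseSRWLockShared(&GC_allocate_ml)
    # else /* pthreads */
        extern pthread_rwlock_t GC_allocate_ml;
    #   define UNCOND_LOCK()          (void)pthread_rwlock_wrlock(&GC_allocate_ml)
    #   define UNCOND_UNLOCK()        (void)pthread_rwlock_unlock(&GC_allocate_ml)
    #   define UNCOND_READER_LOCK()   (void)pthread_rwlock_rdlock(&GC_allocate_ml)
    #   define UNCOND_READER_UNLOCK() (void)pthread_rwlock_unlock(&GC_allocate_ml)
    # endif
    #endif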
This is an experimental implementation of the feature, |
Sometimes a hang occurs on Linux; I need to debug this. |
rwlock/SRWLock prototype availability:
|
(fix of commit 3bfb499) Issue #473 (bdwgc). At least in the Linux threads implementation, rwlock primitives are not atomic with respect to signals, and externally suspending a thread which is running inside pthread_rwlock_rdlock() may lead to a deadlock. As a workaround, the GC_suspend_thread() API is disabled if USE_RWLOCK.
* include/private/gcconfig.h [USE_RWLOCK] (GC_ENABLE_SUSPEND_THREAD): Undefine; add comment.
The scenario is:
I've applied a workaround: disable GC_suspend_thread if USE_RWLOCK, |
Issue #473 (bdwgc). These are the GC_register_altstack, GC_do_blocking, GC_call_with_gc_active, GC_set_stackbottom and GC_calloc_do_explicitly_typed functions.
* include/gc/gc.h (GC_call_with_reader_lock, GC_set_stackbottom): Update comment.
* include/gc/gc.h (GC_register_altstack, GC_do_blocking, GC_call_with_gc_active): Refine comment about the allocator lock.
* include/private/gc_locks.h (READER_UNLOCK_RELEASE): Define macro (to READER_UNLOCK).
* misc.c [HAS_REAL_READER_LOCK] (GC_call_with_reader_lock): Call READER_UNLOCK_RELEASE() if release argument is non-zero.
* pthread_support.c (GC_register_altstack, GC_do_blocking_inner, GC_call_with_gc_active): Replace LOCK/UNLOCK() with READER_LOCK/READER_UNLOCK_RELEASE().
* typd_mlc.c (GC_calloc_do_explicitly_typed): Likewise.
* pthread_support.c [!E2K] (do_blocking_enter): Replace I_HOLD_LOCK() with I_HOLD_READER_LOCK().
* pthread_support.c (do_blocking_leave, GC_set_stackbottom): Likewise.
* pthread_support.c [GC_ENABLE_SUSPEND_THREAD && SIGNAL_BASED_STOP_WORLD] (GC_suspend_self_blocked): Add comment.
* tests/gctest.c (run_one_test): Call GC_call_with_reader_lock (with release argument set to 1) for set_stackbottom instead of GC_call_with_alloc_lock.
Issue #473 (bdwgc). Passing the "--enable-rwlock" option to configure or the "-Denable_rwlock=ON" option to cmake defines the USE_RWLOCK macro, thus enabling the reader mode of the allocator lock. It should be used only if the platform supports the relevant locking primitives.
* CMakeLists.txt (enable_rwlock): New option (off by default).
* CMakeLists.txt [enable_rwlock] (USE_RWLOCK): Define C macro.
* configure.ac [$enable_rwlock=yes] (USE_RWLOCK): Likewise.
* configure.ac (rwlock): New AC_ARG_ENABLE.
The feature is enabled by passing |
Current limitations:
I have not done any performance measurements so far. |
@bcardiff, please check the functionality. I have no plans to do anything else regarding shared locks as of now. |
(fix of commit 3bfb499) Issue #473 (bdwgc).
* pthread_support.c [CAN_HANDLE_FORK && USE_PTHREAD_LOCKS && !GC_WIN32_THREADS && USE_RWLOCK] (fork_child_proc): Update comment.
* pthread_support.c [CAN_HANDLE_FORK && USE_PTHREAD_LOCKS && !GC_WIN32_THREADS && USE_RWLOCK && DARWIN] (fork_child_proc): Do not call pthread_rwlock_init(); reset GC_allocate_ml to PTHREAD_RWLOCK_INITIALIZER instead; add comment.
The idea (pointed out by @bcardiff in #362) is to improve concurrency with an RWLock: acquire the writer lock during GC, while the reader lock should be sufficient for accessing GC_lookup_thread and for manipulating data local to the current thread (e.g. as in GC_do_blocking_inner); see the sketch after the links below.
READER_[UN]LOCK() macros could be introduced.
Replacing LOCK with READER_LOCK could be done independently in each function where possible.
Configuration macro: e.g. USE_RWLOCK (to use SRWLOCK (Win32) [1] or pthread_rwlock_t [2])
[1] https://docs.microsoft.com/en-us/windows/win32/sync/slim-reader-writer--srw--locks (Windows Vista+)
[2] https://linux.die.net/man/3/pthread_rwlock_wrlock
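A rough sketch of the intended split (hypothetical simplified functions, not actual bdwgc code): the collector takes the lock exclusively, while operations touching only the current thread's entry run under the shared lock.

    void collect_garbage_example(void)
    {
      LOCK();               /* exclusive (writer) mode: reads every        */
                            /* registered thread's GC context              */
      /* ... stop the world, mark, sweep ... */
      UNLOCK();
    }

    void do_blocking_example(void (*fn)(void *), void *arg)
    {
      READER_LOCK();        /* shared (reader) mode: GC_lookup_thread()    */
                            /* plus updates to the current thread's own    */
                            /* context only                                */
      /* ... record the current stack pointer in this thread's entry ...   */
      READER_UNLOCK();
      fn(arg);              /* run the blocking call outside the GC's view */
    }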