Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use reader lock where possible #473

Open
ivmai opened this issue Sep 5, 2022 · 15 comments
Open

Use reader lock where possible #473

ivmai opened this issue Sep 5, 2022 · 15 comments

Comments

@ivmai
Copy link
Owner

ivmai commented Sep 5, 2022

This idea (pointed out by @bcardiff in #362) is to improve concurrency with RWLock - acquire writer lock during GC, while reader lock should be sufficient to access GC_lookup_thread and manipulating data local to the current thread (e.g. as in GC_do_blocking_inner).
READER_[UN}LOCK() macros could be introduced.
Replacing LOCK to READER_LOCK could be done independently in each function where possible.
Configuration macro: e.g. USE_RWLOCK (to use SRWLOCK (Win32) or pthread_rwlock_t)

[1] https://docs.microsoft.com/en-us/windows/win32/sync/slim-reader-writer--srw--locks (Windows Vista+)
[2] https://linux.die.net/man/3/pthread_rwlock_wrlock

@ivmai
Copy link
Owner Author

ivmai commented Sep 9, 2022

@ivmai
Copy link
Owner Author

ivmai commented Sep 9, 2022

The major operations that could benefit from introducing RW lock are

  • client callers of GC_REVEAL_POINTER (because converting a hidden to a real pointer requires verifying that the object still exists)
  • GC_do_blocking (in case of using it to switch between routines - GC_do_blocking(SwitchtoFiber, fb))

However GC_do_blocking writes SP to GC context of the current thread, thus there seems a need of a store release barrier (so that changes made to GC threads context be visible to the thread which acquires the writer lock (to do garbage collection which involves reading of all GC thread contexts).

I've added 3 macros to support reader variant of the GC lock:

  • READER_LOCK()
  • READER_UNLOCK()
  • READER_UNLOCK_RELEASE() - READER_UNLOCK() with the guarantee that local writes done before it will be visible to the thread after normal (exclusive) LOCK().

@ivmai
Copy link
Owner Author

ivmai commented Sep 9, 2022

About READER_UNLOCK_RELEASE() implementation: does pthread_rw_rdunlock contain a release barrier? Same question for ReleaseSRWLockShared.

@ivmai
Copy link
Owner Author

ivmai commented Sep 9, 2022

client callers of GC_REVEAL_POINTER (because converting a hidden to a real pointer requires verifying that the object still exists)

To support this I'm planning to add GC_call_with_reader_lock(fn, arg, int release) // where release!=0 means to call READER_UNLOCK_RELEASE() instead of READER_UNLOCK().
(Of course, using of the new API will require changes on client side.)

@ivmai
Copy link
Owner Author

ivmai commented Sep 9, 2022

@hboehm, @bcardiff, do you know about release semantic of unlocking of the reader lock (so that to see data changes after acquiring the exclusive lock)? See my posts above.

@hboehm
Copy link
Collaborator

hboehm commented Sep 12, 2022 via email

@bcardiff
Copy link
Contributor

In Crystal, the RWLock uses mostly sequential consistent semantics (as explained in https://llvm.org/docs/LangRef.html#atomic-memory-ordering-constraints) for it's implementation. This has proven better for us than acquiring a GC lock.

For reference the implementation is at https://github.com/crystal-lang/crystal/blob/master/src/crystal/rw_lock.cr. The relevant implementation of the Atomic counter are https://github.com/crystal-lang/crystal/blob/master/src/atomic.cr#L50-L70 and https://github.com/crystal-lang/crystal/blob/master/src/atomic.cr#L182-L195 . The two things that might be worth mentioning is that write_unlock uses a lazy_set (ie: normal write) at https://github.com/crystal-lang/crystal/blob/master/src/crystal/rw_lock.cr#L35 and that we yield the cpu while waiting on reader and writer locks at https://github.com/crystal-lang/crystal/blob/master/src/crystal/rw_lock.cr#L8-L10 and https://github.com/crystal-lang/crystal/blob/master/src/crystal/rw_lock.cr#L24-L32 . That yielding is arch dependent and is implemented by https://github.com/crystal-lang/crystal/blob/master/src/intrinsics.cr#L163-L169 .

Maybe @ggiraldez or @ysbaddaden can offer some other insight from their time implementing this.

@hboehm
Copy link
Collaborator

hboehm commented Sep 12, 2022 via email

ivmai added a commit that referenced this issue Oct 6, 2023
(refactoring)

Issue #473 (bdwgc).

* alloc.c (GC_get_disable_automatic_collection, GC_get_stop_func,
GC_get_start_callback, GC_get_on_collection_event): Use
READER_LOCK/UNLOCK() instead of exclusive LOCK/UNLOCK().
* dyn_load.c [DARWIN_DEBUG && !NO_DEBUGGING] (GC_dyld_image_add,
GC_dyld_image_remove): Likewise.
* finalize.c (GC_get_toggleref_func, GC_get_await_finalize_proc,
GC_get_interrupt_finalizers): Likewise.
* mark.c (GC_get_on_mark_stack_empty, GC_print_trace): Likewise.
* misc.c (GC_get_heap_usage_safe, GC_get_prof_stats, GC_get_warn_proc,
GC_get_abort_func, GC_dump, GC_get_memory_use, GC_get_oom_fn,
GC_get_on_heap_resize, GC_get_finalizer_notifier): Likewise.
* pthread_stop_world.c [GC_ENABLE_SUSPEND_THREAD]
(GC_is_thread_suspended): Likewise.
* pthread_support.c (GC_self_thread, GC_get_my_stackbottom,
GC_get_on_thread_event, GC_get_sp_corrector): Likewise.
* ptr_chck.c (GC_get_same_obj_print_proc,
GC_get_is_valid_displacement_print_proc, GC_get_is_visible_print_proc):
Likewise.
* headers.c (GC_next_block, GC_prev_block): Replace I_HOLD_LOCK() to
I_HOLD_READER_LOCK().
* mark_rts.c (GC_roots_present): Likewise.
* pthread_support.c (GC_count_threads, GC_segment_is_thread_stack,
GC_greatest_stack_base_below): Likewise.
* reclaim.c (GC_enumerate_reachable_objects_inner): Likewise.
* win32_threads.c [GC_PTHREADS] (GC_lookup_by_pthread): Likewise.
* include/gc/gc.h (GC_get_oom_fn, GC_get_on_heap_resize,
GC_get_on_collection_event, GC_get_on_thread_event,
GC_get_finalizer_notifier, GC_get_same_obj_print_proc,
GC_get_disable_automatic_collection, GC_get_stop_func,
GC_get_heap_usage_safe, GC_get_memory_use, GC_get_toggleref_func,
GC_get_await_finalize_proc, GC_get_interrupt_finalizers,
GC_get_warn_proc, GC_get_abort_func, GC_get_my_stackbottom, GC_dump):
Update comment that the allocator lock is aquired in the reader mode.
* include/gc/gc.h [GC_THREADS] (GC_thread_is_registered,
GC_get_sp_corrector): Likewise.
* include/gc/gc_inline.h (GC_print_free_list): Likewise.
* include/gc/gc_mark.h (GC_get_start_callback,
GC_get_on_mark_stack_empty): Likewise.
* include/gc/javaxfc.h [GC_THREADS] (GC_is_thread_suspended): Likewise.
* include/gc/gc.h (GC_get_heap_size, GC_REVEAL_POINTER): Update comment
that it is enough for the caller to hold the allocator lock in the
reader mode.
* include/gc/gc_mark.h (GC_iterate_free_hblks, GC_apply_to_all_blocks,
GC_is_black_listed, GC_is_marked,
GC_enumerate_reachable_objects_inner): Likewise.
* misc.c [THREADS] (GC_get_prof_stats_unsafe): Likewise.
* pthread_support.c (GC_lookup_thread): Likewise.
* reclaim.c (GC_print_free_list): Likewise.
* include/private/gc_locks.h: Mention I_HOLD_READER_LOCK() in comment
about I_[DONT_]HOLD_LOCK.
* include/private/gc_locks.h [!READER_LOCK] (READER_LOCK,
READER_UNLOCK): Define macro (redirect to [UN]LOCK).
* include/private/gc_locks.h [!READER_LOCK && GC_ASSERTIONS]
(I_HOLD_READER_LOCK): Define macro (redirect to I_HOLD_LOCK); add
comment.
* misc.c (GC_set_stackbottom): Use UNUSED_ARG() instead of cast to
void.
* pthread_support.c [GC_PTHREADS] (GC_pthread_join, GC_pthread_detach):
Use READER_LOCK/UNLOCK() instead of exclusive LOCK/UNLOCK() to call
GC_lookup_by_pthread().
* win32_threads.c [GC_PTHREADS] (GC_win32_cache_self_pthread): Add
assertion that the lock is held (in the exclusive mode).
* win32_threads.c (GC_get_next_stack): Likewise.
ivmai added a commit that referenced this issue Oct 6, 2023
Issue #473 (bdwgc).

This is similar to GC_call_with_alloc_lock() but acquires the allocator
lock in the reader (shared) mode, if supported.

* cord/cordxtra.c [!CORD_USE_GCC_ATOMIC
&& !CORD_DONT_USE_CALL_WITH_READER_LOCK] (CORD_lf_func): Use
GC_call_with_reader_lock() instead of GC_call_with_alloc_lock(); add
comment.
* include/gc/gc.h (GC_get_gc_no, GC_get_full_freq, GC_get_non_gc_bytes,
GC_get_free_space_divisor, GC_get_max_retries, GC_get_time_limit,
GC_get_prof_stats_unsafe, GC_get_size_map_at): Update comment to refer
to GC_call_with_reader_lock instead of GC_call_with_alloc_lock;
reformat comment.
* misc.c (GC_get_find_leak): Likewise.
* include/gc/gc.h (GC_call_with_alloc_lock): Document.
* include/gc/gc.h (GC_call_with_reader_lock): New public function
declaration.
* misc.c (GC_call_with_reader_lock): Implement.
* tests/gctest.c (inc_int_counter): Move GC_incr_bytes_allocd() and
GC_incr_bytes_freed() calls to run_one_test().
* tests/gctest.c (run_one_test): Call GC_alloc_lock/unlock() around
GC_incr_bytes_allocd/freed() calls.
* tests/gctest.c (reachable_objs_counter): Rename pcounter argument to
plocalcnt.
* tests/gctest.c (count_reachable_objs): New static function.
* tests/gctest.c (check_heap_stats): Call
GC_call_with_reader_lock(count_reachable_objs) instead of
GC_enumerate_reachable_objects_inner(reachable_objs_counter) and
GC_alloc_lock/unlock().
ivmai added a commit that referenced this issue Oct 6, 2023
Issue #473 (bdwgc).

* include/gc/gc_mark.h (GC_is_tmp_root): Update comment.
* include/private/gc_locks.h [READER_LOCK] (HAS_REAL_READER_LOCK):
Define.
* mark_rts.c [!NO_DEBUGGING] (GC_is_tmp_root): Initialize last_root_set
to 0 instead of MAX_ROOT_SETS; add comments; fetch last_root_set to i
local variable; replace LOCK/UNLOCK() to READER_LOCK/UNLOCK(); move
i local variable to the outermost scope in the function.
* mark_rts.c [!NO_DEBUGGING && HAS_REAL_READER_LOCK] (GC_is_tmp_root):
Define last_root_set as volatile.
* mark_rts.c [!NO_DEBUGGING && HAS_REAL_READER_LOCK && (AO_HAVE_load
|| AO_HAVE_store)] (GC_is_tmp_root): Change type of last_root_set from
int to AO_t; use AO_load() and AO_store().
ivmai added a commit that referenced this issue Oct 6, 2023
Issue #473 (bdwgc).

The feature is enabled by passing -D USE_RWLOCK to CFLAGS_EXTRA.
Both Win32 threads and pthreads are supported.

* include/private/gc_locks.h [USE_RWLOCK && GC_PTHREADS]
(USE_PTHREAD_LOCKS): Define.
* include/private/gc_locks.h [USE_RWLOCK && GC_PTHREADS
&& !NO_PTHREAD_TRYLOCK] (NO_PTHREAD_TRYLOCK): Likewise.
* include/private/gc_locks.h [(GC_WIN32_THREADS || USE_PTHREAD_LOCKS)
&& USE_RWLOCK] (UNCOND_READER_LOCK, UNCOND_READER_UNLOCK): Likewise.
* include/private/gc_locks.h [USE_RWLOCK && GC_PTHREADS]
(USE_SPIN_LOCK): Undefine.
* include/private/gc_locks.h [GC_WIN32_THREADS && !USE_PTHREAD_LOCKS
&& USE_RWLOCK] (GC_allocate_ml): Use SRWLOCK as the type of the
variable.
* win32_threads.c [!USE_PTHREAD_LOCKS && USE_RWLOCK]
(GC_allocate_ml): Likewise.
* include/private/gc_locks.h [GC_WIN32_THREADS && !USE_PTHREAD_LOCKS
&& USE_RWLOCK] (UNCOND_LOCK, UNCOND_UNLOCK): Define to use
AcquireSRWLockExclusive and ReleaseSRWLockExclusive, respectively.
* include/private/gc_locks.h [!SN_TARGET_PSP2 && (!THREAD_LOCAL_ALLOC
|| USE_SPIN_LOCK) && !USE_PTHREAD_LOCKS && !THREAD_SANITIZER]
(USE_SPIN_LOCK): Do not define if USE_RWLOCK.
* include/private/gc_locks.h [USE_PTHREAD_LOCKS && USE_RWLOCK]
(GC_allocate_ml): Use pthread_rwlock_t as the type of the variable.
* pthread_support.c [!USE_SPIN_LOCK && USE_PTHREAD_LOCKS
&& USE_RWLOCK] (GC_allocate_ml): Likewise.
* include/private/gc_locks.h [USE_PTHREAD_LOCKS && USE_RWLOCK]
(UNCOND_UNLOCK): Define to use pthread_rwlock_unlock.
* include/private/gc_locks.h [USE_PTHREAD_LOCKS && USE_RWLOCK
&& !GC_ASSERTIONS] (UNCOND_LOCK): Define to use pthread_rwlock_wrlock.
* include/private/gc_locks.h [UNCOND_LOCK && !LOCK
&& UNCOND_READER_LOCK] (READER_LOCK, READER_UNLOCK): Define to use
UNCOND_READER_LOCK and UNCOND_READER_UNLOCK, respectively.
* include/private/gc_locks.h [READER_LOCK] (I_HOLD_READER_LOCK):
Define to TRUE; add TODO item.
* misc.c [GC_DEFN_ALLOCATE_ML] (GC_allocate_ml): Do not define if
USE_RWLOCK.
* misc.c [GC_WIN32_THREADS && !GC_PTHREADS && USE_RWLOCK] (GC_init):
Call InitializeSRWLock() instead of InitializeCriticalSection().
* misc.c [GC_WIN32_THREADS && (MSWIN32 || MSWINCE) && !GC_PTHREADS]
(GC_deinit): Do not call DeleteCriticalSection(&GC_allocate_ml) if
USE_RWLOCK.
* pthread_support.c [CAN_HANDLE_FORK && USE_PTHREAD_LOCKS
&& !GC_WIN32_THREADS && USE_RWLOCK] (fork_child_proc): Call
pthread_rwlock_destroy and pthread_rwlock_init instead of
pthread_mutex_destroy and pthread_mutex_init, respectively.
* pthread_support.c [!USE_SPIN_LOCK && USE_PTHREAD_LOCKS
&& !NO_PTHREAD_TRYLOCK && GC_ASSERTIONS && USE_RWLOCK] (GC_lock):
Use pthread_rwlock_wrlock() instead of pthread_mutex_lock().
@ivmai
Copy link
Owner Author

ivmai commented Oct 6, 2023

This is an experimental implementation of the feature,
As of commit 3bfb499, the mutex-based allocator lock is replaced with rwlock (SWRLock on Win32) if the collector is built with -D USE_RWLOCK.

@ivmai
Copy link
Owner Author

ivmai commented Oct 6, 2023

Sometimes a hang occurs on Linux, I need to debug this.
Build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/611184313

@ivmai
Copy link
Owner Author

ivmai commented Oct 7, 2023

no implementation of detection that rwlock (or SWRLock) is provided by the platform

rwlock/SWRLock prototypes availability:

  • Windows: if _WIN32_WINNT >= 0x0600
  • pthreads: if defined __USE_UNIX98 || defined __USE_XOPEN2K
  • cygwin: if __XSI_VISIBLE >= 500 || __POSIX_VISIBLE >= 200112 || __cplusplus >= 201402L

ivmai added a commit that referenced this issue Oct 10, 2023
(fix of commit 3bfb499)

Issue #473 (bdwgc).

At least in the Linux threads implementation, rwlock primitives are not
atomic in respect to signals, and suspending externally a thread which
is running inside pthread_rwlock_rdlock() may lead to a deadlock.

As a workaround GC_suspend_thread() API is disabled if USE_RWLOCK.

* include/private/gcconfig.h [USE_RWLOCK] (GC_ENABLE_SUSPEND_THREAD):
Undefine; add comment.
@ivmai
Copy link
Owner Author

ivmai commented Oct 10, 2023

Sometimes a hang occurs on Linux, I need to debug this.

The scenario is:

  • one thread is trying to acquire a reader lock
  • other thread is holding writer lock and does suspension of the 1st thread (using GC_suspend_thread API)
  • GC_suspend_thread complete normally and the 1st thread stays inside GC_suspend_handler_inner
  • no other thread can acquire writer lock - hang

I've applied a workaround - disable GC_suspend_thread if USE_RWLOCK,

ivmai added a commit that referenced this issue Oct 12, 2023
(fix of commit 3bfb499)

Issue #473 (bdwgc).

* docs/README.macros (USE_RWLOCK): Document.
* docs/scale.md (Options for enhanced scalability): Document USE_RWLOCK
macro.
ivmai added a commit that referenced this issue Oct 12, 2023
Issue #473 (bdwgc).

These are GC_register_altstack, GC_do_blocking, GC_call_with_gc_active,
GC_set_stackbottom, GC_calloc_do_explicitly_typed functions.

* include/gc/gc.h (GC_call_with_reader_lock, GC_set_stackbottom):
Update comment.
* include/gc/gc.h (GC_register_altstack, GC_do_blocking,
GC_call_with_gc_active): Refine comment about the allocator lock.
* include/private/gc_locks.h (READER_UNLOCK_RELEASE): Define macro (to
READER_UNLOCK).
* misc.c [HAS_REAL_READER_LOCK] (GC_call_with_reader_lock): Call
READER_UNLOCK_RELEASE() if release argument is non-zero.
* pthread_support.c (GC_register_altstack, GC_do_blocking_inner,
GC_call_with_gc_active): Replace LOCK/UNLOCK() with
READER_LOCK/READER_UNLOCK_RELEASE().
* typd_mlc.c (GC_calloc_do_explicitly_typed): Likewise.
* pthread_support.c [!E2K] (do_blocking_enter): Replace I_HOLD_LOCK()
to I_HOLD_READER_LOCK().
* pthread_support.c (do_blocking_leave, GC_set_stackbottom): Likewise.
* pthread_support.c [GC_ENABLE_SUSPEND_THREAD
&& SIGNAL_BASED_STOP_WORLD] (GC_suspend_self_blocked): Add comment.
* tests/gctest.c (run_one_test): Call GC_call_with_reader_lock (with
release argument set to 1) for set_stackbottom instead of
GC_call_with_alloc_lock.
ivmai added a commit that referenced this issue Oct 12, 2023
…build

(fix of commit 08e4e25)

Issue #473 (bdwgc).

* include/gc/gc.h [!GC_THREADS] (GC_call_with_reader_lock): Define as
a macro.
* misc.c (GC_call_with_reader_lock): Do not define (implement) unless
THREADS macro is defined.
ivmai added a commit that referenced this issue Oct 12, 2023
Issue #473 (bdwgc).

Passing "--enable-rwlock" option to configure or "-Denable_rwlock=ON"
option to cmake defines USE_RWLOCK macro, thus enabling the reader mode
of the allocator lock.  Should be used only if the platform supports
the relevant locking primitives.

* CMakeLists.txt (enable_rwlock): New option (off by default).
* CMakeLists.txt [enable_rwlock] (USE_RWLOCK): Define C macro.
* configure.ac [$enable_rwlock=yes] (USE_RWLOCK): Likewise.
* configure.ac (rwlock): New AC_ARG_ENABLE.
@ivmai
Copy link
Owner Author

ivmai commented Oct 12, 2023

The feature is enabled by passing --enable-rwlock to configure or -Denable_rwlock=ON to cmake.

@ivmai
Copy link
Owner Author

ivmai commented Oct 12, 2023

Current limitations:

  • I_HOLD_READER_LOCK() (for assertions) is always true
  • no implementation of detection that rwlock (or SWRLock) is provided by the platform
  • no spin locking (i.e. pthread_rwlock_trywrlock is not used)

I have not done any performance measurements so far.

@ivmai
Copy link
Owner Author

ivmai commented Oct 12, 2023

@bcardiff, please check the functionality. I have no more plans to do anything else regarding shared locks, as of now.

ivmai added a commit that referenced this issue Oct 13, 2023
(fix of commit 3bfb499)

Issue #473 (bdwgc).

* pthread_support.c [CAN_HANDLE_FORK && USE_PTHREAD_LOCKS
&& !GC_WIN32_THREADS && USE_RWLOCK] (fork_child_proc): Update comment.
* pthread_support.c [CAN_HANDLE_FORK && USE_PTHREAD_LOCKS
&& !GC_WIN32_THREADS && USE_RWLOCK && DARWIN] (fork_child_proc): Do
not call pthread_rwlock_init(); reset GC_allocate_ml to
PTHREAD_RWLOCK_INITIALIZER instead; add comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants