Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upstream and fix E2K support #411

Closed
ivmai opened this issue Jan 12, 2022 · 34 comments
Closed

Upstream and fix E2K support #411

ivmai opened this issue Jan 12, 2022 · 34 comments

Comments

@ivmai
Copy link
Owner

ivmai commented Jan 12, 2022

Downstream patch by @ilyakurdyukov enabling libgc v7.6.8 on Elbrus 2000 architecture.

Same patch applied to v8.0.2: libgc-8.0.2-e2k.patch

Observed issues:

  • SigSegv sometimes (probably lack of GC_get_procedure_stack in do_blocking and lack of LOAD_TAGGED_VALUE in some marker)
  • compilation failure with Clang
  • assertion violation (~1/3 rate during test_cpp) about hb_n_marks in GC_reclaim_block if PARALLEL_MARK and USE_MARK_BITS
  • ENOMEM (very rare) in GC_get_procedure_stack
@ivmai
Copy link
Owner Author

ivmai commented Jan 12, 2022

Local test host: Linux ... 5.4.0-3.13-e8c # 1 SMP ... e2k E8C E8C-SWTX GNU/Linux

ivmai pushed a commit that referenced this issue Jan 12, 2022
(a port of altlinux libgc-e2k.patch (8ed786c))

Issue #411 (bdwgc).

* include/gc/gc.h [__GNUC__ && !__INTEL_COMPILER && __e2k__]
(GC_reachable_here): Specify "r" contraint for ptr.
* include/gc/gc.h [__e2k__] (GC_stack_base): Declare reg_base field.
* include/private/gc_priv.h [E2K] (GC_push_all_register_sections,
GC_save_regs_in_stack, GC_get_procedure_stack): Declare.
* include/private/gc_priv.h [E2K && __ptr64__] (LOAD_TAGGED_VALUE):
Define macro.
* include/private/gc_priv.h (LOAD_WORD_OR_CONTINUE): Likewise.
* include/private/gcconfig.h [LINUX && __e2k__] (E2K, mach_type_known):
Likewise.
* include/private/gcconfig.h [E2K] (MACH_TYPE, CPP_WORDSZ, ALIGNMENT,
HBLKSIZE): Likewise.
* include/private/gcconfig.h [E2K && LINUX] (DATASTART): Likewise.
* mach_dep.c [E2K] (VA_SIZE, E2K_PSHTP_SIZE, get_stack_index):
Likewise.
* include/private/gcconfig.h [GC_GNUC_PREREQ(2,8)]
(HAVE_BUILTIN_UNWIND_INIT): Do not define if E2K.
* include/private/gcconfig.h [E2K && LINUX] (__dso_handle): Declare
extern variable.
* include/private/pthread_support.h [E2K] (GC_Thread_Rep): Declare
backing_store_end and backing_store_ptr fields.
* mach_dep.c [E2K]: Include errno.h, sys/syscall.h, asm/e2k_syswork.h.
* mach_dep.c [E2K] (e2k_rwap_lo_fields, e2k_rwap_hi_fields): Declare
struct.
* mach_dep.c [E2K] (e2k_rwap_lo_u, e2k_rwap_hi_u): Declare union.
* mach_dep.c [E2K] (GC_get_procedure_stack, GC_save_regs_in_stack):
Implement.
* mach_dep.c [E2K] (GC_with_callee_saves_pushed): Call
GC_save_regs_in_stack() (not saving the result).
* mark.c (GC_mark_from, GC_push_all_eager): Use LOAD_WORD_OR_CONTINUE()
instead of direct dereference of current_p.
* mark.c [!SMALL_CONFIG] (GC_mark_from): Do not prefetch if E2K.
* mark_rts.c [E2K] (GC_push_all_register_sections): Implement but
ignore traced_stack_sect (add TODO item).
* mark_rts.c [!THREADS && E2K] (GC_push_current_stack): Call
GC_get_procedure_stack() and GC_push_all_register_sections().
* misc.c [E2K] (GC_call_with_stack_base): Initialize reg_base to 0.
* misc.c [!THREADS && E2K] (GC_do_blocking_inner,
GC_get_my_stackbottom): Likewise.
* os_dep.c [((HAVE_PTHREAD_ATTR_GET_NP || HAVE_PTHREAD_GETATTR_NP)
&& THREADS || !HAVE_GET_STACK_BASE) && E2K] (GC_get_stack_base):
Likewise.
* pthread_support.c [E2K] (GC_get_my_stackbottom): Likewise.
* pthread_stop_world.c [E2K] (GC_suspend_handler): Call
GC_with_callee_saves_pushed().
* pthread_stop_world.c [E2K] (GC_store_stack_ptr): Call
GC_save_regs_in_stack() and GC_get_procedure_stack().
* pthread_stop_world.c [E2K] (GC_suspend_handler_inner): Call
free(me->backing_store_end) before return.
* pthread_stop_world.c [E2K] (GC_push_all_stacks): Declare bs_lo,
bs_hi, stack_size local variables; call GC_save_regs_in_stack() and
GC_get_procedure_stack() (and free() at the end) for self thread;
call GC_push_all_register_sections().
* pthread_support.c [E2K] (GC_do_blocking_inner): Call
GC_save_regs_in_stack(); add FIXME.
@ivmai
Copy link
Owner Author

ivmai commented Jan 12, 2022

ENOMEM (very rare) in GC_get_procedure_stack

Fixed directly in commit 9ddbbae.

@ilyakurdyukov
Copy link

ilyakurdyukov commented Jan 12, 2022

Actually, I have some improvements for these patches - less changes, no need for extra source file.

I can rebase these patches on newer versions. The reason for patching 8.0.2 is because I ported software that has chipped with that particular version.

@ivmai
Copy link
Owner Author

ivmai commented Jan 12, 2022

Actually, I have some improvements for these patches - less changes, no need for extra source file.

Hello Ilya,
It's good, make a PR please (to master branch). I also have some patches not pushed to Github yet (fixing the some above issues, still in testing).

Pending issues (I have not worked on them yet):

  • lack of get_proc_stack if thread is in do-blocking state (should be easy to fix)
  • support of --enable-redirect-malloc (not straightforward how to fix - either need a way to access "real" malloc/free of libc or use GC_scratch_alloc) - I've created issue Support malloc redirect on E2K #413

no need for extra source file

I've already removed the extra source file while porting your commit to master - please see mach_dep.c in 9ddbbae.

PS. I think now you could add ivmai/libatomic_ops@7914f53 link to "Accepted" on https://www.altlinux.org/%D0%AD%D0%BB%D1%8C%D0%B1%D1%80%D1%83%D1%81/upstream ;)

@ilyakurdyukov
Copy link

Here is the latest patch, the E2K inclusions are much more compact.

Also notice the if (E2K_LOAD_TAGGED (result, tmp)) break; before GC_noop1 ((word) (* result));, which allows to scan memory without throwing exceptions.

diff --git a/Makefile.direct b/Makefile.direct
index 9e4186b..6eb0823 100644
--- a/Makefile.direct
+++ b/Makefile.direct
@@ -374,6 +374,7 @@ gctest$(EXEEXT): tests/test.o base_lib $(UTILS)
 	./if_mach SPARC DRSNX $(CC) $(CFLAGS) -o gctest tests/test.o gc.a -lucb
 	./if_mach HP_PA HPUX $(CC) $(CFLAGS) -o gctest tests/test.o gc.a -ldld `./threadlibs`
 	./if_mach M68K AMIGA $(CC) $(CFLAGS) -UGC_AMIGA_MAKINGLIB -o gctest  tests/test.o gc.a `./threadlibs`
+	./if_mach E2K LINUX $(CC) $(CFLAGS) -o gctest tests/test.o gc.a -lm `./threadlibs`
 	./if_not_there gctest$(EXEEXT) || $(CC) $(CFLAGS) -o gctest$(EXEEXT) tests/test.o gc.a `./threadlibs`
 
 # If an optimized setjmp_test generates a segmentation fault,
diff --git a/include/gc.h b/include/gc.h
index b952285..95eddb9 100644
--- a/include/gc.h
+++ b/include/gc.h
@@ -1324,8 +1324,13 @@ GC_API int GC_CALL GC_invoke_finalizers(void);
 /* The function is sometimes called keep_alive in other         */
 /* settings.                                                    */
 #if defined(__GNUC__) && !defined(__INTEL_COMPILER)
+#if defined(__e2k__)
+# define GC_reachable_here(ptr) \
+                __asm__ __volatile__(" " : : "r"(ptr) : "memory")
+#else
 # define GC_reachable_here(ptr) \
                 __asm__ __volatile__(" " : : "X"(ptr) : "memory")
+#endif
 #else
   GC_API void GC_CALL GC_noop1(GC_word);
 # define GC_reachable_here(ptr) GC_noop1((GC_word)(ptr))
@@ -1403,7 +1408,7 @@ GC_API void * GC_CALL GC_call_with_alloc_lock(GC_fn_type /* fn */,
 /* platforms this contains just a single address.                       */
 struct GC_stack_base {
   void * mem_base; /* Base of memory stack. */
-# if defined(__ia64) || defined(__ia64__) || defined(_M_IA64)
+# if defined(__ia64) || defined(__ia64__) || defined(_M_IA64) || defined(__e2k__)
     void * reg_base; /* Base of separate register stack. */
 # endif
 };
diff --git a/include/private/gc_priv.h b/include/private/gc_priv.h
index 728e74e..189c2cf 100644
--- a/include/private/gc_priv.h
+++ b/include/private/gc_priv.h
@@ -1582,7 +1582,7 @@ struct GC_traced_stack_sect_s {
                         /* NULL if no such "frame" active.              */
 #endif /* !THREADS */
 
-#ifdef IA64
+#if defined(IA64) || defined(E2K)
   /* Similar to GC_push_all_stack_sections() but for IA-64 registers store. */
   GC_INNER void GC_push_all_register_sections(ptr_t bs_lo, ptr_t bs_hi,
                   int eager, struct GC_traced_stack_sect_s *traced_stack_sect);
@@ -1726,10 +1726,25 @@ GC_EXTERN void (*GC_push_typed_structures)(void);
 GC_INNER void GC_with_callee_saves_pushed(void (*fn)(ptr_t, void *),
                                           volatile ptr_t arg);
 
-#if defined(SPARC) || defined(IA64)
+#if defined(SPARC) || defined(IA64) || defined(E2K)
   /* Cause all stacked registers to be saved in memory.  Return a       */
   /* pointer to the top of the corresponding memory stack.              */
   ptr_t GC_save_regs_in_stack(void);
+  #ifdef E2K
+  /* Copy full procedure stack to buffer. Get tag of target memory.     */
+  size_t GC_get_procedure_stack(ptr_t *);
+    #ifdef __ptr64__
+      #define E2K_LOAD_TAGGED(ptr, ret) ({     \
+        int tag;                               \
+        asm volatile(                          \
+          "ldd,sm %2, 0, %0\n"                 \
+          "gettagd %0, %1\n"                   \
+          : "=r"(ret), "=&r"(tag) : "r"(ptr)); \
+        tag; })
+    #else
+      #error "Unsupported march for e2k target"
+    #endif
+  #endif
 #endif
                         /* Push register contents onto mark stack.      */
 
@@ -1781,7 +1796,7 @@ GC_INNER void GC_cond_register_dynamic_libraries(void);
 
 /* Machine dependent startup routines */
 ptr_t GC_get_main_stack_base(void);     /* Cold end of stack.           */
-#ifdef IA64
+#if defined(IA64) || defined(E2K)
   GC_INNER ptr_t GC_get_register_stack_base(void);
                                         /* Cold end of register stack.  */
 #endif
@@ -2335,7 +2350,7 @@ GC_EXTERN signed_word GC_bytes_found;
     GC_EXTERN GC_bool GC_dont_query_stack_min;
                                 /* Defined and set in os_dep.c. */
 # endif
-#elif defined(IA64)
+#elif defined(IA64) || defined(E2K)
   GC_EXTERN ptr_t GC_save_regs_ret_val; /* defined in mach_dep.c. */
                         /* Previously set to backing store pointer.     */
 #endif /* !THREADS */
@@ -2636,7 +2651,7 @@ GC_INNER void *GC_store_debug_info_inner(void *p, word sz, const char *str,
 /* data segment.                                                */
 #if defined(HEURISTIC2) || defined(SEARCH_FOR_DATA_START) \
     || (!defined(STACKBOTTOM) && defined(HEURISTIC2)) \
-    || ((defined(SVR4) || defined(AIX) || defined(DGUX) \
+    || ((defined(SVR4) || defined(AIX) || defined(DGUX) || defined(E2K) \
          || (defined(LINUX) && defined(SPARC))) && !defined(PCR))
 # define NEED_FIND_LIMIT
 #endif
diff --git a/include/private/gcconfig.h b/include/private/gcconfig.h
index 5671402..0941f80 100644
--- a/include/private/gcconfig.h
+++ b/include/private/gcconfig.h
@@ -360,6 +360,10 @@ EXTERN_C_BEGIN
 #    define AARCH64
 #    define mach_type_known
 # endif
+# if defined(LINUX) && defined(__e2k__)
+#    define E2K
+#    define mach_type_known
+# endif
 # if defined(LINUX) && (defined(__arm) || defined(__arm__))
 #    define ARM32
 #    define mach_type_known
@@ -720,6 +724,7 @@ EXTERN_C_BEGIN
                     /*             S390       ==> 390-like machine      */
                     /*                  running LINUX                   */
                     /*             AARCH64    ==> ARM AArch64           */
+                    /*             E2K        ==> Elbrus 2000           */
                     /*             ARM32      ==> Intel StrongARM       */
                     /*             IA64       ==> Intel IPF             */
                     /*                            (e.g. Itanium)        */
@@ -867,6 +872,7 @@ EXTERN_C_BEGIN
      && !defined(__INTEL_COMPILER) && !defined(__PATHCC__) \
      && !defined(__FUJITSU) /* for FX10 system */ \
      && !(defined(POWERPC) && defined(DARWIN)) /* for MacOS X 10.3.9 */ \
+     && !defined(__e2k__) \
      && !defined(RTEMS) \
      && !defined(__ARMCC_VERSION) /* does not exist in armcc gnu emu */ \
      && !defined(__clang__) /* since no-op in clang (3.0) */
@@ -2331,6 +2337,26 @@ EXTERN_C_BEGIN
 #   endif
 # endif
 
+# ifdef E2K
+#   define MACH_TYPE "E2K"
+#   define CPP_WORDSZ 64
+#   define ALIGNMENT 8
+#   ifndef HBLKSIZE
+#     define HBLKSIZE 4096
+#   endif
+#   ifdef LINUX
+#     define OS_TYPE "LINUX"
+#     define LINUX_STACKBOTTOM
+#     define DYNAMIC_LOADING
+      extern int __dso_handle[];
+      extern int _end[];
+#     define DATASTART ((ptr_t)__dso_handle)
+#     define DATAEND ((ptr_t)_end)
+      extern ptr_t GC_register_stackbottom;
+#     define BACKING_STORE_BASE GC_register_stackbottom
+#   endif
+# endif
+
 # ifdef ARM32
 #   if defined(NACL)
 #     define MACH_TYPE "NACL"
diff --git a/include/private/pthread_support.h b/include/private/pthread_support.h
index 2dc1541..a37742b 100644
--- a/include/private/pthread_support.h
+++ b/include/private/pthread_support.h
@@ -113,7 +113,7 @@ typedef struct GC_Thread_Rep {
                                 /* valid only if the thread is blocked; */
                                 /* non-NULL value means already set.    */
 #   endif
-#   ifdef IA64
+#   if defined(IA64) || defined(E2K)
         ptr_t backing_store_end;
         ptr_t backing_store_ptr;
 #   endif
diff --git a/mach_dep.c b/mach_dep.c
index 60b73b2..815e2b8 100644
--- a/mach_dep.c
+++ b/mach_dep.c
@@ -92,7 +92,7 @@
 
 #endif /* MACOS && __MWERKS__ */
 
-# if defined(SPARC) || defined(IA64)
+# if defined(SPARC) || defined(IA64) || defined(E2K)
     /* Value returned from register flushing routine; either sp (SPARC) */
     /* or ar.bsp (IA64).                                                */
     GC_INNER ptr_t GC_save_regs_ret_val = NULL;
@@ -284,7 +284,7 @@ GC_INNER void GC_with_callee_saves_pushed(void (*fn)(ptr_t, void *),
             ABORT("feenableexcept failed");
 #       endif
 #     endif /* GETCONTEXT_FPU_EXCMASK_BUG */
-#     if defined(SPARC) || defined(IA64)
+#     if defined(SPARC) || defined(IA64) || defined(E2K)
         /* On a register window machine, we need to save register       */
         /* contents on the stack for this to work.  This may already be */
         /* subsumed by the getcontext() call.                           */
diff --git a/mark.c b/mark.c
index b1d5902..e5293ca 100644
--- a/mark.c
+++ b/mark.c
@@ -734,7 +734,11 @@ GC_INNER mse * GC_mark_from(mse *mark_stack_top, mse *mark_stack,
           credit -= WORDS_TO_BYTES(WORDSZ/2); /* guess */
           while (descr != 0) {
             if ((descr & SIGNB) != 0) {
+            #ifdef E2K
+             if (!E2K_LOAD_TAGGED(current_p, current)) {
+            #else
               current = *(word *)current_p;
+            #endif
               FIXUP_POINTER(current);
               if (current >= (word)least_ha && current < (word)greatest_ha) {
                 PREFETCH((ptr_t)current);
@@ -748,6 +752,9 @@ GC_INNER mse * GC_mark_from(mse *mark_stack_top, mse *mark_stack,
                 PUSH_CONTENTS((ptr_t)current, mark_stack_top,
                               mark_stack_limit, current_p);
               }
+            #ifdef E2K
+             }
+            #endif
             }
             descr <<= 1;
             current_p += sizeof(word);
@@ -822,7 +829,7 @@ GC_INNER mse * GC_mark_from(mse *mark_stack_top, mse *mark_stack,
     {
 #     define PREF_DIST 4
 
-#     ifndef SMALL_CONFIG
+#     if !defined(SMALL_CONFIG) && !defined(E2K)
         word deferred;
 
         /* Try to prefetch the next pointer to be examined ASAP.        */
@@ -859,7 +866,11 @@ GC_INNER mse * GC_mark_from(mse *mark_stack_top, mse *mark_stack,
         /* Empirically, unrolling this loop doesn't help a lot. */
         /* Since PUSH_CONTENTS expands to a lot of code,        */
         /* we don't.                                            */
+        #ifdef E2K
+        if (!E2K_LOAD_TAGGED(current_p, current)) {
+        #else
         current = *(word *)current_p;
+        #endif
         FIXUP_POINTER(current);
         PREFETCH(current_p + PREF_DIST*CACHE_LINE_SIZE);
         if (current >= (word)least_ha && current < (word)greatest_ha) {
@@ -876,10 +887,13 @@ GC_INNER mse * GC_mark_from(mse *mark_stack_top, mse *mark_stack,
           PUSH_CONTENTS((ptr_t)current, mark_stack_top,
                         mark_stack_limit, current_p);
         }
+        #ifdef E2K
+        }
+        #endif
         current_p += ALIGNMENT;
       }
 
-#     ifndef SMALL_CONFIG
+#     if !defined(SMALL_CONFIG) && !defined(E2K)
         /* We still need to mark the entry we previously prefetched.    */
         /* We already know that it passes the preliminary pointer       */
         /* validity test.                                               */
@@ -1590,8 +1604,12 @@ GC_API void GC_CALL GC_push_all_eager(void *bottom, void *top)
       lim = t - 1 /* longword */;
       for (p = b; (word)p <= (word)lim;
            p = (word *)(((ptr_t)p) + ALIGNMENT)) {
+        #ifdef E2K
+        REGISTER word q;
+        if (!E2K_LOAD_TAGGED(p, q))
+        #else
         REGISTER word q = *p;
-
+        #endif
         GC_PUSH_ONE_STACK(q, p);
       }
 #   undef GC_greatest_plausible_heap_addr
diff --git a/mark_rts.c b/mark_rts.c
index 7d166ed..dd0b558 100644
--- a/mark_rts.c
+++ b/mark_rts.c
@@ -642,11 +642,12 @@ STATIC void GC_push_conditional_with_exclusions(ptr_t bottom, ptr_t top,
     }
 }
 
-#ifdef IA64
+#if defined(IA64) || defined(E2K)
   /* Similar to GC_push_all_stack_sections() but for IA-64 registers store. */
   GC_INNER void GC_push_all_register_sections(ptr_t bs_lo, ptr_t bs_hi,
                   int eager, struct GC_traced_stack_sect_s *traced_stack_sect)
   {
+  #ifndef E2K
     while (traced_stack_sect != NULL) {
         ptr_t frame_bs_lo = traced_stack_sect -> backing_store_end;
         GC_ASSERT((word)frame_bs_lo <= (word)bs_hi);
@@ -658,6 +659,7 @@ STATIC void GC_push_conditional_with_exclusions(ptr_t bottom, ptr_t top,
         bs_hi = traced_stack_sect -> saved_backing_store_ptr;
         traced_stack_sect = traced_stack_sect -> prev;
     }
+  #endif
     GC_ASSERT((word)bs_lo <= (word)bs_hi);
     if (eager) {
         GC_push_all_eager(bs_lo, bs_hi);
@@ -833,6 +835,19 @@ STATIC void GC_push_current_stack(ptr_t cold_gc_frame,
                 /* don't have to worry about the boundary.              */
               }
 #       endif
+#       ifdef E2K
+              /* We also need to push procedure stack store.            */
+              /* Procedure stack grows up, so we can't use              */
+              /* GC_push_all_stack_partially_eager.                     */
+              {
+                ptr_t bs_lo = NULL, bs_hi;
+                size_t stack_size = GC_get_procedure_stack(&bs_lo);
+                bs_hi = bs_lo + stack_size;
+                GC_push_all_register_sections(bs_lo, bs_hi, TRUE,
+                                              GC_traced_stack_sect);
+                free(bs_lo);
+              }
+#       endif
 #   endif /* !THREADS */
 }
 
diff --git a/misc.c b/misc.c
index fe4605b..137fedd 100644
--- a/misc.c
+++ b/misc.c
@@ -83,7 +83,7 @@ GC_INNER GC_bool GC_debugging_started = FALSE;
 
 ptr_t GC_stackbottom = 0;
 
-#ifdef IA64
+#if defined(IA64) || defined(E2K)
   ptr_t GC_register_stackbottom = 0;
 #endif
 
@@ -1226,11 +1226,11 @@ GC_API void GC_CALL GC_init(void)
         || defined(GC_WIN32_THREADS) || defined(GC_SOLARIS_THREADS)
       if (GC_stackbottom == 0) {
         GC_stackbottom = GC_get_main_stack_base();
-#       if (defined(LINUX) || defined(HPUX)) && defined(IA64)
+#       if (defined(LINUX) || defined(HPUX)) && defined(IA64) || defined(E2K)
           GC_register_stackbottom = GC_get_register_stack_base();
 #       endif
       } else {
-#       if (defined(LINUX) || defined(HPUX)) && defined(IA64)
+#       if (defined(LINUX) || defined(HPUX)) && defined(IA64) || defined(E2K)
           if (GC_register_stackbottom == 0) {
             WARN("GC_register_stackbottom should be set with GC_stackbottom\n", 0);
             /* The following may fail, since we may rely on             */
@@ -2201,7 +2201,7 @@ STATIC void GC_do_blocking_inner(ptr_t data, void * context GC_ATTR_UNUSED)
     struct blocking_data * d = (struct blocking_data *) data;
     GC_ASSERT(GC_is_initialized);
     GC_ASSERT(GC_blocked_sp == NULL);
-#   ifdef SPARC
+#   ifdef SPARC || defined(E2K)
         GC_blocked_sp = GC_save_regs_in_stack();
 #   else
         GC_blocked_sp = (ptr_t) &d; /* save approx. sp */
@@ -2212,7 +2212,7 @@ STATIC void GC_do_blocking_inner(ptr_t data, void * context GC_ATTR_UNUSED)
 
     d -> client_data = (d -> fn)(d -> client_data);
 
-#   ifdef SPARC
+#   ifdef SPARC || defined(E2K)
         GC_ASSERT(GC_blocked_sp != NULL);
 #   else
         GC_ASSERT(GC_blocked_sp == (ptr_t)(&d));
diff --git a/os_dep.c b/os_dep.c
index 6eee61a..54addd5 100644
--- a/os_dep.c
+++ b/os_dep.c
@@ -404,6 +404,53 @@ GC_INNER char * GC_get_maps(void)
 
 #endif /* NEED_PROC_MAPS */
 
+#ifdef E2K
+#include <errno.h>
+#include <sys/syscall.h>
+#include <asm/e2k_syswork.h>
+
+#define E2K_GET_PS_INDEX() ({               \
+  long psp_lo, psp_hi, pshtp;               \
+  char tmp = 0, val;                        \
+  asm volatile(                             \
+    "1: ldb %4, 0, %0, mas = 7\n"           \
+    "rrd %%pshtp, %1\n"                     \
+    "rrd %%psp.lo, %2\n"                    \
+    "rrd %%psp.hi, %3\n"                    \
+    "{\n"                                   \
+      "stb %4, 0, %0, mas = 2\n"            \
+      "rbranch 1b\n"                        \
+    "}\n"                                   \
+    : "=&r"(val), "=&r"(pshtp),             \
+      "=&r"(psp_lo), "=&r"(psp_hi)          \
+    : "r"(&tmp));                           \
+  (psp_lo & ~(~0L << 48)) +                 \
+     (unsigned)psp_hi +                     \
+     ((pshtp << 52) >> 51);                 \
+})
+
+#define E2K_READPS(idx, ptr, n, ret) \
+  syscall(__NR_access_hw_stacks, E2K_READ_PROCEDURE_STACK, idx, ptr, n, ret)
+
+size_t GC_get_procedure_stack(ptr_t *ps_buf) {
+  long ps_idx, cur_size, ps_size;
+  int ret;
+  do ps_idx = E2K_GET_PS_INDEX();
+  while (E2K_READPS(&ps_idx, NULL, 0, &ps_size) == -1 && errno == EAGAIN);
+  if (!(*ps_buf = realloc(*ps_buf, ps_size))) ABORT("realloc failed");
+  while ((ret = E2K_READPS(&ps_idx, *ps_buf, ps_size, &cur_size)) == -1 &&
+      errno == EAGAIN)
+    ps_idx = E2K_GET_PS_INDEX();
+  if (ret || cur_size != ps_size) ABORT("E2K_READ_PROCEDURE_STACK failed");
+  return ps_size;
+}
+
+ptr_t GC_save_regs_in_stack(void) {
+  asm volatile("flushr");
+  return (ptr_t)E2K_GET_PS_INDEX();
+}
+#endif
+
 #if defined(SEARCH_FOR_DATA_START)
   /* The I386 case can be handled without a search.  The Alpha case     */
   /* used to be handled differently as well, but the rules changed      */
@@ -1017,7 +1064,14 @@ GC_INNER size_t GC_page_size = 0;
                     }
                     result -= MIN_PAGE_SIZE; /* no underflow expected */
                 }
+#ifdef E2K
+                {
+                  word tmp;
+                  if (E2K_LOAD_TAGGED(result, tmp)) break;
+                }
+#else
                 GC_noop1((word)(*result));
+#endif
             }
         }
         GC_reset_fault_handler();
@@ -1100,6 +1154,13 @@ GC_INNER size_t GC_page_size = 0;
       return result;
     }
 # endif /* IA64 */
+# ifdef E2K
+    GC_INNER ptr_t GC_get_register_stack_base(void)
+    {
+      ptr_t result = GC_find_limit(GC_save_regs_in_stack(), FALSE);
+      return result;
+    }
+# endif
 
   STATIC ptr_t GC_linux_main_stack_base(void)
   {
@@ -1373,6 +1434,14 @@ GC_INNER size_t GC_page_size = 0;
         RESTORE_CANCEL(cancel_state);
       }
       UNLOCK();
+#   endif
+#   ifdef E2K
+      LOCK();
+      {
+        ptr_t bsp = GC_save_regs_in_stack();
+        b->reg_base = GC_find_limit(bsp, FALSE);
+      }
+      UNLOCK();
 #   endif
     return GC_SUCCESS;
   }
diff --git a/pthread_stop_world.c b/pthread_stop_world.c
index a94d466..71f0b5c 100644
--- a/pthread_stop_world.c
+++ b/pthread_stop_world.c
@@ -226,7 +226,7 @@ STATIC void GC_suspend_handler_inner(ptr_t dummy, void *context);
     ABORT("Bad signal in suspend_handler");
   }
 
-# if defined(IA64) || defined(HP_PA) || defined(M68K)
+# if defined(IA64) || defined(HP_PA) || defined(M68K) || defined(E2K)
     GC_with_callee_saves_pushed(GC_suspend_handler_inner, NULL);
 # else
     /* We believe that in all other cases the full context is already   */
@@ -320,6 +320,14 @@ STATIC void GC_suspend_handler_inner(ptr_t dummy GC_ATTR_UNUSED,
       return;
     }
 # endif
+# ifdef E2K
+      ptr_t stack = NULL;
+      size_t stack_size;
+      GC_save_regs_in_stack();
+      stack_size = GC_get_procedure_stack(&stack);
+      me->backing_store_ptr = stack;
+      me->backing_store_end = stack + stack_size;
+# endif
 
   if (((word)me->stop_info.last_stop_count & ~(word)0x1)
         == (word)my_stop_count) {
@@ -367,6 +375,12 @@ STATIC void GC_suspend_handler_inner(ptr_t dummy GC_ATTR_UNUSED,
   } while (AO_load_acquire(&GC_world_is_stopped)
            && AO_load(&GC_stop_count) == my_stop_count);
 
+# ifdef E2K
+    stack = me->backing_store_ptr;
+    me->backing_store_ptr = NULL;
+    me->backing_store_end = NULL;
+    free(stack);
+# endif
 # ifdef DEBUG_THREADS
     GC_log_printf("Continuing %p\n", (void *)self);
 # endif
@@ -647,6 +661,10 @@ GC_INNER void GC_push_all_stacks(void)
     ptr_t lo, hi;
     /* On IA64, we also need to scan the register backing store. */
     IF_IA64(ptr_t bs_lo; ptr_t bs_hi;)
+#ifdef E2K
+    size_t stack_size;
+    ptr_t bs_lo, bs_hi;
+#endif
     struct GC_traced_stack_sect_s *traced_stack_sect;
     pthread_t self = pthread_self();
     word total_size = 0;
@@ -663,6 +681,13 @@ GC_INNER void GC_push_all_stacks(void)
         traced_stack_sect = p -> traced_stack_sect;
         if (THREAD_EQUAL(p -> id, self)) {
             GC_ASSERT(!p->thread_blocked);
+#           ifdef E2K
+            GC_save_regs_in_stack();
+            bs_lo = NULL;
+            stack_size = GC_get_procedure_stack(&bs_lo);
+            p->backing_store_ptr = bs_lo;
+            p->backing_store_end = bs_hi = bs_lo + stack_size;
+#           endif
 #           ifdef SPARC
                 lo = (ptr_t)GC_save_regs_in_stack();
 #           else
@@ -689,9 +714,17 @@ GC_INNER void GC_push_all_stacks(void)
             hi = GC_stackbottom;
             IF_IA64(bs_lo = BACKING_STORE_BASE;)
         }
+#       ifdef E2K
+        bs_lo = p->backing_store_ptr;
+        bs_hi = p->backing_store_end;
+#       endif
 #       ifdef DEBUG_THREADS
           GC_log_printf("Stack for thread %p = [%p,%p)\n",
                         (void *)p->id, (void *)lo, (void *)hi);
+#         ifdef E2K
+            GC_log_printf("Procedure stack for thread %p = [%p,%p)\n",
+                          (void *)p->id, (void *)bs_lo, (void *)bs_hi);
+#         endif
 #       endif
         if (0 == lo) ABORT("GC_push_all_stacks: sp not set!");
         if (p->altstack != NULL && (word)p->altstack <= (word)lo
@@ -712,7 +745,7 @@ GC_INNER void GC_push_all_stacks(void)
               (ptr_t)(p -> stop_info.reg_storage + NACL_GC_REG_STORAGE_SIZE));
           total_size += NACL_GC_REG_STORAGE_SIZE * sizeof(ptr_t);
 #       endif
-#       ifdef IA64
+#       if defined(IA64) || defined(E2K)
 #         ifdef DEBUG_THREADS
             GC_log_printf("Reg stack for thread %p = [%p,%p)\n",
                           (void *)p->id, (void *)bs_lo, (void *)bs_hi);
@@ -723,6 +756,15 @@ GC_INNER void GC_push_all_stacks(void)
                                         THREAD_EQUAL(p -> id, self),
                                         traced_stack_sect);
           total_size += bs_hi - bs_lo; /* bs_lo <= bs_hi */
+#         ifdef E2K
+          if (THREAD_EQUAL(p->id, self)) {
+            bs_lo = p->backing_store_ptr;
+            p->backing_store_ptr = NULL;
+            p->backing_store_end = NULL;
+            free(bs_lo);
+            bs_lo = bs_hi = NULL;
+          }
+#         endif
 #       endif
       }
     }
diff --git a/pthread_support.c b/pthread_support.c
index 8a78efe..d8d1cdf 100644
--- a/pthread_support.c
+++ b/pthread_support.c
@@ -1362,6 +1362,9 @@ GC_INNER void GC_do_blocking_inner(ptr_t data, void * context GC_ATTR_UNUSED)
 #   if defined(SPARC) || defined(IA64)
         ptr_t stack_ptr = GC_save_regs_in_stack();
 #   endif
+#   ifdef E2K
+        GC_save_regs_in_stack();
+#   endif
 #   if defined(GC_DARWIN_THREADS) && !defined(DARWIN_DONT_PARSE_STACK)
         GC_bool topOfStackUnset = FALSE;
 #   endif
diff --git a/tests/test.c b/tests/test.c
index 8ec8c13..f44f28a 100644
--- a/tests/test.c
+++ b/tests/test.c
@@ -1402,7 +1402,7 @@ void run_one_test(void)
         FAIL;
       }
       if (!TEST_FAIL_COUNT(1)) {
-#       if!(defined(POWERPC) || defined(IA64)) || defined(M68K)
+#       if!(defined(POWERPC) || defined(IA64)) || defined(M68K) || defined(E2K)
           /* On POWERPCs function pointers point to a descriptor in the */
           /* data segment, so there should have been no failures.       */
           /* The same applies to IA64.  Something similar seems to      */

@ivmai
Copy link
Owner Author

ivmai commented Jan 12, 2022

Okay, I will check your new version carefully (today or tomorrow).

Also notice the if (E2K_LOAD_TAGGED (result, tmp)) break; before GC_noop1 ((word) (* result));, which allows to scan memory without throwing exceptions.

Good point. But it turned out that scanning of memory is not needed at all on E2K (because reg_base is not used).

@ilyakurdyukov
Copy link

Good point. But it turned out that scanning of memory is not needed at all on E2K (because reg_base is not used).

Have you already disabled this scan? This SIGSEGV exception is annoying when debugging because I have to skip it every time.

@ilyakurdyukov
Copy link

I think it's illegal:

            "1:\n\t"                                            \
            "ldb [%[tmp] + 0x0] 0x7, %[val]\n\t"                \
            "rrd %%pshtp, %[pshtp]\n\t"                         \
            "rrd %%psp.lo, %[psp_lo]\n\t"                       \
            "rrd %%psp.hi, %[psp_hi]\n\t"                       \
            "stb [%[tmp] + 0x0] 0x2, %[val]\n\t"                \
            "ibranch 1b ? %%MLOCK\n"                            \

You removed the curly braces, but they mean that the commands inside must be placed in the same VLIW command. Because MLOCK must be activated by stb from the same command. "rbranch X" is short for "ibranch X ? %%MLOCK".

    "{\n"                                   \
      "stb %4, 0, %0, mas = 2\n"            \
      "rbranch 1b\n"                        \
    "}\n"                                   \

@ilyakurdyukov
Copy link

Update AUTHORS file (add Ilya Kurdyukov)

This is mostly not my patch, I just ported the old MCST patch for gc-7.2d on newer versions, with some minor improvements.

ivmai added a commit that referenced this issue Jan 12, 2022
Issue #411 (bdwgc).

* mark_rts.c [E2K && __clang__ && !CPPCHECK] (GC_approx_sp): Set sp to
&sp (instead of __builtin_frame_address(0)); update comment.
ivmai added a commit that referenced this issue Jan 12, 2022
Issue #411 (bdwgc).

Do not mark objects if its pointer tag is non-zero, not only in
GC_mark_from and GC_push_all_eager, but also in add_back_edges,
GC_ignore_self_finalize_mark_proc, GC_typed_mark_proc and
GC_push_conditional_eager.

* backgraph.c [MAKE_BACK_GRAPH] (add_back_edges): Change type of
current_p local variable from word* to ptr_t.
* typd_mlc.c (GC_typed_mark_proc): Likewise.
* backgraph.c [MAKE_BACK_GRAPH] (add_back_edges): Use
LOAD_WORD_OR_CONTINUE() instead of direct dereference of current_p.
* finalize.c (GC_ignore_self_finalize_mark_proc): Likewise.
* mark.c [WRAP_MARK_SOME && PARALLEL_MARK] (GC_push_conditional_eager):
Likewise.
* typd_mlc.c (GC_typed_mark_proc): Likewise.
* finalize.c (GC_ignore_self_finalize_mark_proc): Rename q and r local
variables to current_p and q, respectively.
ivmai pushed a commit that referenced this issue Jan 12, 2022
(refactoring)

Issue #411 (bdwgc).

* mach_dep.c [E2K] (e2k_rwap_lo_fields, e2k_rwap_lo_u,
e2k_rwap_hi_fields, e2k_rwap_hi_u): Remove type definition.
* mach_dep.c [E2K] (get_stack_index): Change type of psp_lo and psp_hi
local variables to word; do not initialize val local variable; simplify
asm code; add FIXME about placing stb and rbranch into one VLIW
command; extract bits from psp_lo and psp_hi not using structs.
* mach_dep.c [E2K] (get_stack_index): Undefine after last usage.
ivmai added a commit that referenced this issue Jan 12, 2022
Issue #411 (bdwgc).

* pthread_stop_world.c [E2K || IA64] (GC_push_all_stacks): Assert bs_lo
and bs_hi are non-null (before calling GC_push_all_register_sections).
* pthread_support.c [E2K] (GC_do_blocking_inner,
GC_call_with_gc_active): Declare bs_lo and stack_size local variables.
* pthread_support.c [E2K] (GC_do_blocking_inner): Call
GC_get_procedure_stack() before LOCK; remove FIXME; set
backing_store_end and backing_store_ptr before setting thread_blocked
to true; call free(backing_store_end) (and clear backing_store_end and
backing_store_ptr) after setting thread_blocked to false.
* pthread_support.c [E2K] (GC_call_with_gc_active): Call
free(backing_store_end) (and clear backing_store_end and
backing_store_ptr) before setting thread_blocked to false; call
GC_save_regs_in_stack() and GC_get_procedure_stack() after return from
fn(client_data) but before LOCK; set backing_store_end and
backing_store_ptr just before setting thread_blocked to true.
ivmai added a commit that referenced this issue Jan 12, 2022
Issue #411 (bdwgc).

Introduce USE_PTR_HWTAG macro to re-enable H/W tag usage.

* include/private/gc_priv.h [E2K] (LOAD_TAGGED_VALUE): Do not define
unless USE_PTR_HWTAG.
* include/private/gc_priv.h [E2K] (LOAD_WORD_OR_CONTINUE): Do not use
LOAD_TAGGED_VALUE() unless USE_PTR_HWTAG.
@ivmai
Copy link
Owner Author

ivmai commented Jan 12, 2022

You removed the curly braces, but they mean that the commands inside must be placed in the same VLIW command. Because MLOCK must be activated by stb from the same command. "rbranch X" is short for "ibranch X ? %%MLOCK".

Right, I have missed this concept. But at the same time the original variant (and your updated one) with the {} gives an error during compilation:

clang -I include -c mach_dep.c
/tmp/mach_dep-889f32.s: Сообщения ассемблера:
/tmp/mach_dep-889f32.s:40: Ошибка: мусор в конце строки, первый нераспознанный символ «%»

For now, I've placed a FIXME. Let's think together how to fix it properly. Maybe just insert proper number of nops?

@ivmai
Copy link
Owner Author

ivmai commented Jan 12, 2022

Good point. But it turned out that scanning of memory is not needed at all on E2K (because reg_base is not used).

Have you already disabled this scan? This SIGSEGV exception is annoying when debugging because I have to skip it every time.

Yes, I've disabled it because it is not executed anymore. So, no annoying SIGSEGV exception in gdb.

@ivmai
Copy link
Owner Author

ivmai commented Jan 12, 2022

Update AUTHORS file (add Ilya Kurdyukov)

This is mostly not my patch, I just ported the old MCST patch for gc-7.2d on newer versions, with some minor improvements.

Yes, I know it. I'd like to say thank you to those MCST guys who added initial support of E2K but unfortunately the old gc-7.2d patch is just a diff.

@ivmai
Copy link
Owner Author

ivmai commented Jan 12, 2022

Here is the latest patch, the E2K inclusions are much more compact.

I have reviewed your updated patch. The only part which taken to master is c783a98 (keeping FIXME instead of {} for now). Other parts are either already in master or should not be taken.

@ivmai
Copy link
Owner Author

ivmai commented Jan 12, 2022

Let me explain in more details the dropped parts of your commit (and ask some questions):

  • Makefile.direct and tests/test.c changes are not needed anymore
  • # define E2K_LOAD_TAGGED(ptr, ret) ({ - it uses GNU extension ({...}), I've change it, but finally I've disabled it by default (see 5d831bf). Do you know we reason E2K_LOAD_TAGGED is used (except for the one case you mentioned in GC_find_limit)?
  • # define NEED_FIND_LIMIT - not needed anymore
  • if !defined(SMALL_CONFIG) && !defined(E2K) - I keep it as is but why do we need to disable prefetching?
  • GC_save_regs_in_stack(void) { asm volatile("flushr"); return (ptr_t)E2K_GET_PS_INDEX(); - I've replaced the return value to NULL (as not used)
  • GC_get_procedure_stack(ptr_t *ps_buf) - I've observed ENOMEM so the final implementation is more tricky (realloc the buffer if ENOMEM returned by the syscall)
  • GC_find_limit(GC_save_regs_in_stack(), FALSE); - not needed (both in GC_get_register_stack_base and in GC_get_stack_base)
  • GC_register_stackbottom = ... - not needed
  • GC_blocked_sp = GC_save_regs_in_stack(); - not needed
  • me->backing_store_ptr = stack; me->backing_store_end = stack + stack_size - I've swapped _ptr and _end variables (end now is bottom of the save register stack, lowest addr)

@ilyakurdyukov
Copy link

ilyakurdyukov commented Jan 13, 2022

clang -I include -c mach_dep.c

Because you are using Clang instead of the LCC compiler? Using Clang with the Elbrus backend is highly discouraged because this backend compiles some places incorrectly (led to runtime errors) and should only be used for complex C++ code that cannot be compiled with the EDG frontend (LCC based on this).

About that assembly inline in E2K_GET_PS_INDEX(), the rrd instruction can only be used once in a wide command, so it is not possible to read all three required variables at once. Therefore, these reading and writing are there, this is the synchronization code, if something goes out of sync during the reading process, then a jump is performed and they are read again until they are read atomically.

It seems to me that this should be an unlikely event, but since the MCST did this, then there is a possibility. I think that because you removed the curly quotes - the jump will not work if one of the variables has changed during the read. And without it, the code might be unstable.

@ilyakurdyukov
Copy link

Do you know we reason E2K_LOAD_TAGGED is used (except for the one case you mentioned in GC_find_limit)? :

For ptr64 mode, these tags indicate that a page of memory is available (so accessing it will not trigger SIGSEGV). I don't know the exact reason.

@ilyakurdyukov
Copy link

Here is a workaround that works with both LCC and Clang. But I strongly do not recommend using the buggy LLVM backend for Elbrus. Using this backend with -O1 gives more stable but slower code.

#define E2K_GET_PS_INDEX1(esc) ({           \
  long psp_lo, psp_hi, pshtp;               \
  char tmp = 0, val;                        \
  asm volatile(                             \
    "1: ldb %4, 0, %0, mas = 7\n"           \
    "rrd %%pshtp, %1\n"                     \
    "rrd %%psp.lo, %2\n"                    \
    "rrd %%psp.hi, %3\n"                    \
    esc "{\n"                               \
      "stb %4, 0, %0, mas = 2\n"            \
      "rbranch 1b\n"                        \
    esc "}\n"                               \
    : "=&r"(val), "=&r"(pshtp),             \
      "=&r"(psp_lo), "=&r"(psp_hi)          \
    : "r"(&tmp));                           \
  (psp_lo & 0xffffffffffff) +               \
     (unsigned)psp_hi +                     \
     ((pshtp << 52) >> 51);                 \
})

#ifdef __clang__
#define E2K_GET_PS_INDEX() E2K_GET_PS_INDEX1("%")
#else
#define E2K_GET_PS_INDEX() E2K_GET_PS_INDEX1("")
#endif

@ivmai
Copy link
Owner Author

ivmai commented Jan 13, 2022

But I strongly do not recommend using the buggy LLVM backend for Elbrus. Using this backend with -O1 gives more stable but slower code.

Got it, clang is just for test purpose for now.

ivmai added a commit that referenced this issue Jan 13, 2022
(fix of commit 9ddbbae)

Issue #411 (bdwgc).

* include/private/gc_priv.h (LOCAL_VAR_INIT_OK): Define macro (to "=0"
if CPPCHECK).
* include/private/gc_priv.h [E2K && USE_PTR_HWTAG]
(LOAD_WORD_OR_CONTINUE): Attribute tag local variable with
LOCAL_VAR_INIT_OK.
ivmai pushed a commit that referenced this issue Jan 13, 2022
(fix of commit 9ddbbae)

Issue #411 (bdwgc).

The stb and rbranch instructions in get_stack_index() must be packed
into the same VLIW command (because MLOCK must be activated by stb from
the same VLIW command; "rbranch X" is short for "ibranch X? %%MLOCK").

* mach_dep.c [E2K] (VLIW_CMD_BRACES_PREFIX): Define macro; undefine it
near undef of get_stack_index.
* mach_dep.c [E2K] (get_stack_index): Group stb and rbranch into one
VLIW command; prefix "{" and "}" with VLIW_CMD_BRACES_PREFIX (to
workaround asm code generation issue in clang-9); remove FIXME.
@ivmai
Copy link
Owner Author

ivmai commented Jan 13, 2022

Here is a workaround that works with both LCC and Clang.

Committed as 2cc754b

@ilyakurdyukov
Copy link

I asked MCST to read this thread and help if they can.

ivmai added a commit that referenced this issue Feb 1, 2022
(fix of commit 9ddbbae)

Issue #411 (bdwgc).

Use of malloc/free inside an asynchronous signal is not safe and may
lead to a crash inside malloc, e.g. if the signal occurs when malloc
is called from pthread_create.  Also, malloc cannot be used by the
code executed during stop-the-world if the collector is built with
enabled malloc redirection.

This commit replaces malloc/free used to get procedure (register)
stack with alloca or GC_INTERNAL_MALLOC (the latter is not used during
stop-the-world).

This commit also prevents accessing freed procedure stack buffers from
GC_push_all_stacks when the world is not stopped.

* include/private/gc_priv.h [E2K] (GC_get_procedure_stack): Change the
arguments to accept buf and its size; update the comment.
* include/private/gc_priv.h [E2K] (PROCEDURE_STACK_ALLOCA_AND_STORE):
Define macro.
* include/private/gc_priv.h [E2K && THREADS]
(GC_alloc_and_get_procedure_stack): New function declaration.
* include/private/pthread_support.h [E2K || IA64]
(GC_Thread_Rep.backing_store_end): Add comment.
* mach_dep.c [E2K] (GC_get_procedure_stack): Change the implementation
to according to the updated declaration (do allocate the buffer
internally); remove FIXME.
* mach_dep.c [E2K && THREADS] (GC_alloc_and_get_procedure_stack):
Implement.
* mark_rts.c [!THREADS] (GC_push_current_stack): Use
PROCEDURE_STACK_ALLOCA_AND_STORE() instead of GC_get_procedure_stack();
do no call free().
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && E2K]
(GC_push_all_stacks): Likewise.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && E2K]
(GC_store_stack_ptr): Do not declare bs_lo and stack_size local
variables; do not call GC_get_procedure_stack().
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && E2K]
(GC_suspend_handler_inner): Declare bs_lo and stack_size local
variables; call PROCEDURE_STACK_ALLOCA_AND_STORE(); set
backing_store_end and backing_store_ptr fields of me (and clear these
fields before function return); do not call free().
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && E2K]
(GC_push_all_stacks): Define is_stopped local variable; move
stack_size local variable declaration to the innermost scope (where
it is used); do not call GC_push_all_register_sections() if the
procedure stack buffer has been already freed.
* pthread_support.c (first_thread): Update comment; move upper to be
before GC_push_thread_structures.
* pthread_support.c [E2K] (GC_push_thread_structures): Push also
first_thread.backing_store_end.
* pthread_support.c [E2K] (GC_do_blocking_inner,
GC_call_with_gc_active): Do no declare bs_lo local variable; call
GC_alloc_and_get_procedure_stack (holding the lock) instead of
GC_get_procedure_stack; call GC_INTERNAL_FREE() instead of free();
update comment.
ivmai added a commit that referenced this issue Feb 3, 2022
(fix of commit 990dcdb)

Issue #411 (bdwgc).

This commit prevents stack overflow in case of a big procedure stack
to be saved.

If a procedure stack requires less than 1 memory page for the buffer,
then the buffer is still allocated on the local stack.

* include/private/gc_priv.h [E2K] (PS_ALLOCA_BUF, ALLOCA_SAFE_LIMIT,
FREE_PROCEDURE_STACK_LOCAL): Define macro.
* include/private/gc_priv.h [E2K] (PROCEDURE_STACK_ALLOCA_AND_STORE):
Rename to GET_PROCEDURE_STACK_LOCAL; update comment; modify
implementation to call GC_mmap_procedure_stack_buf (and
GC_unmap_procedure_stack_buf) if size > ALLOCA_SAFE_LIMIT.
* include/private/gc_priv.h [E2K] (GC_mmap_procedure_stack_buf,
GC_unmap_procedure_stack_buf): Declare function.
* include/private/gc_priv.h [E2K && THREADS]
(GC_alloc_and_get_procedure_stack): Refine comment.
* mach_dep.c [E2K]: Include sys/mman.h.
* mach_dep.c [E2K] (GC_mmap_procedure_stack_buf,
GC_unmap_procedure_stack_buf): Implement.
* mark_rts.c [E2K && !THREADS] (GC_push_current_stack): Rename
PROCEDURE_STACK_ALLOCA_AND_STORE to GET_PROCEDURE_STACK_LOCAL; call
FREE_PROCEDURE_STACK_LOCAL() when bs_lo is not needed.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && E2K]
(GC_suspend_handler_inner, GC_push_all_stacks): Likewise.
ivmai added a commit that referenced this issue Feb 7, 2022
(fix of commit 73b67ba)

Issue #411 (bdwgc).

* include/private/gc_priv.h [E2K && !CPPCHECK] (ALLOCA_SAFE_LIMIT):
Change value from GC_page_size-16 to HBLKSIZE*3.
ivmai added a commit that referenced this issue Mar 14, 2022
(cherry-pick of commits 9ddbbae, 7797312, c783a98, 8c549d3,
2cc754b, 5ed92fe, 8527650 from 'master')

Issue #411 (bdwgc).

* include/gc.h [__GNUC__ && !__INTEL_COMPILER && __e2k__]
(GC_reachable_here): Specify "r" constraint for ptr.
* include/gc.h [__e2k__] (GC_stack_base): Declare reg_base field.
* include/private/gc_priv.h [E2K] (GC_push_all_register_sections,
GC_save_regs_in_stack, GC_get_procedure_stack): Declare.
* include/private/gc_priv.h [E2K && __ptr64__] (LOAD_TAGGED_VALUE):
Define macro.
* include/private/gc_priv.h (LOCAL_VAR_INIT_OK,
LOAD_WORD_OR_CONTINUE): Likewise.
* include/private/gcconfig.h [LINUX && __e2k__] (E2K, mach_type_known):
Likewise.
* include/private/gcconfig.h [E2K] (MACH_TYPE, CPP_WORDSZ, ALIGNMENT,
HBLKSIZE): Likewise.
* include/private/gcconfig.h [E2K && LINUX] (DATASTART): Likewise.
* mach_dep.c [E2K] (VA_SIZE, E2K_PSHTP_SIZE, get_stack_index):
Likewise.
* mach_dep.c [E2K] (VLIW_CMD_BRACES_PREFIX): Define macro; undefine it
near undef of get_stack_index.
* mach_dep.c [E2K] (get_stack_index): Undefine after last usage.
* include/private/gcconfig.h [GC_GNUC_PREREQ(2,8)]
(HAVE_BUILTIN_UNWIND_INIT): Do not define if E2K.
* include/private/gcconfig.h [E2K && LINUX] (__dso_handle): Declare
extern variable.
* include/private/pthread_support.h [E2K] (GC_Thread_Rep): Declare
backing_store_end and backing_store_ptr fields.
* mach_dep.c [E2K]: Include errno.h, sys/syscall.h, asm/e2k_syswork.h.
* mach_dep.c [E2K] (GC_get_procedure_stack, GC_save_regs_in_stack):
Implement.
* mach_dep.c [E2K] (GC_with_callee_saves_pushed): Call
GC_save_regs_in_stack() (not saving the result).
* mark.c (GC_mark_from, GC_push_all_eager): Use LOAD_WORD_OR_CONTINUE()
instead of direct dereference of current_p.
* mark.c [!SMALL_CONFIG] (GC_mark_from): Do not prefetch if E2K.
* mark_rts.c [E2K && __clang__ && !CPPCHECK] (GC_approx_sp): Set sp to
&sp (instead of __builtin_frame_address(0)); update comment.
* mark_rts.c [E2K] (GC_push_all_register_sections): Implement but
ignore traced_stack_sect (add TODO item).
* mark_rts.c [!THREADS && E2K] (GC_push_current_stack): Call
GC_get_procedure_stack() and GC_push_all_register_sections().
* misc.c [E2K] (GC_call_with_stack_base): Initialize reg_base to 0.
* misc.c [!THREADS && E2K] (GC_do_blocking_inner,
GC_get_my_stackbottom): Likewise.
* os_dep.c [((HAVE_PTHREAD_ATTR_GET_NP || HAVE_PTHREAD_GETATTR_NP)
&& THREADS || !HAVE_GET_STACK_BASE) && E2K] (GC_get_stack_base):
Likewise.
* pthread_support.c [E2K] (GC_get_my_stackbottom): Likewise.
* pthread_stop_world.c [E2K] (GC_suspend_handler): Call
GC_with_callee_saves_pushed().
* pthread_stop_world.c [E2K] (GC_store_stack_ptr): Call
GC_save_regs_in_stack() and GC_get_procedure_stack().
* pthread_stop_world.c [E2K] (GC_suspend_handler_inner): Call
free(me->backing_store_end) before return.
* pthread_stop_world.c [E2K] (GC_push_all_stacks): Declare bs_lo,
bs_hi, stack_size local variables; call GC_save_regs_in_stack() and
GC_get_procedure_stack() (and free() at the end) for self thread;
call GC_push_all_register_sections().
* pthread_stop_world.c [E2K || IA64] (GC_push_all_stacks): Assert bs_lo
and bs_hi are non-null (before calling GC_push_all_register_sections).
* pthread_support.c [E2K] (GC_do_blocking_inner,
GC_call_with_gc_active): Declare bs_lo and stack_size local variables.
* pthread_support.c [E2K] (GC_do_blocking_inner): Call
GC_get_procedure_stack() before LOCK; set backing_store_end and
backing_store_ptr before setting thread_blocked to true; call
free(backing_store_end) (and clear backing_store_end and
backing_store_ptr) after setting thread_blocked to false.
* pthread_support.c [E2K] (GC_call_with_gc_active): Call
free(backing_store_end) (and clear backing_store_end and
backing_store_ptr) before setting thread_blocked to false; call
GC_save_regs_in_stack() and GC_get_procedure_stack() after return
from fn(client_data) but before LOCK; set backing_store_end and
backing_store_ptr just before setting thread_blocked to true.

Co-authored-by: Ivan Maidanski <[email protected]>
ivmai added a commit that referenced this issue Mar 28, 2022
(fix of commits 990dcdb, b3fa164)

Issue #411 (bdwgc).

* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && E2K]
(GC_push_all_stacks): Declare stack_size right before
GET_PROCEDURE_STACK_LOCAL() call.
ivmai added a commit that referenced this issue Apr 17, 2022
(fix of commit 9ddbbae)

Issue #411 (bdwgc).

Also, add the corresponding TODO item for IA64 and SPARC.

* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL]
(GC_store_stack_ptr): Add TODO item.
* pthread_support.c [SPARC || IA64] (do_blocking_enter): Likewise.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && E2K]
(GC_store_stack_ptr): Remove GC_save_regs_in_stack() call.
* pthread_support.c [E2K] (do_blocking_enter): Likewise.
ivmai added a commit that referenced this issue Apr 17, 2022
Issue #411 (bdwgc).

* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && E2K]
(GC_suspend_handler_inner): Add assertion that me->backing_store_end
is null before GET_PROCEDURE_STACK_LOCAL().
ivmai added a commit that referenced this issue Apr 21, 2022
(cherry-pick of commits 990dcdb, 73b67ba, 68619d5, c277053, 353babf,
 82c8af1 from 'master')

Issue #411 (bdwgc).

Use of malloc/free inside an asynchronous signal is not safe and may
lead to a crash inside malloc, e.g. if the signal occurs when malloc
is called from pthread_create.  Also, malloc cannot be used by the
code executed during stop-the-world if the collector is built with
enabled malloc redirection.

This commit replaces malloc/free used to get procedure (register)
stack with alloca or mmap, or GC_INTERNAL_MALLOC (the latter is not
used during stop-the-world).

This commit also prevents accessing freed procedure stack buffers from
GC_push_all_stacks when the world is not stopped.

* include/private/gc_priv.h [E2K] (GC_get_procedure_stack): Change the
arguments to accept buf and its size; update the comment.
* include/private/gc_priv.h [E2K] (PS_ALLOCA_BUF, ALLOCA_SAFE_LIMIT,
GET_PROCEDURE_STACK_LOCAL, FREE_PROCEDURE_STACK_LOCAL): Define macro.
* include/private/gc_priv.h [E2K] (GC_mmap_procedure_stack_buf,
GC_unmap_procedure_stack_buf): Declare function.
* include/private/gc_priv.h [E2K && THREADS]
(GC_alloc_and_get_procedure_stack): New function declaration.
* include/private/pthread_support.h [E2K || IA64]
(GC_Thread_Rep.backing_store_end): Add comment.
* mach_dep.c [E2K]: Include sys/mman.h.
* mach_dep.c [E2K] (GC_get_procedure_stack): Change the implementation
to according to the updated declaration (do allocate the buffer
internally).
* mach_dep.c [E2K && THREADS] (GC_alloc_and_get_procedure_stack):
Implement.
* mach_dep.c [E2K] (GC_mmap_procedure_stack_buf,
GC_unmap_procedure_stack_buf): Implement.
* mark_rts.c [!THREADS] (GC_push_current_stack): Use
GET_PROCEDURE_STACK_LOCAL() instead of GC_get_procedure_stack();
do not call free().
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && E2K]
(GC_push_all_stacks): Likewise.
* mark_rts.c [E2K && !THREADS] (GC_push_current_stack): Call
FREE_PROCEDURE_STACK_LOCAL() when bs_lo is not needed.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && E2K]
(GC_suspend_handler_inner, GC_push_all_stacks): Likewise.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL]
(GC_store_stack_ptr): Add TODO item.
* pthread_support.c [SPARC || IA64] (do_blocking_enter): Likewise.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && E2K]
(GC_store_stack_ptr): Do not declare bs_lo and stack_size local
variables; do not call GC_save_regs_in_stack() and
GC_get_procedure_stack().
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && E2K]
(GC_suspend_handler_inner): Declare bs_lo and stack_size local
variables; add assertion that me->backing_store_end is null before
GET_PROCEDURE_STACK_LOCAL(); call GET_PROCEDURE_STACK_LOCAL(); set
backing_store_end and backing_store_ptr fields of me (and clear these
fields before function return); do not call free().
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && E2K]
(GC_push_all_stacks): Define is_stopped local variable; move
stack_size local variable declaration to the innermost scope (where
it is used); do not call GC_push_all_register_sections() if the
procedure stack buffer has been already freed.
* pthread_support.c (first_thread): Update comment; move upper to be
before GC_push_thread_structures.
* pthread_support.c [E2K] (GC_push_thread_structures): Push also
first_thread.backing_store_end.
* pthread_support.c [E2K] (GC_do_blocking_inner,
GC_call_with_gc_active): Do not declare bs_lo local variable; call
GC_alloc_and_get_procedure_stack (holding the lock) instead of
GC_get_procedure_stack; call GC_INTERNAL_FREE() instead of free();
update comment.
@ivmai
Copy link
Owner Author

ivmai commented Jul 8, 2022

Currently the procedure stack is saved from the signal handler:
GC_suspend_handler_inner calls
GET_PROCEDURE_STACK_LOCAL allocates temp buffer and calls
GC_get_procedure_stack calls
get_stack_index(&ps) and
syscall(__NR_access_hw_stacks, E2K_READ_PROCEDURE_STACK, ...) // fetches PS to temp buffer

Q: Is it possible to call syscall(__NR_access_hw_stacks, E2K_READ_PROCEDURE_STACK, ...) to get some other's thread PS?

@ilyakurdyukov
Copy link

I don't know, probably not, for security reasons.

@ivmai
Copy link
Owner Author

ivmai commented Jul 8, 2022

I don't know, probably not, for security reasons.

Hello Ilya,
I'm discussing this with the e2k specialists directly.

@ivmai
Copy link
Owner Author

ivmai commented Jul 8, 2022

Fix stb asm instruction placement in E2K get_stack_index

This W/A probably won't be needed in the future Clang versions (e.g. clang-13).
But, anyway, we could get rid of asm completely (i.e. rely only on syscalls).

@ivmai
Copy link
Owner Author

ivmai commented Jul 8, 2022

(psp_lo & 0xffffffffffff) + (unsigned)psp_hi + ((pshtp << 52) >> 51)

This could be rewritten as: psp_lo(base) + psp_hi(ind) + 2*signed(psphtp).
But, anyway, we could get rid of asm completely (i.e. rely only on syscalls).

@ivmai
Copy link
Owner Author

ivmai commented Jul 8, 2022

About stack size:
ulimit -s gives (in KB):
16384

@ivmai
Copy link
Owner Author

ivmai commented Jul 8, 2022

About stack size: ulimit -s gives (in KB): 16384

Thus, ALLOCA_SAFE_LIMIT could be greatly increased (from 12KB).

@ivmai
Copy link
Owner Author

ivmai commented Jul 8, 2022

Currently the procedure stack is saved from the signal handler: GC_suspend_handler_inner calls GET_PROCEDURE_STACK_LOCAL allocates temp buffer and calls GC_get_procedure_stack calls get_stack_index(&ps) and syscall(__NR_access_hw_stacks, E2K_READ_PROCEDURE_STACK, ...) // fetches PS to temp buffer

Q: Is it possible to call syscall(__NR_access_hw_stacks, E2K_READ_PROCEDURE_STACK, ...) to get some other's thread PS?

It should be possible using ptrace() but expected to be very slow (because of fetching PS area 1 word by 1 ptrace call).

ivmai added a commit that referenced this issue Jul 14, 2022
Issue #411 (bdwgc).

Use E2K_GET_PROCEDURE_STACK_SIZE and E2K_READ_PROCEDURE_STACK_EX
sub-functions (of access_hw_stacks syscall) instead of the legacy
E2K_READ_PROCEDURE_STACK one which required computation of ps_index
manually.

* mach_dep.c [E2K] (VA_SIZE, E2K_PSHTP_SIZE, VLIW_CMD_BRACES_PREFIX,
get_stack_index): Remove macro.
* mach_dep.c [E2K] (GC_get_procedure_stack): Initialize new_sz only
if buf is null; define stack_ofs local variable instead of ps; add
assertion that buf_sz (and new_sz) is multiple of word size; use
E2K_GET_PROCEDURE_STACK_SIZE sub-function instead of
E2K_READ_PROCEDURE_STACK to get H/W procedure stack size; do not use
get_stack_index(); use E2K_READ_PROCEDURE_STACK_EX sub-function (with
stack_ofs=0) instead of E2K_READ_PROCEDURE_STACK to read H/W procedure
stack from the given stack_ofs; print buf_sz (in addition to errno) in
case of error and not EAGAIN.
* mach_dep.c [E2K && LOG_E2K_ALLOCS] (GC_get_procedure_stack): Remove
code.
* mach_dep.c [E2K && THREADS] (GC_alloc_and_get_procedure_stack): Add
TODO item.
ivmai added a commit that referenced this issue Jul 14, 2022
(fix of commits 73b67ba, 68619d5)

Issue #411 (bdwgc).

* include/private/gc_priv.h [E2K && !CPPCHECK] (ALLOCA_SAFE_LIMIT):
Change value from HBLKSIZE*3 to HBLKSIZE*256.
@ivmai
Copy link
Owner Author

ivmai commented Jul 15, 2022

Do not use assembly in GC_get_procedure_stack (E2K) (commit (d3f5bdc)

It led to EINVAL sometimes, now it is fixed by commit 41d41c3

ivmai added a commit that referenced this issue Jul 15, 2022
(a cherry-pick of commits d3f5bdc, 41d41c3 from 'master')

Issue #411 (bdwgc).

Use E2K_GET_PROCEDURE_STACK_SIZE and E2K_READ_PROCEDURE_STACK_EX
sub-functions (of access_hw_stacks syscall) instead of the legacy
E2K_READ_PROCEDURE_STACK one which required computation of ps_index
manually.  Check the stack size right before reading the stack (to
avoid EINVAL).

* mach_dep.c [E2K] (VA_SIZE, E2K_PSHTP_SIZE, VLIW_CMD_BRACES_PREFIX,
get_stack_index): Remove macro.
* mach_dep.c [E2K] (GC_get_procedure_stack): Initialize new_sz only
if buf is null; define stack_ofs local variable instead of ps; add
assertion that new_sz is non-zero and multiple of word size; use
E2K_GET_PROCEDURE_STACK_SIZE sub-function instead of
E2K_READ_PROCEDURE_STACK to get H/W procedure stack size (call
E2K_GET_PROCEDURE_STACK_SIZE sub-function right before
E2K_READ_PROCEDURE_STACK_EX one to check that the procedure stack size
has not changed since the previous call of
GC_get_procedure_stack(NULL,0)); do not use get_stack_index(); use
E2K_READ_PROCEDURE_STACK_EX sub-function (with stack_ofs=0) instead of
E2K_READ_PROCEDURE_STACK to read H/W procedure stack from the given
stack_ofs; print new_sz (in addition to errno) in case of error and
not EAGAIN; add comment.
* mach_dep.c [E2K && LOG_E2K_ALLOCS] (GC_get_procedure_stack): Remove
code.
* mach_dep.c [E2K && THREADS] (GC_alloc_and_get_procedure_stack): Add
TODO item.
ivmai added a commit that referenced this issue Jul 15, 2022
(fix of commits 73b67ba, 68619d5)

Issue #411 (bdwgc).

* include/private/gc_priv.h [E2K && !CPPCHECK] (ALLOCA_SAFE_LIMIT):
Change value from HBLKSIZE*3 to HBLKSIZE*256.
@ivmai ivmai closed this as completed Aug 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants