Incremental GC collecting referenced objects (gctest fails with soft vdb if shared build by zig) #544
Comments
Looks strange, let's figure out the root cause. Which libgc version? Reproducible on fresh release-8_2 (or master)? Which VDB mode? SOFT_VDB? The issue is not reproducible if compiled with NO_SOFT_VDB, right? Also, try with NO_VDB_FOR_STATIC_ROOTS (instead of NO_SOFT_VDB) - based on your report, the issue is here. (We need to understand why an object referenced directly from a static root is not marked.)
If you confirm that SOFT_VDB is the suspect, then this might have the same root cause as issue #479.
I'm compiling using zig / clang and targeting glibc 2.28. Using git commit daea2f1, so that's relatively close to HEAD of master. I haven't explicitly picked a VDB mode. I think it defaults to SOFT_VDB on Linux, right? So yes, SOFT_VDB!? And yes indeed, compiling with either What does NO_VDB_FOR_STATIC_ROOTS actually mean? That we scan the static root ranges on every GC run, right? But for the rest we rely on VDB?
Yes
Got it. The issue relates to the code within the ifndef NO_VDB_FOR_STATIC_ROOTS block.
With the SOFT_VDB (and PROC_VDB) facility, the scanning of static roots can be optimized too (unless NO_VDB_FOR_STATIC_ROOTS is defined) to skip unmodified pages during collections other than full ones. The algorithm of full collections should be re-inspected.
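To make the optimization described above concrete, here is a minimal sketch of the decision it implies. It assumes the Linux soft-dirty mechanism, where each virtual page has a 64-bit entry in /proc/self/pagemap and bit 55 of that entry is the soft-dirty flag. This is not bdwgc code: the helper names `pagemap_offset`, `entry_is_soft_dirty` and `must_scan_root_page` are hypothetical and only model the behaviour discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 55 of a /proc/self/pagemap entry is the soft-dirty flag on Linux. */
#define PM_SOFT_DIRTY_BIT 55

/* Hypothetical helper: byte offset, inside /proc/self/pagemap, of the */
/* 64-bit entry describing the page that contains addr.                */
static uint64_t pagemap_offset(uintptr_t addr, uintptr_t page_size)
{
    /* The page size must be a non-zero power of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (uint64_t)(addr / page_size) * sizeof(uint64_t);
}

/* Hypothetical helper: extract the soft-dirty flag from one entry. */
static int entry_is_soft_dirty(uint64_t entry)
{
    return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/* Simplified model of the policy discussed above (an assumption, not  */
/* the collector's real logic): a static-root page is always scanned   */
/* during a full collection or when VDB is not used for static roots;  */
/* otherwise it is scanned only if the kernel reports it as written.   */
static int must_scan_root_page(uint64_t entry, int is_full_gc,
                               int vdb_for_static_roots)
{
    if (is_full_gc || !vdb_for_static_roots)
        return 1;
    return entry_is_soft_dirty(entry);
}
```

Under this model, a root page whose dirty flag is lost between collections is silently skipped, which would match the symptom reported here: an object referenced only from a static root gets reclaimed.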
I need your help with figuring out the root cause.
Could you please check that soft_set_grungy_pages(0x7f9412cca000, 0x7f9412cfc000, ...) is called (without -D NO_VDB_FOR_STATIC_ROOTS)? And where does the call appear among the output lines above?
It seems the information that reQ_W_125 has been written (i.e. that its page is dirty) is lost somehow.
I could debug this, but I need a guide on how to set up the environment. Which host OS/arch?
@ivmai I'm super busy at the moment preparing for a conference next week, I'll get back as soon as I can after that with more info! :)
@michaellilltokiwa, I moved the issue you reported into a separate one - #552
@plajjan, I've implemented a checker (CHECK_SOFT_VDB) which verifies that any page identified as dirty by the mprotect-based VDB is also identified as dirty by the SOFT_VDB implementation, commit b4e1ce5.
Is the issue not observed if -D NO_SOFT_VDB is passed to CFLAGS?
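The invariant behind such a checker can be sketched as a subset test over two per-page dirty maps. This is only an illustration of the idea stated above, not the code from commit b4e1ce5; the function name `first_missed_dirty_page` and the byte-per-page representation are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: every page reported dirty by the reference      */
/* (mprotect-based) tracker must also be reported dirty by the         */
/* soft-dirty tracker.  Each array holds one byte per page, non-zero   */
/* meaning "dirty".  Returns the index of the first page that violates */
/* the invariant, or n_pages if the two maps are consistent.           */
static size_t first_missed_dirty_page(const unsigned char *mprotect_dirty,
                                      const unsigned char *soft_dirty,
                                      size_t n_pages)
{
    size_t i;

    assert(mprotect_dirty != NULL && soft_dirty != NULL);
    for (i = 0; i < n_pages; i++) {
        if (mprotect_dirty[i] && !soft_dirty[i])
            return i; /* a write was seen by one tracker but not the other */
    }
    return n_pages;
}
```

The soft-dirty map is allowed to contain extra dirty pages (that only costs scanning time); a page missing from it is the dangerous case, since the collector would then skip a modified page.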
Reproduced with the following build command: Not reproduced with any of:
The root cause seems to be the same as in #376.
(fix of commits 5b75fae, 4875114, ee00eb1)
Issue #544 (bdwgc).
* build.zig [build_shared_libs && t.os.tag==.linux] (build): Do not define NO_VDB_FOR_STATIC_ROOTS macro; remove FIXME item.
* os_dep.c [SOFT_VDB] (soft_set_grungy_pages): Rename vaddr argument to start; define vaddr and next_fpos_hint local variables; change type of vaddr and next_vaddr from ptr_t to word; initialize vaddr to start rounded down to page granularity; add assertion that start is hblk-aligned; enforce next_vaddr is not greater than limit; enforce h is not smaller than start.
* os_dep.c [SOFT_VDB] (GC_soft_read_dirty): Rename vaddr local variable to start.
* os_dep.c [SOFT_VDB && !NO_VDB_FOR_STATIC_ROOTS] (GC_soft_read_dirty): Round start argument of soft_set_grungy_pages() down to block granularity.

(a cherry-pick of commit 6601eec from 'master')
Issue #544 (bdwgc).
* os_dep.c [SOFT_VDB] (soft_set_grungy_pages): Rename vaddr argument to start; define vaddr and next_fpos_hint local variables; change type of vaddr and next_vaddr from ptr_t to word; initialize vaddr to start rounded down to page granularity; add assertion that start is hblk-aligned; enforce next_vaddr is not greater than limit; enforce h is not smaller than start.
* os_dep.c [SOFT_VDB] (GC_soft_read_dirty): Rename vaddr local variable to start.
* os_dep.c [SOFT_VDB && !NO_VDB_FOR_STATIC_ROOTS] (GC_soft_read_dirty): Round start argument of soft_set_grungy_pages() down to block granularity.
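The address arithmetic named in these commit messages (round the start down to page granularity, clamp next_vaddr to limit, never let h drop below start) can be illustrated with a small standalone sketch. It is not the actual os_dep.c code: `BLOCK_SIZE`, `round_down` and `blocks_to_mark` are hypothetical, the 4096-byte block size is an assumption, and the sketch treats every page as dirty so that it simply counts the blocks that would be visited.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed heap block size for this sketch (hypothetical value). */
#define BLOCK_SIZE ((uintptr_t)4096)

/* Round addr down to a multiple of granule (a power of two). */
static uintptr_t round_down(uintptr_t addr, uintptr_t granule)
{
    return addr & ~(granule - 1);
}

/* Walk the OS pages overlapping [start, limit) and count the blocks   */
/* that would be marked dirty if every page were dirty.  The clamping  */
/* mirrors the constraints listed in the commit message.               */
static unsigned long blocks_to_mark(uintptr_t start, uintptr_t limit,
                                    uintptr_t page_size)
{
    /* Start from the beginning of the page containing start. */
    uintptr_t vaddr = round_down(start, page_size);
    unsigned long count = 0;

    /* start is expected to be block-aligned. */
    assert((start & (BLOCK_SIZE - 1)) == 0);
    while (vaddr < limit) {
        uintptr_t next_vaddr = vaddr + page_size;
        /* Do not mark blocks located before the requested range... */
        uintptr_t h = vaddr < start ? start : vaddr;
        /* ...nor blocks located past its end. */
        uintptr_t end = next_vaddr > limit ? limit : next_vaddr;

        for (; h < end; h += BLOCK_SIZE)
            count++;
        vaddr = next_vaddr;
    }
    return count;
}
```

For example, with a 16 KiB page size, `blocks_to_mark(0x5000, 0x9000, 16384)` visits the four blocks at 0x5000, 0x6000, 0x7000 and 0x8000, and nothing outside the requested range, even though the first page begins at 0x4000.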
I am successfully using the GC in the Acton language runtime system. In order to improve support for large heaps, I have recently started looking at enabling the incremental & generational GC. When I do so, by calling GC_enable_incremental(), I get errors, mostly SIGILL but also some SIGSEGV, across the 100+ apps in the acton test suite.
I have taken one of those programs that consistently crashes with SIGILL. We have a struct pointer reQ_W_125 with a $class field that points to a malloced object on the heap. I have a debug print when we initialize this struct and another debug print when we try to access it later on. For a working program, the pointers will be the same:
That is with the GC enabled but incremental mode turned off. If I enable incremental mode the program crashes and we can see that the $class address has changed. There is nothing in the program that modifies $class, it is set once on startup and then left unchanged.
I ran the program with gdb, put a breakpoint at the initialization of reQ_W_125, grabbed the memory address and installed a watchpoint on that address. We can see how the memory is next modified by reclaim, so I suppose bdwgc has determined that this address is not used and thus freed it. GC_PRINT_STATS is also enabled in this output...
I added a call to GC_dump_named() and can see that the address of reQ_W_125 is in the root set.
How come the GC, when run in incremental mode, thinks this memory is not referenced? How do I go about debugging this?
I am running this on Linux x86_64.
This is trivial to reproduce; if that is of interest, I can provide simple instructions for running it from the acton repo.