Skip to content
This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

Add precise gc implementation #1603

Closed
wants to merge 22 commits into from
Closed

Add precise gc implementation #1603

wants to merge 22 commits into from

Conversation

Jebbs
Copy link
Contributor

@Jebbs Jebbs commented Jun 29, 2016

This isn't ready for merging, but I figured this would be a good starting point.

This adds Rainer's precise GC as an implementation. It currently forces the precise gc (for testing).

This is almost an exact copy of the PR found here except where I updated things based on the new GC interface and things needed to get unittests to pass(on Linux).

@PetarKirov
Copy link
Member

You need to modify this, file which tracks binary size regressions, in order to make the tests pass on OSX. The precise GC implementation adds additional metadata to the binary and hence why the test fails. It seems that on OSX binaries are larger than on other platforms (it's the only platform where that test fails currently), so maybe it makes sense use different number for different platforms in addition to (x86 vs x86_64).

You can either increase the size to a larger number than necessary - e.g. 2_300_000 (random guess > 2_113_948), or you can leave this task for when your more less confident that this PR is ready, in order to save yourself back and forth changes there.

@Jebbs
Copy link
Contributor Author

Jebbs commented Jun 30, 2016

Yeah, I saw that. Looks like OSX was the culprit for it being raised last time too. I'd like to try to fix it before increasing it even more, so for now I'll say that will be the last resort for getting the tests to pass.

Maybe @MartinNowak or @rainers can chime in with how that part could be tackled, but the size would probably go down by removing a lot of the duplicate code in the precise implementation.

@rainers
Copy link
Member

rainers commented Jul 1, 2016

I don't like 3500 lines of code duplicated as the two implementations are almost identical. The precise and the conservative versions should be merged. Preciseness could either be a runtime switch of this GC as in the original PR, or it could be made a compile time variable. The initialize function then selects the GC to create according to the config.gc setting.

@rainers
Copy link
Member

rainers commented Jul 1, 2016

I guess there are a few things that could be controversial:

  • gc_emplace: is this a functionality that must be supported by any precise implementation? The changes in rt.lifetime assume this, but I'm not super happy with the amount of memory operations that have to be done to make preciseness work on arrays (which will have to be duplicated in std.array if it should be precise aswell). My first attempts embedded some of the array-semantics in the GC, but that didn't work out well, too.
  • RTInfo has been used by some people to enable custom functionality on each type. Maybe we should make it easier to add multiple compile time hooks, e.g. by making the result a struct. Not sure if this works without adding an indirection. Please also note that the compiler sometimes fails to generate a correct RTInfo to begin with.
  • The changes in rt.aaa are also very much tweaked to this implementation of RTInfo. Not sure if we can do better, but probably the typeinfo should not be rebuilt for each new AA instance, but cached instead. Similar to rt.lifetime: should it be disabled for non-precise GCs?

@@ -3190,11 +3199,24 @@ void __ctfeWriteln(T...)(auto ref T values) { __ctfeWrite(values, "\n"); }
* Create RTInfo for type T
*/

template RTInfoImpl(size_t[] pointers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, the template should already deduplicate the bitpatterns.

@MartinNowak
Copy link
Member

MartinNowak commented Jul 9, 2016

Just a reminder why we actually need to work on this, these are absolute and relative times for our benchmarks w/ the precise GC.
Unfortunately conmsg and vdparser crash with ldc, suspectedly b/c some objects got freed while still being referenced.
gc_bench_abs
gc_bench_rel

@MartinNowak
Copy link
Member

So did we reach an impasse here?
I guess there is still some room for performance improvements.

@MartinNowak
Copy link
Member

MartinNowak commented Aug 8, 2016

This loop should be the major place to leverage precise information for faster marking.

// void mark()

    Lnext: for (; p1 < p2; p1++)
        {
            auto p = cast(byte *)(*p1); // ouch, loading every word only to find that they're not pointers

            if (p >= minAddr && p < maxAddr)
            {
                if ((cast(size_t)p & ~cast(size_t)(PAGESIZE-1)) == pcache)
                    continue;

                static if(precise) if (p1pool)
                {
                    if (!p1pool.is_pointer.test(p1 - cast(void**)p1pool.baseAddr))
                        continue;
                }

Instead it should be sth. along this line.

for (p1 = bitScanForward(pool.is_pointer, 0); p1 < p2;
         p1 += bitScanForward(pool.is_pointer, p1 - baseAddr))
{
    auto p = cast(byte *)(*p1); // only load relevant data
    if (p < minAddr || p >= maxAddr)
        continue;
}

With bitScanForward being a bsf based function for multiple words.
Could as well implement an even more efficient range interface to loop over in GCBits.byIndex.

foreach (off; pool.isPointer[0 .. p2 - p1].byIndex)
{
    auto p = cast(byte *)(p1 + off); // only load relevant data
    // ...

@Jebbs
Copy link
Contributor Author

Jebbs commented Aug 10, 2016

This currently has some issues due to a weird rebase, but I will fix these soon.

I decided that since my current stuff won't be ready by the end of the GSoC period, I should try to get this in here "as is". I essentially merged the precise and conservative files, with a couple of minor tweaks. The conservative GC is still set as the default, however the precise GC is also an option for those that want/need it.

@@ -335,6 +374,9 @@ class ConservativeGC : GC
}


/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we remove these empty comments explicitly in one of the recent commits to the GC?

@MartinNowak
Copy link
Member

MartinNowak commented Aug 11, 2016

Let's focus on the big problems first @rainers.

  • Obviously the rebasing, could help you with that if necessary.
  • We need to merge the 2 GCs. If we add a copy, we'll loose the git history and w/ that an unacceptable amount of context, necessary to understand the current implementation.
    Didn't know that was already done.

@MartinNowak
Copy link
Member

MartinNowak commented Aug 11, 2016

BTW, we'll soon have a new BitRange, thanks to @schveiguy. That might speed up marking quite a bit.

@Jebbs
Copy link
Contributor Author

Jebbs commented Aug 11, 2016

Did a proper rebase and added the precise stuff directly into gc/impl/conservative/gc.d instead of adding a copy with the precise parts like last time.

@rainers
Copy link
Member

rainers commented Aug 12, 2016

Let's focus on the big problems first @rainers.

IMO the biggest issue that we might regret later is the introduction of gc_emplace:

  • Is it setting the bar too high for other implementations?
  • Should we restrict it to only support setting the type starting from the base of the allocation? We would need some scheme to describe large arrays with an offset of 16 bytes then.
  • Maybe we should add the BlockAttr argument to gc_emplace for symmetry with malloc. That way we don't have to always assume repition of the type.

@rainers
Copy link
Member

rainers commented Aug 12, 2016

This loop should be the major place to leverage precise information for faster marking.
[...]
With bitScanForward being a bsf based function for multiple words.

I remember having tried something similar (working directly on the dwords of the pointer bitmap) but it didn't have a large effect.
IIRC most of the benchmarks don't spent as much time in the marking phase as has been added during allocation in setPointerBitmap, so we cannot compensate that by improving marking alone. setPointerBitmap should be the hot spot for optimizations.

src/gc/config.d Outdated
@@ -26,7 +26,7 @@ struct Config
{
bool disable; // start disabled
ubyte profile; // enable profiling with summary when terminating program
string gc = "conservative"; // select gc implementation conservative|manual
string gc = "conservative"; // select gc implementation conservative|precise|manual
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please enable the precise GC for running the new code on the auto testers. It should be changed back to conservative before merging, though.

@Jebbs
Copy link
Contributor Author

Jebbs commented Aug 13, 2016

Precise collection is currently enabled, and all platforms are passing except for Win 32 I guess? Two hosts look like they timed out and one couldn't find unittest.exe, or something like that. Any idea what's going on there?

@rainers
I think you mentioned possibly moving the array allocation stuff in lifetime.d into gc.d. I don't know if that is something that can be pursued during my GSoC project, but if things are discussed and laid out I could possibly work on that after it is over. I would like to see this PR merged soon, but I don't want to create more work later on.

@rainers
Copy link
Member

rainers commented Aug 13, 2016

Two hosts look like they timed out

The win32 version seems to be sitting in an infinite loop.

and one couldn't find unittest.exe, or something like that.

The message "The system cannot find the path specified." is actually output by a unittest. The return code translates to 0xc0000005, though. That means there has been an access violaton during execution of unittest.exe.

I couldn't reproduce the crash for Win64, but see one with Win32.

@Jebbs
Copy link
Contributor Author

Jebbs commented Aug 13, 2016

I guess I'll see if I can set up a virtual machine to try to track that down.

@@ -260,6 +294,7 @@ class ConservativeGC : GC
import core.internal.spinlock;
static gcLock = shared(AlignedSpinLock)(SpinLock.Contention.lengthy);
static bool _inFinalizer;
static bool isPrecise=false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After stumbling over a number of unrelated issues (I thought these days are gone) I figured the issue: isPrecise must be __gshared, otherwise stuff allocated in other threads won't get correct pointer information and GC collections triggered outside the main thread are done conservatively!

Long term, these statics should be members instead, probably by merging Gcx into ConservativeGC. Why are these separated to begin with?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little fuzzy on the proper usage of __gshared. Should anything marked __gshared be in a module's global scope or can they still be a member of a struct/class?

Long term, these statics should be members instead, probably by merging Gcx into ConservativeGC. Why are these separated to begin with?

I'm actually planning to work on that after GSoC is over.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little fuzzy on the proper usage of __gshared. Should anything marked __gshared be in a module's global scope or can they still be a member of a struct/class?

They can stay members, but you should remove static, as it might revert the effect of __gshared.

@Jebbs
Copy link
Contributor Author

Jebbs commented Aug 16, 2016

Win32 passes now, but one Linux and one FreeBSD fail for Phobos where they passed yesterday. The Linux one is a permission denied exception, and while I didn't pinpoint the FreeBSD one it looks like a segfault in a regex test. I'll dig around in the FreeBSD log tomorrow to make sure, but is there anything I can do about the permission error for the failed Linux machine?

Edit:
Both failing machines appear to belong to @braddr. Do they do anything different than the other auto tester machines?

@rainers
Copy link
Member

rainers commented Nov 24, 2018

Rebased as I needed this for Visual D and a newer compiler version. Nothing new here, sorry if this triggered too many people for review, again.

@thewilsonator
Copy link
Contributor

doctester fails with

src/core/bitop.d(407): Warning: Ddoc: parameter count mismatch, expected 3, got 0
src/core/bitop.d(407):        Note that the format is `param = description`

Mind also the build kite failures.

thewilsonator and others added 2 commits November 25, 2018 09:25
AA: reduce impact of precise GC if disabled
@rainers
Copy link
Member

rainers commented Nov 29, 2018

Mind also the build kite failures.

Precise GC needs an update in ocean: sociomantic-tsunami/ocean#663

"precise" should probably not be merged as the default, though.

@thewilsonator
Copy link
Contributor

Thanks, let me know when that goes through and this is ready.

@rainers rainers added the GC garbage collector label Dec 24, 2018
@rainers
Copy link
Member

rainers commented Sep 28, 2019

Since #2418 has been merged a while ago, I guess this can be closed. Thanks @Jebbs for contributing to the precise GC,

@rainers rainers closed this Sep 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
GC garbage collector
Projects
None yet
Development

Successfully merging this pull request may close these issues.