
GC spends too much time in marking phase #233

Closed

lomereiter opened this issue Nov 15, 2012 · 9 comments

Comments

@lomereiter

Here are call profiles for a test program unpacking stream of gzipped blocks:

DMD

Incl.   Self    Called      Function
89.26   89.26   134         inflate_fast
 3.93    3.93   285         inflate_table
 2.65    2.51    55         gc.gcx.Gcx.mark
95.11    1.92    76         inflate
...

LDC

Incl.   Self    Called  Function
73.02   72.42   216     gc.gcx.Gcx.mark
24.67   24.67   134     inflate_fast
 1.05    1.05   285     inflate_table
26.28    0.56    76     inflate
...

As you can see, with LDC the garbage collector spends a ridiculous amount of time on marking!

This behaviour also shows up in this simple example (compile with -O0 or -O1):

void main() {
    for (int i = 0; i < 1000; i++) {
        auto array = new ubyte[65536]; // one 64 KiB GC allocation per iteration
    }
}

My system is Linux x86_64 if it matters.
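To see why oversized root ranges are so costly: a conservative mark phase treats every word of every root range as a potential pointer into the heap, so marking time grows linearly with the total size of the roots. A toy C sketch of that cost model (illustrative only, not the druntime implementation; all names are made up):

```c
#include <stdint.h>
#include <stddef.h>

// Toy model of conservative marking: every word in a root range is
// checked against the heap bounds; each hit would be pushed on the
// mark stack. Scanning cost is O(total root-range size).
size_t count_heap_words(const uintptr_t *range, size_t nwords,
                        uintptr_t heap_lo, uintptr_t heap_hi) {
    size_t hits = 0;
    for (size_t i = 0; i < nwords; i++)
        if (range[i] >= heap_lo && range[i] < heap_hi)
            hits++;   // looks like a pointer into the heap
    return hits;
}
```

If the root ranges mistakenly include the data segments of libc, libpthread, and friends, this loop runs over all of that memory on every collection.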

@dnadlinger
Member

There are two possible causes that come to my mind:

  • Currently, our copy of druntime is built by CMake with optimizations off, which might mean lots of bounds checking etc. is going on. To find out whether this is the cause of the problem, you could try building LDC from source with the D_FLAGS variable in CMake set to -d;-O3;-release (I don't know when I'll get around to investigating this myself, unfortunately).
  • Our GC root ranges may be off, causing a much larger portion of memory to be scanned than with DMD. @alexrp: This might be related to your comments about the code in ldc.memory – I don't suppose you want to investigate why proc map scanning was included in the first place? It seems to be an ancient piece of Tango code (first checked in by Sean in 2006!)
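For reference, the proc-map scanning being discussed boils down to something like the following C sketch (an assumed reconstruction, not the actual ldc.memory code; `Range` and `collect_writable_ranges` are invented names): every writable mapping listed in /proc/self/maps becomes a GC root range.

```c
#include <stdio.h>

// Assumed shape of proc-map-based root discovery: parse each line of
// /proc/self/maps ("start-end perms ...") and keep the writable ones,
// which a GC would then register as root ranges.
typedef struct { unsigned long start, end; } Range;

size_t collect_writable_ranges(Range *out, size_t max) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) return 0;
    char line[512];
    size_t n = 0;
    while (n < max && fgets(line, sizeof line, f)) {
        unsigned long start, end;
        char perms[5];
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) == 3
            && perms[1] == 'w')                  // 'w' flag: writable mapping
            out[n++] = (Range){ start, end };
    }
    fclose(f);
    return n;
}
```

Note that this indiscriminately picks up the data segments of every loaded shared library, which is exactly what makes the root set so large.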

@alexrp
Contributor

alexrp commented Nov 15, 2012

@klickverbot presumably the code is there so that static data segments of all loaded ELF files are added as root ranges. I can think of two reasons for that:

  1. To help prevent code breakage when someone forgets to pin an object passed to C code.
  2. To support dynamically loaded shared libraries.

Point 1 is pretty much moot: we have always expected people to take care of this themselves, most people are perfectly aware of it, and really, it'll just obscure things even more when such issues arise. Point 2 doesn't hold water either, because the scan doesn't help unless it's redone every time a shared library is loaded.

I also think that registration of D shared libraries (once we get there) should be handled completely differently (for instance, an ELF constructor that is run by the dynamic linker once the library is loaded, which registers the library (and its roots) with druntime).
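The ELF-constructor idea can be sketched in C (hypothetical: `register_with_druntime` and `library_registered` are invented names, and a real hook would call druntime APIs such as a gc_addRange-style function rather than set a flag):

```c
// Hypothetical registration hook for a D shared library. In reality
// this would call into druntime to add the library's data and TLS
// ranges as GC roots; here it just records that it ran.
int library_registered = 0;

static void register_with_druntime(void) {
    // placeholder for gc_addRange()-style calls on this object's segments
    library_registered = 1;
}

// The dynamic linker runs this when the object is loaded (for the main
// executable, before main) -- exactly the moment to register roots.
__attribute__((constructor))
static void on_load(void) {
    register_with_druntime();
}
```

This way registration happens exactly once per load, instead of relying on a single up-front scan of the address space.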

I may have overlooked something, but from where I stand, it seems safe to delete all that arcane code in ldc.memory.

@lomereiter
Author

@klickverbot I built LDC with -O3 -release specified in runtime/CMakeLists.txt. That didn't make any difference, so it's probably the root ranges that are too big.

@lomereiter
Author

OK, I found a solution.

  • I commented out the line version = GC_Use_Data_Proc_Maps; in runtime/druntime/src/ldc/memory.d. This fixed the GC problem.
  • As it turned out, the zlib library was compiled without any optimizations. I added -O3 to the C_FLAGS variable in build/runtime/CMakeFiles/phobos-ldc.dir/flags.make and recompiled it.

@dnadlinger
Member

@lomereiter: Depending on your system, you may find that your GC now collects objects that are still live, since those root ranges are gone. On Linux, I'd expect that TLS globals/statics are no longer scanned.

Working on the optimization flags thing.

@lomereiter
Author

Ouch. Then the lines of /proc/self/maps should be filtered somehow so that system libraries like libc don't get added.
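Such a filter might look roughly like this C sketch (hypothetical names; the prefix match is deliberately crude and would, for example, also catch libcrypto):

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical filter: skip mappings whose backing file looks like a
// well-known system library, so its data segment is not added as a
// GC root range. Prefix matching is crude; a real filter would be
// stricter about what it excludes.
static const char *skipped[] = { "libc", "libpthread", "librt", "ld-" };

bool is_system_library(const char *path) {
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;             // compare the basename only
    for (size_t i = 0; i < sizeof skipped / sizeof *skipped; i++)
        if (strncmp(base, skipped[i], strlen(skipped[i])) == 0)
            return true;
    return false;
}
```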

Using zlib compiled with -O3, I now have the following timings for the program I'm trying to optimize:

2.17s - dmd -O -release -inline
2.85s - ldmd2 -O3 -release -inline, with untouched memory.d
2.22s - ldmd2 -O3 -release -inline, manually excluding in memory.d ranges coming from libc, libpthread, librt, ld
2.08s - ldmd2 -O3 -release -inline, with disabled /proc/self/maps scanning (unsafe)

@dnadlinger
Member

Two versions of druntime/Phobos are now built, and the C files are always compiled with release flags. This of course doesn't touch the memory range problem, which I'm afraid will have to wait until after the release.

dnadlinger added a commit that referenced this issue May 19, 2013
@dnadlinger
Member

Another step towards fixing the issue, but the range we add for TLS is still much too big.

Will close when we merge in the 2.063 implementation based on the ELF section headers.

@dnadlinger
Member

We use the upstream druntime code for determining the size and location of the TLS segment on Linux now. This should eliminate the problem, and the simple program from the original report now doesn't show the excessive scanning behavior anymore.
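For the record, the program-header-based approach can be approximated with glibc's dl_iterate_phdr, which reports every loaded ELF object's headers, including any PT_TLS entry; the GC can then scan exactly that segment instead of a guessed range. A sketch of the idea (not the actual druntime code):

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

// Walk every loaded ELF object and report its PT_TLS segment, whose
// p_memsz bounds the TLS area a GC would need to scan.
static int visit(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++)
        if (info->dlpi_phdr[i].p_type == PT_TLS)
            printf("object \"%s\": PT_TLS segment, %zu bytes\n",
                   info->dlpi_name,
                   (size_t)info->dlpi_phdr[i].p_memsz);
    ++*(int *)data;   // count objects visited
    return 0;         // 0 = keep iterating
}

int count_loaded_objects(void) {
    int objects = 0;
    dl_iterate_phdr(visit, &objects);
    return objects;
}
```

Unlike the /proc/self/maps scan, this is exact: nothing outside the TLS and data segments the runtime actually cares about ends up in the root set.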
