Native Image GC Improvements #2386

Open
christianwimmer opened this issue Apr 23, 2020 · 9 comments

@christianwimmer

christianwimmer commented Apr 23, 2020

GraalVM Native Image CE currently provides a simple (non-parallel, non-concurrent) stop&copy GC. There are various areas where the GC can be improved. This issue captures ideas. Actual work should be done under separate issues, but linked from this issue so that everyone who wants to work on GC performance gets an overview of who is working on what.

TLAB implementation and sizing algorithm

The heap is divided into chunks, and currently full chunks are used as the TLAB. It is desirable to have a reasonably large chunk size (currently 1 MByte), which is often too big for a TLAB. Especially when many threads are started and some threads have very low allocation rates compared to others, GCs are started too often. It can also lead to pathological cases: when ChunkSize * NumberOfThreads > YoungGenerationSize, there are not enough chunks for all threads and the system starts a GC continuously.

To improve this, the TLAB implementation should be decoupled from the chunk management so that many TLABs can fit into one chunk. The TLAB size can then be adjusted per thread based on that thread's allocation rate, i.e., threads that allocate a lot still get a whole chunk, while threads that barely allocate get a small TLAB.
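
A minimal sketch of what such per-thread TLAB sizing could look like; the class, field, and constant names below are hypothetical and only illustrate the resizing heuristic, not the actual SubstrateVM implementation:

```java
// Hypothetical sketch: a per-thread TLAB whose size is derived from the
// thread's recent allocation rate instead of always being a full 1 MByte chunk.
final class ThreadLocalAllocBuffer {
    static final long CHUNK_SIZE = 1024 * 1024; // aligned chunk size (1 MByte)
    static final long MIN_TLAB_SIZE = 8 * 1024; // lower bound for threads that barely allocate

    private long desiredSize = MIN_TLAB_SIZE;
    private long allocatedSinceLastGc;

    /** Record bytes handed out from this TLAB so the next size can be derived. */
    void recordAllocation(long bytes) {
        allocatedSinceLastGc += bytes;
    }

    /**
     * Called at each GC: threads that allocated a lot grow towards a whole chunk,
     * threads that barely allocated shrink towards the minimum.
     */
    void resizeAfterGc() {
        long target = allocatedSinceLastGc / 2; // simple damping heuristic
        desiredSize = Math.max(MIN_TLAB_SIZE, Math.min(CHUNK_SIZE, target));
        allocatedSinceLastGc = 0;
    }

    long desiredSize() {
        return desiredSize;
    }
}
```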

Performance improvements in the GC/heap implementation itself

  • Cluster related image heap objects to improve cache locality, reduce the footprint, and reduce the number of objects scanned from the remembered set
  • Optimize object pinning: do not keep other objects in the chunk alive and avoid creating additional objects
  • Optimize concurrency in low-level memory management (CommittedMemoryProvider)
  • Use a single survivor space instead of one space per object age to avoid internal fragmentation
  • Use prefetch instructions before copying an object to hide the memory latency
  • Hold more data in the chunk header to avoid querying the space
  • Implement exact write barriers: the current write barrier marks the whole object as dirty, which is a poor fit for large object arrays (see the card-marking sketch after this list)
  • Decrease the size of the write barrier by redesigning it (it would be significantly smaller if objects in unaligned and aligned chunks could be treated the same)
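
As a rough illustration of the exact-barrier item above, here is a card-marking sketch: the exact variant dirties only the card covering the written slot, while an imprecise variant dirties the whole object and forces the GC to rescan a large array entirely. All names and constants are made up for illustration and do not correspond to the SubstrateVM barrier:

```java
// Illustrative card-table write barrier; 512-byte cards are assumed.
final class CardTableBarrier {
    static final int CARD_SHIFT = 9; // 2^9 = 512-byte cards
    static final byte DIRTY = 1;

    final byte[] cardTable;
    final long heapBase;

    CardTableBarrier(long heapBase, long heapSize) {
        this.heapBase = heapBase;
        this.cardTable = new byte[(int) (heapSize >>> CARD_SHIFT) + 1];
    }

    /** Exact barrier: dirty only the card covering the written field or array slot. */
    void postWriteBarrierExact(long slotAddress) {
        cardTable[(int) ((slotAddress - heapBase) >>> CARD_SHIFT)] = DIRTY;
    }

    /** Imprecise barrier: dirty all cards spanned by the object, so the GC must rescan it completely. */
    void postWriteBarrierWholeObject(long objectAddress, long objectSize) {
        int first = (int) ((objectAddress - heapBase) >>> CARD_SHIFT);
        int last = (int) ((objectAddress + objectSize - 1 - heapBase) >>> CARD_SHIFT);
        for (int card = first; card <= last; card++) {
            cardTable[card] = DIRTY;
        }
    }
}
```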

Implement a mark&compact GC for the old generation

The stop&copy GC has a high memory overhead during GC. In the worst case, twice as much memory is needed when the whole heap is reachable, because all objects are copied during a full GC. If the OS cannot provide any memory during GC, then the VM exits because the heap is in an inconsistent state.

For the old generation, a mark&compact algorithm avoids additional memory overhead because compaction happens in place.
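
A high-level sketch of the classic sliding (Lisp-2 style) mark&compact phases, to make the "compaction happens in place" point concrete; the interface below is invented for illustration and is not SubstrateVM code:

```java
// Sketch of in-place sliding compaction: mark, compute forwarding addresses,
// update references, then move objects. No copy reserve is needed.
interface OldGenHeap {
    Iterable<Object> objectsInAddressOrder();
    long sizeOf(Object obj);
    boolean isMarked(Object obj);
    void setForwardingAddress(Object obj, long newAddress);
    long forwardingAddress(Object obj);
    void updateReferences(Object obj); // rewrite each field via forwardingAddress()
    void moveTo(Object obj, long newAddress);
}

final class MarkCompact {
    static void collect(OldGenHeap heap, long oldGenBase) {
        // Phase 1: mark the transitive closure from the roots (omitted here).

        // Phase 2: compute forwarding addresses by sliding live objects left.
        long free = oldGenBase;
        for (Object obj : heap.objectsInAddressOrder()) {
            if (heap.isMarked(obj)) {
                heap.setForwardingAddress(obj, free);
                free += heap.sizeOf(obj);
            }
        }
        // Phase 3: update all references to point at the forwarding addresses.
        for (Object obj : heap.objectsInAddressOrder()) {
            if (heap.isMarked(obj)) {
                heap.updateReferences(obj);
            }
        }
        // Phase 4: slide objects to their new locations, in place.
        for (Object obj : heap.objectsInAddressOrder()) {
            if (heap.isMarked(obj)) {
                heap.moveTo(obj, heap.forwardingAddress(obj));
            }
        }
    }
}
```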

Error handling

  • Throw an out-of-memory error if too much time is spent in the GC (a sketch of such an overhead check follows below).
  • Handle out-of-memory conditions during VM operations more gracefully.
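
For the first item, a minimal sketch of a "GC overhead limit" style check, similar in spirit to HotSpot's -XX:+UseGCOverheadLimit; the thresholds and names are made up for illustration:

```java
// Hypothetical policy: give up with an OutOfMemoryError when almost all time
// is spent in GC over several consecutive full collections.
final class GcOverheadPolicy {
    static final double MAX_GC_TIME_FRACTION = 0.98; // more than 98% of elapsed time in GC
    static final int CONSECUTIVE_COLLECTIONS = 5;

    private int exceededCount;

    /** Called after each full collection with the measured GC and mutator times. */
    void checkAfterGc(long gcNanos, long mutatorNanos) {
        long total = gcNanos + mutatorNanos;
        double gcFraction = total == 0 ? 0.0 : (double) gcNanos / total;
        exceededCount = gcFraction > MAX_GC_TIME_FRACTION ? exceededCount + 1 : 0;
        if (exceededCount >= CONSECUTIVE_COLLECTIONS) {
            throw new OutOfMemoryError("GC overhead limit exceeded");
        }
    }
}
```
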
@fniephaus
Member

fniephaus commented Aug 10, 2020

To your list, @christianwimmer, I'd like to add mechanisms that could be useful to language implementors, such as:

  • Some languages, such as Ruby and Smalltalk, allow the enumeration of all instances of a given class (ObjectSpace.each_object(Numeric) and Behavior>>#allInstances). This is often implemented in the GC using the marking phase. In TruffleSqueak and TruffleRuby, custom tracing algorithms written in Java are used to make things work, but a GC-based solution could provide much better performance (see the sketch after this list).

  • Some languages expose GC mechanisms (e.g. forced full/incremental GC, finalizers, pinning) and statistics (e.g. number of objects, eden size, ...) through some API (e.g. Python's gc module and Ruby's ObjectSpace).

  • Some languages may support other mechanisms that would be best implemented in the GC. Smalltalk, for example, supports the infamous #become: method to swap the references of objects. Some variants of it (e.g. ProtoObject>>becomeForward:, which makes all variables in the entire system that used to point to the receiver point to the argument) are usually implemented in the GC. It would be cool if the SVM GC could provide an API that lets language developers implement such features.
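
To make the instance-enumeration item concrete, here is a hypothetical sketch of the kind of GC-assisted API a Truffle language could build ObjectSpace.each_object or #allInstances on top of; neither the HeapWalker interface nor its implementation exists in SubstrateVM today:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical hook: visit every live object, e.g. piggy-backed on the GC's marking phase. */
interface HeapWalker {
    void walkLiveObjects(Consumer<Object> visitor);
}

final class ObjectSpace {
    private final HeapWalker walker;

    ObjectSpace(HeapWalker walker) {
        this.walker = walker;
    }

    /** Collect all live instances of the given class (e.g. Numeric in Ruby). */
    <T> List<T> allInstances(Class<T> clazz) {
        List<T> result = new ArrayList<>();
        walker.walkLiveObjects(obj -> {
            if (clazz.isInstance(obj)) {
                result.add(clazz.cast(obj));
            }
        });
        return result;
    }
}
```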

@rodrigo-bruno
Contributor

Several of these reference or object tracking features are currently provided in HotSpot through JVMTI. If native-image eventually supports JVMTI (even if only partially), these would come naturally.

The downside of most of these features is that the more hooks you add to the GC implementation to call back into user code, the slower the GC becomes (more checks for callbacks) and the harder it becomes to update, because you are locked into the user-facing APIs. Tracing APIs (in JVMTI) are an example, as they require header space that might also be needed by the GC. Another issue is finalizers, which are tricky to implement correctly and efficiently.

Pinning, on the other hand, is a relatively simple feature to implement and can unlock very interesting use cases where accelerators read/write directly from your managed heap. In HotSpot, only Shenandoah supports pinning, but I think G1 could also support it (and thus Native Image could as well).

@fniephaus
Member

Thanks for your comment, @rodrigo-bruno!

I'm somewhat familiar with JVMTI and know that it supports, for example, IterateOverInstancesOfClass. From within a Truffle language, however, I'm not sure you can easily access this API. If I understand correctly, JVMTI, as the name suggests, is a tooling API intended for external tools such as VisualVM. I believe you could connect to JVMTI from within a Truffle language, but that feels more like a terrible hack. I tried that at some point but gave up on the idea fairly quickly.

But since this functionality exists, it shouldn't be too hard to provide similar APIs in Truffle, for example through TruffleRuntime.

@rodrigo-bruno
Contributor

Hi @fniephaus, that is correct: JVMTI was (AFAIK) originally intended as an interface for external tools. However, JVMTI is implemented on top of JNI, which you can use to build other abstractions.

I am not super familiar with Truffle's internal workings, but couldn't you create an interface on top of JNI and expose it to Truffle languages (some extra plumbing inside Truffle would be required)? Most likely the implementation of such an interface would be JVM-specific (i.e., one for HotSpot, one for native image).

@christianwimmer
Author

@fniephaus Some features you mention, like a #become to change the class of a Java object, are certainly not desirable because they cannot be implemented in every GC. Offering an API that can only be implemented easily in a simple, non-concurrent, non-parallel GC is poisonous for future optimizations.

SVM supports object pinning; this is actually a feature that we need in every GC due to the way the low-level C interface is implemented.
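
For reference, a small usage sketch of that pinning support via the org.graalvm.nativeimage.PinnedObject API (the actual native call is omitted):

```java
import org.graalvm.nativeimage.PinnedObject;
import org.graalvm.word.PointerBase;

public class PinningExample {
    static void passToNativeCode(byte[] buffer) {
        // While the try-with-resources block is open, the GC must not move or
        // free the buffer, so its raw address can be handed to C code.
        try (PinnedObject pin = PinnedObject.create(buffer)) {
            PointerBase rawAddress = pin.addressOfArrayElement(0);
            // ... call into C with rawAddress while the pin is held ...
        } // the buffer is unpinned here and may be moved by the GC again
    }
}
```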

@christianwimmer
Author

@rodrigo-bruno

> However, JVMTI is implemented on top of JNI, which you can use to build other abstractions.

That is not correct; I would say JNI and JVMTI do not have anything to do with each other. JVMTI is an interface into the Java VM that offers hooks JNI does not give you.

@rodrigo-bruno
Contributor

> That is not correct; I would say JNI and JVMTI do not have anything to do with each other. JVMTI is an interface into the Java VM that offers hooks JNI does not give you.

Yup, you are right! I was under the impression that JVMTI relied heavily on existing JNI functions, but after checking the code it seems that it is quite tightly coupled to the JVM internals.

@fniephaus
Member

> @fniephaus Some features you mention, like a #become to change the class of a Java object, are certainly not desirable because they cannot be implemented in every GC. Offering an API that can only be implemented easily in a simple, non-concurrent, non-parallel GC is poisonous for future optimizations.

Of course, there are always trade-offs. If some APIs needed for such mechanisms are not common enough and possibly hinder future optimizations, it may not make sense to provide them. My main goal here was to give some examples of what kind of things language implementers might want to do with the GC. Forcing incremental/full GCs and enumerating objects should be relatively easy to provide. Another thing to consider is API compatibility between the JVM and SVM: it's probably undesirable if a language behaves significantly differently when running on the JVM than it does on SVM. As an example, there's no proper Java API to force incremental GCs on the JVM at the moment, and forcing full GCs instead may cause a significant performance penalty.

@dufoli

dufoli commented Nov 10, 2020

Hello, I worked on Mono a long time ago, and they did a lot of work on their GC. There is a good article describing its features; some of them could be very interesting:
https://www.mono-project.com/docs/advanced/garbage-collector/sgen/

I know that, in the beginning, they used an external GC (the Boehm GC), but it was not precise enough, so they chose to implement a new one, and it took years to get it done. So why not reuse an external library (or fork it)? Just my 2 cents.
