Prolific types #1143

Open
wks opened this issue Jun 3, 2024 · 1 comment

Collaborator

wks commented Jun 3, 2024

In some VMs, the majority of objects are small objects of the same type. To conserve space, such VMs often allocate these objects in a dedicated space, so that the type of an object can be identified from the space it resides in and the per-object header can be eliminated. This technique is sometimes known as "Big Bag Of Pages", a.k.a. BiBOP.

The most notable examples are LISP dialects, where the majority of all objects are "cons cells". Each cons cell has two (tagged) pointer fields, head and tail (a.k.a. car and cdr). Because cons cells are small (only two words), adding another one-word header (one word is usually the minimum alignment of the allocator anyway) will result in 50% space overhead.
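
To make the arithmetic concrete, here is a minimal sketch that computes the header overhead relative to the payload. The function names are hypothetical and exist only for this illustration; sizes are measured in words.

```rust
/// Header size as a percentage of the payload size (both in words).
/// This is the "space overhead" figure quoted above.
fn header_overhead_percent(payload_words: usize, header_words: usize) -> usize {
    header_words * 100 / payload_words
}

/// Header size as a percentage of the whole allocation (payload + header),
/// rounded down.
fn header_share_percent(payload_words: usize, header_words: usize) -> usize {
    header_words * 100 / (payload_words + header_words)
}
```

For a two-word cons cell with a one-word header, `header_overhead_percent(2, 1)` gives 50, and the header takes up roughly a third of the resulting three-word allocation.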

Currently, the MMTk core does not expose any public API for this.

Existing work

Shuf et al. described a way to support prolific types in JikesRVM in this paper. They did not eliminate the header completely, but reduced the header overhead from two words to one word. They argued that the BiBOP approach has locality problems because objects allocated together temporally are located far apart.

Dedicated spaces

One way to support such dedicated space is adding a space to an existing plan.

Take the Immix plan as an example. It has one ImmixSpace. We can add a second ImmixSpace to it. Prolific objects are allocated into one ImmixSpace, and all other objects are allocated into the other. During copying GC, objects remain in the same ImmixSpace they were in.

This solution is straightforward, but it does not generalize well across plans, which is a burden for GC implementers. SemiSpace, for example, has two CopySpace instances. To support one prolific type, we need to add two additional copy spaces. Generational collectors may need a dedicated nursery for prolific types, although Shuf et al. only consider prolific objects as young objects.

This solution does not generalize across VMs, either. Other VMs may have more than one prolific type, and VMs can choose plans at start-up time, so support for prolific types becomes entangled with the implementation of every plan.

Per-region metadata

Another way to remove headers is using per-region metadata.

Concretely, we allow some chunks to hold objects of only one type. We allow the VM binding to attach one word of metadata to each such chunk in order to identify it. We can also make the metadata per-block if the space is block-based (such as ImmixSpace).
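
A minimal sketch of the lookup this implies: mask the object address down to the start of its chunk, then read the chunk's tag. Everything here is an assumption for illustration, not existing MMTk API: the 4 MiB chunk size, the `TypeTag` alias, and the `ChunkTags` type, which uses a `HashMap` as a stand-in for real per-chunk metadata.

```rust
use std::collections::HashMap;

/// Assumed chunk size for this sketch: 4 MiB (2^22 bytes).
const LOG_BYTES_IN_CHUNK: usize = 22;
const CHUNK_MASK: usize = (1 << LOG_BYTES_IN_CHUNK) - 1;

/// Hypothetical one-word tag the VM binding attaches to a chunk.
type TypeTag = usize;

/// Stand-in for per-chunk metadata: chunk start address -> type tag.
#[derive(Default)]
struct ChunkTags {
    tags: HashMap<usize, TypeTag>,
}

impl ChunkTags {
    /// Round an address down to the start of its chunk.
    fn chunk_start(addr: usize) -> usize {
        addr & !CHUNK_MASK
    }

    /// Dedicate the chunk containing `addr` to objects of type `tag`.
    fn dedicate(&mut self, addr: usize, tag: TypeTag) {
        self.tags.insert(Self::chunk_start(addr), tag);
    }

    /// Recover the type of a headerless object from its address alone.
    /// Returns `None` if the chunk is not dedicated to a prolific type.
    fn type_of(&self, object_addr: usize) -> Option<TypeTag> {
        self.tags.get(&Self::chunk_start(object_addr)).copied()
    }
}
```

A real implementation would more likely store the tag at a fixed offset inside the chunk or in a side table, so that the lookup is a mask plus one load rather than a hash lookup.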

The VM binding will need to see two different allocation semantics, one for prolific objects and another for other objects. If there is more than one prolific type, we can use more allocation semantics, or let the alloc method take another argument, i.e. the tag for the prolific type. Internally, each space may need to maintain one BumpPointer per allocation semantics or per type tag.
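
The "one BumpPointer per type tag" idea can be sketched as follows. This is an assumption-laden illustration, not the MMTk allocator: `TaggedAllocator`, its fields, and the extra `tag` argument to `alloc` are hypothetical, regions are handed out from a simple counter, and the sketch assumes `size` never exceeds the region size and ignores alignment.

```rust
use std::collections::HashMap;

/// A bump pointer over one region: allocate at `cursor` until `limit`.
#[derive(Clone, Copy)]
struct BumpPointer {
    cursor: usize,
    limit: usize,
}

/// Hypothetical allocator that keeps one bump pointer per type tag,
/// so that every region only ever holds objects of a single tag.
struct TaggedAllocator {
    region_bytes: usize,
    next_region: usize,
    pointers: HashMap<usize, BumpPointer>,
}

impl TaggedAllocator {
    /// `heap_start` is assumed to be aligned to `region_bytes`.
    fn new(heap_start: usize, region_bytes: usize) -> Self {
        Self { region_bytes, next_region: heap_start, pointers: HashMap::new() }
    }

    /// Take a fresh, empty region from the (unbounded, for this sketch) heap.
    fn fresh_region(&mut self) -> BumpPointer {
        let start = self.next_region;
        self.next_region += self.region_bytes;
        BumpPointer { cursor: start, limit: start + self.region_bytes }
    }

    /// Allocate `size` bytes for an object with the given type tag.
    fn alloc(&mut self, size: usize, tag: usize) -> usize {
        let current = self.pointers.get(&tag).copied();
        // Reuse the tag's region if the object fits, otherwise start a new one.
        let mut bp = match current {
            Some(bp) if bp.cursor + size <= bp.limit => bp,
            _ => self.fresh_region(),
        };
        let result = bp.cursor;
        bp.cursor += size;
        self.pointers.insert(tag, bp);
        result
    }
}
```

With this layout, two cons cells allocated with the same tag land in the same region even when an allocation with a different tag happens in between, which is what lets the region's metadata stand in for the missing header.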

Other metadata

If an object does not have a header, then all metadata (except forwarding pointers) needs to be on the side. For copying collectors, that includes the forwarding bits, at two bits per object. From the previous experiment, we know that side metadata does have performance overhead. It will be a trade-off between object size and GC performance.
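
To illustrate what "forwarding bits on the side" means, here is a minimal sketch of a side table that packs two bits per object, four objects to a byte. The type `SideForwardingBits` is hypothetical, and the sketch assumes a 16-byte minimum object size (a two-word cons cell on a 64-bit machine); it is not MMTk's side metadata implementation.

```rust
/// Assumed minimum object size: 16 bytes, i.e. one table slot per 16 bytes.
const LOG_MIN_OBJECT_BYTES: usize = 4;

/// Hypothetical side table holding two forwarding bits per object.
struct SideForwardingBits {
    heap_start: usize,
    table: Vec<u8>,
}

impl SideForwardingBits {
    fn new(heap_start: usize, heap_bytes: usize) -> Self {
        let slots = heap_bytes >> LOG_MIN_OBJECT_BYTES;
        // Four 2-bit entries fit in one byte.
        Self { heap_start, table: vec![0; (slots + 3) / 4] }
    }

    /// Map an object address to (byte index, bit shift) in the table.
    fn locate(&self, addr: usize) -> (usize, u32) {
        let slot = (addr - self.heap_start) >> LOG_MIN_OBJECT_BYTES;
        (slot / 4, ((slot % 4) * 2) as u32)
    }

    fn get(&self, addr: usize) -> u8 {
        let (byte, shift) = self.locate(addr);
        (self.table[byte] >> shift) & 0b11
    }

    fn set(&mut self, addr: usize, state: u8) {
        let (byte, shift) = self.locate(addr);
        let cleared = self.table[byte] & !(0b11u8 << shift);
        self.table[byte] = cleared | ((state & 0b11) << shift);
    }
}
```

Every access costs an address computation plus a load from a different cache line than the object itself, which is one plausible source of the overhead mentioned above. Under the 16-byte assumption the table itself is small: two bits per 128 bits of heap, or 1/64 of the heap size.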

Member

qinsoon commented Jun 3, 2024

I am concerned that the per-region metadata approach would make the code hard to maintain. It would require modifying every allocator to support it, which is a pain. It would be better if we could use a separate space for it, and reuse one of the existing policies for that space. It may also be reasonable to assume that 'prolific types' use a GC policy different from the main policy of the plan. It can be seen as something like LOS, or the non-moving space. The non-moving space always uses mark sweep, no matter which plan it is. 'Prolific types' is meant to be an optimization, so it sounds unnecessarily general to allow using inefficient policies like semi space for it.

One question we need to answer is: if we provide such an allocation semantic, what would the semantic be? So far it seems we need to meet two requirements: 1. the object has no header, 2. MMTk needs to provide a cheap way to check if an object is allocated with this semantic. Please add anything I missed; it is important to get the semantics right.

For 1, hopefully we can allow normal policies to take metadata specs as generic constants so the 'prolific type' space can reuse those policies and can be specialized to only use side metadata.
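
One way to read "take metadata specs as generic constants" is Rust const generics. The sketch below is only an assumption about what that could look like: `CopyPolicy`, its `SIDE_ONLY` parameter, and the two type aliases are hypothetical names, and a real policy would be parameterized over full metadata specs rather than a single boolean.

```rust
/// Hypothetical policy parameterized at compile time by where its
/// per-object metadata lives: in a header word or entirely on the side.
struct CopyPolicy<const SIDE_ONLY: bool>;

impl<const SIDE_ONLY: bool> CopyPolicy<SIDE_ONLY> {
    /// Header words this instantiation adds to each object.
    /// The branch is on a constant, so it is resolved at compile time.
    fn header_words() -> usize {
        if SIDE_ONLY { 0 } else { 1 }
    }

    /// Total allocation size in words for a given payload.
    fn object_words(payload_words: usize) -> usize {
        payload_words + Self::header_words()
    }
}

/// The plan's main policy keeps a one-word header.
type MainPolicy = CopyPolicy<false>;
/// The 'prolific type' space reuses the same code, specialized to side metadata.
type ProlificPolicy = CopyPolicy<true>;
```

Here `MainPolicy::object_words(2)` is 3 while `ProlificPolicy::object_words(2)` is 2, with both coming from the same policy code.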

As for 2, cheap ways to check if an object is allocated with a given semantic, there can be a few ways:

  • Bounds check: we can use a contiguous space (or discontiguous within certain bounds)
  • SFT calls: we can have one method in SFT to tell if it is a prolific object
  • Side metadata: we can set a bit to tell if an object is 'prolific' in post_alloc.
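
Two of these options, the bounds check and the side metadata bit, can be sketched in a few lines. Both types below are hypothetical and only illustrate the cost of each check; the side bitmap assumes one bit per 16 bytes of heap, and `mark` stands in for whatever would run in post_alloc.

```rust
/// Option 1: the prolific space occupies the address range [start, end).
struct ProlificRange {
    start: usize,
    end: usize,
}

impl ProlificRange {
    /// One subtraction and one comparison: addresses below `start`
    /// wrap around to a huge value and therefore fail the test.
    fn contains(&self, addr: usize) -> bool {
        addr.wrapping_sub(self.start) < self.end - self.start
    }
}

/// Option 3: a side bitmap with one bit per 16 bytes of heap (assumed).
struct ProlificBits {
    heap_start: usize,
    bits: Vec<u64>,
}

impl ProlificBits {
    fn new(heap_start: usize, heap_bytes: usize) -> Self {
        let slots = heap_bytes >> 4;
        Self { heap_start, bits: vec![0; (slots + 63) / 64] }
    }

    /// Would be called from post_alloc for objects of a prolific type.
    fn mark(&mut self, addr: usize) {
        let slot = (addr - self.heap_start) >> 4;
        self.bits[slot / 64] |= 1u64 << (slot % 64);
    }

    fn is_prolific(&self, addr: usize) -> bool {
        let slot = (addr - self.heap_start) >> 4;
        (self.bits[slot / 64] >> (slot % 64)) & 1 == 1
    }
}
```

The bounds check needs no memory access but constrains the space layout, while the side bit works with any layout at the cost of one extra load per check and one extra store per allocation.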
