Off-heap buffers #10

Open
3 of 6 tasks
wks opened this issue Aug 29, 2022 · 1 comment

wks commented Aug 29, 2022

Roadmap

Ruby types to transform (sorted from easiest to hardest):

  • T_STRING: off-heap array of non-reference byte data
  • T_BIGNUM: off-heap array of non-reference integer data (BDIGIT, defined as unsigned int)
  • T_ARRAY: off-heap array of references
  • T_HASH: off-heap hash table
  • T_OBJECT: off-heap array plus hash table
  • T_MATCH: off-heap arrays holding begin/end of match groups.

Rationale

In Ruby, almost all object types are subject to finalization (i.e. obj_free is called on the object when it dies). This is unusual for a garbage-collected runtime, where finalization is normally the exception rather than the rule.

Most of the time, Ruby calls obj_free to free underlying off-heap buffers allocated with xmalloc. Vanilla Ruby does this because its GC cannot allocate objects larger than 40 bytes (extended to 320 bytes with RVARGC). MMTk has no such limitation.

We should fix this problem by transforming those off-heap buffers into on-heap objects.
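
The dependency on obj_free described above can be illustrated with a toy model. This is only a sketch: ToyString, toy_string_new and toy_obj_free are hypothetical names, not Ruby's real RString or obj_free, and plain malloc stands in for xmalloc. The point is that a buffer owned by malloc is invisible to the GC, so every such object needs a finalizer call when it dies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy string object; NOT Ruby's real struct RString. */
typedef struct {
    size_t len;
    char  *buf;  /* off-heap: owned by malloc, invisible to the GC */
} ToyString;

/* Counts buffers that still need an explicit free. */
static int live_malloc_buffers = 0;

static ToyString toy_string_new(const char *src) {
    ToyString s;
    s.len = strlen(src);
    s.buf = malloc(s.len + 1);  /* stands in for xmalloc */
    assert(s.buf != NULL);
    memcpy(s.buf, src, s.len + 1);
    live_malloc_buffers++;
    return s;
}

/* Plays the role of obj_free: must run for every dead object,
 * otherwise the malloc'ed buffer leaks. */
static void toy_obj_free(ToyString *s) {
    free(s->buf);
    s->buf = NULL;
    live_malloc_buffers--;
}
```

If the buffer lived in the GC heap instead, the collector could reclaim it together with the object, and no per-object finalizer call would be needed.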

Attached type information of buffers

To scan those buffers, they must carry attached type information in the form of a struct RBasic. We could add a special ruby_value_type, such as RUBY_T_MMTK = 0x17, to mark them as MMTk-specific objects that need special scanning. However, we discussed recently that this approach wastes memory.
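
As a rough illustration of that memory cost, the sketch below compares a buffer that carries its own header with a naked one. ToyRBasic is an assumption modelled on the two words (flags and class pointer) of Ruby's struct RBasic; the struct and function names are hypothetical, and TOY_T_MMTK simply reuses the 0x17 value proposed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct RBasic: a flags word and a class pointer (assumed layout). */
typedef struct {
    uintptr_t flags;
    uintptr_t klass;
} ToyRBasic;

enum { TOY_T_MMTK = 0x17 };  /* hypothetical type tag, value taken from the text */

/* A buffer that carries its own type information. */
typedef struct {
    ToyRBasic basic;
    uintptr_t slots[];  /* payload follows the header */
} ToyHeaderedBuffer;

/* Bytes needed for n slots when the buffer has an attached header. */
static size_t toy_headered_size(size_t n) {
    return offsetof(ToyHeaderedBuffer, slots) + n * sizeof(uintptr_t);
}

/* Bytes needed for n slots when the buffer is naked. */
static size_t toy_naked_size(size_t n) {
    return n * sizeof(uintptr_t);
}

/* Per-buffer overhead of attaching the header. */
static size_t toy_header_overhead(void) {
    assert(toy_headered_size(0) >= toy_naked_size(0));
    return toy_headered_size(0) - toy_naked_size(0);
}
```

Under these assumptions every buffer pays two extra words, which is significant for the many small strings and arrays a typical Ruby program creates.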

Use disjoint objects

The preferred approach is to keep those buffers "naked" (without a header) and extend mmtk-core to support such objects. Julia also has such "naked" buffers.

There is some discussion of "disjoint objects" in mmtk/mmtk-core#656. The key is that the header (in Ruby, the RObject, RString, RArray structs, etc.) contains all the type, length, and capacity information needed to scan both the header itself and the buffer(s) it owns.
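
A minimal sketch of how a header could drive the scanning of a naked buffer follows. Everything here is hypothetical (ToyArrayHeader, ToySlotVisitor, toy_scan_array); it is not the mmtk-core or mmtk-ruby API, only an illustration of the idea that the buffer needs no type information of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t ToyRef;  /* a reference to another heap object */

/* Hypothetical array header: it alone knows the buffer's length and capacity. */
typedef struct {
    size_t  len;
    size_t  capa;
    ToyRef *buf;  /* naked buffer in the GC heap, no header of its own */
} ToyArrayHeader;

/* Callback invoked on each slot the GC must trace (and may update). */
typedef void (*ToySlotVisitor)(void *slot, void *ctx);

static void toy_scan_array(ToyArrayHeader *hdr, ToySlotVisitor visit, void *ctx) {
    assert(hdr->len <= hdr->capa);
    if (hdr->buf == NULL) {
        return;  /* no buffer attached, nothing to scan */
    }
    /* The buffer pointer is itself a slot: if the GC moves the buffer,
     * this field in the header must be updated. */
    visit(&hdr->buf, ctx);
    /* Only the first len elements are live references. */
    for (size_t i = 0; i < hdr->len; i++) {
        visit(&hdr->buf[i], ctx);
    }
}

/* Example visitor: counts visited slots. */
static void toy_count_slots(void *slot, void *ctx) {
    (void)slot;
    (*(size_t *)ctx)++;
}
```

Scanning a header with len = 3 visits four slots: the buffer pointer plus three elements. The unused capacity is never touched.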

This also gives the current Ruby implementation an opportunity to support object resizing better. Currently, when an object transitions between the "embedded" and the "heap" layouts, the size of the header object cannot change. That wastes memory, because a 320-byte header can never shrink even if the array or string contains only a few elements. With disjoint objects, at GC time, the VM can decide to split the object into a header and a buffer, or merge them back, and resize them if needed.
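
The split/merge decision could look roughly like the sketch below. The policy is an assumption made up for illustration (embed whenever the payload fits under a maximum object size, otherwise split), and ToyLayout and toy_choose_layout are hypothetical names; the actual heuristics would be up to the VM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of the layout decision made at GC time. */
typedef struct {
    size_t header_bytes;  /* size of the header object after GC */
    size_t buffer_bytes;  /* size of the separate buffer, 0 if embedded */
} ToyLayout;

/* Hypothetical policy: embed the elements in the header when they fit,
 * otherwise keep a minimal header plus a separate naked buffer.
 * fixed_header_bytes covers flags, class pointer, length and so on. */
static ToyLayout toy_choose_layout(size_t len,
                                   size_t fixed_header_bytes,
                                   size_t max_object_bytes) {
    assert(fixed_header_bytes <= max_object_bytes);
    size_t payload = len * sizeof(uintptr_t);
    ToyLayout out;
    if (payload <= max_object_bytes - fixed_header_bytes) {
        /* Merge: the header is sized exactly for its contents. */
        out.header_bytes = fixed_header_bytes + payload;
        out.buffer_bytes = 0;
    } else {
        /* Split: the header only holds a pointer to the buffer. */
        out.header_bytes = fixed_header_bytes + sizeof(uintptr_t);
        out.buffer_bytes = payload;
    }
    return out;
}
```

With a 16-byte fixed part and a 320-byte limit, a two-element array gets a header of 16 bytes plus two words instead of occupying a full 320-byte slot, while a 1000-element array gets a small header and a separate buffer.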

wks changed the title from "Remove off-heap buffers" to "Off-heap buffers" on Sep 7, 2022
wks commented Aug 23, 2023

We have transformed T_STRING, T_ARRAY and T_MATCH so that they allocate their underlying buffers in the MMTk heap instead of using xmalloc. We have seen a noticeable performance improvement (see https://mmtk.zulipchat.com/#narrow/stream/313365-mmtk-ruby/topic/Liquid.20benchmark/near/378359415).

T_OBJECT should be easy to transform, but it is not profitable at this moment because very few T_OBJECT instances have buffers.

T_HASH should be the next target. It may be possible to allocate the st_table into the MMTk heap.

At this moment, the handling of obj_free candidates is no longer the main bottleneck for the Liquid benchmark. (The fstring table is.)
