Set the lowest bit in object tags (#2801)
In compacting GC we need to distinguish a heap location (object or field
address) from an object header. Currently this is done by checking whether the
value is smaller than or equal to the largest tag. Because the first 64 KiB of
the heap is reserved for the Rust stack, as long as the largest tag is smaller
than 65,536, we can assume that values smaller than 65,536 are headers.

This way of checking whether a value is a header or an address causes problems
when we want to use the rest of the object header to store more information.
Examples:

- In #2706 we will use one bit in the header to mark large objects. At least
  initially, we won't be compacting large objects, so mark-compact GC won't see
  large objects and so won't have to care about large header values. But we may
  want to do compaction on large objects, or store other information (maybe
  mark bits, or generation numbers).

- We may want to store the number of untagged (scalar) and tagged fields in
  object headers and merge some of the different object types. For example,
  instead of having 3 tags for `Variant`, `Some`, and `MutBox`, we could have
  one tag, and use the rest of the header to indicate that variants have one
  scalar and one tagged field, mutable objects have just one tagged field, etc.

- We could have `SmallBlob` and `SmallArray` types for blobs and arrays whose
  lengths fit in a 16-bit length field (at most 65,535). This would save us one
  word for small blobs and arrays.

- We don't have to rely on the Rust stack being large enough that the largest
  tag is still smaller than any valid heap address.
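
To make the problem concrete, here is a small sketch of the old range-based
check. The `LARGE_OBJECT_BIT` flag and its position are hypothetical (the text
above only says that #2706 will use one bit), and `old_is_header` is an
illustrative name; the old tag values are the ones this commit replaces.

```rust
// Hypothetical illustration: the flag position below is an assumption,
// not the layout used in #2706.
const OLD_TAG_ARRAY: u32 = 3;
const OLD_LARGEST_TAG: u32 = 17; // old value of TAG_FREE_SPACE
const LARGE_OBJECT_BIT: u32 = 1 << 31;

/// Old-style check: a word is a header iff it does not exceed the largest tag.
fn old_is_header(word: u32) -> bool {
    word <= OLD_LARGEST_TAG
}

fn main() {
    // A plain tag is recognized as a header.
    assert!(old_is_header(OLD_TAG_ARRAY));

    // As soon as any extra bit is stored above the tag, the header looks
    // like a (large) heap address to the old check.
    let marked_header = OLD_TAG_ARRAY | LARGE_OBJECT_BIT;
    assert!(!old_is_header(marked_header));
}
```

Any scheme that stores extra data in the upper header bits runs into the same
misclassification as long as the check compares against the largest tag.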

In this PR we update tags so that they always have the lowest bit set. Since
objects and fields are all word aligned (and so have the lowest 2 bits unset;
this invariant was established in #2764), checking the lowest bit is enough to
distinguish an address from a header. With this we can freely use the rest of
the bits in headers.
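
A minimal sketch of the lowest-bit check follows. The helper names `is_tag`
and `is_address` are illustrative and do not appear in the RTS; the tag values
are the new ones from this commit's `types.rs`.

```rust
// Illustrative only: `is_tag` and `is_address` are hypothetical helper names.
type Tag = u32;

const TAG_OBJECT: Tag = 1;
const TAG_ARRAY: Tag = 5;
const TAG_FREE_SPACE: Tag = 31;

/// Headers (tags) always have the lowest bit set.
fn is_tag(word: u32) -> bool {
    word & 0b1 == 1
}

/// Word-aligned addresses have the lowest two bits unset, so in particular
/// the lowest bit is 0.
fn is_address(word: u32) -> bool {
    word & 0b1 == 0
}

fn main() {
    for tag in [TAG_OBJECT, TAG_ARRAY, TAG_FREE_SPACE] {
        assert!(is_tag(tag));
    }
    // Any 4-byte-aligned value is treated as an address, regardless of its
    // magnitude, so the check no longer depends on where the heap starts.
    for addr in [4u32, 0x1_0000, 0xFFFF_FFFC] {
        assert!(is_address(addr));
    }
    // Extra information in the upper bits does not affect the check.
    assert!(is_tag(TAG_ARRAY | (1 << 31)));
}
```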

While this PR currently does not unblock any PRs, it's nice to have this
flexibility for future changes, and these changes do not have any
downsides. (mo-rts.wasm grows by 0.03%, 58 bytes.)
osa1 authored Sep 22, 2021
1 parent 604c89d commit 031dddb
Showing 3 changed files with 45 additions and 35 deletions.
10 changes: 6 additions & 4 deletions rts/motoko-rts/src/gc/mark_compact.rs
@@ -231,16 +231,18 @@ unsafe fn thread(field: *mut Value) {

 /// Unthread all references at given header, replacing with `new_loc`. Restores object header.
 unsafe fn unthread(obj: *mut Obj, new_loc: u32) {
-    // NOTE: For this to work heap addresses need to be greater than the largest value for object
-    // headers. Currently this holds. TODO: Document this better.
     let mut header = (*obj).tag;
-    while header > TAG_NULL {
-        // TODO: is `header > TAG_NULL` the best way to distinguish a tag from a pointer?
+
+    // All objects and fields are word-aligned, and tags have the lowest bit set, so use the lowest
+    // bit to distinguish a header (tag) from a field address.
+    while header & 0b1 == 0 {
         let tmp = (*(header as *mut Obj)).tag;
         (*(header as *mut Value)) = Value::from_ptr(new_loc as usize);
         header = tmp;
     }
+
     // At the end of the chain is the original header for the object
     debug_assert!(header >= TAG_OBJECT && header <= TAG_NULL);
+
     (*obj).tag = header;
 }
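
The loop in `unthread` can be exercised in isolation with a simplified model.
This is a sketch under stated assumptions, not the RTS code: the heap is a
plain vector of 32-bit words, an address is a byte offset into it, pointers
are stored unskewed, and the `load`, `store`, and `demo` helpers as well as
this version of `thread` are hypothetical.

```rust
// Simplified model, NOT the RTS code: the heap is a slice of 32-bit words and
// an "address" is a byte offset into it, so every address is a multiple of 4.
// Pointers are stored unskewed here for clarity (an assumption of this sketch).
const TAG_MUTBOX: u32 = 9; // value from this commit; lowest bit set

fn load(heap: &[u32], addr: u32) -> u32 {
    heap[(addr / 4) as usize]
}

fn store(heap: &mut [u32], addr: u32, word: u32) {
    heap[(addr / 4) as usize] = word;
}

/// Thread the field at `field`: the field takes over the pointed-to object's
/// header word, and the header slot is overwritten with the field's address.
fn thread(heap: &mut [u32], field: u32) {
    let obj = load(heap, field);
    let header = load(heap, obj);
    store(heap, field, header);
    store(heap, obj, field);
}

/// Walk the chain starting at the header slot of `obj`, pointing every
/// threaded field at `new_loc`, then restore the original tag.
fn unthread(heap: &mut [u32], obj: u32, new_loc: u32) {
    let mut header = load(heap, obj);
    // Even word: a field address (still in the chain). Odd word: the tag.
    while header & 0b1 == 0 {
        let next = load(heap, header);
        store(heap, header, new_loc);
        header = next;
    }
    store(heap, obj, header);
}

/// Two fields (at offsets 4 and 8) point to an object at offset 16 that is
/// about to move to offset 24.
fn demo() -> Vec<u32> {
    let mut heap = vec![0u32; 8];
    store(&mut heap, 16, TAG_MUTBOX);
    store(&mut heap, 4, 16);
    store(&mut heap, 8, 16);
    thread(&mut heap, 4);
    thread(&mut heap, 8);
    unthread(&mut heap, 16, 24);
    heap
}

fn main() {
    let heap = demo();
    assert_eq!(load(&heap, 4), 24); // first field now points to the new location
    assert_eq!(load(&heap, 8), 24); // second field as well
    assert_eq!(load(&heap, 16), TAG_MUTBOX); // original header restored
}
```

In this model the loop terminates exactly when it reaches the original tag,
because the tag is the only odd word in the chain.
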
32 changes: 17 additions & 15 deletions rts/motoko-rts/src/types.rs
@@ -312,22 +312,24 @@ pub const fn unskew(value: usize) -> usize {
 // of an unsafe API usage).
 pub type Tag = u32;

+// Tags need to have the lowest bit set, to allow distinguishing a header (tag) from object
+// locations in mark-compact GC. (Reminder: objects and fields are word aligned)
 pub const TAG_OBJECT: Tag = 1;
-pub const TAG_OBJ_IND: Tag = 2;
-pub const TAG_ARRAY: Tag = 3;
-pub const TAG_BITS64: Tag = 5;
-pub const TAG_MUTBOX: Tag = 6;
-pub const TAG_CLOSURE: Tag = 7;
-pub const TAG_SOME: Tag = 8;
-pub const TAG_VARIANT: Tag = 9;
-pub const TAG_BLOB: Tag = 10;
-pub const TAG_FWD_PTR: Tag = 11;
-pub const TAG_BITS32: Tag = 12;
-pub const TAG_BIGINT: Tag = 13;
-pub const TAG_CONCAT: Tag = 14;
-pub const TAG_NULL: Tag = 15;
-pub const TAG_ONE_WORD_FILLER: Tag = 16;
-pub const TAG_FREE_SPACE: Tag = 17;
+pub const TAG_OBJ_IND: Tag = 3;
+pub const TAG_ARRAY: Tag = 5;
+pub const TAG_BITS64: Tag = 7;
+pub const TAG_MUTBOX: Tag = 9;
+pub const TAG_CLOSURE: Tag = 11;
+pub const TAG_SOME: Tag = 13;
+pub const TAG_VARIANT: Tag = 15;
+pub const TAG_BLOB: Tag = 17;
+pub const TAG_FWD_PTR: Tag = 19;
+pub const TAG_BITS32: Tag = 21;
+pub const TAG_BIGINT: Tag = 23;
+pub const TAG_CONCAT: Tag = 25;
+pub const TAG_NULL: Tag = 27;
+pub const TAG_ONE_WORD_FILLER: Tag = 29;
+pub const TAG_FREE_SPACE: Tag = 31;

 // Common parts of any object. Other object pointers can be coerced into a pointer to this.
 #[repr(C)] // See the note at the beginning of this module
38 changes: 22 additions & 16 deletions src/codegen/compile.ml
@@ -1168,24 +1168,30 @@ module Tagged = struct
     | OneWordFiller (* Only used by the RTS *)
     | FreeSpace (* Only used by the RTS *)

-  (* Let's leave out tag 0 to trap earlier on invalid memory *)
+  (* Tags needs to have the lowest bit set, to allow distinguishing object
+     headers from heap locations (object or field addresses).
+     (Reminder: objects and fields are word-aligned so will have the lowest two
+     bits unset) *)
   let int_of_tag = function
     | Object -> 1l
-    | ObjInd -> 2l
-    | Array -> 3l
-    | Bits64 -> 5l
-    | MutBox -> 6l
-    | Closure -> 7l
-    | Some -> 8l
-    | Variant -> 9l
-    | Blob -> 10l
-    | Indirection -> 11l
-    | Bits32 -> 12l
-    | BigInt -> 13l
-    | Concat -> 14l
-    | Null -> 15l
-    | OneWordFiller -> 16l
-    | FreeSpace -> 17l
+    | ObjInd -> 3l
+    | Array -> 5l
+    | Bits64 -> 7l
+    | MutBox -> 9l
+    | Closure -> 11l
+    | Some -> 13l
+    | Variant -> 15l
+    | Blob -> 17l
+    | Indirection -> 19l
+    | Bits32 -> 21l
+    | BigInt -> 23l
+    | Concat -> 25l
+    | Null -> 27l
+    | OneWordFiller -> 29l
+    | FreeSpace -> 31l
+    (* Next two tags won't be seen by the GC, so no need to set the lowest bit
+       for `CoercionFailure` and `StableSeen` *)
     | CoercionFailure -> 0xfffffffel
     | StableSeen -> 0xffffffffl
