Heap traversal #1174

Merged
merged 36 commits into mmtk:master from feature/each-object on Aug 8, 2024
Conversation

@wks wks (Collaborator) commented Jul 24, 2024

This PR adds a heap traversal API, MMTK::enumerate_objects, which enumerates all objects in the MMTk heap at the time of calling.

We added SideMetadataSpec::scan_non_zero_values to support enumerating objects by scanning the VO bit metadata. It can also be used for scanning other side metadata when needed. Note, however, that it is inefficient (though still correct) if the metadata is not contiguous or has more than one bit per region. If there is any need to scan such metadata, further refactoring will be needed.
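
A rough sketch of how a binding might call the new API (the exact closure signature and the `MyVM` binding type are assumptions for illustration; the method is only available with the "vo_bit" feature, and must not race with GC or allocation, as discussed later in this thread):

```rust
// Count the objects currently in the MMTk heap. This assumes
// `enumerate_objects` invokes the callback once per object found via the
// VO-bit side metadata, and that no GC or allocation is running concurrently.
fn count_objects(mmtk: &mmtk::MMTK<MyVM>) -> usize {
    let mut count = 0;
    mmtk.enumerate_objects(|_object| count += 1);
    count
}
```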

@wks wks (Collaborator, Author) commented Jul 24, 2024

This is my first attempt to create a heap traversal API. The implementation is probably correct, but we may need to refactor our current mechanism for visiting metadata in bulk (including reading, zeroing, setting, copying, as well as finding the last non-zero value and finding all non-zero values in a range for heap traversal). The main problem is that iterate_meta_bits takes two callback closures, and they cannot share mutable borrows well. We have already seen one consequence: find_prev_non_zero_value_fast needs to use a Cell to store its return value. Heap traversal would need a Cell or RefCell for a similar reason, and that won't perform well.

And things can be further simplified if we unify ObjectReference and the "in-object address" (see #1170). Then the linear scanner will no longer depend on the <VM: VMBinding> type (for using ObjectReference::from_address<VM>).
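
To make the closure problem concrete, here is a minimal, self-contained sketch of the shape of the issue (the `iterate`, `visit_byte`, and `visit_bits` names are placeholders standing in for the shape of `iterate_meta_bits`; this is not the actual MMTk code):

```rust
use std::cell::Cell;

// A simplified analogue of an API that takes two callback closures.
fn iterate<F, G>(mut visit_byte: F, mut visit_bits: G)
where
    F: FnMut(usize),
    G: FnMut(usize, u8),
{
    visit_byte(0);
    visit_bits(1, 0xff);
}

fn main() {
    // With `let mut result = None;`, passing two closures that both assign to
    // `result` is rejected: each closure would need its own `&mut` borrow.
    // Interior mutability via `Cell` (the workaround used by
    // `find_prev_non_zero_value_fast`) lets both closures capture `&result`.
    let result: Cell<Option<usize>> = Cell::new(None);
    iterate(|a| result.set(Some(a)), |a, _| result.set(Some(a)));
    println!("last visited: {:?}", result.get());
}
```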

@wks wks mentioned this pull request Jul 31, 2024
@wks wks changed the title from "Heap traversal (the first attempt)" to "Heap traversal" on Aug 6, 2024
wks added 5 commits August 6, 2024 15:50
Just use start and end for now. In the future, we may introduce a proper `AddressRange` type and do a larger-scale refactoring.

It should have been removed while merging with master.
@wks wks marked this pull request as ready for review August 8, 2024 01:37
@wks wks (Collaborator, Author) commented Aug 8, 2024

... The main problem is that iterate_meta_bits takes two callback closures, and they cannot share mutable borrows well. We have already seen one consequence: find_prev_non_zero_value_fast needs to use a Cell to store its return value. Heap traversal would need a Cell or RefCell for a similar reason, and that won't perform well.

I reimplemented it on top of #1181, and now we no longer use RefCell.

A benchmark (taskset -c 0 cargo bench --features test_private -- bscan) shows that it can scan an Immix block with 200 objects in 268 ns in the steady state. In reality, it may take much longer due to cache misses.

GitHub CI reported the added API function SideMetadataSpec::scan_non_zero_values, which is intended. It did not report the added public method MMTK::enumerate_objects, probably because it is guarded by the "vo_bit" feature.

@wks wks requested a review from qinsoon August 8, 2024 01:47
@qinsoon qinsoon (Member) left a comment


We have an old ObjectIterator, which is very similar to the simple version of scan_non_zero_values (except that it skips the object size when it finds an object). After introducing the trait ObjectEnumerator, it becomes confusing that we have ObjectIterator and ObjectEnumerator and they have nothing to do with each other.

It would be good if we could simply remove ObjectIterator and replace its uses with the newly introduced enumerator. If this cannot be done in this PR, we should at least document the code to state what we plan to do with the old ObjectIterator.

/// overhead is OK if we call it a block at a time because scanning the VO bits will dominate
/// the execution time. For LOS, it will be cheaper to enumerate individual objects than
/// scanning VO bits because it is sparse.
fn enumerate_objects_coarse(&self, enumerator: &mut dyn ObjectEnumerator);
Member

You can just call it enumerate_objects. Emphasizing 'coarse' doesn't seem necessary. Some spaces, like LOS, just enumerate each object; for those spaces, there is no coarse vs. fine granularity.

Collaborator Author

I renamed it back to enumerate_objects.

/// This function searches the side metadata for the data address range from `data_start_addr`
/// (inclusive) to `data_end_addr` (exclusive). The data address range must be fully mapped.
///
/// `visit_data` is called for each data address that has non-zero side metadata.
Member

This is not precise. 0x1000 to 0x1007 may all have the side metadata bit set. `visit_data` is only called for the lowest address in the region (i.e. the address that is aligned to the region).

Collaborator Author

OK. That means I still need to use the IN_OBJECT_ADDRESS_OFFSET constant and alignment requirement to find the right ObjectReference. I'll fix it.

Member

There are some methods in vo_bit that could be helpful to you. Also, in the side metadata implementation, you don't need to know about ObjectReference; you can do that in the object_enum module.

Collaborator Author

I now use vo_bit::get_object_ref_for_vo_addr to get the object reference from the raw Address.

src/mmtk.rs (outdated), comment on lines 498 to 500:
/// trigger GC while enumerating objects. The VM binding may use the callback of this function
/// to save all visited objects in an array and let the user visit the array after this function
/// returned.
Member

I don't see how it would be helpful to save the visited objects and visit them later. If the GC is happening, the visited objects may be reclaimed during this call, and the saved object references may become invalid.

As this function uses the VO bit (which is changed during GC and allocation), and the function blocks neither GC nor allocation, it should be undefined behavior if GC or allocation happens at the same time. We cannot guarantee anything if GC or allocation is happening. We may call the closure for an object that seems valid, but by the time the closure visits it, the object may have been reclaimed. We may visit objects that have been copied (but whose VO bits are not yet cleared). We may miss objects that have just been allocated.

@wks wks (Collaborator, Author) Aug 8, 2024

I don't see how it would be helpful to save the visited objects and visit them later. If the GC is happening, the visited objects may be reclaimed during this call, and the saved object references may become invalid.

The VM binding needs to treat the saved list of object references as roots. That way, if a GC happens, those objects are guaranteed to live through the next GC.

The list may contain the last reference to some objects, because an object may have been unlinked from its original parents after the last GC. That's expected. Such heap traversal APIs are allowed to pick up objects that have become unreachable, finalized, etc., and the VM binding needs to be aware of that. If such unreachable objects point to other objects, their outgoing edges will still remain valid because (1) if GC has not happened, their edges remain valid, and (2) if GC happens, those objects are rooted and will be traced and forwarded.

If the VM can update the roots (in the same way it updates JNI handles or other native roots), the saved list can be updated if a copying GC happens. Alternatively, if the VM only intends to support plans that support pinning, it can treat the list as pinning roots, and those objects will not be moved while iterating through the list.

I have tested it on Ruby by manually triggering a GC between saving the list (as a pinning root) and iterating through the list, and also triggering GC after visiting every few objects. It works.
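
For reference, a minimal sketch of the save-then-visit pattern described above, from the binding side (`register_pinning_roots` and `MyVM` are hypothetical, binding-specific names; only `enumerate_objects` and the per-object callback shape come from this PR):

```rust
use mmtk::util::ObjectReference;

// Snapshot the heap while no GC or allocation is running, then report the
// saved references as (pinning) roots so they stay alive, and unmoved, if a
// GC happens while the user later iterates over the list.
fn snapshot_heap(mmtk: &mmtk::MMTK<MyVM>) -> Vec<ObjectReference> {
    let mut saved = Vec::new();
    mmtk.enumerate_objects(|object| saved.push(object));
    my_vm::register_pinning_roots(&saved); // hypothetical binding hook
    saved
}
```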

Member

What you said does not contradict what I said. I suggested that it should be undefined behavior if GC or allocation happens at the same time (during the enumerating function call). After the enumerating function, the binding can do whatever it needs to create an 'illusion' that the user is enumerating objects during a GC. But this function itself is not allowed during GC.

Suggested change:
- /// trigger GC while enumerating objects. The VM binding may use the callback of this function
- /// to save all visited objects in an array and let the user visit the array after this function
- /// returned.
+ /// trigger GC while enumerating objects. The VM binding may use the callback of this function
+ /// to save all visited objects in an array outside GCs and let the user visit the array after this function
+ /// returned. The binding needs to carefully keep the object references in the array alive and updated properly if
+ /// a GC happens.

Collaborator Author

I agree that it should have undefined behavior if GC happens.

I thought allocation should be benign because allocation strictly adds VO bits and never removes them. But after thinking about it, I find it easier to just specify that it is undefined behavior if allocation happens concurrently, too. Reasons include:

  • We currently access the VO bits at different granularities, including bit, byte, and word. The interaction of such memory accesses in a multi-threaded environment is not well-defined in the C++11 or Rust memory model, so it is hard to reason about what the reader will see (or whether the behavior is undefined) if the read and the write happen concurrently.
  • In a multi-threaded program, different threads may not agree on a global total order in which different VO bits are set. (Thread 1 may see bit X set before bit Y, while Thread 2 may see bit Y set before bit X.) So "all bits that have been set when enumerate_objects is called" is not precisely defined unless we stop all threads, or at least synchronize them with the current thread.

This is OK for Ruby because we currently only support a single Ractor (in which only one thread is active at any time). For OpenJDK, it will require a VMOperation to stop the world before enumerating objects.

Collaborator Author

I updated the comment. I made it clear that it is a data race if either allocation or GC happens concurrently with enumerate_objects. I also added some comments on keeping the saved object references in roots if the VM binding implements an API that allows allocation while enumerating objects.

@wks wks (Collaborator, Author) commented Aug 8, 2024

We have an old ObjectIterator, which is very similar to the simple version of scan_non_zero_values (except that it skips the object size when it finds an object). After introducing the trait ObjectEnumerator, it becomes confusing that we have ObjectIterator and ObjectEnumerator and they have nothing to do with each other.

It would be good if we could simply remove ObjectIterator and replace its uses with the newly introduced enumerator. If this cannot be done in this PR, we should at least document the code to state what we plan to do with the old ObjectIterator.

I think linear_scan::ObjectIterator works similarly to the algorithm in this PR. ObjectIterator outputs ObjectReference, too, and it also scans from low to high addresses. A minor difference is that ObjectIterator attempts to jump a longer distance by calling ObjectModel::get_current_size. I am not sure whether ObjectModel::get_current_size is faster than blindly scanning the VO bits at word granularity (sketched below). We need benchmarks for that.

Since removing ObjectIterator will involve refactoring MarkCompact, too, I'll do it in a separate PR.
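
For reference, "blindly scanning the VO bits at word granularity" amounts to the following generic technique (a sketch only, not the code in this PR; the mapping from bit index back to a heap address is left as a placeholder):

```rust
/// Scan a slice of metadata words and call `visit` with the index of every set
/// bit, skipping zero words quickly. With one VO bit per heap word, the bit
/// index maps back to an address roughly as `region_start + index * WORD_SIZE`.
fn scan_set_bits(words: &[usize], mut visit: impl FnMut(usize)) {
    let bits = usize::BITS as usize;
    for (i, &word) in words.iter().enumerate() {
        let mut w = word;
        while w != 0 {
            let bit = w.trailing_zeros() as usize; // position of the lowest set bit
            visit(i * bits + bit);
            w &= w - 1; // clear the lowest set bit
        }
    }
}
```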

wks added 3 commits August 8, 2024 14:53
State that it is undefined behavior if it races with allocation or GC.

Added comments on saving a list of object references and keeping those references alive in subsequent GCs.
@qinsoon qinsoon (Member) left a comment

LGTM

@wks wks enabled auto-merge August 8, 2024 10:04
@wks wks added this pull request to the merge queue Aug 8, 2024
Merged via the queue into mmtk:master with commit 25051ea Aug 8, 2024
31 of 33 checks passed
@wks wks deleted the feature/each-object branch August 8, 2024 11:30