Supporting TracePoint and GC.stat #79

Open
wks opened this issue Jul 22, 2024 · 1 comment

@wks
wks commented Jul 22, 2024

TracePoint is a mechanism to install hooks and count various GC-related events in a code block. GC.stat returns the internal statistics of the GC. They are currently not (well) supported when using MMTk. That has caused some test cases to fail.

Failing test cases

Tests related to TracePoint, mainly TestTracepointObj#test_tracks_objspace_events and TestTracepointObj#test_tracks_objspace_count, failed for various reasons.

In Debug mode, the gc_trace_point() call in obj_free attempts to call GET_EC() to find the execution context of the current mutator thread. However, when using MMTk, obj_free is executed by GC worker threads, which do not have execution contexts. The process then crashes with SIGSEGV.

In Release mode, some counts are different from the expected value. Currently, when using MMTk, we do not call gc_event_trace() during object allocation, so the number of newobj is always observed as 0.

The test case TestTracepointObj#test_tracks_objspace_count also reads free_count, gc_start_count, gc_end_mark_count and gc_end_sweep_count, none of which are implemented. It also reads from GC.stat, which does not have the required keys when using MMTk.

Supporting TracePoint

TracePoint is based on gc_event_hook. The GC-related code calls gc_event_hook at various places. Those events are defined in event.h:

  • RUBY_INTERNAL_EVENT_NEWOBJ
  • RUBY_INTERNAL_EVENT_FREEOBJ (calling obj_free)
  • RUBY_INTERNAL_EVENT_GC_START
  • RUBY_INTERNAL_EVENT_GC_END_MARK
  • RUBY_INTERNAL_EVENT_GC_END_SWEEP
  • RUBY_INTERNAL_EVENT_GC_ENTER
  • RUBY_INTERNAL_EVENT_GC_EXIT

It is not hard to add hooks to newobj_of. Checking gc_event_newobj_hook_needed_p(objspace) and calling gc_event_hook_prep(objspace, RUBY_INTERNAL_EVENT_NEWOBJ, obj, newobj_zero_slot(obj)); is sufficient to get the newobj_count number correct.

Other places can be supported similarly, but some things need to be changed.

  • Some events are emitted by GC workers. For example, obj_free is now executed by GC worker threads which do not have "execution context", and the GC-start, GC-end events are related to GC workers, too.
  • Some events do not make sense for MMTk. For example,
    • When not using MarkSweep, GC_END_MARK and GC_END_SWEEP won't make sense.
    • GC_ENTER and GC_EXIT are CRuby's default GC's notion of whether the VM is "in GC". In the current definition, the VM is also considered "in GC" when a mutator is executing finalizers. But in MMTk, the VM is "in GC" from the point where all mutators have stopped until the GC ends.

It is useful to have something like TracePoint for debugging. But it should be adapted to MMTk, or other different GCs, too.
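
One conceivable adaptation for events raised on GC worker threads, which this issue does not itself propose, is to buffer them and run the hooks later on a thread that does have an execution context. The sketch below only models that idea in plain Ruby. `DeferredGcEvents`, `record` and `drain` are hypothetical names, and a real implementation would live in the C/Rust binding rather than in Ruby code.

```ruby
# Hypothetical model: GC workers only record events; a mutator drains them.
class DeferredGcEvents
  def initialize
    @pending = Queue.new   # thread-safe FIFO from Ruby's core library
    @counts = Hash.new(0)  # per-event totals, updated only while draining
  end

  # Called from a (simulated) GC worker thread. No hook runs here.
  def record(event)
    @pending << event
  end

  # Called from a mutator thread, where running hooks is safe.
  # Yields each buffered event to the optional block and returns the totals.
  def drain
    until @pending.empty?
      event = @pending.pop
      @counts[event] += 1
      yield event if block_given?
    end
    @counts
  end
end

events = DeferredGcEvents.new
workers = 4.times.map do
  Thread.new { 25.times { events.record(:freeobj) } }
end
workers.each(&:join)

counts = events.drain
puts counts[:freeobj]
```

The trade-off in this model is that hooks observe events after the fact, so a hook could not inspect an object that has already been freed.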

Supporting GC.stat

GC.stat extracts GC-specific statistics. MMTk internally keeps various statistics, too, and the data is used by harness_begin and harness_end. To bridge Ruby's GC.stat with MMTk, we just need to expose the needed API and call into mmtk-core.
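
On the Ruby side, callers see all of this through the existing GC.stat interface. The sketch below shows how a caller might read a few keys while tolerating their absence. The key names are ones that CRuby's default GC reports; whether an MMTk build would provide them is an assumption, and `read_gc_stats` is a hypothetical helper.

```ruby
# Keys reported by CRuby's default GC; other GCs may omit some of them.
WANTED_KEYS = %i[count total_allocated_objects total_freed_objects].freeze

# Returns a Hash of key => value, with nil for keys the active GC
# does not report (Hash#[] returns nil for a missing key).
def read_gc_stats(keys = WANTED_KEYS)
  stats = GC.stat
  keys.to_h { |key| [key, stats[key]] }
end

read_gc_stats.each do |key, value|
  puts "#{key}: #{value.nil? ? 'not reported' : value}"
end
```

Reading from the full Hash avoids the ArgumentError that GC.stat(:some_key) raises for an unknown key.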

About testing

Because the statistics collected from TracePoint and GC.stat can be GC-specific, test cases should be written in a way generic to all GCs, or written specifically for each GC implementation.

However, the VM may perform optimizations that change the number of objects allocated. For example, if the VM detects that a function or a code block never changes a String argument, it may reuse the same String instance instead of allocating a new instance each time. So the number of objects the following code snippet (inspired by TestTracepointObj#test_tracks_objspace_count) allocates may vary if the JIT compiler or the interpreter optimizes the code.

100.times { "" }
200.times { puts "Hello world!".length }

If the VM reuses the same "Hello world!" instance, there will be no objects allocated, or only one object (the String "Hello world!" itself) allocated.

So test cases depending on the implementation details of the GC or potential JIT compiler optimizations may fail mysteriously.
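
A test that tolerates this kind of variation might assert a bound rather than an exact count. The sketch below is one way to phrase that. `allocations_during` is a hypothetical helper, and it assumes the active GC reports `total_allocated_objects` in GC.stat; when it does not, the helper returns nil so the caller can skip the check.

```ruby
# Returns the change in total_allocated_objects across the block,
# or nil when the active GC does not report that key.
def allocations_during
  before = GC.stat[:total_allocated_objects]
  yield
  after = GC.stat[:total_allocated_objects]
  return nil if before.nil? || after.nil?

  after - before
end

kept = []  # keeping references makes the allocations observable
delta = allocations_during { 100.times { kept << Object.new } }

if delta.nil?
  puts "allocation count unavailable; skipping check"
elsif delta >= 100
  puts "at least 100 objects allocated"
else
  puts "fewer allocations than expected: #{delta}"
end
```

The check uses a lower bound because the measurement itself allocates (GC.stat builds a Hash) and because the VM is free to allocate more than the snippet asks for.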

@peterzhu2118
Collaborator

For TracePoint, I think the priority is to get it working at all. The GC TracePoint events are not available on the Ruby level (because executing the event cannot allocate objects), and are only available for C extensions. Therefore, I don't think they are a priority, and they are used fairly rarely (mostly by profilers).

For example, consider the following Ruby code:

tp = TracePoint.new(:line) { puts "new line!" }
tp.enable
puts "hello"
puts "world"

On regular Ruby, it outputs:

new line!
hello
new line!
world

Whereas on MMTk, the event does not fire:

hello
world

This is because MMTk does not support heap walking, which is required to turn on tracing for all of the iseq objects.
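
A small probe can make this difference easy to check on a given build. The sketch below uses only TracePoint.new, enable and disable; `line_events_fire?` is a hypothetical helper name. Going by the behaviour described above, it should report that events fire on regular Ruby and that they do not on the MMTk build.

```ruby
# Returns true if at least one :line event fired while tracing was enabled.
def line_events_fire?
  fired = 0
  tp = TracePoint.new(:line) { fired += 1 }
  tp.enable
  begin
    _a = 1  # each of these lines should raise a :line event
    _b = 2
  ensure
    tp.disable
  end
  fired > 0
end

puts(line_events_fire? ? ":line events fire" : ":line events do not fire")
```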
