
Ask from binding if GC is disabled #1075

Merged: 19 commits, Feb 7, 2024

Conversation

@udesou (Contributor) commented Jan 23, 2024

MMTk currently supports disabling GC (i.e. allowing the heap to grow above its limit) via two API functions: disable_collection and enable_collection.

This PR replaces that strategy. Currently, MMTk relies on the binding to ensure thread safety when calling those functions. After this refactoring, MMTk asks whether GC is disabled via the VM::VMCollection::is_collection_disabled() function, and expects any thread-safety logic to be implemented in the VM itself rather than necessarily depending on MMTk.
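The shape of the callback can be sketched as follows. This is a minimal standalone sketch, not the real mmtk-core API: `Collection` here is a simplified stand-in for the binding-side trait, and `MyVM`/`GC_DISABLED` are hypothetical binding-side names. The point is that mmtk-core only ever reads the answer; the binding owns the state and its synchronization.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Simplified stand-in for the VM-side trait that mmtk-core queries.
trait Collection {
    // mmtk-core calls this instead of tracking enable/disable state itself.
    fn is_collection_disabled() -> bool {
        false // default: GC is always allowed
    }
}

// Binding-side state. Thread safety is the VM's responsibility now,
// e.g. via an atomic flag like this one.
static GC_DISABLED: AtomicBool = AtomicBool::new(false);

struct MyVM; // hypothetical binding

impl Collection for MyVM {
    fn is_collection_disabled() -> bool {
        GC_DISABLED.load(Ordering::SeqCst)
    }
}
```

A binding that never disables GC simply keeps the default implementation and changes nothing.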

@udesou udesou requested a review from wks January 24, 2024 03:23
@wks (Collaborator) commented Jan 25, 2024

If the VM needs to disable GC to implement critical sections where GC activities such as moving objects may affect the correctness of the program (usually due to native code accessing buffers in heap objects), the easiest way is simply telling the current thread not to yield. Of course, this approach only works for STW GC. P.S. It is always better to pin objects for this purpose instead of relying on disabling GC.

The mechanism introduced in this PR (i.e. letting mmtk-core ask the binding whether it should trigger GC) is still not sound for implementing critical sections. For example,

  • Mutator1 attempted to allocate a large object and triggered GC. Mutator1 blocked itself for GC.
  • A GC thread responded and asked the VM binding if it was OK to do GC.
  • The VM binding answered yes.
  • The GC thread started GC, and was just about to stop all mutators.
  • Mutator2 entered a critical section, and native C code started reading from an array in an MMTk heap object.
  • Mutator2 executed a yieldpoint and yielded.
  • GC started tracing, and moved the heap object.

The problem is that one mutator is still active after another mutator triggered GC and a GC thread responded.

@qinsoon (Member) commented Jan 25, 2024

I don't think this PR is trying to introduce something that can be used for implementing critical sections. It is trying to replace the current enable/disable_collection API with a callback so that MMTk does not need to know about the GC-enabled state and just asks the binding for it. Arguably, this will simplify MMTk's implementation.

src/memory_manager.rs (review comments resolved)
@udesou (Contributor, Author) commented Jan 25, 2024

> I don't think this PR is trying to introduce something that can be used for implementing critical sections. It is trying to replace the current enable/disable_collection API with a callback so that MMTk does not need to know about the GC-enabled state and just asks the binding for it. Arguably, this will simplify MMTk's implementation.

@qinsoon is correct.

> The mechanism introduced in this PR (i.e. letting mmtk-core ask the binding whether it should trigger GC) is still not sound for implementing critical sections. For example,
>
>   • Mutator1 attempted to allocate a large object and triggered GC. Mutator1 blocked itself for GC.
>   • A GC thread responded and asked the VM binding if it was OK to do GC.
>   • The VM binding answered yes.
>   • The GC thread started GC, and was just about to stop all mutators.
>   • Mutator2 entered a critical section, and native C code started reading from an array in an MMTk heap object.
>   • Mutator2 executed a yieldpoint and yielded.
>   • GC started tracing, and moved the heap object.

For critical sections, Julia has the notion of GC state. Essentially, if Mutator2 enters a critical section, Mutator1 should wait (when it's waiting for the world to stop) until Mutator2 leaves the critical section.
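The "GC state" idea described above can be sketched as follows. This is an illustrative standalone sketch, not Julia's or MMTk's actual implementation: the thread initiating a stop-the-world pause waits until no mutator is inside a critical section. All names (`GcStateTracker`, etc.) are hypothetical.

```rust
use std::sync::{Condvar, Mutex};

// Tracks how many mutators are currently inside a critical section.
struct GcStateTracker {
    in_critical: Mutex<usize>,
    cond: Condvar,
}

impl GcStateTracker {
    fn new() -> Self {
        Self { in_critical: Mutex::new(0), cond: Condvar::new() }
    }

    // Called by a mutator when entering a critical section.
    fn enter_critical(&self) {
        *self.in_critical.lock().unwrap() += 1;
    }

    // Called by a mutator when leaving a critical section.
    fn leave_critical(&self) {
        let mut n = self.in_critical.lock().unwrap();
        *n -= 1;
        if *n == 0 {
            self.cond.notify_all(); // wake a waiting GC initiator
        }
    }

    // Called by the GC initiator before stopping the world: blocks until
    // no mutator is inside a critical section.
    fn wait_until_safe(&self) {
        let mut n = self.in_critical.lock().unwrap();
        while *n != 0 {
            n = self.cond.wait(n).unwrap();
        }
    }
}
```

Note this is a cooperative protocol between mutators and the GC initiator, which is exactly the kind of synchronization the PR leaves to the VM rather than to mmtk-core.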

@wks (Collaborator) commented Jan 25, 2024

Currently, enable_collection is primarily used for postponing GC to the moment when GC is safe. For example, the VM needs to initialize type-info metadata (for Java, that includes the Klass instances for the most primitive types such as Object; for Ruby, that includes the global tables for strings, symbols, finalizers, etc.) before it can scan any object. For this use, it is sufficient to let all operations that can possibly trigger GC happen after enable_collection. The current API says "enable_collection is not thread-safe". It means the VM needs to use its own synchronization mechanisms to ensure no "potential GC-triggering operations" happen concurrently with enable_collection (i.e. neither happen-before nor happen-after it). I think this semantics is fine because VM initialization is usually done in one thread (i.e. before other mutator threads are spawned).

If disable_collection cannot be used for critical sections, I don't know where else it can be useful. Maybe some languages allow the user to disable GC (e.g. PHP famously does it so that some programs run 100% faster).

As long as mmtk-core doesn't need to enforce any synchronization between mutators, this PR should be OK. Specifically,

  1. The doc of is_collection_disabled makes it clear that it is called when allocating and when the VM requests a GC. That is, VM developers know when and in which thread it is called.
  2. The default behavior is "always allowed", which is intuitive, but I guess most VMs will need to override it.

And the synchronization is left to the VM. The VM can duplicate the current behavior by adding an atomic boolean variable in the binding.

And it is also up to the VM to define the semantics of disabling GC. For CRuby, the semantics of GC::disable is vague w.r.t. the behavior of multiple threads.

> disable → true or false
>
> Disables garbage collection, returning true if garbage collection was already disabled.

But since CRuby uses a global VM lock, the execution within one Ractor should be sequentially consistent by definition (because it is sequential).

@udesou (Contributor, Author) commented Jan 28, 2024

> Currently, enable_collection is primarily used for postponing GC to the moment when GC is safe. For example, the VM needs to initialize type-info metadata (for Java, that includes the Klass instances for the most primitive types such as Object; for Ruby, that includes the global tables for strings, symbols, finalizers, etc.) before it can scan any object. For this use, it is sufficient to let all operations that can possibly trigger GC happen after enable_collection. The current API says "enable_collection is not thread-safe". It means the VM needs to use its own synchronization mechanisms to ensure no "potential GC-triggering operations" happen concurrently with enable_collection (i.e. neither happen-before nor happen-after it). I think this semantics is fine because VM initialization is usually done in one thread (i.e. before other mutator threads are spawned).

I'm a bit confused. Are you referring to initialize_collection? Does Ruby actually use enable_collection and disable_collection?

> If disable_collection cannot be used for critical sections, I don't know where else it can be useful. Maybe some languages allow the user to disable GC (e.g. PHP famously does it so that some programs run 100% faster).

We can ask the Julia folks why they have it, but the semantics, as far as I understand, is that you may continue allocating (beyond any heap limit) and it will not trigger GC. Essentially it disables any GC polling that may happen, including the case when the user asks for a GC.
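That semantics can be sketched as a poll on the allocation slow path that simply never reports "collection required" while GC is disabled, so allocation proceeds past the heap limit. This is an illustrative sketch, not mmtk-core's actual GcTrigger; the `Heap` type and field names are hypothetical.

```rust
// Minimal model of a heap whose GC trigger honors a "GC disabled" flag.
struct Heap {
    used_pages: usize,
    limit_pages: usize,
    gc_disabled: bool, // in a real binding, this would be the VM's own state
}

impl Heap {
    // Returns true if a GC should be triggered for this allocation.
    fn poll(&mut self, pages: usize) -> bool {
        self.used_pages += pages;
        if self.gc_disabled {
            // Keep allocating beyond the limit: no GC is ever requested,
            // not even when the heap is far over its budget.
            return false;
        }
        self.used_pages > self.limit_pages
    }
}
```

Under this model, a user-requested GC would also be suppressed by the same flag check, matching the "disables any GC polling" reading above.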

> As long as mmtk-core doesn't need to enforce any synchronization between mutators, this PR should be OK. Specifically,
>
>   1. The doc of is_collection_disabled makes it clear that it is called when allocating and when the VM requests a GC. That is, VM developers know when and in which thread it is called.
>   2. The default behavior is "always allowed", which is intuitive, but I guess most VMs will need to override it.

I'll try to improve the documentation to include (1). For (2), it means that bindings for VMs that do not have such a "feature" don't need to change anything. Again, if we're talking about enable_collection and not initialize_collection, the latter should not be impacted by this change, I hope.

@wks (Collaborator) commented Jan 29, 2024

> I'm a bit confused. Are you referring to initialize_collection? Does Ruby actually use enable_collection and disable_collection?

Both. Ruby currently uses initialize_collection, enable_collection, and disable_collection during VM start-up. The current code is something like:

        // The threading system is ready.  Initialize collection.
        mmtk_initialize_collection(th);
        // Bind the main thread.
        rb_mmtk_bind_mutator(th);
        // Temporarily disable GC.
        mmtk_disable_collection();

// After lots of initialization, in another function

        if (rb_mmtk_enabled_p()) {
            // Now that the VM says it's time to enable GC, we enable GC for MMTk, too.
            mmtk_enable_collection();
        }

Given that the only thing initialize_collection (currently) does is spawning GC threads, I think it is equivalent to postponing initialize_collection to the place of enable_collection and removing both the disable_collection and enable_collection calls above. "Having GC threads that cannot do GC" is equivalent to "not having GC threads at all". (Update: They are not equivalent. If initialize_collection is not called, but GC is enabled (by default), then alloc will trigger GC and wait for GC to finish, which will never happen because the GC threads are not spawned yet.)

@udesou udesou requested a review from qinsoon January 30, 2024 05:08
@qinsoon (Member) commented Jan 30, 2024

Is this PR ready for review? I would prefer @wks to review this. I will remove myself from the reviewer list.

@qinsoon qinsoon removed their request for review January 30, 2024 05:13
@udesou udesou marked this pull request as ready for review January 30, 2024 05:14
@qinsoon qinsoon added the PR-extended-testing Run extended tests for the pull request label Jan 30, 2024
src/vm/collection.rs, benches/alloc.rs (review comments resolved)
@wks (Collaborator) commented Jan 30, 2024

Here is a PR for the Ruby binding: mmtk/mmtk-ruby#52

@qinsoon (Member) commented Feb 1, 2024

binding-refs
JULIA_BINDING_REPO=udesou/mmtk-julia
JULIA_BINDING_REF=feature/check-gc-disabled
RUBY_BINDING_REPO=wks/mmtk-ruby
RUBY_BINDING_REF=feature/check-gc-disabled

@qinsoon (Member) commented Feb 1, 2024

The merge check somehow failed, which is strange. It seems to be observing extended-test results from previous runs (not the most recent ones).

[2024-02-01 01:57:33.902814] wait.DEBUG: Check "extended-tests-ruby (release)" has the following status "completed" and conclusion "failure"
[2024-02-01 01:57:33.902824] wait.DEBUG: Check (extended-tests-ruby (release)) failed, marking resolve and failure
[2024-02-01 01:57:33.902834] wait.DEBUG: Check "extended-tests-ruby (debug)" has the following status "completed" and conclusion "cancelled"
[2024-02-01 01:57:33.902843] wait.DEBUG: Check (extended-tests-ruby (debug)) failed, marking resolve and failure
[2024-02-01 01:57:33.902853] wait.DEBUG: Check "extended-tests-jikesrvm" has the following status "completed" and conclusion "failure"
[2024-02-01 01:57:33.902862] wait.DEBUG: Check (extended-tests-jikesrvm) failed, marking resolve and failure
[2024-02-01 01:57:33.902871] wait.DEBUG: Check "extended-tests-julia" has the following status "completed" and conclusion "failure"
[2024-02-01 01:57:33.902881] wait.DEBUG: Check (extended-tests-julia) failed, marking resolve and failure

Those tests passed in the most recent run.

@udesou udesou requested a review from wks February 1, 2024 23:05
@udesou (Contributor, Author) commented Feb 1, 2024

@wks I've just re-requested a review in case you have further comments.

@qinsoon (Member) commented Feb 2, 2024

> Update: They are not equivalent. If initialize_collection is not called, but GC is enabled (by default), then alloc will trigger GC and wait for GC to finish, which will never happen because the GC threads are not spawned yet.

Did you observe this? If so, it sounds like a bug. Without initialize_collection, MMTk does not set initialized to true, and should not trigger a GC at all.

src/vm/collection.rs (review comments resolved)
@wks (Collaborator) commented Feb 2, 2024

> > Update: They are not equivalent. If initialize_collection is not called, but GC is enabled (by default), then alloc will trigger GC and wait for GC to finish, which will never happen because the GC threads are not spawned yet.
>
> Did you observe this? If so, it sounds like a bug. Without initialize_collection, MMTk does not set initialized to true, and should not trigger a GC at all.

No. This is purely hypothetical.

I see that Space::acquire has this statement:

        // Is a GC allowed here? If we should poll but are not allowed to poll, we will panic.
        // initialize_collection() has to be called so we know GC is initialized.
        let allow_gc = should_poll && self.common().global_state.is_initialized();

And this:

        if should_poll && self.get_gc_trigger().poll(false, Some(self.as_space())) {
            debug!("Collection required");
            assert!(allow_gc, "GC is not allowed here: collection is not initialized (did you call initialize_collection()?).");
            // ...
            VM::VMCollection::block_for_gc(VMMutatorThread(tls)); // We have checked that this is mutator
            unsafe { Address::zero() }

So if initialize_collection() is not called, but poll says it should do GC, then it will panic.

It looks like the actual contract of initialize_collection is that it not only spawns GC threads, but it is also a precondition for allocating memory. (p.s. This is fine for forking because the forked process (both parent and child) should immediately call initialize_collection again after forking, before making any allocation.)

@wks (Collaborator) reviewed and left a comment:

LGTM

@wks (Collaborator) commented Feb 2, 2024

The "Merge Check" failed for the reason of "Status checks failed". That's strange.

@qinsoon (Member) commented Feb 2, 2024

The "Merge Check" failed for the reason of "Status checks failed". That's strange.

I think the first attempt to run the Julia tests failed, which caused the merge check to fail. When we reran the failed (Julia) tests, the merge check did not get a rerun, so its status did not change.

@udesou udesou added this pull request to the merge queue Feb 7, 2024
Merged via the queue into mmtk:master with commit 7cafe56 Feb 7, 2024
22 of 23 checks passed
@udesou udesou deleted the feature/check-gc-disabled branch February 7, 2024 04:03
mmtkgc-bot added a commit to mmtk/mmtk-ruby that referenced this pull request Feb 7, 2024
Upstream PR: mmtk/mmtk-core#1075

---------

Co-authored-by: Eduardo Souza <[email protected]>
Co-authored-by: mmtkgc-bot <[email protected]>
mmtkgc-bot added a commit to mmtk/mmtk-julia that referenced this pull request Feb 7, 2024
Upstream PR: mmtk/mmtk-core#1075
Julia PR: mmtk/julia#35

---------

Co-authored-by: mmtkgc-bot <[email protected]>
mergify bot pushed a commit to mmtk/mmtk-julia that referenced this pull request Feb 7, 2024
Upstream PR: mmtk/mmtk-core#1075
Julia PR: mmtk/julia#35

---------

Co-authored-by: mmtkgc-bot <[email protected]>
(cherry picked from commit eac7e88)

# Conflicts:
#	mmtk/Cargo.lock
#	mmtk/Cargo.toml
#	mmtk/api/mmtk.h
#	mmtk/src/api.rs
#	mmtk/src/collection.rs
mergify bot pushed a commit to mmtk/mmtk-julia that referenced this pull request Feb 7, 2024
Upstream PR: mmtk/mmtk-core#1075
Julia PR: mmtk/julia#35

---------

Co-authored-by: mmtkgc-bot <[email protected]>
(cherry picked from commit eac7e88)

# Conflicts:
#	mmtk/Cargo.lock
#	mmtk/Cargo.toml
#	mmtk/src/collection.rs
3 participants