Let the coordinator thread open buckets #782
Conversation
I ran …
That counter is used for synchronization, so we move it to `WorkerMonitorSync` and make it a plain `usize` instead of `AtomicUsize`.
Use `pub` instead of `pub(crate)` for the methods if the struct is already `pub(crate)`.
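For illustration, a minimal sketch of what these two suggestions might look like together (the field and method names other than `WorkerMonitorSync` are hypothetical, not the actual mmtk-core code):

```rust
use std::sync::{Condvar, Mutex};

// Sketch only: the parked-worker counter lives inside the mutex-protected
// WorkerMonitorSync, so a plain usize suffices instead of an AtomicUsize.
pub(crate) struct WorkerMonitorSync {
    parked_workers: usize,
}

pub(crate) struct WorkerMonitor {
    sync: Mutex<WorkerMonitorSync>,
    cvar: Condvar,
}

impl WorkerMonitor {
    // `pub` is fine here: the struct itself is `pub(crate)`, so this method is
    // still not reachable from outside the crate.
    pub fn inc_parked_workers(&self) -> usize {
        let mut sync = self.sync.lock().unwrap();
        sync.parked_workers += 1;
        sync.parked_workers
    }
}
```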
Work buckets are opened based on only one assumption: that all the previous buckets are empty, including all the worker-local queues. The only reason we want all the workers to reach the parked state is to check that all the local queues are empty. So it is correct to open more buckets as long as all the workers have reached the "all parked" state once. It probably doesn't matter that some workers spuriously wake up before a new bucket is opened. The same goes for the work bucket close condition: as long as all the buckets are empty, and all the workers have reached the "all parked" state once, it is correct to finish the current GC and close all the buckets. This makes me wonder whether this assertion is strictly correct. I think it's okay for workers to poll for more packets, or even try to open more buckets, while the coordinator thread is doing other jobs -- for example when doing …
This does not sound 100% right. Let's clarify one thing: if a worker is executing a packet, the packet is not in any bucket, and it is not in any local queue, right? Under this definition, if all the active buckets are empty but a worker is executing a work packet, we still cannot open new buckets, as the worker that is executing a work packet may generate more work packets.
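To make the condition under debate concrete, here is a hypothetical predicate (not mmtk-core API; the struct and field names are invented) summarising when it would be safe to open new buckets under this definition:

```rust
// Hypothetical summary of the discussion above: a packet being executed is in
// neither a bucket nor a local queue, so "all buckets and local queues empty"
// is not enough on its own; no worker may still be executing a packet either.
struct SchedulerSnapshot {
    active_buckets_empty: bool, // every open bucket has an empty queue
    local_queues_empty: bool,   // every worker-local queue is empty
    workers_executing: usize,   // workers currently running a packet
}

fn can_open_new_buckets(s: &SchedulerSnapshot) -> bool {
    s.active_buckets_empty && s.local_queues_empty && s.workers_executing == 0
}
```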
This makes sense if all the packets are executed by the workers. But what about the coordinator? Coordinator work also belongs to a bucket. If the coordinator is executing a packet, we should not open new buckets, right? The coordinator may generate new work as well.
For this case, workers will not reach the "all parked" state. The above assumption still holds.
No, I don't think this is true. Coordinator work never goes to the bucket's queue. There is a separate message channel for that.
This is probably not true either. For example, when the coordinator thread is stopping mutators, some stacks can already be scanned before all the mutators have yielded. We have not implemented this feature yet, but in theory some other workers should be able to scan these stacks while the coordinator is still stopping mutators. Another example: when stopping mutators, workers can already start zeroing mark tables. It's totally safe and correct to do so.
Yes. But given your stronger assumption that you can't open buckets while the coordinator is doing some jobs, this new work can never have a chance to be executed early in parallel.
I'm curious whether there are any real correctness issues or crashes after removing these strong assertions. If so, that means the current worker synchronization mechanism is broken. Otherwise, it is probably less efficient to enforce a strong sequential order between coordinator packets and normal work packets.
What exactly do you mean by 'strong assertions'? If you meant 'we cannot open new buckets if we have …
I was talking about this one [1]. But probably there are others (I just noticed one more below).
When doing …, even if the workers see that all the active buckets are empty, they still cannot open buckets, because we already have a check here [3] that prevents buckets from opening before the pending coordinator work goes to zero.
[1] https://github.com/mmtk/mmtk-core/blob/master/src/scheduler/scheduler.rs#L242-L245
If we don't have checks for …

The next problem is that even with the check for …

Generally I think it is troublesome to allow every worker to open new buckets and also require them to coordinate with the coordinator. It is way simpler to just let the coordinator do this kind of global work, like opening new buckets.
The first STW bucket can only be opened at the end of the coordinator's …

After the coordinator …

Also, under the current implementation, the coordinator …
As mentioned before, this assertion is probably incorrect. The assumption in this assertion is too strong. The …

Yeah, you can argue that not having a sequential order between coordinator and worker packets can make things complicated. But my guess is that relaxing the order like this probably will not introduce more real correctness issues. We already have a lot of phase-like buckets and synchronization barriers, and making the coordinator and workers non-overlapping will bring more single-threaded "coordinator" phases.
Well, the current work packet system is problematic. But in the original implementation, the coordinator thread can only receive four signals: …
Note that signal (3) is part of what this PR wants to do -- don't make workers do too many things other than executing packets. It was removed later in the work-stealing PR, because on a machine with more cores (say 32 cores) it is already expensive to do a synchronization between buckets, and it would be even more expensive to communicate with one more thread to do one more synchronization to open buckets.

Also, I believe signal (2) can now be moved back to the normal workers as well. We have a VM companion thread in the openjdk binding to trigger safepoints now.

If we remove (2) and (3) from the coordinator, the coordinator can be greatly simplified: it only needs to initiate a GC and finish a GC. We don't need to care about most of the synchronization issues between the coordinator and the workers anymore. Given that (2) and (3) can be removed, the coordinator message queue and pending work counter can be removed as well. We only need the coordinator to have three states, and it triggers a GC when the state transits from …
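A rough sketch of such a simplified coordinator might look like the following. The state names and the channel-free design are purely illustrative, since the exact three states are not spelled out above:

```rust
use std::sync::{Condvar, Mutex};

// Hypothetical states for a coordinator that only initiates and finishes GCs.
#[derive(Clone, Copy, PartialEq, Eq)]
enum CoordinatorState {
    Idle,      // no GC requested
    Requested, // a GC has been requested
    InGc,      // a GC is in progress
}

struct Coordinator {
    state: Mutex<CoordinatorState>,
    cvar: Condvar,
}

impl Coordinator {
    /// Request a GC (e.g. from a mutator thread).
    fn request_gc(&self) {
        let mut state = self.state.lock().unwrap();
        if *state == CoordinatorState::Idle {
            *state = CoordinatorState::Requested;
            self.cvar.notify_all();
        }
    }

    /// The coordinator loop: trigger a GC on the Idle -> Requested transition,
    /// run it, then go back to Idle. No message queue or pending-work counter.
    fn run(&self) {
        loop {
            let mut state = self.state.lock().unwrap();
            while *state != CoordinatorState::Requested {
                state = self.cvar.wait(state).unwrap();
            }
            *state = CoordinatorState::InGc;
            drop(state);

            // ... initiate the GC, let workers run it to completion, finish the GC ...

            *self.state.lock().unwrap() = CoordinatorState::Idle;
        }
    }
}
```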
This is not right. If … (see `mmtk-core/src/scheduler/gc_work.rs`, lines 179 to 183 in f8469b0)
This is still not a problem. No worker can open the first STW bucket. Thus no worker can open any buckets until the stop-mutators work is finished.
During the execution of … (see `mmtk-core/src/scheduler/gc_work.rs`, line 191 in f8469b0).
After this, a worker can open new buckets if our defined requirement is met. At this point, the coordinator may still be executing … Checking …
With the current …

If you are really worried about this issue, a simpler solution would be taking …
See my other proposal above. I believe all of the jobs can be moved out of the coordinator, apart from initiating a GC and ending a GC. If you think …
LGTM. Once Wenyu approves this PR, we can merge it.
```rust
use super::*;

/// A one-way channel for workers to send coordinator packets and notifications to the controller.
struct Channel<VM: VMBinding> {
```
How is this different from a standard multi-producer single-consumer channel (https://doc.rust-lang.org/std/sync/mpsc/fn.channel.html)?
The differences are:

- The message "all workers have parked" is idempotent, i.e. setting the `all_workers_parked` flag twice is the same as setting it once. It avoids the previous problem of sending multiple `CoordinatorMessage::Finish` messages in a row, which forced the coordinator to "consume" extraneous `Finish` messages after GC.
- The coordinator always processes all pending coordinator packets before processing the `all_workers_parked` flag. In this way, the coordinator is sure that no "pending coordinator packets" exist when opening new buckets.
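For illustration, a minimal sketch of a channel with these two properties (hypothetical and simplified from the description above, not the actual mmtk-core code):

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

struct ChannelSync<P> {
    coordinator_packets: VecDeque<P>,
    all_workers_parked: bool,
}

pub struct Channel<P> {
    sync: Mutex<ChannelSync<P>>,
    cvar: Condvar,
}

pub enum Event<P> {
    Packet(P),
    AllWorkersParked,
}

impl<P> Channel<P> {
    pub fn new() -> Self {
        Channel {
            sync: Mutex::new(ChannelSync {
                coordinator_packets: VecDeque::new(),
                all_workers_parked: false,
            }),
            cvar: Condvar::new(),
        }
    }

    /// Workers: send a coordinator packet.
    pub fn send_packet(&self, packet: P) {
        self.sync.lock().unwrap().coordinator_packets.push_back(packet);
        self.cvar.notify_one();
    }

    /// Workers: setting the flag twice has the same effect as setting it once.
    pub fn notify_all_workers_parked(&self) {
        self.sync.lock().unwrap().all_workers_parked = true;
        self.cvar.notify_one();
    }

    /// Coordinator: pending packets are always returned before the parked flag
    /// is reported, so no coordinator packet is pending when buckets are opened.
    pub fn receive(&self) -> Event<P> {
        let mut sync = self.sync.lock().unwrap();
        loop {
            if let Some(packet) = sync.coordinator_packets.pop_front() {
                return Event::Packet(packet);
            }
            if sync.all_workers_parked {
                sync.all_workers_parked = false;
                return Event::AllWorkersParked;
            }
            sync = self.cvar.wait(sync).unwrap();
        }
    }
}
```

The key difference from `std::sync::mpsc` is that the parked condition is a flag that collapses repeated notifications, and the receive loop drains packets before reporting it.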
lgtm
This reverts commit 8dd1b73.
Note: This PR is not intended to do any full-scale refactoring on the work packet framework as described in #774. Instead, this PR tries to fix #778 cleanly while not changing the overall structure.
The main problem is that `Condvar::wait` is allowed to spuriously wake up at any time, even without another thread notifying it. Currently, work buckets are opened and closed under the assumption that all GC workers have parked. Due to spurious wake-ups, any GC worker can wake up at any time and break that assumption.

This PR makes the following changes:
- Only the coordinator can open more buckets (with the exception of `Prepare`; see the "Won't fix" section below).
- When all workers have parked, they notify the coordinator that all workers have parked, and wait for the coordinator to open buckets.
- When all workers have parked, a boolean flag `WorkerMonitor::group_sleep` is set. No GC workers can unpark until this flag is explicitly cleared by the coordinator. A worker that wakes up spuriously will find the `group_sleep` flag still set, and will wait again.

To implement those changes, some data structures are modified.
- The worker-to-coordinator channel is changed from `mpsc::Channel` to a custom `scheduler::controller::channel::Channel`.
- The `WorkerMonitor` struct is added as the anything-to-workers notification channel.
  - It encapsulates the `Mutex` and `Condvar` so that the coordinator or other workers can notify workers.
  - It holds the `parked_workers` counter. `WorkerGroup` is no longer responsible for recording parked workers.
  - It holds the `group_sleep` boolean flag.
- The `pending_coordinator_packets` counter is removed.
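A simplified sketch of the worker-side "group sleep" protocol described above (field and method names are illustrative; the real code also signals the coordinator through the channel):

```rust
use std::sync::{Condvar, Mutex};

struct WorkerMonitorSync {
    parked_workers: usize, // plain usize: only accessed while holding the mutex
    group_sleep: bool,
}

pub struct WorkerMonitor {
    sync: Mutex<WorkerMonitorSync>,
    cvar: Condvar,
}

impl WorkerMonitor {
    /// Called by a worker that found nothing to do. Returns true if this
    /// worker was the last one to park (i.e. it put the group to sleep).
    pub fn park_and_wait(&self, num_workers: usize) -> bool {
        let mut sync = self.sync.lock().unwrap();
        sync.parked_workers += 1;
        let was_last = sync.parked_workers == num_workers;
        if was_last {
            // In the real PR, this is also where the "all workers parked"
            // notification is sent to the coordinator.
            sync.group_sleep = true;
        }
        // Spurious wake-ups are harmless: the flag is re-checked and the
        // worker waits again until the coordinator clears it.
        while sync.group_sleep {
            sync = self.cvar.wait(sync).unwrap();
        }
        sync.parked_workers -= 1;
        was_last
    }

    /// Called by the coordinator after opening new buckets.
    pub fn resume_workers(&self) {
        self.sync.lock().unwrap().group_sleep = false;
        self.cvar.notify_all();
    }
}
```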
Known issue

There are two boolean flags that record the "all workers have parked" condition:

- `controller::channel::ChannelSync::all_workers_parked`
- `WorkerMonitorSync::group_sleep`

Currently, the former is used to notify the coordinator, while the latter is used to wake up workers. It may be possible to merge them.

However, if we merge them, both the coordinator and the workers will be waiting on the same mutex that guards that flag. That would mean changing the channels from one worker-to-coordinator channel and an anything-to-workers notifier into one single synchronisation entity guarded by one mutex (but possibly with multiple condvars). That would be a larger-scale change. On the contrary, both the worker monitor and the worker-to-coordinator channel already existed, except that the former was implemented as a tuple of `(Mutex<()>, Condvar)` while the latter was an `mpsc::Channel`. Therefore, it is easier (and requires fewer changes) to keep them separate.

Requiring the coordinator to open new buckets adds a tiny context-switching overhead, but this overhead is small compared to the load-balancing problem.
Update:
On machines with many cores, the synchronisation overhead may be greater.
This problem mainly affects work buckets with a single small work packet. One example is the `VMRefClosure` bucket. It has one packet that calls `Scanning::process_weak_refs`, which does nothing by default. Previously, executing such a work packet had no observable cost. But with this PR, it will involve a context switch from the coordinator to a worker that executes this no-op packet, and another context switch from the worker back to the coordinator to open more buckets. This mainly affects VMs that don't use the `VMRefClosure` bucket (such as OpenJDK, which uses the legacy `{Soft,Weak,Phantom}Closure` buckets).

This implies that the cost is not negligible at scale, and this PR should only be a temporary workaround before we do a full-scale redesign of the work packet system.
Is there a better name for "group sleeping"? It should describe a state such that, once the group of threads has entered it, they cannot get out of it by themselves, and must be released by an external party (the coordinator).
Won't fix
The `Prepare` bucket is opened by the `StopMutators` work packet, which may be executed by a worker (if the binding is configured so).

In theory, we could just let the coordinator open the first STW bucket (`Prepare`). From the log, the `GCWorkScheduler::notify_mutators_paused` method is called near the end of the `StopMutators` work packet. Even in the best case, the `Prepare` work packet only starts several microseconds before the `StopMutators` work packet finishes, so the synchronisation overhead should be small.

I (Kunshan) tried implementing this. When all mutators have stopped, the `StopMutators` packet sends a message to the coordinator. The coordinator, when it receives that message, opens the `Prepare` bucket. However, this introduced another synchronisation bug.

If the coordinator and the workers are executed in the following order, the coordinator will observe a strange state in which all workers have parked, but some open buckets still contain packets:
1. A worker executed the `StopMutators` packet. When the mutators paused, it sent a "mutators stopped" message to the coordinator.
2. The worker finished the `StopMutators` packet, and found no more packets in open buckets (`Prepare` is not open yet). The worker parked, and found it was the last parked worker. Just at this moment, the OS performed a context switch.
3. The coordinator received the "mutators stopped" message and opened the `Prepare` bucket. The coordinator tried to clear the "group-sleep" flag and notify the workers, but the workers were not in the "group-sleep" state yet (the worker mentioned above was about to wait, but had not started waiting). This clearing and notifying had no effect.
4. The OS switched back to the worker (which did not know the coordinator had opened the `Prepare` bucket during the context switch). The worker then set the "group-sleep" flag, sent an "all workers parked" message to the coordinator, and started waiting for the "group-sleep" flag to be cleared.
5. The coordinator received the "all workers parked" message and reached `assert!(self.scheduler.all_activated_buckets_are_empty());`, which fails because the `Prepare` bucket is not empty.

So is that assertion unreasonable? I feel reluctant to remove it, because it is quite a fundamental condition for opening new buckets: all buckets (except `Prepare`) can only be opened when all active buckets are drained.
In another execution sequence, the coordinator will observe "all workers are parked" when all workers are actually active.
1. A worker finished the `StopMutators` packet, and found no more packets in open buckets. It parked, and found it was the last parked worker. Then it set the "group-sleep" flag, sent an "all workers parked" message to the coordinator, and started waiting for the "group-sleep" flag to be cleared.
2. The coordinator received the "mutators stopped" message and opened the `Prepare` bucket. It cleared the "group-sleep" flag and notified the workers.
3. The workers woke up and started executing packets from the `Prepare` bucket.
4. The coordinator then processed the "all workers parked" message from step 1, even though the workers were actually running.

I think the fundamental problem is that buckets can be opened without holding any lock. The `WorkBucket::activate()` method works by setting an `AtomicBool` flag (`WorkBucket::active`) to true. Because of this, one thread (the coordinator) can open a bucket while another thread (a worker) peeks into the bucket to see if it is open and has any packets.

A thorough fix should employ the technique described in #546, i.e. let workers poll packets from one single queue, while buckets are just temporary places to hold packets when the buckets are not open.
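A simplified sketch of that race (not the real implementation, which has more state): `activate` flips an `AtomicBool` without any scheduler-wide lock, so a worker's emptiness check can interleave arbitrarily with the coordinator's activation.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

struct WorkBucket<W> {
    active: AtomicBool,        // corresponds to WorkBucket::active
    queue: Mutex<VecDeque<W>>, // the bucket's packets (simplified)
}

impl<W> WorkBucket<W> {
    /// Open the bucket. No scheduler-wide lock is held, so this can happen
    /// at any point relative to what other threads are doing.
    fn activate(&self) {
        self.active.store(true, Ordering::SeqCst);
    }

    /// A worker peeking at the bucket to decide whether to park. Between this
    /// check and the worker's decision to park, the coordinator may activate
    /// the bucket and make it non-empty: the race described above.
    fn is_activated_and_nonempty(&self) -> bool {
        self.active.load(Ordering::SeqCst) && !self.queue.lock().unwrap().is_empty()
    }
}
```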
I think I should not fix it in this PR, because this PR aims to be a small-scale fix. The current code, i.e. letting the `StopMutators` packet call `GCWorkScheduler::notify_mutators_paused` and open the `Prepare` bucket directly, works fine without synchronisation issues.