Load-balance problem in ProcessRegionModBuf #797

Open
wks opened this issue Apr 27, 2023 · 2 comments
Labels
C-bug Category: Bug P-high Priority: High. A high-priority issue should be fixed as soon as possible.

Comments

@wks
Collaborator

wks commented Apr 27, 2023

The following is the current implementation of the ProcessRegionModBuf work packet:

impl<E: ProcessEdgesWork> GCWork<E::VM> for ProcessRegionModBuf<E> {
    fn do_work(&mut self, worker: &mut GCWorker<E::VM>, mmtk: &'static MMTK<E::VM>) {
        // Scan modbuf only if the current GC is a nursery GC
        if mmtk.plan.generational().unwrap().is_current_gc_nursery() {
            // Collect all the entries in all the slices
            let mut edges = vec![];  // NOTE: one single vector.
            for slice in &self.modbuf {
                for edge in slice.iter_edges() {
                    edges.push(edge);
                }
            }
            // Forward entries
            GCWork::do_work(&mut E::new(edges, false, mmtk), worker, mmtk)
        }
    }
}

This two-level for-loop packs all the edges from all the slices in self.modbuf into one single Vec, and creates one single E: ProcessEdgesWork instance from it. The size of each slice is unbounded; in extreme cases there can be more than four million edges across all slices (observed with the jython benchmark in the DaCapo Chopin benchmark suite).

We should split those edges into smaller vectors and make multiple ProcessEdgesWork instances so that we can parallelise the processing using multiple GC workers.
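A minimal sketch of the splitting idea, using plain Rust stand-ins rather than MMTk's real types (the `CHUNK_SIZE` constant, the `Edge` alias, and `split_into_packets` are all illustrative, not part of mmtk-core):

```rust
// Illustrative cap on edges per packet; MMTk's real limit may differ.
const CHUNK_SIZE: usize = 4096;

// Stand-in for an edge; in MMTk this would be the VM's edge/slot type.
type Edge = usize;

// Split one flat edge list into packets of at most CHUNK_SIZE edges.
// Each inner Vec would then seed its own ProcessEdgesWork instance,
// letting multiple GC workers process them in parallel.
fn split_into_packets(edges: Vec<Edge>) -> Vec<Vec<Edge>> {
    edges
        .chunks(CHUNK_SIZE)
        .map(|chunk| chunk.to_vec())
        .collect()
}

fn main() {
    // 10_000 edges yield ceil(10_000 / 4096) = 3 bounded packets.
    let packets = split_into_packets((0..10_000).collect());
    assert_eq!(packets.len(), 3);
    assert!(packets.iter().all(|p| p.len() <= CHUNK_SIZE));
    println!("{} packets", packets.len());
}
```

In the real work packet, the loop over `self.modbuf` would push into the current chunk and hand each full chunk to `E::new(...)` as a separate packet instead of accumulating everything first.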

Update: It is also advisable to create a ProcessMemorySliceWork as a complement to ProcessEdgesWork. Each ProcessMemorySliceWork can hold one or more slices to be processed. This makes the iteration over each MemorySlice lazy (postponing it to the dedicated work packet), and avoids the need to unpack a slice into a vector of edges just to create a ProcessEdgesWork instance. ProcessMemorySliceWork can be implemented by wrapping a ProcessEdgesWork inside it and calling ProcessEdgesWork::process_edge for each edge in the MemorySlice.
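To make the lazy-iteration idea concrete, here is a toy model of the proposed ProcessMemorySliceWork; the `MemorySlice` struct, the closure-based `process_edge`, and all field names are hypothetical simplifications, not MMTk's actual trait signatures:

```rust
// Stand-in edge type; in MMTk this is the VM-specific edge/slot type.
type Edge = usize;

// Toy stand-in for MemorySlice: something that can yield its edges.
struct MemorySlice {
    edges: Vec<Edge>,
}

// Toy model of the proposed packet: it holds whole slices and walks
// their edges lazily inside do_work, instead of flattening everything
// into one Vec up front just to construct a ProcessEdgesWork.
struct ProcessMemorySliceWork {
    modbuf: Vec<MemorySlice>,
}

impl ProcessMemorySliceWork {
    // `process_edge` models ProcessEdgesWork::process_edge being called
    // on the wrapped ProcessEdgesWork for each edge in each slice.
    fn do_work(&self, mut process_edge: impl FnMut(Edge)) {
        for slice in &self.modbuf {
            for &edge in &slice.edges {
                process_edge(edge);
            }
        }
    }
}

fn main() {
    let work = ProcessMemorySliceWork {
        modbuf: vec![
            MemorySlice { edges: vec![1, 2, 3] },
            MemorySlice { edges: vec![4, 5] },
        ],
    };
    let mut seen = 0;
    work.do_work(|_e| seen += 1);
    assert_eq!(seen, 5); // every edge visited, no intermediate Vec built
}
```

The point of the design is that the only allocation per packet is the list of slices itself; the per-edge work is deferred to the worker that runs the packet.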

@k-sareen
Collaborator

k-sareen commented May 2, 2023

Why are we not enforcing the max packet size with assertions or something?

@wks
Collaborator Author

wks commented May 3, 2023

Why are we not enforcing the max packet size with assertions or something?

Well, that's a good idea. I previously considered the max packet size a hint rather than a strict rule, so we didn't treat it as a violation if we sometimes added more items to a packet. But it should be OK to enforce the max packet size so that packet sizes remain bounded.
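One way to enforce the bound rather than treat it as a hint, sketched with hypothetical names (`CAP`, `Packet`, `add_edge` are illustrative, not mmtk-core APIs): flush into a new packet once the cap is reached, and assert the invariant.

```rust
// Illustrative maximum packet size; MMTk's actual constant may differ.
const CAP: usize = 4096;

struct Packet {
    edges: Vec<usize>,
}

// Append an edge, opening a fresh packet whenever the current one is full.
fn add_edge(packets: &mut Vec<Packet>, edge: usize) {
    if packets.last().map_or(true, |p| p.edges.len() >= CAP) {
        packets.push(Packet { edges: Vec::new() });
    }
    let current = packets.last_mut().unwrap();
    current.edges.push(edge);
    // Enforce the max packet size as a strict rule, not a hint.
    debug_assert!(current.edges.len() <= CAP);
}

fn main() {
    let mut packets = Vec::new();
    for e in 0..10_000 {
        add_edge(&mut packets, e);
    }
    // 10_000 edges split into ceil(10_000 / 4096) = 3 bounded packets.
    assert_eq!(packets.len(), 3);
    assert!(packets.iter().all(|p| p.edges.len() <= CAP));
}
```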

@k-sareen k-sareen added C-bug Category: Bug P-high Priority: High. A high-priority issue should be fixed as soon as possible. labels Nov 6, 2023