Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow nursery size to be proportional to the heap size #1087

Merged
merged 13 commits into from
Apr 17, 2024

Conversation

qinsoon
Copy link
Member

@qinsoon qinsoon commented Feb 22, 2024

This PR introduces different kinds of nursery size options, and by default, we use a proportion of the heap size as the min nursery. This PR should generally improve the generational plans' performance by triggering a full heap GC more promptly. This PR mitigates the issue identified in #594, but does not fully fix the problem.

@qinsoon
Copy link
Member Author

qinsoon commented Feb 22, 2024

/// Return upper bound of the nursery size (in number of bytes)
    pub fn get_max_nursery_bytes(&self) -> usize {
        self.nursery.max.unwrap_or_else(|| {
            if !vm_layout().force_use_contiguous_spaces {
                DEFAULT_MAX_NURSERY_32
            } else {
                DEFAULT_MAX_NURSERY
            }
        })
    }

This check is removed. I am not sure about the consequences. Maybe I should change the default value for max nursery, depending on the VM layout.

@qinsoon
Copy link
Member Author

qinsoon commented Feb 22, 2024

I quickly tried h2o, for which generational Immix plans were reported slower than full heap Immix in #594 (comment). By changing the min nursery size, generational immix can perform better than Immix. I will do more thorough performance evaluation before making this PR ready for review.

@qinsoon
Copy link
Member Author

qinsoon commented Mar 5, 2024

The performance results for now (using 2.5x minheap from temurin-21-G1-n-5):

GenImmix

Normalized to Immix/master
genix
plotty

Normalized to GenImmix/master
genix-normalized-to-master
plotty

StickyImmix

Normalized to Immix/master
stickyix
plotty

Normalized to StickyImmix/master
stickyix-normalized-to-master
plotty

Issues I will look into

  • jython and tomcat shows slowdown for GenImmix
  • h2, jython and lusearch show slowdown for StickyImmix

Issues I am not going to look into

  • Gen immix and sticky immix is slower than immix (the two issues above may contribute to this issue. But if those are resolved and gen/sticky immix are still slower, I do not plan to look further into the issues.)
  • Immix and sticky immix ran into OOM for tomcat.
  • Immix ran into OOM for h2.

@k-sareen
Copy link
Collaborator

k-sareen commented Mar 5, 2024

Hmm. {Gen,Sticky}Immix are still really bad in comparison to Immix for lusearch. That doesn't make any sense. Looking back at my old results, it seems like yeah, that was the case back then as well.

@qinsoon
Copy link
Member Author

qinsoon commented Apr 3, 2024

I fixed a few mistakes based on the previous evaluation. The following is the new results. I ran with 2.5x of the G1 heap size (temurin-21-G1-n-5) with 5 iterations.

The results show that the PR is an improvement for both GenImmix and StickyImmix in most benchmarks, 8% improvement for GenImmix and 3% improvement for StickyImmix (using 0.25x heap size as the min nursery). However, both plans are still performing worse than Immix (2% and 1.5% slower, respectively). There are still other performance issues with our generational plans that should be looked into.

The results also show slowdown in tomcat. The G1 min heap for tomcat is 19M, and the min heap for MMTk is 42M (GenImmix) and 59M (Immix) in my measurement. So with 2.5x of the G1 min heap, we are almost running at the min heap for Immix plans. With a proportion of the heap size as the min nursery, we always trigger full heap GCs for tomcat, while with a fixed 2M min nursery (as in master), we may occasionally do nursery GCs. However, this is a separate issue about MMTk's min heap with OpenJDK.

GenImmix

min-nursery-genix

plotty

StickyImmix

min-nursery-stickyix

plotty

@qinsoon qinsoon marked this pull request as ready for review April 3, 2024 05:07
@qinsoon qinsoon requested review from wks and k-sareen April 3, 2024 05:07
max: usize,
},
/// A bounded nursery that is porportional to the current heap size. By default, a proportional bounded
// nursery has a lower bound of 20% of the heap size, and has an upper bound of 100% of the heap size.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// nursery has a lower bound of 20% of the heap size, and has an upper bound of 100% of the heap size.
// nursery has a lower bound of 25% of the heap size, and has an upper bound of 100% of the heap size.

/// The default min nursery size. This does not affect the actual space we create as nursery. It is
/// only used in the GC trigger check.
#[cfg(target_pointer_width = "32")]
pub const DEFAULT_MIN_NURSERY: usize = 2 << LOG_BYTES_IN_MBYTE;
const DEFAULT_MAX_NURSERY_32: usize = 32 << LOG_BYTES_IN_MBYTE;
/// The default max nursery size for 32 bits.
pub const DEFAULT_MAX_NURSERY_32: usize = 32 << LOG_BYTES_IN_MBYTE;
Copy link
Collaborator

@k-sareen k-sareen Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These numbers were based off of JikesRVM and an older DaCapo benchmark suite. I'm not sure if they reflect good defaults for the larger workloads + compressed pointers would also use these defaults instead of the 64-bit one (which might explain the performance issues you saw).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah wait -- this was a bit confusing: for estimate_virtual_memory_in_bytes, if force_use_contiguous_space is false (which it is for compressed pointers and 32-bit VMs), then 32 MB is used as the max nursery size when creating the space. But then in the above code in GCTrigger the DEFAULT_MAX_NURSERY is used which is much larger than 32 MB for 64-bit VMs with compressed pointers.

src/util/options.rs Outdated Show resolved Hide resolved
src/util/heap/gc_trigger.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@wks wks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One major problem is the virtual memory size of the nursery for confined address space. On 32-bit systems, or 64-bit systems with compressed oops, the nurser of GenCopy and GenImmix occupies a contiguous range separated from discontiguous range shared by other spaces. The size is determined by estimate_virtual_memory_in_bytes, and it returns 32MB on systems with confined address space. This size may be smaller than the desired nursery size, regardless whether the nursery size is fixed, bounded or proportionally bounded.

I think the 32MB limitation is only relevant if the nursery is contiguous and the address space is confined.

  • If the nursery can be discontiguous (this is possible in theory, but I don't know why we didn't implement that. But the conceptual nursery for StickyImmix is indeed discontiguous.), we can just let it share the discontiguous address range (managed by FreeList) with other spaces.
  • If the address space is not confined, the code beneath VMRequest::fixed_extent will ignore the size and use 1TB instead (in VMRequest::common64bit).

I don't think we want to make the nursery discontiguous for now, because that may require changes to many different places, but not impossible. And it may affect performance.

The easiest way to make the design consistent is making two explicit rules (at least temporarily) that

  1. The nursery of GenCopy and GenImmix is always contiguous, and
  2. If the address space is confined, the nursery of GenCopy and GenImmix can not be larger than 32MB, regardless whether the nursery size is fixed, bounded or proportional.

We can document the second rule on Options::nursery. This may not be ideal, but we don't need to change the 32MB size limit for now.

Alternatively, instead of making 32MB a constant, we can let the VM configure it using VMLayout.

A slightly more elegant but more difficult way to fix it is making use of VMRequest::fraction which is currently unused in the Rust MMTk, and probably needs testing.

src/util/heap/gc_trigger.rs Outdated Show resolved Hide resolved
src/util/options.rs Outdated Show resolved Hide resolved
args.global_args
.options
.nursery
.estimate_virtual_memory_in_bytes(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may make use of VMRequest::fraction if it is appropriate.

// Just use the default max nursery size -- the nursery won't get larger than that.
NurserySize::ProportionalBounded { min: _, max: _ } => {
if !crate::util::heap::vm_layout::vm_layout().force_use_contiguous_spaces {
DEFAULT_MAX_NURSERY_32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the space is confined (not using contiguous spaces), the return value will be DEFAULT_MAX_NURSERY_32 which is 32MB. But the Options::nursery can be set to a larger proportion and can be larger than this. As @k-sareen commented in #1087 (comment), GCTrigger::get_max_nursery_bytes may return a larger size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. You are right. I have changed this. I added a check: if we do not have pressure about virtual memory, the virtual memory for nursery can be as large as the max nursery (even if the max nursery is 100% heap size). Otherwise, it just panics.

Let me know what you think. @wks @k-sareen

I will rerun the benchmarks for this change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panicking is pretty drastic, but if we don't have a nice way of saying "the rest of the heap" then maybe for the time being it is fine. We should file an issue though. Also is there any way to test for a 32-bit architecture? The JikesRVM binding doesn't support generational collectors and the MockVM doesn't stress test the GC much.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to panic instead of running with an unexpected value silently -- we could change this if the panic turns out to be an issue.

What do you think we should test? The OpenJDK uses compressed pointer which tests most of the implementation for 32bits layout. I also added two unit tests that check the virtual memory we set for nursery.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I mainly wanted to test the virtual memory and ensuring that full-heap GCs happen more promptly. I guess OpenJDK with compressed pointers and the unit tests are good enough.

Copy link
Collaborator

@k-sareen k-sareen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But we should make an issue for resolving the virtual memory issue. Also did you get the performance results in the end?

@qinsoon
Copy link
Member Author

qinsoon commented Apr 10, 2024

LGTM. But we should make an issue for resolving the virtual memory issue. Also did you get the performance results in the end?

I am running the benchmarks. I will post the results once I got it.

@qinsoon
Copy link
Member Author

qinsoon commented Apr 10, 2024

GenImmix

min-nursery-genix-fix
plotty

StickyImmix

min-nursery-stickyix-fix
plotty

The improvement for GenImmix is 9.5% and the improvement for Stickyimmix is 3.1% (compared to the 8% and 3% improvement in the previous evaluation). However, one issue is that h2 panics in the run, as it sets the heap size as 1700M, and we cannot guarantee another 1700M virtual memory for the 100% max nursery.

If the nursery size is set explicitly, I think panicking may be a good idea when we cannot satisfy the virtual memory for the nursery. But when we use the default nursery size, we would expect it run, rather than panicking.

Performance-wise, I think this PR achieved what we want.

@wks
Copy link
Collaborator

wks commented Apr 11, 2024

... But when we use the default nursery size, we would expect it run, rather than panicking.

Yes. If the user doesn't specify a nursery size, we shouldn't blame the user for providing a bad value. One way to solve this is to provide another option "force_nursery_size" which defaults to false. When true, it will panic if the give nursery size cannot be satisfied; when false, MMTk core will adjust the size to the nearest possible value.

@qinsoon
Copy link
Member Author

qinsoon commented Apr 11, 2024

... But when we use the default nursery size, we would expect it run, rather than panicking.

Yes. If the user doesn't specify a nursery size, we shouldn't blame the user for providing a bad value. One way to solve this is to provide another option "force_nursery_size" which defaults to false. When true, it will panic if the give nursery size cannot be satisfied; when false, MMTk core will adjust the size to the nearest possible value.

I am thinking about adding NurserySize::Adaptive and use that as the default value. It tries to use [25%, 100%) heap size as nursery, and when it is not possible, it will be adaptive and use whatever that works (e.g. use a smaller extent for nursery virtual memory, or use discontiguous nursery).

@k-sareen
Copy link
Collaborator

The performance looks good (better than master for most things) but still not much better than Immix which is odd. But this PR does not need to solve all the issues at once.

I agree with @wks and your Adaptive idea.

@qinsoon
Copy link
Member Author

qinsoon commented Apr 12, 2024

We checked with Steve. The reason that nursery was using a contiguous space is that we can do range check for mature objects in barriers. However, this is not important for now. I changed the code to use discontiguous nursery. I am running another round of evaluation.

@qinsoon
Copy link
Member Author

qinsoon commented Apr 16, 2024

The results with a discontiguous nursery.

GenImmix

min-nursery-genix-discontig
plotty

StickyImmix

min-nursery-stickyix-discontig
plotty

The results are pretty similar to the previous run (with contiguous nursery), but slower. The improvement is 8.7% for GenImmix and 2.4% for StickyImmix, compared with 9.5% and 3.1% in the previous run. There is a consistent 0.8%/0.7% difference.

The PR currently uses discontiguous nursery. But I can change it to contiguous nursery, and only use discontiguous nursery if we may run out of virtual address. Just let me know if that's desired.

@qinsoon
Copy link
Member Author

qinsoon commented Apr 16, 2024

@wks @k-sareen Can you review this PR again if you think it is necessary? I would like to get issues resolved for this PR, and get it merged.

Copy link
Collaborator

@wks wks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@qinsoon qinsoon added the PR-extended-testing Run extended tests for the pull request label Apr 16, 2024
@qinsoon qinsoon added this pull request to the merge queue Apr 17, 2024
Merged via the queue into mmtk:master with commit 79cfcb6 Apr 17, 2024
29 of 30 checks passed
@qinsoon qinsoon deleted the proportional-min-nursery branch April 17, 2024 01:15
k-sareen pushed a commit to k-sareen/mmtk-core that referenced this pull request Sep 17, 2024
This PR introduces different kinds of nursery size options, and by
default, we use a proportion of the heap size as the min nursery. This
PR should generally improve the generational plans' performance by
triggering a full heap GC more promptly. This PR mitigates the issue
identified in mmtk#594, but does not
fully fix the problem.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR-extended-testing Run extended tests for the pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants