Clear side forwarding bits properly #1138
When using StickyImmix, we should also clear the side forwarding bits in the ImmixSpace in every GC. We move the clearing from the start of each major GC to the end of each GC, so that we don't add additional preparation work if it is not a nursery GC.
Can you run benchmarks for Immix and SemiSpace? We should expect no perf difference for SemiSpace, and speedup/no diff for Immix, right?
src/policy/immix/immixspace.rs

    // In the beginning of the next GC, no side forwarding bits shall be set.
    if let MetadataSpec::OnSide(side) = *VM::VMObjectModel::LOCAL_FORWARDING_BITS_SPEC {
        if is_moving_gc {
            let objects_may_move = !is_defrag_gc || block.is_defrag_source();
Probably add some comments here to make it clearer: if this is a defrag GC, we only need to clear forwarding bits for defrag source blocks; if this is a moving GC but not a defrag GC, we have no other information, so we have to clear forwarding bits for all blocks.
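The rule in the review comment can be written as a small predicate. This is a minimal sketch, not the code in mmtk-core: the `Block` struct and the function name `needs_forwarding_bits_cleared` are hypothetical stand-ins, and only the condition mirrors the `objects_may_move` expression in the diff excerpt.

```rust
/// Hypothetical, simplified stand-in for an Immix block (not mmtk-core's `Block`).
#[derive(Clone, Copy)]
struct Block {
    is_defrag_source: bool,
}

/// Returns true if objects in `block` may have moved in this GC, i.e. if its
/// side forwarding bits may have been set and must be cleared before the next GC.
fn needs_forwarding_bits_cleared(block: Block, is_moving_gc: bool, is_defrag_gc: bool) -> bool {
    if !is_moving_gc {
        // No object was forwarded, so no forwarding bits were set.
        return false;
    }
    // Defrag GC: only defrag source blocks are evacuated, so only they need clearing.
    // Moving but not defrag GC: there is no per-block information about which
    // objects moved, so every block has to be cleared.
    !is_defrag_gc || block.is_defrag_source
}
```

For example, in a defrag GC a block with `is_defrag_source: false` is skipped, while in a moving GC that is not a defrag GC the same block is cleared.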
This PR only changes the code that involves on-the-side forwarding bits. If the forwarding bits are in the header, there should be no performance difference with the default configuration of OpenJDK (which uses in-header forwarding bits). If the forwarding bits are on the side, this PR fixes correctness issues: without it, both CopySpace and ImmixSpace (in StickyImmix) would crash. I'll run benchmarks using the default settings (in-header forwarding bits) to verify that there is no performance difference.
If we can use side forwarding bits for OpenJDK, it is worth measuring the performance.
I'll add a feature to the OpenJDK binding to force the forwarding bits to be on the side, and it shall be added to minimal-tests-openjdk. Otherwise the CI has no way to even check whether this PR is correct.
There is a deadlock happening in the minimal binding test. It is running
The benchmark results: lusearch from DaCapo Chopin, 3x min heap size, 40 invocations. 3 builds:
Note: there is no build of master with side forwarding bits because that would crash.

Plots (normalised to build1): plotty link
Plots (not normalised): plotty link

Note that the number of GCs is stuck at 2047 for GenImmix. The actual number of GCs should be much higher.

build1 vs build2

The difference between build1 and build2 is the effect of this PR without enabling side forwarding bits.

Plots (normalised to build1): plotty link

Mostly unchanged. For some plans, such as GenCopy and SemiSpace, the STW time becomes higher, but the other time becomes lower. It's likely that some code changes influenced the way STW time is measured, counting some other time into STW time. The total time differs with non-overlapping error bars, but the difference is within 1%.

build2 vs build3

The difference between build2 and build3 indicates the cost of enabling side forwarding bits.

Plots (normalised to build2): plotty link

As expected, the number of GCs increased. That's because the side forwarding bits occupy part of the heap space, so the effective heap size for objects becomes smaller. The STW time increased due to the increased number of GCs and the cost of clearing side forwarding bits. The plans involving CopySpace (SemiSpace, GenCopy and GenImmix) are impacted to a greater degree.
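To put a number on "the side forwarding bits occupy part of the heap space", the size of a side-metadata table follows directly from its density. The sketch below assumes, for illustration only, 2 forwarding bits per 8-byte granule; the exact bits-per-granule and granule size used by mmtk-core with the OpenJDK binding are not stated in this thread, nor is how the metadata is charged against the heap budget.

```rust
/// Bytes of side metadata needed to cover `space_bytes` of heap when each
/// `granule_bytes` of heap is described by `bits_per_granule` metadata bits.
fn side_metadata_bytes(space_bytes: u64, bits_per_granule: u64, granule_bytes: u64) -> u64 {
    let granules = space_bytes / granule_bytes;
    let bits = granules * bits_per_granule;
    // Round up to whole bytes.
    (bits + 7) / 8
}
```

Under the assumed 2 bits per 8 bytes, the metadata is 1/32 (about 3.1%) of the space it covers, e.g. `side_metadata_bytes(64 << 20, 2, 8)` gives 2 MiB for a 64 MiB space.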
The slowdown of using side forwarding bits is significant (build2 vs build3). It would be good to measure the overhead of side forwarding bits on master. If it cannot run with Immix or with discontiguous spaces, you can try to find a configuration that runs (e.g. no compressed pointers, SemiSpace). Also, please include all the benchmarks in the evaluation. It is okay to focus on one benchmark to address some issues, but we need results on more benchmarks to know whether the PR causes a performance regression.
Yes. Side forwarding bits are not free. The cost is significant for CopySpace because the entire CopySpace is copied away, so the clearing covers the entire space. It is much lower for ImmixSpace: for Immix and StickyImmix, the total-time overhead is within 1%, and the STW-time overhead is about 2%-4%.
I am doing a 4-way comparison of master-header, master-side, PR-header and PR-side, using SemiSpace and GenCopy, without compressed oops. Let's wait for the results.
I added build4: master with side forwarding bits enabled. To avoid confusion, I'll use "master-header", "PR-header", "master-side" and "PR-side" as the labels of the builds. I re-ran lusearch, but only using SemiSpace and GenCopy, which only involve CopySpace.

Plot (un-normalised): plotty
Plot (normalised to "master-header"): plotty

The increase in the number of GCs and the times due to moving the forwarding bits to the side (i.e. the ratio of

For SemiSpace, there is a 1.7% increase in STW time and a 0.5% increase in total time, which is consistent with the result of the last 3-way comparison.
I ran other benchmarks on lynx.moma and bobcat.moma (identical hardware and identical binary), at 3x min heap size w.r.t. G1 in OpenJDK, with no compressed OOPs (even when running StickyImmix). Note that all four builds use the mmtk-openjdk binding from mmtk/mmtk-openjdk#277, but only master-header and PR-header have the SemiSpace. (plotty)

Note that biojava and fop failed to run at 3x min heap size with SemiSpace StickyImmix. (plotty)

Comparing master-header and PR-header, most differences are within the error bars. For SemiSpace, the stw.time for PR-header is noticeably (but still only slightly) higher than for master-header. That could be a result of executing

Comparing XXX-header and XXX-side, we see the overhead of forcing the forwarding bits to be on the side. The overhead is significantly higher for SemiSpace (only CopySpace) than for StickyImmix (only ImmixSpace). One reason may be that ImmixSpace distributes the clearing operation to MajorGC per GC: (plotty)

There are two interesting benchmarks: "avrora" and "jython". Their side-metadata overhead is very large in SemiSpace, yet "avrora" sped up when using side forwarding bits, and "jython" had only a very tiny overhead. The speed-up could be noise (as their error bars are large).
LGTM. The performance looks reasonable to me. If you would like to investigate further, that's fine. The PR in its current state is good to merge.
I re-ran lusearch with SemiSpace, but this time with three builds
All builds use header forwarding bits. I added build3 to see whether the unconditional iteration over chunks is the cause of the tiny STW-time overhead. However, the experiments show that build3 has an even greater overhead than build2. (plotty) I think the effect of the
Added a feature to force using on-the-side forwarding bits. This is for testing mmtk-core. The CI script for minimal tests is also extended to run several plans involving CopySpace and ImmixSpace on OpenJDK with side forwarding bits.

Related PR: mmtk/mmtk-core#1138

The feature introduced in this PR can be used to check the correctness of that PR.

---------

Co-authored-by: mmtkgc-bot
This PR fixes bugs related to the clearing of side forwarding bits.
The side forwarding bits of ImmixSpace are now cleared after moving GCs instead of before every major GC. This eliminates redundant clearing operations during non-moving full-heap GCs (i.e. when not doing defrag). It also ensures that subsequent nursery GCs do not see stale forwarding bits when using StickyImmix, in which young objects are also allocated in the ImmixSpace.
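The invariant behind this change ("in the beginning of the next GC, no side forwarding bits shall be set", as the code comment in the diff puts it) can be illustrated with a toy simulation. Everything here is hypothetical: `SideForwardingBits` models a space's side forwarding bits as one byte per object slot, and `run_gc` stands in for a GC that may or may not move objects. Clearing at the end of every moving GC means any GC, including a StickyImmix nursery GC, starts with all bits clear, while non-moving GCs do no clearing work.

```rust
/// Toy model of a space's side forwarding bits: one state per object slot
/// (0 = not forwarded, non-zero = forwarded in this model).
struct SideForwardingBits {
    states: Vec<u8>,
}

impl SideForwardingBits {
    fn new(slots: usize) -> Self {
        Self { states: vec![0; slots] }
    }

    fn all_clear(&self) -> bool {
        self.states.iter().all(|&s| s == 0)
    }
}

/// Simulates one GC. A moving GC sets forwarding bits for the objects it moves
/// and clears them at the end, so the next GC (nursery or full-heap) starts clean.
/// A non-moving GC sets no forwarding bits and therefore does no clearing work.
fn run_gc(bits: &mut SideForwardingBits, is_moving_gc: bool, moved_slots: &[usize]) {
    assert!(bits.all_clear(), "stale forwarding bits at the start of a GC");
    if is_moving_gc {
        for &slot in moved_slots {
            bits.states[slot] = 1; // mark the object as forwarded
        }
        // ... objects would be copied and references updated here ...
        bits.states.fill(0); // clear at the end of the moving GC
    }
}
```

If the clearing were instead done only at the start of major GCs, a moving nursery GC followed by another nursery GC would trip the assertion at the start of the second one.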
Also fixed the code for clearing side forwarding bits in discontiguous CopySpace.
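A discontiguous space owns chunks that are not necessarily adjacent, so its side metadata cannot be cleared as a single address range. The sketch below shows one plausible per-chunk approach; it is hypothetical and does not reproduce the actual fix. The toy constants `GRANULE_BYTES` and `CHUNK_BYTES`, the `SideMetadata` type and `clear_discontiguous_space` are illustrative names, and addresses are plain `usize` offsets.

```rust
/// Toy sizes for illustration only; not mmtk-core's real constants.
const GRANULE_BYTES: usize = 8; // heap bytes described by one metadata flag
const CHUNK_BYTES: usize = 64; // size of one chunk a space can own

/// Side metadata for a toy address space: one flag per granule.
struct SideMetadata {
    flags: Vec<bool>,
}

impl SideMetadata {
    fn new(address_space_bytes: usize) -> Self {
        Self { flags: vec![false; address_space_bytes / GRANULE_BYTES] }
    }

    /// Clears the flags covering heap addresses `[start, end)`.
    fn clear_range(&mut self, start: usize, end: usize) {
        self.flags[start / GRANULE_BYTES..end / GRANULE_BYTES].fill(false);
    }
}

/// Clears the metadata of a discontiguous space chunk by chunk, touching only
/// the chunks the space owns and leaving other spaces' chunks intact.
fn clear_discontiguous_space(meta: &mut SideMetadata, chunk_starts: &[usize]) {
    for &start in chunk_starts {
        meta.clear_range(start, start + CHUNK_BYTES);
    }
}
```

In this model, clearing one range from the lowest to the highest owned chunk would also wipe the flags of any chunks in between that belong to other spaces, which is why the loop visits each owned chunk individually.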