"Mark-delay" performance improvement to major GC #13580
base: trunk
Force-pushed from 31fc961 to 72500e9.
Thanks! I’ll have a look tomorrow.
Force-pushed from 72500e9 to 9af1835.
Force-pushed from b341dd4 to 4a6c7af.
I've addressed @kayceesrk's review comments and rebased.
Thanks to @kayceesrk and @stedolan, I think this can come out of draft now.
The code looks OK now. MSVC 32-bit has a failing test. I'll wait until the test is fixed before I approve the PR.
Force-pushed from 4a6c7af to 7d91cc4.
Any insights on this so far?
Force-pushed from 7d91cc4 to 295543e.
The Win32 problem is due to an accounting error, observed across platforms, which causes the …
(Note: this accounting problem is specific to this PR and is not observed on trunk.)
Force-pushed from 7a17a33 to af5fb77.
I have further diagnosed the accounting problem and put in a dirty hack to prove my hypothesis. The hack, which should not be merged, artificially consumes the rest of the slice budget at the point at which sweeping is first completed; that this fixes the problem demonstrates that the accounting issue is the cause.
Would switching to 64-bit counters fix this problem? |
64-bit counters would prevent the …
Most workloads will have either enough sweeping or enough marking to reach the slice target in some slice before the shortfall reaches problematic levels.
In fact, very few cycles of this test, out of over 60,000, begin with …

Pending a more far-reaching rework of the pacing system, there are a few obvious changes that could address this problem without the blunt approach of my hack in af5fb77.
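To make the shortfall concrete, here is a toy model in C of per-slice accounting. It is a sketch under stated assumptions, not the runtime's pacing code: `slice_accounts`, `run_slice` and `shortfall` are hypothetical names, a slice is assumed to be credited only with the work it can actually find, and the `credit_remainder` flag stands in for the af5fb77 hack of consuming the rest of the slice budget in the slice where sweeping first completes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of major-slice accounting, NOT the runtime's code. */
typedef struct {
  long requested; /* total work asked of all slices so far */
  long credited;  /* total work credited to all slices so far */
} slice_accounts;

/* Run one slice asked to do `target` units when only `available` units of
   work (e.g. remaining sweep work) can be found. If `credit_remainder` is
   set (assumed: only in the slice where sweeping first completes), the
   unused budget is credited anyway, mimicking the hack. Returns the work
   credited to this slice. */
long run_slice(slice_accounts *acc, long target, long available,
               int credit_remainder) {
  assert(acc != NULL && target >= 0 && available >= 0);
  long done = available < target ? available : target;
  if (credit_remainder)
    done = target; /* artificially consume the rest of the budget */
  acc->requested += target;
  acc->credited += done;
  return done;
}

/* Work requested but never credited. This toy model does not model how
   later slices might repay it; it only shows how it accumulates. */
long shortfall(const slice_accounts *acc) {
  return acc->requested - acc->credited;
}
```

In this model, a slice asked for 100 units that finds only 30 leaves a shortfall of 70; the same slice with `credit_remainder` set leaves none, which is the blunt effect the hack relies on.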
Force-pushed from 3908f08 to 956e306.
This still has a failing multicore test, in which we discover that orphaned ephemerons can have the wrong colour bits.
So it's still not ready for review.
…caml/ocaml#13580. Co-authored-by: Stephen Dolan <[email protected]>
Force-pushed from 956e306 to e0ad612.
Rebased. The ephemeron-adoption problem is now fixed, but there's still a problem with finalisers, so this remains in draft.
Force-pushed from e0ad612 to ef422d3.
Fix ephemeron-adoption problem found when upstreaming mark-delay to ocaml/ocaml#13580. Co-authored-by: Stephen Dolan <[email protected]>
Band-aid for GC pacing problem discovered while upstreaming mark-delay to ocaml/ocaml#13580.
Co-authored-by: Stephen Dolan <[email protected]>
…unter at the start of any slice when it falls very far behind alloc_counter.
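The band-aid's idea (adjust a lagging counter at the start of any slice once it falls very far behind `alloc_counter`) can be sketched as below. Only `alloc_counter` comes from the commit message; `gc_counters`, `work_counter`, `MAX_WORK_LAG`, `resync_work_counter`, and the choice to clamp the lag rather than reset it to zero are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on how far the work counter may lag behind. */
#define MAX_WORK_LAG 1000000LL

typedef struct {
  long long alloc_counter; /* work demanded by allocation so far */
  long long work_counter;  /* hypothetical: work actually credited so far */
} gc_counters;

/* Called at the start of each slice. If work_counter lags alloc_counter by
   more than MAX_WORK_LAG, advance it so the lag is exactly MAX_WORK_LAG.
   Returns the amount of work forgiven (0 if no adjustment was needed). */
long long resync_work_counter(gc_counters *c) {
  assert(c != NULL);
  long long lag = c->alloc_counter - c->work_counter;
  if (lag <= MAX_WORK_LAG)
    return 0;
  long long forgiven = lag - MAX_WORK_LAG;
  c->work_counter += forgiven;
  return forgiven;
}
```

In this sketch, clamping rather than resetting keeps some pressure on the collector to catch up, while stopping an unbounded debt from inflating every later slice.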
Force-pushed from ef422d3 to 6eb0f2d.
This is the upstreaming of ocaml-flambda/flambda-backend#2348, ocaml-flambda/flambda-backend#2358 (minor) and ocaml-flambda/flambda-backend#3029 by @stedolan. It introduces a new sweep-only phase at the start of each major GC cycle. This reduces the "latent garbage delay" (the time between a block becoming unreachable and it becoming available for allocation) by approximately half a major GC cycle.
Because marking, including root marking, doesn't take place until part-way through the GC cycle (when we move from sweep-only to mark-and-sweep), the allocation colour is not always `MARKED` but changes from `UNMARKED` to `MARKED` at that point. Effectively, we switch from a grey mutator allocating white to a black mutator allocating black.

This PR is in draft because I've just done a fairly mechanical (although manual) patch application; I'm publishing it so that @stedolan and perhaps @kayceesrk can take a look. It passes the whole testsuite on my machine, including the new test (`parallel/churn.ml`) written by @stedolan for the `flambda-backend` mark-delay work.
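A minimal sketch in C of the phase switch and allocation colour described above. The enum and function names, and the idea that a single field records whether marking has begun, are hypothetical; what triggers the switch to mark-and-sweep in the real runtime is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a major cycle under mark-delay. */
typedef enum { PHASE_SWEEP_ONLY, PHASE_MARK_AND_SWEEP } gc_phase;
typedef enum { COLOUR_UNMARKED, COLOUR_MARKED } gc_colour;

typedef struct {
  gc_phase phase;
} major_cycle;

/* A new cycle starts with sweeping only; no roots have been marked yet. */
void cycle_start(major_cycle *c) {
  assert(c != NULL);
  c->phase = PHASE_SWEEP_ONLY;
}

/* Marking, including root marking, begins part-way through the cycle. */
void begin_marking(major_cycle *c) {
  assert(c != NULL && c->phase == PHASE_SWEEP_ONLY);
  c->phase = PHASE_MARK_AND_SWEEP;
}

/* Colour given to newly allocated major-heap blocks. Before marking starts
   the mutator is "grey" and allocates white (UNMARKED): reachable new blocks
   are found once marking begins. Afterwards it is "black" and allocates
   black (MARKED), so blocks allocated during marking count as live. */
gc_colour allocation_colour(const major_cycle *c) {
  assert(c != NULL);
  return c->phase == PHASE_SWEEP_ONLY ? COLOUR_UNMARKED : COLOUR_MARKED;
}
```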