Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gc_sweep_page assertion triggering in tests #50419

Closed
vtjnash opened this issue Jul 5, 2023 · 7 comments · Fixed by #50466
Closed

gc_sweep_page assertion triggering in tests #50419

vtjnash opened this issue Jul 5, 2023 · 7 comments · Fixed by #50466
Assignees
Labels
ci Continuous integration GC Garbage collector rr trace included
Milestone

Comments

@vtjnash
Copy link
Member

vtjnash commented Jul 5, 2023

gc                                                (1) |        started at 2023-07-03T19:18:26.371 on pid 398
julia: /cache/build/default-amdci4-3/julialang/julia-master/src/gc.c:1380: gc_sweep_page: Assertion `!prev_sweep_full || pg->prev_nold >= pg->nold' failed.
[7136] signal (6.-6): Aborted
in expression starting at /cache/build/default-amdci4-4/julialang/julia-master/julia-889c2e3d81/share/julia/test/gc/binarytree.jl:54
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fe2d50c640e)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
gc_sweep_page at /cache/build/default-amdci4-3/julialang/julia-master/src/gc.c:1380 [inlined]
gc_sweep_pool_page at /cache/build/default-amdci4-3/julialang/julia-master/src/gc.c:1476 [inlined]
gc_sweep_pool at /cache/build/default-amdci4-3/julialang/julia-master/src/gc.c:1563
_jl_gc_collect at /cache/build/default-amdci4-3/julialang/julia-master/src/gc.c:3260
ijl_gc_collect at /cache/build/default-amdci4-3/julialang/julia-master/src/gc.c:3435
gc at ./gcutils.jl:129 [inlined]
gc at ./gcutils.jl:129

https://buildkite.com/julialang/julia-master/builds/25532#01891c91-c02f-4181-b22c-5962d72d1499
@d-netto

@vtjnash vtjnash added GC Garbage collector ci Continuous integration rr trace included labels Jul 5, 2023
@KristofferC KristofferC added this to the 1.10 milestone Jul 5, 2023
@d-netto d-netto self-assigned this Jul 6, 2023
@Keno
Copy link
Member

Keno commented Jul 6, 2023

@d-netto
Copy link
Member

d-netto commented Jul 7, 2023

After building with FORCE_ASSERTIONS=1 I'm able to reproduce it by starting Julia with 8 threads and just calling GC.gc() on an infinite loop.

I was able to reproduce on #50021 and later, but not on the immediately preceding commit.

By looking at this PR, seems like it has introduced a race on the page->nold updates.

My guess is that with the changes from #50021, an item may be marked twice. Which can lead to gc_setmark_pool_ being called twice for the same item, finally leading to page->nold being updated more times than it should, which will trigger the assertion above.

CC: @gbaraldi

@d-netto
Copy link
Member

d-netto commented Jul 7, 2023

(8 GC threads, to clarify)

@gbaraldi
Copy link
Member

gbaraldi commented Jul 7, 2023

Interesting, that PR was made on the assumption that trying to mark the same object twice was a fine, but it seems it's actually not. Which makes me think, are the workstealing queues idempotent? Because if not they are also potential sources of this.

@gbaraldi
Copy link
Member

gbaraldi commented Jul 7, 2023

One thought I have here is, how much is correctness important on that number, because incrementing that counter(nold) is actually one of the most expensive parts of the mark loop currently, because as me and @vtjnash fetch_add is possibly expensive because it has seq_cst behaviour between the load/store.

@gbaraldi
Copy link
Member

gbaraldi commented Jul 7, 2023

At least from what I've seen, getting it wrong means sweeping one extra page, and that should be safe.

@d-netto
Copy link
Member

d-netto commented Jul 7, 2023

Which makes me think, are the workstealing queues idempotent?

No.

Making the mark-phase cheaper at the cost of losing a bit of precision in the sweep phase seems to be worth it, so I think that relaxing the assertion for multithreaded marking would make sense.

d-netto added a commit that referenced this issue Dec 10, 2023
Use atomics in the write-barrier slow-path to prevent duplicates in the
remset.

As discussed in #50419, setting
the mark bit is idempotent, but updating page metadata in the mark phase
is not.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous integration GC Garbage collector rr trace included
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants