Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wipe ptls->system_id when a thread exits. #50747

Closed
wants to merge 3 commits into from
Closed

Conversation

maleadt
Copy link
Member

@maleadt maleadt commented Aug 1, 2023

On macOS, we use the system thread ID to match against the list of known thread local states during signal handling. To prevent picking up the wrong entry, i.e. from when a thread was previously executing a different task, make sure to wipe the system ID when a thread exits.

This manifested as the signal handler actually reporting a bus error when a thread touched safepoint memory during GC, because the matched thread local state had no current task attached to it.

Fixes JuliaGPU/Metal.jl#225

@maleadt maleadt added system:mac Affects only macOS multithreading Base.Threads and related functionality labels Aug 1, 2023
@maleadt maleadt requested a review from vtjnash August 1, 2023 14:41
Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very sketchy that you are doing this without atomics or a lock, but otherwise SGTM

@maleadt maleadt added backport 1.9 Change should be backported to release-1.9 backport 1.10 Change should be backported to the 1.10 release labels Aug 1, 2023
@maleadt
Copy link
Member Author

maleadt commented Aug 1, 2023

Are you suggesting to make the field atomic?

@maleadt maleadt requested a review from vtjnash August 1, 2023 14:59
Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it still looks sketchy, since you now have a lot of unsynchronized uses of an an atomic variable. Many of them happen behind a lock, so that is fine. But signals-mach.c:392 looks like a data race now, since the current_task load is not an acquire load and is in the wrong order with this. All of the other places could probably use a brief comment nearby that they are either inside the lock, or in a region where we know the target thread for lookup is pinned by the OS and currently stopped.

@maleadt
Copy link
Member Author

maleadt commented Aug 1, 2023

Many of them happen behind a lock, so that is fine.

Should I make them regular loads then?

But signals-mach.c:392 looks like a data race now, since the current_task load is not an acquire load and is in the wrong order with this.

So what should I change?

@maleadt
Copy link
Member Author

maleadt commented Aug 8, 2023

@vtjnash Bump.

On macOS, we use the system thread ID to match against the list of known
thread local states during signal handling. To prevent picking up the
wrong entry, i.e. from when a thread was previously executing a
different task, make sure to wipe the system ID when a thread exits.

This manifested as the signal handler actually reporting a bus error
when a thread touched safepoint memory during GC, because the matched
thread local state had no current task attached to it.

Fixes JuliaGPU/Metal.jl#225
@maleadt
Copy link
Member Author

maleadt commented Aug 10, 2023

Let's close this in favor of #50871, which is much simpler.

@maleadt maleadt closed this Aug 10, 2023
@KristofferC KristofferC removed backport 1.9 Change should be backported to release-1.9 backport 1.10 Change should be backported to the 1.10 release labels Aug 10, 2023
@giordano giordano deleted the tb/system_id branch February 25, 2024 21:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multithreading Base.Threads and related functionality system:mac Affects only macOS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash during MTLDispatchListApply
4 participants