Implement OSX futex-equivalent (__ulock_wait/__ulock_wake) to futex #876

Closed
autumnjolitz opened this issue Jul 30, 2023 · 3 comments

autumnjolitz commented Jul 30, 2023

The overview at https://shift.click/blog/futex-like-apis/ explains that OSX and even semi-obscure BSDs (disclosure: I use DragonFlyBSD) have futex equivalents.

Given that the futex polyfill uses a check-then-sleep approach (i.e. a waiter can wake up well after the effective futex wake), would it be appropriate to consider implementing __ulock_wait/__ulock_wake on OSX (available since 10.12, Sierra, 2016)? These calls have long been used by libc++, and it's unlikely Apple would remove them despite being a private API, since doing so would badly break apps linked against libc++, and futexes are popular enough to make such a move increasingly unlikely.

The blog author wrote a Rust crate demonstrating its use: https://github.com/thomcc/ulock-sys
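
For reference, here's a minimal C sketch of how the calls can be wrapped into a futex-style wait/wake pair, assuming the declarations and operation codes described in the blog post and in XNU's bsd/sys/ulock.h (the constants below are my reading of those sources, not anything from Cosmopolitan):

    // Private XNU syscall wrappers, exported by libsystem_kernel since
    // macOS 10.12. Signatures and constants follow bsd/sys/ulock.h.
    #include <errno.h>
    #include <stdint.h>

    #define UL_COMPARE_AND_WAIT 1           /* futex-style compare and wait */
    #define ULF_WAKE_ALL        0x00000100  /* wake every waiter, not just one */

    extern int __ulock_wait(uint32_t operation, void *addr,
                            uint64_t expected, uint32_t timeout_us);
    extern int __ulock_wake(uint32_t operation, void *addr, uint64_t wake_value);

    // Block until *addr no longer equals expected (like FUTEX_WAIT).
    // A timeout of 0 means wait indefinitely; errors land in errno.
    static int futex_wait(uint32_t *addr, uint32_t expected) {
      return __ulock_wait(UL_COMPARE_AND_WAIT, addr, expected, 0) < 0 ? -1 : 0;
    }

    // Wake every thread waiting on addr (like FUTEX_WAKE with INT_MAX).
    static int futex_wake_all(uint32_t *addr) {
      return __ulock_wake(UL_COMPARE_AND_WAIT | ULF_WAKE_ALL, addr, 0);
    }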

My interest in this is mostly academic: I wanted a futex-like primitive that would work on OSX, DragonFly, and Linux for one of my projects.

Cosmopolitan's polyfill, the linked blog post, and various OS manual pages have been helpful in understanding the state of the futex union. I hope my findings here can help improve Cosmopolitan libc as well.


jart commented Oct 3, 2023

Wow, this is a wonderful suggestion. I've always wondered how to do this on XNU. On Apple M1 we've been using Grand Central Dispatch via the system DSOs, but we don't use DSOs on x86 XNU. I also really want the ability to get EINTR. If ulock_wait/wake has that, then we might use it on both architectures.
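
For what it's worth, XNU's ulock.h also defines a ULF_NO_ERRNO flag that makes the call return a negated errno (e.g. -EINTR) instead of setting errno, which is exactly what an interruptible wait wants. A hedged sketch based on my reading of that header, not on anything in Cosmo:

    #include <errno.h>
    #include <stdint.h>

    #define UL_COMPARE_AND_WAIT 1
    #define ULF_NO_ERRNO        0x01000000  /* return -errno instead of setting errno */

    extern int __ulock_wait(uint32_t operation, void *addr,
                            uint64_t expected, uint32_t timeout_us);

    // Returns 0 once woken, -EINTR if a signal interrupted the wait, or
    // -ETIMEDOUT on timeout, so the caller can decide to restart or bail.
    static int futex_wait_interruptible(uint32_t *addr, uint32_t expected,
                                        uint32_t timeout_us) {
      int rc = __ulock_wait(UL_COMPARE_AND_WAIT | ULF_NO_ERRNO,
                            addr, expected, timeout_us);
      return rc < 0 ? rc : 0;
    }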


jart commented Oct 3, 2023

Oh yes, ulock is so much better than I could have even imagined! I've just about finished incorporating it. Right now I'm trying to figure out why it interacts weirdly with fork, since some of our fork+threads torture tests are failing. Any ideas?


jart commented Oct 3, 2023

I managed to get all tests passing with ulock by not using process-shared mode. We normally enable pshared mode when waiting on the "clear child tid" futex that pthread_join() waits on. However, enabling pshared mode with ulock for those futexes caused popen_test, fds_torture_test, and pthread_atfork_test to fail for reasons that aren't clear to me yet. Using process-private mode made everything pass, so I'm happy for now, since cross-process futexes have spotty support across platforms anyway.
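
For anyone following along, the private/shared distinction maps onto different ulock operation codes. A rough sketch of how a futex wrapper might select between them, assuming the UL_COMPARE_AND_WAIT_SHARED code that newer XNU headers define (illustrative only, not the actual Cosmo change):

    #include <stdbool.h>
    #include <stdint.h>

    #define UL_COMPARE_AND_WAIT        1  /* process-private waits */
    #define UL_COMPARE_AND_WAIT_SHARED 3  /* cross-process waits (newer XNU only) */

    extern int __ulock_wait(uint32_t operation, void *addr,
                            uint64_t expected, uint32_t timeout_us);

    // Choose the ulock operation the way a futex wrapper might: pshared
    // selects the cross-process variant, which is the mode that tripped
    // up the fork-related tests mentioned above.
    static int futex_wait_mode(uint32_t *addr, uint32_t expected, bool pshared) {
      uint32_t op = pshared ? UL_COMPARE_AND_WAIT_SHARED : UL_COMPARE_AND_WAIT;
      return __ulock_wait(op, addr, expected, 0);
    }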

@autumnjolitz It excites me to hear you're interested in potentially getting involved with Cosmo development. That's great to hear. We pay close attention to GitHub, but it's worth mentioning that most of the action happens on our Discord server: https://discord.gg/ct73HWDD We'd love to see you there!

Anyway I'm going to push a change closing this out. Thank you again.

@jart jart closed this as completed in 85f64f3 Oct 3, 2023
G4Vi pushed a commit to G4Vi/cosmopolitan that referenced this issue Jan 19, 2024
Thanks to @autumnjolitz (in jart#876) the Cosmopolitan codebase is now
acquainted with Apple's outstanding ulock system calls, which offer
something much closer to futexes than Grand Central Dispatch. GCD
wasn't quite as good, since its wait function can't be interrupted by
signals (therefore necessitating a busy loop) and it also needs
semaphore objects to be created and freed. Even though ulock is,
strictly speaking, an internal Apple API, the benefits of futexes are
so great that it's worth the risk for now, especially since we still
have the GCD implementation as a quick escape hatch if it changes.

Here's why this change is important for x86 XNU users. Cosmo has a
suboptimal polyfill for when the operating system doesn't offer an API
that lets us implement futexes properly. Sadly we had to use it on x86
XNU until now. The polyfill uses clock_nanosleep to poll the futex in
a busy loop with exponential backoff. On x86 XNU clock_nanosleep
suffers from us not being able to use a fast clock_gettime
implementation, which had a compounding effect that made the polyfill
perform even more poorly. On x86 XNU we also need to polyfill
sched_yield() using select(), which made things even more troublesome.
Now that we have futexes there are no more busy loops for either
condition variables or thread joining, so optimal performance is
attained. To demonstrate, consider these benchmarks:

Before:

    $ ./lockscale_test.com -b
    consumed 38.8377   seconds real time and
              0.087131 seconds cpu time

After:

    $ ./lockscale_test.com -b
    consumed 0.007955 seconds real time and
             0.011515 seconds cpu time

Fixes jart#876
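
For contrast with the commit message above, the clock_nanosleep polyfill it describes amounts to a check-then-sleep loop roughly like the following sketch (the function name and backoff constants are illustrative, not the actual Cosmopolitan code):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <time.h>

    // Illustrative polyfill futex wait: poll the word and back off
    // exponentially between polls. A wakeup is only noticed at the next
    // poll, which is the latency and CPU cost the ulock path eliminates.
    static void polyfill_futex_wait(_Atomic uint32_t *addr, uint32_t expected) {
      long backoff_ns = 1000;  /* start at 1 microsecond */
      while (atomic_load_explicit(addr, memory_order_acquire) == expected) {
        struct timespec ts = {0, backoff_ns};
        clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, 0);
        if (backoff_ns < 1000000) backoff_ns *= 2;  /* cap the backoff at 1ms */
      }
    }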