Allowing atomic instructions with unshared memory #144

tlively · 2019-10-17T18:34:29Z

It is currently specified that attempting to use atomic access operators on non-shared linear memory is a validation error. This validation requirement causes a lot of friction for toolchain developers and users.

For example, consider a user wishing to distribute a WebAssembly binary for a C++ library. Say that library uses C++11 atomics internally to achieve thread-safety, but is otherwise agnostic to whether it is used in a single threaded environment or a multithreaded environment. On native platforms a single build of the library can be used in both environments, but due to the validation rules for WebAssembly atomics, the library must be built twice for WebAssembly; once with atomics enabled and once without. Beyond causing extra work for developers, having to produce these separate builds is also a large hole in WebAssembly's portability story.

One solution to this would be for all applications to use shared memories whether they are multithreaded or not. But this is not practical because not all engines support threads and not all engines are expected to support threads in the future.

A better solution would be to relax the validation rules to allow atomic operations to be used with non-shared memories. The semantics for atomic loads, stores, rmw operations, and cmpxch operations are straightforward to extend to unshared memories. There are more possibilities that might make sense for wait and notify operations, but I suggest that waiting on an unshared memory always trap and notify on an unshared memory always return 0.

lars-t-hansen · 2019-10-18T07:00:00Z

Making the load/store/rmw/cmpxchg ops more lenient seems fairly unproblematic (though see below) and would make simple cases "just work".

wait and notify I'm more skeptical about. Anyone using wait and notify is basically implementing some sort of mutex, and it's not terribly plausible that that would work the way it's intended in unshared memory. So then the argument would have to be that it's possible for the module to discover whether its memory is shared or not and hence whether to use the mutex or not. So do we want a mechanism for that? I can see the case for allowing these ops to verify in unshared memory but probably not the case for allowing them to execute without traps (even notify).

Another concern with the simple operations is that code written for shared memory may use techniques that are not directly related to wait/notify, such as spinlocks, that make little sense in unshared memory. This again suggests a need for a mechanism to discover that shared memory is used.

It is possible that we could allow the simple ops to verify and execute with non-shared memory but that wait/notify won't even verify w/o shared memory and therefore must be used under control of feature detection, which then must know about shared memory.

To deal with the spinlocks problem, perhaps a constant flag in memory (placed there by the linker) would be adequate, until we know how much an issue this really is; or we add an instruction that evaluates to a constant to test for shared memory. After all, we already have memory.size, why not memory.is_shared?

AndrewScheidecker · 2019-10-18T13:38:46Z

One solution to this would be for all applications to use shared memories whether they are multithreaded or not. But this is not practical because not all engines support threads and not all engines are expected to support threads in the future.

I think this is actually practical. Even if an engine doesn't support threading, it's not so hard to support this proposal in a superficial way.

There's been some discussion (see WebAssembly/wasm-c-api#107) about sharing functions between threads would add a shared flag to functions (and all other extern kinds), and would prohibit shared functions from calling non-shared functions or accessing non-shared memories/tables/globals. This implies that modules compiled for sharing between threads would not be interoperable with non-shared modules.

Allowing the atomic instructions to operate on non-shared memories will only defer the interoperability problems.

tlively · 2019-10-18T20:22:38Z

wait and notify I'm more skeptical about. Anyone using wait and notify is basically implementing some sort of mutex, and it's not terribly plausible that that would work the way it's intended in unshared memory.

I think mutexes should mostly just work. The single thread cannot fail to acquire a mutex because there are no other threads that could be holding it. Mutexes can easily be implemented to not execute a wait instruction if they are acquired on the first try.

Programs that wait on condition variables while some other thread does work would be more problematic, but presumably such applications have alternative code paths for single-threaded execution if they are meant to be run in single-threaded environments.

So then the argument would have to be that it's possible for the module to discover whether its memory is shared or not and hence whether to use the mutex or not. So do we want a mechanism for that? I can see the case for allowing these ops to verify in unshared memory but probably not the case for allowing them to execute without traps (even notify).

I don't believe this introspection is necessary if mutexes just work. The reason I suggested that notify not trap is that waiters are often unconditionally woken after work is finished while waiting is usually guarded on some condition. It seems unnecessary to require an alternative code path to avoid executing notify instructions when everything would just work correctly by having them do nothing. There may be usage patterns I'm not considering, though.

Another concern with the simple operations is that code written for shared memory may use techniques that are not directly related to wait/notify, such as spinlocks, that make little sense in unshared memory. This again suggests a need for a mechanism to discover that shared memory is used.

Like normal waiting locks, I would expect spinlocks to always be acquired on the first try and therefore not actually spin.

It is possible that we could allow the simple ops to verify and execute with non-shared memory but that wait/notify won't even verify w/o shared memory and therefore must be used under control of feature detection, which then must know about shared memory... perhaps a constant flag in memory (placed there by the linker) would be adequate...

Good point! If I'm missing something and this introspection is necessary, then it can easily be solved by the linker.

Even if an engine doesn't support threading, it's not so hard to support this proposal in a superficial way.

True, but it seems hacky to have an engine that appears to support the threads proposal but doesn't really support actually having multiple threads. I would like to avoid such a muddled situation.

Allowing the atomic instructions to operate on non-shared memories will only defer the interoperability problems.

Perhaps, but the user problem will be solved for the time before the extensions you mention are implemented and standardized. If those extensions were standardized, it would ideally still be possible to have MVP-compatible multithreaded modules even if other capabilities like forking could not be MVP-compatible.

KronicDeth · 2019-10-18T20:38:48Z

I think mutexes should mostly just work. The single thread cannot fail to acquire a mutex because there are no other threads that could be holding it. Mutexes can easily be implemented to not execute a wait instruction if they are acquired on the first try.

Something "just works" for how the Rust WASM support in std and crates like parking_lot does this as we use Arc and Mutex when compiling Lumen and it is only using the single-threaded support in Safari, Chrome, and Firefox.

Like normal waiting locks, I would expect spinlocks to always be acquired on the first try and therefore not actually spin.

We use spinlocks for parts of the memory allocator in Lumen and haven't had issue being single-threaded.

tlively · 2019-10-18T20:50:02Z

@KronicDeth Good point! I am essentially proposing that we move the stripping of atomics from the tools to the engine, and the fact that everything works when we strip atomics in the tools suggests that everything would continue to work if we had the engines do the stripping instead.

lars-t-hansen · 2019-10-24T13:16:10Z

@tlively, there's a fair amount of "can be", "presumably", "seems", "may" in your earlier message. I'm inclined to be skeptical myself. We probably need some data / experience reports here from real applications. I see @KronicDeth has some; more would be welcome.

(Unless we come to agreement quickly it would also be good not to hold up the threads MVP over this, but perhaps that's a separate discussion.)

tlively · 2019-10-29T18:06:33Z

@lars-t-hansen We will experiment with the proposed semantics in V8 and real-world programs and follow up with our results.

tlively · 2019-11-05T21:40:56Z

I have confirmed that few Emscripten-compiled program would be broken by this change. Emscripten's pthread implementation calls emscripten_futex_wait and emscripten_futex_wake rather than using wait and notify instructions directly. These futex functions are currently implemented in JS and call Atomics.wait and Atomics.notify, respectively, but they could just as well be implemented in C using __builtin_wasm_atomic_wait_i32 and __builtin_wasm_atomic_notify, which would lower to the corresponding wasm instructions. The Emscripten pthread implementation only calls these futex functions and therefore waits or notifies if the lock is contended, so Emscripten programs using pthreads will never break on an unshared memory.

The only programs that could possibly break under the proposed semantics are those that call emscripten_futex_wait (and therefore Atomics.wait) without first checking for contention. Since this is a low-level API, I don't expect many projects to be using it directly, but there are probably some.

However, we could prevent even these applications from breaking on unshared memory with further changes to how Atomics.wait and i32.atomics.wait work. Currently Atomics.wait is specified to throw a TypeError if it is used with an unshared array, whether or not it would actually sleep. If we changed the semantics to only throw the TypeError (or trap) if the array (or memory) is unshared in the case the sleep condition is satisfied, then only programs that actually try to put their single thread to sleep will break, but those programs are already broken. What do folks think about this slightly more invasive change?

lars-t-hansen · 2019-11-06T09:20:34Z

If we changed the semantics to only throw the TypeError (or trap) if the array (or memory) is unshared in the case the sleep condition is satisfied, then only programs that actually try to put their single thread to sleep will break, but those programs are already broken.

They are not broken if they are using a wait with a timeout as a simple implementation of sleep(). That's what I'd do...

tlively · 2019-11-09T01:35:22Z

Right, I should have been more precise. Only programs that would reach the Suspend step in the Atomics.wait semantics that are also running in a context that forbids suspending or using an unshared memory would throw an error, but those programs throw errors today.

I am proposing that we loosen the restrictions on what programs can execute Atomics.wait without throwing but not that we allow any code to suspend that cannot suspend today.

lars-t-hansen · 2019-11-12T16:00:33Z

Only programs that would reach the Suspend step in the Atomics.wait semantics that are also running in a context that forbids suspending or using an unshared memory would throw an error, but those programs throw errors today.

They don't, though. Today those programs require shared memory to validate, in which case they would execute just fine. You're proposing to allow them to validate & execute in a context where they would start throwing errors if they get to the Suspend state, and my argument is only that we should expect there to be some programs like that. (If a program is running in unshared memory today then the author would have used a different mechanism for sleeping.)

tlively · 2019-11-12T18:15:05Z

I presented this proposal at the CG meeting today. There did not appear to be broad consensus on the best way forward, but allowing normal (i.e. non-wait) atomic operations to validate and execute on unshared memories was uncontroversial. The conditional sections proposal was also raised as a way of allowing individual objects to simultaneously contain and not contain wait instructions.

I presented the two solutions raised earlier in this thread, namely 1) trapping when wait instructions are executed in unshared-memory environments by analogy to Atomics.wait in JS, or 2) modifying the Atomics.wait semantics to trap in unshared memory environments only when the thread would be suspended (and therefore deadlocked) by the wait instruction but otherwise allowing the wait instruction to return early without trapping.

There was mixed support for option 2 due to the riskiness of changing wait semantics. It was also mentioned that pursuing option 1 would not preclude moving to option 2 at a later date if the need arose. Also, an optimal solution would probably use conditional sections to avoid executing waits in unshared environments, in which case option 1 would be fine.

Given all that, are there any objections to moving forward with option 1, i.e. trapping when waits are executed in unshared environments? @lars-t-hansen did you have additional concerns about engines needing to do extra work to lower normal atomics?

lars-t-hansen · 2019-11-13T06:59:50Z

@tlively, no real concern about the work involved for the plain atomics, these can be lowered to alignment check + nonatomic access I think.

As discussed in the CG on November 12, 2019 and in WebAssembly#144. Specifically, all atomic accesses are now allowed to validate and execute normally on unshared memories and wait operations trap when used with unshared memories. The PR updates Overview.md and makes a best-effort attempt at updating the spec, making no changes when there is already a TODO for updating spec text. It also adds new TODO comments to the reference interpreter where changes will have to be made.

As discussed in the CG on November 12, 2019 and in #144. Specifically, all atomic accesses are now allowed to validate and execute normally on unshared memories and wait operations trap when used with unshared memories. The PR updates Overview.md and makes a best-effort attempt at updating the spec, making no changes when there is already a TODO for updating spec text. It also adds new TODO comments to the reference interpreter where changes will have to be made.

tlively · 2020-05-06T23:30:33Z

We have decided on a path forward for this and updated the spec proposal accordingly, so I will close this issue.

Summary: The WebAssembly backend automatically lowers atomic operations and TLS to nonatomic operations and non-TLS data when either are present and the atomics or bulk-memory features are not present, respectively. The resulting object is no longer thread-safe, so the linker has to be told not to allow it to be linked into a module with shared memory. This was previously done by disallowing the 'atomics' feature, which prevented any objct with its atomic operations or TLS removed from being linked with any object containing atomics or TLS, and therefore preventing it from being linked into a module with shared memory since shared memory requires atomics. However, as of WebAssembly/threads#144, the validation rules are relaxed to allow atomic operations to validate with unshared memories, which makes it perfectly safe to link an object with stripped atomics and TLS with another object that still contains TLS and atomics as long as the resulting module has an unshared memory. To allow this kind of link, this patch disallows a pseudo-feature 'shared-mem' rather than 'atomics' to communicate to the linker that the object is not thread-safe. This means that the 'atomics' feature is available to accurately reflect whether or not an object has atomics enabled. As a drive-by tweak, this change also requires that bulk-memory be enabled in addition to atomics in order to use shared memory. This is because initializing shared memories requires bulk-memory operations. Reviewers: aheejin, sbc100 Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79542

WebAssembly/threads#144 updated the WebAssembly threads proposal to make atomic operations on unshared memories valid. This change updates the feature checking in the linker accordingly. Production WebAssembly engines have recently been updated to allow this behvaior, but after this change users who accidentally use atomics with unshared memories on older versions of the engines will get validation errors at runtime rather than link errors. Differential Revision: https://reviews.llvm.org/D79530

Expects the runtime will support atomic access, even in single-threaded mode. See WebAssembly/threads#144 for more.

Since #144, we've specified that `memory.atomic.notify` is allowed to be executed on unshared memories, but always returns 0 since it is impossible for there to be anything waiting on an unshared memory since all variations of the wait instructions in Wasm and JS trap or throw exceptions on unshared memories. However, I realized recently that there is no technical reason why [`Atomics.waitAsync`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Atomics/waitAsync) shouldn't be able to wait on an unshared memory. Because `Atomics.waitAsync` returns a promise rather than blocking, it can in principle be used to synchronize between multiple asynchronous tasks in flight concurrently in the same JS context, and now that [JSPI](https://github.com/WebAssembly/js-promise-integration/blob/main/proposals/js-promise-integration/Overview.md) is becoming more real, this actually might be a useful thing to do. In particular `Atomics.waitAsync` could be used to build suspending versions of mutexes and condition variables to synchronize between threads running as different asynchronous tasks within a single JS context. Actually realizing this vision of asynchronous threading with JSPI on top of `Atomics.waitAsync` requires changes on the JS side to allow `Atomics.waitAsync` and `Atomics.notify` to be used on unshared ArrayBuffers, but in the meantime we can prepare for that change on the Wasm side by softening the language stating that `memory.atomic.notify` unconditionally returns 0 for unshared memories.

Summary: The WebAssembly backend automatically lowers atomic operations and TLS to nonatomic operations and non-TLS data when either are present and the atomics or bulk-memory features are not present, respectively. The resulting object is no longer thread-safe, so the linker has to be told not to allow it to be linked into a module with shared memory. This was previously done by disallowing the 'atomics' feature, which prevented any objct with its atomic operations or TLS removed from being linked with any object containing atomics or TLS, and therefore preventing it from being linked into a module with shared memory since shared memory requires atomics. However, as of WebAssembly/threads#144, the validation rules are relaxed to allow atomic operations to validate with unshared memories, which makes it perfectly safe to link an object with stripped atomics and TLS with another object that still contains TLS and atomics as long as the resulting module has an unshared memory. To allow this kind of link, this patch disallows a pseudo-feature 'shared-mem' rather than 'atomics' to communicate to the linker that the object is not thread-safe. This means that the 'atomics' feature is available to accurately reflect whether or not an object has atomics enabled. As a drive-by tweak, this change also requires that bulk-memory be enabled in addition to atomics in order to use shared memory. This is because initializing shared memories requires bulk-memory operations. Reviewers: aheejin, sbc100 Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79542

WebAssembly/threads#144 updated the WebAssembly threads proposal to make atomic operations on unshared memories valid. This change updates the feature checking in the linker accordingly. Production WebAssembly engines have recently been updated to allow this behvaior, but after this change users who accidentally use atomics with unshared memories on older versions of the engines will get validation errors at runtime rather than link errors. Differential Revision: https://reviews.llvm.org/D79530

tlively mentioned this issue Nov 5, 2019

Add discussion of atomics with unshared memories WebAssembly/meetings#461

Merged

tlively mentioned this issue Dec 18, 2019

Allow atomic operations on unshared memories #147

Merged

syg mentioned this issue Mar 20, 2020

Normative: Allow Atomics methods on ArrayBuffers tc39/ecma262#1908

Merged

tlively closed this as completed May 6, 2020

sbc100 mentioned this issue Dec 2, 2020

relocation R_WASM_MEMORY_ADDR_TLS_SLEB cannot be used against __THREW__ in non-TLS section: .bss emscripten-core/emscripten#12941

Open

sbc100 mentioned this issue Dec 31, 2020

failed to asynchronously prepare wasm: CompileError: WebAssembly.instantiate(): Compiling function #67 failed: Invalid opcode 0xfe (enable with --experimental-wasm-threads) emscripten-core/emscripten#13152

Closed

sbc100 mentioned this issue Nov 22, 2021

Add multithreaded variant to ICU port emscripten-core/emscripten#15592

Merged

sbc100 mentioned this issue Mar 22, 2022

Ensure FreeType compiles with pthreads support when necessary. emscripten-core/emscripten#16546

Merged

sbc100 pushed a commit to emscripten-core/emscripten that referenced this issue Mar 29, 2022

Compile FreeType with pthreads support (#16546)

f253af0

Expects the runtime will support atomic access, even in single-threaded mode. See WebAssembly/threads#144 for more.

tlively mentioned this issue Sep 12, 2022

Soften language for notify on unshared memories #191

Merged

conrad-watt mentioned this issue May 22, 2023

wait/notify can check for shared memory at instantiation time #200

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allowing atomic instructions with unshared memory #144

Allowing atomic instructions with unshared memory #144

tlively commented Oct 17, 2019

lars-t-hansen commented Oct 18, 2019

AndrewScheidecker commented Oct 18, 2019

tlively commented Oct 18, 2019

KronicDeth commented Oct 18, 2019

tlively commented Oct 18, 2019

lars-t-hansen commented Oct 24, 2019

tlively commented Oct 29, 2019

tlively commented Nov 5, 2019

lars-t-hansen commented Nov 6, 2019

tlively commented Nov 9, 2019

lars-t-hansen commented Nov 12, 2019

tlively commented Nov 12, 2019

lars-t-hansen commented Nov 13, 2019

tlively commented May 6, 2020

Allowing atomic instructions with unshared memory #144

Allowing atomic instructions with unshared memory #144

Comments

tlively commented Oct 17, 2019

lars-t-hansen commented Oct 18, 2019

AndrewScheidecker commented Oct 18, 2019

tlively commented Oct 18, 2019

KronicDeth commented Oct 18, 2019

tlively commented Oct 18, 2019

lars-t-hansen commented Oct 24, 2019

tlively commented Oct 29, 2019

tlively commented Nov 5, 2019

lars-t-hansen commented Nov 6, 2019

tlively commented Nov 9, 2019

lars-t-hansen commented Nov 12, 2019

tlively commented Nov 12, 2019

lars-t-hansen commented Nov 13, 2019

tlively commented May 6, 2020