Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deal with cross-thread access to user-defined GodotClass instances #18

Open
chitoyuu opened this issue Nov 8, 2022 · 15 comments
Open
Labels
c: ffi Low-level components and interaction with GDExtension API feature Adds functionality to the library hard Opposite of "good first issue": needs deeper know-how and significant design work.

Comments

@chitoyuu
Copy link

chitoyuu commented Nov 8, 2022

Currently, RefCell and i32 are used to store user-defined GodotClass instances and their reference counts:

https://github.com/godot-rust/gdextension/blob/a64f22409e05d1353492fded4aa36a4b4e6b710d/godot-core/src/storage.rs#L15-L21

In a multi-threaded configuration of the engine, this may lead to unsoundness when the instances are accessed by GDScript or engine code. Noting that GodotClass does not have a Send or Sync requirement, this can be as simple as a non-Send field in the type.

@Bromeon in Discord:

I plan to have a mix of multiple things:

  • the "you need to take some responsibility of the non-Rust code"
  • no access to Godot thread APIs as direct Rust classes (Thread, Mutex, Semaphore) by default, and > unsafe access when there is a threads feature enabled
  • GDScript->Rust calls could validate the thread ID and compare it with the one through which the extension library was initialized (which I hope is the main thread)
    A Cargo feature thread would loosen those restrictions and give access to multithreading, in an unsafe way.
    We could then either conditionally implement Send/Sync on Gd, or maybe better, provide an explicit Send/Sync type to transport references across thread boundaries.
    Users would still need to adhere to the Godot thread guidelines.

As prior art, the GDNative bindings deals with the problem with the user-data wrapper system.

@Bromeon Bromeon added c: ffi Low-level components and interaction with GDExtension API bug labels Nov 8, 2022
@Bromeon
Copy link
Member

Bromeon commented Jan 22, 2023

Re-labeling this as feature for general discussion about multithreading support.

@Bromeon Bromeon added feature Adds functionality to the library and removed bug labels Jan 22, 2023
@Bromeon Bromeon added the hard Opposite of "good first issue": needs deeper know-how and significant design work. label Mar 21, 2023
TitanNano added a commit to TitanNano/gdext that referenced this issue Apr 1, 2023
TitanNano added a commit to TitanNano/gdext that referenced this issue Apr 1, 2023
TitanNano added a commit to TitanNano/gdext that referenced this issue Apr 22, 2023
TitanNano added a commit to TitanNano/gdext that referenced this issue Apr 22, 2023
TitanNano added a commit to TitanNano/gdext that referenced this issue Apr 27, 2023
bors bot added a commit that referenced this issue May 1, 2023
212: Implement InstanceStorage optionally as sync for #18 r=Bromeon a=TitanNano

This is a second implementation of `InstanceStorage` that is `Sync` and `Send` and should make it suitable for usage with rust threads.

I hope it fits with the general plans for multithreading in #18. Feedback is more than welcome.

Co-authored-by: Jovan Gerodetti <[email protected]>
bors bot added a commit that referenced this issue May 1, 2023
212: Implement InstanceStorage optionally as sync for #18 r=Bromeon a=TitanNano

This is a second implementation of `InstanceStorage` that is `Sync` and `Send` and should make it suitable for usage with rust threads.

I hope it fits with the general plans for multithreading in #18. Feedback is more than welcome.

Co-authored-by: Jovan Gerodetti <[email protected]>
Co-authored-by: Jan Haller <[email protected]>
bors bot added a commit that referenced this issue May 1, 2023
212: Implement InstanceStorage optionally as sync for #18 r=Bromeon a=TitanNano

This is a second implementation of `InstanceStorage` that is `Sync` and `Send` and should make it suitable for usage with rust threads.

I hope it fits with the general plans for multithreading in #18. Feedback is more than welcome.

Co-authored-by: Jovan Gerodetti <[email protected]>
Co-authored-by: Jan Haller <[email protected]>
Hapenia-Lans pushed a commit to Hapenia-Lans/gdextension that referenced this issue May 26, 2023
# This is the 1st commit message:

Parse gdextension_interface.h declarations using regex

# This is the commit message #2:

AsUninit trait to convert FFI pointers to their uninitialized versions

# This is the commit message godot-rust#3:

GodotFfi::from_sys_init() now uses uninitialized pointer types

# This is the commit message godot-rust#4:

Introduce GDExtensionUninitialized*Ptr, without changing semantics

# This is the commit message godot-rust#5:

Adjust init code to new get_proc_address mechanism

# This is the commit message godot-rust#6:

Make `trace` feature available in godot-ffi, fix interface access before initialization

# This is the commit message godot-rust#7:

Compatibility layer between Godot 4.0 and 4.1 (different GDExtension APIs)

# This is the commit message godot-rust#8:

Add GdextBuild to access build/runtime metadata

# This is the commit message godot-rust#9:

Detect 4.0 <-> 4.1 mismatches in both directions + missing `compatibility_minimum = 4.1`

# This is the commit message godot-rust#10:

Detect legacy/modern version of C header (also without `custom-godot` feature)

# This is the commit message godot-rust#11:

CI: add jobs that use patched 4.0.x versions

# This is the commit message godot-rust#12:

Remove several memory leaks by constructing into uninitialized pointers

# This is the commit message godot-rust#13:

CI: memcheck jobs for both 4.0.3 and nightly

# This is the commit message godot-rust#14:

Remove ToVariant, FromVariant, and VariantMetadata impls for pointers

This commit splits SignatureTuple into two separate traits:
PtrcallSignatureTuple and VarcallSignatureTuple. The latter is a child
of the former. PtrcallSignatureTuple is used for ptrcall and only
demands GodotFuncMarshall of its arguments. VarcallSignatureTuple is
used for varcall and additionally demands ToVariant, FromVariant, and
VariantMetadata of its arguments, so pointers cannot benefit from the
optimizations provided by varcall over ptrcall.

# This is the commit message godot-rust#15:

Adds FromVariant and ToVariant proc macros

# This is the commit message godot-rust#16:

godot-core: builtin: reimplement Plane functions/methods

# This is the commit message godot-rust#17:

impl GodotFfi for Option<T> when T is pointer sized and nullable godot-rust#240

Additionally FromVariant and ToVariant are also implemented for Option<Gd<T>>
to satisfy all the requirements for ffi and godot_api.

# This is the commit message godot-rust#18:

Fix UB in virtual method calls that take objects
Fix incorrect incrementing of refcount when calling in to godot
Fix refcount not being incremented when we receive a refcounted object in virtual methods

# This is the commit message godot-rust#19:

fix UB caused by preload weirdness

# This is the commit message godot-rust#20:

Implements swizzle and converts from/to tuples
@Bromeon
Copy link
Member

Bromeon commented Aug 24, 2023

To elaborate on my original plans a bit and keep the discussion public:

As mentioned in #212 (comment), some operations are currently unsound, for example:

  • when storing a Gd<T> equivalent reference on GDScript side
  • through the scene tree
  • Gd::upcast()
  • Gd::from_instance_id()
  • Gd::from_variant()
  • ...

So individual unsound access points like from_instance_id are mostly symptoms, the core problem is that Gd is not generally safe to construct on other threads.

The semantics here are actually more limiting than Rust's own:

in general, Gd<T> can only be safely constructed and used on the main thread.

This is very important, as Gd<T> is not an isolated instance (as it were in Rust). Through the engine, there can be lots of hidden links to other objects (scene tree, connected signals, base classes...). In other words, Gd<T> potentially has side effects transcending thread boundaries. This concept is not expressible using Rust standard traits; Rust generally assumes that if you have ownership of something, it's unique; and if it's shared across threads, it's Sync.


So, what can we do? My idea is still more or less the same from the original post.

  1. When threads feature is off: we strictly forbid construction of Gd<T> in other threads.

    • Validated through thread ID and panics.
  2. When threads feature is on: we need to differentiate which types are "thread-safe" according to Godot.

    • This not only concerns classes, but also builtins, especially the composite ones (Variant, Array, PackedArray, Dictionary, Callable).
    • Servers and Godot singletons should be constructible from any thread, and shareable across thread boundaries.
    • Resources can be loaded on one thread and sent to another (Send) and read concurrently from different threads (Sync), but not modified concurrently.
  3. Once we have a model, we can turn it into the type system:

    • Traits for properties that can be statically verified (like "this type is safe to construct on non-main threads").
    • Runtime checks and panics for properties that cannot be verified at compile time.
    • Escape hatches via unsafe for cases that need to be supported but don't fit into the type system. Ideally, this surface should be small.

@lilizoey
Copy link
Member

i still kinda feel like it'd be better to just have two separate Gds. One that is only meant to be used on the main thread and one that is able to be used concurrently. maybe more idk we can make it as granular as we like there.

mainly because i really think that making Gd thread safe will be a very invasive procedure and likely hurt the ergonomics a fair bit. several methods may need to be made unsafe, etc. and having it be a feature that changes the behavior of Gd would mean it's either all or nothing.

im addition, how do we document that? a separate type is easy to document, but documenting all the changes from this feature seems like it'd be very confusing and weird. i imagine there is a way to do it in the docs but i cant imagine it'd look nice.

in addition, for the other core classes we can probably just make them Send/Sync if they are. all the value-types are already, since they're just implemented in rust. and for Dictionary and Array we can probably make a thread safe version of each.

anyway let's say we split Gd into Gd and GdSync (temp names). we could just make Gd check that it's always on the main thread when created, and have it not implement Send or Sync ever. (or maybe we can allow it in very specific cases, like perhaps the singletons? that might also just be confusing though).

We could also potentially use object metadata to store what thread affinity an object should have, if we want to allow Gd to be used on more than just the main thread. But that would also mean we'd need to ensure the user doesn't override the metadata somehow. unless, can we set instance bindings on arbitrary objects? if so then we might be able to use that instead.

GdSync would then have a more restricted api. likely more unsafe methods, and probably some way to track whether it should be Send or Sync. i dont know if we can do that by just implementing Send/Sync directly on the classes. at least for Sync that would allow:

let foo = Gd::<Node>::from_instance_id(id);
let node: &Node = *foo;
thread::spawn(|| node.get_child())

maybe it is fine to allow this for classes that actually are Sync (Node probably isn't).

Also there's a distinction i just thought of:

  • some classes can always be safely sent/shared between threads
  • some classes may be safely sent/shared between threads as long as the object is actually that class and not a subclass

generally the first one of these will be true if the entire subtree of the class hierarchy is also send/sync. this would technically include user-classes but we can perform a check in user classes to make sure they're not created/shared where they shouldn't be.

the second one may be true in some cases, for instance it's true for Object i believe.

we may also have some cases where some individual methods are safe to call from other threads. i suspect this may be a bit too hard to model properly though, may be feasible through an unsafe escape hatch. If we do add a RawGd we can use that as the escape hatch, RawGd would be Send and Sync, but most methods on it are unsafe.

There are also a couple other possibilities for dedicated smart pointers we could make if we wanted to. One I've thought of would be analogous to the Unique typestate in gdnative. Essentially a GdBox, which would be probably the most permissive of all the types. However it can only be safely constructed fresh, and unsafely converted from another Gd type. But also can be safely consumed and turned into other gd types. It's also freed on drop. We might actually be able to safely convert RefCounted objects into this type, since we can check the refcount to see that we're the only one owning it. This would likely be very useful in dealing with Resource in that case.

@Bromeon
Copy link
Member

Bromeon commented Aug 24, 2023

Thanks a lot for the great feedback!

Yep, it might be better to have its dedicated type GdSync (or Gds, SyncGd, Agd or whatever -- I'll use Gds for brevity here) if the APIs differ too much.

Proliferation of public types with closely related meaning, such as Gd, GdSync, RawGd, GdBox, ... goes a bit against our declared Simplicity principle. As a user wanting to develop a game, I'm usually not interested in selecting out of 4+ smart pointer types and memorize all the subtle differences and use cases, if GDScript or C# just allows me to get stuff done. But probably we will see if there's a real need for those once things get more concrete 🙂

Additionally, there are a lot of things we cannot check at compile time either way, by the nature how Godot operates.

You mentioned a good one: if Gds<Base> can point to Gds<Derived>, we need to ensure that all derivates follow the same constraints. Which creates non-locality: adding a new class deriving from Base may break existing usages of Gd<Base> (at runtime). Or it must be guaranteed that polymorphism (up/downcasting) is only possible when derived classes implement the same traits.

Alternatively, we could outlaw polymorphism on Gds and enforce that Gds<T> always hold exactly T. This might be an acceptable trade-off, since synced pointers are usually short-lived (e.g. load resource in background thread, do a parallel computation with a result, etc).

@lilizoey
Copy link
Member

Just gonna summarize general thread safety issues for the various types. When we focus more on thread safety, we'll likely need to split all the thread unsafe types in two to create a thread-safe variant for each. But there are several types which likely dont need such a split, and some types may only require slight modifications to be made safe.

Known thread safe

  • All the mathy types are implemented in rust and thus are entirely thread-safe
  • integers, floats, and bool

Likely thread safe

  • StringName is immutable + atomically refcounted and should therefore be thread safe
  • Packed*Array except String, these should be largely COW types so can't be mutated concurrently from different threads
  • Gd<T> where T is a godot-singleton, it may be that our assumptions in making Gd makes this thread unsafe but usage of the godot singletons is thread safe

Likely thread unsafe

String types can be easily cloned into a shared reference and then mutated afterwards, this is likely gonna cause concurrency issues.

  • GString is mutable and thus likely not thread safe
  • NodePath is mutable and thus likely not thread safe
  • PackedStringArray, GString is likely not thread safe so this likely isn't either

Known thread unsafe

  • Array, allows arbitrary sending and sharing of other godot types, even Array<i64> cannot in a thread-safe manner always maintain its safety invariant
  • Dictionary, allows arbitrary sending and sharing of other godot types
  • Variant, it can be any of the other types
  • Gd<T>, intentionally thread unsafe, will likely need a different type for thread safe object access - see above
  • Callable, arbitrary code may be run
  • Signal, arbitrary code may be run

@Bromeon
Copy link
Member

Bromeon commented Feb 18, 2024

This is a great overview! 👍

If we want to provide ergonomic usage, we need to think if/how we can support common use cases such as:

  • load resources on a separate thread
  • construct a node on a separate thread, send it to main, attach to tree there

Some observations:

  1. Certain methods like Object::call, Object::get, Node::call_group etc. must be excluded from thread-safety guarantees, since those can invoke arbitrary code (Rust and non-Rust).
    • This is already a problem in regular Gd<T>.
  2. Some node interactions must be run on main, e.g. scene-tree related ones like Node::add_child, Node::add_sibling, Node::reparent.
    • I don't know how to verify this unless we annotate each method manually, which I'd like to avoid if somehow possible.
  3. Other options like Node3D::set_position (most properties in general) are fine to execute on any thread, as long as there is no concurrent access to the same object from another thread.
    • Unique objects (e.g. newly created ones) are always safe to use.
    • Shared objects obey the borrow principle: 1 writer XOR many readers.
    • Node is not reference-counted, so it's impossible to know how many references exist.
    • However, at least from Rust side we could provide best-effort diagnostics in Debug mode: method calls check a global lock for that instance ID, and panic if the lock is being held. This can't be done in GDScript code, but we need to live with that being outside our guarantees.

As for unsafe usage, we need to see where it makes sense -- if every single cross-thread method becomes unsafe, we are back to gdnative's approach and don't win much. I'm particularly considering how to map Send (allow moving but not sharing) and Sync (allow shared access) to the above-mentioned semantics.

@lilizoey
Copy link
Member

One thing we could do is have a Gd type with Box-like semantics, similar to gdnative's Unique typestate. Though a different type entirely instead of typestates may be more appropriate.

Its safety invariant is that it has unique ownership of the object. This would let us much more easily use it safely from different threads, it could be Send and maybe even Sync. Since we know there is only one reference to this object, then it should be entirely safe to use the vast majority of methods on it. It may be a bit difficult to figure out what methods could break the safety invariant though. We could then have a way to safely convert it to a normal Gd (consuming the original object). This would let us support patterns like "do things on other threads, send to main thread, then use like normal".

@Bromeon
Copy link
Member

Bromeon commented Feb 20, 2024

We could also start very conservatively, and limit the way how this UniqueGd (or however it's called) can be used:

  • ✔️ any Deref/DerefMut API methods on it (otherwise it's useless).
    • (We could potentially mark some unsafe if really appropriate, but this won't scale well)
  • ✔️ constructors new_alloc, new_gd, from_init_fn, from_object
  • cast, try_cast (consumes object), upcast, upcast_ref, upcast_mut (basically Deref)
  • ✔️ free (consumes object)
  • ✔️ instance_id
  • ✔️ bind, bind_mut
  • ✔️ into_gd (consumes object)
  • ✔️ pass to Godot APIs by consuming the object

These operations are absent:

  • clone(), obviously
  • from_godot and from_variant
  • to_godot and to_variant

These operations could be marked unsafe (or also absent):

  • ✖️ from_gd() or from_gd_unchecked() for manually managed types
    • could involve best-effort diagnostics in debug mode, but quite involved
  • ⚠️ from_gd() for ref-counted types, which is dynamically checked and safe
  • ✖️ from_instance_id()
    • can initially also be absent, as this may not even be needed; can also be trivially achieved with a from_gd constructor
  • ✖️ pass to Godot APIs but via reference

@lilizoey
Copy link
Member

We could also start very conservatively, and limit the way how this UniqueGd (or however it's called) can be used:

* ✔️ any `Deref`/`DerefMut` API methods on it (otherwise it's useless).

  * (We could potentially mark some unsafe if really appropriate, but this won't scale well)

We do need to double check some of this, Deref is almost certainly fine. But DerefMut might have some issues where we could for instance do:

let mut foo = UniqueGd::<Node>::new_alloc();
let bar = Gd::<Node>::new_alloc();
foo.add_child(bar.clone());

send_to_main_thread(foo); // Also sends `bar` to the main thread?

bar.some_function(); // And we have a reference to `bar` on both threads?

Maybe if there was some way to limit the internal APIs such that functions called through DerefMut for UniqueGd only accepts UniqueGds...

* ✔️ `into_gd` (consumes object)

* ✔️ pass to Godot APIs by consuming the object

Both of these could maybe just be a From<UniqueGd> for Gd impl.

@Bromeon
Copy link
Member

Bromeon commented Feb 21, 2024

We do need to double check some of this, Deref is almost certainly fine. But DerefMut might have some issues where we could for instance do:

Good observation -- I checked Node and it seems like most "dangerous" methods actually take &mut self. We'll possibly find one or another outlier, but this might greatly help at keeping the &self methods ergonomic 🙂


Maybe if there was some way to limit the internal APIs such that functions called through DerefMut for UniqueGd only accepts UniqueGds...

We could maybe reuse the trick from std::thread::spawn() and require a closure with a Send bound?

So mutable methods could not be implicitly called through DerefMut, but would need something like

unique.with_mut(move |this| {
    this.add_child(other_unique);
});

@lilizoey
Copy link
Member

So i experimented a bit with making a UniqueGd, and there are some extra issues that come up

Which classes can be UniqueGd?

Any engine classes can, but what about rust user classes? If the user class doesn't store the Base field and uses the default init implementation, then this should be safe. But what if you want Base access?

If you can access Base then you can just do self.to_gd() at which point you've aliased the UniqueGd, which would be library UB.

So we'd need to prevent people from doing things like this, maybe a UniqueBase is needed? But that adds further new types, maybe it'd be possible to add something to Bounds which says whether Self can be UniqueGd or not, which lets us then determine what functionality Base exposes.

But then users need to opt-in to being usable in Unique which is maybe not ideal? Or maybe it is, it's nice to be explicit sometimes.

Mutable references

As soon as you have a &mut T reference to the base object, it's possible to use std::mem::swap to break the invariant. Simply do something like this:

let unique = UniqueGd::new();
unique.with_mut(|obj| {
    let mut new_gd = Object::new_alloc();
    std::mem::swap(&mut *obj, &mut *new_gd);
    let cloned = new_gd.clone();
    std::mem::swap(&mut *obj, &mut *new_gd);
    // Woops, now `cloned` aliases `unique`, not ok
});

This is essentially #23 again, however with UniqueGd this would actually be unsound. Requiring us to either make any mutable access to such an object unsafe, or to make some bigger changes to actually fix #23, such as the supertrait method.

@Bromeon
Copy link
Member

Bromeon commented Jun 13, 2024

Great observations!

If you can access Base then you can just do self.to_gd() at which point you've aliased the UniqueGd, which would be library UB.

So we'd need to prevent people from doing things like this, maybe a UniqueBase is needed? But that adds further new types, maybe it'd be possible to add something to Bounds which says whether Self can be UniqueGd or not, which lets us then determine what functionality Base exposes.

But then users need to opt-in to being usable in Unique which is maybe not ideal? Or maybe it is, it's nice to be explicit sometimes.

I think we need to strike a balance between static type safety (and the messy APIs it can cause) and ergonomics. I'm fine with panicking in such cases, a descriptive message could tell the user immediately what's wrong -- often even more so than weird bound mismatches. This would also follow ideas here.

As soon as you have a &mut T reference to the base object, it's possible to use std::mem::swap to break the invariant.

Hm, that indeed looks like a hard problem. Once you have &mut access to the object, one can always move in/out...

Spontanous idea that I haven't tested would be to just provide a temporary proxy (bound to scope) that forwards method calls to the original object, or something that doesn't deref-mut to Object directly. But that would also need either a fundamental redesign or at least a duplication of the API methods 🤔

@Bromeon
Copy link
Member

Bromeon commented Jun 14, 2024

If we want uniqueness (or general multi-threading), we will likely need runtime information.

In my comment above, I mentioned that UniqueGd::from_instance_id() could be unsafe/absent, but this alone doesn't solve the problem. For example, what prevents one from doing:

let unique: UniqueGd<Node> = ...;
let id = unique.instance_id();

let secret_clone: Gd<Node> = Gd::from_instance_id(id); // regular Gd!

Making UniqueGd::instance_id() unsafe (as it can cause follow-up UB) isn't a great approach. There are other backdoors like Debug which can be parsed for the ID, etc...

And it doesn't solve a problem that doesn't even involve UniqueGd:

let one: Gd<Node> = ...;
let id = one.instance_id();
thread::spawn(|| {
    let two: Gd<Node> = Gd::from_instance_id(id);
});

On the other hand, we could add meta-information (Object::set_meta) that is verified by Gd::from_instance_id() and causes a panic. If we don't trust the Godot meta system because it's user-accessible, we can also keep our own

static UNIQUE_OBJECTS: Global<HashSet<InstanceId>>

In both cases, we need to trace all UniqueGd -> Gd conversions, to remove the "unique" meta flag. If we have our own hash set, we additionally need to trace all destructions, to avoid an ever-growing set.

@lilizoey
Copy link
Member

One thing to note is that these methods also exist separately on godots end, and we can't always guarantee that the godot versions of these aren't called by users since there's always Object::call.

@lilizoey
Copy link
Member

lilizoey commented Sep 4, 2024

Made a simple proof of concept for unique gd, and copied the example from https://docs.godotengine.org/en/stable/classes/class_arraymesh.html#class-arraymesh-method-surface-get-array-index-len to using it:

// This can hopefully be done safely in one go somehow, like `ArrayMesh::new_unique()`.
let mut array_mesh = unsafe { UniqueGd::from_gd(ArrayMesh::new_gd()) };

array_mesh.with_mut(|array_mesh| {
    let mut vertices = PackedVector3Array::new();
    vertices.push(Vector3::new(0.0, 1.0, 0.0));
    vertices.push(Vector3::new(1.0, 0.0, 0.0));
    vertices.push(Vector3::new(0.0, 0.0, 1.0));

    let mut arrays: VariantArray = Array::new();
    arrays.resize(ArrayType::MAX.into_godot() as usize, &Variant::nil());
    arrays.set(
        ArrayType::VERTEX.into_godot() as usize,
        vertices.to_variant(),
    );

    array_mesh.add_surface_from_arrays(PrimitiveType::TRIANGLES, arrays);
});

let mut m = unsafe { UniqueGd::from_gd(MeshInstance3D::new_alloc()) };
m.with_mut(move |m| m.set_mesh(array_mesh));

this isn't a terrible API for UniqueGd at least

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: ffi Low-level components and interaction with GDExtension API feature Adds functionality to the library hard Opposite of "good first issue": needs deeper know-how and significant design work.
Projects
None yet
Development

No branches or pull requests

3 participants