-
-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Node Processing to allow Scene Multithreading #75901
Conversation
6e1cb31
to
5a26939
Compare
5a26939
to
7473f4e
Compare
Changed the approach and code, it should be a lot simpler now. |
7473f4e
to
a7a056c
Compare
There seems to be some errors in the cicd.
|
fde9a2e
to
b579f3e
Compare
b579f3e
to
ccc6c89
Compare
ccc6c89
to
6d2dbb3
Compare
6d2dbb3
to
05519d0
Compare
How (if at all) does this affect input functions across iOS and Android and other exports? |
@WolfgangSenff Input still happens on the main thread, this is only for threading the process callbacks. Given most users just use the Input singleton, not much should change. |
05519d0
to
60dba0e
Compare
Can you estimate the technical feasibility of a backport to Godot Engine 4.0? |
@fire I think its very unlikely because this will be a very large change when completed. |
@reduz do you think this will work with ENet by default to move it from the main thread to a separate thread? Would be really handy for something I am working on. |
* Node processing works on the concept of process groups. * A node group can be inherited, run on main thread, or a sub-thread. * Groups can be ordered. * Process priority is now present for physics. This is the first steps towards implementing godotengine/godot-proposals#6424. No threading or thread guards exist yet in most of the scene code other than Node. That will have to be added later.
eecd1f1
to
98c655e
Compare
I created a test project, tell me @reduz if it is an appropriate benchmark. Each instance (robot) lives in its own subthread, and has a script that makes it select one animation after n seconds. My computer struggles to run the project, but the multithreaded scenes perform a little better in general then when running Files
FootageFootage without Footage with What's missing from the test
Notes@Calinou I could use your benchmark tool in the test. I used the Godot 3D Robot Character made by @arianagchow |
@adamscott There is a good amount of things that need to be improved that are killing threaded performance right now in AnimationPlayer. After the PR is merged I will do a pass at fixing some things. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed with other maintainers, this is a first step towards allowing proper scene multithreading, which is a work that will span multiple months and probably take until 4.2 or 4.3 to be fully stable. Until then, the feature will be flagged as experimental to make it clear that it's a work in progress, and might change from release to release.
This first step lays the foundation and shouldn't negatively impact the engine when the scene multithreading is not used, so we agreed to merge this now to unblock other parts of the work that will be based on this, even if it's not the fully finalized form that this will take.
@reduz also started working on some guidelines for contributors for writing thread safe scene code which will be shared soon.
Thanks! |
I would appreciate if error regarding thread guard would include some identifiable information for node (name/ call line number from C# code would be perfect). Currently i have no good way to pinpoint errors working in C#. |
Can you paste the full error message you're referring to? |
This is the first steps towards implementing godotengine/godot-proposals#6424. No threading or thread guards exist yet.
Steps:
1. Ability to create thread groups in nodes.2. Ability to run groups threaded.3. Create message passing system.4. Protect the whole Node API for thread misuse in debug mode.5. Test and bechmark.A small benchmark (near a hundred characters playing run animation in 3D) has a 3x speedup with this PR (it should have more, but after running vtune to investigate, there are many memory synchronization bottlenecks in the animation code that have to be fixed that are unrelated to this PR).
How does it work?
Setting up groups
Nodes now have a Thread Group option in the process section. If a group is selected, this node and children nodes will now belong to a thread group. This means, that PROCESS and PHYSICS_PROCESS callbacks will be called in parallel for any node belonging to this thread group. Thread groups also have an order variable. Groups are executed in-order from lesser to greater and everything in the same group is executed in parallel.
By default, all nodes belong to the default group, which has process order 0, and processes in the main thread unless overriden with the
thread_group
option. This ensures backwards compatibility.Synchronizing
Because it is common to want to use the results of the processing and set it to something that is not in the current scene, new functions were added to
Node
, such as :call_deferred_thread_group
, which works likecall_deferred()
, but using the thread group.With these functions, it is possible to send messages to a process group when its processed in a thread, as well as send messages from the process group to another one (be it in the main thread, or a sub-thread).
Thread safety
To ensure thread safety when processing groups, the following changes were made:
The following error macros were added so they can be used in nodes (on the C++ side):
ERR_THREAD_GUARD
andERR_THREAD_GUARD_V
Use them at the beginning of functions to ensure that they are not being called from the wrong process group.ERR_MAIN_THREAD_GUARD
andERR_MAIN_THREAD_GUARD_V
Use them at the beginning of functions to ensure that this function can only be called from the main thread (calling them from groups will fail). Examples of functions that can only be called from the main thread areNode::add_child
orNode::remove_child
.The functions above only perform checks if nodes are inside the SceneTree (so you can still build a scene in a thread, and then use
call_deferred()
to put it in the main thread). Additionally, the checks above only take place on debug builds. They are not compiled during release builds.These functions also perform no locking or memory synchronization, they simply check the thread ID validity using TLS, so they
are very efficient.
Testing
There is a new command-line option that allows running Godot while forcing single-threaded mode in order to compare or debug:
How do you normally use this?
The idea is that, in Godot, most scenes are generally self contained units. An enemy, a bullet, an NPC, etc. If those scenes have something that does significant complex processing (such as animation playback, character physics, pathfinding, etc), then these scenes will greatly benefit from running their
process
iteration in parallel when there is a significant amount of them (dozens or hundreds).If they do something simple (like, just motion), then its likely there will be not much of a performance improvement and the performance could be affected instead. For those cases (lots of elements, simple logic), the swarm proposal is up as a proposed solution.
TODO (after merging this PR):
Protecting the Object API
Things that happen at Object level (set/get/set_meta/get_property_list/call/etc) are not protected. Those functions will need to become virtual and protections be added at Node level in the overrides.
Protecting the rest of the Node API
The Node API (all existing nodes) is quite large. Protecting all of it with the guard macros described above would ensure that the scene system is fully and completely thread-safe, but it's a lot of work to do it everywhere (and should be done as follow-up PRs). One possibility would do this only in the most common nodes for now (Control, Node2D, Node3D) and do the rest at a later time, gradually.
GDScript and other languages
GDScript should probably implement a call to ERR_THREAD_GUARD on functions inheriting from
Node
automatically (it should have no cost), which can only be disabled with some function annotation like @no_thread_guard or something.For other languages, I think it should be possible to efficiently allow this logic manually by encapsulating a function call. Of course, it would be up to the user (if they don't do anything with thread groups, they don't need to do this).
Multiplayer API is not thread safe.
The multiplayer code (rpc as example) is not thread safe, the Node API does not care about this, so we need to discuss if we want to implement safety at Node level or Multiplayer level.
Some nodes need changes
Transform notifications are not threaded
Currently, accumulated transform notifications are not threaded, the list of notifications should be moved to the thread group. This would improve parallelism a bit more.
Synchronization pressure is not great
While for the most part the scene system runs well in single threaded mode, Godot abuses in many places of things that can seriously degrade multithreaded performance. Example of this is synchronization instructions (mutexes) and atomics (used a lot in types such as Vector, String, StringName, Ref<>, etc). The later can put serious memory pressure on multiple threads if the same memory is being accessed (as an example, a lot of nodes playing and accessing the same animation). This needs to be redesigned/fixed and proper benchmarks need to be done.
Multi threaded debugging
Merging this PR and making multi-threaded development easier means that the debugger ability to debug multiple threads has to finally be fixed.
Multi threaded profiling
A multithreaded profiler is overdue in Godot , so its possible to visualize all threads working in all frames over time.
TODO (before merging this PR).
Agree on things, then relevant documentation can be filled.