Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix yield bug #2

Merged
merged 4 commits into from
Mar 8, 2022
Merged

Fix yield bug #2

merged 4 commits into from
Mar 8, 2022

Conversation

tavurth
Copy link
Contributor

@tavurth tavurth commented Mar 5, 2022

closes #1

  • Fixes crash when thread uses yield
  • Fix crash when thread does not use yield 😅

@gilzoide
Copy link
Owner

gilzoide commented Mar 5, 2022

That's an interesting approach, but running the test scene, the tasks that don't even use yield in them run but are never marked as finished. This means that their finished signals are never emitted and callbacks passed to then/then_deferred never run.
Also, the threads seem to freeze, since I only get 8 logs about tasks starting (my worker pool is configured with 8 threads), but nothing else after.

@tavurth
Copy link
Contributor Author

tavurth commented Mar 5, 2022

Hmm interesting, Ok I'll take another look at it!

Fix sometimes yield bug

Show error on invalid state

Fix bug where task never finishes
@tavurth
Copy link
Contributor Author

tavurth commented Mar 5, 2022

Thanks for finding the bug!

It should work now collect_result was always yielding, even when it didn't need to.

Checking about the freeze issue now.

@gilzoide
Copy link
Owner

gilzoide commented Mar 5, 2022

Hmm, it doesn't seem to be working yet. Do you get the same behavior as I do when running the sample scene?
Interesting enough, running in the editor didn't show any errors/warnings, but running the game directly without the editor gave me these warnings:

Screenshot_20220305_084653

Not related, but why are you force-pushing commits into your main branch? Adding new commits is better for reviewing only the latest changes, to see the history of incremental fixes one by one.

@gilzoide
Copy link
Owner

gilzoide commented Mar 5, 2022

Checking about the freeze issue now.

Ooooh, I didn't pay attention enough to this part, I'm really sorry =X

@tavurth
Copy link
Contributor Author

tavurth commented Mar 5, 2022

I'm using this updated test scene:

dispatch-queue-test.zip

The functions now seem to complete properly and without crashing, but there is a weird bug with the GDScriptFunctionState, I think it's internal.

Screenshot 2022-03-05 at 15 05 29

As we can see from here I am actually checking that it's a valid function state, but the call to resume immediately after prints an error.

I think it's more a Godot side of things, as the tasks are now working correctly without crashing and returning the right values 🤔 Maybe time to submit a bug report there.

Not related, but why are you force-pushing commits into your main branch?

I got too used to Godot github style 😅

@gilzoide
Copy link
Owner

gilzoide commented Mar 5, 2022

Hmm, I think I know what's going on with this error.

On resume_if_valid, the GDScriptFunctionState is being resumed manually right away, without actually waiting for VisualServer to emit it's "frame_post_draw" signal. The tasks finish, the object dies and the GDScriptFunctionState probably gets cleaned up.
When the signal is emitted and Godot tries to resume the GDScriptFunctionState, it is now in a state where !function is true, so hence the error message.

I think we shouldn't resume coroutines manually, I don't see a use case of yielding without waiting for a signal on tasks running in background threads.
Instead, we should probably wait until the coroutine is completed to mark the Task as finished. I've tried changing Task.execute to just yield until completed:

	func execute() -> void:
		var result = object.callv(method, args)
		if result is GDScriptFunctionState:
			result = yield(result, "completed")
		emit_signal("finished", result)
		if group:
			group.mark_task_finished(self, result)

This works almost everytime, but is giving a double free or memory corruption crash occasionally 🤔
Maybe signals don't go too well in multithreaded environments, I don't know, needs more testing.

I got too used to Godot github style 😅

Oh, got it! I've never contributed to Godot, so I don't know how it goes. For this case, I prefer separate commits to see each change on top of the other. We can always see the diff between several commits easily as well. But feel free to work as you prefer ^^

@tavurth
Copy link
Contributor Author

tavurth commented Mar 5, 2022

Hmm that makes sense yea, I was trying that but I think then the execute will mark itself as a function state, and it could be possible to move to the shutdown flow:

while true:
pool.semaphore.wait()
if pool.should_shutdown:
break
pool.mutex.lock()
var task = _pop_task()
pool.mutex.unlock()
if task:
task.execute()

If I push that change perhaps we should also yield the execute in the two callers?

Perhaps in the execute while.. resume loop we can just check to see if the thread should still be executing instead?

@tavurth
Copy link
Contributor Author

tavurth commented Mar 5, 2022

After some testing, putting a yield to wait for completion in the places I marked above causes crashes on my machine.

yield as you suggested causes no hangup without errors 👍 Its much simpler too!

@gilzoide
Copy link
Owner

gilzoide commented Mar 6, 2022

If I push that change perhaps we should also yield the execute in the two callers?

The call to Task.execute there may return a GDScriptFunctionState, yes, but the running thread doesn't have to wait for it or anything. The Task itself is referenced in the GDScriptFunctionState, which is referenced by the connection to the signal it is waiting, so they won't be destroyed right away as far as I know. When the signal is emitted, the task resumes. It won't run on the dispatch queue threads, but rather in the same thread where the signal was emitted, which is fine.

yield as you suggested causes no hangup without errors

Well, that's the thing with multithreading: it is quite unpredictable. You didn't get errors, but it might still crash.
The tests I'm doing here is in the sample scene, which has these buttons for running lots of tasks at once. I added a yield(get_tree(), "idle_frame") to the _double task and occasionally it crashes with a double free or corruption or SEGFAULT errors.

I think that the signal is being emitted at the same time that the task yields awaiting for the coroutine to complete, but in another thread. The same GDScriptFunctionState object is being edited by both threads (one is disconnecting the signal slot from yield(get_tree(), "idle_frame"), since yield connects to signals with the CONNECT_ONESHOT flag, and the other is connecting a signal slot from yield(result, "completed")), which causes the memory corruption.

So, that's a tough one. I don't know if there's any way to work around this concurrency issue without going to Godot's code and adding some synchronization code, which might not be worth it.
One thing that can be done is letting this yield there and warn people that crashes might happen.
Another one is do nothing and let people handle the GDScriptFunctionState result after Tasks' finished signal is emitted.

To be honest, I don't think using yield for signals such as "wait for an idle frame" or "wait until frame is drawn" are suitable to background threads, just use the main thread for such synchronizations. Can you tell me your use case for yielding inside tasks running in the background?

@tavurth
Copy link
Contributor Author

tavurth commented Mar 6, 2022

Can you tell me your use case for yielding inside tasks running in the background?

I like to run code in the editor and also in game. In the game I will likely be using a yield for a small pause between loading cycles, without having to move everything to a process buffer.

In the editor I don't use yield, as I would rather my development be really fast, with a few small delays.

In this case we have a maybe yield state, which means I should use yield-idle-frame at the start of the function for consistency.

Having the ability to spawn a thread to do the same task is really useful, because I can choose to do that when tasks are not urgent.

An example would be saving an image to the hard drive.

In game I want that to run in a thread, but when the game is exiting I want that to be as snappy as possible.

Using process sync in this case means I have to duplicate a lot of logic, more bugs, less extensible.


As I can see, right now this PR supports yield in a way that the previous version did not (threadlock and stackoverflow issues).

I realise you might not want to start pushing yield into this repo and I agree with that, but stability wins in my book.

@gilzoide
Copy link
Owner

gilzoide commented Mar 6, 2022

In the game I will likely be using a yield for a small pause between loading cycles, without having to move everything to a process buffer.

In the editor I don't use yield, as I would rather my development be really fast, with a few small delays.

So, if I understand correctly, you'd like to yield on the game, but not when running code in the editor.
In your game, are you running the task in a dispatch queue/background thread as well? If so, why make small pauses? Since it's a background thread, there's no need to worry about blocking the main thread and freezing the game.
Another thing is that if these signals are emitted in the main thread, the coroutine will continue in the main thread, so there will be an overhead of using background threads, but the task won't really run in it after the yield.

As a sidenote, for a "yield until idle frame", you can use call_deferred to dispatch the task/method in the next idle frame instead of adding a yield in the start of the method.

In game I want that to run in a thread, but when the game is exiting I want that to be as snappy as possible.

Using process sync in this case means I have to duplicate a lot of logic, more bugs, less extensible.

I don't think I follow. What would this "process sync" be exactly?

I realise you might not want to start pushing yield into this repo and I agree with that, but stability wins in my book.

Supporting yield with a simple solution like that one is nice. My only concern is that occasional crash that happens when waiting for a signal that could be emitted at the same time that the yield is happening. In my tests I also see occasionally tasks never finishing, probably also related to race conditions in the signal slots.

Said that, I still have my doubts if we should yield until the coroutines complete or just let the user deal with the GDScriptFunctionState in the tasks' finished signal.
I know, the race condition is still there and the same problems can occur, but users have more flexibility to handle GDScriptFunctionStates in this case (maybe it yields without waiting for any signal?) and we avoid losing a Task completion, which could make a DispatchQueue never send its all_tasks_finished signal.

@tavurth
Copy link
Contributor Author

tavurth commented Mar 6, 2022

So, if I understand correctly, you'd like to yield on the game, but not when running code in the editor.

I'd like to support as many crash-free ideas as possible without restriction on users of this lib

I don't think I follow. What would this "process sync" be exactly?

Syncing updates on heavy main_thread dependant tasks to run in _process or _physics_process

In my tests I also see occasionally tasks never finishing, probably also related to race conditions in the signal slots.

Thankfully less than in the original example posted, which was actually a crash 3/5 times for me, in the new version I have not seen a crash yet even with fairly extensive testing.

Said that, I still have my doubts if we should yield until the coroutines complete or just let the user deal with the GDScriptFunctionState in the tasks' finished signal.

Unfortunately it does not seem to mitigate the original crash bug.

and we avoid losing a Task completion, which could make a DispatchQueue never send its all_tasks_finished signal.

Unfortunately I don't see a better solution to this issue as yield in GDscript can have rather strange effects (godotengine/godot#24311 (comment))


As far as I can see we've come quite a way to reducing crashes when using threaded yield. Passing the GDScriptFunctionState back to the user creates far more crashes as can be seen in the original example 😞

At this point I don't really know how to improve further on this, I haven't had the time to step through all of the code in the repository.

If you know of a better solution it would be great, but I will be using the code above in my projects from now forward until a better more crash free solution is found. (Probably when I move to Godot 4.x)

@gilzoide
Copy link
Owner

gilzoide commented Mar 8, 2022

Well, in your test project, you could have handled the GDScriptFunctionState after dispatching the tasks, for example like:

    prints("RUNNING TASK 1")
    for child in self.get_children():
        var function_state = yield(Threads.dispatch(child, "task"), "finished")
        yield(function_state, "completed")

and the results are the same as having the yield inside the task runner. This is what I mean by "let the user deal with the GDScriptFunctionState in the tasks' finished signal".

Again, in this case that the yield is right in the start of the task method and signals are all sent in the same thread, you lose the benefits of using the dispatch queue, since the continuation of the tasks will not run on the DispatchQueue's threads.

Syncing updates on heavy main_thread dependant tasks to run in _process or _physics_process

Well, if you depend on the main thread, maybe you shouldn't be using threaded dispatch queues? At least make the main thread sync with the background task, not the other way around.
If you want, you can configure your DispatchQueue with 0 threads, so that the tasks actually run in the main thread as per call_deferred, one task per idle frame.


But I agree with you that we're better off with the yield inside the task runner than without it. It makes more sense to mark a task as finished after the coroutine completes rather than when it yields.

At this point I don't really know how to improve further on this

I guess we can't zero out the crashes, due to the nature of using non thread-safe features in a multithreaded environment. And that's fine, we just have to warn the users about the risks. I'll add a note in the README about how we yield if the method returns a GDScriptFunctionState and the caveats of doing so.

Thank you very much for pointing out this problem and for contributing to fixing it, I hadn't thought of using yield inside background tasks.
If you have any more ideas about it, or need any help with dispatch queues, just give me a shout! =D

@gilzoide gilzoide merged commit 40fd75a into gilzoide:main Mar 8, 2022
@tavurth
Copy link
Contributor Author

tavurth commented Mar 8, 2022

Thank you! 🎉

I'm using this in production and so far it's crash free after the above changes.

When I start seeing crashes again I will look deeper into this 👍

@tavurth
Copy link
Contributor Author

tavurth commented Mar 20, 2022

Just to note in that I'm still not seeing any crashes with this.

Working really well for rendering in subviewports:

func _render()
	self.set_update_mode(Viewport.UPDATE_ALWAYS)
	yield(VisualServer, "frame_post_draw")
	self.set_update_mode(Viewport.UPDATE_DISABLED)

	var to_return = self.get_texture().get_data()
	to_return.lock()
        return to_return

Other file

func _ready():
        Threads.dispatch(some_node, "render")

@gilzoide
Copy link
Owner

Just to note in that I'm still not seeing any crashes with this.

That's awesome!
Maybe my tests were just too trivial and there was no space for scripts to run more before SceneTree issued an idle_frame signal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Destruction of object while method is in yield state
2 participants