Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented __del__ to clean up hanging coroutines #263

Merged
merged 12 commits into from
Aug 6, 2024
Merged

Conversation

WolfLink
Copy link
Collaborator

This is intended to be a simple fix for #262

Currently, when RuntimeTasks are cancelled, they sometimes end up garbage collected with coroutines that are never awaited. When those coroutines get garbage collected, RuntimeWarnings appear.

I've implemented __del__ for RuntimeTask which simply close()s the coroutine, which lets Python know its ok that the coroutine was never used.

@WolfLink WolfLink linked an issue Jul 17, 2024 that may be closed by this pull request
@edyounis
Copy link
Member

Thanks for doing this! Good catch on this. It seems like pre-commit is unhappy about something that Alon fixed and was recently merged into main. Can you merge main into this and make sure the pre-commit check passes. Let me know if you need help or want to chat. Thanks again for doing this!

@edyounis
Copy link
Member

edyounis commented Jul 18, 2024

Also, do you have any references for the close method on coroutine? I only found the generator.close() documentation.

Going off the generator documentation, we may need to catch and throw some exceptions. Catch from within the del method and throw from within the task execution, maybe.

Edit: Found it: coroutine.close()

Looking at the source code for the close method, I think it might be best to go with your other suggestion in the issue report. This is because both the generator.close() and coroutine.close() can raise warnings and errors. This is not good to have in a __del__() method. We can explicitly close the coroutines when they are cancelled by creating a RuntimeTask.close() method and installing that in Worker._handle_cancel(). I am not totally convinced here on what is the right way to go, so if anyone else has thoughts, please share. We can also chat about this in more detail the next time we talk.

@WolfLink
Copy link
Collaborator Author

WolfLink commented Jul 18, 2024

I don't think its totally unreasonable to deal with exceptions in __del__. This documentation claims that "exceptions that occur during their execution are ignored, and a warning is printed to sys.stderr instead". This StackOverflow claims that only applies to uncaught exceptions, and its possible to try/except as usual.'

Reading more about how coroutine.cancel() works, it looks like the way it tries to stop the coroutine is raising an exception inside the coroutine, and if that fails to cause the coroutine to exit (if the exception is caught), then cancel() will raise an exception.

This might happen if the coroutine gets cancelled while its in a try block followed by a blanket except, which is not an uncommon pattern, so this concern is more than just theoretical.

Ironically, one solution might be to wrap the call to cancel() in a try/except block.

@WolfLink
Copy link
Collaborator Author

WolfLink commented Jul 18, 2024

Honestly I think we should not try to catch the exceptions that might come from cancel, and if anyone encounters it, the recommendation is to avoid blanket exceptstatements, or at least to think carefully about when they do (and to consider either re-raising the error or using a finally block instead of except).

The remaining question then is if the behavior of these errors becoming warnings because of being called in __del__ or if they should get raised as proper errors when the task is cancelled.

@WolfLink
Copy link
Collaborator Author

I changed it to do explicit cancelling instead of using __del__.

Copy link
Member

@edyounis edyounis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this change. I do think this is a better method. Now we know if we see that warning that something is truly going wrong.

Edit: Do you still think we should put a try-except around close()?

bqskit/runtime/task.py Show resolved Hide resolved
bqskit/runtime/worker.py Outdated Show resolved Hide resolved
@WolfLink
Copy link
Collaborator Author

WolfLink commented Jul 19, 2024

Edit: Do you still think we should put a try-except around close()?

I think we shouldn't because an error indicates the coroutine didn't exit properly which is a problem the user should know about and suggests something should be fixed either in BQSKit or the user's code.

I could try to be fancy and catch any errors and after closing all the tasks, I could re-raise the first error it encountered, or something along those lines.

@WolfLink
Copy link
Collaborator Author

I guess the ‘if coro is None` check needs to be there after all

@@ -338,6 +347,10 @@ def _handle_cancel(self, addr: RuntimeAddress) -> None:
if not t.is_descendant_of(addr)
]

# if there was an error earlier, raise it now
if error is not None:
raise error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you delay the raise? Shouldn't the system crash immediately, or is this to test something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to finish cancelling tasks as much as possible before letting the system crash. Although I suppose if its gonna crash, the system isn't going to be in a great state anyway. Should I just let it crash immediately?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let it crash immediately.

@WolfLink
Copy link
Collaborator Author

WolfLink commented Aug 5, 2024

@edyounis This one should be good to merge now. It includes all the tweaks we talked about.

@edyounis edyounis merged commit 064d58e into main Aug 6, 2024
17 checks passed
@edyounis edyounis deleted the coroutine-fix branch August 6, 2024 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RuntimeWarnings from get_runtime().cancel(future)
2 participants