Add a safety mechanism to `BackgroundService` to notify about crashed tasks
#9
Labels
part:asyncio
Affects the asyncio module
scope:breaking-change
Breaking change, users will need to update their code
type:enhancement
New feature or enhancement visible to users
What happened?
When there is an unhandled exception inside a task spawned by a `BackgroundService`, the error is silently swallowed, leading to obscure, hard-to-debug bugs.
What did you expect instead?
Unhandled exceptions should at least be logged, or the whole Python process should crash.
Extra information
I discovered this while working on frequenz-floss/frequenz-sdk-python#806. I added a sanity check that failed inside a task that was sending messages to a channel, so no messages were sent, and another task waiting for those messages just got stuck, leaving no clue about where the problem was. This makes the problem really hard to debug.
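The failure mode above can be reproduced with plain asyncio. In this sketch an `asyncio.Queue` stands in for the channel (an assumption; the real code uses frequenz channels), and the `producer`/`consumer` coroutines are hypothetical. The producer fails its sanity check before sending anything, so the consumer blocks until the timeout fires. Because a reference to the failed task is kept in a set, the way a service keeps references to the tasks it spawns, asyncio never prints "Task exception was never retrieved": it only reports that when such a task is garbage-collected.

```python
from __future__ import annotations

import asyncio


async def producer(queue: asyncio.Queue[int], value: int) -> None:
    # Hypothetical sanity check that fails before anything is sent.
    if value < 0:
        raise ValueError("sanity check failed: value must be non-negative")
    await queue.put(value)


async def consumer(queue: asyncio.Queue[int]) -> int:
    # Blocks until a message arrives, which never happens here.
    return await queue.get()


async def main() -> tuple[bool, BaseException | None]:
    queue: asyncio.Queue[int] = asyncio.Queue()
    # Holding strong references (like a service's task set) means the failed
    # task is never garbage-collected, so asyncio never logs its exception.
    tasks = {asyncio.create_task(producer(queue, -1))}
    try:
        # Without the timeout this await would hang forever.
        await asyncio.wait_for(consumer(queue), timeout=0.1)
        consumer_hung = False
    except asyncio.TimeoutError:
        consumer_hung = True
    (failed,) = tasks
    # The error only surfaces if someone explicitly inspects the task.
    return consumer_hung, failed.exception()


hung, error = asyncio.run(main())
print(f"consumer hung: {hung}, hidden error: {error!r}")
# consumer hung: True, hidden error: ValueError('sanity check failed: value must be non-negative')
```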
A way to cope with this in `BackgroundService` is to extend it with a `create_task()` method that automatically adds the task to the task list and also adds a done callback where we can either log the unhandled exception or just raise a `SystemExit` exception to exit the program. We could even let users decide how to handle unhandled exceptions, either by passing a callback or by overriding the default callback as a method of the instance. This callback should also remove the task from the task list, something users currently need to do manually.
Related issues
The solution to this issue needs to take the following related issues into account:
- `BackgroundService` interface to handling internal tasks #8
- `run_forever` implementations with a generic one in `_internal` frequenz-sdk-python#906
- `Actor` interface to handling internal tasks frequenz-sdk-python#819
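A minimal sketch of the `create_task()` proposal from the "Extra information" section could look like the following. `SketchBackgroundService`, the `on_task_done` constructor argument and the `_on_task_done()`/`_handle_task_done()` methods are hypothetical names chosen for illustration, not the real `BackgroundService` API. The done callback removes the finished task from the task set, then calls either the user-supplied callback or the overridable default, which logs the exception.

```python
from __future__ import annotations

import asyncio
import logging
from collections.abc import Callable, Coroutine
from typing import Any

_logger = logging.getLogger(__name__)


class SketchBackgroundService:
    """Hypothetical stand-in for BackgroundService; not the real class."""

    def __init__(
        self, on_task_done: Callable[[asyncio.Task[Any]], None] | None = None
    ) -> None:
        self._tasks: set[asyncio.Task[Any]] = set()
        # Optional user-supplied handler; if None, the overridable method is used.
        self._user_on_task_done = on_task_done

    def create_task(
        self, coro: Coroutine[Any, Any, Any], *, name: str | None = None
    ) -> asyncio.Task[Any]:
        """Spawn a task, track it, and get notified when it finishes."""
        task = asyncio.create_task(coro, name=name)
        self._tasks.add(task)
        task.add_done_callback(self._handle_task_done)
        return task

    def _handle_task_done(self, task: asyncio.Task[Any]) -> None:
        # Remove the finished task so users don't have to do it by hand.
        self._tasks.discard(task)
        if self._user_on_task_done is not None:
            self._user_on_task_done(task)
        else:
            self._on_task_done(task)

    def _on_task_done(self, task: asyncio.Task[Any]) -> None:
        """Default policy: log unhandled exceptions. Subclasses may override."""
        if task.cancelled():
            return
        exc = task.exception()
        if exc is not None:
            _logger.error("Task %r crashed", task.get_name(), exc_info=exc)


crashed: list[str] = []


def record_crash(task: asyncio.Task[Any]) -> None:
    # Example user-supplied handler: remember which tasks crashed.
    if not task.cancelled() and task.exception() is not None:
        crashed.append(task.get_name())


async def failing() -> None:
    raise RuntimeError("boom")


async def main() -> SketchBackgroundService:
    service = SketchBackgroundService(on_task_done=record_crash)
    task = service.create_task(failing(), name="failing")
    # asyncio.wait() does not re-raise the task's exception.
    await asyncio.wait({task})
    return service


svc = asyncio.run(main())
print(crashed, len(svc._tasks))  # ['failing'] 0
```

Here the user-supplied callback records the crash; without it, the default `_on_task_done()` would log the exception instead. A subclass that prefers to crash the whole process could override `_on_task_done()` to raise `SystemExit`: asyncio re-raises `SystemExit` from callbacks instead of passing it to the loop's exception handler, so `asyncio.run()` would stop with it.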