Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a safety mechanism to BackgroundService to notify about crashed tasks #9

Closed
llucax opened this issue Jun 14, 2024 · 1 comment
Closed
Assignees
Labels
part:asyncio Affects the asyncio module scope:breaking-change Breaking change, users will need to update their code type:enhancement New feature or enhancement visitble to users
Milestone

Comments

@llucax
Copy link
Contributor

llucax commented Jun 14, 2024

What happened?

When there is an unhandled exception inside a task spawn by a BackgroundService task, the error is silently swallowed, leading to obscure, hard-to-debug bugs.

What did you expect instead?

Unhandled exceptions are at least logged or the whole Python process crashes.

Extra information

I discovered this while working at frequenz-floss/frequenz-sdk-python#806. I added a sanity check which failed inside a task that was sending messages to a channel, so no messages was sent, and other task waiting for messages just got stuck, leaving no clues about where the problem might be. This makes the problem really hard to debug.

A way to cope with it in the BackgroundService is to extend it to provide a create_task() method that automatically adds the task to the task list and then also adds a done callback where we can either log the unhandled exception, or just raise a SystemExit exception to exit the program. We could even give the user the option to decide how to handle unhandled exceptions by either passing a callback or letting them override the default callback as a method of the instance. This callback should also remove the task from the tasks list, something users need to do manually at the moment.

Related issues

The solution to this issue needs to have in mind the following related issues:

@llucax llucax added scope:breaking-change Breaking change, users will need to update their code type:enhancement New feature or enhancement visitble to users part:asyncio Affects the asyncio module labels Jun 14, 2024
@llucax llucax added this to the v1.0.0 milestone Jun 14, 2024
@llucax llucax self-assigned this Jun 14, 2024
github-merge-queue bot pushed a commit that referenced this issue Jul 30, 2024
The name `BackgroundService` is a bit too verbose now in the context of
an asyncio library. Services are usually already understood as running
in the background, so the `Background` prefix is redundant and makes the
class name too long.

The `Service` class is made an abstract base class intended for end
users that just want to use services, and a new `ServiceBase` class is
added for developers that want to implement services. This way, the
documentation can be separated between the two use cases and the
`Service` class can be used as a generic type for services. We also
avoid leaking implementation details to services users.

The `ServiceBase` class also have a new method `create_task` that is
used to create tasks in the service, and a new `TaskCreator` protocol is
added to allow overriding how tasks are created (for example, using a
custom `loop` or a `TaskGroup`). `create_task` automatically adds the
task to the internal set, so hopefully users won't forget so easily to
register tasks.

The `create_task` method also have a new `log_exceptions` parameter that
allows to log exceptions raised by tasks. This is useful because
services tasks are supposed to run forever, and there are no many
opportunities to await for their completion appropriately and handle the
failure. With this at least we make sure we will get a log message when
a task raises an exception, so we can at least know what happened and
they just silently disappear.

Fixes #8, improves #9.
@llucax
Copy link
Contributor Author

llucax commented Aug 9, 2024

I will close this, now we have PersistentTaskGroup that will log exceptions and offer users an easy way to check for failed tasks.

@llucax llucax closed this as completed Aug 9, 2024
github-merge-queue bot pushed a commit that referenced this issue Aug 9, 2024
`asyncio.TaskGroup` is a very convenient construct when using
parallelization for doing calculations for example, where the results
for all the tasks need to be merged together to produce a final result.
In this case if one of the tasks fails, it makes sense to cancel the
others and abort as soon as possible, as any further calculations would
be thrown away.

This PR introduces a new `PersistentTaskGroup` class, intended to help
managing a group of tasks that should persist even if other tasks in the
group fail, usually by either only discarding the failed task or by
restarting it somehow.

The `ServiceBase` class is updated to use a `PersistentTaskGroup`
underneath, and it is simplified for single-task services by making the
service driven by a single `main` task, which can use the new group to
monitor sub-tasks and act accordingly.

This is part of #27 and #9.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
part:asyncio Affects the asyncio module scope:breaking-change Breaking change, users will need to update their code type:enhancement New feature or enhancement visitble to users
Projects
None yet
Development

No branches or pull requests

1 participant