Most tasks swallow exceptions #826
Comments
The channel registry uses `Any` as the message type, which is not type safe, as it completely *disables* type checking for the messages.

This commit makes the channel registry type-aware: channels are stored with their message type, and the registry checks that the same channel is not used for different message types.

This also makes the registry a plain container for channels; the wrapper methods to create new senders and receivers and to configure the `resend_latest` flag are removed. The `ReceiverFetcher` abstraction is also not needed if we just return the channel directly, as the channel itself is a `ReceiverFetcher`.

The method to close and remove a channel is also made public and given a more explicit name, as it is used in normal code paths.

The new registry only provides 2 main methods:

* `get_or_create()`: Get or create a channel for the given key, checking that the requested message type matches the message type of the existing channel, if there is one.
* `close_and_remove()`: Close and remove the channel for the given key.

This change uncovered 5 issues:

* `MockMicrogrid/Resampler: Fail on unhandled exceptions` (in this PR, but more generally #826)
* `Fix checks in tests` (in this PR)
* `Fix missing conversion to QuantityT` (workaround in this PR, but should be better solved by #821)
* #807
* #823

Fixes #806.
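To make the two-method registry described above more concrete, here is a minimal sketch. It is not the SDK's actual implementation: the `Channel` class is a hypothetical stand-in that only tracks whether it was closed, and the argument order, the `_Entry` helper and the error types are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Generic, TypeVar, cast

T = TypeVar("T")


class Channel(Generic[T]):
    """Hypothetical stand-in for a real channel; it only tracks its closed state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    def close(self) -> None:
        self.closed = True


@dataclass(frozen=True)
class _Entry:
    """A channel stored together with the message type it was created for."""

    message_type: type
    channel: Channel[Any]


class ChannelRegistry:
    """A plain container for channels, keyed by name (illustrative only)."""

    def __init__(self) -> None:
        self._channels: dict[str, _Entry] = {}

    def get_or_create(self, message_type: type[T], key: str) -> Channel[T]:
        """Return the channel for `key`, creating it if it does not exist yet."""
        entry = self._channels.get(key)
        if entry is None:
            entry = _Entry(message_type, Channel(key))
            self._channels[key] = entry
        elif entry.message_type is not message_type:
            # The same key must not be reused for a different message type.
            raise ValueError(
                f"Channel {key!r} holds {entry.message_type.__name__} messages, "
                f"not {message_type.__name__}"
            )
        return cast(Channel[T], entry.channel)

    def close_and_remove(self, key: str) -> None:
        """Close the channel for `key` and forget about it."""
        entry = self._channels.pop(key, None)
        if entry is None:
            raise KeyError(key)
        entry.channel.close()
```

With this sketch, requesting `"power"` first as `int` and later as `str` raises a `ValueError` instead of silently handing out a mistyped channel.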
Maybe we can add a SDK specific |
It is not trivial to override the default handler, because the exception is stored in the task object, and users might inspect this task object later and report the exception properly themselves. Because of this, we might want to use a custom exception handler that only prints a debug message about unhandled exceptions. Otherwise we might spam the logs with spurious messages about tasks stopping due to unhandled exceptions when the exceptions are actually handled properly, just not inside the task function itself, which adds noise to the logs and makes things even harder to debug. According to ChatGPT:
So if this is correct, and unless we are keeping dead task objects around, we should actually get a log error about swallowed exceptions. We got unreported swallowed exceptions in tests, but maybe it is a pytest issue that they were not reported (or the tasks were kept alive in tests), as pytest replaces the event loop with a testing one AFAIK.
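For reference, a small sketch of the debug-only handler idea. In CPython, a task that failed and is garbage collected without anyone retrieving its exception is reported to the loop's exception handler, which can be replaced with `loop.set_exception_handler()`. The handler name and the `seen` list are hypothetical and only exist to make the behaviour observable.

```python
import asyncio
import gc
import logging
from typing import Any

_logger = logging.getLogger(__name__)

# Contexts received by the handler, kept only so the effect can be inspected.
seen: list[dict[str, Any]] = []


def debug_exception_handler(
    loop: asyncio.AbstractEventLoop, context: dict[str, Any]
) -> None:
    """Report unhandled asyncio errors at debug level instead of error level."""
    seen.append(context)
    _logger.debug(
        "Unhandled exception in asyncio: %s",
        context.get("message"),
        exc_info=context.get("exception"),
    )


async def failing() -> None:
    raise RuntimeError("boom")


async def main() -> None:
    loop = asyncio.get_running_loop()
    loop.set_exception_handler(debug_exception_handler)
    task = asyncio.create_task(failing())
    await asyncio.sleep(0)  # Let the task run and fail.
    # Drop the last reference without ever retrieving the exception.
    del task
    gc.collect()


if __name__ == "__main__":
    asyncio.run(main())
```

The handler is only called once the task object is destroyed, so a task that is still referenced somewhere (a task list, a test fixture) never triggers it, which would match the behaviour seen in the tests.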
What happened?
When there is an unhandled exception inside a task spawned by the SDK, the error is silently swallowed, leading to obscure, hard-to-debug bugs.
What did you expect instead?
Unhandled exceptions are at least logged or the whole Python process crashes.
Affected part(s)
Core components (data structures, etc.) (part:core)
Extra information
I discovered this while working on #806. I added a sanity check which failed inside a task that was sending messages to a channel, so no messages were sent, and another task waiting for messages just got stuck, leaving no clues about where the problem might be. This makes the problem really hard to debug.
This issue is actually a problem across all areas of the SDK and projects in general.
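The situation can be reproduced outside the SDK with a few lines. This is only an illustration: an `asyncio.Queue` stands in for the channel, the negative-value check stands in for the failing sanity check, and the timeout exists only so the example terminates.

```python
from __future__ import annotations

import asyncio


async def producer(queue: asyncio.Queue[int], values: list[int]) -> None:
    for value in values:
        if value < 0:
            # Stand-in for the sanity check that failed inside the task.
            raise ValueError(f"unexpected negative value: {value}")
        await queue.put(value)


async def reproduce() -> tuple[bool, BaseException | None]:
    queue: asyncio.Queue[int] = asyncio.Queue()
    task = asyncio.create_task(producer(queue, [-1]))
    try:
        # Without the timeout this would wait forever, with nothing logged.
        await asyncio.wait_for(queue.get(), timeout=0.1)
        stuck = False
    except asyncio.TimeoutError:
        stuck = True
    # The error is only visible if somebody inspects the task explicitly.
    return stuck, task.exception()


if __name__ == "__main__":
    print(asyncio.run(reproduce()))
```

The consumer times out and the `ValueError` only shows up because the task is inspected explicitly at the end.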
A way to cope with it in the SDK would be to extend `BackgroundService` to provide a `create_task()` method that automatically adds the task to the task list and then also adds a done callback where we can either log the unhandled exception, or just raise a `SystemExit` exception to exit the program. We could even give the user the option to decide how to handle unhandled exceptions by either passing a callback or letting them override the default callback as a method of the instance. This callback should also remove the task from the tasks list, something users need to do manually at the moment.
Related issues
The solution to this issue needs to keep in mind the following related issues:
* `BackgroundService` to notify about crashed tasks frequenz-core-python#9
* `run_forever` implementations with a generic one in `_internal` #906
* `Actor` interface to handling internal tasks #819
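The `create_task()` proposal from the extra information above could look roughly like this. It is a sketch under assumptions, not the SDK's `BackgroundService`: the class name `TaskTrackingService` and the overridable `handle_task_exception()` method are hypothetical, and the default behaviour shown here is logging only.

```python
from __future__ import annotations

import asyncio
import logging
from collections.abc import Coroutine
from typing import Any

_logger = logging.getLogger(__name__)


class TaskTrackingService:
    """Hypothetical stand-in for `BackgroundService` with a `create_task()` helper."""

    def __init__(self) -> None:
        self._tasks: set[asyncio.Task[Any]] = set()

    def create_task(
        self, coro: Coroutine[Any, Any, Any], *, name: str | None = None
    ) -> asyncio.Task[Any]:
        """Create a task, track it, and watch it for unhandled exceptions."""
        task = asyncio.create_task(coro, name=name)
        self._tasks.add(task)
        task.add_done_callback(self._on_task_done)
        return task

    def _on_task_done(self, task: asyncio.Task[Any]) -> None:
        # Finished tasks are removed automatically, no manual bookkeeping.
        self._tasks.discard(task)
        if task.cancelled():
            return
        exc = task.exception()
        if exc is not None:
            self.handle_task_exception(task, exc)

    def handle_task_exception(
        self, task: asyncio.Task[Any], exc: BaseException
    ) -> None:
        """Log the failure; subclasses can override this to react differently."""
        _logger.error("Task %s failed", task.get_name(), exc_info=exc)
```

Overriding `handle_task_exception()` is one way to let users choose the policy; accepting a callback in `create_task()` would be the other option mentioned above.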