Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WebsocketServer exception handler #31

Merged
merged 3 commits into from
Apr 26, 2024

Conversation

davidbrochart
Copy link
Collaborator

@davidbrochart davidbrochart commented Apr 25, 2024

The ability to auto-restart is only supported when using the start/stop API for now, not the context manager.
It does so by catching exceptions and re-creating a new task group.

# wait forever
self._task_group.start_soon(Event().wait)
break
except Exception as e:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the original task group fail, all its tasks are cancelled, including all the existing rooms's broadcast_update tasks, do we need to find a way to recover them?
Is it possible to have one task group per room so one room throws exception in broadcast_update task won't also tear down other room's broadcast_update task?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add auto_restart for yroom's task group as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should explore recovering each individual room too, but maybe let's leave this for a follow-up PR?

At the very minimum we need recovery at the top layer.

@Zsailer
Copy link
Member

Zsailer commented Apr 25, 2024

This looks good to me @davidbrochart.

@davidbrochart davidbrochart changed the title Add WebsocketServer auto_restart parameter Add WebsocketServer exception handler Apr 26, 2024
@davidbrochart
Copy link
Collaborator Author

I removed auto_restart and instead added exception_handler, which is a callback that is called when an exception is raised. The exception handler returns True if the exception was handled, and in this case the exception is not raised, otherwise it is raised. An exception_logger is available, that just logs the exception and discards it. This is the one we want to use if we absolutely don't want the WebSocket server to crash.

There are two places where exceptions can be handled:

  • at the top-level where the server is waiting for connections,
  • when serving a WebSocket.

The WebSocket server task group is re-created upon successful exception handling, so that it always stays alive.
Each WebSocket connection has its own task group, so that a successful exception handling at this level doesn't affect the top-level task group.

@Zsailer
Copy link
Member

Zsailer commented Apr 26, 2024

I like this a lot, @davidbrochart. Great work!

@Zsailer
Copy link
Member

Zsailer commented Apr 26, 2024

this looks good, @davidbrochart. I propose we merge here and work on getting #30 to follow this pattern. If we notice any changes needed here in that PR, we can fix there.

I'll work with @jzhang20133 to get these two aligned.

@davidbrochart
Copy link
Collaborator Author

Sounds good.

@Zsailer Zsailer merged commit fd7c0b5 into jupyter-server:main Apr 26, 2024
6 checks passed
@davidbrochart davidbrochart deleted the auto-restart branch April 26, 2024 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants