Fix "event loop is already running" bug on Linux #450

Merged (2 commits) on Oct 30, 2023

@dlqqq dlqqq commented Oct 30, 2023

Description

This PR fixes the dreaded "event loop is already running" bug. I've determined the root cause of this bug and will explain why these changes fix it.

Context

First, I would like to describe the context:

  • The executor is started as a separate process via the class multiprocessing.Process.
  • The executor calls synchronous methods on the class nbconvert.preprocessors:ExecutePreprocessor, which are actually implemented by wrapping the async methods in a function named run_sync(), provided by jupyter_core.utils.
    • run_sync(), as you might expect, takes a coroutine function and returns a synchronous function that "does the same thing". For more specifics, it's best to refer to the code directly.
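
To make the sync-wrapping idea concrete, here is a minimal sketch. It is not the jupyter_core implementation, which handles more cases; `run_sync_sketch` and `add` are hypothetical names, and `asyncio.run()` is an assumed stand-in for whatever event loop management the real helper does.

```python
import asyncio
import functools


def run_sync_sketch(async_fn):
    """Wrap a coroutine function so it can be called synchronously.

    Simplified illustration only; not the jupyter_core.utils.run_sync() code.
    """

    @functools.wraps(async_fn)
    def wrapper(*args, **kwargs):
        # asyncio.run() creates a fresh event loop, runs the coroutine to
        # completion, and then closes the loop.
        return asyncio.run(async_fn(*args, **kwargs))

    return wrapper


async def add(a, b):  # hypothetical async method
    await asyncio.sleep(0)
    return a + b


add_sync = run_sync_sketch(add)
result = add_sync(1, 2)  # called like an ordinary function, no await needed
```

Any wrapper of this kind has to drive an event loop itself, so it depends on which loop it finds and on whether that loop is already running.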

Symptom

When nbconvert calls run_sync(), it sometimes raises a RuntimeError with the message "This event loop is already running".
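
The error itself can be reproduced with plain asyncio, independent of nbconvert. The sketch below assumes nothing about the executor; `work` and `main` are hypothetical names. It only shows that asking a loop that is already running to run something to completion is rejected.

```python
import asyncio


async def work():  # hypothetical coroutine standing in for real work
    return 42


async def main():
    loop = asyncio.get_running_loop()  # this loop is currently running main()
    coro = work()
    try:
        # A running loop cannot be asked to "run until complete" again.
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        return str(exc)
    finally:
        coro.close()  # the coroutine never ran; close it to avoid a warning


message = asyncio.run(main())
print(message)  # This event loop is already running
```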

Root cause

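On Linux, multiprocessing's default start method is fork, so the child process begins as a copy of the parent's memory, including its asyncio state. This means that on Linux, asyncio.get_event_loop() in a new process actually returns the event loop of the server process, which of course is always running (since it's running the server). This is why the exception is thrown. 😁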

The fix

To fix this, we need to tell multiprocessing (MP) to start executor processes via spawn instead of fork. Thankfully this can be done via MP contexts, as shown in this PR.
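
For readers unfamiliar with MP contexts, a minimal sketch of the pattern follows. It is not the code from this PR; `spawn_ctx` and `start_executor` are hypothetical names. A context scopes the start method to the processes created from it, so the global default stays untouched.

```python
import multiprocessing as mp

# A context object carries its own start method. Processes created from it
# use spawn, while code that uses the multiprocessing module directly keeps
# the platform default.
spawn_ctx = mp.get_context("spawn")


def start_executor(target, *args):
    """Start target(*args) in a freshly spawned interpreter (hypothetical helper).

    The target must be defined at module top level, because spawn pickles it
    by reference and the child process has to import it again.
    """
    process = spawn_ctx.Process(target=target, args=args)
    process.start()
    return process
```

A spawned child starts from a new interpreter rather than a copy of the parent, so it does not inherit the server's event loop. When calling a helper like this from a script, do it under an `if __name__ == "__main__":` guard, since spawn re-imports the main module in the child.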

I also see that we are running the Downloader in a new process via MP. However, I did not implement the same changes there for 3 reasons:

  1. The default Downloader (provided by the default JobFilesManager) works on Linux without any further changes.
  2. Since the Downloader's interface only has synchronous methods, an implementer would have to go out of their way to call asyncio.get_event_loop() in the implementation for the same bug to appear.
  3. Using spawn instead of fork does have a significant performance drawback, so unless there is a reason to do otherwise, we should stick with MP's default behavior.

Next steps

I filed a new issue to add unit test coverage: #453

@dlqqq dlqqq added the bug Something isn't working label Oct 30, 2023
@dlqqq dlqqq requested a review from 3coins October 30, 2023 21:25

dlqqq commented Oct 30, 2023

The E2E snapshot failure does not appear related. New jobs are showing up as "Stopped" instead of "In progress" on first render. I am able to reproduce this on my Linux machine.

Since this is a new and legitimate issue, it will be best to address this in a separate PR. I've filed an issue to track this: #451

dlqqq commented Oct 30, 2023

please update playwright snapshots

dlqqq commented Oct 30, 2023

Great, now the snapshot updating workflow doesn't work because of this: actions/runner-images#8531

🤦 I'll update the branch and try again.

dlqqq commented Oct 30, 2023

please update playwright snapshots

dlqqq commented Oct 30, 2023

Oh right, workflows always run as defined on main rather than as defined on the current branch. So I have to do this manually from my fork. 🤦

dlqqq commented Oct 30, 2023

Great, now my fork isn't letting me create a new PR to allow me to run the update snapshots workflow manually because this one already exists. I must have rolled a 1d20 this morning on my luck stat. 🤦

OK, fine, I'll update the workflow in a separate PR and remove the commit from this branch.

@3coins 3coins left a comment

Great job on fixing this 🚀 .

dlqqq commented Oct 30, 2023

Kicking CI. Not sure why the workflows aren't running after the snapshot update.

@dlqqq dlqqq closed this Oct 30, 2023
@dlqqq dlqqq reopened this Oct 30, 2023
@dlqqq dlqqq added enhancement New feature or request and removed enhancement New feature or request labels Oct 30, 2023
@dlqqq dlqqq merged commit 867d2ea into jupyter-server:main Oct 30, 2023
7 of 8 checks passed
@dlqqq dlqqq deleted the fix-event-loop-bug branch October 30, 2023 22:50
Successfully merging this pull request may close these issues.

RuntimeError: "This event loop is already running" with nbclient==0.8.0
2 participants