Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Tornado with AnyIO #1079

Merged
merged 28 commits into from
Mar 22, 2024
Merged

Replace Tornado with AnyIO #1079

merged 28 commits into from
Mar 22, 2024

Conversation

davidbrochart
Copy link
Collaborator

@davidbrochart davidbrochart commented Jan 25, 2023

References

Supersedes #876.
Closes #656.

IPykernel should not depend on a web server. It uses Tornado as an async framework, but modern Python has better solutions, including standard asyncio. Tornado's use in ipykernel involves using a lot of queues and creates complexity that makes maintenance or improvements to the code base harder.
Structured concurrency as introduced by Trio is gaining a lot of traction. AnyIO brings structured concurrency on top of asyncio and acts as an abstraction layer which allows to choose a concurrency backend, such as asyncio or trio. Major projects have switched to AnyIO, including FastAPI.

This PR is still a work in progress. I have disabled some tests, like the ones for in-process kernels. Also, I think we should re-think event-loop integration (maybe using Trio's approach). But I would like to get feedback from the community and discuss how we can move forward with these changes.

cc @SylvainCorlay @JohanMabille @minrk @ccordoba12

Code changes

The dependency to Tornado is dropped, in favor of AnyIO. This means that instead of using ZMQStream with on_recv callbacks, we use async/await everywhere.
Using AnyIO in the main thread event loop gives us Trio support for free in the user code. It is currently disabled until we get AnyIO support in pyzmq, or decide to process shell messages in a separate thread, which might be needed for sub-shells anyway.

User-facing changes

The goal is to have no user-facing change. However, I have identified the following ones:

  • handling of standard streams (stdout, stderr) is improved as there is no delay anymore.
  • interrupting async user code now works (see Cannot interrupt top-level coroutine #881).
  • traceback when interrupting blocking user code leaks one frame into the kernel source code (might be fixable).

Backwards-incompatible changes

  • The stream traits (e.g. shell_stream) are replaced with their socket equivalent (e.g. shell_socket).
  • The API is now mostly async, including the Kernel.start method.
  • There is a new blocking Kernel.stop (thread-safe) method.

@maartenbreddels
Copy link
Contributor

+1 on AnyIO, they seem to have all the missing primitives needed for doing asyncio+threaded work, seems like a solid project.

@davidbrochart davidbrochart mentioned this pull request Jan 25, 2023
@dhirschfeld
Copy link

As a trio user I'm +:100: for using AnyIO :heart:

@agronholm
Copy link

agronholm commented Jan 25, 2023

I must point out that using blocking portals in this manner will block the original event loop, preventing any other tasks from running until the blocking call is complete. What should also be taken into account is any Protocols and tasks launched from the nested loop, which will also stop functioning once the blocking call completes.

As a former Java programmer, I recall that Java used a similar approach to solve this problem, but there the nested event loop was able to process events from the parent loop.

@davidbrochart
Copy link
Collaborator Author

Thanks for the feedback @agronholm.
Actually this PR doesn't use blocking portals. Instead, I'm running a new event loop in a separate thread (for the control thread). So there is no nested loops, just two separate (parallel) loops, each in their own thread. There shouldn't be any issue with that, right? Except if there is a global state in AnyIO's event loop, which I doubt.

@agronholm
Copy link

I'm running a new event loop in a separate thread (for the control thread). So there is no nested loops, just two separate (parallel) loops, each in their own thread. There shouldn't be any issue with that, right?

Yeah, sorry, it's a rather large and complex PR so I haven't read through it, so I just made assumptions based on your other posts. So are you creating one-off event loops for running async callables from sync functions? Or do you have a lingering, shared event loop thread running in the background? Either way, thread safety is the biggest problem. One of the main selling points of async is that you can reliably determine where you can mutate shared objects safely (without explicit locks) since all other code is suspended until the next yield point. Those guarantees go out the window when two simultaneously running event loops share memory.

Except if there is a global state in AnyIO's event loop, which I doubt.

Technically there is no such thing as an AnyIO event loop 😄

@davidbrochart
Copy link
Collaborator Author

Or do you have a lingering, shared event loop thread running in the background?

Yes, the control thread runs for the whole duration of the kernel process, so we're not creating lots of event loops.
The control channel running in its own thread was already there before this PR, and we're taking care of thread safety. We need parallelism as the control channel should work even when executing user blocking code.
At least this PR should not break things with regards to that.

@ccordoba12
Copy link
Member

Just a quick note to say that this seems significant enough to mark it for IPykernel 7.

@davidbrochart
Copy link
Collaborator Author

Yes, I agree. It also requires pyzmq v25.0 and jupyter-client v8.0, so that will be a good occasion.

@davidbrochart
Copy link
Collaborator Author

I'd like to merge this PR and update downstream projects in other PRs.
Any feedback?

@blink1073
Copy link
Contributor

I can have a look at it next week.

Great, thank you!

@davidbrochart
Copy link
Collaborator Author

Okay, I figured part of it out. All the tests pass for me locally except:

tests/test_kernel.py::test_print_to_correct_cell_from_thread FAILED                          [  7%]
tests/test_kernel.py::test_print_to_correct_cell_from_child_thread FAILED                    [ 11%]
tests/test_kernel.py::test_print_to_correct_cell_from_asyncio FAILED                         [ 14%]

And then it times out on:

tests/test_kernel.py::test_raw_input

I'm looking into it, but at first sight this code was working in my first commit:

from threading import Thread                                                                                                                        
from time import sleep                                                                                                                              

def thread_target():
    for i in range(5):
        print(i, end='', flush=True)
        sleep(0.25)

Thread(target=thread_target).start()

It successfully prints 01234 in JupyterLab, while it doesn't work with the HEAD of this PR. Test test_print_to_correct_cell_from_thread fails because of that.

@agronholm
Copy link

Is test_print_to_correct_cell_from_thread using the anyio.from_thread functions?

@davidbrochart
Copy link
Collaborator Author

Is test_print_to_correct_cell_from_thread using the anyio.from_thread functions?

No, but yes we might replace all this complicated way of communicating between threads using ZMQ pipes with AnyIO's functions.
Let's see if c1dce80 improves things first.

@davidbrochart
Copy link
Collaborator Author

Let's see if c1dce80 improves things first.

Doesn't look good 😄

@davidbrochart
Copy link
Collaborator Author

@blink1073 For me even without c1dce80 a lot of tests hang locally, for instance tests related to cell interrupt and in-process kernels. It seemed to be the case in the CI too.

@davidbrochart
Copy link
Collaborator Author

56fdee2 fixes tests/test_kernel.py::test_raw_input and probably other tests (it was in my first commit 8ff47dc but removed later).

@davidbrochart
Copy link
Collaborator Author

So currently the remaining failing tests are in tests/inprocess.

@blink1073
Copy link
Contributor

So currently the remaining failing tests are in tests/inprocess.

I'd say let's skip those for now, and then merge and iterate.

@davidbrochart
Copy link
Collaborator Author

One remaining failing test on Windows: tests/test_async.py::test_async_interrupt[asyncio].
But should we ignore it on Windows, since it's unlikely we can have it working and in current ipykernel we cannot interrupt an async cell on any platform? See this comment.

@blink1073
Copy link
Contributor

Sounds right to me

@blink1073
Copy link
Contributor

Thanks @davidbrochart!

@blink1073 blink1073 merged commit 772dfb8 into ipython:main Mar 22, 2024
24 of 31 checks passed
@blink1073 blink1073 mentioned this pull request Mar 22, 2024
6 tasks
@davidbrochart
Copy link
Collaborator Author

Thank you @blink1073 !

@rayosborn
Copy link
Contributor

Is a fix for the in-process kernel test failure planned. before the release of v7?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dropping the dependency to tornado