[WIP] Collaborative editing using Yjs #1

Closed
wants to merge 137 commits into from

@dmonad dmonad commented Jan 20, 2021

This PR implements collaborative editing in JupyterLab using the Yjs shared editing framework. Yjs is an open-source framework to build collaborative applications using data structures (CRDTs) that sync automatically.

Try it out now: Binder

Scope

The aim of this PR is to switch to Yjs as a data model for notebooks and text files. The ability to share content between users will be provided by separate plugins that connect the Yjs data model with other peers. For now, this PR also implements a basic shared-editing server that synchronizes clients that open the same file. We will outsource this shared-editing server to a separate plugin.

Status

There are a few known UI regressions, and some refactoring is probably still needed. We are working on both, but this PR is already usable.

Known bugs:

  • It is currently not possible to rename files
  • Since outputs are shared, an output that contains HTML script tags is also rendered in the browsers of other users.

Quick start

# optional: use conda
conda create -n yjupyter
conda activate yjupyter
conda install jupyterlab

# clone our branch
git clone git@github.com:QuantStack/jupyterlab.git --branch yjupyter
cd jupyterlab

# installation steps as usual for Jupyter development
pip install -e .
jlpm
jlpm build
jupyter lab --dev-mode 

Technical details

Existing work

Currently, Jupyter Notebooks and several other components use ModelDB to model the notebooks' internal representation. It provides observable data structures that fire events when data is added or removed.
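To make "observable data structures that fire events" concrete, here is a minimal sketch in Python. It is not ModelDB's API; `ObservableList`, `observe`, and the event tuple shape are hypothetical names chosen only for illustration.

```python
from typing import Any, Callable, List, Tuple

Listener = Callable[[str, int, Any], None]


class ObservableList:
    """Toy observable list (hypothetical, not ModelDB): notifies listeners on add/remove."""

    def __init__(self) -> None:
        self._items: List[Any] = []
        self._listeners: List[Listener] = []

    def observe(self, listener: Listener) -> None:
        # Register a callback that receives (kind, index, value).
        self._listeners.append(listener)

    def insert(self, index: int, value: Any) -> None:
        self._items.insert(index, value)
        self._emit("add", index, value)

    def remove(self, index: int) -> Any:
        value = self._items.pop(index)
        self._emit("remove", index, value)
        return value

    def to_list(self) -> List[Any]:
        return list(self._items)

    def _emit(self, kind: str, index: int, value: Any) -> None:
        # Iterate over a copy so listeners may register further listeners safely.
        for listener in list(self._listeners):
            listener(kind, index, value)


# Usage: record every event fired while editing a list of cell sources.
events: List[Tuple[str, int, Any]] = []
cells = ObservableList()
cells.observe(lambda kind, index, value: events.append((kind, index, value)))
cells.insert(0, "print('hello')")
cells.insert(1, "x = 1")
cells.remove(0)
```

A UI component would subscribe in the same way and re-render when an event arrives.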

This PR replaces the IModelDB data model with Yjs' shared types, which provide the same functionality. Yjs is meant for building collaborative applications and provides many helpful, well-tested abstractions that reduce the complexity of this codebase significantly. For example, we removed several hundred lines of code that keep ModelDB in sync with the CodeMirror editor. Instead, we use the y-codemirror editor binding, which keeps the editor in sync with Yjs' data structures.

We are aware that this will break existing plugins that rely on the IModelDB interface. We want to make this upgrade as easy as possible and keep existing APIs (e.g. the event emitters) whenever possible.

We also restructured how the internal data is represented using the observable data structures. Before, we had a complex mixture of key-value stores and observable arrays based on ModelDB. With Yjs, we produced a nearly one-to-one mapping from a Yjs document to the .ipynb JSON format. ydoc.toJSON() is an existing method that converts a Yjs document to a JSON representation that is very similar to the .ipynb JSON format (some keys are missing). Developers who are familiar with the JSON format will easily know how to work with the Yjs data model.
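As a rough illustration of how close the two representations are, the sketch below takes a dictionary standing in for the JSON view of the shared document and fills in top-level keys that the .ipynb format (nbformat 4) requires. The contents of `shared_doc_json`, the helper `to_ipynb`, and the assumption that exactly these top-level keys are the missing ones are all hypothetical; the PR does not say which keys are absent.

```python
import json
from typing import Any, Dict

# Hypothetical stand-in for the JSON view of the shared document.
# The exact shape produced by this PR is an assumption.
shared_doc_json: Dict[str, Any] = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title", "metadata": {}},
        {
            "cell_type": "code",
            "source": "print(1 + 1)",
            "metadata": {},
            "outputs": [],
            "execution_count": None,
        },
    ],
}


def to_ipynb(doc_json: Dict[str, Any], nbformat: int = 4, nbformat_minor: int = 4) -> Dict[str, Any]:
    """Return a copy with the top-level keys nbformat 4 requires filled in if absent."""
    notebook = dict(doc_json)  # shallow copy; the input is left untouched
    notebook.setdefault("metadata", {})
    notebook.setdefault("nbformat", nbformat)
    notebook.setdefault("nbformat_minor", nbformat_minor)
    return notebook


notebook = to_ipynb(shared_doc_json)
serialized = json.dumps(notebook, indent=1)
```

The point is only that the conversion is a small, mechanical step rather than a translation between two different models.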

We also want to make it easier for plugins to provide additional features based on Yjs as a data model. A separate plugin could provide commenting features based on annotations on the Yjs document. "Relative Positions" is another Yjs concept that makes it possible to assign information to a range of text while automatically adjusting for position changes.
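The idea behind relative positions can be sketched without Yjs: anchor a position to the identity of a character instead of to a numeric index, and resolve it back to an index on demand. The toy below is not Yjs' implementation or API; `Text`, `relative_position`, and `absolute_index` are hypothetical names, and unlike Yjs this toy loses an anchor when its character is deleted.

```python
import itertools
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Char:
    uid: int    # stable identity, never reused
    value: str


class Text:
    """Toy text type in which every character carries a unique id (hypothetical)."""

    def __init__(self, initial: str = "") -> None:
        self._ids = itertools.count()
        self._chars: List[Char] = [Char(next(self._ids), c) for c in initial]

    def insert(self, index: int, s: str) -> None:
        self._chars[index:index] = [Char(next(self._ids), c) for c in s]

    def relative_position(self, index: int) -> Optional[int]:
        # Anchor to the character currently at `index`; None means "end of text".
        return self._chars[index].uid if index < len(self._chars) else None

    def absolute_index(self, rel: Optional[int]) -> Optional[int]:
        if rel is None:
            return len(self._chars)
        for i, ch in enumerate(self._chars):
            if ch.uid == rel:
                return i
        return None  # the anchor character no longer exists in this toy model

    def __str__(self) -> str:
        return "".join(ch.value for ch in self._chars)


# Annotate the word "hello", then insert text in front of it.
text = Text("hello world")
start = text.relative_position(0)   # anchored to "h"
end = text.relative_position(5)     # anchored to the space after "hello"
text.insert(0, "Oh, ")
begin_index = text.absolute_index(start)
end_index = text.absolute_index(end)
annotated = str(text)[begin_index:end_index]
```

A commenting plugin could store the two anchors with the comment and resolve them whenever the annotation has to be drawn.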

Another complex problem that Yjs solves is selective undo/redo. We replaced the existing undo manager with a powerful Yjs-based alternative that allows you to selectively decide which changes you want to be able to undo. Text modifications to the editor models and cell insertions are tracked as "undoable", while other changes to the Yjs data model are not tracked (e.g. modifications to metadata and the computed output). The use case for the selective undo manager is preventing users from undoing remote changes created by other users.
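A minimal sketch of the selective part, assuming every change is tagged with an origin and only changes from tracked origins land on the undo stack. This is not the Yjs undo manager; `Doc`, `SelectiveUndoManager`, and the origin strings are hypothetical, and cells are located by value here, whereas a real CRDT would use unique ids.

```python
from typing import Callable, List, Set, Tuple

Change = Tuple[str, int, str]  # (kind, index, source)


class Doc:
    """Toy document: a list of cell sources plus change observers (hypothetical)."""

    def __init__(self) -> None:
        self.cells: List[str] = []
        self._observers: List[Callable[[Change, str], None]] = []

    def observe(self, observer: Callable[[Change, str], None]) -> None:
        self._observers.append(observer)

    def insert_cell(self, index: int, source: str, origin: str) -> None:
        self.cells.insert(index, source)
        self._notify(("insert", index, source), origin)

    def delete_cell(self, index: int, origin: str) -> None:
        source = self.cells.pop(index)
        self._notify(("delete", index, source), origin)

    def _notify(self, change: Change, origin: str) -> None:
        for observer in list(self._observers):
            observer(change, origin)


class SelectiveUndoManager:
    """Records only changes whose origin is tracked; everything else is ignored."""

    def __init__(self, doc: Doc, tracked_origins: Set[str]) -> None:
        self._doc = doc
        self._tracked = set(tracked_origins)
        self._stack: List[Change] = []
        doc.observe(self._on_change)

    def _on_change(self, change: Change, origin: str) -> None:
        if origin in self._tracked:
            self._stack.append(change)

    def undo(self) -> bool:
        if not self._stack:
            return False
        kind, index, source = self._stack.pop()
        if kind == "insert":
            # Remote edits may have shifted the index, so look the cell up by value.
            if source in self._doc.cells:
                self._doc.delete_cell(self._doc.cells.index(source), origin="undo")
        else:
            self._doc.insert_cell(min(index, len(self._doc.cells)), source, origin="undo")
        return True


doc = Doc()
undo_manager = SelectiveUndoManager(doc, tracked_origins={"local"})
doc.insert_cell(0, "a = 1", origin="local")
doc.insert_cell(0, "b = 2", origin="remote")   # not tracked
first_undo = undo_manager.undo()    # reverts only the local insertion
second_undo = undo_manager.undo()   # nothing left: the remote change stays
```

Because the undo itself is applied with the untracked origin "undo", it does not end up on the stack again.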

Currently, this PR also implements a WebSocket server (in Python) to sync connected clients, and a hook to connect the Yjs data model to the server. We still use HTTP requests to save the notebook content to the server. Concurrent access is prevented using a locking implementation that is similar to "redlock".
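The core idea borrowed from redlock-style locking is that a lock is held under a random token and expires on its own, so a crashed client cannot block a file forever and only the current holder can release it. The following single-process sketch shows that idea only; it is not the PR's implementation and not the full Redlock algorithm, and `ExpiringLocks` and its methods are hypothetical names.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringLocks:
    """In-memory locks with a random owner token and a time-to-live (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._locks: Dict[str, Tuple[str, float]] = {}  # resource -> (token, expiry)

    def acquire(self, resource: str, ttl: float) -> Optional[str]:
        now = self._clock()
        held = self._locks.get(resource)
        if held is not None and held[1] > now:
            return None  # someone else holds an unexpired lock
        token = secrets.token_hex(16)
        self._locks[resource] = (token, now + ttl)
        return token

    def release(self, resource: str, token: str) -> bool:
        held = self._locks.get(resource)
        if held is None or held[0] != token:
            return False  # not the current holder, so leave the lock alone
        del self._locks[resource]
        return True


# Usage with a fake clock so that expiry can be shown deterministically.
now = [0.0]
locks = ExpiringLocks(clock=lambda: now[0])
first_token = locks.acquire("notebook.ipynb", ttl=5.0)
second_token = locks.acquire("notebook.ipynb", ttl=5.0)   # refused while held
now[0] = 6.0                                               # the first lock has expired
third_token = locks.acquire("notebook.ipynb", ttl=5.0)
stale_release = locks.release("notebook.ipynb", first_token)
final_release = locks.release("notebook.ipynb", third_token)
```

The token check on release is what keeps a slow client from removing a lock that has since been handed to someone else.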

We will outsource the server implementation and the hook to a separate plugin so that third parties can implement their own custom server. Applications that use Jupyter notebooks, like JupyterHub, will be able to add custom authorization and access control to the server.

Yjs is network agnostic and doesn't need a server to perform conflict resolution. The implemented WebSocket server (79 lines of code) only forwards messages to other clients and implements a little custom logic (e.g. room management and locking). This implementation is fully functional and adds little overhead. However, we want the Yjs data model to be accessible in Python as well. Next, we will be working on a Rust port of Yjs, including Python bindings, that will allow the server to parse the shared document and perform modifications.
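Since the server never interprets the messages, the forwarding logic is small. The sketch below models only room management and forwarding, with plain callbacks standing in for WebSocket connections. It is not the PR's 79-line server; `Relay`, `join`, `leave`, and `forward` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Send = Callable[[bytes], None]


class Relay:
    """Forwards opaque messages to every other client in the same room (hypothetical)."""

    def __init__(self) -> None:
        self._rooms: Dict[str, Dict[str, Send]] = defaultdict(dict)

    def join(self, room: str, client_id: str, send: Send) -> None:
        self._rooms[room][client_id] = send

    def leave(self, room: str, client_id: str) -> None:
        self._rooms[room].pop(client_id, None)
        if not self._rooms[room]:
            del self._rooms[room]  # drop empty rooms

    def forward(self, room: str, sender_id: str, message: bytes) -> int:
        # The payload is never parsed; it is passed on unchanged.
        delivered = 0
        for client_id, send in list(self._rooms.get(room, {}).items()):
            if client_id != sender_id:
                send(message)
                delivered += 1
        return delivered


# Usage: two clients share one file, a third has a different file open.
inboxes: Dict[str, List[bytes]] = {"alice": [], "bob": [], "carol": []}
relay = Relay()
relay.join("a.ipynb", "alice", inboxes["alice"].append)
relay.join("a.ipynb", "bob", inboxes["bob"].append)
relay.join("b.ipynb", "carol", inboxes["carol"].append)
delivered_count = relay.forward("a.ipynb", "alice", b"update-1")
```

Using the file path as the room name is one way to express "clients that open the same file"; the real server may key rooms differently.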

Next steps

  • Fix the remaining bugs
  • Outsource the server to a separate plugin and provide extension points for custom shared-editing servers.
  • Discuss problems that arise when implementing shared editing in Jupyter (e.g. do we want to share output between users? Should there be a shared kernel for a shared notebook?)

jtpio commented Jan 20, 2021

Thanks for starting this!

Posting the link to the Binder dev mode here so it's easily accessible:

Binder

jtpio commented Jan 22, 2021

FYI, I added the jupyterlab-link-share extension in f916cad, so it's easier to share the link to a running Binder instance.

@dmonad dmonad closed this Feb 12, 2021