Merge pull request #72 from krassowski/lsp
Language server protocol (LSP)
Steven Silvester authored Sep 23, 2021
2 parents 1161dd3 + 9d0ff51 commit 39a8a00
Showing 1 changed file with 208 additions and 0 deletions.
208 changes: 208 additions & 0 deletions 72-language-server-protocol/language-server-protocol.md
@@ -0,0 +1,208 @@
---
title: Jupyter integration with the Language Server Protocol
authors: Nicholas Bollweg (@bollwyvl), Jeremy Tuloup (@jtpio), Michał Krassowski (@krassowski)
issue-number: 67
pr-number: 72
date-started: 2021-06-27
---

# Summary

[jupyter(lab)-lsp](https://github.com/krassowski/jupyterlab-lsp) is a project bringing integration
of language-specific IDE features (such as diagnostics, linting, autocompletion, refactoring) to the
Jupyter ecosystem by leveraging the established
[Language Server Protocol](https://microsoft.github.io/language-server-protocol/) (LSP), with a good
overview on the [community knowledge site](https://langserver.org). We would like to propose its
incorporation as an official sub-project of Project Jupyter. We feel this would benefit Jupyter
users through better discoverability of advanced interactive computing features that are supported
by the LSP but otherwise missing from a user's Jupyter experience. While our repository currently
features a working implementation, the proposal is not tied to it (beyond a proposal to migrate the
repository to a Jupyter-managed GitHub organization); rather, it aims to guide the process of
formalizing and evolving the integration of Jupyter with LSP in general.

# Motivation

A common criticism of the Jupyter environment (regardless of the front-end editor) and of the
official Jupyter frontends (in light of the recent, experimental support for feature-rich notebook
editing under development by some of the major IDE developers) is the lack of advanced code
assistance tooling. Proper tooling can improve code quality and the validity of computation, and
can increase development speed; we therefore believe that it is a key ingredient of a good
computational notebook environment, which from the beginning has aimed at improving users'
workflows.

Providing support for advanced coding assistance for each language separately is a daunting task,
challenging not only for volunteer-driven projects, but also for large companies. Microsoft
recognized this problem and created the Language Server Protocol, with a reference implementation
in VSCode(TM).

Many language servers are community supported and available for free (see the community-maintained
list of [language servers](https://langserver.org/)).

# Guide-level explanation

Much like
[Jupyter Kernel Messaging](https://jupyter-client.readthedocs.io/en/stable/messaging.html), LSP
provides a language-agnostic, JSON-compatible description for multiple clients to integrate with any
number of language implementations. Unlike Kernel Messaging, the focus is on precise definition of
the many facets of static analysis and code transformation, with nearly four times the number of
messages of the Jupyter specification. We will discuss the opportunities and challenges of this
complexity for users and maintainers of Jupyter Clients, Kernels, and related tools.
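
To make the comparison concrete, the following minimal sketch shows the wire format defined by the LSP base protocol: each message is a JSON-RPC 2.0 payload preceded by a `Content-Length` header. The helper names `encode_message` and `decode_message` are illustrative only and are not part of any Jupyter or LSP library; `textDocument/completion` is a standard LSP method, and the file URI is made up for the example.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload with the LSP base protocol header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    # Content-Length counts the bytes of the body, not its characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> dict:
    """Parse one framed message back into a dict (illustrative helper)."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


# A completion request for the position right after "os." on the first line.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/completion",
    "params": {
        "textDocument": {"uri": "file:///tmp/example.py"},  # made-up URI
        "position": {"line": 0, "character": 3},
    },
}

framed = encode_message(request)
```

Because the framing is independent of the transport, the same payload could travel over standard input/output, a socket, or a websocket proxy.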

The key component of the repository,
[@krassowski/jupyterlab-lsp](https://www.npmjs.com/package/@krassowski/jupyterlab-lsp), offers
Jupyter users an expanded subset of features described by the LSP as an extension to JupyterLab.
These features include refinements of existing Jupyter interactive computing features, such as
completion and introspection, as well as new Jupyter features such as linting, reference linking,
and symbol renaming. It is supported by [jupyter-lsp](https://pypi.org/project/jupyter-lsp/), a
Language Server- and Jupyter Client-agnostic extension of the Jupyter Notebook Server (for the `0.x`
line) and Jupyter Server (for the `1.x` line). We will discuss the architecture and engineering process
of maintaining these components at greater length, leveraging a good deal of the user and developer
[documentation](https://jupyterlab-lsp.readthedocs.io/en/latest/?badge=latest).

# Reference-level explanation

The current implementation of the LSP integration is barely a proof of concept. We believe that a
different implementation should be developed to take the broader range of use cases and the
diversity of the Jupyter ecosystem into account; we created detailed proposals for improvement and
refactoring of our code, as explained later.

## Dealing with Jupyter notebook complexity

The following features need to be considered in the design:

The interactive, data-driven computing paradigm provides additional convenience features on top of
existing languages:

- cell and line magics
- transclusions: "foreign" code in the document, often implemented as magics, which uses a different
language or scope than the rest of the document (e.g. `%%html` magic in IPython)
- polyglot notebooks using cell metadata to define language
- the concept of cells, including cell outputs and cell metadata (e.g. enabling LSP extensions to
warn users about unused empty cells, out of order execution markers, etc., as briefly discussed in
[#467](https://github.com/krassowski/jupyterlab-lsp/issues/467))
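
As an illustration of the transclusion case listed above, a cell that starts with a cell magic such as `%%html` could be routed to a separate "foreign" document, while the host document keeps a placeholder of the same height. This is only a sketch under stated assumptions: `FOREIGN_MAGICS` and `split_cell` are hypothetical names and do not mirror the actual jupyterlab-lsp API.

```python
from typing import Optional, Tuple

# Hypothetical mapping from IPython cell magics to the language of their body.
FOREIGN_MAGICS = {"%%html": "html", "%%javascript": "javascript", "%%bash": "bash"}


def split_cell(source: str, host_language: str = "python") -> Tuple[str, str, Optional[str]]:
    """Return (language, code, host_placeholder) for a single cell.

    For a foreign cell, the placeholder replaces the cell in the host
    document: the magic line becomes a comment and the body becomes blank
    lines, so the number of lines in the host document does not change.
    """
    first_line, _, body = source.partition("\n")
    tokens = first_line.split()
    magic = tokens[0] if tokens else ""
    if magic in FOREIGN_MAGICS:
        line_count = source.count("\n") + 1
        placeholder = "\n".join(["# " + first_line] + [""] * (line_count - 1))
        return FOREIGN_MAGICS[magic], body, placeholder
    # Not a known foreign magic: the whole cell stays in the host language.
    return host_language, source, None


language, code, placeholder = split_cell("%%html\n<b>hi</b>\n<i>x</i>")
```

Keeping the placeholder the same height as the original cell is one simple way to keep line-based position mapping valid for the host document.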

## Current implementation

Currently:

- the notebook cells are concatenated into a single temporary ("virtual") document on the frontend,
which is then sent to the backend,
- the navigation between coordinate systems is performed by the frontend and is based solely on the
total number of lines after concatenation,
- as a workaround for some language servers requiring actual presence of the file on the filesystem
(against the LSP spec, but common in some less advanced servers), our backend Jupyter server
extension creates a temporary file on the file system (by default in the `.virtual_documents`
directory); this is scheduled for deprecation,
- Jupyter server extension serves as:
  - a transparent proxy between LSP language servers and frontend, speaking over websocket
    connection
  - a manager of language servers, determining whether specific LSP servers are installed and
    starting their processes
- JSON files or declarative Python classes registered via entry points are used to define
specification of the LSP servers (where to look for an executable of the LSP server, for which
languages/kernels given LSP server should be used, what is its display name, etc.)
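
The concatenation and line-based navigation described in the first two points above can be sketched as follows. This is a simplified illustration and not the actual frontend code (which is not written in Python): the class name `VirtualDocument`, its methods, and the choice of two blank lines between cells are assumptions made for the example.

```python
from bisect import bisect_right
from typing import List, Tuple


class VirtualDocument:
    """Concatenate cell sources and map virtual lines back to cells."""

    def __init__(self, cells: List[str], blank_lines_between: int = 2):
        # cell_start_lines[i] is the 0-based virtual line where cell i begins.
        self.cell_start_lines: List[int] = []
        lines: List[str] = []
        for index, source in enumerate(cells):
            if index > 0:
                lines.extend([""] * blank_lines_between)
            self.cell_start_lines.append(len(lines))
            lines.extend(source.split("\n"))
        self.text = "\n".join(lines)

    def to_cell_position(self, virtual_line: int) -> Tuple[int, int]:
        """Translate a 0-based virtual line into (cell index, line in cell)."""
        if virtual_line < 0 or not self.cell_start_lines:
            raise ValueError("virtual line out of range")
        cell_index = bisect_right(self.cell_start_lines, virtual_line) - 1
        return cell_index, virtual_line - self.cell_start_lines[cell_index]

    def to_virtual_line(self, cell_index: int, line_in_cell: int) -> int:
        """Translate (cell index, line in cell) into a 0-based virtual line."""
        return self.cell_start_lines[cell_index] + line_in_cell


document = VirtualDocument(["import os\nx = 1", "print(x)"])
```

Here a diagnostic reported by a language server on virtual line 4 would belong to the first line of the second cell. Because the mapping relies only on line counts, any tool that adds or removes lines in the virtual document invalidates it, which is the limitation discussed under "Unresolved questions".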

# Rationale and alternatives

A previous (stale) JEP proposed to integrate LSP and to adopt the Monaco editor, which would entail
bringing in a heavy dependency and relying heavily on the continued development of Monaco by
Microsoft; it was not clear whether Monaco would allow efficient use in a multi-editor notebook
setting, and the work on the integration stalled a few years ago. Unlike that previous proposal, we
**do not** propose to adopt any specific implementation, yet we bring a working implementation for
the CodeMirror 5 editor, which is already in use by two of the official front-ends for Jupyter
(Jupyter Notebook and JupyterLab). While the nearly-feature-complete CodeMirror 6 has specifically
declared LSP integration to be a non-goal, it does provide a number of features which would allow
for cleaner integration of multiple sources of editor annotation, such as named bundles of marks.

Jupyter, which originally drove innovation in the field, is now perceived in some communities as a
driver of bad coding practices due to the lack of tooling available in the official frontends.
Alternative formats to ipynb have been proposed, sometimes with better support for IDE features as
the only motivation.

# Prior art

Multiple editors already support the Language Server Protocol, whether directly or via extension
points, including VSCode, Atom, Brackets (Adobe), Spyder, Visual Studio and many more. The list of
clients and their capabilities is described at the community-maintained
[knowledge site](https://langserver.org/) in the "LSP clients" section and on the official website
of the [LSP protocol](https://microsoft.github.io/language-server-protocol/implementors/tools/).

Multiple proprietary notebook interfaces attempted integration of language features such as those
provided by LSP, including Google Colab, Datalore, Deepnote, and Polynote; due to proprietary
implementation details it is not clear how many of the existing solutions employ LSP (or its subset)
under the hood.

The on-going integration of the [Debug Adapter Protocol][dap] has demonstrated both the user
benefits, and kernel maintainer costs, of "embracing and extending" existing, non-Jupyter protocols
rather than re-implementing.

# Unresolved questions

The current implementation can be improved by:

1. embedding cell identifiers (and possibly metadata) as comments in the virtual document at a place
   corresponding to the start of each cell (in a jupytext-compatible way), to enable easier
   calculation of positions and implementation of refactoring features (e.g. linting with black)
   that add or remove lines (which is not currently possible),
   - adding metadata might be required to enable polyglot SOS notebooks, see discussion in
     [#282](https://github.com/krassowski/jupyterlab-lsp/issues/282)
   - one might consider whether it is worth delegating this task to jupytext; this would
     necessitate moving the notebook concatenation logic to the server extension, with the positive
     side effect of exposing it for re-use by other clients, but with two potential downsides: the
     need to frequently transfer the entire notebook (on each debounced keypress) to the server
     extension (which could be alleviated by sending deltas/diffs; this adds more logic, but given
     that a notebook is just JSON it might be feasible to use an existing tool), and having the
     notebook-virtual document position transformation code on both the backend and the frontend,
     as the frontend part cannot be easily (or at all?) eliminated; as this option looks promising,
     it will be investigated once current performance shortcomings are resolved.
   - see further discussion in [#467](https://github.com/krassowski/jupyterlab-lsp/issues/467)
2. formalizing the grammar for substituting magics with equivalent or placeholder code (which
   allows for a one-to-one mapping of magics to code that can be understood by standard refactoring
   tools, and back to the magics after the code has been transformed by the refactoring tools, for
   example moved to another file), see [#347](https://github.com/krassowski/jupyterlab-lsp/issues/347)
3. abstracting the communication layer between client and server so that different mechanisms can be
   used for such communication, for example:
   - a custom, manually managed websocket between the client and the Jupyter server extension (the
     existing solution),
   - a websocket managed by reusing the kernel comms (acting as a transparent proxy but reducing the
     number of dependencies, since in the context of Jupyter the kernel comms are expected to be
     present either way), see the proposed implementation in
     [#278](https://github.com/krassowski/jupyterlab-lsp/pull/278)
   - a direct connection to a cloud or self-hosted service providing language intelligence as a
     service, e.g. [sourcegraph](https://about.sourcegraph.com/)
   - (potentially) in-client language servers, such as a JSON Schema-aware language server to assist
     in configuration
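
The first improvement in the list above (embedding cell identifiers as comments) can be sketched as follows. If each cell starts with a marker comment carrying its identifier, cell boundaries can be recovered by scanning for markers rather than by counting lines, so a tool that adds or removes lines no longer breaks the mapping. The marker syntax `# %% id=...` is loosely modeled on the jupytext percent format but is an assumption of this example, as are the function names.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed marker syntax; a real design would need to agree on it with jupytext.
MARKER = re.compile(r"^# %% id=(?P<cell_id>\S+)$")


def build_virtual_document(cells: List[Tuple[str, str]]) -> str:
    """Join (cell_id, source) pairs, prefixing each cell with a marker comment."""
    return "\n".join(f"# %% id={cell_id}\n{source}" for cell_id, source in cells)


def split_virtual_document(text: str) -> Dict[str, str]:
    """Recover per-cell sources by scanning for marker comments."""
    collected: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.split("\n"):
        match = MARKER.match(line)
        if match:
            current = match.group("cell_id")
            collected[current] = []
        elif current is not None:
            collected[current].append(line)
    return {cell_id: "\n".join(lines) for cell_id, lines in collected.items()}


cells = [("a1", "x=1"), ("b2", "def f():\n    return x")]
virtual = build_virtual_document(cells)

# Simulate a formatter that rewrites the first cell and adds a line to it.
reformatted = virtual.replace("x=1", "x = 1\n")
updated_cells = split_virtual_document(reformatted)
```

One limitation of this sketch is that a source line which happens to match the marker pattern would be mistaken for a cell boundary; a real implementation would need an escaping rule or a less ambiguous marker.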

There are also smaller fires to put out in the current implementation which we believe do not
warrant further discussion; however, we want to enumerate those to assure a potentially concerned
reader that those topics are being looked at and considered a priority due to the immediate impact
on user and/or developer experience:

- reorganizing deeply nested code into a shallower structure of multiple packages, one per feature
  (the current state of the repository, half monorepo and half complex project, is an annoyance to
  maintainers and contributors alike)
- improving performance of completer and overall robustness of the features
- enabling integration with other packages providing completion suggestions
- enabling use of multiple LSP servers for a single document

# Future possibilities

- Amending the kernel messaging protocol to ask only for runtime completions (e.g. keys in a
  dictionary, columns in a data frame) and kernel-specific completions (e.g. magics), that is,
  excluding static-analysis-based completions, to improve the performance of the completer
- Seeding existing linting tools with plugins to support notebook-specific features (empty cells,
  out-of-order execution), largely as envisioned by the pioneering work of the
  [JuLynter](https://dew-uff.github.io/julynter/index.html) experiment
  - also see [lintotype]
- Encouraging contributions to existing language-servers and offering platform for development of
Jupyter-optimized language servers
- Enabling LSP features in markdown cells
- Implementing support for the Language Server Index Format (LSIF), a protocol closely related to
  LSP and defined on the
  [specification page](https://microsoft.github.io/language-server-protocol/specifications/lsif/0.5.0/specification/),
  to provide even faster IDE features for the retrieval of immutable (or infrequently mutable)
  information, such as documentation of built-in functions.

[lintotype]: https://github.com/deathbeds/lintotype/
[dap]:
https://github.com/jupyter/enhancement-proposals/blob/master/jupyter-debugger-protocol/jupyter-debugger-protocol.md
