Semantic Token Support #533

Open · wants to merge 9 commits into develop

Conversation

@Doekeb commented Mar 10, 2024

LSP supports semantic tokens, which editors and colorschemes can opt into in order to provide "smarter" language highlighting than pure tree-based highlighting:
https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide
https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_semanticTokens
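For reference, this is how the 3.17 spec encodes tokens on the wire: five integers per token, each positioned relative to the previous token so that small edits only shift nearby data. A minimal sketch (mine, not code from this PR; the legend values are illustrative):

```python
def encode_tokens(tokens, token_types, token_modifiers):
    """tokens: list of (line, start_char, length, type_name, modifier_names),
    sorted by position. Returns the flat ``data`` array for a
    textDocument/semanticTokens/full response."""
    data = []
    prev_line = prev_start = 0
    for line, start, length, type_name, modifiers in tokens:
        delta_line = line - prev_line
        # start is relative to the previous token only when on the same line
        delta_start = start - prev_start if delta_line == 0 else start
        modifier_bits = 0
        for m in modifiers:
            modifier_bits |= 1 << token_modifiers.index(m)
        data.extend([delta_line, delta_start, length,
                     token_types.index(type_name), modifier_bits])
        prev_line, prev_start = line, start
    return data

# e.g. two tokens on line 2: ``my_function`` (a function) and ``x`` (a parameter)
legend_types = ["class", "function", "parameter"]
legend_mods = ["definition", "readonly"]
print(encode_tokens([(2, 0, 11, "function", ["definition"]),
                     (2, 12, 1, "parameter", [])],
                    legend_types, legend_mods))
# -> [2, 0, 11, 1, 1, 0, 12, 1, 2, 0]
```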

Notably, Neovim now supports semantic tokens (neovim/neovim#21100) and, more recently, semantic token modifiers (neovim/neovim#22022).

This feature has been requested in this repo here: #33
In the unmaintained base here: palantir/python-language-server#933
In another jedi-based language server here: pappasam/jedi-language-server#137
And its implementation has been attempted and abandoned twice in the latter: pappasam/jedi-language-server#196 and pappasam/jedi-language-server#231

There is a maintained fork of an alternative tool for Neovim here: https://github.com/wookayin/semshi. But it suffers from two major drawbacks: it is only available for Neovim, and its highlight colors are hardcoded, so they are unlikely to match the user's colorscheme.

This PR implements only the full-document protocol. Performance could be improved by also implementing the full-document delta protocol and the range protocol; the three variants are sketched below.
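For reference, per the LSP 3.17 spec, the three variants correspond to different shapes of the server's semanticTokensProvider capability (the legend values here are illustrative, not necessarily this PR's):

```python
# Illustrative legend; the actual token types/modifiers are defined by the server.
LEGEND = {
    "tokenTypes": ["class", "function", "parameter", "property"],
    "tokenModifiers": ["definition"],
}

# What this PR implements: full-document requests only.
full_only = {"legend": LEGEND, "full": True}

# Possible later additions mentioned above:
full_with_delta = {"legend": LEGEND, "full": {"delta": True}}       # incremental updates
full_and_range = {"legend": LEGEND, "full": True, "range": True}    # visible-range requests
```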

Here are some examples in two different colorschemes, with only very simple rules implemented so far. Tree-based highlighting is always on the left; tree-based highlighting augmented with semantic tokens is always on the right.

Functions and classes

  • Tree-based highlighting infers whether a reference is a class based on its name, and therefore doesn't highlight dingus_mc_bingus as a class even though it is.
  • Similarly, tree-based highlighting can't determine that my_function and MyFunction are both functions. A hypothetical reconstruction of both cases follows.
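A hypothetical reconstruction of the kind of code in the screenshots (the names come from the bullets above; the exact snippet in the images may differ):

```python
class dingus_mc_bingus:   # a class, despite the snake_case name
    pass

x = dingus_mc_bingus()    # semantic tokens highlight this as a class;
                          # name-based inference misses it

def my_function():
    pass

MyFunction = my_function  # still a function, despite the CamelCase name
MyFunction()              # name-based inference guesses "class" here
```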

[Screenshots: functions and classes in two colorschemes (classes_functions_cp, classes_functions_tn)]

Imports

Tree-based highlighting can't determine what kind of thing imported names are, other than by their naming, which often breaks convention even in standard-library modules in Python. For example:
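(A hypothetical illustration; these particular imports are not from the screenshots.)

```python
from collections import OrderedDict, defaultdict  # both are classes
from datetime import datetime                     # also a class, despite the name

d = defaultdict(list)  # semantic tokens can mark this as a class reference
ts = datetime.now()    # ...and this one, even though it reads like a function
```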

[Screenshots: imports in two colorschemes (imports_cp, imports_tn)]

Parameters

  • Tree-based highlighting colors parameters in the signature of a function/method differently than when they are used in the body. Semantic token highlighting maintains parameter highlighting until the variable is re-assigned.
  • Note that tree-based highlighting treats the name self as a special token. This is not language smarts, as evidenced by the lack of highlighting of the language-equivalent this. Semantic token highlighting currently colors both self and this inside a method as regular parameters, but this could be improved using semantic token modifiers and a bit more inference (the colorschemes I'm using here don't apply any distinct styles to modifiers). Note that even in the semantic-token-augmented version, tree-based highlighting takes over on self when it's outside a method. Both behaviours are sketched below.
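A hypothetical snippet showing both behaviours (my reconstruction, not the screenshots' exact code):

```python
def scale(value, factor):
    print(value)            # ``value`` is still highlighted as a parameter here
    value = value * factor  # ...but after this re-assignment it is a plain variable
    return value

class Point:
    def move(self, dx):     # tree-based highlighting special-cases the name ``self``
        self.x += dx

    def shift(this, dx):    # ``this`` gets no special treatment, showing the rule
        this.x += dx        # is name-based; semantic tokens color both as parameters
```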

[Screenshots: parameters in two colorschemes (parameters_cp, parameters_tn)]

Properties

Tree-based highlighting guesses whether an attribute is a property or a method based on the presence of parentheses. Semantic token highlighting knows the difference.
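A hypothetical illustration:

```python
class Circle:
    def __init__(self, r):
        self.r = r

    @property
    def area(self):          # accessed without parentheses, but not a method
        return 3.14159 * self.r ** 2

    def scale(self, k):      # a method
        self.r *= k

c = Circle(2.0)
print(c.area)   # semantic tokens mark this as a property, not by guessing from syntax
grow = c.scale  # no parentheses, so tree-based highlighting guesses "property";
grow(3.0)       # semantic tokens know it is a method
```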

[Screenshots: properties in two colorschemes (properties_cp, properties_tn)]

@rchl (Contributor) commented Mar 11, 2024

Do you have some performance data? For example, how long does it take to generate tokens in a 2000-line document? It feels like it would be very slow to trigger "goto" for each "name" like that.
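For context, a rough way to get such numbers with jedi's public API (a sketch only; the input file and the exact calls the plugin makes are assumptions):

```python
import time
import jedi

# Any large Python file (~2000 lines) works as input here.
with open("some_big_module.py") as f:
    source = f.read()

script = jedi.Script(source)
names = script.get_names(all_scopes=True, definitions=True, references=True)

start = time.perf_counter()
for name in names:
    name.goto()  # resolve each name, roughly the per-token work described above
elapsed = time.perf_counter() - start
print(f"resolved {len(names)} names in {elapsed:.2f}s")
```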

Ideally such a feature would be implemented by jedi itself and use some form of caching to speed things up. LSP semantic tokens are designed in a way that should make adding/removing text pretty fast, but in your implementation all of the work seems like it will be done from scratch on every single change.
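For reference, the mechanism the spec provides for that case: the client sends the previous result id and the server answers textDocument/semanticTokens/full/delta with edits against the previous data array. A minimal sketch of producing such an edit:

```python
def diff_token_data(old, new):
    """Answer a full/delta request with a single edit (simplistic but valid)."""
    start = 0
    while start < min(len(old), len(new)) and old[start] == new[start]:
        start += 1
    end_old, end_new = len(old), len(new)
    while end_old > start and end_new > start and old[end_old - 1] == new[end_new - 1]:
        end_old -= 1
        end_new -= 1
    return {"edits": [{"start": start,
                       "deleteCount": end_old - start,
                       "data": new[start:end_new]}]}

# One token type changed: only the middle of the array is resent.
print(diff_token_data([0, 0, 3, 1, 0, 1, 4, 2, 2, 0],
                      [0, 0, 3, 1, 0, 1, 4, 5, 2, 0]))
# -> {'edits': [{'start': 7, 'deleteCount': 1, 'data': [5]}]}
```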

@Doekeb (Author) commented Mar 11, 2024

> Do you have some performance data? For example, how long does it take to generate tokens in a 2000-line document? It feels like it would be very slow to trigger "goto" for each "name" like that.

I don't have performance data on a behemoth like that, but I'm happy to gather some, especially if you can point me in the direction of a big project I can try it on. Additionally, if performance ends up being an issue for huge files, it would be fairly simple to implement the range protocol, which exists exactly for this purpose. From the LSP spec:

> There are two use cases where it can be beneficial to only compute semantic tokens for a visible range:
>
>   • for faster rendering of the tokens in the user interface when a user opens a file. In this use case, servers should also implement the textDocument/semanticTokens/full request to allow for flicker-free scrolling and semantic coloring of a minimap.
>   • if computing semantic tokens for a full document is too expensive, servers can provide only a range call. In this case the client might not render a minimap correctly or might even decide to not show any semantic tokens at all.

Determining when to request full semantic tokens vs. a range would then be the client's responsibility.
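For reference, per the spec, the range request a client would fall back to looks like the full request plus the visible range:

```python
# method: "textDocument/semanticTokens/range" (the URI and range are examples)
params = {
    "textDocument": {"uri": "file:///example.py"},
    "range": {
        "start": {"line": 120, "character": 0},
        "end": {"line": 180, "character": 0},
    },
}
# The response carries the same {"data": [...]} shape as the full request,
# restricted to tokens inside the range.
```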

> Ideally such a feature would be implemented by jedi itself and use some form of caching to speed things up. LSP semantic tokens are designed in a way that should make adding/removing text pretty fast, but in your implementation all of the work seems like it will be done from scratch on every single change.

I agree that an upstream implementation is possible and preferable, and it would be great to contribute a portion of this to Jedi down the road. But hopefully this can work for the people who want it in the meantime.

If performance is a major concern (I agree that it would be good to gather more information on this front), we could begin by making this plugin opt-in like many of the other bundled plugins are.

@rchl (Contributor) commented Mar 19, 2024

> I don't have performance data on a behemoth like that, but I'm happy to gather some, especially if you can point me in the direction of a big project I can try it on.

Not as big, but maybe https://github.com/davidhalter/jedi/blob/master/jedi/plugins/stdlib.py

> Additionally, if performance ends up being an issue for huge files, it would be fairly simple to implement the range protocol, which exists exactly for this purpose.

Would it really be that easy? That really depends on whether the API you are using for this makes it possible.
