Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal of the semantic highlighting protocol extension #367

Closed

Conversation

kittaakos
Copy link

@kittaakos kittaakos commented Jun 27, 2018

Task: #368.

Signed-off-by: Akos Kitta [email protected]

@msftclas
Copy link

msftclas commented Jun 27, 2018

CLA assistant check
All CLA requirements met.

/**
* An array of semantic highlighting scopes for the current token.
*/
scopes: string[];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation should point out that this corresponds to TextMate scopes. https://manual.macromates.com/en/language_grammars

@@ -0,0 +1,90 @@
/* --------------------------------------------------------------------------------------------
* Copyright (c) Microsoft Corporation. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's MS copyright

##### SemanticHighlighting Notification

The `textDocument/semanticHighlighting` notification is pushed from the server to the client to inform the client about additional semantic highlighting information that has to be applied on the text document.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should explain that incremental updates are done on a line-by-line basis.

Copy link
Contributor

@rcjsuen rcjsuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick pass over the proposal but as it's a WIP I haven't really given a good look at it.

/* --------------------------------------------------------------------------------------------
* Copyright (c) Microsoft Corporation. All rights reserved.
* Licensed under the MIT License. See License.txt in the project root for license information.
* ------------------------------------------------------------------------------------------ */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be TypeFox instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have compared another module. Sorry for the noise.


_Client Capabilities_:

Capability that has to be set by the language client if that can accept and process the semantic highlighting information received from the server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if that can should be if it can.

/**
* The text document client capabilities.
*/
workspace: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to name this variable textDocument?

workspace: {

/**
* `true` if the client has the semantic highlighting support for the text document. Otherwise, `false`. It is `false` by default.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true if the client has the semantic highlighting support for the text document.
...to...
true if the client supports semantic highlighting support text documents.

@jacobdufault
Copy link

jacobdufault commented Jun 29, 2018

cquery implements semantic highlighting as a notification the client sends to the server and supports rainbow highlighting (ie, each variable is a different shade of the same color, which makes flow analysis easier). Here is what the protocol looks like:

struct Symbol {
  // a stable ID that can be used for rainbow highlighting; ie, this symbol will have
  // the same stableId across multiple notifications (w.r.t. each file)
  int stableId = 0;
  // what of type of symbol declares this symbol? ie, is this a free-standing variable
  // or an instance variable of a class
  SymbolKind parentKind; 
  // what type of symbol is this, ie, variable, func, etc
  SymbolKind kind;
  // what's the memory storage for this variable? ie, is it a global variable?
  StorageClass storage;
  // locations in the document where the symbol is used
  Range[] ranges;
};

// Standard notification
struct CqueryPublishSemanticHighlightingNotification {
  struct Params {
    // The file this applies to
    DocumentUri uri;
    // A list of symbols in the file that should be highlighted
    Symbol[] symbols;
  };
  Params params;
};

It'd be great if the official spec supported all of the features that cquery's implementation supports. Let me know if you have any questions regarding how cquery's implementation works.

There's also an implementation for vscode at https://github.com/cquery-project/vscode-cquery/blob/937864a82e9fb44f03eafb095134d01776e73367/src/extension.ts#L736-L875.

@svenefftinge
Copy link
Contributor

cquery implements semantic highlighting as a notification the client sends to the server

You meant the inverse, i.e. notifications from the server to the client, right?

I think the main difference between this proposal and the protocol you have in cquery is, that we rely on textmate scopes rather than passing semantic information to the client. I don't see why your use cases shouldn't be covered. The server just needs to send the right scopes and the client needs to support them.

Also, this proposal doesn't support multi-line tokens as the server publishes symbols on a per line basis. This is so it can update incrementally.

@jacobdufault
Copy link

You meant the inverse, i.e. notifications from the server to the client, right?

Oops, yep. cquery binary sends a notification to the client, ie, vscode.

I think the main difference between this proposal and the protocol you have in cquery is, that we rely on textmate scopes rather than passing semantic information to the client. I don't see why your use cases shouldn't be covered. The server just needs to send the right scopes and the client needs to support them.

Does the current proposal support rainbow style semantic highlighting, where each unique variable has a similar but slightly different shade of the same color? I'm not familiar enough with textmate scopes.

As for multi-line tokens, this seems like a rare use-case and is not one that cquery needs.

@svenefftinge
Copy link
Contributor

Does the current proposal support rainbow style semantic highlighting, where each unique variable has a similar but slightly different shade of the same color? I'm not familiar enough with textmate scopes.

See https://www.sublimetext.com/docs/3/scope_naming.html for more info on textmate scopes.

It does not directly support rainbow style, but I think it could be done by sending corresponding scopes. For instance, the LS could send variable.other.c1 to variable.other.c9. If the color theme supports it you will get rainbow colors, if not the more general variable coloring would apply.

@Levertion
Copy link

Why is this in Microsoft/vscode-languageserver-node, rather than Microsoft/language-server-protocol? I presume that it's just to allow the development of the reference implementation and the proposal to be linked?

@svenefftinge
Copy link
Contributor

svenefftinge commented Jul 1, 2018

Why is this in Microsoft/vscode-languageserver-node, rather than Microsoft/language-server-protocol?

The process is described here https://github.com/Microsoft/language-server-protocol/blob/master/contributing.md
So this is the mandatory reference implementation for the VSCode language client library.
Unfortunately, VSCode doesn't expose enough API to implement it. E.g. there is no way to access the TextMate scheme registry.
@dbaeumer what would be the way to go in such cases?

@Levertion
Copy link

Fair enough. I just didn't know why microsoft/language-server-protocol#124 was closed, but that explains it.

@rcjsuen
Copy link
Contributor

rcjsuen commented Jul 2, 2018

It's unfortunate that things have to be done this way as there are more people watching Microsoft/language-server-protocol than this repository and would allow for more feedback from the community...but I realize that that discussion is rather off-topic here.

@kittaakos kittaakos force-pushed the semantic-highlighting-proposal branch 2 times, most recently from e773d7c to e881846 Compare July 2, 2018 09:23
@svenefftinge
Copy link
Contributor

We should open a corresponding PR on language-server-protocol, with the relevant changes for the markdown file. @kittaakos WDYT?

@dbaeumer
Copy link
Member

dbaeumer commented Jul 4, 2018

@svenefftinge @rcjsuen actually the document here https://github.com/Microsoft/language-server-protocol/blob/master/contributing.md says that

Proposed extensions need to be announced via an issue in this repository.

where this repository is https://github.com/Microsoft/language-server-protocol

Agree that this can be made more clear by moving it to the top.

@dbaeumer
Copy link
Member

dbaeumer commented Jul 4, 2018

Moved to the top. I am open to accepting a reference implementation for a different well used client as well.

@kittaakos
Copy link
Author

Thanks for the clarification, @dbaeumer. I open one for the proposal in the https://github.com/Microsoft/language-server-protocol repository too.

Agree that this can be made more clear by moving it to the top.

Just naming the repository instead of having a link in the README would help too.

@svenefftinge
Copy link
Contributor

I am open to accepting a reference implementation for a different well used client as well.

Ok, good to know. It would be very cool, to have support for it in VSCode, though.
Is there a chance you are exposing a bit more API, so it could work?
Also, what do you think about the incremental updated based on lines?

@alexdima
Copy link
Member

alexdima commented Jul 6, 2018

I think it is a good idea to push the semantic highlights from the server. The only thing to be aware here would be the case that there are 100 opened files in the client and that only 1 file is visible, so the semantic highlights would only be useful for a single file. @dbaeumer I don't know if the protocol has visibility APIs yet, such that the server could limit its pushing of semantic highlights to the currently visible files or the currently visible regions of files?

Having worked before on the problem of how to represent in a memory-friendly way tokens (see https://code.visualstudio.com/blogs/2017/02/08/syntax-highlighting-optimizations), and having dealt with a lot of problems stemming from large files, may I suggest that the protocol in this area be designed with the following goals in mind:

  1. tokens should be encoded in a memory friendly way straight from the wire (JSON).

  2. there should be a way for the server to incrementally update the semantic highlights.


I think goal 1 is somewhat straight-forward to achieve. I suggest that each token does not point to an array of scopes (i.e. SemanticHighlightingToken.scopes), but rather the entire set of possible scopes are registered statically up front and this field should be replaced with an index into the legend. My assumption is that the possible number and types of semantic tokens is finite. I also believe using scopes (which can be targeted by TM themes) it is a good abstraction level. i.e. we should not talk about colors at this layer, as that is something for themes to decide.

So, assuming the entire possible set of scopes is registered/sent over up front, the token interface can be reduced to:

interface SemanticHighlightingInformation {
  line: number;
  tokens: SemanticHighlightingToken[];
}
interface SemanticHighlightingToken {
  character: number;
  length: number;
  scopes: number;
}

The next immediate memory/size optimisation here is to drop the SemanticHighlightingToken object entirely and inline everything in SemanticHighlightingInformation. i.e.

interface SemanticHighlightingInformation {
  line: number;
  tokens: number[]; // 3*i is character, 3*i+1 is length, 3*i+2 is scopes
}

Other optimisations could be pursued, such as noticing that the length is typically a small integer, and if we assume that the possible number of scopes is smaller than 2^16, we can get away with two 32 bit unsigned integers per token:

interface SemanticHighlightingInformation {
  line: number;
  tokens: number[]; // 2*i is character, (2*i+1)&0xffff0000>>16 is length, (2*i+1)&0x0000ffff is scopes
}

In terms of TypeScript, this would be best represented using typed arrays, UInt32Array:

interface SemanticHighlightingInformation {
  line: number;
  tokens: UInt32Array[];
}

The source for an Uint32Array is always an ArrayBuffer, so we could just store that:

interface SemanticHighlightingInformation {
  line: number;
  tokens: ArrayBuffer;
}

And, finally, the best and easiest way I'm aware of for efficiently encoding a byte array in JSON is to use base64 encoding, so at the wire level:

interface SemanticHighlightingInformation {
  line: number;
  tokens: string; // base64 encoded ArrayBuffer that should be viewed as an UInt32Array
}

Now, the second point is a bit more difficult as we would need to clearly specify what is expected of the client when edits occur. For example, we need to specify how the tokens should be backed by "live-markers" such that they stick with the text. i.e. the server and the client need some shared understanding on how the semantic tokens are adjusted. Second, we need for the notification event to be able to replace a certain region in the file with new tokens.

For example, when opening a large file in an editor, let's suppose the server sends a notification (which must include the document version id) which defines all the semantic tokens in the entire file. Now, when an edit occurs, for example, a new line is pressed on the first position in a file, both client and server should share the understanding that all the previously set tokens have been moved by one line (they stick with the text they're sitting on). So, when pressing a new line on the first position in a file, the server needs not to send any notification to update the semantic tokens.

Another example would be typing a character. This gets sent to the server as a text delta which mentions the position and the new inserted character. Similarly, the server should be able to send a notification of a size in direct proportion with the amount of semantic tokens changes. If the user typed a character in an identifier, then a notification updating only that specific semantic token should suffice.

I'm sorry I'm just describing the general mechanism here and not providing a sketch like above, but again, this is a much more complicated goal to achieve. Anyways, the requirements on the client can be reduced if there is some sort of negotiation where the client indicates that it cannot track semantic tokens, in which case the server would always send the tokens for the entire file. (You can think of this as something similar to formatting; there are some formatting providers which are very good and send N edits that target precisely the whitespace that needs to be changed... and then there are formatting providers that send a single huge edit encompassing the entire file and replacing it with new text). In other words, this should be done in a way where a smart client and a smart server can manage to achieve the correct maintenance of a high volume of semantic tokens as efficiently as possible.

@isc-bsaviano
Copy link

Are there any updates for when this pull request will make it into the master branch? Semantic tokens are a key part of our IDE strategy and we're very excited to add them to our language server!

@rcjsuen
Copy link
Contributor

rcjsuen commented Aug 6, 2020

Are there any updates for when this pull request will make it into the master branch? Semantic tokens are a key part of our IDE strategy and we're very excited to add them to our language server!

@bsaviano-intersystems This proposal has been superseded by a new proposal. I suggest you look at the draft 3.16 specification and share your thoughts in microsoft/language-server-protocol#18.

@isc-bsaviano
Copy link

@rcjsuen Thanks for the response. I've already seen the proposed spec in 3.16. I was wondering if there's a version of the vscode-languageserver/languageclient packages that contains the proposal so I can repurpose my existing DocumentSemanticTokenProvider and test it out. Thanks again.

@rcjsuen
Copy link
Contributor

rcjsuen commented Aug 6, 2020

@rcjsuen I was wondering if there's a version of the vscode-languageserver/languageclient packages that contains the proposal so I can repurpose my existing DocumentSemanticTokenProvider and test it out. Thanks again.

@bsaviano-intersystems I know 6.1.1 and up for vscode-languageserver and 6.1.3 and up for vscode-languageclient should be good. I don't know off hand when it was first introduced though.

@dbaeumer
Copy link
Member

dbaeumer commented Aug 7, 2020

@rcjsuen thanks for answering. The latest server version is 7.0.0-next.6 and client 7.0.0-next.8

@isc-bsaviano
Copy link

Thank you both for the responses. I will install the “next” packages and try it out. Looking forward to the official 3.16 release! Has a timeframe for that been established?

@kjeremy
Copy link
Contributor

kjeremy commented Aug 7, 2020

@dbaeumer npm outdated shows me that 7.0.0-next.1 is the latest vscode-languageclient

@dbaeumer
Copy link
Member

@kjeremy that is incorrect. Let me check what happened.

@dbaeumer
Copy link
Member

That is strange since npm lists everything correctly

capture

What version do you have installed right now. Not sure if outdated works correctly with next versions.

@kjeremy
Copy link
Contributor

kjeremy commented Aug 17, 2020

@dbaeumer

npm --version
6.14.7

npm outdated

image

@dbaeumer
Copy link
Member

@kjeremy I mean which version of the vscode-languageclient

@kjeremy
Copy link
Contributor

kjeremy commented Aug 18, 2020

@dbaeumer 7.0.0-next.1

@dbaeumer
Copy link
Member

@kjeremy what do you expect to happen. To my knowledge next versions are not considered to be semver. So npm will never install 7.0.0-next.2 if you prereq 7.0.0-next.1

@kjeremy
Copy link
Contributor

kjeremy commented Aug 18, 2020

@dbaeumer I expect that 7.0.0-next.8 shows up as the latest version available and it does not.

@dbaeumer
Copy link
Member

@kjeremy but this is then something that npm needs to deal with. IMO the releases are tag correct. Have you raised the issue with them?

@kittaakos kittaakos closed this Sep 11, 2020
qtprojectorg pushed a commit to qt-creator/qt-creator that referenced this pull request Nov 29, 2021
The semantichighlighting proposal from
microsoft/vscode-languageserver-node#367
was replaced with semantic tokens in the final protocol version 3.16.

This reverts 307f1d8

Task-number: QTCREATORBUG-26624
Change-Id: I635f0b4763a197edabf9edf8d9041143dcf531e3
Reviewed-by: Christian Kandeler <[email protected]>
@Hebe970217
Copy link

excuse me?I see KamasamaK Removing Semantic Highlighting, no support on Oct 20, 2020 at after?

@SPGoding
Copy link

SPGoding commented Jan 4, 2022

That was the title of an issue on the eclipse/lsp4j repo to deprecate their Semantic Highlighting API in favor of LSP's Semantic Tokens API, unrelated to microsoft/vscode-languageserver-node.

@Hebe970217
Copy link

That was the title of an issue on the eclipse/lsp4j repo to deprecate their Semantic Highlighting API in favor of LSP's Semantic Tokens API, unrelated to microsoft/vscode-languageserver-node.

Okay.
lsp server init result is "documentHighlightProvider": true.
I tried to use textdocument / documenthighlight to request the backend Clangd, and I didn't see the independent return result. I'm writing a CPP client

@Hebe970217
Copy link

Maybe I should create a new issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.