Skip to content

Latest commit

 

History

History
1066 lines (780 loc) · 50.2 KB

specification.md

File metadata and controls

1066 lines (780 loc) · 50.2 KB

Under Construction

The 0.4.0 version of LSIF is currently under construction.

Language Server Index Format

The purpose of the Language Server Index Format (LSIF) is to define a standard format for language servers or other programming tools to dump their knowledge about a workspace. This dump can later be used to answer language server LSP requests for the same workspace without running the language server itself. Since much of the information would be invalidated by a change to the workspace, the dumped information typically excludes requests used when mutating a document. So, for example, the result of a code complete request is typically not part of such a dump.

Changelog

Version 0.4.0

Up to version 0.4.0, the focus of the LSIF format was to ease the generation of the dump for language tool providers. However, this made it very hard for consumers of the dump to efficiently import them into a DB unless the DB format one to one mapped to the LSIF format. This version of the specification tries to balance this by requiring tools providers to emit additional events when certain data is ready to be consumed. It also adds support to partition data per document.

Since 0.4.0 changes some of the LSIF aspects more deeply an old 0.3.x version of the specification is available here

Motivation

Principal design goals:

  • The format should not imply the use of a certain persistence technology.
  • The data defined should be modeled as closely as possible to the Language Server Protocol to make it possible to serve the data through the LSP without further transformation.
  • The data stored is result data usually returned from an LSP request. The dump doesn't contain any program symbol information nor does the LSIF define any symbol semantics (e.g. where a symbol is defined or referenced or when a method overrides another method). The LSIF, therefore, doesn't define a symbol database. Please note that this is consistent with the LSP itself which doesn't define any symbol semantics either.
  • The output format will be based on JSON as with the LSP.

LSP requests that are good candidates to be supported in LSIF are:

The corresponding LSP requests have one of the following two forms:

request(uri, method) -> result
request(uri, position, method) -> result

where method is the JSON-RPC request method.

Concrete examples are

request('file:///Users/dirkb/sample/test.ts', 'textDocument/foldingRange') -> FoldingRange[];
request('file:///Users/dirkb/sample/test.ts', { line: 10, character: 17 }, 'textDocument/hover') -> Hover;

The input tuple to a request is either [uri, method] or [uri, position, method] and the output is some form of result. For the same uri and [uri, position] tuple, there are many different requests to execute.

The dump format, therefore, should support the following features:

  • Input data must be easily queryable (e.g. the document and the position).
  • Each element has a unique id (which may be a string or a number).
  • It should be possible to emit data as soon as it is available to allow streaming rather than large memory requirements. For example, emitting data based on document syntax should be done for each file as parsing progresses.
  • It should be easy to add additional requests later on.
  • It should be easy for a tool to consume a dump and for example import it into a database without holding the dump in memory.

We came to the conclusion that the most flexible way to emit this is a graph, where edges represent the method, and vertices are [uri], [uri, position] or a request result. This data could then be stored as JSON or read into a database that can represent these vertices and relationships.

Assume there is a file /Users/dirkb/sample.ts and we want to store the folding range information with it then the indexer emits two vertices: one representing the document with its URI file:///Users/dirkb/sample.ts, the other representing the folding result. In addition, an edge would be emitted representing the textDocument/foldingRange request.

{ id: 1, type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript" }
{ id: 2, type: "vertex", label: "foldingRangeResult", result: [ { ... }, { ... }, ... ] }
{ id: 3, type: "edge", label: "textDocument/foldingRange", outV: 1, inV: 2 }

The corresponding graph looks like this

Folding Range Result

Ranges

For requests that take a position as its input, we need to store the position as well. Usually, LSP requests return the same result for positions that point to the same word/name in a document. Take the following TypeScript example:

function bar() {
}

A hover request for a position denoting the b in bar will return the same result as a position denoting the a or r. To make the dump more compact, it will use ranges to capture this instead of single positions. The following vertices will be emitted in this case. Note that line and character are zero-based as in the LSP:

{ id: 4, type: "vertex", label: "range", start: { line: 0, character: 9}, end: { line: 0, character: 12 } }

To bind the range to a document, we use a special edge labeled contains which points from a document to a set of ranges.

{ id: 5, type: "edge", label: "contains", outV: 1, inVs: [4] }

LSIF supports 1:n edges for the contains relationship which, in a graph can easily be mapped to n 1:1 edges. LSIF supports this for two reasons: (a) to make the output more compact since a document usually contains hundreds of those ranges and (b) to easy the import and batching for consumers of an LSIF dump.

To bind the hover result to the range, we use the same pattern as we used for the folding ranges. We emit a vertex representing the hover result and an edge representing the textDocument/hover request.

{
  id: 6,
  type: "vertex",
  label: "hoverResult",
  result: {
    contents: [
      { language: "typescript", value: "function bar(): void" }
    ]
  }
}
{ id: 7, type: "edge", label: "textDocument/hover", outV: 4, inV: 6 }

The corresponding graph looks like this

Hover Result

The ranges emitted for a document in the contains relationship must follow these rules:

  1. a given range ID can only be contained in one document or in other words: ranges must not be shared between documents even if they have the same start/end value.
  2. No two ranges can be equal.
  3. No two ranges can overlap, claiming the same position in a document unless one range is entirely contained by the other.

If a position in a document is mapped to a range and more than one range covers the position, the following algorithm should be used:

  1. sort the ranges by containment with innermost first
  2. for range in ranges do
    1. check if the range has an outgoing edge textDocument/${method}
    2. if yes, use it
  3. end
  4. return null

ResultSet

Usually, the hover result is the same whether you hover over a definition of a function or over a reference of that function. The same is actually true for many LSP requests like textDocument/definition, textDocument/references or textDocument/typeDefinition. In a naïve model, each range would have outgoing edges for all these LSP requests and would point to the corresponding results. To optimize this and to make the graph easier to understand, the concept of a ResultSet is introduced. A result set acts as a hub to be able to store information common to a lot of ranges. The ResultSet itself doesn't carry any information. So it looks like this:

export interface ResultSet {
}

The corresponding output of the above example with a hover using a result set looks like this:

{ id: 1, type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript" }
{ id: 2, type: "vertex", label: "resultSet" }
{ id: 3, type: "vertex", label: "range", start: { line: 0, character: 9}, end: { line: 0, character: 12 } }
{ id: 4, type: "edge", label: "contains", outV: 1, inVs: [3] }
{ id: 5, type: "edge", label: "next", outV: 3, inV: 2 }
{ id: 6, type: "vertex", label: "hoverResult", result: {"contents":[{"language":"typescript","value":"function bar(): void"},""] }
{ id: 7, type: "edge", label: "textDocument/hover", outV: 2, inV: 6 }

Result Set

Result sets are linked to ranges using a next edge. A results set can also forward information to another result set by linking to it using a next edge.

The pattern of storing the result with the ResultSet will be used for other requests as well. The lookup algorithm is therefore as follows for a request [document, position, method]:

  1. find all ranges for [document, position]. If none exist, return null as the result
  2. sort the ranges by containment the innermost first
  3. for range in ranges do
    1. assign range to out
    2. while out !== null
      1. check if out has an outgoing edge textDocument/${method}. if yes, use it and return the corresponding result
      2. check if out has an outgoing next edge. If yes, set out to the target vertex. Else set out to null
    3. end
  4. end
  5. otherwise, return null

Request: textDocument/definition

The same pattern of connecting a range, result set, or a document with a request edge to a method result is used for other requests as well. Let's next look at the textDocument/definition request using the following TypeScript sample:

function bar() {
}

function foo() {
  bar();
}

This will emit the following vertices and edges to model the textDocument/definition request:

// The document
{ id: 4, type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript" }

// The bar declaration
{ id: 6, type: "vertex", label: "resultSet" }
{ id: 9, type: "vertex", label: "range", start: { line: 0, character: 9 }, end: { line: 0, character: 12 } }
{ id: 10, type: "edge", label: "next", outV: 9, inV: 6 }


// The bar reference
{ id: 20, type: "vertex", label: "range", start: { line: 4, character: 2 }, end: { line: 4, character: 5 } }
{ id: 21, type: "edge", label: "next", outV: 20, inV: 6}

// The definition result linked to the bar result set
{ id: 22, type: "vertex", label: "definitionResult" }
{ id: 23, type: "edge", label: "textDocument/definition", outV: 6, inV: 22 }
{ id: 24, type: "edge", label: "item", outV: 22, inVs: [9], document: 4 }

Definition Result

The definition result above has only one value (the range with id '9') and we could have emitted it directly. However, we introduced the definition result vertex for two reasons:

  • To have consistency with all other requests that point to a result.
  • To have support for languages where a definition can be spread over multiple ranges or even multiple documents. To support multiple documents ranges are added to a definition result using a 1:N item edge. Conceptionally a definition result is an array to which the item edge adds items.

Consider the following TypeScript example:

interface X {
  foo();
}
interface X {
  bar();
}
let x: X;

Running Go to Definition on X in let x: X will show a dialog that lets the user select between the two definitions of the interface X. The emitted JSON, in this case, looks like this:

{ id : 38, type: "vertex", label: "definitionResult" }
{ id : 40, type: "edge", label: "item", outV: 38, inVs: [9, 13], document: 4 }

The item edge is an additional property document that indicates which document has the declaration. We added this information to make it easy to emit the data but also make it easy to process the data to store it in a database. Without that information we would either need to specify an order in which data needs to be emitted (e.g. an item edge and only refer to a range that got already added to a document using a contains edge) or we force processing tools to keep a lot of vertices and edges in memory. The approach of having this document property looks like a fair balance.

Request: textDocument/declaration

There are programming languages that have the concept of declarations and definitions (like C/C++). If this is the case, the dump can contain a corresponding declarationResult vertex and a textDocument/declaration edge to store the information. They are handled analogously to the entities emitted for the textDocument/definition request.

More about Request: textDocument/hover

In the LSP, the hover is defined as follows:

export interface Hover {
  /**
   * The hover's content
   */
  contents: MarkupContent | MarkedString | MarkedString[];

  /**
   * An optional range
   */
  range?: Range;
}

where the optional range is the name range of the word hovered over.

Side Note: This is a pattern used for other LSP requests as well, where the result contains the word range of the word the position parameter pointed to.

This makes the hover different for every location so we can't really store it with the result set. But wait, the range is the range of one of the bar references we already emitted and used to start to compute the result. To make the hover still reusable, we ask the index server to fill in the starting range if no range is defined in the result. So for a hover request executed on range { line: 4, character: 2 }, end: { line: 4, character: 5 } the hover result will be:

{ id: 6, type: "vertex", label: "hoverResult", result: { contents: [ { language: "typescript", value: "function bar(): void" } ], range: { line: 4, character: 2 }, end: { line: 4, character: 5 } } }

Request: textDocument/references

Storing references will be done in the same way as storing a hover or go to definition ranges. It uses a reference result vertex and item edges to add ranges to the result.

Look at the following example:

function bar() {
}

function foo() {
  bar();
}

The relevant JSON output looks like this:

// The document
{ id: 4, type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript" }

// The bar declaration
{ id: 6, type: "vertex", label: "resultSet" }
{ id: 9, type: "vertex", label: "range", start: { line: 0, character: 9 }, end: { line: 0, character: 12 } }
{ id: 10, type: "edge", label: "next", outV: 9, inV: 6 }

// The bar reference range
{ id: 20, type: "vertex", label: "range", start: { line: 4, character: 2 }, end: { line: 4, character: 5 } }
{ id: 21, type: "edge", label: "next", outV: 20, inV: 6 }

// The reference result
{ id : 25, type: "vertex", label: "referenceResult" }
// Link it to the result set
{ id : 26, type: "edge", label: "textDocument/references",  outV: 6, inV: 25 }

// Add the bar definition as a reference to the reference result
{ id: 27, type: "edge", label: "item", outV: 25, inVs: [9], document: 4, property : "definitions" }

// Add the bar reference as a reference to the reference result
{ id: 28, type: "edge", label: "item", outV: 25, inVs: [20], document:4, property: "references" }

References Result

We tag the item edge with id 27 as a definition since the reference result distinguishes between definitions, declarations, and references. This is done since the textDocument/references request takes an additional input parameter includeDeclarations controlling whether declarations and definitions are included in the result as well. Having three distinct properties allows the server to compute the result accordingly.

The item edge also supports linking reference results to other reference results. This is useful when computing references to methods overridden in a type hierarchy.

Take the following example:

interface I {
  foo(): void;
}

class A implements I {
  foo(): void {
  }
}

class B implements I {
  foo(): void {
  }
}

let i: I;
i.foo();

let b: B;
b.foo();

The reference result for the method foo in TypeScript contains all three declarations and both references. While parsing the document, one reference result is created and then shared between all result sets.

The output looks like this:

// The document
{ id: 4, type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript" }

// The declaration of I#foo
{ id: 13, type: "vertex", label: "resultSet" }
{ id: 16, type: "vertex", label: "range", start: { line: 1, character: 2 }, end: { line: 1, character: 5 } }
{ id: 17, type: "edge", label: "next", outV: 16, inV: 13 }
// The reference result for I#foo
{ id: 30, type: "vertex", label: "referenceResult" }
{ id: 31, type: "edge", label: "textDocument/references", outV: 13, inV: 30 }

// The declaration of A#foo
{ id: 29, type: "vertex", label: "resultSet" }
{ id: 34, type: "vertex", label: "range", start: { line: 5, character: 2 }, end: { line: 5, character: 5 } }
{ id: 35, type: "edge", label: "next", outV: 34, inV: 29 }

// The declaration of B#foo
{ id: 47, type: "vertex", label: "resultSet" }
{ id: 50, type: "vertex", label: "range", start: { line: 10, character: 2 }, end: { line: 10, character: 5 } }
{ id: 51, type: "edge", label: "next", outV: 50, inV: 47 }

// The reference i.foo()
{ id: 65, type: "vertex", label: "range", start: { line: 15, character: 2 }, end: { line: 15, character: 5 } }

// The reference b.foo()
{ id: 78, type: "vertex", label: "range", start: { line: 18, character: 2 }, end: { line: 18, character: 5 } }

// The insertion of the ranges into the shared reference result
{ id: 90, type: "edge", label: "item", outV: 30, inVs: [16,34,50], document: 4, property: definitions }
{ id: 91, type: "edge", label: "item", outV: 30, inVs: [65,78], document: 4, property: references }

// Linking A#foo to I#foo
{ id: 101, type: "vertex", label: "referenceResult" }
{ id: 102, type: "edge", label: "textDocument/references", outV: 29, inV: 101 }
{ id: 103, type: "edge", label: "item", outV: 101, inVs: [30], document: 4, property: referenceResults }

// Linking B#foo to I#foo
{ id: 114, type: "vertex", label: "referenceResult" }
{ id: 115, type: "edge", label: "textDocument/references", outV: 47, inV: 114 }
{ id: 116, type: "edge", label: "item", outV: 114, inVs: [30], document: 4, property: referenceResults }

One goal of the language server index format is that the information can be emitted as soon as possible without caching too much information in memory. With languages that support overriding methods defined in more than one interface, this can be more complicated since the whole inheritance tree might only be known after parsing all documents.

Take the following TypeScript example:

interface I {
  foo(): void;
}

interface II {
  foo(): void;
}

class B implements I, II {
  foo(): void {
  }
}

let i: I;
i.foo();

let b: B;
b.foo();

Searching for I#foo() finds 4 references, searching for II#foo() finds 3 reference, and searching on B#foo() finds 5 results. The interesting part here is when the declaration of class B gets processed which implements I and II, neither the reference result bound to I#foo() nor the one bound to II#foo() can be reused. So we need to create a new one. To still be able to profit from the results generated for I#foo and II#foo, the LSIF supports nested references results. This way the one referenced from B#foo will reuse the one from I#foo and II#foo. Depending on how these declarations are parsed, the two reference results might contain the same references. When a language server interprets reference results consisting of other reference results, the server is responsible to de-dup the final ranges.

In the above example, there will be three reference results

// The document
{ id: 4, type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript" }

// Declaration of I#foo
{ id: 13, type: "vertex", label: "resultSet" }
{ id: 16, type: "vertex", label: "range", start: { line: 1, character: 2 }, end: { line: 1, character: 5 } }
{ id: 17, type: "edge", label: "next", outV: 16, inV: 13 }

// Declaration of II#foo
{ id: 27, type: "vertex", label: "resultSet" }
{ id: 30, type: "vertex", label: "range", start: { line: 5, character: 2 }, end: { line: 5, character: 5 } }
{ id: 31, type: "edge", label: "next", outV: 30, inV: 27 }

// Declaration of B#foo
{ id: 45, type: "vertex", label: "resultSet" }
{ id: 52, type: "vertex", label: "range", start: { line: 9, character: 2 }, end: { line: 9, character: 5 } }
{ id: 53, type: "edge", label: "next", outV: 52, inV: 45 }

// Reference result for I#foo
{ id: 46, type: "vertex", label: "referenceResult" }
{ id: 47, type: "edge", label: "textDocument/references", outV: 13, inV: 46 }

// Reference result for II#foo
{ id: 48, type: "vertex", label: "referenceResult" }
{ id: 49, type: "edge", label: "textDocument/references", outV: 27, inV: 48 }

// Reference result for B#foo
{ id: 116 "typ" :"vertex", label: "referenceResult" }
{ id: 117 "typ" :"edge", label: "textDocument/references", outV: 45, inV: 116 }

// Link B#foo reference result to I#foo and II#foo
{ id: 118 "typ" :"edge", label: "item", outV: 116, inVs: [46,48], document: 4, property: "referenceResults" }

For Typescript, method references are recorded at their most abstract declaration and if methods are merged (B#foo), they are combined using a reference result pointing to other results.

Request: textDocument/implementation

Supporting a textDocument/implementation request is done reusing what we implemented for a textDocument/references request. In most cases, the textDocument/implementation returns the declaration values of the reference result that a symbol declaration points to. For cases where the result differs, the LSIF provides an ImplementationResult. To nest implementation results the item edge supports a property value "implementationResults".

The corresponding ImplementationResult looks like this:

interface ImplementationResult {

  label: `implementationResult`
}

Request: textDocument/typeDefinition

Supporting textDocument/typeDefinition is straightforward. The edge is either recorded at the range or the ResultSet.

The corresponding TypeDefinitionResult looks like this:

interface TypeDefinitionResult {

  label: `typeDefinitionResult`
}

For the following TypeScript example:

interface I {
  foo(): void;
}

let i: I;

The relevant emitted vertices and edges looks like this:

// The document
{ id: 4, type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript" }

// The declaration of I
{ id: 6, type: "vertex", label: "resultSet" }
{ id: 9, type: "vertex", label: "range", start: { line: 0, character: 10 }, end: { line: 0, character: 11 } }
{ id: 10, type: "edge", label: "next", outV: 9, inV: 6 }

// The declaration of i
{ id: 26, type: "vertex", label: "resultSet" }
// The type definition result
{ id: 37, type: "vertex", label: "typeDefinitionResult" }
// Hook the result to the declaration
{ id: 38, type: "edge", label: "textDocument/typeDefinition", outV: 26, inV:37 }
// Add the declaration of I as a target range.
{ id: 51, type: "edge", label: "item", outV: 37, inVs: [9], document: 4 }

As with other results ranges get added using a item edge. In this case without a property since there is only on kind of range.

Document requests

The Language Server Protocol also supports requests for documents only (without any position information). These requests are textDocument/foldingRange, textDocument/documentLink, and textDocument/documentSymbol. We follow the same pattern as before to model these, the difference being that the result is linked to the document instead of to a range.

Request: textDocument/foldingRange

For the folding range result this looks like this:

function hello() {
  console.log('Hello');
}

function world() {
  console.log('world');
}

function space() {
  console.log(' ');
}
hello();space();world();
{ id: 2, type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript" }
{ id: 112, type: "vertex", label: "foldingRangeResult", result:[
  { startLine: 0, startCharacter: 16, endLine: 2, endCharacter: 1 },
  { startLine: 4, startCharacter: 16, endLine: 6, endCharacter: 1 },
  { startLine: 8, startCharacter: 16, endLine: 10, endCharacter: 1 }
]}
{ id: 113, type: "edge", label: "textDocument/foldingRange", outV: 2, inV: 112 }

The corresponding FoldingRangeResult is defined as follows:

export interface FoldingRangeResult {
  label: 'foldingRangeResult';

  result: lsp.FoldingRange[];
}

Request: textDocument/documentLink

Again, for document links, we define a result type and a corresponding edge to link it to a document. Since the link location usually appears in comments, the ranges don't denote any symbol declarations or references. We, therefore, inline the range into the result as we do with folding ranges.

export interface DocumentLinkResult {
  label: 'documentLinkResult';

  result: lsp.DocumentLink[];
}

Request: textDocument/documentSymbol

Next we look at the textDocument/documentSymbol request. This request usually returns an outline view of the document in hierarchical form. However, not all programming symbols declared or defined in a document are part of the result (for example, locals are usually omitted). In addition, an outline item needs to provide additional information like the full range and a symbol kind. There are two ways we can model this: either we do the same as we do for folding ranges and the document links and store the information in a document symbol result as literals, or we extend the range vertex with some additional information and refer to these ranges in the document symbol result. Since the additional information for ranges might be helpful in other scenarios as well, we support adding additional tags to these ranges by defining a tag property on the range vertex.

The following tags are currently supported:

/**
 * The range represents a declaration
 */
export interface DeclarationTag {

  /**
   * A type identifier for the declaration tag.
   */
  type: 'declaration';

  /**
   * The text covered by the range
   */
  text: string;

  /**
   * The kind of the declaration.
   */
  kind: lsp.SymbolKind;

  /**
   * The full range of the declaration not including leading/trailing whitespace but everything else, e.g comments and code.
   * The range must be included in fullRange.
   */
  fullRange: lsp.Range;

  /**
   * Optional detail information for the declaration.
   */
  detail?: string;
}

/**
 * The range represents a definition
 */
export interface DefinitionTag {
  /**
   * A type identifier for the declaration tag.
   */
  type: 'definition';

  /**
   * The text covered by the range
   */
  text: string;

  /**
   * The symbol kind.
   */
  kind: lsp.SymbolKind;

  /**
   * The full range of the definition not including leading/trailing whitespace but everything else, e.g comments and code.
   * The range must be included in fullRange.
   */
  fullRange: lsp.Range;

  /**
   * Optional detail information for the definition.
   */
  detail?: string;
}

/**
 * The range represents a reference
 */
export interface ReferenceTag {

  /**
   * A type identifier for the reference tag.
   */
  type: 'reference';

  /**
   * The text covered by the range
   */
  text: string;
}

/**
 * The type of the range is unknown.
 */
export interface UnknownTag {

  /**
   * A type identifier for the unknown tag.
   */
  type: 'unknown';

  /**
   * The text covered by the range
   */
  text: string;
}

Emitting the tags for the following TypeScript example:

function hello() {
}

hello();

Will look like this:

{ id: 2, type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript" }
{ id: 4, type: "vertex", label: "resultSet" }
{ id: 7, type: "vertex", label: "range",
  start: { line: 0, character: 9 }, end: { line: 0, character: 14 },
  tag: { type: "definition", text: "hello", kind: 12, fullRange: { start: { line: 0, character: 0 }, end: { line: 1, character: 1 }}}
}

The document symbol result is then modeled as follows:

export interface RangeBasedDocumentSymbol {

  id: RangeId

  children?: RangeBasedDocumentSymbol[];
}

export interface DocumentSymbolResult extends V {

  label: 'documentSymbolResult';

  result: lsp.DocumentSymbol[] | RangeBasedDocumentSymbol[];
}

The given TypeScript example:

namespace Main {
  function hello() {
  }
  function world() {
    let i: number = 10;
  }
}

Produces the following output:

// The document
{ id: 2 , type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript" }
// The declaration of Main
{ id: 7 , type: "vertex", label: "range", start: { line: 0, character: 10 }, end: { line: 0, character: 14 }, tag: { type: "definition", text: "Main", kind: 7, fullRange: { start: { line: 0, character: 0 }, end: { line: 5, character: 1 } } } }
// The declaration of hello
{ id: 18 , type: "vertex", label: "range", start: { line: 1, character: 11 }, end: { line: 1, character: 16 }, tag: { type: "definition", text: "hello", kind: 12, fullRange: { start: { line: 1, character: 2 }, end: { line: 2, character: 3 } } } }
// The declaration of world
{ id: 29 , type: "vertex", label: "range", start: { line: 3, character: 11 }, end: { line: 3, character: 16 }, tag: { type: "definition", text: "world", kind: 12, fullRange: { start: { line: 3, character: 2 }, end: { line: 4, character: 3 } } } }
// The document symbol
{ id: 39 , type: "vertex", label: "documentSymbolResult", result: [ { id: 7 , children: [ { id: 18 }, { id: 29 } ] } ] }
{ id: 40 , type: "edge", label: "textDocument/documentSymbol", outV: 2, inV: 39 }

Request: textDocument/diagnostic

The only information missing that is useful in a dump are the diagnostics associated with documents. Diagnostics in the LSP are modeled as push notifications sent from the server to the client. This doesn't work well with a dump modeled on request method names. However, the push notification can be emulated as a request where the request's result is the value sent during the push as a parameter.

In the dump, we model diagnostics as follows:

  • We introduce a pseudo request textDocument/diagnostic.
  • We introduce a diagnostic result that contains the diagnostics associated with a document.

The result looks like this:

export interface DiagnosticResult {

  label: 'diagnosticResult';

  result: lsp.Diagnostic[];
}

The given TypeScript example:

function foo() {
  let x: string = 10;
}

Produces the following output:

{ id: 2, type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript" }
{ id: 18, type: "vertex", label: "diagnosticResult", result: [{ severity: 1, code: 2322, message: "Type '10' is not assignable to type 'string'.", range: { start : { line: 1, character: 5 }, end: { line: 1, character: 6 } } } ] }
{ id: 19, type: "edge", label: "textDocument/diagnostic", outV: 2, inV: 18 }

Since diagnostics are not very common in dumps, no effort has been made to reuse ranges in diagnostics.

The Project vertex

Usually, language servers operate in some sort of project context. In TypeScript, a project is defined using a tsconfig.json file. C# and C++ have their own means. The project file usually contains information about compile options and other parameters. Having these in the dump can be valuable. The LSIF, therefore, defines a project vertex. In addition, all documents that belong to that project are connected to the project using a contains edge. If there was a tsconfig.json in the previous examples, the first emitted edges and vertices would look like this:

{ id: 1, type: "vertex", label: "project", resource: "file:///Users/dirkb/tsconfig.json", kind: "typescript"}
{ id: 2, type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript" }
{ id: 3, type: "edge", label: "contains", outV: 1, inVs: [2] }

The definition of the project vertex looks as follows:

export interface Project extends V {

	/**
	 * The label property.
	 */
	label: VertexLabels.project;

	/**
	 * The project kind like 'typescript' or 'csharp'. See also the language ids
	 * in the [specification](https://microsoft.github.io/language-server-protocol/specification)
	 */
	kind: string;

	/**
	 * The resource URI of the project file.
	 */
	resource?: Uri;

	/**
	 * Optional the content of the project file, `base64` encoded.
	 */
	contents?: string;
}

Embedding contents

It can be valuable to embed the contents of a document or project file into the dump as well. For example, if the content of the document is a virtual document generated from program meta data. The index format, therefore, supports an optional contents property on the document and project vertex. If used the content needs to be base64 encoded.

Events

To ease the processing of an LSIF dump to for example import it into a database the dump emits begin and end events for documents and projects. After the end event of a document has been emitted the dump must not contain any further data referencing that document. For example, no ranges from that document can be referenced in item edges. Nor can result sets or other vertices linked to the ranges in that document. The document can however be referenced in a contains edge adding the document to a project. The begin/end events for documents look like this:

// The actual document
{ id: 4, type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript", contents: "..." }
// The begin event
{ id: 5, type: "vertex", label: "$event", kind: "begin", scope: "document" , data: 4 }
// The end event
{ id: 53, type: "vertex", label: "$event", kind: "end", scope: "document" , data: 4 }

Between the document vertex 4 and the document begin event 5 no information specific to document 4 can be emitted. Please note that more than one document can be open at a given point in time meaning that there have been n different document begin events without corresponding document end events.

The events for projects looks similar:

{ id: 2, type: "vertex", label: "project", kind: "typescript" }
{ id: 4, type: "vertex", label: "document", uri: "file:///Users/dirkb/sample.ts", languageId: "typescript", contents: "..." }
{ id: 5, type: "vertex", label: "$event", kind: "begin", scope: "document" , data: 4 }
{ id: 3, type: "vertex", label: "$event", kind: "begin", scope: "project", data: 2 }
{ id: 53, type: "vertex", label: "$event", kind: "end", scope: "document", data: 4 }
{ id: 54, type: "edge", label: "contains", outV: 2, inVs: [4] }
{ id: 55, type: "vertex", label: "$event", kind: "end", scope: "project", data: 2 }

Project exports and external imports (Monikers)

One use case of the LSIF is to create dumps for released versions of a product, either a library or a program. If a project A references a library B, it would also be useful if the information in these two dumps could be related. To make this possible, the LSIF introduces optional monikers which can be linked to ranges using a corresponding edge. The monikers can be used to describe what a project exports and what it imports. Let's first look at the export case.

Consider the following TypeScript file called index.ts:

export function func(): void {
}

export class Emitter {
  private doEmit() {
  }

  public emit() {
    this.doEmit();
  }
}
{ id: 4, type: "vertex", label: "document", uri: "file:///Users/dirkb/index.ts", languageId: "typescript", contents: "..." }
{ id: 11, type: "vertex", label: "resultSet" }
{ id: 12, type: "vertex", label: "moniker", kind: "export", scheme: "tsc", identifier: "lib/index:func" }
{ id: 13, type: "edge", label: "moniker", outV: 11, inV: 12 }
{ id: 14, type: "vertex", label: "range", start: { line: 0, character: 16 }, end: { line: 0, character: 20 } }
{ id: 15, type: "edge", label: "next", outV: 14, inV: 11 }

{ id: 18, type: "vertex", label: "resultSet" }
{ id: 19, type: "vertex", label: "moniker", kind: "export", scheme: "tsc", identifier: "lib/index:Emitter" }
{ id: 20, type: "edge", label: "moniker", outV: 18, inV: 19 }
{ id: 21, type: "vertex", label: "range", start: { line: 3, character: 13 }, end: { line: 3, character: 20 } }
{ id: 22, type: "edge", label: "next", outV: 21, inV: 18 }

{ id: 25, type: "vertex", label: "resultSet" }
{ id: 26, type: "vertex", label: "moniker", kind: "export", scheme: "tsc", identifier: "lib/index:Emitter.doEmit" }
{ id: 27, type: "edge", label: "moniker", outV: 25, inV: 26 }
{ id: 28, type: "vertex", label: "range", start: { line: 4, character: 10 }, end: { line: 4, character: 16 } }
{ id: 29, type: "edge", label: "next", outV: 28, inV: 25 }

{ id: 32, type: "vertex", label: "resultSet" }
{ id: 33, type: "vertex", label: "moniker", kind: "export", scheme: "tsc", identifier: "lib/index:Emitter.emit" }
{ id: 34, type: "edge", label: "moniker", outV: 32, inV: 33 }
{ id: 35, type: "vertex", label: "range", start: { line: 7, character: 9 }, end: { line: 7, character: 13 } }
{ id: 36, type: "edge", label: "next", outV: 35, inV: 32 }

This describes the exported declaration inside index.ts with a moniker (e.g. a handle in string format) that is bound to the corresponding range declaration. The generated moniker must be position-independent and stable so that it can be used to identify the symbol in other projects or documents. It should be sufficiently unique so as to avoid matching other monikers in other projects unless they actually refer to the same symbol. A moniker, therefore, has two properties: a scheme to indicate how the identifier is to be interpreted. And the identifier to actually identify the symbol. Its structure is opaque to the scheme owner. In the above example, the monikers are created by the TypeScript compiler tsc and can only be compared to monikers also having the scheme tsc.

Please also note that the method Emitter#doEmit has a moniker although the method is private. If private elements do have monikers depend on the programming language. Since TypeScript can't enforce visibility (it compiles to JS which doesn't have the concept) we treat them as visible. Even the TypeScript language server does so. Find all references does find all references to private methods even if it is flagged as a visibility violation.

How these exported elements are visible in other projects in most programming languages depends on how many files are packaged into a library or program. In TypeScript, the standard package manager is npm.

Consider that the following package.json file exists:

{
  "name": "lsif-ts-sample",
  "version": "1.0.0",
  "description": "",
  "main": "lib/index.js",
  "author": "MS",
  "license": "MIT",
}

then these monikers can be translated into monikers that are npm dependent. Instead of replacing the monikers we emit a second set of monikers and link the tsc monikers to corresponding npm monikers using a nextMonikeredge:

{ id: 991, type: "vertex", label: "packageInformation", name: "lsif-ts-sample", manager: "npm", version: "1.0.0" }

{ id: 987, type: "vertex", label: "moniker", kind: "export", scheme: "npm", identifier: "lsif-ts-sample::func" }
{ id: 986, type: "edge", label: "packageInformation", outV: 987, inV: 991 }
{ id: 985, type: "edge", label: "nextMoniker", outV: 12, inV: 987 }

{ id: 984, type: "vertex", label: "moniker", kind: "export", scheme: "npm", identifier: "lsif-ts-sample::Emitter" }
{ id: 983, type: "edge", label: "packageInformation", outV: 984, inV: 991 }
{ id: 982, type: "edge", label: "nextMoniker", outV: 19, inV: 984 }

{ id: 981, type: "vertex", label: "moniker", kind: "export", scheme: "npm", identifier: "lsif-ts-sample::Emitter.doEmit" }
{ id: 980, type: "edge", label: "packageInformation", outV: 981, inV: 991 }
{ id: 979, type: "edge", label: "nextMoniker", outV: 26, inV: 981 }

{id: 978, type: "vertex", label: "moniker", kind: "export", scheme: "npm", identifier: "lsif-ts-sample::Emitter.emit" }
{id: 977, type: "edge", label: "packageInformation", outV: 978, inV: 991 }
{id: 976, type: "edge", label: "nextMoniker", outV: 33, inV: 978 }

Things to observe:

  • a special packageInformationvertex got emitted to point to the corresponding npm package information.
  • the npm moniker refer to the package name.
  • since the file index.ts is the npm main file the moniker identifier as no file path. The is comparable to importing this module into TypeScript or JavaScript were only the module name and no file path is used (e.g. import * as lsif from 'lsif-ts-sample').
  • the nextMoniker edge points from the tsc moniker vertex to the npm moniker vertex.

For LSIF we recommend that a second tool is used to make the monikers emitted by the indexer be package manager dependent. This supports the use of different package managers and allows incorporating custom build tools. In the TypeScript implementation, this is done by an npm specific tool that rewrites the monikers taking the npm package information into account.

Reporting importing external symbols is done using the same approach. The LSIF emits monikers of kind import. Consider the following typescript example:

import * as mobx from 'mobx';

let map: mobx.ObservableMap = new mobx.ObservableMap();

where mobx is the npm mobx package. Running the tsc index tools produces:

{ id: 41, type: "vertex", label: "document", uri: "file:///Users/dirkb/samples/node_modules/mobx/lib/types/observablemap.d.ts", languageId: "typescript", contents: "..." }
{ id: 55, type: "vertex", label: "resultSet" }
{ id: 57, type: "vertex", label: "moniker",  kind: "import", scheme: "tsc", identifier: "node_modules/mobx/lib/mobx:ObservableMap" }
{ id: 58, type: "edge", label: "moniker", outV: 55, inV: 57 }
{ id: 59, type: "vertex", label: "range", start: { line: 17, character: 538 }, end: { line: 17, character: 551 } }
{ id: 60, type: "edge", label: "next", outV: 59, inV: 55 }

Three things to note here: First, TypeScript uses declarations files for externally imported symbols. That has the nice effect that the moniker information can be attached to the declaration ranges in these files. In other languages, the information might be attached to the file actually referencing the symbol. Or a virtual document for the referenced item is generated. Second, the tool only generates this information for symbols actually referenced, not for all available symbols. Third these monikers are tsc specific and point to the node_modules folder.

However piping this information through the npm tool will generate the following information:

{id: 991, type: "vertex", label: "packageInformation", name: "mobx", manager: "npm", version: "5.6.0", repository: { type: "git", url: "git+https://github.com/mobxjs/mobx.git" } }
{ id: 978, type: "vertex", label: "moniker", kind: "import", scheme: "npm", identifier: "mobx::ObservableMap" }
{ id: 977, type: "edge", label: "packageInformation", outV: 978, inV: 991 }
{ id: 976, type: "edge", label: "nextMoniker", outV: 978, inV: 57 }

which made the moniker specific to the npm mobx package. In addition information about the mobx package itself got emitted. Please note that since this is an import moniker the nextMoniker edge points from the npm moniker to the tsc moniker.

Usually, monikers are attached to result sets since they are the same for all ranges pointing to the result set. However, for dumps that don't use result sets, monikers can also be emitted on ranges.

For tools processing the dump and importing it into a database, it is sometimes useful to know whether a result is local to a file or not (for example function arguments can only be navigated inside the file). To help postprocessing tools to decide this LSIF generation tools should generate a moniker for locals as well. The corresponding kind to use is local. The identifier should still be unique inside the document.

For the following example

function foo(x: number): void {
}

The moniker for x looks like this:

{ id: 13, type: "vertex", label: "resultSet" }
{ id: 14, type: "vertex", label: "moniker", kind: "local", scheme: "tsc", identifier: "SfeOP6s53Y2HAkcViolxYA==" }
{ id: 15, type: "edge", label: "moniker", outV: 13, inV: 14 }
{ id: 16, type: "vertex", label: "range", start: { line: 0, character: 13 }, end: { line: 0, character: 14 }, tag: { type: "definition", text: "x", kind: 7, fullRange: { start: { line: 0, character: 13 }, end: { line: 0, character: 22 } } } }
{ id: 17, type: "edge", label: "next", outV: 16, inV: 13 }

In addition to this moniker schemes starting with $ are reserved and shouldn't be used by an LSIF tool.

Result ranges

Ranges in LSIF have currently two meanings:

  1. they act as LSP request sensitive areas in a document (e.g. we use them to decide if for a given position a corresponding LSP request result exists)
  2. they act as navigation targets (e.g. they are the result of a Go To declaration navigation).

To fulfill the first LSIF specifies that ranges can't overlap or be the same. However, this constraint is not necessary for the second meaning. To support equal or overlapping target ranges we introduce a vertex resultRange. It is not allowed to use a resultRange as a target in a contains edge.

Meta Data Vertex

To support versioning the LSIF defines a meta data vertex as follows:

export interface MetaData {

  /**
   * The label property.
   */
  label: 'metaData';

  /**
   * The version of the LSIF format using semver notation. See https://semver.org/. Please note
   * the version numbers starting with 0 don't adhere to semver and adopters have to assume
   * that each new version is breaking.
   */
  version: string;

  /**
   * The project root (in form of an URI) used to compute this dump.
   */
  projectRoot: Uri;

  /**
   * The string encoding used to compute line and character values in
   * positions and ranges. Currently only 'utf-16' is supported due to the
   * limitations in LSP.
   */
  positionEncoding: 'utf-16',

  /**
   * Information about the tool that created the dump
   */
  toolInfo?: {
    name: string;
    version?: string;
    args?: string[];
  }
}

Emitting constraints

The following emitting constraints (some of which have already mean mentioned in the document) exists:

  • a vertex needs to be emitted before it can be referenced in an edge.
  • a range and resultRange can only be contained in one document.
  • a resultRange can not be used as a target in a contains edge.
  • after a document end event has been emitted only result sets, reference or implementation results emitted through that document can be referenced in edges. It is for example not allowed to reference ranges or result ranges from that document. This also includes adding monikers to ranges or result sets. The document data so to speak can not be altered anymore.
  • if ranges point to result sets and monikers are emitted, they must be emitted on the result set and can't be emitted on individual ranges.

Tools

Open Questions

While implementing this for TypeScript and npm we collected a list of open questions in the form of GitHub issues we are already aware of.