Fix: Occasionally `publishDiagnostics` is not sent #887

Booksbaum · 2022-02-12T18:18:47Z

tl;dr: There's a lock issue in DiagnosticCollection which results in an ignored exception (-> event textDocument/publishDiagnostics doesn't get sent)

Description

(slightly verbose because I used this (at least to some extend) to structure my findings and detect where to look next)

LSP-Event textDocument/publishDiagnostics is used to send Diagnostics. But sometimes it doesn't get sent when it should be.

In FSAC publishDiagnostics gets send after: F# parsing, after each built-in Analyzer, after Custom Analyzers and (currently uncommented) after Linting.

When all three built-in analyzers (UnusedOpensAnalyzer, UnusedDeclarationsAnalyzer, SimplifyNameAnalyzer) are enabled, publishDiagnostics should occur four times: once for F# Compiler, and once for each Analyzer. (Sending happens in FsAutoComplete.Lsp.fs::DiagnosticCollection)

But occasionally an event doesn't get sent.
It's not deterministic -> no simple test to reproduce error.

But by running a test quite often it's likely the failure emerges (at least on my PC):

Checkout 3rd commit of this PR (4c79775)

run

dotnet run --project ./test/FsAutoComplete.Tests.Lsp -- --filter "lsp.Ionide WorkspaceLoader.diagnostics tests" --exclude-from-log=Checker,CheckerEvents,Commands,Opts,Analyzers,LSP

Quite likely (at least on my PC) at least one test will fail (if not: run again until a test fails...)

The tests are all the same -- it just gets execute quite often (100 times).
The test itself is quite simple: (shortened)

let tdop : DidOpenTextDocumentParams = {
  TextDocument = {
    Uri = $"untitled:Untitled-{i}"
    LanguageId = "fsharp"
    Version = 0
    Text = source
  }
}
do! server.TextDocumentDidOpen tdop
// there should be 4 `textDocument/publishDiagnostics` events:
//   one for F# Compiler, one for each of the three (enabled) built-in analyzers
let! _ =
  events
  |> fileDiagnostics uri
  |> Observable.take 4
  |> Observable.last
  |> Observable.timeoutSpan (TimeSpan.FromSeconds 3.0)
  |> Async.AwaitObservable

-> It just creates a new text document and waits for all 4 publishDiagnostics events. Fails when it doesn't get all 4 events in 3s.

Note: I'm deliberately using dotnet run with quite some logging instead of dotnet test: Makes the Issue far more likely to emerge. In fact: with dotnet test, the new tests never failed on my PC. (Logging is quite important as it makes the issue more likely to appear.)

Reason

Diagnostics are collected and sent in FsAutoComplete.Lsp.fs::DiagnosticsCollection: It collects Diags for each file and each diag source (F# compiler, built-in analyzer, ...):

SetFor(fileUri, kind, values: Diagnostic[]):
gets a mailbox for passed fileUri (-> getOrAddAgent (fileUri)),
and then sends the diags and kind to the mailbox loop
- kind is diag source like F# compiler or F# Unused opens
mailbox loop in agentFor(uri):
handles changed diagnostics for an uri: remembers diags for each kind and sends out all diags for that uri on every received mailbox message (-> every parse step).

Logging shows: Getting a mailbox in getOrAddAgent blocks when two want a mailbox for the same uri at the same time, but the mailbox doesn't exist yet.
In Log Messages:

[19:15:09 INF] Creating `untitled:Untitled-56` <diags tests 56>
[19:15:09 INF] waiting for events on file untitled:Untitled-56 <LSPTests>
[19:15:09 INF] [DiagnosticCollection] >> setFor(F# Compiler, untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] :: getOrAdd(untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] !! lock(agents, untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] >> setFor(F# Unused declarations, untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] :: getOrAdd(untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] !! lock(agents, untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] << mailbox(F# Compiler, untitled:Untitled-56)
[19:15:09 INF] [Diagnostics/untitled:Untitled-56] -> Clear(F# Compiler)
[19:15:09 INF] [DiagnosticCollection] -- Error(F# Unused declarations): An item with the same key has already been added. Key: untitled:Untitled-56
[19:15:09 INF] [DiagnosticCollection] >> setFor(F# simplify names, untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] :: getOrAdd(untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] << mailbox(F# simplify names, untitled:Untitled-56)
[19:15:09 INF] [Diagnostics/untitled:Untitled-56] -> Add(F# simplify names)
[19:15:09 INF] [DiagnosticCollection] >> setFor(F# Unused opens, untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] :: getOrAdd(untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] << mailbox(F# Unused opens, untitled:Untitled-56)
[19:15:09 INF] [Diagnostics/untitled:Untitled-56] -> Add(F# Unused opens)
[19:15:12 ERR] lsp.Ionide WorkspaceLoader.diagnostics tests.can get all publishDiagnostics events 56 errored in 00:00:03.0030000 <Expecto>
System.TimeoutException: The operation has timed out.

For every kind (F# Compiler, F# Unused delcarations, F# simplify names, F# unused opens) there should be:

>> setFor(F# Compiler, untitled:Untitled-56)
:: getOrAdd(untitled:Untitled-56)
<< mailbox(F# Compiler, untitled:Untitled-56)
-> Clear(F# Compiler)           // or `-> Add(...)`

But: F# Unused declarations does never return from getOrAdd with a mailbox (no << mailbox(F# Unused declarations, ...))

getOrAddAgent:

and getOrAddAgent fileUri =
  logger.info (Log.setMessage ":: getOrAdd({uri})" >> Log.addContext "uri" (UMX.untag fileUri))
  match agents.TryGetValue fileUri with
  | true, mailbox -> mailbox
  | false, _ ->
    logger.info (Log.setMessage "!! lock(agents, {uri})" >> Log.addContext "uri" (UMX.untag fileUri))
    lock agents (fun _ ->
      let cts = new CancellationTokenSource()
      let mailbox = agentFor fileUri cts.Token
      agents.Add(fileUri, mailbox)
      ctoks.Add(fileUri, cts)
      mailbox)

-> Both Compiler and Unused Declarations want to create a mailbox because it doesn't yet exist. But only Compiler actually exits with a mailbox.

The other one fails because agents already contains the key fileUri -> ArgumentException with Message An item with the same key has already been added. (-> inserts a new mailbox for an already existing uri)
Further Complication: The exception doesn't surface, but instead gets swallowed.

In practice probably not really an issue:

unlikely to happen
can only happen for newly opened file. After first publishDiagnostics for that file, future parses/publishDiags for that file aren't affected any more.

Fix

Use ConcurrentDictionary instead of Dictionary and custom locking

Note:
I didn't add any tests (or removed the example tests again):
Solution is something external (-> ConcurrentDictionary).
Additional: issue was not deterministic but just: run this over and over again and MAYBE it will fail...

Additional:

Add error logging to handleCommandEvents

Additional

I added two command line log filters for tests (via dotnet run --project ./test/FsAutoComplete.Tests.Lsp -- ...):

--log={level}: Sets Log Level. {level} is something like warn or info.
- (existing) --debug (= verbose logging) takes precedence over --log={level}
- when multiple --log=: first occurrence is used
--exclude-from-log={logger names}: Filters out loggers with the specified names
- filters just serilog logging (-> used in srcs; in tests Expecto is used for logging)
- {log sources}: Comma-separated list with logger names. Like --exclude-from-log=Checker,Commands,Opts,LSP
  (-> doesn't support spaces)
- can occur multiple times (-> --exclude-from-log=Checker,Commands,Opts,LSP = --exclude-from-log=Checker,Commands --exclude-from-log=Opts,LSP)

I implemented these to help me isolate the issue above (by reducing log clutter).
Might be useful for someone else too
-> I left these command line options in this PR.

Notes:

xxx=y[,y,y] because it's easy to parse (vs. separated by spaces)

(for running tests via `dotnet run --project ./test/FsAutoComplete.Tests.Lsp -- ...`) * `--log={level}`: Set Log Level. `{level}` is something like `warn` or `info`. * (existing) `--debug` (= verbose logging) takes precedence over `--log={level}` * when multiple `--log=`: first occurrence is used * `--exclude-from-log={logger names}`: Filters out loggers with the specified names * filters just serilog logging (-> used in `./src/`s; in `./test/`s Expecto is used for logging) * `{log sources}`: Comma-separated list with logger names. Like `--exclude-from-log=Checker,Commands,Opts,LSP` (-> doesn't support spaces) * can occur multiple times (-> `--exclude-from-log=Checker,Commands,Opts,LSP` = `--exclude-from-log=Checker,Commands --exclude-from-log=Opts,LSP`) Example: ``` dotnet run --project ./test/FsAutoComplete.Tests.Lsp -- --filter "lsp.Ionide WorkspaceLoader.Code Lens Tests" --exclude-from-log=Checker,Commands,Opts,Analyzers ``` -> doesn't print loggers `Checker`, `Commands`, `Opts`, `Analyzers`; but does print other log sources (for example `LSP`)

run: ``` dotnet run --project ./test/FsAutoComplete.Tests.Lsp -- --filter "lsp.Ionide WorkspaceLoader.diagnostics tests" --exclude-from-log=Checker,Commands --exclude-from-log=Opts,Analyzers,LSP ``` Failure isn't deterministic -> Same test run over and over again (currently: 100 times). Failure will probably show up. If not: run again... Additional: more logging was added to pinpoint location of failure. Note: `dotnet run` instead of `dotnet test`: is slower and with logging -> failure is more likely to emerge. (On my PC: `dotnet test` doesn't fail, but `dotnet run` fails more often than not)

Failure wasn't deterministic. In fact: extremely unlike to show up with `dotnet test` and no logging (which was removed too) -> no real point in running tests

baronfel · 2022-02-12T18:23:01Z

This is awesome. Your investigation makes total sense, the utilities you added to reproduce and solve the problem make sense, and your proposed solution makes sense.

test/FsAutoComplete.Tests.Lsp/Program.fs

Booksbaum added 7 commits February 8, 2022 14:02

Add logging for lock & exception

4c79775

Add Error Logging to handleCommandEvents

d0c595e

Use ConcurrentDictionary

26b04d2

Remove logging (introduced for debugging)

783a2fb

Remove tests

6e8b143

Failure wasn't deterministic. In fact: extremely unlike to show up with `dotnet test` and no logging (which was removed too) -> no real point in running tests

baronfel reviewed Feb 12, 2022

View reviewed changes

test/FsAutoComplete.Tests.Lsp/Program.fs Show resolved Hide resolved

baronfel reviewed Feb 12, 2022

View reviewed changes

test/FsAutoComplete.Tests.Lsp/Program.fs Show resolved Hide resolved

baronfel merged commit f236c7a into ionide:main Feb 12, 2022

Booksbaum deleted the SendDiagsLoop branch February 16, 2022 15:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: Occasionally `publishDiagnostics` is not sent #887

Fix: Occasionally `publishDiagnostics` is not sent #887

Booksbaum commented Feb 12, 2022

baronfel commented Feb 12, 2022

Fix: Occasionally publishDiagnostics is not sent #887

Fix: Occasionally publishDiagnostics is not sent #887

Conversation

Booksbaum commented Feb 12, 2022

Description

Reason

Fix

Additional

baronfel commented Feb 12, 2022

Fix: Occasionally `publishDiagnostics` is not sent #887

Fix: Occasionally `publishDiagnostics` is not sent #887