Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Occasionally publishDiagnostics is not sent #887

Merged
merged 7 commits into from
Feb 12, 2022

Conversation

Booksbaum
Copy link
Contributor

tl;dr: There's a lock issue in DiagnosticCollection which results in an ignored exception (-> event textDocument/publishDiagnostics doesn't get sent)

Description

(slightly verbose because I used this (at least to some extend) to structure my findings and detect where to look next)

LSP-Event textDocument/publishDiagnostics is used to send Diagnostics. But sometimes it doesn't get sent when it should be.

In FSAC publishDiagnostics gets send after: F# parsing, after each built-in Analyzer, after Custom Analyzers and (currently uncommented) after Linting.

When all three built-in analyzers (UnusedOpensAnalyzer, UnusedDeclarationsAnalyzer, SimplifyNameAnalyzer) are enabled, publishDiagnostics should occur four times: once for F# Compiler, and once for each Analyzer. (Sending happens in FsAutoComplete.Lsp.fs::DiagnosticCollection)

But occasionally an event doesn't get sent.
It's not deterministic -> no simple test to reproduce error.

But by running a test quite often it's likely the failure emerges (at least on my PC):

  • Checkout 3rd commit of this PR (4c79775)
  • run
    dotnet run --project ./test/FsAutoComplete.Tests.Lsp -- --filter "lsp.Ionide WorkspaceLoader.diagnostics tests" --exclude-from-log=Checker,CheckerEvents,Commands,Opts,Analyzers,LSP
  • Quite likely (at least on my PC) at least one test will fail (if not: run again until a test fails...)

The tests are all the same -- it just gets execute quite often (100 times).
The test itself is quite simple: (shortened)

let tdop : DidOpenTextDocumentParams = {
  TextDocument = {
    Uri = $"untitled:Untitled-{i}"
    LanguageId = "fsharp"
    Version = 0
    Text = source
  }
}
do! server.TextDocumentDidOpen tdop
// there should be 4 `textDocument/publishDiagnostics` events:
//   one for F# Compiler, one for each of the three (enabled) built-in analyzers
let! _ =
  events
  |> fileDiagnostics uri
  |> Observable.take 4
  |> Observable.last
  |> Observable.timeoutSpan (TimeSpan.FromSeconds 3.0)
  |> Async.AwaitObservable

-> It just creates a new text document and waits for all 4 publishDiagnostics events. Fails when it doesn't get all 4 events in 3s.

Note: I'm deliberately using dotnet run with quite some logging instead of dotnet test: Makes the Issue far more likely to emerge. In fact: with dotnet test, the new tests never failed on my PC. (Logging is quite important as it makes the issue more likely to appear.)

Reason

Diagnostics are collected and sent in FsAutoComplete.Lsp.fs::DiagnosticsCollection: It collects Diags for each file and each diag source (F# compiler, built-in analyzer, ...):

  • SetFor(fileUri, kind, values: Diagnostic[]):
    gets a mailbox for passed fileUri (-> getOrAddAgent (fileUri)),
    and then sends the diags and kind to the mailbox loop
    • kind is diag source like F# compiler or F# Unused opens
  • mailbox loop in agentFor(uri):
    handles changed diagnostics for an uri: remembers diags for each kind and sends out all diags for that uri on every received mailbox message (-> every parse step).

Logging shows: Getting a mailbox in getOrAddAgent blocks when two want a mailbox for the same uri at the same time, but the mailbox doesn't exist yet.
In Log Messages:

[19:15:09 INF] Creating `untitled:Untitled-56` <diags tests 56>
[19:15:09 INF] waiting for events on file untitled:Untitled-56 <LSPTests>
[19:15:09 INF] [DiagnosticCollection] >> setFor(F# Compiler, untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] :: getOrAdd(untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] !! lock(agents, untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] >> setFor(F# Unused declarations, untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] :: getOrAdd(untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] !! lock(agents, untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] << mailbox(F# Compiler, untitled:Untitled-56)
[19:15:09 INF] [Diagnostics/untitled:Untitled-56] -> Clear(F# Compiler)
[19:15:09 INF] [DiagnosticCollection] -- Error(F# Unused declarations): An item with the same key has already been added. Key: untitled:Untitled-56
[19:15:09 INF] [DiagnosticCollection] >> setFor(F# simplify names, untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] :: getOrAdd(untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] << mailbox(F# simplify names, untitled:Untitled-56)
[19:15:09 INF] [Diagnostics/untitled:Untitled-56] -> Add(F# simplify names)
[19:15:09 INF] [DiagnosticCollection] >> setFor(F# Unused opens, untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] :: getOrAdd(untitled:Untitled-56)
[19:15:09 INF] [DiagnosticCollection] << mailbox(F# Unused opens, untitled:Untitled-56)
[19:15:09 INF] [Diagnostics/untitled:Untitled-56] -> Add(F# Unused opens)
[19:15:12 ERR] lsp.Ionide WorkspaceLoader.diagnostics tests.can get all publishDiagnostics events 56 errored in 00:00:03.0030000 <Expecto>
System.TimeoutException: The operation has timed out.

For every kind (F# Compiler, F# Unused delcarations, F# simplify names, F# unused opens) there should be:

>> setFor(F# Compiler, untitled:Untitled-56)
:: getOrAdd(untitled:Untitled-56)
<< mailbox(F# Compiler, untitled:Untitled-56)
-> Clear(F# Compiler)           // or `-> Add(...)`

But: F# Unused declarations does never return from getOrAdd with a mailbox (no << mailbox(F# Unused declarations, ...))

getOrAddAgent:

and getOrAddAgent fileUri =
  logger.info (Log.setMessage ":: getOrAdd({uri})" >> Log.addContext "uri" (UMX.untag fileUri))
  match agents.TryGetValue fileUri with
  | true, mailbox -> mailbox
  | false, _ ->
    logger.info (Log.setMessage "!! lock(agents, {uri})" >> Log.addContext "uri" (UMX.untag fileUri))
    lock agents (fun _ ->
      let cts = new CancellationTokenSource()
      let mailbox = agentFor fileUri cts.Token
      agents.Add(fileUri, mailbox)
      ctoks.Add(fileUri, cts)
      mailbox)

-> Both Compiler and Unused Declarations want to create a mailbox because it doesn't yet exist. But only Compiler actually exits with a mailbox.

The other one fails because agents already contains the key fileUri -> ArgumentException with Message An item with the same key has already been added. (-> inserts a new mailbox for an already existing uri)
Further Complication: The exception doesn't surface, but instead gets swallowed.

In practice probably not really an issue:

  • unlikely to happen
  • can only happen for newly opened file. After first publishDiagnostics for that file, future parses/publishDiags for that file aren't affected any more.

Fix

Use ConcurrentDictionary instead of Dictionary and custom locking

Note:
I didn't add any tests (or removed the example tests again):
Solution is something external (-> ConcurrentDictionary).
Additional: issue was not deterministic but just: run this over and over again and MAYBE it will fail...



Additional:






Additional

I added two command line log filters for tests (via dotnet run --project ./test/FsAutoComplete.Tests.Lsp -- ...):

  • --log={level}: Sets Log Level. {level} is something like warn or info.
    • (existing) --debug (= verbose logging) takes precedence over --log={level}
    • when multiple --log=: first occurrence is used
  • --exclude-from-log={logger names}: Filters out loggers with the specified names
    • filters just serilog logging (-> used in srcs; in tests Expecto is used for logging)
    • {log sources}: Comma-separated list with logger names. Like --exclude-from-log=Checker,Commands,Opts,LSP
      (-> doesn't support spaces)
    • can occur multiple times (-> --exclude-from-log=Checker,Commands,Opts,LSP = --exclude-from-log=Checker,Commands --exclude-from-log=Opts,LSP)

I implemented these to help me isolate the issue above (by reducing log clutter).
Might be useful for someone else too
-> I left these command line options in this PR.

Notes:

  • xxx=y[,y,y] because it's easy to parse (vs. separated by spaces)

(for running tests via `dotnet run --project ./test/FsAutoComplete.Tests.Lsp -- ...`)

* `--log={level}`: Set Log Level. `{level}` is something like `warn` or `info`.
  * (existing) `--debug` (= verbose logging) takes precedence over `--log={level}`
  * when multiple `--log=`: first occurrence is used
* `--exclude-from-log={logger names}`: Filters out loggers with the specified names
  * filters just serilog logging (-> used in `./src/`s; in `./test/`s Expecto is used for logging)
  * `{log sources}`: Comma-separated list with logger names. Like `--exclude-from-log=Checker,Commands,Opts,LSP`
    (-> doesn't support spaces)
  * can occur multiple times (-> `--exclude-from-log=Checker,Commands,Opts,LSP` = `--exclude-from-log=Checker,Commands --exclude-from-log=Opts,LSP`)

Example:
```
 dotnet run --project ./test/FsAutoComplete.Tests.Lsp -- --filter "lsp.Ionide WorkspaceLoader.Code Lens Tests" --exclude-from-log=Checker,Commands,Opts,Analyzers
```
-> doesn't print loggers `Checker`, `Commands`, `Opts`, `Analyzers`; but does print other log sources (for example `LSP`)
run:
```
dotnet run --project ./test/FsAutoComplete.Tests.Lsp -- --filter "lsp.Ionide WorkspaceLoader.diagnostics tests" --exclude-from-log=Checker,Commands --exclude-from-log=Opts,Analyzers,LSP
```

Failure isn't deterministic
-> Same test run over and over again (currently: 100 times). Failure will probably show up. If not: run again...

Additional: more logging was added to pinpoint location of failure.

Note: `dotnet run` instead of `dotnet test`: is slower and with logging -> failure is more likely to emerge.
(On my PC: `dotnet test` doesn't fail, but `dotnet run` fails more often than not)
Failure wasn't deterministic. In fact: extremely unlike to show up
with `dotnet test` and no logging (which was removed too)
-> no real point in running tests
@baronfel
Copy link
Contributor

This is awesome. Your investigation makes total sense, the utilities you added to reproduce and solve the problem make sense, and your proposed solution makes sense.

@baronfel baronfel merged commit f236c7a into ionide:main Feb 12, 2022
@Booksbaum Booksbaum deleted the SendDiagsLoop branch February 16, 2022 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants