
Completion should take <16ms most of the time #7542

Open
matklad opened this issue Feb 3, 2021 · 9 comments
Labels
A-completion autocompletion A-perf performance issues E-has-instructions Issue has some instructions and pointers to code to get started fun A technically challenging issue with high impact good first issue S-actionable Someone could pick this issue up and work on it right now

Comments

@matklad
Member

matklad commented Feb 3, 2021

Code completion performance matters more than it seems. Obviously, completion shouldn't take a second or two. But even for "almost instant" completion, there's a huge difference in experience when going 300ms -> 150ms -> 50ms -> 16ms. It's difficult to perceive such small differences by eyeballing a single completion, but they affect the fluency of typing a lot.

Historically, we haven't paid much attention to completion, barring obvious regressions. It was, and is, generally fast enough. But it seems we are at the point where it makes sense to push performance here a bit.

16ms is the boundary beyond which it doesn't make much sense to go (that's comparable to typing latency), and it should be achievable: the result set is small, and the work should be roughly linear in its size.

The best way to start here is to set this config:

    "rust-analyzer.server.extraEnv": {
        "RA_PROFILE": "handle_completion>16",
    },

and check Code's output panel for profiling info, which looks like this:

   85ms - handle_completion
       68ms - import_on_the_fly @ 
           67ms - import_assets::search_for_relative_paths
                0ms - crate_def_map:wait (804 calls)
                0ms - find_path (16 calls)
                2ms - find_similar_imports (1 calls)
                0ms - generic_params_query (334 calls)
               59ms - trait_solve_query (186 calls)
            0ms - Semantics::analyze_impl (1 calls)
            1ms - render_resolution (8 calls)
        0ms - Semantics::analyze_impl (5 calls)

Keep in mind that rust-analyzer employs lazy evaluation. That means that if both f and g call s, and s is slow, then s's time will be attributed to whichever of f or g happens to run first.
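This attribution effect can be illustrated with a toy sketch (this is not rust-analyzer's real query system, just a `OnceCell`-based stand-in for a memoized query):

```rust
use std::cell::OnceCell;
use std::time::{Duration, Instant};

// Toy stand-in for a lazily evaluated, memoized query: the slow computation
// runs only for the first caller; later callers get the cached value.
struct Db {
    s: OnceCell<u64>,
}

impl Db {
    fn s(&self) -> u64 {
        *self.s.get_or_init(|| {
            std::thread::sleep(Duration::from_millis(50)); // the slow shared query
            42
        })
    }
}

// Both `f` and `g` call `s`, but only the first one to run pays its cost.
fn f(db: &Db) -> (u64, Duration) {
    let t = Instant::now();
    (db.s(), t.elapsed())
}

fn g(db: &Db) -> (u64, Duration) {
    let t = Instant::now();
    (db.s(), t.elapsed())
}
```

Whichever of `f` or `g` runs first pays the full 50ms, so in a profile the shared query's cost shows up under that caller, even though both depend on it equally.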

Another place to look at is the end-to-end request flow in the main loop. Profiling captures only the computation, but it's important that completion is not blocked by other requests.

It would also be sweet to implement some maintainable benchmarks here. This would have high impact, but I don't know how to best approach this.

Other than that, just look at the code with the profiler and try to make slow things faster!

@matklad matklad added good first issue fun A technically challenging issue with high impact A-completion autocompletion S-actionable Someone could pick this issue up and work on it right now labels Feb 3, 2021
@SomeoneToIgnore
Contributor

SomeoneToIgnore commented Feb 3, 2021

Related issue about the benchmarks: #6633

A few comments on the interesting places in the handle_completion test profiling trace:

  • import_on_the_fly @ shows that we look up imports for blank input, which we do only when searching for associated item completions (auto-importing traits).
    We optimize away the struct search by performing the actual search only after the 3rd symbol of the completion input, but we cannot optimize associated items the same way: not showing methods and functions right away would be awkward

  • the time difference between import_assets::search_for_relative_paths and trait_solve_query is 8ms, of which 6ms are unaccounted for in the profiling info; a few more profile::span calls placed in the modules used inside import_assets::search_for_relative_paths should help locate them

  • it is interesting to note that it took only 2ms to look up all traits with at least one associated item in the project, as the find_similar_imports profile span shows.
    We can still improve this nonetheless; it might benefit other import_on_the_fly completion cases such as struct completions, and we could then remove the "after the 3rd symbol" "optimisation", but this all requires a separate set of checks.

As an optimisation idea, we can use a better key in import_map
https://github.com/rust-analyzer/rust-analyzer/blob/2c735ed734be9b9041921478e0049fffd7160f78/crates/hir_def/src/import_map.rs#L161

There we have an fst (the index we store textual full path strings in) that does a blazingly fast fuzzy search on the completion input in our test profiling case.
Currently we use lowercased path strings as keys. As a somewhat weird idea, we could prepend the keys with special symbols (©, Ѭ?) depending on the entity type, thus decreasing the number of checks we need to do after getting the search results from the fst.
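To illustrate the key-prefix idea (hypothetical sketch: a `BTreeMap` range scan stands in for the real `fst` index, and the tag symbols are arbitrary):

```rust
use std::collections::BTreeMap;

// Prefixing each lowercased path with a per-kind tag byte partitions the
// index, so a prefix scan returns only entries of the wanted kind and
// fewer post-filtering checks are needed on the search results.
#[derive(Clone, Copy)]
enum Kind {
    Trait,
    Struct,
}

fn tag(kind: Kind) -> char {
    match kind {
        Kind::Trait => '©',  // arbitrary sentinel symbols, as suggested above
        Kind::Struct => 'Ѭ',
    }
}

fn insert(map: &mut BTreeMap<String, u32>, kind: Kind, path: &str, id: u32) {
    map.insert(format!("{}{}", tag(kind), path.to_lowercase()), id);
}

fn search(map: &BTreeMap<String, u32>, kind: Kind, prefix: &str) -> Vec<u32> {
    let key = format!("{}{}", tag(kind), prefix.to_lowercase());
    // Range scan from the tagged prefix; stop as soon as keys leave the prefix.
    map.range(key.clone()..)
        .take_while(|(k, _)| k.starts_with(&key))
        .map(|(_, id)| *id)
        .collect()
}
```

A search for traits then never even visits struct entries, instead of fetching everything that matches textually and checking each hit's kind afterwards.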

  • trait_solve_query is called deep within trait resolution, where we send all traits looked up in the previous step to chalk:

https://github.com/rust-analyzer/rust-analyzer/blob/342bf41022fa78856a48560b521ea32aa76f6f06/crates/ide_db/src/helpers/import_assets.rs#L267-L283

to retain only the traits that the completion receiver (some::path:: or struct.) implements.
We can limit the number of traits to check, but most probably cannot eliminate the step entirely.

There's a way to find a subset of the traits implemented for a certain struct, which is used in the implementations code lens: https://github.com/rust-analyzer/rust-analyzer/blob/7967ce85cfc5fc2b1996425b44f2a45d0841c8ff/crates/ide/src/goto_implementation.rs#L41

Alas, it is limited and misses some classes of trait impls entirely, and getting us there might be hard.
For example, one of those categories is blanket trait impls: impl<T> MyTrait for T.
The following FIXME has two links with deeper ideas on the topic:
https://github.com/rust-analyzer/rust-analyzer/blob/342bf41022fa78856a48560b521ea32aa76f6f06/crates/ide_db/src/helpers/import_assets.rs#L177-L181

Yet the fst lookup collects all possible traits in the project for solving, including the ones not detected by the lens code.
If we could come up with a way to filter the fst results to retain only the traits from such categories, we could reduce the number of trait solvings, optimising the slowest place in the current profile trace.
Additionally, we would need to extract and profile the lens code for finding the impls, and add some code to filter out the traits, extract their associated item names, and compare those to the fst search results (to reuse the approach for non-empty completion input).

@matklad
Member Author

matklad commented Feb 5, 2021

Some more assorted thoughts:

  • on-the-fly imports are fundamentally costlier than anything else in completion, so it makes sense to specifically optimize them
  • the set of auto-importable items doesn't change between functions, so we might want to cache it
  • more generally, completions for xs.f and xs.fo are virtually identical; we should ensure that the second one takes almost no time
  • there's a discontinuity in completion speed: if the completion is fast enough, Code shows the completion list without ctrl+space. We should always hit this breakpoint
  • on the protocol/API level, it might be worthwhile to separate immediate completions from slow completions. It is important that, e.g., snippets are always instant, but it's ok if flyimport is async
  • on the protocol level, we should measure the delay between the time we get a notification about a new char typed and the time we get the completion request.
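The xs.f -> xs.fo point could be sketched roughly like this (hypothetical names, not rust-analyzer's actual API): cache the full candidate list per receiver and re-filter it as the typed prefix grows, so only the first request pays for the expensive lookup:

```rust
use std::collections::HashMap;

// Hypothetical sketch: cache completion candidates keyed by the receiver
// expression; refining the typed prefix only re-filters the cached list.
struct CompletionCache {
    by_receiver: HashMap<String, Vec<String>>,
}

impl CompletionCache {
    fn new() -> Self {
        Self { by_receiver: HashMap::new() }
    }

    fn complete(&mut self, receiver: &str, typed: &str) -> Vec<String> {
        let all = self
            .by_receiver
            .entry(receiver.to_string())
            .or_insert_with(|| {
                // Stand-in for the expensive part: type inference, method
                // resolution, flyimport search, etc.
                vec!["first".into(), "fold".into(), "format".into(), "len".into()]
            });
        // The cheap part: prefix-filter the cached candidates.
        all.iter().filter(|c| c.starts_with(typed)).cloned().collect()
    }
}
```

Cache invalidation on real edits is the hard part this sketch skips; the point is only that the second, narrower request does no expensive work.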

@matklad
Member Author

matklad commented Mar 30, 2021

#8254 should help with perf optimization workflow.

@matklad matklad mentioned this issue May 24, 2021
6 tasks
bors bot added a commit that referenced this issue May 26, 2021
9017: internal: Reduce the number of traits passed through chalk during applicable trait lookup r=SomeoneToIgnore a=SomeoneToIgnore

Inherent traits can be omitted before trait solving, presumably slightly helping #7542 and slightly simplifying the code.

Co-authored-by: Kirill Bulatov <[email protected]>
bors bot added a commit that referenced this issue Jun 10, 2021
9206: minor: Speed up fst items lookup during completions r=SomeoneToIgnore a=SomeoneToIgnore

Part of #7542

A number of profile calls added for `import_on_the_fly` contents.
Before:
<img width="606" alt="Screenshot 2021-06-11 at 00 19 13" src="https://user-images.githubusercontent.com/2690773/121598998-22321e80-ca4b-11eb-9a3d-dc9cb2936705.png">

After:
<img width="859" alt="Screenshot 2021-06-11 at 00 19 27" src="https://user-images.githubusercontent.com/2690773/121599022-2a8a5980-ca4b-11eb-82b6-13ab0ed56d58.png">

As a result, low hanging fruit was spotted: crazy amount of `fst_path` calls. Reducing that won ~200ms in the `import_on_the_fly @ sel` case in the `integrated_completion_benchmark`:

<img width="861" alt="Screenshot 2021-06-11 at 00 19 38" src="https://user-images.githubusercontent.com/2690773/121599277-7d641100-ca4b-11eb-8667-53206994de27.png">

I'm not sure how to proceed with the remaining `???` marks in such methods as `collect_import_map` though: there's nothing but library calls in cycles, but maybe I'll come up with something later.

Co-authored-by: Kirill Bulatov <[email protected]>
bors bot added a commit that referenced this issue Jun 11, 2021
9208: minor: Populate import maps eagerly to speed up flyimports r=SomeoneToIgnore a=SomeoneToIgnore

Part of #7542
Follow up of #9206 (comment)
Reduces `import_on_the_fly @ sel` case in the `integrated_completion_benchmark` by ~300ms.


Also enables cache priming for manual workspace loading to reflect the results in the benchmarks.

Before:
<img width="1198" alt="image" src="https://user-images.githubusercontent.com/2690773/121606148-4a734a80-ca56-11eb-812a-7955e93817f1.png">


After:
<img width="1200" alt="image" src="https://user-images.githubusercontent.com/2690773/121606156-4e06d180-ca56-11eb-891b-1ed878b53d7e.png">


Co-authored-by: Kirill Bulatov <[email protected]>
@matklad matklad mentioned this issue Jul 12, 2021
7 tasks
@jonas-schievink jonas-schievink added the A-perf performance issues label Jul 28, 2021
@mirkoRainer
Contributor

Hello. I'd like to tackle some of this work during my company's hackathon this week. Commenting here to hopefully claim it.

@matklad
Member Author

matklad commented Oct 11, 2021

Splendid! @mirkoRainer running this test in release mode should be a good start.

@mirkoRainer
Contributor

Splendid! @mirkoRainer running this test in release mode should be a good start.

I'm terribly sorry, but I was not able to make any good progress on this issue during the Hackathon. I'm fine with leaving it assigned to me but I can't guarantee any timely progress on it.
Thank you though! I was able to learn a lot with the video series and reading the discussions around this issue.

@hf29h8sh321

I tested in a recent rust-analyzer version:

  288ms - handle_completion
       31ms - CompletionContext::new
           30ms - descend_into_macros
               30ms - Semantics::analyze_impl
                   29ms - SourceBinder::to_module_def
                       29ms - crate_def_map:wait
                        0   - relevant_crates (1 calls)
                    0   - body_with_source_map_query (1 calls)
                    0   - crate_def_map:wait (2 calls)
            0   - CompletionContext::expand_and_fill (1 calls)
            0   - Semantics::analyze_impl (1 calls)
            0   - crate_def_map:wait (1 calls)
      202ms - Semantics::analyze_impl
          202ms - infer:wait @ from_value
            0   - SourceBinder::to_module_def (1 calls)
            0   - crate_def_map:wait (1 calls)
       40ms - import_on_the_fly @ 
           40ms - import_assets::search_for_imports
               40ms - import_assets::search_for
                   40ms - import_assets::trait_applicable_items
                        0   - applicable_inherent_traits (1 calls)
                        0   - deref (2 calls)
                        0   - env_traits (1 calls)
                       20ms - get_name_definition (40 calls)
                       13ms - items_with_name (1 calls)
                        5ms - iterate_method_candidates (1 calls)
                    0   - import_assets::scope_definitions (1 calls)
            0   - Semantics::analyze_impl (1 calls)
            0   - render_resolution (1 calls)
        0   - Semantics::analyze_impl (1 calls)
        0   - complete_unqualified_path (1 calls)
        0   - crate_def_map:wait (226 calls)
        0   - deref (1 calls)
        0   - find_path_prefixed (3 calls)
        0   - item::Builder::build (17 calls)
       12ms - iterate_method_candidates (1 calls)
        0   - trait_solve::wait (3 calls)

flyimport now takes a small portion of the time. The majority of the time is spent in analyze_impl.

@hf29h8sh321

more generally, completions for xs.f and xs.fo are virtually identical, we should ensure that the second one takes almost no time

I tested Gnome-Builder with Rust-analyzer. The second one loads instantly even though the first one takes some time to process.

bors added a commit that referenced this issue Sep 12, 2022
Remove redundant 'resolve_obligations_as_possible' call

Hi! I was looking for a "good first issue" and saw this one: #7542. I like searching for performance improvements, so I wanted to try to find something useful there.

There are two tests in integrated_benchmarks.rs, I looked at 'integrated_highlighting_benchmark' (not the one discussed in the issue above).

Profile from that test looks like this:
```
$ RUN_SLOW_BENCHES=1 cargo test --release --package rust-analyzer --lib -- integrated_benchmarks::integrated_highlighting_benchmark --exact --nocapture
    Finished release [optimized] target(s) in 0.06s
     Running unittests src/lib.rs (target/release/deps/rust_analyzer-a80ca6bb8f877458)

running 1 test
workspace loading: 358.45ms
initial: 9.60s
change: 13.96µs
cpu profiling is disabled, uncomment `default = [ "cpu_profiler" ]` in Cargo.toml to enable.
  273ms - highlight
      143ms - infer:wait @ per_query_memory_usage
          143ms - infer_query
                0   - crate_def_map:wait (3165 calls)
                4ms - deref_by_trait (967 calls)
               96ms - resolve_obligations_as_possible (22106 calls)
                0   - trait_solve::wait (2068 calls)
       21ms - Semantics::analyze_impl (18 calls)
        0   - SourceBinder::to_module_def (20 calls)
       36ms - classify_name (19 calls)
       19ms - classify_name_ref (308 calls)
        0   - crate_def_map:wait (461 calls)
        4ms - descend_into_macros (628 calls)
        0   - generic_params_query (4 calls)
        0   - impl_data_with_diagnostics_query (1 calls)
       45ms - infer:wait (37 calls)
        0   - resolve_obligations_as_possible (2 calls)
        0   - source_file_to_def (1 calls)
        0   - trait_solve::wait (42 calls)
after change: 275.23ms
test integrated_benchmarks::integrated_highlighting_benchmark ... ok
```
22106 calls to `resolve_obligations_as_possible` seem like the main issue there.

One thing I noticed (and fixed in this PR) is that `InferenceContext::resolve_ty_shallow` first calls `resolve_obligations_as_possible`, and then calls `InferenceTable::resolve_ty_shallow`. But `InferenceTable::resolve_ty_shallow` [inside](https://github.com/rust-lang/rust-analyzer/blob/2e9f1204ca01c3e20898d4a67c8b84899d394a88/crates/hir-ty/src/infer/unify.rs#L372) again calls `resolve_obligations_as_possible`.

`resolve_obligations_as_possible` contains a while loop that runs until it can't find any new helpful information. Calling the function a second time in a row therefore does nothing, so one of the calls can safely be removed.

`InferenceContext::resolve_ty_shallow` is actually quite a hot place, and after fixing it, the total number of `resolve_obligations_as_possible` calls in this test drops to 15516 (from 22106). The "after change" time also improves from ~270ms to ~240ms, which is not a huge win, but still measurable.
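Why the second call is safe to remove can be shown abstractly: a fixpoint loop that stops once it makes no progress is idempotent, so running it twice back-to-back does no extra work. A toy stand-in (not the real inference code):

```rust
// Toy stand-in for a fixpoint loop like `resolve_obligations_as_possible`:
// it keeps going while it makes progress, so a second back-to-back call
// finds nothing to do after a single pass.
struct Table {
    pending: Vec<u32>,
    resolved: Vec<u32>,
    passes: usize,
}

impl Table {
    fn resolve_as_possible(&mut self) {
        loop {
            self.passes += 1;
            let before = self.pending.len();
            // "Make progress": here we pretend even obligations are resolvable.
            let (done, rest): (Vec<u32>, Vec<u32>) =
                self.pending.drain(..).partition(|n| n % 2 == 0);
            self.resolved.extend(done);
            self.pending = rest;
            if self.pending.len() == before {
                break; // no new information learned: fixpoint reached
            }
        }
    }
}
```

The second invocation only performs one pass that discovers nothing, which is exactly why removing the duplicated call in `resolve_ty_shallow` is behavior-preserving.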

Same profile after PR:
```
$ RUN_SLOW_BENCHES=1 cargo test --release --package rust-analyzer --lib -- integrated_benchmarks::integrated_highlighting_benchmark --exact --nocapture
    Finished release [optimized] target(s) in 0.06s
     Running unittests src/lib.rs (target/release/deps/rust_analyzer-a80ca6bb8f877458)

running 1 test
workspace loading: 339.86ms
initial: 9.28s
change: 10.69µs
cpu profiling is disabled, uncomment `default = [ "cpu_profiler" ]` in Cargo.toml to enable.
  236ms - highlight
      110ms - infer:wait @ per_query_memory_usage
          110ms - infer_query
                0   - crate_def_map:wait (3165 calls)
                4ms - deref_by_trait (967 calls)
               64ms - resolve_obligations_as_possible (15516 calls)
                0   - trait_solve::wait (2068 calls)
       21ms - Semantics::analyze_impl (18 calls)
        0   - SourceBinder::to_module_def (20 calls)
       34ms - classify_name (19 calls)
       18ms - classify_name_ref (308 calls)
        0   - crate_def_map:wait (461 calls)
        3ms - descend_into_macros (628 calls)
        0   - generic_params_query (4 calls)
        0   - impl_data_with_diagnostics_query (1 calls)
       45ms - infer:wait (37 calls)
        0   - resolve_obligations_as_possible (2 calls)
        0   - source_file_to_def (1 calls)
        0   - trait_solve::wait (42 calls)
after change: 238.15ms
test integrated_benchmarks::integrated_highlighting_benchmark ... ok
```

The performance of this test could be further improved but at the cost of making code more complicated, so I wanted to check if such a change is desirable before sending another PR.

`resolve_obligations_as_possible` is actually called a lot of times even when no new information has been provided. As I understand it, `resolve_obligations_as_possible` can only do something useful if some variables/values were unified since the last check. We can store a boolean flag inside `InferenceTable` indicating whether `try_unify` was called after the last `resolve_obligations_as_possible`. If it wasn't, we can safely skip calling `resolve_obligations_as_possible` again.
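A minimal sketch of the proposed dirty flag (field and method names here are assumptions for illustration, not the actual hir-ty code):

```rust
// Sketch of the dirty-flag idea: skip the obligation-resolution loop when
// nothing was unified since its last run.
struct InferenceTable {
    dirty: bool,          // set by `try_unify`, cleared after resolving
    resolve_calls: usize, // for demonstration only
}

impl InferenceTable {
    fn try_unify(&mut self) {
        // ... actual unification work would happen here ...
        self.dirty = true;
    }

    fn resolve_obligations_as_possible(&mut self) {
        if !self.dirty {
            return; // nothing unified since the last run: the loop would find nothing
        }
        self.resolve_calls += 1;
        // ... the while loop propagating newly learned information ...
        self.dirty = false;
    }
}
```
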

I tested this change locally, and it reduces the number of `resolve_obligations_as_possible` calls to several thousand (it no longer shows up in the profile, so I don't know the exact number), and the total time is reduced to ~180ms. Here is the generated profile:
```
$ RUN_SLOW_BENCHES=1 cargo test --release --package rust-analyzer --lib -- integrated_benchmarks::integrated_highlighting_benchmark --exact --nocapture
    Finished release [optimized] target(s) in 0.06s
     Running unittests src/lib.rs (target/release/deps/rust_analyzer-a80ca6bb8f877458)

running 1 test
workspace loading: 349.92ms
initial: 8.56s
change: 11.32µs
cpu profiling is disabled, uncomment `default = [ "cpu_profiler" ]` in Cargo.toml to enable.
  175ms - highlight
       21ms - Semantics::analyze_impl (18 calls)
        0   - SourceBinder::to_module_def (20 calls)
       33ms - classify_name (19 calls)
       17ms - classify_name_ref (308 calls)
        0   - crate_def_map:wait (461 calls)
        3ms - descend_into_macros (628 calls)
        0   - generic_params_query (4 calls)
        0   - impl_data_with_diagnostics_query (1 calls)
       97ms - infer:wait (38 calls)
        0   - resolve_obligations_as_possible (2 calls)
        0   - source_file_to_def (1 calls)
        0   - trait_solve::wait (42 calls)
after change: 177.04ms
test integrated_benchmarks::integrated_highlighting_benchmark ... ok
```
Let me know if adding a new bool field seems like a reasonable tradeoff, so I can send a PR.