internal: Test for word boundary in `FindUsages` #17908

ChayimFriedman2 · 2024-08-16T07:03:19Z

This significantly speeds up searches for short identifiers, and it is unlikely to affect long identifiers (the analysis takes much longer than a few character comparisons).

Tested by finding all references to `eq()` (from `PartialEq`) in the rust-analyzer repo. Total time went down from 100s to 10s (a 10x reduction!).

Feel free to close this if you consider this a non-issue, as most short identifiers are local.
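
As a rough illustration of the idea (not the actual code in `crates/ide-db/src/search.rs`), a word-boundary filter over textual matches could look like the sketch below. The helper names `is_ident_char` and `word_boundary_matches` are hypothetical, and the identifier check is simplified compared to real Rust lexing.

```rust
/// Simplified identifier-character test (assumption: `_` or any alphanumeric).
fn is_ident_char(c: char) -> bool {
    c == '_' || c.is_alphanumeric()
}

/// Returns the byte offsets at which `name` occurs in `text` as a whole word,
/// i.e. not embedded in a longer identifier such as `seq` or `eq_ignore`.
/// Only these offsets would need the expensive semantic verification.
fn word_boundary_matches(text: &str, name: &str) -> Vec<usize> {
    text.match_indices(name)
        .filter_map(|(start, _)| {
            let end = start + name.len();
            // The character just before the match must not continue an identifier.
            let before_ok = text[..start]
                .chars()
                .next_back()
                .map_or(true, |c| !is_ident_char(c));
            // Likewise for the character just after the match.
            let after_ok = text[end..]
                .chars()
                .next()
                .map_or(true, |c| !is_ident_char(c));
            (before_ok && after_ok).then_some(start)
        })
        .collect()
}
```

For a short name such as `eq`, most raw substring hits sit inside longer identifiers, so a cheap check like this discards them before any analysis runs. For long names, substring hits are rarely embedded in other identifiers, which is consistent with the observation that they are unlikely to benefit.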


crates/ide-db/src/search.rs

Veykril · 2024-08-16T07:14:31Z

Seems fine by me. Unfortunately, I imagine `new` won't benefit as much, given it's not a very common substring in identifiers (it is one of the more common complaints we get about slow search).
Thanks!
@bors r+

bors · 2024-08-16T07:14:34Z

📌 Commit 1a31fe2 has been approved by Veykril

It is now in the queue for this repository.

bors · 2024-08-16T07:20:25Z

⌛ Testing commit 1a31fe2 with merge 28b6838...

ChayimFriedman2 · 2024-08-16T07:25:59Z

I just got an idea about new(), let's see what I can do.

bors · 2024-08-16T07:34:13Z

☀️ Test successful - checks-actions
Approved by: Veykril
Pushing 28b6838 to master...


bors · 2024-08-16T07:35:11Z

👀 Test was successful, but fast-forwarding failed: 422 Changes must be made through a pull request.

perf: Speed up search for short associated functions, especially very common identifiers such as `new`

`@Veykril` said in #17908 (comment) that people complain that searches for `new()` are slow (they are right), so here I am to help!

The search is used by IDE features such as rename and find all references. The search is slow because we need to verify each candidate, and that requires analyzing it; the key to speeding it up is to avoid the analysis where possible.

I did that with a bunch of tricks that exploit knowledge about the language and its possibilities. The first key insight is that associated methods may only be referenced in the form `ContainerName::func_name` (parentheses are not necessary!). (Rust doesn't include a way to `use Container::func_name`, and even if it does in the future, most usages are likely to stay in that form.) Searching for `::` will help only a bit, but searching for `Container` can help considerably, since it is very rare that there will be two identical instances of both a container and a method of it.

However, things are not as simple as they sound. In Rust a container can be aliased in multiple ways, and even aliased from different files/modules. If we try to resolve the alias, we lose any gain from the textual search (although very common method names such as `new` will still benefit, most will suffer because there are more instances of a container name than of its associated item).

This is where the key trick enters the picture. The key insight is that there is still a textual property: a container name cannot be aliased unless its name is mentioned in the alias declaration, or the name of an alias of it is mentioned in the alias declaration.

This becomes a fixpoint algorithm: we expand our list of aliases as we collect more and more (possible) aliases, until we eventually reach a fixpoint. A fixpoint is not guaranteed (and we do have guards for the rare cases where it does not happen), but it is almost so: most types have very few aliases, if any.

We do use some semantic information while analyzing aliases. It's a balance: too much semantic analysis, and the search will become slow. But too little of it, and we will bring many incorrect aliases into our list, and risk that it expands and expands and never reaches a fixpoint. In the end, based on benchmarks, it seems worth doing a lot to avoid adding an alias (but not too much), and it is also worth doing a lot to avoid the need to semantically analyze `func_name` matches (but again, not too much).

After we have collected our list of aliases, we filter matches based on this list. Only if a match can be real do we perform semantic analysis on it.

The results are promising: searching for all references to `new()` in `base-db` in the rust-analyzer repository, which previously took around 60 seconds, now takes as little as two and a half seconds (roughly), while searching for `Vec::new()`, almost an upper bound on how much a symbol can be used, which used to take 7-9 minutes(!), now completes in 100-120 seconds, and with less than half of the non-verified results (a.k.a. false positives).

This is the less strictly correct (but faster) branch of this patch; it can miss some (rare) cases (there is a test for that: `goto_ref_on_short_associated_function_complicated_type_magic_can_confuse_our_logic()`). There is another branch that has no false negatives but is slower to search (`Vec::new()` never reaches a fixpoint in alias collection there). I believe it is possible to create a strategy that has the best of both worlds, but it would involve significant complexity and I didn't bother, especially considering that in the vast majority of searches the other branch will be more than enough. But all in all, I decided to bring this branch (if the maintainers agree, of course), since our search is already not 100% accurate (it misses macros), and I believe there is value in the additional perf.

You can find the strict branch at https://github.com/ChayimFriedman2/rust-analyzer/tree/speedup-new-usages-strict.

Should fix #7404, I guess (will check now).
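
To make the fixpoint idea above more concrete, here is a minimal, purely textual sketch. It is not the implementation from the PR: the `AliasDecl` type, the `mentions` and `collect_aliases` functions, and the `max_aliases` guard are all hypothetical names introduced for illustration, and the real code also uses some semantic information that is omitted here.

```rust
use std::collections::BTreeSet;

/// Hypothetical textual view of an alias declaration, e.g. `type Bytes = Vec<u8>;`.
struct AliasDecl<'a> {
    /// The name the declaration introduces.
    alias: &'a str,
    /// The full source text of the declaration.
    text: &'a str,
}

/// True if `name` occurs in `text` as a whole word (simplified identifier rules).
fn mentions(text: &str, name: &str) -> bool {
    let is_ident = |c: char| c == '_' || c.is_alphanumeric();
    text.match_indices(name).any(|(start, _)| {
        let end = start + name.len();
        let before_ok = text[..start].chars().next_back().map_or(true, |c| !is_ident(c));
        let after_ok = text[end..].chars().next().map_or(true, |c| !is_ident(c));
        before_ok && after_ok
    })
}

/// Grows the set of possible names for `container` until nothing changes.
/// Returns `None` if the set exceeds `max_aliases` (an assumed guard for the
/// case where the list keeps expanding).
fn collect_aliases<'a>(
    container: &'a str,
    decls: &[AliasDecl<'a>],
    max_aliases: usize,
) -> Option<BTreeSet<&'a str>> {
    let mut names = BTreeSet::from([container]);
    loop {
        let mut changed = false;
        for decl in decls {
            // A declaration can only alias the container if it mentions the
            // container or one of the aliases collected so far.
            if !names.contains(decl.alias) && names.iter().any(|n| mentions(decl.text, n)) {
                names.insert(decl.alias);
                changed = true;
            }
        }
        if names.len() > max_aliases {
            return None; // guard: give up instead of expanding indefinitely
        }
        if !changed {
            return Some(names); // fixpoint reached
        }
    }
}
```

With the resulting set, a textual hit for `func_name` would only be passed on to semantic analysis when the path segment before `::` is one of the collected names. Because this sketch is textual only, it over-approximates: any declaration that merely mentions a known name is treated as a possible alias.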


rustbot added the S-waiting-on-review label Aug 16, 2024

Veykril reviewed Aug 16, 2024


bors merged commit 28b6838 into rust-lang:master Aug 16, 2024
11 checks passed

ChayimFriedman2 deleted the usages-word-boundaries branch August 16, 2024 07:50

lnicola changed the title ~~Test for word boundary in FindUsages~~ internal: Test for word boundary in FindUsages Aug 16, 2024

ChayimFriedman2 mentioned this pull request Aug 18, 2024

perf: Speed up search for short associated functions, especially very common identifiers such as new #17927

Merged
