Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(js_semantic): identify bindings with indexes #582

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Conaclos
Copy link
Member

@Conaclos Conaclos commented Oct 22, 2023

Summary

The name resolver identifies a declaration (binding identifier) thanks to its range, while the semantic model identifies a declaration thanks to its index. Using a different way of identifying a declaration leads to wasted lookups: every time that the name resolver emits an event that binds a reference to its binding, the semantic model have to retrieve the binding index from the declaration range.

For consistency, we should use a single way of identifying a declaration. We have two options: using an index or relying on the range (or preferably the start of the range). The current architecture of the semantic model heavily relies on declaration indexes. In a such architecture, the use of range could lead to a perf overhead. With enough refactoring we could reduce this overhead. However, this looks like a large refactor. Moreover, scopes are currently identified by indexes (in the semantic model and in the name resolver). Using indexes seems like a conservative approach.

Thus, I propose in this PR to refactor the name resolver in order to use indexes as declaration identifiers. This change also simplifies a potential refactoring that will use ranges start as identifiers for declarations.

Using indexes instead of ranges could also allow separating declarations (binding identifiers) from (semantic) bindings. In other words we could associate several declarations (binding identifiers to a same semantic binding. It is one of the approach I proposed to solve #565. Although I am no longer convinced by this approach (see the EDIT: comment), using indexes provides more flexibility.

For now an index is a usize. It should be changed to u32. I prefer delaying this task to another PR because the change is broad: it affects the semantic model. We could even use the typed-index-collections crate.

Test Plan

Ci should pass.

@Conaclos Conaclos temporarily deployed to Website deployment October 22, 2023 16:01 — with GitHub Actions Inactive
@github-actions github-actions bot added the L-JavaScript Language: JavaScript and super languages label Oct 22, 2023
@Conaclos
Copy link
Member Author

!bench_analyzer

@github-actions
Copy link
Contributor

github-actions bot commented Oct 22, 2023

Parser conformance results on

js/262

Test result main count This PR count Difference
Total 48502 48502 0
Passed 47311 47311 0
Failed 1191 1191 0
Panics 0 0 0
Coverage 97.54% 97.54% 0.00%

jsx/babel

Test result main count This PR count Difference
Total 40 40 0
Passed 37 37 0
Failed 3 3 0
Panics 0 0 0
Coverage 92.50% 92.50% 0.00%

symbols/microsoft

Test result main count This PR count Difference
Total 6568 6568 0
Passed 2201 2201 0
Failed 4367 4367 0
Panics 0 0 0
Coverage 33.51% 33.51% 0.00%

ts/babel

Test result main count This PR count Difference
Total 671 671 0
Passed 599 599 0
Failed 72 72 0
Panics 0 0 0
Coverage 89.27% 89.27% 0.00%

ts/microsoft

Test result main count This PR count Difference
Total 18417 18417 0
Passed 14099 14099 0
Failed 4318 4318 0
Panics 0 0 0
Coverage 76.55% 76.55% 0.00%

@github-actions
Copy link
Contributor

Analyzer Benchmark Results

group                     main                                   pr
-----                     ----                                   --
analyzer/css.js           1.00      4.4±0.01ms     2.7 MB/sec    1.01      4.4±0.02ms     2.7 MB/sec
analyzer/index.js         1.00      9.2±0.03ms     3.4 MB/sec    1.01      9.4±0.05ms     3.3 MB/sec
analyzer/lint.ts          1.00      6.7±0.00ms     6.2 MB/sec    1.01      6.8±0.01ms     6.1 MB/sec
analyzer/parser.ts        1.00     14.1±0.02ms     3.5 MB/sec    1.01     14.2±0.03ms     3.4 MB/sec
analyzer/router.ts        1.00      4.4±0.07ms     5.3 MB/sec    1.00      4.4±0.01ms     5.3 MB/sec
analyzer/statement.ts     1.00     13.0±0.02ms     2.7 MB/sec    1.01     13.1±0.05ms     2.7 MB/sec
analyzer/typescript.ts    1.00     20.7±0.08ms     2.6 MB/sec    1.01     21.0±0.20ms     2.6 MB/sec

@Conaclos Conaclos requested a review from a team October 22, 2023 16:18
@ematipico
Copy link
Member

ematipico commented Oct 22, 2023

Some preliminary feedback to make your PR easier to review:

  • it's important to explain why this PR proposes an index instead of a text range, the benefits and the shortcomings (use the PR description) if there are any, and also the reasons;
  • the PR description says that it's a refactor "suggested in the code", I believe there's some wording problem; if there isn't, I can't understand it;
  • if there are some changes to the public APIs, we should explain the change
  • our parser and documents work in the assumptions that we handle u32 documents. We should apply this assumption here too, and use it instead of usize; this should be reflected in the description of your PR

Overall I think it's important to explain the technical decisions: reasons, tradeoffs, etc.

This should also make easier to implement future changes.

It seems you have a vision, but it's important to not be vague and explain it to us.

@Conaclos
Copy link
Member Author

Description updated.

@ematipico
Copy link
Member

In a such architecture, the use of range could lead to a perf overhead.

Would you happen to have any proof to claim that?

Using indexes seems like a conservative approach.

This might be a translation problem, but when I read this phrase, I interpret it in a negative way, something like that "it doesn't allow progression", which is something that goes against the previous claim because it seems to you are in favour of using indexes.

For now an index is a usize. It should be changed to u32. I prefer delaying this task to another PR because the change is broad: it affects the semantic model.

Overall I understand what you have in mind, and I follow what you explained, although this refactor seems bigger than expected, and we are still determining what it will provide. There's a claim around "overhead", but it isn't explained what this overhead is. I am not very comfortable landing this PR to main just yet, but what we can do is the following:

  • creating an issue that explains your work in detail, with a list of tasks; this is also helpful to you because yo can issue to gather your thoughts and write them down;
  • create a branch for this refactor;
  • merge the PRs into that branch; in this way, you can make multiple PRs, and we can help you with the reviews;
  • eventually, open a PR to main and see the result;

If you think this suggestion is too much, then I suggest making the whole refactor in one single PR and explaining it to us.

crates/biome_js_semantic/src/events.rs Outdated Show resolved Hide resolved
crates/biome_js_semantic/src/events.rs Outdated Show resolved Hide resolved
crates/biome_js_semantic/src/events.rs Outdated Show resolved Hide resolved
crates/biome_js_semantic/src/events.rs Outdated Show resolved Hide resolved
crates/biome_js_semantic/src/events.rs Outdated Show resolved Hide resolved
crates/biome_js_semantic/src/semantic_model/builder.rs Outdated Show resolved Hide resolved
Copy link

netlify bot commented Feb 6, 2024

Deploy Preview for biomejs canceled.

Name Link
🔨 Latest commit 24ee787
🔍 Latest deploy log https://app.netlify.com/sites/biomejs/deploys/65c24f5c1d33b0000850c405

@ematipico
Copy link
Member

Any updates on this PR?

@Conaclos Conaclos force-pushed the conaclos/js-sematic/binding-index-refactor branch 4 times, most recently from 427b4e1 to 9757e18 Compare July 11, 2024 18:03
@ematipico
Copy link
Member

@Conaclos what's the status of this PR?

@Conaclos
Copy link
Member Author

what's the status of this PR?

I updated the PR. However, I'm still hesitating if it is the best move.

The semantic model is currently using two ways of identifying a binding (declaration):

  • indexes to a Vec of binding data
  • range starts

The semantic event emitter uses range starts and then change it to indexes in the semantic model builder.
However, the semantic model still uses range start as a way of identifying bindings.
This is mainly used to retrieve node syntaxes and to know if a binding is exported.

Indexes are used for efficient retrieving of semantic data: it is used when we convert declaration node into a semantic binding or when iterating over semantic references to a binding.
The former is not so used: generally code in the linter use a node declaration directly instead of first converting it to a semantic binding for efficient repeated querying. Thus, it is useful for the latter when iterating over references: the next reference is built from the current reference and this requires a lookup to the binding data where references are stored.

Thus, it is hard to know which way of identification is better / more performant.
I tend to think that using range start is simpler. However, I am not sure of the perf impact on reference iteration.
We could also try to remove the lookup when we iterate over references.

@ematipico
Copy link
Member

Thank you for the explanation! I don't have a great in-depth knowledge of the semantic model, however I trust your judgement.

Another way to judge the feature is DX, and usages outside of a linter. For example, we might need the semantic model to retrieve the bindings exported from a module.

@ematipico
Copy link
Member

I just run the benchmark action in this branch to see the differences: https://github.com/biomejs/biome/actions/runs/9993569703/job/27621225066

Copy link

codspeed-hq bot commented Jul 18, 2024

CodSpeed Performance Report

Merging #582 will degrade performances by 37.22%

Comparing conaclos/js-sematic/binding-index-refactor (c09c99e) with conaclos/js-sematic/binding-index-refactor (9757e18)

Summary

⚡ 4 improvements
❌ 10 regressions
✅ 81 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark conaclos/js-sematic/binding-index-refactor conaclos/js-sematic/binding-index-refactor Change
css_analyzer[bootstrap_18416142857265205439.css] 93.8 ms 124.4 ms -24.56%
css_analyzer[bulma_15117074458139280020.css] 81.2 ms 110.7 ms -26.62%
css_analyzer[foundation_7285025277362245733.css] 62.2 ms 82.8 ms -24.84%
css_analyzer[tachyons_9066552643484828443.css] 44.2 ms 70.4 ms -37.22%
graphql_formatter[schema_9009027329454913681.graphql] 2.8 ms 2.6 ms +6.44%
graphql_formatter[schema_9328028900703568614.graphql] 1.6 ms 1.5 ms +7.25%
schema_9328028900703568614.graphql[cached] 655 µs 609 µs +7.55%
schema_9328028900703568614.graphql[uncached] 671.4 µs 632.8 µs +6.11%
js_analyzer[lint_11654203871763747189.ts] 26.4 ms 28.6 ms -7.71%
js_analyzer[parser_8282555732666328882.ts] 54.4 ms 58.4 ms -6.83%
js_analyzer[router_14177007973872119684.ts] 15.8 ms 16.9 ms -6.4%
js_analyzer[statement_17851180443367440905.ts] 49.8 ms 53.3 ms -6.47%
react.production.min_3378072959512366797.js[cached] 1.9 ms 2 ms -6.84%
react.production.min_3378072959512366797.js[uncached] 2.1 ms 2.2 ms -6.3%

@Conaclos Conaclos force-pushed the conaclos/js-sematic/binding-index-refactor branch from 9757e18 to c09c99e Compare September 2, 2024 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
L-JavaScript Language: JavaScript and super languages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants