refactor(js_semantic): identify bindings with indexes #582

Conaclos · 2023-10-22T16:01:25Z

Summary

The name resolver identifies a declaration (binding identifier) thanks to its range, while the semantic model identifies a declaration thanks to its index. Using a different way of identifying a declaration leads to wasted lookups: every time that the name resolver emits an event that binds a reference to its binding, the semantic model have to retrieve the binding index from the declaration range.

For consistency, we should use a single way of identifying a declaration. We have two options: using an index or relying on the range (or preferably the start of the range). The current architecture of the semantic model heavily relies on declaration indexes. In a such architecture, the use of range could lead to a perf overhead. With enough refactoring we could reduce this overhead. However, this looks like a large refactor. Moreover, scopes are currently identified by indexes (in the semantic model and in the name resolver). Using indexes seems like a conservative approach.

Thus, I propose in this PR to refactor the name resolver in order to use indexes as declaration identifiers. This change also simplifies a potential refactoring that will use ranges start as identifiers for declarations.

Using indexes instead of ranges could also allow separating declarations (binding identifiers) from (semantic) bindings. In other words we could associate several declarations (binding identifiers to a same semantic binding. It is one of the approach I proposed to solve #565. Although I am no longer convinced by this approach (see the EDIT: comment), using indexes provides more flexibility.

For now an index is a usize. It should be changed to u32. I prefer delaying this task to another PR because the change is broad: it affects the semantic model. We could even use the typed-index-collections crate.

Test Plan

Ci should pass.

Conaclos · 2023-10-22T16:02:10Z

!bench_analyzer

github-actions · 2023-10-22T16:08:36Z

Parser conformance results on

js/262

Test result	`main` count	This PR count	Difference
Total	48502	48502	0
Passed	47311	47311	0
Failed	1191	1191	0
Panics	0	0	0
Coverage	97.54%	97.54%	0.00%

jsx/babel

Test result	`main` count	This PR count	Difference
Total	40	40	0
Passed	37	37	0
Failed	3	3	0
Panics	0	0	0
Coverage	92.50%	92.50%	0.00%

symbols/microsoft

Test result	`main` count	This PR count	Difference
Total	6568	6568	0
Passed	2201	2201	0
Failed	4367	4367	0
Panics	0	0	0
Coverage	33.51%	33.51%	0.00%

ts/babel

Test result	`main` count	This PR count	Difference
Total	671	671	0
Passed	599	599	0
Failed	72	72	0
Panics	0	0	0
Coverage	89.27%	89.27%	0.00%

ts/microsoft

Test result	`main` count	This PR count	Difference
Total	18417	18417	0
Passed	14099	14099	0
Failed	4318	4318	0
Panics	0	0	0
Coverage	76.55%	76.55%	0.00%

github-actions · 2023-10-22T16:14:24Z

Analyzer Benchmark Results

group                     main                                   pr
-----                     ----                                   --
analyzer/css.js           1.00      4.4±0.01ms     2.7 MB/sec    1.01      4.4±0.02ms     2.7 MB/sec
analyzer/index.js         1.00      9.2±0.03ms     3.4 MB/sec    1.01      9.4±0.05ms     3.3 MB/sec
analyzer/lint.ts          1.00      6.7±0.00ms     6.2 MB/sec    1.01      6.8±0.01ms     6.1 MB/sec
analyzer/parser.ts        1.00     14.1±0.02ms     3.5 MB/sec    1.01     14.2±0.03ms     3.4 MB/sec
analyzer/router.ts        1.00      4.4±0.07ms     5.3 MB/sec    1.00      4.4±0.01ms     5.3 MB/sec
analyzer/statement.ts     1.00     13.0±0.02ms     2.7 MB/sec    1.01     13.1±0.05ms     2.7 MB/sec
analyzer/typescript.ts    1.00     20.7±0.08ms     2.6 MB/sec    1.01     21.0±0.20ms     2.6 MB/sec

ematipico · 2023-10-22T16:33:36Z

Some preliminary feedback to make your PR easier to review:

it's important to explain why this PR proposes an index instead of a text range, the benefits and the shortcomings (use the PR description) if there are any, and also the reasons;
the PR description says that it's a refactor "suggested in the code", I believe there's some wording problem; if there isn't, I can't understand it;
if there are some changes to the public APIs, we should explain the change
our parser and documents work in the assumptions that we handle u32 documents. We should apply this assumption here too, and use it instead of usize; this should be reflected in the description of your PR

Overall I think it's important to explain the technical decisions: reasons, tradeoffs, etc.

This should also make easier to implement future changes.

It seems you have a vision, but it's important to not be vague and explain it to us.

Conaclos · 2023-10-22T17:35:40Z

Description updated.

ematipico · 2023-10-23T09:03:28Z

In a such architecture, the use of range could lead to a perf overhead.

Would you happen to have any proof to claim that?

Using indexes seems like a conservative approach.

This might be a translation problem, but when I read this phrase, I interpret it in a negative way, something like that "it doesn't allow progression", which is something that goes against the previous claim because it seems to you are in favour of using indexes.

For now an index is a usize. It should be changed to u32. I prefer delaying this task to another PR because the change is broad: it affects the semantic model.

Overall I understand what you have in mind, and I follow what you explained, although this refactor seems bigger than expected, and we are still determining what it will provide. There's a claim around "overhead", but it isn't explained what this overhead is. I am not very comfortable landing this PR to main just yet, but what we can do is the following:

creating an issue that explains your work in detail, with a list of tasks; this is also helpful to you because yo can issue to gather your thoughts and write them down;
create a branch for this refactor;
merge the PRs into that branch; in this way, you can make multiple PRs, and we can help you with the reviews;
eventually, open a PR to main and see the result;

If you think this suggestion is too much, then I suggest making the whole refactor in one single PR and explaining it to us.

crates/biome_js_semantic/src/events.rs

crates/biome_js_semantic/src/semantic_model/builder.rs

netlify · 2024-02-06T15:31:34Z

✅ Deploy Preview for biomejs canceled.

Name	Link
🔨 Latest commit	`24ee787`
🔍 Latest deploy log	https://app.netlify.com/sites/biomejs/deploys/65c24f5c1d33b0000850c405

ematipico · 2024-02-17T11:22:19Z

Any updates on this PR?

ematipico · 2024-07-17T08:13:04Z

@Conaclos what's the status of this PR?

Conaclos · 2024-07-17T11:19:03Z

what's the status of this PR?

I updated the PR. However, I'm still hesitating if it is the best move.

The semantic model is currently using two ways of identifying a binding (declaration):

indexes to a Vec of binding data
range starts

The semantic event emitter uses range starts and then change it to indexes in the semantic model builder.
However, the semantic model still uses range start as a way of identifying bindings.
This is mainly used to retrieve node syntaxes and to know if a binding is exported.

Indexes are used for efficient retrieving of semantic data: it is used when we convert declaration node into a semantic binding or when iterating over semantic references to a binding.
The former is not so used: generally code in the linter use a node declaration directly instead of first converting it to a semantic binding for efficient repeated querying. Thus, it is useful for the latter when iterating over references: the next reference is built from the current reference and this requires a lookup to the binding data where references are stored.

Thus, it is hard to know which way of identification is better / more performant.
I tend to think that using range start is simpler. However, I am not sure of the perf impact on reference iteration.
We could also try to remove the lookup when we iterate over references.

ematipico · 2024-07-18T14:40:27Z

Thank you for the explanation! I don't have a great in-depth knowledge of the semantic model, however I trust your judgement.

Another way to judge the feature is DX, and usages outside of a linter. For example, we might need the semantic model to retrieve the bindings exported from a module.

ematipico · 2024-07-18T14:41:05Z

I just run the benchmark action in this branch to see the differences: https://github.com/biomejs/biome/actions/runs/9993569703/job/27621225066

codspeed-hq · 2024-07-18T15:13:00Z

CodSpeed Performance Report

Merging #582 will degrade performances by 37.22%

_{Comparing conaclos/js-sematic/binding-index-refactor (c09c99e) with conaclos/js-sematic/binding-index-refactor (9757e18)}

Summary

⚡ 4 improvements
❌ 10 regressions
✅ 81 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`conaclos/js-sematic/binding-index-refactor`	`conaclos/js-sematic/binding-index-refactor`	Change
❌	`css_analyzer[bootstrap_18416142857265205439.css]`	93.8 ms	124.4 ms	-24.56%
❌	`css_analyzer[bulma_15117074458139280020.css]`	81.2 ms	110.7 ms	-26.62%
❌	`css_analyzer[foundation_7285025277362245733.css]`	62.2 ms	82.8 ms	-24.84%
❌	`css_analyzer[tachyons_9066552643484828443.css]`	44.2 ms	70.4 ms	-37.22%
⚡	`graphql_formatter[schema_9009027329454913681.graphql]`	2.8 ms	2.6 ms	+6.44%
⚡	`graphql_formatter[schema_9328028900703568614.graphql]`	1.6 ms	1.5 ms	+7.25%
⚡	`schema_9328028900703568614.graphql[cached]`	655 µs	609 µs	+7.55%
⚡	`schema_9328028900703568614.graphql[uncached]`	671.4 µs	632.8 µs	+6.11%
❌	`js_analyzer[lint_11654203871763747189.ts]`	26.4 ms	28.6 ms	-7.71%
❌	`js_analyzer[parser_8282555732666328882.ts]`	54.4 ms	58.4 ms	-6.83%
❌	`js_analyzer[router_14177007973872119684.ts]`	15.8 ms	16.9 ms	-6.4%
❌	`js_analyzer[statement_17851180443367440905.ts]`	49.8 ms	53.3 ms	-6.47%
❌	`react.production.min_3378072959512366797.js[cached]`	1.9 ms	2 ms	-6.84%
❌	`react.production.min_3378072959512366797.js[uncached]`	2.1 ms	2.2 ms	-6.3%

Conaclos temporarily deployed to Website deployment October 22, 2023 16:01 — with GitHub Actions Inactive

github-actions bot added the L-JavaScript Language: JavaScript and super languages label Oct 22, 2023

Conaclos requested a review from a team October 22, 2023 16:18

ematipico reviewed Oct 23, 2023

View reviewed changes

Conaclos force-pushed the conaclos/js-sematic/binding-index-refactor branch 4 times, most recently from 427b4e1 to 9757e18 Compare July 11, 2024 18:03

Conaclos force-pushed the conaclos/js-sematic/binding-index-refactor branch from 9757e18 to c09c99e Compare September 2, 2024 15:13

refcator(js_semantic): identify declaration with indexes

c09c99e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(js_semantic): identify bindings with indexes #582

refactor(js_semantic): identify bindings with indexes #582

Conaclos commented Oct 22, 2023 •

edited

Loading

Conaclos commented Oct 22, 2023

github-actions bot commented Oct 22, 2023 •

edited

Loading

github-actions bot commented Oct 22, 2023

ematipico commented Oct 22, 2023 •

edited

Loading

Conaclos commented Oct 22, 2023

ematipico commented Oct 23, 2023

netlify bot commented Feb 6, 2024 •

edited

Loading

ematipico commented Feb 17, 2024

ematipico commented Jul 17, 2024

Conaclos commented Jul 17, 2024

ematipico commented Jul 18, 2024

ematipico commented Jul 18, 2024

codspeed-hq bot commented Jul 18, 2024 •

edited

Loading

refactor(js_semantic): identify bindings with indexes #582

Are you sure you want to change the base?

refactor(js_semantic): identify bindings with indexes #582

Conversation

Conaclos commented Oct 22, 2023 • edited Loading

Summary

Test Plan

Conaclos commented Oct 22, 2023

github-actions bot commented Oct 22, 2023 • edited Loading

Parser conformance results on

js/262

jsx/babel

symbols/microsoft

ts/babel

ts/microsoft

github-actions bot commented Oct 22, 2023

Analyzer Benchmark Results

ematipico commented Oct 22, 2023 • edited Loading

Conaclos commented Oct 22, 2023

ematipico commented Oct 23, 2023

netlify bot commented Feb 6, 2024 • edited Loading

✅ Deploy Preview for biomejs canceled.

ematipico commented Feb 17, 2024

ematipico commented Jul 17, 2024

Conaclos commented Jul 17, 2024

ematipico commented Jul 18, 2024

ematipico commented Jul 18, 2024

codspeed-hq bot commented Jul 18, 2024 • edited Loading

CodSpeed Performance Report

Merging #582 will degrade performances by 37.22%

Summary

Benchmarks breakdown

Conaclos commented Oct 22, 2023 •

edited

Loading

github-actions bot commented Oct 22, 2023 •

edited

Loading

ematipico commented Oct 22, 2023 •

edited

Loading

netlify bot commented Feb 6, 2024 •

edited

Loading

codspeed-hq bot commented Jul 18, 2024 •

edited

Loading