perf(rome_js_formatter): Reduce the `String` allocations for Tokens #2462

MichaReiser · 2022-04-19T06:10:16Z

This PR reduces the amount of String allocation necessary for FormatElement::Tokens by
making use of the observation that most tokens match the text of a SyntaxToken. For example,
identifiers, or punctuation tokens are kept by the formatter as is. This is even true for string literal tokens if they already use the right quotes.

The way this is implemented is by introducing a new SyntaxTokenText that is Send + Sync and allows referencing a slice in a SyntaxToken without worrying about the &str's lifetime. The PR further extends FormatElement::Token to make use of this new introduced SyntaxTokenText.

This change reduces overall memory consumption and improves performance:

group                                    format-element                         token
-----                                    --------------                         -----
formatter/checker.ts                     1.04    250.2±5.03ms    10.4 MB/sec    1.00    239.5±1.76ms    10.9 MB/sec
formatter/compiler.js                    1.07    145.6±1.30ms     7.2 MB/sec    1.00    136.5±1.43ms     7.7 MB/sec
formatter/d3.min.js                      1.07    117.4±3.70ms     2.2 MB/sec    1.00    109.6±1.24ms     2.4 MB/sec
formatter/dojo.js                        1.03      7.4±0.15ms     9.2 MB/sec    1.00      7.2±0.03ms     9.5 MB/sec
formatter/ios.d.ts                       1.05    181.2±1.95ms    10.3 MB/sec    1.00    172.8±2.23ms    10.8 MB/sec
formatter/jquery.min.js                  1.02     29.1±0.55ms     2.8 MB/sec    1.00     28.5±0.07ms     2.9 MB/sec
formatter/math.js                        1.05    233.1±4.69ms     2.8 MB/sec    1.00    222.8±1.79ms     2.9 MB/sec
formatter/parser.ts                      1.03      5.3±0.15ms     9.2 MB/sec    1.00      5.1±0.01ms     9.5 MB/sec
formatter/pixi.min.js                    1.10    131.0±7.11ms     3.3 MB/sec    1.00    119.3±2.12ms     3.7 MB/sec
formatter/react-dom.production.min.js    1.07     37.0±0.82ms     3.1 MB/sec    1.00     34.5±0.21ms     3.3 MB/sec
formatter/react.production.min.js        1.08  1825.1±57.85µs     3.4 MB/sec    1.00  1683.8±30.49µs     3.7 MB/sec
formatter/router.ts                      1.02      3.7±0.09ms    16.2 MB/sec    1.00      3.6±0.01ms    16.6 MB/sec
formatter/tex-chtml-full.js              1.05    288.3±5.19ms     3.2 MB/sec    1.00    273.4±1.29ms     3.3 MB/sec
formatter/three.min.js                   1.11    155.7±3.79ms     3.8 MB/sec    1.00    139.7±1.76ms     4.2 MB/sec
formatter/typescript.js                  1.04    945.2±6.64ms    10.1 MB/sec    1.00    909.3±7.16ms    10.4 MB/sec
formatter/vue.global.prod.js             1.07     49.1±1.49ms     2.5 MB/sec    1.00     45.8±0.20ms     2.6 MB/sec

ematipico · 2022-04-19T07:44:16Z

crates/rome_formatter/src/format_element.rs

@@ -1125,6 +1127,13 @@ pub enum Token {
        // The position of the dynamic token in the unformatted source code
        source_position: TextSize,
    },
+    // A token that is taken 1:1 from the source code


Suggested change

// A token that is taken 1:1 from the source code

/// A token that is taken 1:1 from the source code

What does "taken" mean exactly here? Computed? Extracted?

I can try to improve the existing documentation. It was neither computed nor extracted. It just means it's a text that we keep as is, the formatting doesn't change it.

Maybe "borrowed" could be a more familiar wording, although in Rust it implies some lifetime semantics that are not present here

crates/rome_rowan/src/syntax_token_text.rs

crates/rome_rowan/src/syntax/token.rs

crates/rome_rowan/src/cursor/token.rs

crates/rome_formatter/src/format_element.rs

crates/rome_js_formatter/src/utils/mod.rs

crates/rome_js_formatter/src/utils/binary_like_expression.rs

leops · 2022-04-19T09:47:06Z

crates/rome_formatter/src/format_element.rs

@@ -1125,6 +1127,13 @@ pub enum Token {
        // The position of the dynamic token in the unformatted source code
        source_position: TextSize,
    },
+    // A token that is taken 1:1 from the source code


Maybe "borrowed" could be a more familiar wording, although in Rust it implies some lifetime semantics that are not present here

crates/rome_formatter/src/format_element.rs

cloudflare-workers-and-pages · 2022-04-19T10:32:27Z

Deploying with Cloudflare Pages

Latest commit:	`a93522b`
Status:	✅ Deploy successful!
Preview URL:	https://57a4bc55.tools-8rn.pages.dev

View logs

This PR reduces the amount of `String` allocation necessary for `FormatElement::Token`s by making use of the observation that most tokens match the text of a `SyntaxToken`. For example, identifiers, or punctuation tokens are kept by the formatter as is. This is even true for string literal tokens if they already use the right quotes. The way this is implemented is by introducing a new `SyntaxTokenText` that is `Send + Sync` and allows referencing a slice in a `SyntaxToken` without worrying about the `&str`'s lifetime. The PR further extends `FormatElement::Token` to make use of this new introduced `SyntaxTokenText`. This change reduces overall memory consumption and improves performance: ``` group format-element token ----- -------------- ----- formatter/checker.ts 1.04 250.2±5.03ms 10.4 MB/sec 1.00 239.5±1.76ms 10.9 MB/sec formatter/compiler.js 1.07 145.6±1.30ms 7.2 MB/sec 1.00 136.5±1.43ms 7.7 MB/sec formatter/d3.min.js 1.07 117.4±3.70ms 2.2 MB/sec 1.00 109.6±1.24ms 2.4 MB/sec formatter/dojo.js 1.03 7.4±0.15ms 9.2 MB/sec 1.00 7.2±0.03ms 9.5 MB/sec formatter/ios.d.ts 1.05 181.2±1.95ms 10.3 MB/sec 1.00 172.8±2.23ms 10.8 MB/sec formatter/jquery.min.js 1.02 29.1±0.55ms 2.8 MB/sec 1.00 28.5±0.07ms 2.9 MB/sec formatter/math.js 1.05 233.1±4.69ms 2.8 MB/sec 1.00 222.8±1.79ms 2.9 MB/sec formatter/parser.ts 1.03 5.3±0.15ms 9.2 MB/sec 1.00 5.1±0.01ms 9.5 MB/sec formatter/pixi.min.js 1.10 131.0±7.11ms 3.3 MB/sec 1.00 119.3±2.12ms 3.7 MB/sec formatter/react-dom.production.min.js 1.07 37.0±0.82ms 3.1 MB/sec 1.00 34.5±0.21ms 3.3 MB/sec formatter/react.production.min.js 1.08 1825.1±57.85µs 3.4 MB/sec 1.00 1683.8±30.49µs 3.7 MB/sec formatter/router.ts 1.02 3.7±0.09ms 16.2 MB/sec 1.00 3.6±0.01ms 16.6 MB/sec formatter/tex-chtml-full.js 1.05 288.3±5.19ms 3.2 MB/sec 1.00 273.4±1.29ms 3.3 MB/sec formatter/three.min.js 1.11 155.7±3.79ms 3.8 MB/sec 1.00 139.7±1.76ms 4.2 MB/sec formatter/typescript.js 1.04 945.2±6.64ms 10.1 MB/sec 1.00 909.3±7.16ms 10.4 MB/sec formatter/vue.global.prod.js 1.07 49.1±1.49ms 2.5 MB/sec 1.00 45.8±0.20ms 2.6 MB/sec ```

github-actions · 2022-04-20T14:07:01Z

Parser conformance results on ubuntu-latest

js/262

Test result	`main` count	This PR count	Difference
Total	45323	45323	0
Passed	44383	44383	0
Failed	940	940	0
Panics	0	0	0
Coverage	97.93%	97.93%	0.00%

jsx/babel

Test result	`main` count	This PR count	Difference
Total	39	39	0
Passed	36	36	0
Failed	3	3	0
Panics	0	0	0
Coverage	92.31%	92.31%	0.00%

ts/babel

Test result	`main` count	This PR count	Difference
Total	588	588	0
Passed	519	519	0
Failed	69	69	0
Panics	0	0	0
Coverage	88.27%	88.27%	0.00%

ts/microsoft

Test result	`main` count	This PR count	Difference
Total	16171	16171	0
Passed	12315	12315	0
Failed	3856	3856	0
Panics	0	0	0
Coverage	76.15%	76.15%	0.00%

github-actions · 2022-04-20T14:09:06Z

Playground for commit f50be3c

MichaReiser requested a review from leops April 19, 2022 06:10

ematipico reviewed Apr 19, 2022

View reviewed changes

leops reviewed Apr 19, 2022

View reviewed changes

MichaReiser requested review from ematipico and leops April 20, 2022 12:28

ematipico approved these changes Apr 20, 2022

View reviewed changes

leops approved these changes Apr 20, 2022

View reviewed changes

Base automatically changed from perf/concat-elements to main April 20, 2022 14:01

MichaReiser added 3 commits April 20, 2022 16:02

Code review feedback

dcb40ca

Rebase

a93522b

MichaReiser force-pushed the perf/syntax-token-slice branch from 245c0ea to a93522b Compare April 20, 2022 14:03

MichaReiser temporarily deployed to aws April 20, 2022 14:04 Inactive

MichaReiser merged commit 6cdcad6 into main Apr 20, 2022

MichaReiser deleted the perf/syntax-token-slice branch April 20, 2022 14:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(rome_js_formatter): Reduce the `String` allocations for Tokens #2462

perf(rome_js_formatter): Reduce the `String` allocations for Tokens #2462

MichaReiser commented Apr 19, 2022

ematipico Apr 19, 2022

MichaReiser Apr 19, 2022

leops Apr 19, 2022

leops Apr 19, 2022

cloudflare-workers-and-pages bot commented Apr 19, 2022 •

edited

Loading

github-actions bot commented Apr 20, 2022

github-actions bot commented Apr 20, 2022

	// A token that is taken 1:1 from the source code
	/// A token that is taken 1:1 from the source code

perf(rome_js_formatter): Reduce the String allocations for Tokens #2462

perf(rome_js_formatter): Reduce the String allocations for Tokens #2462

Conversation

MichaReiser commented Apr 19, 2022

ematipico Apr 19, 2022

Choose a reason for hiding this comment

MichaReiser Apr 19, 2022

Choose a reason for hiding this comment

leops Apr 19, 2022

Choose a reason for hiding this comment

leops Apr 19, 2022

Choose a reason for hiding this comment

cloudflare-workers-and-pages bot commented Apr 19, 2022 • edited Loading

Deploying with Cloudflare Pages

github-actions bot commented Apr 20, 2022

Parser conformance results on ubuntu-latest

js/262

jsx/babel

ts/babel

ts/microsoft

github-actions bot commented Apr 20, 2022

perf(rome_js_formatter): Reduce the `String` allocations for Tokens #2462

perf(rome_js_formatter): Reduce the `String` allocations for Tokens #2462

cloudflare-workers-and-pages bot commented Apr 19, 2022 •

edited

Loading