MDX performance #1152

Closed
johno opened this issue Jul 16, 2020 · 8 comments
Labels
🏁 area/perf This affects performance 💬 type/discussion This is a request for comments 🦋 type/enhancement This is great to have 💎 v2 Issues related to v2

@johno
Member

johno commented Jul 16, 2020

☂️ This umbrella issue is for tracking work related to improving the performance of MDX.

I've been working with @pvdz on MDX performance. We've noted a few aspects that add unnecessary work which we should be able to reduce, especially in v2.

Numerous babel parse and transformation steps

Firstly, we have multiple babel parse steps throughout the MDX transpilation pipeline.

Imports and exports

  • Partitioning imports and exports
  • Finding the default export

Peter has done some work here in gatsby-plugin-mdx (gatsbyjs/gatsby#25437) that we can potentially adapt for usage in core.

Shortcode generation

We use babel to figure out what imports and exports exist, and then use that to instantiate variables coming from MDXProvider with makeShortcode. Also related to gatsbyjs/gatsby#25437

mdxType

This is used by the runtime (react/preact/vue) to determine which component to render. This is something we can do from the MDXAST in v2 since the JSX structure is represented.

Returning a compiled string that inevitably needs to be transpiled

Secondly, in addition to these parse steps, we also return a JSX string. In nearly all cases this JSX string is then transpiled to JS and mdx pragma function calls. This was originally an intentional output because we wanted to make MDX more palatable and familiar. However, it might make sense to serialize directly to function calls and JS.

This would remove a babel step users need (unless they're using optional syntax or need browser polyfills which is still achievable in user land).


You all are welcome to bring up other areas of the codebase we can make more performant or other ideas as well! In fact, we'd love your thoughts.

@johno johno added 🦋 type/enhancement This is great to have 💬 type/discussion This is a request for comments 💎 v2 Issues related to v2 🏁 area/perf This affects performance labels Jul 16, 2020
@wooorm
Member

wooorm commented Jul 17, 2020

However, it might make sense to serialize directly to function calls and JS.

Even if we compile # This and <That /> to function calls, I can still see folks using JSX inside JSX, or inside expressions though, and expect of MDX for them to be handled:

<Heading icon={<Icon />}>

## Heading {something ? <Y /> : <Z />}

@johno
Member Author

johno commented Jul 17, 2020

I can still see folks using JSX inside JSX, or inside expressions though, and expect of MDX for them to be handled

Yeah, I think what we can do is transpile JSX to function calls inside expressions potentially. It should still be a lot faster than handling the whole document (of course this is something we will benchmark to be sure).

@wooorm
Member

wooorm commented Jul 17, 2020

Probably also faster if we only process jsx there, and nothing else, leaving that up to folks. But indeed, wondering on the benchmarks of 100 expressions vs 1 file

@pvdz

pvdz commented Jul 17, 2020

Yeah so if we keep certain artificial limitations (which already apply today) in place then we can distill the imports/exports from the mdast without the need for Babel. That's been the source of some significant perf improvements at startup time (like gatsbyjs/gatsby#25757).

The reasoning here is that the import and export syntax is very strict and if we disallow comments in between then a regular expression or simple string manipulation can quickly get us the answers we need (-> the symbols being imported and exported).

For imports the only limitation might be not to allow comments inside an import and only at the end of a line. These are the forms of import:

  • import ID from 'y'
  • import * as ID from 'y'
  • import {ID} from 'y'
  • import {ID as ID2} from 'y'
  • import ID, {ID2} from 'y'
  • import * as ID, {ID2} from 'y'

The {} pattern can repeat, and in each case the as part is optional. For the fix in Gatsby, to get the imported idents, I took these imports and used a regex to remove all parts that we were not interested in, leaving us with comma separated sets of ID or ID as ID2. You can easily take the last ID and that'll be the one you want.

Leaning on the fact that imports are constants (and that the input is valid), there is no need to dedupe them.

So to make life easy, the only syntactical restriction, beyond non-standard syntax of course, is to disallow comments inside the import declaration. And maybe disallow the variant where from is omitted (where you import a module for side effects).
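A minimal sketch of that idea, not the actual Gatsby code: `importedNames` is a hypothetical helper name, and it assumes valid input, one import declaration per string, no comments inside the declaration, and a `from` clause (side-effect imports yield nothing).

```js
// Hypothetical helper: returns the local identifiers bound by one import declaration.
// Assumes valid input and no comments inside the declaration.
function importedNames(line) {
  // Capture everything between `import` and `from '...'`.
  const match = /^\s*import\s+([\s\S]*?)\s+from\s*['"]/.exec(line)
  if (!match) return [] // side-effect import (`import 'y'`) or not an import at all

  return match[1]
    .replace(/[{}]/g, ',') // braces only group names, so treat them as commas
    .split(',')
    .map((part) => part.trim())
    .filter(Boolean)
    // `ID`, `ID as ID2`, and `* as ID` all end with the local name.
    .map((part) => part.split(/\s+/).pop())
}

console.log(importedNames("import ID, {ID2, x as ID3} from 'y'"))
```

Taking the last word of each comma separated part is what makes `ID as ID2` and `* as ID` fall out without any special casing.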

For exports it's a little trickier, mainly because you can export arbitrary expressions and because of defaults in destructuring. However, it turns out that exports are currently limited to a single line. That's great because that makes them easy to slice out.

Furthermore, if you apply the same comment restriction to exports and disallow destructuring defaults, you can "cheat" your way out of requiring any JS parser and still distill all the exported symbols, as well as finding the default export. You can even support the newer export <pattern> from 'file', which I believe is currently not supported.

  • export default function abc(){}
  • export const foo = bar
  • export class Boo {}
  • export { ding, dong as dang }
  • export let [a, b] = obj
  • export let [a = 1, b = 2] = obj <-- this is the one to disallow

In all the above cases, except the last, you can parse up to the first = character (for var, let, and const exports) to get all the exported symbol names safely. The syntax for function and class is restricted enough by itself. The re-export syntax can be done similarly as the imports above. All in all, it'll be much faster than the overhead of a full JS parse.
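A rough sketch of the export side under the same restrictions. `exportedNames` is a hypothetical name, and the sketch additionally assumes a single-line export with one declarator or destructuring pattern before the first `=`; it is an illustration of the approach, not what MDX or Gatsby ships.

```js
// Hypothetical helper: names exported by one single-line export declaration.
// Assumes valid input, no comments, and no destructuring defaults.
function exportedNames(line) {
  const source = line.trim()
  const identifier = /[A-Za-z_$][\w$]*/g

  // `export default ...` is reported under the name `default`.
  if (/^export\s+default\b/.test(source)) return ['default']

  // `export function abc(){}`, `export function* gen(){}`, `export class Boo {}`
  let match = /^export\s+(?:async\s+)?(?:function\s*\*?|class)\s*([A-Za-z_$][\w$]*)/.exec(source)
  if (match) return [match[1]]

  // `export const foo = bar`, `export let [a, b] = obj`: read up to the first `=`.
  match = /^export\s+(?:var|let|const)\s+([^=]+)=/.exec(source)
  if (match) {
    // Drop `key:` in `{key: local}` so only the local binding remains.
    const pattern = match[1].replace(/[A-Za-z_$][\w$]*\s*:/g, '')
    return pattern.match(identifier) || []
  }

  // `export { ding, dong as dang }`, optionally followed by `from 'file'`.
  match = /^export\s*\{([^}]*)\}/.exec(source)
  if (match) {
    return match[1]
      .split(',')
      .map((part) => part.trim())
      .filter(Boolean)
      .map((part) => part.split(/\s+/).pop())
  }

  return []
}

console.log(exportedNames('export { ding, dong as dang }'))
```

A line such as `export let [a = 1, b = 2] = obj` is exactly where this breaks: the first `=` sits inside the pattern, so only `a` would be found.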

For JSX serialization you can use a faster parser/printer than Babel. I know Acorn can do it. There's also Sucrase, and a few others.

My suggestion to John was to default to anything fast and to expose an option for the user to do it for you instead, since mdx doesn't reaaally care how the jsx gets compiled to JS. Or wouldn't need to, as far as I understand. So a user could give mdx a callback like function callback(jsxString) { return parser(jsxString).serialize(pragma); } and mdx would just run it instead.

If I'm not mistaken, this way MDX wouldn't need to run a JS parser at all.

One other potential trick is to concat the expressions with a searchable separator (an identifier of sorts or the debugger statement) and concat the jsx expressions together. Feed them to a parser, print them again, split on the debugger statement (or whatever you pick). That may already be what's happening now, I'm not sure..?
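The separator trick could look roughly like this. `transformAll` and `fakeTransform` are hypothetical names; the fake transform only stands in for a real parser/printer, and the sketch assumes no expression contains a `debugger` statement of its own.

```js
// Hypothetical helper: run `transform` once over many snippets instead of once per snippet.
function transformAll(expressions, transform) {
  if (expressions.length === 0) return []

  // Wrap each expression as a statement and glue them together with `debugger;`.
  const joined = expressions
    .map((expression) => `(${expression});`)
    .join('\ndebugger;\n')

  // One parse and print for the whole batch.
  const printed = transform(joined)

  // Split the printed output back into one chunk per input expression.
  return printed
    .split(/\s*\bdebugger\b\s*;?\s*/)
    .map((chunk) => chunk.trim())
}

// Stand-in for a real JSX compiler: only rewrites self-closing tags without props.
let calls = 0
const fakeTransform = (code) => {
  calls++
  return code.replace(/<(\w+) \/>/g, 'h($1)')
}

const results = transformAll(['<Icon />', 'something ? <Y /> : <Z />'], fakeTransform)
console.log(results, calls)
```

Whether one batched call actually beats many small calls depends on the parser, so this would need the same benchmarking as the other ideas here.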

Oh and a third option is to allow the user to pass through a Babel config / options for the whole build step. That way if Babel is run inside MDX anyways, it can just as well also do all the other transformations, like polyfill transforms etc, so that the main pipeline doesn't need to process it again. Potentially. But that might be a pretty big Pandora's box of complexity to open up.

@johno johno added this to the v2 milestone Jul 22, 2020
@pvdz

pvdz commented Aug 11, 2020

FYI: gatsbyjs/gatsby#26265 adds a baseline mdx benchmark to benchmarks/mdx-without-images, where you can run N=10000 M=4 yarn bench to benchmark 10000 basic mdx files with 4gb of memory. It has no images and should test most of the main mdx pipeline (improvements to cover more of mdx are welcome).

@mariusGundersen
Contributor

Can't the parsed and transformed AST be passed directly to babel, which then can skip the parse step? That way mdx doesn't need to generate the js and the next step doesn't need to parse, just traverse and generate the final output?

@byCedric

Can't the parsed and transformed AST be passed directly to babel, which then can skip the parse step? That way mdx doesn't need to generate the js and the next step doesn't need to parse, just traverse and generate the final output?

There are methods to parse the AST upfront, and transform the AST itself. But it looks like the mdx-hast-to-jsx is parsing different parts of the MDX HAST. Which means, different ASTs for all 3 different transformSync calls.

PS. I'm happy to help any way I can, at Expo we would love improved performance ❤️

wooorm added a commit that referenced this issue Dec 15, 2020
This PR changes the internals of the core `@mdx-js/mdx` package to generate a
JavaScript syntax tree instead of a string.
This fixes escaping issues such as #1219.
It makes `mdx-hast-to-jsx` much more palatable.
It also prevents several Babel parses.
It paves the way for passing in Babel plugins, which is useful for users, but
also for us to compile to `React.createElement`, `_jsx`, or Vue’s `h` calls
directly and make MDX’s output directly usable.

* `babel-plugin-apply-mdx-type-props`: add `parentType`
* `mdx`: use `rehype-minify-whitespace` to remove superfluous whitespace
* `mdx`: use `hast-util-to-estree` to transform hast to estree
* `mdx`: use `estree-to-babel` to transform estree to Babel
* `mdx`: generate estree/Babel instead of strings
* `mdx`: use `@babel/generator` to serialize Babel AST
* `vue`: stop supporting the react transform: (it doesn’t make sense)
* `vue`: fix support for props to components

Related to GH-741.
Related to GH-1152.

Closes GH-606.
Closes GH-1028.
Closes GH-1219.
wooorm added a commit that referenced this issue Dec 18, 2020
This PR changes the internals of the core `@mdx-js/mdx` package to generate a
JavaScript syntax tree instead of a string.
This fixes escaping issues such as #1219.
It makes `mdx-hast-to-jsx` much more palatable.
It also prevents several Babel parses.
It paves the way for passing in Babel plugins, which is useful for users, but
also for us to compile to `React.createElement`, `_jsx`, or Vue’s `h` calls
directly and make MDX’s output directly usable.

* `babel-plugin-apply-mdx-type-props`: add `parentType`
* `mdx`: use `rehype-minify-whitespace` to remove superfluous whitespace
* `mdx`: use `hast-util-to-estree` to transform hast to estree
* `mdx`: use `estree-to-babel` to transform estree to Babel
* `mdx`: generate estree/Babel instead of strings
* `mdx`: use `@babel/generator` to serialize Babel AST
* `vue`: stop supporting the react transform: (it doesn’t make sense)
* `vue`: fix support for props to components

Related to GH-741.
Related to GH-1152.

Closes GH-606.
Closes GH-1028.
Closes GH-1219.
Closes GH-1382.

Reviewed-by: Christian Murphy <[email protected]>
wooorm added a commit that referenced this issue Dec 20, 2020
This removes the last three custom Babel plugins we had and replaces
them with estree versions.
Furthermore, it removes `@babel/generator`.

For the plugins, we were only looking at ESM import/exports, but right
now we’re delegating work to `periscopic` to look at which things are
defined in the top-level scope.
It’s a bit more complex, but this matches better with intentions,
fixes some bugs, and prepares for a potential future where other ES
constructs are allowed, so all in all should be a nice improvement.

For serializing, we’re switching to `astring`, and handling JSX for now
internally (could be externalized later).
`astring` seems fast and is incredibly small, but is not very popular.
We might perhaps see bugs in serialization in the future because of that,
but all our tests seem fine, so I’m not too worried about that.

Estree remains a somewhat fragmented ecosystem, such as that the tree
walkers in `periscopic` and `astring` are different, so we might also
consider writing our own serializer in the future.
Or, when we implement Babel’s React JSX transform ourselves, could switch
to another generator, or at least drop the JSX serialization code here.

Because of these changes, we can drop `@babel/core` and
`@babel/generator` from `@mdx-js/mdx`, which drops the bundle size
from 349kb to 111kb.
That’s 68%.
Pretty nice.
This should improve downloading and parsing time of bundles
significantly.
Of course, we currently still have JSX in the output, so folks will
have to resort to Babel (or `buble-jsx-only`) in another step.

For performance, v2 (micromark) was already an improvement over v1.
On 1000 simple files totalling about 1mb of MDX:

* v1: 3739ms
* v2: 2734ms (26% faster)
* v2 (w/o babel): 1392ms (63% faster).

Of course, this all really depends on what type of stuff is in your MDX.
But it looks pretty sweet!

✨

Related to GH-1046.
Related to GH-1152.
Related to GH-1338.
Closes GH-704.
Closes GH-1384.
johno pushed a commit that referenced this issue Dec 20, 2020
wooorm added a commit that referenced this issue Dec 23, 2020
Previously, we required an extra build step to produce runnable code.
This change makes the output of MDX immediately runnable.

This drops the final requirement on Babel (Or Bublé).
Dropping Babel leads to a size and performance win for the runtime (and for
any use case that doesn’t otherwise require Babel, such as running in Node).
Of course, if people want to use the latest JavaScript features, they can still
use Babel, but it’s not *required*.

Finally, if JSX is preferred (for example, Vue treats JSX radically different
from other hyperscript interfaces and has its own JSX builders), `keepJsx` can
be set to `true`.

In short, the size breakdown for the runtime is:

* `@mdx-js/[email protected]` (last stable tag): 356.4kb
* `@mdx-js/[email protected]` (last next tag): 362.9kb
* Previous commit (on an unmaintained Bublé fork): 165kb
* This commit: 120kb (26% / 66% / 69% smaller)

Core only adds ±1kb to its bundle size, because `estree-util-build-jsx`
reuses dependencies that we already use, too.

Related to GH-1041.
Related to GH-1044.
Related to GH-1152.
wooorm added a commit that referenced this issue Dec 28, 2020
wooorm added a commit that referenced this issue Jan 2, 2021
This PR moves most of the runtime to the compile time.

This issue has nothing to do with `@mdx-js/runtime`. It’s about
`@mdx-js/mdx` being compile time, and moving most work there, from the
“runtimes” `@mdx-js/react`, `@mdx-js/preact`, `@mdx-js/vue`.

Most of the runtime is undocumented features that allow amazing things,
but those are in my opinion *too magical*, more powerful than needed,
complex to reason about, and again: undocumented.
These features are added by overwriting an actual renderer (such as
react, preact, or vue). Doing so makes it hard to combine MDX with for
example Emotion or theme-ui, to opt into a new JSX transform when React
introduces one, to support other hyperscripts, or to add features such
as members (`<Foo.Bar />`). Removing these runtime features does what
MDX says in the readme: “**🔥 Blazingly blazing fast: MDX has no
runtime […]**”

This does remove the ability to overwrite *anything* at runtime. This
brings back the project to what is documented: users can still
overwrite markdown things (e.g., blockquotes) to become components and
pass components in at runtime without importing them. And it does still
allow undocumented parent-child combos (`blockquote.p`).

* Remove runtime renderers (`createElement`s hijacking) from
  `@mdx-js/react`, `@mdx-js/preact`, `@mdx-js/vue`
* Add `jsxRuntime` option to switch to the modern automatic JSX runtime
* Add `jsxImportSource` option to switch to a modern non-React JSX
  runtime
* Add `pragma` option to define a classic JSX pragma
* Add `pragmaFrag` option to define a classic JSX fragment
* Add `mdxProviderImportSource` option to load an optional runtime
  provider
* Add tests for automatic React JSX runtime
* Add tests for `@mdx-js/mdx` combined with `emotion`
* Add support and test members as “tag names” of elements
* Add support and test qualified names (namespaces) as “tag names” of
  elements
* Add tests for parent-child combos
* Add tests to assert explicit (inline) components take precedence over
  provided/given components
* Add tests for `mdxFragment: false` (runtime renderers w/o fragment
  support)
* Fix and test double quotes in attribute values

This PR removes the runtime renderers and related things such as the
`mdxType` and `parentName` props while keeping the `MDXProvider`
intact.

This improves runtime performance, because all that runs at runtime is
plain vanilla React/preact/vue code.

This reduces the surface of the MDX API while being identical to what
is documented and hence to user expectations (except perhaps to some
power users).

This also makes it easier to support other renderers without having to
maintain projects like `@mdx-js/react`, `@mdx-js/preact`, `@mdx-js/vue`:
anything that can be used as a JSX pragma (including the [automatic
runtime](https://reactjs.org/blog/2020/09/22/introducing-the-new-jsx-transform.html))
is now supported.
A related benefit is that it’s easier to integrate with
[emotion](https://github.com/emotion-js/emotion/blob/master/packages/react/src/jsx.js#L7)
(including through `theme-ui`) and similar projects which also
overwrite the renderer: as it’s not possible to have two runtimes, they
were hard to combine; because with this PR MDX is no longer a renderer,
there’s no conflict anymore.

This is done by the compile time (`@mdx-js/mdx`) knowing about an
(**optional**) runtime for an `MDXProvider` (such as `@mdx-js/react`,
`@mdx-js/preact`). Importantly, it’s not required for other
hyperscript interfaces to have a provider: `MDXContent` exported from
a compiled MDX file *also* accepts components (it already did), and Vue
comes with component passing out of the box.

In short, the runtime looked like this:

```js
function mdx(thing, props, ...children) {
  const overwrites = getOverwritesSomeWay()
  return React.createElement(overwrites[props.mdxType] || thing, props, ...children)
}
```

And we had a compile time, which added that `mdxType` prop. So:

```mdx
<Youtube />
```

Became:

```js
const Youtube = () => { throw new Error('Youtube is not loaded!') }

<Youtube mdxType="Youtube" />
```

Which in plain JS looks like:

```js
const Youtube = () => { throw new Error('Youtube is not loaded!') }

React.createElement(Youtube, {mdxType: 'Youtube'})
```

Instead, this now compiles to:

```js
const {Youtube} = Object.assign({Youtube: () => { throw new Error('Youtube is not loaded!') }}, getOverwritesSomeWay())

React.createElement(Youtube)
```
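The difference between the two approaches can be tried without React. In this sketch `createElement` is a stand-in that just returns a plain object, and `Fancy`, `Missing`, and the body of `getOverwritesSomeWay` are made up for the example.

```js
// Stand-in for React.createElement so the sketch runs on its own.
const createElement = (type, props, ...children) => ({type, props: props || {}, children})

// Stand-in for however provided components are looked up (for example from context).
const Fancy = () => 'provided'
const getOverwritesSomeWay = () => ({Youtube: Fancy})

// Placeholder used when nothing was imported or provided.
const Missing = () => { throw new Error('Youtube is not loaded!') }

// Before: every element goes through a wrapper that inspects `mdxType` at runtime.
function mdx(thing, props, ...children) {
  const overwrites = getOverwritesSomeWay()
  return createElement(overwrites[props.mdxType] || thing, props, ...children)
}
const before = mdx(Missing, {mdxType: 'Youtube'})

// After: one merge per document, then direct createElement calls.
const {Youtube} = Object.assign({Youtube: Missing}, getOverwritesSomeWay())
const after = createElement(Youtube)

console.log(before.type === Fancy, after.type === Fancy)
```

Both paths end up with the provided component, but the second one does the lookup once per document instead of once per element and needs no extra `mdxType` prop.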

The previous example shows what is sometimes called a “shortcode”: a
way to inject components as identifiers into the MDX file, which was
introduced in [MDX 1](https://mdxjs.com/blog/shortcodes).

A different use case for the runtime was overwriting “defaults”. This
is documented on the website as the “[Table of
components](https://mdxjs.com/table-of-components)”. This MDX:

```mdx
Hello, *world*!
```

Became:

```js
<p mdxType="p">Hello, <em mdxType="em">world</em>!</p>
```

This now compiles to:

```js
const overwrites = Object.assign({p: 'p', em: 'em'}, getOverwritesSomeWay())

<overwrites.p>Hello, <overwrites.em>world</overwrites.em>!</overwrites.p>
```

This MDX:

```mdx
export const Video = () => <Vimeo />

<Video />
```

Used like so:

```jsx
<MDXProvider components={{Video: () => <Youtube />}}>
  <Content />
</MDXProvider>
```

Would result in a `Youtube` component being rendered. It no longer
does. I see the previous behavior as a bug and hence this as a fix.

A subset of the above point is that:

```mdx
export default props => <main {...props} />

x
```

Used like so:

```jsx
<MDXProvider components={{wrapper: props => <article {...props} />}}>
  <Content />
</MDXProvider>
```

Would result in an `article` instead of the explicit `main`. It no
longer does. I see the previous behavior as a bug and hence this as a
fix.

(#821)

```mdx

<h2>World</h2>
```

Used like so:

```jsx
<MDXProvider components={{h2: () => <SomethingElse />}}>
  <Content />
</MDXProvider>
```

Would result in a `SomethingElse` for both. This PR **does not** change
that. But it could more easily be changed if we want to, because at
compile time we know whether something was a tag or not.

An undocumented feature of the current MDX runtime renderer is that
it’s possible to overwrite anything:

```mdx
<span />
```

Used like so:

```jsx
<MDXProvider components={{span: props => <b>{props.children}</b>}}>
  <Content />
</MDXProvider>
```

Would overwrite to become bold, even though it’s not documented
anywhere. This PR changes that: only allowed markdown “tag names” can
be changed (`p`, `li`, ...). **This list could be expanded.**

Another undocumented feature is that parent–child combos can be
overwritten. A `li` in an `ol` can be treated differently from one in
an `ul` by passing `'ol.li': () => <SomethingElse />`.

This PR no longer lets users “nest” arbitrary parent–child combos
except for `ol.li`, `ul.li`, and `blockquote.p`. **This list could
be expanded.**
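To make the parent-child rule concrete, here is a small sketch of such a lookup. It is not how `@mdx-js/mdx` resolves components (that happens at compile time); `pickComponent` and `allowedCombos` are hypothetical names, and only the three combos named above are allowed.

```js
// The only parent-child combos that may be overwritten, per the text above.
const allowedCombos = new Set(['ol.li', 'ul.li', 'blockquote.p'])

// Hypothetical lookup: prefer an allowed `parent.child` entry, then the plain
// tag name, and finally fall back to the tag name itself as a string.
function pickComponent(components, parentName, name) {
  const combo = `${parentName}.${name}`
  if (allowedCombos.has(combo) && components[combo]) return components[combo]
  return components[name] || name
}

const OrderedItem = () => 'ordered item'
const Item = () => 'item'
const components = {'ol.li': OrderedItem, li: Item, 'div.p': () => 'ignored'}

console.log(pickComponent(components, 'ol', 'li') === OrderedItem)
console.log(pickComponent(components, 'ul', 'li') === Item)
console.log(pickComponent(components, 'div', 'p'))
```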

It was not possible to use members (`<foo.bar />`, `<Foo.bar.baz />`,
#953), and supporting them previously
would have been complex. This PR adds support for them.

Previously, `mdxType` and `parentName` attributes were added to all
elements. And a `components` prop was accepted on **all** elements to
change the provider. These are no longer passed and no longer accepted.
Lastly, `components`, `props` were in scope for all JSX tags defined in
the “markdown” section (not the import/exports) of each document.

This adds identifiers to the scope prefixed with double underscores:
`__provideComponents`, `__components`, and `__props`.

A single 1mb MDX file, about 20k lines and 135k words (basically 3
books). Heavy on the “markdown”, few tags, no import/exports.
322kb gzipped.

Build time:

* v1: 2895.122856
* 2.0.0-next.8: 3187.4684129999996
* main: 4058.917152000001
* this pr: 4066.642403

Output size:

* v1: raw: 1.5mb, gzip: 348kb
* 2.0.0-next.8: raw: 1.4mb, gzip: 347kb
* main: raw: 1.3mb, gzip: 342kb
* this pr: raw: 1.8mb, gzip: 353kb
* this pr, automatic runtime: raw: 1.7mb, gzip: 355kb

Run time:

* v1: 321.761208
* 2.0.0-next.8: 321.79749599999997
* main: 162.412757
* this pr: 107.28038599999996
* this pr, automatic runtime: 123.73588899999999

This PR is much faster on giant markdown-esque documents during runtime.
The win over the current `main` branch is 34%, the win over the last
beta and v1 is 66%.
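The percentages can be checked from the run time figures above as a relative reduction, `(before - after) / before`. The variable names are only for this sketch.

```js
// Run time figures copied from the lists above.
const v1 = 321.761208
const main = 162.412757
const thisPr = 107.28038599999996

// Relative reduction in time when going from `before` to `after`.
const reduction = (before, after) => (before - after) / before

const overMain = reduction(main, thisPr) // roughly a third
const overV1 = reduction(v1, thisPr) // roughly two thirds

console.log((overMain * 100).toFixed(1) + '%', (overV1 * 100).toFixed(1) + '%')
```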

For output size, the raw value increases with this PR, which is because
the output is now `/*#__PURE__*/React.createElement(__components.span…)`
or `/*#__PURE__*/_jsx(__components.span…)`, instead of `mdx("span",
{mdxType: "span"…})`. The change is more repetition, as can be seen by
the roughly same gzip sizes.

That the build time of `main` and this PR is slower than v1 and the
last beta does surprise me a lot. I benchmarked earlier with 1000 small
simple MDX files, totalling 1mb, [where the results were the
inverse](#1399 (comment)). So
it looks like we have a problem with giant files. Still, this PR has no
effect on build time performance, because the results are the same as
currently on `main`.

This PR makes MDX faster, adds support for the modern automatic JSX
runtime, and makes it easier to combine with Emotion and similar
projects.

---

Some of what this PR does has been discussed over the years:

Related-to: GH-166.
Related-to: GH-197.
Related-to: GH-466 (very similar).
Related-to: GH-714.
Related-to: GH-938.
Related-to: GH-1327.

This PR solves some of the items outlined in these issues:

Related-to: GH-1152.
Related-to: #1014 (comment).

This PR solves:

Closes GH-591.
Closes GH-638.
Closes GH-785.
Closes GH-953.
Closes GH-1084.
Closes GH-1385.
@wooorm
Member

wooorm commented Oct 19, 2021

Lots has changed since this discussion. Babel is no longer needed. Much has been improved.
I did several tests this week, and included the improvements on the website of the next RC:

🏃‍♀️ Compiles at least 25% faster
🚴 Generated code runs twice as fast (100% faster)
🚄 Bundle size of @mdx-js/mdx is more than three times as small (250% smaller)

See https://v2.mdxjs.com/blog/v2/ for more info.

@wooorm wooorm closed this as completed Oct 19, 2021

5 participants