Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(gatsby-plugin-mdx): prevent babel parse step at sourcing time #25437

Merged
merged 3 commits into from
Jul 8, 2020

Conversation

pvdz
Copy link
Contributor

@pvdz pvdz commented Jul 1, 2020

The mdx plugin was doing its default parsing step for every time it got called. At sourcing time it's only called to retrieve the import bindings and for this we can be much faster by manually processing the import statements. So that's what this does.

This change reduces baseline mdx processing by 30% by skipping a Babel parse step in one of the bootstrap steps

A very simple flat no-image mdx benchmark cuts down sourcing time in half for this change.

Benchmark numbers

This are the numbers for a simple, flat, local sourced, zero-image, mdx benchmark where each mdx file is automatically generated and look like this:

---
articleNumber: 1
title: "Maiores iusto autem dolorum."
path: 'maiores-iusto-autem-dolorum.'
date: 2018-04-14
---

import { Link } from "gatsby"

<Link to="/">Go Home</Link>

Reiciendis assumenda molestiae a. Ea aperiam aut numquam id. Ratione sint saepe illum est porro asperiores quas ut tempora. Minima sit enim deserunt maxime.
 
Earum iure et explicabo quis qui inventore modi commodi qui. Dolorem ab quo minima neque suscipit voluptas odio deleniti occaecati. Dolore temporibus ut fugiat facilis maiores dignissimos ut eius. Eius similique eos aut magnam quaerat modi. Autem dolor saepe quibusdam reprehenderit error et aut illo.

So there's a header, an import, a link, and two paragraphs of random garble (pre-generated, not part of the timings). The benchmark can run with an arbitrary number of pages (N=1000 yarn bench) so that's what we do for 1k, 2k, 4k, 8k, and 16k. By 2x-ing every step we can detect logaritmic perf problems easier. While the baseline Gatsby pipeline holds for a million pages easily, for mdx that ceiling is a lot lower right now (that's what I'm trying to improve :p )

The benchmark runs on a dedicated Intel NUC, which is a headless linux setup, specifically to benchmark, so numbers ought to be fairly consistent relative to other runs.

Data

Master is currently roughly on [email protected] and [email protected] ( + whatever is unpublished).

The raw data for both runs can be found in two gists:


  • 1000 pages:
    • master:
      • sourcing: 30s
      • bootstrap: 33s
      • finished: 59s
    • PR:
      • sourcing: 14s
      • bootstrap: 17s
      • finished: 45s
    • delta: -14s
    • e2e page speed 16p/s -> 22p/s
  • 2000 pages:
    • master
      • sourcing: 55s
      • bootstrap: 59s
      • finished: 104s
    • PR:
      • sourcing: 25s
      • bootstrap: 29s
      • finished: 76s
    • delta: -28s
    • e2e page speed 19p/s -> 26p/s
  • 4000 pages:
    • master
      • sourcing: 107s
      • bootstrap: 112s
      • finished: 193s
    • PR:
      • sourcing: 48s
      • bootstrap: 52s
      • finished: 137s
    • delta: -56s
    • e2e page speed 20p/s -> 29p/s
  • 8000 pages:
    • master
      • sourcing: 221s
      • bootstrap: 227s
      • finished: 388s
    • PR:
      • sourcing: 104s
      • bootstrap: 110s
      • finished: 281s
    • delta: -107s
    • e2e page speed 20p/s -> 28p/s
  • 16000 pages:
    • master
      • sourcing: 464s
      • bootstrap: 474s
      • finished: 819s
    • PR:
      • sourcing: 218s
      • bootstrap: 226s
      • finished: 572s
    • delta: -247s
    • e2e page speed 19p/s -> 27p/s (I mean, 27.97 ...)

@pvdz pvdz added topic: MDX topic: performance Related to runtime & build performance labels Jul 1, 2020
@pvdz pvdz requested a review from a team as a code owner July 1, 2020 11:13
@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Jul 1, 2020
@pvdz pvdz removed the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Jul 1, 2020
@pvdz pvdz force-pushed the skip-mdx-babel-once branch 3 times, most recently from 10daace to c424102 Compare July 6, 2020 09:45
@gatsby-cloud-staging
Copy link

gatsby-cloud-staging bot commented Jul 6, 2020

Your pull request can be previewed in Gatsby Cloud: https://build-fcca516d-a972-47c2-87c6-e2a80c0f9051.staging-previews.gtsb.io

The mdx plugin was doing its default parsing step for every time it got called. At sourcing time it's only called to retrieve the import bindings and for this we can be much faster by manually processing the import statements. So that's what this does.

A very simple flat no-image mdx benchmark cuts down sourcing time in half for this change.
@pvdz pvdz mentioned this pull request Jul 7, 2020
@pvdz pvdz mentioned this pull request Jul 7, 2020
@pvdz
Copy link
Contributor Author

pvdz commented Jul 7, 2020

Copy link
Contributor

@johno johno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😻

@pvdz pvdz merged commit 760845a into master Jul 8, 2020
@delete-merged-branch delete-merged-branch bot deleted the skip-mdx-babel-once branch July 8, 2020 07:24
johno added a commit that referenced this pull request Jul 16, 2020
We recently began using a new API in MDX with #25437 which
doesn't exist for all versions below 1.5.9. This updates the
peer dep for MDX to ensure that the new minimum version is
specified.

Fixes #25691
laurieontech pushed a commit that referenced this pull request Jul 16, 2020
…25798)

We recently began using a new API in MDX with #25437 which
doesn't exist for all versions below 1.5.9. This updates the
peer dep for MDX to ensure that the new minimum version is
specified.

Fixes #25691
@johno johno mentioned this pull request Jul 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: performance Related to runtime & build performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants