Incremental Builds #698

Closed
natemoo-re opened this issue Sep 13, 2023 · 8 comments

@natemoo-re
Member

Summary

Incremental Build support in Astro aims to significantly speed up repeated runs of astro build. Initial builds will populate some kind of cache, which will allow subsequent builds to bypass Rollup for unchanged trees of the module graph.

Background & Motivation

The original incremental build support proposal is one of our oldest and most highly upvoted issues. Unfortunately, users have not shared many concrete examples in that thread. However, from the extensive performance profiling we've done, we know that Rollup is the main bottleneck in Astro's build process.

Now that our rendering, Markdown, and MDX performance has been optimized about as far as we can take it, it is time to explore new options. The best option we have is to move as much of our build process out of Rollup as possible, likely with some sort of build cache. That is the goal of introducing incremental builds.

Why now? The reason is straightforward: caching is notoriously difficult to get right. We did not want to take on the additional complexity until our API surface was stable and our internal build process was easier to reason about. Thanks to significant effort from @bluwy and @ematipico in the past few months, we're now in a good place to tackle this.

Goals

  • Improve astro build performance
  • Avoid as much repeated processing during astro build as possible
  • Implement a generic API that enables incremental builds
  • Eventually, enable incremental builds by default
  • Support existing static, hybrid, and server outputs

Non-Goals

  • Vendor lock-in. Incremental builds will be implemented generically, supporting our existing ecosystem of deployment platforms where possible. If a host caches assets between builds, it is likely that they will support incremental builds automatically.
  • Incremental Static Regeneration (also known as ISR or DPR). The proposal for supporting ISR is an entirely different topic, not covered by this accepted proposal. These features are not mutually exclusive. Implementing incremental build support benefits every user of Astro and does not prevent Astro from potentially introducing ISR in the future.
  • Future: Adapter API. Some adapters perform a significant amount of processing and may also want some form of incremental build support. To reduce the scope of this proposal, we are not considering exposing a public Adapter API for this. This may be implemented in the future as an enhancement.
@natemoo-re
Member Author

Ideally, this is something that could be solved generically on the Vite / Rollup level so that every framework could benefit from this. I'm really not sure if that's on the table, though, since the ultimate goal is to bypass Vite / Rollup as much as possible. If it were easy to solve incremental builds in a generic way, it would have been done already.

My current sketch for an API is very straightforward from the user's perspective:

// astro.config.mjs
import { defineConfig } from 'astro/config'

export default defineConfig({
  build: { incremental: true },
  experimental: { incremental: true } // until this is stable
})

Unfortunately that's where the simplicity ends. To implement this, we'll likely need to:

  • Generate a serializable module graph that contains every possible build input. This will need to track all module relationships (hence the "graph" part).
  • Given the module graph, generate a stable checksum for each file. This will allow us to determine which parts of the graph have changed.
  • On every build, generate a new module graph and compare it to the old one. Use the checksums to determine which files need to be invalidated (changes bubble all the way up to an entry point). If any subtree of the module graph has changed shape, that also invalidates the relevant portions of the graph.
  • Pass any invalidated modules as inputs to Vite. Thankfully Astro already controls every Vite input!
  • As Vite generates new outputs for invalid parts of the graph, we can restore valid parts of the graph from our cache (likely in node_modules/.cache/astro or node_modules/.astro).
    • Ideal scenario: the entire module graph is valid, so the entire output is restored.
    • Worst case scenario: the entire module graph is invalid, so the entire output needs to be generated from scratch. This is currently what we do on every build.
  • Any invalidated modules should be removed from the existing cache.
  • Now that our output has been merged into a repaired state, execute it to prerender our .html files. (We can't skip directly to restoring the .html files because the .js output can depend on external data that we don't know about.)
  • Populate the cache with our output for next time.
  • Remove the chunks needed for prerendering from our dist folder.
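
The graph comparison in the steps above can be sketched roughly as follows. This is a minimal illustration in plain JavaScript, not Astro's implementation: `checksum`, `snapshot`, `directlyChanged`, and `invalidate` are hypothetical names, the graph shape (`{ hash, imports }` per module id) is an assumption, and FNV-1a stands in for whatever stable checksum a real cache would use.

```javascript
// Hypothetical sketch: none of these names are Astro, Vite, or Rollup APIs.

// FNV-1a over UTF-16 code units. Used only to keep the example free of
// dependencies; a real build cache would more likely use a cryptographic hash.
function checksum(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Turn { id: { source, imports } } into a serializable snapshot
// { id: { hash, imports } } that could be written to disk between builds.
function snapshot(modules) {
  const graph = {};
  for (const [id, mod] of Object.entries(modules)) {
    graph[id] = { hash: checksum(mod.source), imports: [...mod.imports].sort() };
  }
  return graph;
}

// A module is directly changed if it is new, its checksum differs, or its
// import list differs (the graph "changed shape" around it).
function directlyChanged(oldGraph, newGraph) {
  const changed = new Set();
  for (const [id, node] of Object.entries(newGraph)) {
    const prev = oldGraph[id];
    const sameShape = prev && prev.imports.join("\n") === node.imports.join("\n");
    if (!prev || prev.hash !== node.hash || !sameShape) changed.add(id);
  }
  return changed;
}

// Bubble invalidation up through importers until entry points are reached.
function invalidate(oldGraph, newGraph) {
  const importers = new Map();
  for (const [id, node] of Object.entries(newGraph)) {
    for (const dep of node.imports) {
      if (!importers.has(dep)) importers.set(dep, []);
      importers.get(dep).push(id);
    }
  }
  const invalid = directlyChanged(oldGraph, newGraph);
  const queue = [...invalid];
  while (queue.length > 0) {
    const id = queue.pop();
    for (const parent of importers.get(id) ?? []) {
      if (!invalid.has(parent)) {
        invalid.add(parent);
        queue.push(parent);
      }
    }
  }
  return invalid;
}

// Usage: editing the layout invalidates the layout and the page importing it.
const files = {
  "src/pages/index.astro": { source: "<Layout />", imports: ["src/layouts/Layout.astro"] },
  "src/pages/about.astro": { source: "<h1>About</h1>", imports: [] },
  "src/layouts/Layout.astro": { source: "<slot />", imports: [] },
};
const before = snapshot(files);
const after = snapshot({
  ...files,
  "src/layouts/Layout.astro": { source: "<main><slot /></main>", imports: [] },
});
console.log([...invalidate(before, after)].sort());
```

In this toy graph only `src/pages/about.astro` stays valid, so it is the only module whose output could be restored from the cache; everything in the returned set would be passed back to Vite as input.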

@natemoo-re
Member Author

Exciting news! I've spent the last month investigating quite a few approaches to this problem and we're ready to move forward with the first phase of our plan.

Almost immediately, we hit a major problem with the way Content Collections are currently architected. Invalidating a single article would have a waterfall effect: it would invalidate the entire collection it belonged to, so every page that referenced that collection would need to be rebuilt. We were also able to verify that the size of the module graph was the single biggest contributor to extremely slow builds. This is not particularly surprising, as module graphs have long been identified as the main bottleneck for JS build tools, but it's nice to have confirmation that this holds true for Astro.

Our first step towards incremental builds will be an internal refactor to the way that Content Collections are generated. Instead of treating Content Collection entries as part of the larger module graph, Astro will treat them as individual entrypoints for a separate build process. Not only does this drastically reduce the size of the main module graph, it should allow us to detect and rebuild only the Content Collection entries that change between builds.

Note

To begin, this refactor will only benefit users that make heavy use of Content Collections. We hope to use this effort to develop internal patterns and primitives that will inform later incremental build improvements. Stay tuned!

@natemoo-re
Member Author

Also wanted to share some diagrams that describe how we expect to break this project down.

The current build in Astro 3.x is a single bundle step with a large module graph. Referencing astro:content pulls in every module that exists inside of every collection, leading to a huge module graph that Rollup struggles to process.

[Diagram: current]

Phase One of this incremental build project will focus on refactoring Content Collections out to a self-contained build step. Instead of treating astro:content as the entrypoint, the collection items themselves are treated as the entrypoints and astro:content is regenerated after. This keeps the module graph small and efficient, while opening up an opportunity to cache the outputs for unchanged collection items. The rest of the build remains the same.

[Diagram: incremental-one]
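
As a rough illustration of the Phase One idea (each collection item is its own entrypoint and unchanged items are restored from a cache), here is a minimal JavaScript sketch. It is not how Astro implements this: `buildCollection`, `renderEntry`, and the cache shape are hypothetical, and FNV-1a is only a stand-in checksum.

```javascript
// Hypothetical sketch: `buildCollection`, `renderEntry`, and the cache shape
// are illustrative only, not Astro internals.

// FNV-1a over UTF-16 code units, used as a dependency-free stand-in checksum.
function checksum(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Build every entry independently, reusing the cached output whenever the
// entry's source has the same checksum as in the previous build.
function buildCollection(entries, cache, renderEntry) {
  const outputs = {};
  const nextCache = {};
  const rebuilt = [];
  for (const [id, source] of Object.entries(entries)) {
    const hash = checksum(source);
    const hit = cache[id];
    if (hit && hit.hash === hash) {
      nextCache[id] = hit; // unchanged: restore from cache
    } else {
      nextCache[id] = { hash, output: renderEntry(id, source) };
      rebuilt.push(id); // new or changed: rebuild this entry only
    }
    outputs[id] = nextCache[id].output;
  }
  // Entries that no longer exist are simply not copied into `nextCache`.
  return { outputs, cache: nextCache, rebuilt };
}

// Usage: a stand-in renderer and two consecutive builds.
const render = (id, source) => `<article data-id="${id}">${source}</article>`;

const first = buildCollection(
  { "blog/a.md": "# A", "blog/b.md": "# B" },
  {},
  render
);
console.log(first.rebuilt); // both entries are built on a cold cache

const second = buildCollection(
  { "blog/a.md": "# A", "blog/b.md": "# B (edited)" },
  first.cache,
  render
);
console.log(second.rebuilt); // only the edited entry is rebuilt
```

Because each entry is keyed on its own checksum, editing one article no longer invalidates its siblings, which is the waterfall effect described earlier in the thread. An index such as astro:content would then be regenerated from `outputs` after this step.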

Phase Two of the incremental build project will build on top of the learnings and patterns established during Phase One. This step will focus on making the main server build more efficient and tracing exactly which pages need to be rebuilt. This will extend the benefits of incremental builds beyond the previously established Content Collections use case. Treating this as a separate phase will allow us to hone our approach before tackling the more generalized solution.

[Diagram: incremental-two]
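
One way to picture "tracing exactly which pages need to be rebuilt" is a per-page fingerprint derived from everything a page can reach in the module graph. The sketch below is an assumption about how such tracing could look, not a description of the planned Phase Two design; `reachable`, `fingerprint`, and `pagesToRebuild` are hypothetical names, and the graph is assumed to already carry a checksum per module.

```javascript
// Hypothetical sketch: these helpers are illustrative, not Astro APIs.

// Collect every module reachable from `entry`, including the entry itself.
function reachable(graph, entry) {
  const seen = new Set();
  const stack = [entry];
  while (stack.length > 0) {
    const id = stack.pop();
    if (seen.has(id) || !graph[id]) continue;
    seen.add(id);
    stack.push(...graph[id].imports);
  }
  return [...seen].sort();
}

// A page's fingerprint lists "id@hash" for everything it can reach, so it
// changes whenever any transitive dependency is edited, added, or removed.
function fingerprint(graph, page) {
  return reachable(graph, page)
    .map((id) => `${id}@${graph[id].hash}`)
    .join("|");
}

// Pages whose fingerprint differs from the previous build must be rebuilt.
function pagesToRebuild(graph, pages, previous) {
  return pages.filter((page) => previous[page] !== fingerprint(graph, page));
}

// Usage: only the page that imports the edited component is selected.
const pages = ["src/pages/index.astro", "src/pages/about.astro"];
const graphV1 = {
  "src/pages/index.astro": { hash: "a1", imports: ["src/layouts/Layout.astro", "src/components/Card.astro"] },
  "src/pages/about.astro": { hash: "b1", imports: ["src/layouts/Layout.astro"] },
  "src/layouts/Layout.astro": { hash: "c1", imports: [] },
  "src/components/Card.astro": { hash: "d1", imports: [] },
};
const previous = Object.fromEntries(pages.map((p) => [p, fingerprint(graphV1, p)]));

const graphV2 = {
  ...graphV1,
  "src/components/Card.astro": { hash: "d2", imports: [] },
};
console.log(pagesToRebuild(graphV2, pages, previous));
```

Here only `src/pages/index.astro` is selected, because `src/pages/about.astro` never reaches the edited component. This covers static imports only; as noted earlier in the thread, output can also depend on external data that the graph does not know about.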

@EyePulp

EyePulp commented Oct 12, 2023

@natemoo-re Thanks for the clarity, detail, and documentation of your approach. I'm eager for the performance improvements.

More selfishly, I'm hopeful this opens the door to selective page renders. Our use case has a lot of configuration options within a single Astro project, up to and including rendering or not rendering individual pages. We can solve it today by using the dynamic route feature, but something more explicit and declarative would be very welcome, and it feels like incremental builds might offer that.

Regardless, thanks for the work!

@natemoo-re
Member Author

Graduating to a full-fledged RFC. #763

@matthewp moved this from Stage 2: Accepted Proposals, No RFC to Stage 3: Accepted Proposals, Has RFC in Public Roadmap on Dec 12, 2023
@fparedlo

This is amazing, hope it comes in the next release!

@heyitsdoodler

Is there a branch where work on phase 2 is being conducted?

@ImBIOS

ImBIOS commented Oct 5, 2024

Wow, Phase 2 feels like sci-fi!
