Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gatsby): selective cache invalidation #8379

Closed
wants to merge 112 commits into from

Conversation

DSchau
Copy link
Contributor

@DSchau DSchau commented Sep 20, 2018

Fixes #8324

This PR introduces a cache removal mechanism where we don't remove the cached results from createRemoteFileNode

To-do

  1. Add an integration test validating the functionality
  2. Add more unit tests validating the functionality
  3. Test with real-world projects (e.g. gatsby-image, reactjs.org, gatsbyjs.org, etc.)

Copy link
Contributor

@KyleAMathews KyleAMathews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

One other wrinkle however is that createRemoteFileNode uses the cache api to store information about the files it's downloading and to decide if it needs to download them again.

const cachedHeaders = await cache.get(cacheId(url))

Without the cached headers, it'll just download the file again.

packages/gatsby/src/bootstrap/remove-cache.js Outdated Show resolved Hide resolved
@DSchau
Copy link
Contributor Author

DSchau commented Sep 20, 2018

@KyleAMathews for the cache, you mean the json file at .cache/cache/db.json correct? Just gotta add that to the default exclusion!

@DSchau DSchau changed the title WIP: implement smarter cache deletion algorithm implement smarter cache deletion algorithm Sep 20, 2018
@KyleAMathews
Copy link
Contributor

So more trickiness :-)

gatsby-transformer-remark also caches rendered markdown in .cache/cache :-)

We should probably change the cache key it uses to include the version of remark

@KyleAMathews
Copy link
Contributor

perhaps transformer-remark too

@DSchau
Copy link
Contributor Author

DSchau commented Sep 21, 2018

@KyleAMathews so to fully fix this problem, it'd require a PR to change where the Markdown nodes get cached? Because with this solution we'd be caching them unnecessarily long when we don't want to?

I'll get that fixed up today. This thing spiraled in a new direction, hah 😅

@DSchau DSchau changed the title implement smarter cache deletion algorithm feat(gatsby): implement improved cache purging mechanism Oct 3, 2018
@DSchau
Copy link
Contributor Author

DSchau commented Oct 3, 2018

@KyleAMathews @pieh so now we have plugin level cache at .cache/caches. Do you think there's any harm in always keeping this around, or should we treat .cache/caches/gatsby-transformer-remark and .cache/caches/gatsby-source-filesystem as special?

@DSchau DSchau requested a review from a team as a code owner October 3, 2018 19:05
@DSchau
Copy link
Contributor Author

DSchau commented Oct 3, 2018

@KyleAMathews @pieh I checked this on ReactJS.org with the following strategy (I'll need to try on something that pulls remote file nodes next):

  1. Link to this PR with gatsby-dev
  2. Remove cache and run yarn build
  3. Validate that .cache/caches exists/was created
  4. Change gatsby-config.js and run yarn build
  5. Validate .cache/caches still exists, was not deleted
  6. Change gatsby-config.js and update a Markdown file
  7. Run yarn build
  8. Validate new md change shows up 🎉

@DSchau
Copy link
Contributor Author

DSchau commented Oct 5, 2018

@KyleAMathews @pieh I think this will work, but we should add one additional check. If a plugin has changed, we should wipe out its individual cache. I'll try and work on that soon.

@DSchau DSchau requested review from a team October 11, 2018 21:23
@DSchau DSchau requested a review from a team as a code owner October 11, 2018 21:23
@DSchau
Copy link
Contributor Author

DSchau commented Oct 11, 2018

Wacky merge here, I'll fix up. And it's fixed. Nothing a force push won't solve 😅

@DSchau DSchau force-pushed the gatsby/fast-cache branch 2 times, most recently from 76a408a to 020d160 Compare October 11, 2018 21:40
@DSchau
Copy link
Contributor Author

DSchau commented Oct 12, 2018

Looks like the e2e tests actually caught an issue 🎉 I'll address the failing path prefix tests tomorrow. Fixed

@DSchau
Copy link
Contributor Author

DSchau commented Oct 15, 2018

Fixing these failing tests now 😅

Copy link
Contributor

@wardpeet wardpeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/gatsbyjs/gatsby/blob/2532319081562a02f81ff170a2105012fb0d654d/packages/gatsby-source-filesystem/src/create-remote-file-node.js#L200-L201
Wouldn't cache.directory work here too? Maybe it's not accessible but than we don't have to do CACHE_DIR & FS_PLUGIND_DIR.

wardpeet
wardpeet previously approved these changes Jan 10, 2019
Copy link
Contributor

@wardpeet wardpeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks alright to me.

@jpalmieri
Copy link

Looking forward to this. Any idea when it will be released?

@DSchau
Copy link
Contributor Author

DSchau commented Feb 26, 2019

@jpalmieri great question! I think this would be a pretty impactful PR, but there are some issues that we need to resolve first. I'll be working on a few, other PRs this week--but maybe this is something I can revisit early next week?

I'll make a note of it and plan to revisit.

Also you are more than welcome to pick this up! I believe it primarily needs some more testing, and I think there may be some weirdness with Redux state and caching nodes, as well. More testing would likely surface some of these issues!

@DSchau DSchau changed the title feat(gatsby): implement improved cache purging mechanism [WIP]: feat(gatsby): implement improved cache purging mechanism May 3, 2019
Copy link
Contributor

@muescha muescha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure, is some mistake about getPluginHash?

the same function names nearby can be mixt up easy i think

1) packages/gatsby/src/bootstrap/cache/plugin-hash.js

signature:

getPluginHash({ plugins, existing })

2) packages/gatsby/src/bootstrap/get-plugin-hash.js

signature:

getPluginHash({
  additional = [],
  directory,
  plugins,
  existing,
}) {

@pieh
Copy link
Contributor

pieh commented Mar 5, 2020

Thanks @muescha! Yeah, confusing names will get sorted out (eventually). But I'd like just to do some comments on current shape of the PR and the next steps. Due to some issues and lack of APIs in v2 - we will not be able to enable this by default in v2, so priority of shipping the feature was decreased.

But we still have valuable things here to get in, so what I'm doing right now is updating new integration tests to be in shape to get merged into master without any other code changes (so will be ripping just integration_tests_cache_resilience from this PR once it's finished - there are some quirks to workout after merging master in yesterday. Here's outline of what we can split from this PR and ship individually #8379 (comment)

And that's why for now at least the code part will not be touched too much, but yeah - the confusing naming could be cleaned up, so thanks for inline comments - it will help me with keeping this in mind one I get back to actual source code part.

@pvdz pvdz marked this pull request as draft April 9, 2020 08:02
@pvdz pvdz removed the status: WIP label Apr 9, 2020
@pvdz
Copy link
Contributor

pvdz commented Apr 9, 2020

(I've converted this PR into a Draft PR. Although it's been two years, perhaps we should close it and open a new one? Surely this is bit-rotten beyond repair?)

@pieh
Copy link
Contributor

pieh commented Apr 14, 2020

(I've converted this PR into a Draft PR. Although it's been two years, perhaps we should close it and open a new one? Surely this is bit-rotten beyond repair?)

This one gets updates periodically - it doesn't strictly need to be open, but is also not 2 years out of date

@pragmaticpat
Copy link
Contributor

Closing this issue due to age, and has been superseded by #28331

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change If implemented, this proposed work would break functionality for older versions of Gatsby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't clear downloaded files from createRemoteFileNode calls when we clear cache on package/code changes