-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check if remote node has been downloaded already #4748
Conversation
The blog I work on has over 9,000 media objects and currently, it tries to download them all. This PR chunks them in groups of 100, but that setting can be increased.
Add Better Queue for more control over processing
update readme with new config option
number and we can look to change this in the future
Deploy preview for using-drupal ready! Built with commit 559fc0f |
Deploy preview for gatsbygram ready! Built with commit 1ef4f40 |
Deploy preview for using-drupal ready! Built with commit 1ef4f40 |
Problem here is we can't skip making request based just on filename - file could be changed on remote host but still have same name. That's why we store headers so we do request and server check headers and if nothing changed it will return "304 Not Modified" response with empty body. Real solution for this problem would be to track this in wordpress plugin with |
Interesting. It must be since our images are hosted on S3, but they all get downloaded fresh every build. I added the check for production environment, as well as the a new var to be added to make sure they get downloaded if it does change. |
S3 has etag support so it shouldn't be downloading anew every time. If there's something wrong with our algorithm there, would love your help debugging. But as @pieh explained, this PR will have false positives. |
This is branched off of my other PR: #4616
The only change is found on: https://github.com/gatsbyjs/gatsby/pull/4748/files#diff-1dde006d0131753bd3f750b01d2288f3R202
This sped up the build of a 1000 image blog by 30-60 seconds.
This checks to see if the remote node has been downloaded already by generating the file path, and running a fast
pathExists
check. If it was downloaded already and there are cachedHeaders, don't bother requesting the file again.I also added support for an env variable:
NO_CACHE
in case the image updated on the server, and new ones should be downloaded. I have no strong ties to how it's named, and would like to hear if such a flag exists already.