Upload the prerendered JSON files into S3 #390

Merged
@danielcondemarin merged 12 commits into serverless-nextjs:master on May 14, 2020

@mertyildiran
Contributor

mertyildiran commented May 7, 2020

This PR implements the getStaticProps part of #355 for getStaticPaths with fallback: false.

There was no need to define a new cache behaviour in CloudFront for the path pattern /_next/data/*, because this cache behaviour already covers those paths with the /_next/* wildcard.

A page has to be visited before it is cached, whereas ZEIT (Vercel) Now has some sort of mechanism to always keep it in cache. @danielcondemarin, I think your #355 (comment) indicates that it's possible, but with a limit. I guess that's also why ZEIT (Vercel) Now recently introduced a limit of 24 Serverless Functions per Deployment for the Pro plan. For a website with never-changing static pages on certain paths, can we make MinTTL infinite (or at least a huge value)?

The old deployments will consume quite a lot of S3 storage space, so I think there should be a cleanup after the deployment. The build IDs are always 21 characters long, so maybe we can look for directories under /_next/data/ whose names are 21 characters long and not equal to the current build ID.
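
A minimal sketch of the cleanup idea above, assuming the S3 object keys have already been listed and look like `_next/data/<buildId>/...`. The helper name `findStaleBuildIds` and the key layout are illustrative assumptions, not part of this PR, and the 21-character length is only what was observed here, not a documented guarantee.

```typescript
// Length of a build ID as observed in this thread (an assumption, not a spec).
const BUILD_ID_LENGTH = 21;

// Hypothetical helper: returns the build IDs found under _next/data/ that
// have the expected length and differ from the current build ID.
function findStaleBuildIds(keys: string[], currentBuildId: string): string[] {
  const seen: { [buildId: string]: true } = {};
  for (const key of keys) {
    // Expect keys shaped like "_next/data/<buildId>/<page>.json".
    const match = /^\/?_next\/data\/([^/]+)\//.exec(key);
    if (!match) continue;
    const buildId = match[1];
    if (buildId.length === BUILD_ID_LENGTH && buildId !== currentBuildId) {
      seen[buildId] = true;
    }
  }
  return Object.keys(seen).sort();
}

// Example: one old build, the current build, and unrelated keys.
const exampleKeys = [
  "_next/data/abcdefghijklmnopqrstu/post/1.json",
  "_next/data/abcdefghijklmnopqrstu/post/2.json",
  "_next/data/ABCDEFGHIJKLMNOPQRSTU/post/1.json",
  "_next/static/chunks/main.js",
];
console.log(findStaleBuildIds(exampleKeys, "ABCDEFGHIJKLMNOPQRSTU"));
```

The function only selects candidates; the deletion itself is deliberately left out.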

@mertyildiran
Contributor Author

I tried to redeploy using this PR, but CloudFront somehow uses the old cache, and pages that are supposed to be updated are not updating.

@danielcondemarin
Contributor

danielcondemarin commented May 7, 2020

Thanks for looking into this @mertyildiran !

There was no need to define a new cache behaviour in CloudFront for the path pattern /_next/data/*, because this cache behaviour already covers those paths with the /_next/* wildcard.

The cache behaviour in your link is for the old version of this project. serverless-plugin is not the same as serverless-component. Components are a new project the serverless framework folks started. The cache behaviour you're looking for is this one: https://github.com/danielcondemarin/serverless-next.js/blob/master/packages/serverless-component/serverless.js#L139
And yes you're right, for supporting getStaticProps and getStaticPaths: fallback: false I think we don't need another cache behaviour 👍

For a website with never-changing static pages on certain paths, can we make MinTTL infinite (or at least a huge value)?

There is a PR that will allow configuring the cache behaviour TTLs etc.: #282

The old deployments will consume quite a lot of S3 storage space, so I think there should be a cleanup after the deployment. The build IDs are always 21 characters long, so maybe we can look for directories under /_next/data/ whose names are 21 characters long and not equal to the current build ID.

We have to be careful about this. Since serverless-next apps are replicated across the globe, we can only clean up old builds once CloudFront has fully propagated the changes, so it may not be as easy as it seems. I think it's better to handle that in a separate issue / PR.

@danielcondemarin
Contributor

One more thing, could you add a test like this one 🙏 ?

@mertyildiran
Contributor Author

mertyildiran commented May 7, 2020

@danielcondemarin OK, I've added a test.

I tried to redeploy using this PR, but CloudFront somehow uses the old cache, and pages that are supposed to be updated are not updating.

☝️ OK, I think people will have to use the serverless-cloudfront-invalidate package if they want to invalidate the cache. So I see now that it's not an issue with this PR.

There is a PR that will allow configuring the cache behaviour TTLs etc > #282

Great! 👍

We have to be careful about this. Since serverless-next apps are replicated across the globe, we can only clean up old builds once CloudFront has fully propagated the changes, so it may not be as easy as it seems. I think it's better to handle that in a separate issue / PR.

Cached pages do not come from S3; I tested that just a moment ago by deleting all of the old build folders. If the page is cached, it is served directly from the CloudFront CDN; otherwise AWS pulls it from S3. That's my understanding, though I might be wrong. Sure, we can discuss it in another issue or PR.

@danielcondemarin
Contributor

@mertyildiran Is this PR ready ✅ now?

@mertyildiran
Contributor Author

mertyildiran commented May 8, 2020

@danielcondemarin I was afraid the upload would take too long, but for ~4000 pages it got only 0.4% slower. I'm not sure what happens if we hit huge numbers, though. Is it possible to bulk upload to S3 or upload in parallel (if it's not in parallel already)?

As I said in here, ZEIT Now has some sort of mechanism to cache the pages by default. I mean even if you are the first visitor of the page, because the page is cached before-hand, you get the HTML instantly. Maybe it's related to #355 (comment) Check this website of mine which I've deployed to ZEIT Now, for example. Everything is instantaneous in that website. How can we achieve the same result with serverless-next.js? Please guide me.

@danielcondemarin
Contributor

@danielcondemarin I was afraid the upload would take too long, but for ~4000 pages it got only 0.4% slower. I'm not sure what happens if we hit huge numbers, though. Is it possible to bulk upload to S3 or upload in parallel (if it's not in parallel already)?

Only 0.4% slower is awesome! It already should be uploading to S3 in parallel. As you may have seen on top of that it tries to use S3 accelerated uploads which is much faster.
One question, how long does it take for you to upload the 4k pages?

As I said in here, ZEIT Now has some sort of mechanism to cache the pages by default. I mean even if you are the first visitor of the page, because the page is cached before-hand, you get the HTML instantly. Maybe it's related to #355 (comment) Check this website of mine which I've deployed to ZEIT Now, for example. Everything is instantaneous in that website. How can we achieve the same result with serverless-next.js? Please guide me.

AFAIK CloudFront doesn't support what you are describing. Do you have any reference ZEIT Vercel 😁 documentation I can take a look at? I couldn't find much here.

@mertyildiran
Contributor Author

@danielcondemarin actually I tried to calculate the elapsed time with two different approaches and I get 200 - 300 seconds for uploading 9k pages, on a 5 mbps upload speed connection.

With the same internet connection, deployment of 9k pages took 2265 seconds.

AFAIK CloudFront doesn't support what you are describing. Do you have any reference ZEIT Vercel 😁 documentation I can take a look at? I couldn't find much here.

I don't have any proof; I'm relying on my observations. Maybe I should visit all pages programmatically after every deployment (via a shell script, for example) and set MinTTL to a huge value (I don't know what the maximum value is, maybe you can tell me?) so that I trigger the caching.

@danielcondemarin
Contributor

@danielcondemarin actually I tried to calculate the elapsed time with two different approaches and I get 200 - 300 seconds for uploading 9k pages, on a 5 mbps upload speed connection.

Do you know how long it takes when deploying to Zeit NOW?

Uploads to S3 are async as you can see here. Not sure how else we could optimise here tbh. Since this task is IO and not CPU bound, creating a few more node processes won't make much difference.

Maybe I should visit all pages(programmatically) after every deployment(via a shell script for example) and set MinTTL to a huge value(I don't know what's the maximum value, maybe you can tell me?) therefore I trigger the caching.

You could do that but bear in mind that would only result in caching at one of CloudFront's points of presence / edge location and possibly a regional cache as well. For minimum TTL I'd recommend you set it to one year if you want it to not expire. This RFC recommends doing that.

To mark a response as "never expires," an origin server sends an
Expires date approximately one year from the time the response is
sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one
year in the future.

@mertyildiran
Contributor Author

mertyildiran commented May 8, 2020

@danielcondemarin I cannot deploy to ZEIT Now because of maximum 24 Serverless Functions per Deployment limit. 😄

There is a package called s3-batch-upload, but I'm not sure how usable it is in this scenario.

OK, I'm gonna set MinTTL to 31556952 then. But until #282 is merged, I have to do it using AWS's web UI 😄
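
For reference, 31556952 is the number of seconds in a mean Gregorian year (365.2425 days); whether the value was chosen for that reason is an assumption. The arithmetic can be checked with a short sketch, which also shows the 365-day figure for comparison:

```typescript
// 400 Gregorian years contain exactly 146,097 days, so the mean year is
// 146097 / 400 = 365.2425 days. Integer arithmetic avoids rounding issues.
const DAYS_PER_400_YEARS = 146097;
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

// Mean Gregorian year in seconds: 31,556,952
const MEAN_GREGORIAN_YEAR_SECONDS = (DAYS_PER_400_YEARS * SECONDS_PER_DAY) / 400;

// Plain 365-day year in seconds: 31,536,000
const COMMON_YEAR_SECONDS = 365 * SECONDS_PER_DAY;

console.log(MEAN_GREGORIAN_YEAR_SECONDS, COMMON_YEAR_SECONDS);
```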

@mertyildiran
Contributor Author

@danielcondemarin by the way, I'm gonna do a few more checks before marking this PR Ready.

@mertyildiran
Contributor Author

@danielcondemarin I put console.log into the getStaticProps and getStaticPaths function bodies, and whenever I visit an uncached page I see a log created by getStaticProps in CloudWatch. That means this PR is not working. What could be wrong?

@danielcondemarin
Contributor

danielcondemarin commented May 8, 2020

@danielcondemarin I put console.log into getStaticProps and getStaticPaths function bodies and whenever I visit an uncached page I see a log created by getStaticProps in the CloudWatch. That means this PR is not working. What could be wrong?

Did you set fallback: false ? getStaticProps should only be called once at build time unless fallback: true (which we haven't implemented yet).

Also, could you check if there is a .js file for the page in question? If that's the case we may need to check fallback: false to avoid server side rendering the page and instead, proxy the request to the pre-rendered HTML that lives in S3.

@mertyildiran
Contributor Author

@danielcondemarin I return fallback: false from getStaticPaths functions like:

return { paths, fallback: false };

There are .js files on my local machine in directories like .serverless_nextjs/default-lambda/pages/post/[id].js if that's the thing you are asking.

If you're talking about the existence of .js files in AWS then where should I look? There are no .js files in S3, if that's the case.

@danielcondemarin
Contributor

danielcondemarin commented May 8, 2020

There are .js files on my local machine in directories like .serverless_nextjs/default-lambda/pages/post/[id].js if that's the thing you are asking.

Right, so I have a theory of what is happening:

Because next.js outputs a .js file for fallback: false pages (I don't see why), serverless-next.js thinks it is a server-side rendered page and treats the request as such, which leads to getStaticProps getting called.
An easy way around it would be to look at the prerender-manifest.json and not copy the .js page file to the Lambda artefact if fallback: false. See EDIT at the bottom.

For example:

  "dynamicRoutes": {
    "/post/[id]": {
      "routeRegex": "...",
      "dataRoute": "/_next/data/123/post/[id].json",
      "fallback": false,
      "dataRouteRegex": "..."
    }
  }

We should copy the pre-rendered HTML page and its associated JSON props file:

.next/serverless/pages/post/
 > [id].js                
 > id-123.html      # this needs to be uploaded to S3
 > ...
 > id-123.json      # this needs to be uploaded to S3
 > ...

In summary, you'll need to change a few things:

  1. Make sure we upload the prerender-manifest.json file to the Lambda@Edge default artefact. You'll have to do that here

  2. Make sure we upload the pre-rendered HTML pages (id-123, id-456, etc.) to S3. You can do that in the s3-static-assets package which you are already familiar with. The HTML pages are listed in the routes field of the prerender-manifest.json.
    You've already done the uploading of JSON prop files.

  3. Update the default lambda@edge handler so it checks the prerender-manifest. Using the regex provided in routeRegex match against the incoming user request and forward to S3 if it matches.
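
Step 3 could be sketched roughly as follows. This is a hypothetical illustration, not the handler that was merged: the function name `resolvePrerenderedKey`, the `static-pages` key prefix, and the example regexes are assumptions, and only the manifest fields shown above are modelled.

```typescript
interface DynamicRouteEntry {
  routeRegex: string;
  dataRoute: string;
  fallback: string | false | null;
  dataRouteRegex: string;
}

interface PrerenderManifest {
  dynamicRoutes: { [route: string]: DynamicRouteEntry };
}

// Assumed S3 prefix under which the pre-rendered HTML pages are stored.
const STATIC_PAGES_PREFIX = "static-pages";

// Returns the S3 key to forward to when the request URI matches a
// fallback: false dynamic route, or null so the caller can continue to SSR.
function resolvePrerenderedKey(uri: string, manifest: PrerenderManifest): string | null {
  for (const route of Object.keys(manifest.dynamicRoutes)) {
    const entry = manifest.dynamicRoutes[route];
    if (!entry || entry.fallback !== false) continue;
    if (new RegExp(entry.routeRegex).test(uri)) {
      const path = uri.replace(/\/$/, ""); // drop a trailing slash, if any
      return `${STATIC_PAGES_PREFIX}${path}.html`;
    }
  }
  return null;
}

// Example manifest; both regexes are made up for illustration.
const exampleManifest: PrerenderManifest = {
  dynamicRoutes: {
    "/post/[id]": {
      routeRegex: "^/post/([^/]+?)(?:/)?$",
      dataRoute: "/_next/data/123/post/[id].json",
      fallback: false,
      dataRouteRegex: "^/_next/data/123/post/([^/]+?)\\.json$",
    },
  },
};

console.log(resolvePrerenderedKey("/post/id-123", exampleManifest));
console.log(resolvePrerenderedKey("/about", exampleManifest));
```

Note that the regex matches any id, including ones that were never pre-rendered; in this sketch such a request would still be forwarded to S3 and simply miss there.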

I may be missing something here, but it's the best I can do in 15 mins of writing 😄

Btw: I think I may start a disqus chat group for people to be able to join, especially contributors like yourself. That way communication happens quicker 💨

EDIT

You shouldn't actually need to skip copying the .js page file to the Lambda artefact considering that on step 3. the check for the prerender manifest can happen before attempting to SSR anything ;)

@mertyildiran
Contributor Author

@danielcondemarin thanks for the detailed guide. I can do 1 and 2, but 3 is beyond my understanding of inner workflows of serverless-next.js. Could you do 3?

@danielcondemarin
Contributor

@danielcondemarin thanks for the detailed guide. I can do 1 and 2, but 3 is beyond my understanding of inner workflows of serverless-next.js. Could you do 3?

Yes! I’m happy to. Just let me know once the prerender-manifest is being uploaded and I’ll jump on the PR

@mertyildiran
Contributor Author

mertyildiran commented May 8, 2020

@danielcondemarin I did 1 and 2, with a few notes:

The function I added is sort of a duplicate of index.ts#L52-L66 in terms of purpose, and a duplicate of index.ts#L72-L87 in terms of code. I kept index.ts#L52-L66 because it seemed to serve a somewhat different purpose.

I'm not sure what the point of uploading the JSON files is if we're gonna upload the HTML files.

I observed that the functions specified in here are executed twice. I don't know if that's the intended behaviour.

Anyway, I'm not sure whether I made any mistakes, but I'm sure you'll be able to detect and fix them. 😊

@mertyildiran marked this pull request as ready for review May 8, 2020 18:15

@danielcondemarin
Contributor

Cheers @mertyildiran . I'll take a look tomorrow!

@marcfielding1

@danielcondemarin OK, I've added a test.

I tried to redeploy using this PR, but CloudFront somehow uses the old cache, and pages that are supposed to be updated are not updating.

OK, I think people will have to use the serverless-cloudfront-invalidate package if they want to invalidate the cache. So I see now that it's not an issue with this PR.

There is a PR that will allow configuring the cache behaviour TTLs etc.: #282

Great!

We have to be careful about this. Since serverless-next apps are replicated across the globe, we can only clean up old builds once CloudFront has fully propagated the changes, so it may not be as easy as it seems. I think it's better to handle that in a separate issue / PR.

Cached pages do not come from S3; I tested that just a moment ago by deleting all of the old build folders. If the page is cached, it is served directly from the CloudFront CDN; otherwise AWS pulls it from S3. That's my understanding, though I might be wrong. Sure, we can discuss it in another issue or PR.

Hey guys, absolutely awesome work on this, by the way. I'm from a startup that is rewriting its e-commerce on next/react after a quick PoC on wordpress/woo (I feel very dirty). We're pretty much a serverless house, so this is very interesting. Quick note: don't add that CloudFront invalidate plugin; just call createInvalidation from the AWS JS CloudFront SDK, since you've already got the creds to do it!

By the way, we'll have about 4000 items this week that will need staticPaths/props implemented, so I can try and give this a whirl on our staging env if it helps?
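
The createInvalidation suggestion could look roughly like this. The sketch only builds the request parameters, so it stays free of dependencies; `buildInvalidationParams` is a hypothetical helper, and the distribution ID and caller reference below are placeholders.

```typescript
// Shape of the parameters accepted by CloudFront's createInvalidation call.
interface InvalidationParams {
  DistributionId: string;
  InvalidationBatch: {
    CallerReference: string;
    Paths: { Quantity: number; Items: string[] };
  };
}

// Hypothetical helper: normalises the paths and fills in the batch fields.
function buildInvalidationParams(
  distributionId: string,
  paths: string[],
  callerReference: string
): InvalidationParams {
  // Invalidation paths must start with a slash.
  const items = paths.map((p) => (p.charAt(0) === "/" ? p : "/" + p));
  return {
    DistributionId: distributionId,
    InvalidationBatch: {
      // Must be unique per request; a timestamp is a common choice.
      CallerReference: callerReference,
      Paths: { Quantity: items.length, Items: items },
    },
  };
}

// With the AWS SDK for JavaScript v2, the result would be passed along as:
//   new AWS.CloudFront().createInvalidation(invalidationParams).promise()
const invalidationParams = buildInvalidationParams(
  "EXAMPLEDISTRIBUTIONID", // placeholder
  ["/_next/data/*", "post/*"],
  "deploy-placeholder-1"
);
console.log(JSON.stringify(invalidationParams, null, 2));
```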

@danielcondemarin
Contributor

danielcondemarin commented May 12, 2020

@danielcondemarin I did 1 and 2, although;

I added this function sort of a duplicate of index.ts#L52-L66 in terms of purpose, and a duplicate of index.ts#L72-L87 in terms of coding. I kept index.ts#L52-L66 because it seemed like it has some different purposes.

I can see why you'd be confused between this function and index.ts#L52-L66 as they both upload HTML Static Pages. However, the pages uploaded from the prerender-manifest.json are only those that use getStaticProps and there is an associated JSON props file, whereas the other ones are what Next.js pre-rendered before getStaticProps existed and they are referenced on the .next/pages-manifest.json.

I'm not sure what's the point of uploading JSON files if we're gonna upload the HTML files.

The JSON files are for client side transitions. Click a next <Link /> that links to a page using getStaticProps and you'll see Next.js requesting the json file: GET _next/data/build-id/foo.json
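
To illustrate the URL shape, a hypothetical helper can map a page path to the data URL the client requests. The name `dataRouteFor` and the handling of the index page are assumptions made for this sketch.

```typescript
// Builds the data URL for a page path, following the
// /_next/data/<build-id>/<page>.json shape quoted above.
function dataRouteFor(buildId: string, pagePath: string): string {
  // Assumption: the root page is stored as "index".
  const normalized = pagePath === "/" ? "/index" : pagePath.replace(/\/$/, "");
  return `/_next/data/${buildId}${normalized}.json`;
}

console.log(dataRouteFor("123", "/post/id-123"));
```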

I observed that the functions specified in here are executed twice. I don't know if that's the intended behaviour.

Do you mean HTML pages uploaded in 2 places? If that's it, that's intended as I explained above.

Anyway, I'm not sure that if I made any mistakes but I'm sure that you'll be able to detect and fix my mistakes

I just had to fix the path to the HTML pages to put them in the S3 bucket static-pages/ directory and add support for the prerender-manifest to the Lambda@Edge default handler.
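
As a rough sketch of that path mapping, assuming each key of the manifest's routes field is a page path such as /post/id-123 and that the HTML file sits under .next/serverless/pages: the helper name `planHtmlUploads` and the exact S3 key layout are assumptions, not taken from the merged code.

```typescript
interface HtmlUpload {
  localPath: string; // file produced by next build
  s3Key: string; // destination key in the bucket
}

// Hypothetical helper: maps pre-rendered routes to upload instructions.
function planHtmlUploads(routes: string[]): HtmlUpload[] {
  return routes.map((route) => {
    // Assumption: the root page is stored as "index".
    const page = route === "/" ? "/index" : route;
    return {
      localPath: `.next/serverless/pages${page}.html`,
      s3Key: `static-pages${page}.html`,
    };
  });
}

console.log(planHtmlUploads(["/post/id-123", "/post/id-456"]));
```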

I've done some testing of my own using getStaticPaths & getStaticProps with fallback: false and it seems to work as expected! Do you mind doing some testing of your own, as you have quite a large number of static pages to test with? :)

@danielcondemarin
Contributor

@marcfielding1 Please do test it! That would be much appreciated. Bear in mind that this PR only works for getStaticPaths using fallback: false though 🙂

@mertyildiran
Contributor Author

mertyildiran commented May 12, 2020

@danielcondemarin is there a way to skip next build on local machine?

I think next build does not scale to a huge number of pages. Instead, maybe we can push the code directly into AWS Lambda and trigger the build with, for example, Scrapy's SitemapSpider or a Node.js script that traverses the URLs in the sitemap. That way we rely solely on CloudFront caching and get rid of the build time. I mean, I realized that we should make AWS Lambda do the build work, not our local machines. I think you describe a similar workflow with the schema in this comment of yours, am I right?

I opened this PR to implement your:

All JSON files for a given build can be found in .next/prerender-manifest.json.
serverless-next.js will lookup the files and upload them to S3 so they can be fetch-ed from the browser.

description in here. But now, I think we should follow a different strategy to reduce the build and/or upload time. Hence, this PR might become irrelevant.

Edit: I remember now, there is a build: false input option to disable the build. But is it usable in this scenario?

@danielcondemarin
Contributor

@mertyildiran You should be able to skip the build locally by using build: false. All the available inputs are here.

Also, the problem you are describing, build scalability with a large number of pages, is what getStaticPaths in combination with fallback: true was designed for. Basically, Next.js renders a fallback page whilst the static page is compiled in the background, so the build is moved to Lambda. You are right, my architecture proposal implements that as well as the simpler fallback: false case.

I don’t think this PR would be irrelevant though, it’s a good step closer to supporting the more complex workflow.

@mertyildiran
Contributor Author

@danielcondemarin yeah, I remembered build: false after I posted the comment, then edited it later on ☝️

I will adjust my workflow according to build: false and sitemap crawling, let's see how much it will reduce the deployment time.

@danielcondemarin
Contributor

@mertyildiran If you don't have any objections I'm planning on getting this PR merged.
A separate one can be created for the fallback: true workflow.

@mertyildiran
Contributor Author

mertyildiran commented May 13, 2020

@danielcondemarin sure, feel free to do anything with this PR.

@danielcondemarin
Contributor

@danielcondemarin sure, feel free to do anything with this PR.

Thanks. Out of interest did you manage to test it using build: false?

@mertyildiran
Contributor Author

mertyildiran commented May 14, 2020

@danielcondemarin what I was actually looking for is Incremental Static Regeneration, which is not implemented yet. So build: false is not a solution in my case.

@danielcondemarin merged commit 5185649 into serverless-nextjs:master May 14, 2020

@danielcondemarin
Contributor

@danielcondemarin what I was actually looking for is Incremental Static Regeneration, which is not implemented yet. So build: false is not a solution in my case.

Makes sense. Let me know if you would like to collaborate on supporting that. More than happy to work on it with you.

sclaughl pushed a commit to sclaughl/serverless-next.js that referenced this pull request Jul 16, 2020
3 participants