The EXL Inc. prerender server is a simple, configuration-free Dockerized server that prerenders single-page apps (SPAs) for search engines, bots, and other automated systems that render SPAs poorly or not at all. The server uses puppeteer (which in turn uses headless Google Chrome) to render the pages.
- Crawl and save entire site ahead of time -- this will give you significantly higher ranking due to super fast page load times
- Automatically generate a sitemap.xml file with all indexed pages. This is used by search engines to help speed up indexing
- Serve prerendered (cached) site
- Serve cached metadata (sitemap.xml)
- Serve live -- this is the most expensive option as it will run the render process each time it receives a request
- Add option to render on the fly when a request for a whitelisted page arrives but it is not found in the cache (right now the search engine would get a 404)
- Add option to continuously update the cache on a schedule (right now this is typically done by rebuilding the docker image with a cron job)
The easiest way to get started is with the prebuilt Docker image running the live server (the slowest option, but zero configuration required!):
docker run --cap-add SYS_ADMIN --rm -it -p 3000:3000 exlinc/prerender
This will run the server on port 3000 on your machine. The SYS_ADMIN capability is required for the puppeteer (Chromium) rendering process.
You can also run it locally (assuming you have Google Chrome installed):
git clone https://github.com/exlinc/prerender
cd prerender
# You must install yarn for this project, if you don't already have it, run: npm install -g yarn
yarn install
yarn start
Once the server is running, you can call it by appending the full URL of the target page to the server's address (assuming it's running on localhost:3000). For example, if you navigate to http://localhost:3000/http://example.com/ in your browser, you will get a prerendered version of that page. Replace http://example.com/ with your site's address to get the prerendered version of your own pages.
To build an image with a cache built in (for easy distribution on ECS/K8s), check out the sample-cached-docker-compose.yml file and the sample-cached-build.sh script. This sample renders all of the pages it is directed to index, based on the PR_WHITELIST_REGEX and starting at the PR_INDEX_URL, and writes them into an image. It outputs a ready-to-run image called exlinc/prerender:cached-latest (you can change the name in the script) that has all of your data ready for serving.
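Before starting a long crawl, it can help to check which URLs a candidate whitelist pattern would accept. The sketch below is only an approximation: both values are illustrative assumptions, the helper in_whitelist is hypothetical, and grep -E uses POSIX extended regular expressions, which may differ from the regex dialect the indexer applies to PR_WHITELIST_REGEX.

```shell
#!/bin/sh
# Illustrative values only; substitute your own site.
PR_INDEX_URL='https://exlskills.com/learn-en/dashboard'
PR_WHITELIST_REGEX='^https://exlskills\.com/learn-en/'

# Return 0 if the URL matches the whitelist pattern, 1 otherwise.
in_whitelist() {
  printf '%s\n' "$1" | grep -Eq "$PR_WHITELIST_REGEX"
}

# Report how each sample URL would be treated under this pattern.
for url in "$PR_INDEX_URL" 'https://exlskills.com/about' 'https://example.com/learn-en/'; do
  if in_whitelist "$url"; then
    echo "index $url"
  else
    echo "skip  $url"
  fi
done
```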
Assuming your cached image is named exlinc/prerender:cached-latest (the default), running it would look something like this:
# Port 4000 serves the cached pages
# Port 5000 serves the metadata (sitemap.xml)
docker run --rm -d -p 4000:4000 -p 5000:5000 exlinc/prerender:cached-latest
And you can test it out like this:
# Get your sitemap.xml
curl http://localhost:5000/sitemap.xml
# Get a page (must be in the cache)
curl http://localhost:4000/https://exlskills.com/learn-en/dashboard
This is an example HTTP nginx config that runs behind an SSL-terminating load balancer (like AWS ALB). However, you can easily add HTTPS to the config via Let's Encrypt's Certbot or by manually inserting and configuring the certs and listener.
It will take traffic from known bots and route that to the prerender server, while routing user traffic to your HTML resources. Requests for static files (like JS/CSS/images) bypass prerender and are served by nginx. Check out the EXLskills Web Client source for a full example+docker.
server {
    listen 80 default_server;
    server_name _;
    #charset koi8-r;
    #access_log /var/log/nginx/host.access.log main;
    root /usr/share/nginx/html;
    index /index.html;

    location /sitemap.xml {
        # NOTE: The prerender container is linked to the nginx container in docker
        proxy_pass http://prerender:5000;
    }

    location / {
        try_files $uri @botexl;
    }

    location @botexl {
        set $prerender 0;
        if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator") {
            set $prerender 1;
        }
        if ($args ~ "_escaped_fragment_") {
            set $prerender 1;
        }
        if ($http_user_agent ~ "bot-exlpre") {
            set $prerender 0;
        }
        if ($uri ~* "\.(js|css|xml|less|png|jpg|jpeg|gif|pdf|doc|txt|ico|rss|zip|mp3|rar|exe|wmv|doc|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent|ttf|woff|svg|eot)") {
            set $prerender 0;
        }
        if ($prerender = 1) {
            rewrite .* /$scheme://$host$request_uri? break;
            # NOTE: This assumes that the prerender container is linked to the nginx container in docker
            proxy_pass http://prerender:4000;
        }
        if ($prerender = 0) {
            rewrite .* /learn/index.html break;
        }
    }

    #error_page 404 /404.html;
    # redirect server error pages to the static page /50x.html
    #
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root /usr/share/nginx/html;
    }

    # proxy the PHP scripts to Apache listening on 127.0.0.1:80
    #
    #location ~ \.php$ {
    #    proxy_pass http://127.0.0.1;
    #}

    # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
    #
    #location ~ \.php$ {
    #    root html;
    #    fastcgi_pass 127.0.0.1:9000;
    #    fastcgi_index index.php;
    #    fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name;
    #    include fastcgi_params;
    #}

    # deny access to .htaccess files, if Apache's document root
    # concurs with nginx's one
    #
    location ~ /\.ht {
        deny all;
    }
}
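To sanity-check the routing rules without deploying nginx, the decision made in the @botexl location can be mirrored in a few lines of shell. This is a simplified sketch rather than the nginx logic itself: the function name should_prerender is hypothetical, and the bot and file-extension lists are abbreviated versions of the ones in the config.

```shell
#!/bin/sh
# Hypothetical helper that mirrors the order of checks in @botexl.
# Prints 1 if the request would be proxied to prerender, 0 otherwise.
should_prerender() {
  ua=$1
  uri=$2
  args=$3
  decision=0

  # Known bots are prerendered (case-insensitive, abbreviated list).
  if printf '%s\n' "$ua" | grep -qiE 'googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|slackbot'; then
    decision=1
  fi

  # Requests carrying _escaped_fragment_ in the query string are prerendered.
  case $args in *_escaped_fragment_*) decision=1 ;; esac

  # The bot-exlpre user agent is never prerendered (case-sensitive match).
  case $ua in *bot-exlpre*) decision=0 ;; esac

  # Static files bypass prerender (abbreviated extension list).
  if printf '%s\n' "$uri" | grep -qiE '\.(js|css|xml|png|jpg|jpeg|gif|ico|svg|woff|ttf)'; then
    decision=0
  fi

  echo "$decision"
}

should_prerender 'Mozilla/5.0 (compatible; Googlebot/2.1)' /learn-en/dashboard ''
should_prerender 'Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0' /learn-en/dashboard ''
```

Once nginx is deployed, a request such as curl -A 'Googlebot' against one of your cached pages is a quick way to confirm that bot traffic actually reaches the prerender container.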
We use this project in production; however, keep in mind that it is in its early stages and may not support all edge cases.
When using an option that renders in real time, allocate at least 1 vCPU and 1 GB to 1.5 GB of RAM per prerender container; otherwise Chrome is unlikely to have enough resources to render your pages properly (assuming a realistic ReactJS/AngularJS/VueJS application that loads resources and renders a substantial UI). The containers may also crash due to Chrome errors or load, so we recommend running them with some form of automatic restart (such as on AWS ECS or Kubernetes).
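For a single Docker host without an orchestrator, these recommendations can be approximated with Docker's own resource and restart flags. The sketch below only prints the command so it can be reviewed first; the function name prerender_run_cmd is hypothetical, and the limits are simply the figures suggested above, not tuned values.

```shell
#!/bin/sh
# Hypothetical helper: print a docker run command for the live server with
# CPU/memory limits and automatic restart on failure.
# $1 = CPU limit (default 1), $2 = memory limit (default 1536m, i.e. 1.5 GB).
prerender_run_cmd() {
  # --rm is omitted on purpose: Docker rejects it together with --restart.
  echo docker run -d --cap-add SYS_ADMIN \
    --cpus "${1:-1}" --memory "${2:-1536m}" \
    --restart on-failure -p 3000:3000 exlinc/prerender
}

# Review the command, then run it with: eval "$(prerender_run_cmd)"
prerender_run_cmd
```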