Prerender workers (#1398)
* spike lambda renderer

* working on node errors

* i think this is in a working state now

* Added prerender manager script and copied the lambda script as the worker script

* Added worker code; Handle the sitemap as well

* Removed lambda-specific stuff

* Restored the original prerender script

* Commented out sitemap code and restored code that waits 1 minute and checks that the work queue message attributes are 0

* Fixed lint and typecheck errors; Actually delete processed messages from the queue

* Reverted change to script/entry.js

* Fixed some mistakes in the manage script; Added more logging

* Repurposed Tom's lambda Dockerfile
Fixed a bunch of errors in the prerendering script

* Fixed crash when trying to get messages from an empty queue

* Removed local proxy; Fixed lint errors

* Made the saveS3Page function async

* Reduced S3 concurrency to prevent errors and fixed Cluster deletion

* Use c6i instances instead of t2

* Reverted credentials code

* Use node worker_threads in the work script so a single instance of the script can use multiple cores and avoid network errors due to CPU contention

* Fixed errors in the threading code

* Send a single page to each thread rather than sets of 10

* Since we wait for the threads on each loop iteration to send the delete messages, there will always be threads available when we get 10 new messages, so simplify the logic

* Use asyncPool for prepareBooks
Use a promise for the work queues rather than polling

* Handle different message types
Queue and work sitemap jobs

* Write sitemaps to S3 instead of locally

* Upload sitemaps with text/xml content type

* Removed stats that don't work with distributed jobs

* Don't block the queuing thread waiting for all work threads to complete each batch of messages

* Removed more unused stats

* Create a separate concourse dir for prerender-workers

* Exclude already-uploaded files from verify-production-urls

* Replaced echo with print to fix shellcheck issue

* Fixed lint error that is only caught on CI for some reason

* Improved some comments

* Removed concourse/prerender-workers, to be re-added on the next PR

* Reverted verify-production-urls

* Removed unused Server export

* Added a comment explaining the sqsHeartbeat function

* Converted the task switch statement to a mapping of task functions
Replaced function factories with the use of function.bind()

* Replaced some more undefined checks with assertDefined

* Combined prerender-workers code with old prerender code without modifying old code behavior

* Combined the sitemap functions some more

* Combined the old and new page rendering code as much as possible

* Removed duplicated comment

* Use more threads for bigger instances

* Replaced CodeVersion and ReleaseIdSuffix parameters with ReleaseId and SanitizedReleaseId

* Updated some comments

* Added a random BUILD_ID suffix, used to keep stack and resource names unique

* Changed the Application tags to Rex Prerender

* Moved content route stuff (sitemap etc) to a separate file
Rewrote some of the types using existing generic types

* Removed unnecessary assertDefineds in getArchivePage() and getArchiveBook()

* Changed most contentPages functions to work with SerializedPageMatch instead of PageMatch
renderAndSavePage() now calls deserializePageMatch() to add { route: content } to its match argument

* Removed commented-out RedriveAllowPolicy

* Decomposed writeS3File function and replaced bind() calls

* Moved import './setup' to contentPages

* Split renderManifest() into its own file

* Format the ISO date using date-fns-tz instead of toISOString()

* Use lodash/fp/chunk instead of a manual loop to chunk pages

* Unnest page task and sitemapIndex task payloads

* More typechecks for message payloads

* Removed Loadable.preloadAll() from manage.ts and moved loader initialization to the beginning of the file

* Removed commented-out log messages

* Improved comments about checking that the SQS queue is empty

* Rewrote most of manage.ts to make the logic easier to follow

* Rewrote findOutputValue() for clarity

* manage() pieces now pass some arguments around; got rid of all non-const globals

* Fixed undefined variable error

* Take BuildId from the stack name rather than passing it in as a parameter

* Pass the PrerenderImageTag to the workers stack rather than a SanitizedReleaseId

* Re-added CodeVersion parameter, required for the workers to run

* Increased workers stack create/delete timeout

* Don't set execArgv for worker_threads
Add a global hook to handle all unhandledRejections and exit
Split some of thread.ts into smaller functions

* Removed max-old-space-size and added vm.max_map_count instead

* Added tests for assertObject()

* Removed remaining commented-out console.log()

* Make savePage return Promise<unknown> and make savePageAsset async

* Use lodash's once function to initialize the S3Client

* Make saveFile return Promise<unknown> and make renderSitemap* functions use an async version of writeAssetFile

* Integrate prerender workers with Concourse (#1408)

* Readded concourse/prerender-workers

* Renamed prerender-workers/* to prerender/manage.*

* Renamed prerender to prerender:local and prerender:manage to prerender:fleet

* Renamed prerender target to release and made it the default
Removed entrypoint and cmd from release target

* Use a prerender-mode parameter to switch prerendering modes
No longer need to run yarn install and yarn build on the rex-web image and cache node_modules

* Removed -prerender suffix from Rex image tag

* Fixed lint errors

* Downcased ci target; Build release target from utils instead of CI

* Re-added removed calls to yarn install and yarn build

* Allow execute on concourse/prerender/script.bash

* Added AWS secrets and BUCKET_NAME to prerender task
Read IMAGE_TAG from build-config

* Re-added -prerender- to the prerender stack name

* Give the large Docker image another try

* Run prerender from the image, not from the repo

* Also set the correct paths for update-content

* Build Docker image with build-configs as build args
Use PUBLIC_URL during yarn build and store it in the image

* Might as well store all build configs in the image

* Try the other Docker env variable format to see if it handles quotes better

* Create unquoted config file to use with oci-build-task

* Increased the prerender stack creation timeout

* Increased the workers stack creation/deletion timeout to 5 minutes

* Revert "update content (#1468)"

This reverts commit 31c7350.

* Pass the document to wrapSolutions

* Fixed the redirect for Fisica Universitaria

* Handle unicode URIs in update-redirects-data

* Also call decodeURI in prepareRedirects

* Add a maximum number of SQS heartbeats
Make failed messages immediately visible to other workers

* Reorganized and DRYed up code that changes ReceiptHandle visibility timeout

Co-authored-by: staxly[bot] <35789409+staxly[bot]@users.noreply.github.com>
Co-authored-by: Thomas Woodward <[email protected]>

Co-authored-by: tom <[email protected]>
Co-authored-by: staxly[bot] <35789409+staxly[bot]@users.noreply.github.com>
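
As a rough illustration of the "mapping of task functions" change listed in the message above, a worker can look up a handler by message type instead of branching in a switch statement. This is a minimal sketch and not the code from this commit: the type names page, sitemap, and sitemapIndex follow the job kinds mentioned in the message, while the handler bodies, the payload fields, and runTask itself are hypothetical.

```javascript
// Hypothetical handlers; the real ones would render content and upload it to S3.
const pageTask = (payload) => `rendered ${payload.url}`;
const sitemapTask = (payload) => `sitemap for ${payload.slug}`;
const sitemapIndexTask = (payload) => `sitemap index with ${payload.slugs.length} entries`;

// One entry per message type replaces one case per message type.
const TASKS = {
  page: pageTask,
  sitemap: sitemapTask,
  sitemapIndex: sitemapIndexTask,
};

// Dispatch a parsed queue message of the assumed shape { type, payload }.
const runTask = ({ type, payload }) => {
  // hasOwnProperty guards against inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(TASKS, type)) {
    throw new Error(`Unknown task type: ${type}`);
  }
  return TASKS[type](payload);
};

console.log(runTask({ type: 'page', payload: { url: '/books/example/pages/1' } }));
```

Adding a new job kind under this layout only needs a new entry in TASKS, which is one plausible reason to prefer it over a growing switch.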
3 people authored May 24, 2022
1 parent 8977214 commit 920b1f4
Showing 32 changed files with 2,704 additions and 151 deletions.
89 changes: 84 additions & 5 deletions .dockerignore
@@ -1,6 +1,85 @@
# Copied from .gitignore

# dependencies
node_modules
Dockerfile
.dockerignore
.git
node_modules
coverage
/.pnp
.pnp.js

# testing
/coverage
/src/test/fixtures/specials/
.eslintcache

# production
/build

# misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local
/src/react-app-env.d.ts

__diff_output__

npm-debug.log*
yarn-debug.log*
yarn-error.log*

# lighthouse report
/localhost_*.report.html

# Byte-compiled / optimized / DLL files
__pycache__/
*$py.class

# C extensions
*.so

# Python Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Pytest cache
.pytest_cache

# Local Virtual Environment
.venv/

# pytest report
ui-test.html

# generated code
src/content.css
src/content.css.map
open-search-client

# development environment.json
public/rex/environment.json

# pre-rendering cache folder
/cache
36 changes: 32 additions & 4 deletions Dockerfile
@@ -1,5 +1,5 @@
# this dockerfile is not for production, its for QA and CI
FROM debian:buster as utils
FROM debian:buster AS utils

# general utils
RUN apt-get update && apt-get install -y \
@@ -33,11 +33,12 @@ RUN curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.3/install.sh | b
NVM_DIR="$HOME/.nvm" && . $HOME/.nvm/nvm.sh && \
cd && nvm install && \
npm install -g yarn && \
ln -s $(dirname $(which node)) /usr/local/node-bin
mv $(dirname $(dirname $(which node))) /usr/local/node && \
rm -r "$NVM_DIR"

ENV PATH /usr/local/node-bin:$PATH
ENV PATH /usr/local/node/bin:$PATH

from utils as CI
FROM utils AS ci

# shellcheck (apt version is very old)
# includes crazy hack around some linking issue from https://github.com/koalaman/shellcheck/issues/1053#issuecomment-357816927
@@ -94,3 +95,30 @@ RUN apt-get update && apt-get install -y \
&& rm -rf /var/lib/apt/lists/*

FROM utils AS release

# Docker trickery so we can reuse the yarn install layer until package.json or yarn.lock change
COPY package.json yarn.lock /code/
WORKDIR /code
RUN yarn install

COPY . /code

ARG BOOKS
ENV BOOKS=${BOOKS}

ARG IMAGE_TAG
ENV IMAGE_TAG=${IMAGE_TAG}

ARG PUBLIC_URL
ENV PUBLIC_URL=${PUBLIC_URL}

ARG REACT_APP_CODE_VERSION
ENV REACT_APP_CODE_VERSION=${REACT_APP_CODE_VERSION}

ARG REACT_APP_RELEASE_ID
ENV REACT_APP_RELEASE_ID=${REACT_APP_RELEASE_ID}

ARG REACT_APP_ENV
ENV REACT_APP_ENV=${REACT_APP_ENV}

RUN yarn build:clean
4 changes: 3 additions & 1 deletion concourse/build-image/task.yml
@@ -10,11 +10,13 @@ image_resource:
tag: master

params:
BUILD_ARGS_FILE: build-configs/unquoted-config.env
CONTEXT: rex-web
UNPACK_ROOTFS: true

inputs:
- name: build-configs
- name: rex-web
path: .

outputs:
- name: image
8 changes: 8 additions & 0 deletions concourse/create-build-configs/script.js
@@ -6,6 +6,7 @@ const crypto = require('crypto');
const versionFile = 'rex-web/.git/ref';
const booksFile = 'rex-web/src/config.books.json';
const envFile = 'build-configs/config.env';
const unquotedEnvFile = 'build-configs/unquoted-config.env';
const commitFile = 'build-configs/commit.txt';
const releaseFile = 'build-configs/release-id.txt';
const imageFile = 'build-configs/image-tag.txt';
@@ -52,6 +53,13 @@ Promise.all([
handleErr
);

console.log('Generating unquoted env file...');
fs.writeFile(
unquotedEnvFile,
Object.entries(args).reduce((result, [key, value]) => [...result, `${key}=${value}`], []).join("\n"),
handleErr
);

console.log(`Generating commit file with: ${commit}`);
fs.writeFile(commitFile, commit, handleErr);

4 changes: 2 additions & 2 deletions concourse/create-build-configs/task.yml
@@ -9,9 +9,9 @@ image_resource:
repository: node

inputs:
- name: rex-web
- name: rex-web
outputs:
- name: build-configs

run:
path: rex-web/concourse/create-build-configs/script.js
path: rex-web/concourse/create-build-configs/script.js
8 changes: 3 additions & 5 deletions concourse/prerender/script.bash
@@ -1,6 +1,6 @@
#!/bin/bash

set -ex
set -euxo pipefail

destination=$(pwd)/release

@@ -9,10 +9,8 @@ source build-configs/config.env
# shellcheck disable=SC2046
export $(cut -d= -f1 build-configs/config.env)

cd rex-web
cd /code

yarn install
yarn build:clean
yarn prerender
yarn "prerender:$PRERENDER_MODE"

cp -r build/* "$destination"
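
The script above derives the yarn script name from PRERENDER_MODE, so the value has to match one of the prerender:* entries that this commit adds to package.json. The sketch below mirrors that lookup in JavaScript purely as an illustration; resolvePrerenderScript is a hypothetical helper, and the explicit validation is an addition that the bash script does not perform (there, an unknown mode would simply make yarn fail).

```javascript
// Script names defined in package.json by this commit.
const PRERENDER_SCRIPTS = ['prerender:local', 'prerender:fleet', 'prerender:work'];

// Build the script name the same way the bash script does: "prerender:" + mode.
const resolvePrerenderScript = (mode) => {
  const script = `prerender:${mode}`;
  if (!PRERENDER_SCRIPTS.includes(script)) {
    throw new Error(`Unsupported PRERENDER_MODE: ${mode}`);
  }
  return script;
};

console.log(resolvePrerenderScript('fleet'));
```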
10 changes: 6 additions & 4 deletions concourse/prerender/task.yml
@@ -2,13 +2,15 @@
platform: linux

inputs:
- name: rex-web
- name: build-configs
outputs:
- name: release

caches:
- path: rex-web/node_modules
params:
AWS_ACCESS_KEY_ID: ((prod-aws-access-key-id))
AWS_SECRET_ACCESS_KEY: ((prod-aws-secret-access-key))
BUCKET_NAME: ((prod-unified-s3-bucket))
PRERENDER_MODE: ((prerender-mode))

run:
path: rex-web/concourse/prerender/script.bash
path: /code/concourse/prerender/script.bash
4 changes: 1 addition & 3 deletions concourse/update-content/script.bash
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

cd rex-web
cd /code

if [ "$GITHUB_USERNAME" != "" ] && [ "$GITHUB_PASSWORD" != "" ]; then
git config --global user.email "$GITHUB_USERNAME"
@@ -11,8 +11,6 @@ fi
# this is here so the creds don't get pasted to the output
set -e; if [ -n "$DEBUG" ]; then set -x; fi

yarn

approved_books_default_branch=$(curl -s https://api.github.com/repos/openstax/content-manager-approved-books | jq -r .default_branch)
rex_default_branch=$(curl -s https://api.github.com/repos/openstax/rex-web | jq -r .default_branch)

8 changes: 1 addition & 7 deletions concourse/update-content/task.yml
@@ -1,16 +1,10 @@
---
platform: linux

inputs:
- name: rex-web

params:
GITHUB_ACCESS_TOKEN: ((github-access-token))
GITHUB_USERNAME: ((github-username))
GITHUB_PASSWORD: ((github-token))

caches:
- path: rex-web/node_modules

run:
path: rex-web/concourse/update-content/script.bash
path: /code/concourse/update-content/script.bash
17 changes: 12 additions & 5 deletions package.json
@@ -57,13 +57,15 @@
"lint:bash": "shellcheck $(find . -type f \\( -iname '*\\.sh' -or -iname '*\\.bash' \\) | grep -v 'node_modules' )",
"prestart": "npm run-script build:css",
"start": "HTTPS=${HTTPS:-true} craco start",
"start:static": "export REACT_APP_ENV=${REACT_APP_ENV:-test} && npm run-script build && npm run-script prerender && npm run-script server",
"start:static": "export REACT_APP_ENV=${REACT_APP_ENV:-test} && npm run-script build && npm run-script prerender:local && npm run-script server",
"clean": "rm -rf ./build",
"build": "npm run-script build:css && npm run-script build:js",
"build:css": "lessc --source-map --source-map-include-source ./generic-styles/index.less ./src/content.css",
"build:js": "export REACT_APP_ENV=${REACT_APP_ENV:-production} && GENERATE_SOURCEMAP=${GENERATE_SOURCEMAP:-true} craco build",
"build:clean": "npm run-script clean && npm run-script build",
"prerender": "REACT_APP_ENV=${REACT_APP_ENV:-production} node ./script/entry prerender/index",
"prerender:local": "REACT_APP_ENV=${REACT_APP_ENV:-production} node ./script/entry prerender/local",
"prerender:fleet": "REACT_APP_ENV=${REACT_APP_ENV:-production} node ./script/entry prerender/fleet",
"prerender:work": "REACT_APP_ENV=${REACT_APP_ENV:-production} node ./script/entry prerender/work",
"server": "REACT_APP_ENV=${REACT_APP_ENV:-development} node ./script/entry server/cli",
"codecov": "codecov -f ./coverage/lcov.info",
"coverage-report": "open coverage/index.html",
@@ -78,7 +80,7 @@
"test:build": "REACT_APP_ENV=test npm run-script build:clean && npm run-script test:sourcemap && npm run-script test:prerender",
"test:sourcemap": "REACT_APP_ENV=test node ./script/verifySourcemaps.js",
"test:prerender": "npm run-script test:prerender:setup && npm run-script test:prerender:specs && npm run-script test:prerender:browser",
"test:prerender:setup": "REACT_APP_ENV=test npm run-script prerender && npm run-script pretest:sims",
"test:prerender:setup": "REACT_APP_ENV=test npm run-script prerender:local && npm run-script pretest:sims",
"test:prerender:specs": "REACT_APP_ENV=test SERVER_MODE=built jest --testPathPattern=\"(\\.|/)prerenderspec\\.tsx?\" --config jest-puppeteer.config.json",
"test:prerender:browser": "REACT_APP_ENV=test SERVER_MODE=built jest --testPathPattern=\"(\\.|/)browserspec\\.tsx?\" --config jest-puppeteer.config.json -i",
"test:prerender:screenshots": "REACT_APP_ENV=test SERVER_MODE=built jest --testPathPattern=\"(\\.|/)screenshotspec\\.tsx?\" --config jest-puppeteer.config.json",
@@ -97,6 +99,10 @@
"not op_mini all"
],
"devDependencies": {
"@aws-sdk/client-cloudformation": "^3.43.0",
"@aws-sdk/client-s3": "^3.44.0",
"@aws-sdk/client-sqs": "^3.43.0",
"@aws-sdk/credential-providers": "^3.45.0",
"@babel/core": "^7.0.0-0",
"@babel/plugin-proposal-class-properties": "^7.1.0",
"@babel/plugin-proposal-object-rest-spread": "^7.0.0",
@@ -118,7 +124,7 @@
"@types/js-cookie": "^2.2.2",
"@types/lodash": "^4.14.136",
"@types/md5-file": "^4.0.0",
"@types/node": "^12.0.10",
"@types/node": "^14.0.0",
"@types/node-fetch": "^2.1.4",
"@types/pretty": "^2.0.0",
"@types/progress": "^2.0.3",
@@ -140,7 +146,8 @@
"babel-core": "7.0.0-bridge.0",
"babel-plugin-transform-dynamic-import": "^2.1.0",
"codecov": "^3.8.1",
"date-fns": "^1.30.1",
"date-fns": "^2.28.0",
"date-fns-tz": "^1.2.2",
"express": "^4.16.4",
"glob": "^7.1.3",
"http-proxy-middleware": "^0.19.0",