A serverless (AWS Lambda) broken link checker for catching 403/404/500 responses on websites.
This is a Python serverless project that creates an AWS Lambda function running as a daily cron job. The goal is to crawl a website and check that all links are valid (i.e. no 403s, 404s, 500s or 501s).
It scrapes your website using the scrapy Python library. The crawler follows all internal links on your website; external URLs are checked but not followed. After the crawler has finished, it sends an email report using Mailgun's REST API.
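
As a rough sketch of the crawling approach (the class and attribute names below are illustrative, not the project's actual code), a scrapy spider can follow internal links while only status-checking external ones:

```python
import os
from urllib.parse import urlparse

import scrapy
from scrapy.http import HtmlResponse


class LinkCheckSpider(scrapy.Spider):
    name = "linkcheck"
    # Hand these error statuses to our callbacks instead of dropping
    # the responses, so they can be collected for the report.
    handle_httpstatus_list = [403, 404, 500, 501]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.start_urls = [os.environ["URL"]]
        self.domain = urlparse(self.start_urls[0]).netloc
        self.broken = []  # (url, status, referer) tuples for the report

    def record_if_broken(self, response):
        if response.status in self.handle_httpstatus_list:
            referer = response.request.headers.get("Referer")
            self.broken.append((response.url, response.status, referer))
            return True
        return False

    def parse(self, response):
        if self.record_if_broken(response):
            return
        if not isinstance(response, HtmlResponse):
            return  # only HTML pages contain links to extract
        for href in response.css("a::attr(href)").extract():
            url = response.urljoin(href)
            if urlparse(url).scheme not in ("http", "https"):
                continue  # skip mailto:, tel:, etc.
            if urlparse(url).netloc == self.domain:
                # Internal link: check it and keep crawling from it.
                yield scrapy.Request(url, callback=self.parse)
            else:
                # External link: check its status but do not follow it.
                yield scrapy.Request(url, callback=self.check_external)

    def check_external(self, response):
        self.record_if_broken(response)
```

In the Lambda handler, a spider like this would typically be driven with scrapy's CrawlerProcess, with the collected results then passed to the email step.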
It shouldn't be too hard to convert this to use another cloud provider.
You will need:

- An AWS account (see the serverless quick start)
- Mailgun account
- Node.js (tested with v6.11.4)
- Serverless (tested with 1.24.1)
- Python3 (tested with 3.6.2)
Install the dependencies and the required serverless plugin:

npm install
serverless plugin install -n serverless-python-requirements
You will need to export the required settings and secrets.
export MAILGUN_API_KEY=key-xxxx MAILGUN_DOMAIN_NAME=example.com EMAIL=you@example.com URL=https://example.com
Then deploy:

serverless deploy
The function is scheduled to run every 24 hours, but you can also run it manually with:
serverless invoke -f cron
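
For reference, the scheduling and environment wiring in serverless.yml look roughly like the sketch below; the service name and handler path are assumptions, so check the repository's actual serverless.yml:

```yaml
service: broken-link-checker   # illustrative name

provider:
  name: aws
  runtime: python3.6
  environment:
    MAILGUN_API_KEY: ${env:MAILGUN_API_KEY}
    MAILGUN_DOMAIN_NAME: ${env:MAILGUN_DOMAIN_NAME}
    EMAIL: ${env:EMAIL}
    URL: ${env:URL}

functions:
  cron:
    handler: handler.main      # assumed handler module/function
    events:
      - schedule: rate(24 hours)

plugins:
  - serverless-python-requirements
```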
The following environment variables are used by the function. You must set them before you run serverless deploy.
- MAILGUN_API_KEY: Your Mailgun API key
- MAILGUN_DOMAIN_NAME: Your Mailgun domain name
- EMAIL: The email address you want to send the report to
- URL: The URL you want to check (in the format https://example.com/)
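
To illustrate how these variables come together, here is a minimal sketch of sending the report through Mailgun's REST API (the subject line and body format are assumptions, not the project's exact output):

```python
import os

import requests


def send_report(broken_links):
    """Email the crawl results; broken_links is an iterable of
    (url, status, referer) tuples like those collected by the spider."""
    lines = [
        "{} {} (linked from {})".format(status, url, referer)
        for url, status, referer in broken_links
    ]
    body = "\n".join(lines) or "No broken links found."
    domain = os.environ["MAILGUN_DOMAIN_NAME"]
    resp = requests.post(
        "https://api.mailgun.net/v3/{}/messages".format(domain),
        auth=("api", os.environ["MAILGUN_API_KEY"]),  # Mailgun uses HTTP basic auth
        data={
            "from": "Link checker <mailgun@{}>".format(domain),
            "to": os.environ["EMAIL"],
            "subject": "Broken link report for {}".format(os.environ["URL"]),
            "text": body,
        },
    )
    resp.raise_for_status()
```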