This document contains tips and tricks for working with Perma.
See the installation documentation to get up and running.
- Perma - developer notes
- Common tasks and commands
- Git and GitHub
- Logs
- Code style and techniques
- Schema and data migrations
- Testing and Test Coverage
- Working with Celery
- Working with Redis
- Running with DEBUG=False locally
- Perma Payments
- Scoop
- Superset
These commands assume you have configured your shell with the alias defined in the shortcuts section of the installation docs, and that Perma's Docker containers are up and running in the background:
- run docker compose up -d to start the containers
- run docker compose down to stop them when you are finished

(If you are not running Perma inside Docker, most of the commands below should still work: just skip the d!)

To run the development server:

d invoke run
That's it! You should now be able to load Perma in your browser at https://perma.test:8000/. It will take a few seconds for the first page to load, while we wait for Perma's CSS, JS, and other assets to be compiled.
(Note: if you ran init.sh
when setting up this instance of Perma, the necessary
SSL certs and keys should already be present. If they are not, or if they have
expired, you can run bash make_cert.sh
to generate new files.)
To log in and explore Perma, try logging in as one of our
test users (the linkuser
objects). All test users have a password of "pass".
The server will automatically reload any time you make a change to the perma_web directory: just refresh the page to see your changes.
Press CONTROL-C
to stop the server.
To run all of the tests:

d pytest
d npm test
See Testing and Test Coverage for more information about testing Perma.
Python tests are run via pytest. Pytest supports several ways to select and run tests, including a super-convenient keyword-matching option:
d pytest -k "name_of_a_test_that_failed"
d pytest -k "a_specific_test_module"
See Testing and Test Coverage for more information about testing Perma.
Top-level Python requirements are stored in requirements.in. After updating that file, you should run

d invoke pip-compile

to freeze all subdependencies into requirements.txt.
To upgrade a single requirement to the latest version:
d invoke pip-compile --args "-P package_name"
Install new packages: d npm install --save-dev package_name
Uninstall packages: d npm uninstall package_name
Update a single package:
- if necessary, change the pinned version in package.json
- run d npm update package_name
Update all dependencies: d npm update
To create and apply database migrations:

d ./manage.py makemigrations
d ./manage.py migrate
For more information on migrations, see Schema and data migrations below.
To reset your database:

- Run docker compose down to delete your existing containers.
- Run docker volume rm perma_postgres_data to delete the database.
- Run docker compose up -d to spin up new containers.
- Run docker compose exec web invoke dev.init-db to create a fresh database, pre-populated with test fixtures.
You can run d bash to get a bash terminal in your container. Your python environment will be activated and you will be logged in as root.

You can also prefix arbitrary commands with d:

- d which python (output: the virtualenv's python)
- d ls (output: /perma/perma_web)
We use git to track code changes and use GitHub to host the code publicly.
The Master branch always contains production code (probably the thing currently running at Perma.cc) while the develop branch contains the group's working version. We follow Vincent Driessen's approach.
Fork our repo, then make a feature branch on your fork. Issue a pull request to merge your feature branch into harvard-lil's develop branch when your code is ready.
Track issues using GitHub Issues.
All of your logs will end up in ./services/logs. As a convenience, you can tail -f all of them with d invoke dev.logs.
We have several types of users:
- Logged-in users are identified the standard Django way: user.is_authenticated
- Users may belong to organizations. You should test this with user.is_organization_user.
- Users may belong to a registrar (user.registrar is not None). You should test this with user.is_registrar_member().
- Admin users are identified the standard Django way: user.is_staff
Users that belong to organizations can belong to many, including organizations belonging to multiple registrars. Users who belong to a registrar may only belong to a single registrar. Users should not simultaneously belong to both organizations and to a registrar.
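For example, a view restricted to registrar users might check these flags like this (a minimal sketch, not code from the Perma codebase: the view itself is hypothetical, but the attribute and method names are the ones listed above):

from django.http import HttpResponse, HttpResponseForbidden

def registrar_dashboard(request):
    # Turn away anonymous visitors and non-registrar users alike.
    if not request.user.is_authenticated or not request.user.is_registrar_member():
        return HttpResponseForbidden()
    # For registrar users, user.registrar is guaranteed to be non-None.
    return HttpResponse(f"Registrar: {request.user.registrar}")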
All emails should be sent using perma.email.send_user_email (for an email from us to a user) or perma.utils.send_admin_email (for an email "from" a user to us). This makes sure that from and reply-to fields are configured so our MTA will actually transmit the email.
We recommend addressing the email to user.raw_email rather than user.email (which is downcased), just in case.
On the development server, emails are dumped to the standard out courtesy of EMAIL_BACKEND in settings_dev.py.
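A rough sketch of what a call might look like, with the caveat that the argument list shown here is an assumption; check perma/email.py for send_user_email's actual signature before copying this:

from perma.email import send_user_email

# Assumed signature: send_user_email(to_address, template, context).
# `user` is a LinkUser instance.
send_user_email(
    user.raw_email,                    # prefer raw_email over the downcased email
    'email/example_notification.txt',  # hypothetical template path
    {'user': user},
)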
Front-end assets are processed and packaged by Webpack. Assets can be compiled with this command:
docker compose exec web npm run build

This is automatically run in the background by d invoke run, so there is usually no need to run it manually.
Compiled bundles generated by Webpack will be added to the git repository by CI if you omit them.
We use Django's built-in functions to manage static assets (Javascript/CSS/etc.) and user-generated media (our link archives).
To make sure everything works smoothly in various environments (local dev, Linux servers, and cloud services), be sure to use the following settings when referring to disk locations and URLs in your code and templates:
- STATIC_ROOT: Absolute path to static assets (e.g. '/tmp/perma/static/')
- STATIC_URL: URL to retrieve static assets (e.g. '/static/')
- MEDIA_ROOT: Absolute path to user-generated assets (e.g. '/tmp/perma/generated/')
- MEDIA_URL: URL to retrieve user-generated assets (e.g. '/media/')
The _ROOT settings may have different meanings depending on the storage backend. For example, if STORAGES["default"] is set to use the Amazon S3 storage backend, then MEDIA_ROOT would just be '/generated/' and would be relative to the root of the S3 bucket.
In templates, use the {% static %}
tag and MEDIA_URL:
{% load static %}
<img src="{% static "img/header_image.jpg" %}">
<img src="{{ MEDIA_URL }}{{ asset.image_capture }}">
Using the {% static %} tag instead of {{ STATIC_URL }} ensures that cache-busting and pre-compressed versions of the files will be served in production.
In code, use Django's storage API to read and write user-generated files rather than accessing the filesystem directly:
from django.core.files.storage import storages

with storages['default'].open('some/path', 'rb') as image_file:
    do_stuff_with_image_file(image_file)
Paths for default storage are relative to MEDIA_ROOT.
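Writing works the same way. A small sketch using Django's standard storage API (the path and contents here are made up):

from django.core.files.base import ContentFile
from django.core.files.storage import storages

# Save bytes to a path relative to MEDIA_ROOT; Django returns the name it
# actually used, which may differ if a file already exists at that path.
saved_name = storages['default'].save('some/path/report.txt', ContentFile(b'hello'))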
We like to host our fonts locally. If you're linking a font from Google fonts and the licensing allows, check out fontdump.
*** Before changing the schema or the data of your production database, make a backup! ***
If you make a change to the Django model (models get mapped directly to relational database tables), you'll need to create a migration. Migrations come in two flavors: schema migrations and data migrations.
Schema migrations are used when changing the model structure (adding, removing, editing fields) and data migrations are used when you need to ferry data between your schema changes (you renamed a field and need to move data from the old field name to the new field name).
The most straightforward migration might be the addition of a new model or the addition of a field to a model. When you perform a straightforward change like this to the model, your command might look like this:
$ d ./manage.py makemigrations
This will create a migration file for you on disk, something like:
$ cat perma_web/perma/migrations/0003_auto__add_org__add_field_linkuser_org.py
Even though you've changed your models file and created a migration (just a python file on disk), your database remains unchanged. You'll need to apply the migration to update your database:

$ d ./manage.py migrate
Now, your database, your model, and your migration should all be at the same point. You can list your migrations and their status with showmigrations:

$ d ./manage.py showmigrations
Data migrations follow the same flow, but add a step in the middle. See the Django docs for details on how to perform a data migration.
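For orientation, a data migration that copies values from an old field to a new one usually wraps a function in RunPython, roughly like this (a generic Django sketch, not an actual Perma migration; the field names are made up):

from django.db import migrations

def copy_old_field_forward(apps, schema_editor):
    # Always fetch the historical model via apps.get_model() inside a migration.
    LinkUser = apps.get_model('perma', 'LinkUser')
    for user in LinkUser.objects.all():
        user.new_field = user.old_field
        user.save(update_fields=['new_field'])

class Migration(migrations.Migration):
    dependencies = [
        ('perma', '0003_auto__add_org__add_field_linkuser_org'),
    ]
    operations = [
        migrations.RunPython(copy_old_field_forward, migrations.RunPython.noop),
    ]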
To load a production database dump into your local environment:

- Obtain a database dump with the help of your friendly local dev ops engineer.
- Make sure no containers are running: docker compose down.
- Edit the volumes section of the db service of docker-compose.yml: rename the postgres_data volume to something new like prod_postgres_data, and make the same change down at the bottom of the file in the volumes stanza.
- Run docker compose up -d.
- Run bash ingest.sh -f path-to-file.dump. It will take several minutes to complete. Expect a single non-fatal error at the end of the process: "role "rdsadmin" does not exist".
You should then be able to run as usual, and log into any account using the password "changeme".
You should commit your migrations to your repository and push to GitHub.
$ git add perma_web/perma/migrations/0003_auto__add_org__add_field_linkuser_org.py
$ git commit -m "Added migration"
Python unit tests live in perma/tests, api/tests, etc.

Functional tests live in functional_tests/.

Javascript tests live in spec/.
See Common tasks and commands above for how to run the tests.
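For orientation, a Python unit test in perma/tests might look roughly like this (a sketch that assumes the pytest-django plugin, which provides the django_db marker and the client fixture; the URL and assertion are hypothetical):

import pytest

@pytest.mark.django_db
def test_homepage_loads(client):
    # client is the pytest-django test client fixture.
    response = client.get('/')
    assert response.status_code == 200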
All code must show zero warnings or errors when running flake8 . in perma_web/.
Flake8 settings are configured in perma_web/setup.cfg.
If you want to automatically run flake8 before pushing your code, you can add something like this to .git/hooks/pre-commit
or .git/hooks/pre-push
:
#!/usr/bin/env bash
docker compose exec -T web flake8 .
exit $?
Be sure to mark the hook as executable: chmod u+x .git/hooks/pre-commit
or chmod u+x .git/hooks/pre-push
.
(You have to have started the containers with docker compose up -d
for this to work.)
Celery does two things in Perma.cc: it runs the capture tasks and it runs scheduled jobs (to gather things nightly like statistics, just like cron might).
In development, it's sometimes easier to run everything synchronously, without the additional layer of complexity a Celery worker adds. By default, Perma runs Celery tasks synchronously. To run them asynchronously, set CELERY_TASK_ALWAYS_EAGER = False in settings.py. CELERY_TASK_ALWAYS_EAGER must be False if you are specifically testing or setting up a new Celery <-> Django interaction, or if you are working with LinkBatches (otherwise subtle bugs may not surface).
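The setting changes how task calls behave. With CELERY_TASK_ALWAYS_EAGER = True (the development default), a .delay() call runs the task immediately in-process; with False, the call only enqueues the task and a Celery worker must be running to execute it. A generic sketch, not an actual Perma task:

from celery import shared_task

@shared_task
def add(x, y):
    return x + y

# Eager mode: runs synchronously and the result is available immediately.
# Non-eager mode: the task is queued for a worker to pick up.
result = add.delay(2, 3)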
In our production environment we use Redis as a cache for our thumbnail data. If you want to simulate the production environment:
- find the "redis" stanza of
docker-compose.yml
, currently commented out, and comment in it - find the "volumes" stanza of
docker-compose.yml
and comment inredis_data
- add the caches setting found in
settings_prod.py
to yoursettings.py
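The cache block you want is the one in settings_prod.py. For reference, a typical Django Redis cache configuration looks roughly like this (an illustration only; copy the real values from settings_prod.py, including the hostname of the redis service defined in docker-compose.yml):

# Illustrative only: use the actual block from settings_prod.py.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.redis.RedisCache',
        'LOCATION': 'redis://redis:6379/0',  # hostname and db index are assumptions
    },
}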
If you are running Perma locally for development using the default settings_dev.py, DEBUG is set to True. This is in general a big help, because Django displays a detailed error page any time your code raises an exception. However, it makes it impossible to test your app's error handling, see your custom 404 or 500 pages, etc.
To run with DEBUG=False locally, first stop the webserver if it's running. Add DEBUG = False to settings.py (or alter settings_dev.py). Then run d ./manage.py collectstatic, which creates ./services/django/static_assets (necessary for the css and other static assets to be served properly). Finally, run d invoke run as usual to start the web server.
NB: With DEBUG=False, the server will not automatically restart each time you save changes.
NB: If you make changes to static files, like css, while running with DEBUG=False, you must rerun d ./manage.py collectstatic and restart the server to see your changes.
Aspects of Perma's paid subscription service are handled by the companion application, Perma Payments.
By default, Perma's docker-compose.yml
file will spin up a local Perma Payments for you to experiment with. For more fruitful experimentation, configure this Perma Payments to interact with Cybersource's test tier, by running Payments with a custom settings.py that contains our credentials. See docker-compose.yml
and /services/docker/perma-payments/settings.py.example
for more information. CyberSource will not be able to communicate its responses back to your local instance, of course, but you can simulate active subscriptions using the Django admin.
You may also decide to run both services by running docker compose
in both repositories simultaneously, with a tweaked Perma network config.
First, head over to the Perma Payments
repo for instructions on how to spin that up.
Once it's running, spin up Perma... but with a slightly different command than usual, so that it doesn't try to create its own Perma Payments, but instead uses the already-running one:
docker compose -f docker-compose.yml up -d
Then, run Perma's dev server as usual:
docker compose exec web invoke run
When you are finished, take down the Perma containers by running:
docker compose -f docker-compose.yml down
Don't worry if you get the following error:
ERROR: error while removing network: network perma-payments_default id 1902203ed2ca5dee5b57462201db417638317baef142e112173ee300461eb527 has active endpoints
It just means that Perma Payments is still running: the network is maintained until both projects are down. Head back over to the Perma Payments repo and run docker compose down
there... and you're done.
Perma's web archives are produced using Scoop: Perma capture requests call out to the Scoop API, which captures the requested website and returns a WARC/WACZ to Perma.
By default, Perma's docker-compose.yml
file will spin up a local Scoop API for you to experiment with.
You may also decide to run both services by running docker compose
in both repositories simultaneously, with a tweaked Perma network config.
First, head over to the Scoop API
repo for instructions on how to spin that up.
Once it's running, spin up Perma... but with a slightly different command than usual, so that it doesn't try to create its own Scoop API, but instead uses the already-running one:
docker compose -f docker-compose.yml up -d
Then, run Perma's dev server as usual:
docker compose exec web invoke run
When you are finished, take down the Perma containers by running:
docker compose -f docker-compose.yml down
Don't worry if you get the following error:
ERROR: error while removing network: network perma-scoop-api_default id 1902203ed2ca5dee5b57462201db417638317baef142e112173ee300461eb527 has active endpoints
The Scoop API is still running: the network is maintained until both projects are down. Head back over to the Scoop API repo and run docker compose down
there... and you're done.
Superset is a data visualization tool that connects to the Perma database, allowing users to create saved SQL queries, datasets, charts, and dashboards. To experiment with the service, run the commands below to stop the running Docker containers and build the service image:
docker compose down
docker compose up -d --build
Navigate to http://localhost:8088/ and log in to the service using the credentials specified in docker-compose.override.yml. Once logged in, the existing objects should be imported into the local playground.
When local development is complete, export the dashboards using the Bulk Select Dashboards button and place the downloaded zip file at services/docker/superset/dashboard_export.zip.