Operations
Access the Data is a CKAN specialization deployed via the Hack for LA Ops Incubator project to Amazon Web Services Elastic Container Service.
⚠️ That sounds good, right? Beware: as of the end of 2023 the deployment is mostly complete, but Access the Data is not yet fully and successfully deployed. As a result, this guide should be read as somewhat aspirational. Operations engineers joining Access the Data should first strive to make this guide true.
Roughly speaking, the CKAN deployment looks like this:
flowchart TB
CKAN
SOLR
Datapusher
Redis
db([RDS PostgreSQL])
CKAN <--> SOLR
CKAN <--> Redis
CKAN <--> Datapusher
CKAN <--> db
CKAN itself is a Python application, and the Access the Data codebase is an extension of it. The default set of plugins that CKAN ships with requires the other services, which we deploy effectively unmodified.
Each block in the above diagram, other than the database, is a container running in ECS. The databases that CKAN requires are hosted in Hack for LA's single shared RDS instance.
We believe that CKAN uses SOLR to do full-text search of its content, and Redis for caching and/or to queue background jobs.
Datapusher is a CKAN companion application (built by the CKAN team) to allow data sets to be pushed into the CKAN database. This process relies on a somewhat unusual database implementation; CKAN uses a primary database much like a "normal" web application would, and a second database to track the actual data that it manages. Datapusher pulls unstructured data sets into the database.
All the images for the containers in the CKAN runtime are hosted in Hack for LA ECR repositories.
We maintain a deployed CKAN instance at the root of the accessthedata-ckan repo, which is built and pushed manually to ECR.
⚠️ This should be corrected in the near future: the CKAN image should be built and deployed via CI in GitHub Actions.
For SOLR, Datapusher and Redis, we re-tag images selected by the CKAN maintainers.
Those images are:
- ckan/ckan-solr:2.10-solr9
- ckan/ckan-base-datapusher:0.0.20
- redis:6
This tag-and-push process should likewise be handled in GitHub Actions.
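Until that automation exists, the manual re-tag step can be sketched as follows. This is a minimal illustration, not the project's actual tooling: the registry host `123456789012.dkr.ecr.us-west-2.amazonaws.com` is a hypothetical placeholder, and the mapping from upstream image name to ECR repository name (last path component, same tag) is an assumption that should be checked against the repositories defined in Terraform.

```python
import subprocess

# Upstream images named in this guide.
UPSTREAM_IMAGES = [
    "ckan/ckan-solr:2.10-solr9",
    "ckan/ckan-base-datapusher:0.0.20",
    "redis:6",
]

# Hypothetical ECR registry host; substitute the real account ID and region.
ECR_REGISTRY = "123456789012.dkr.ecr.us-west-2.amazonaws.com"


def ecr_target(image, registry=ECR_REGISTRY):
    """Map an upstream image reference to a reference in the ECR registry.

    Assumption: the ECR repository is named after the last path component
    of the upstream image, and the tag is kept unchanged.
    """
    name_and_tag = image.rsplit("/", 1)[-1]
    return f"{registry}/{name_and_tag}"


def retag_commands(image, registry=ECR_REGISTRY):
    """Return the docker pull/tag/push commands for one image."""
    target = ecr_target(image, registry)
    return [
        ["docker", "pull", image],
        ["docker", "tag", image, target],
        ["docker", "push", target],
    ]


def retag_all(images=UPSTREAM_IMAGES, registry=ECR_REGISTRY, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for image in images:
        for command in retag_commands(image, registry):
            if dry_run:
                print(" ".join(command))
            else:
                subprocess.run(command, check=True)
```

Calling `retag_all()` only prints the commands; pass `dry_run=False` to execute them. Pushing requires that Docker is already authenticated against ECR, for example with `aws ecr get-login-password` piped into `docker login`.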
All of the above assets are maintained by Terraform code in the incubator project. Do not attempt to make durable infrastructure changes by hand!
This includes:
- DNS domain registration
- DNS resource configuration
- AWS ALB configuration
- AWS private DNS discovery
- AWS ECS Service configuration
- AWS Task definitions
- AWS ECR repositories
- Database provisioning within the RDS PostgreSQL instance
- Secrets for database authentication, and web secrets (e.g. CSRF)
Changes to the runtime should be effected by changing configuration in Incubator and using terraform apply.
Aspects of the deployment that aren't managed by Terraform should be brought under Terraform control, and then updated there.
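One way to apply a change with a review step is to save a plan and then apply exactly that plan. The sketch below only builds and optionally runs the commands; the plan file name `ckan.tfplan` and the `workdir` argument are hypothetical, and saving a plan before applying is a suggested practice here, not something the Incubator project is known to require.

```python
import subprocess


def terraform_commands(plan_file="ckan.tfplan"):
    """Return the init, plan, and apply commands for a saved-plan workflow."""
    return [
        ["terraform", "init", "-input=false"],
        # Write the proposed changes to a file so they can be reviewed.
        ["terraform", "plan", "-input=false", f"-out={plan_file}"],
        # Apply exactly the saved plan, nothing newer.
        ["terraform", "apply", "-input=false", plan_file],
    ]


def apply_change(workdir, plan_file="ckan.tfplan", dry_run=True):
    """Print the commands (dry run) or run them inside workdir."""
    for command in terraform_commands(plan_file):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, cwd=workdir, check=True)
```

In practice the plan output would be read by a person between the second and third commands, so running all three unattended with `dry_run=False` is only reasonable for changes that have already been reviewed.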
The most useful tool we have for debugging CKAN is the ECS Tasks view. From there, the running Task can be viewed and its logs reviewed.
There are interesting possibilities around using ECS Exec to investigate behaviors of running containers, but we don't have experience with that yet: the current failure mode exits before the ECS Exec sidecar can boot.
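Because the container exits early, the stop reason that ECS records for the Task can be a useful complement to the logs. The following sketch summarizes the JSON printed by `aws ecs describe-tasks`; the helper name `summarize_stopped_tasks` is hypothetical, and it reads only the `taskArn`, `lastStatus`, `stoppedReason`, and per-container `name`, `exitCode`, and `reason` fields of that output.

```python
import json


def summarize_stopped_tasks(describe_tasks_json):
    """Extract stop reasons and container exit codes from describe-tasks output.

    Takes the raw JSON text printed by `aws ecs describe-tasks` and returns
    one summary dict per task. Missing fields come back as None.
    """
    payload = json.loads(describe_tasks_json)
    summaries = []
    for task in payload.get("tasks", []):
        containers = [
            {
                "name": container.get("name"),
                "exit_code": container.get("exitCode"),
                "reason": container.get("reason"),
            }
            for container in task.get("containers", [])
        ]
        summaries.append(
            {
                "task_arn": task.get("taskArn"),
                "last_status": task.get("lastStatus"),
                "stopped_reason": task.get("stoppedReason"),
                "containers": containers,
            }
        )
    return summaries
```

The input would come from something like `aws ecs describe-tasks --cluster <cluster> --tasks <task-arn>`, where the cluster name and Task ARN are placeholders to be taken from the ECS Tasks view.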