Creating a standalone repository service lets you store and control the licensing data that you publish into the Copyright Hub (CH) ecosystem. To make that as simple as possible, we have created an Amazon Web Services (AWS) machine image (AMI) that you can set up in your own AWS account with minimal configuration changes. This document describes the steps required to get up and running with your own CH repository.
To fully complete this process you will need to have:
- An AWS account.
- A domain with an SSL certificate (e.g. https://copyrighthub.org).
- The SSL keys for the above certificate.
- An ssh key to log into the machine you will create (see here for instructions on how to generate ssh keys).
There are four steps to follow:
This is currently a process that requires authorisation at every step, so we recommend that you liaise with a CH admin when you are ready to do this, in order to complete the steps in quick succession.
Go to http://services.copyrighthub.org/
Create an account on http://services.copyrighthub.org/signup
You will receive an email asking you to click a link to verify your email address. Note that even after clicking this link, you may still have to log out of the services.copyrighthub.org system and log back in to fully activate your account.
Click on "Create a New Organization" and fill in the form.
Only the organisation name is required. You will not see any on-screen confirmation that your request has been submitted, but you will receive an email saying so. The request has to be approved by a CH admin; when that happens, you will get an email confirming the approval.
Join your organisation by using the “Join Existing Organisation” button
Again, you will get an email saying you have requested to join. Once the request has been authorised, you will get an email telling you it has been authorised.
Select the Organisation from the “Existing Organisations” dropdown
Click on "Create a New Service"
"Location" is the URL where you intend the repo to be available, e.g. “https://myrepo.mydomain.org”. "Service Type" is “Repository”.
Make a note of the client ID and client secret for later.
As above, this request has to be approved. You will get an email saying your request has been submitted, and another when it has been approved.
Go to “Repositories” and select “Create a new Repository”
Select the repository service you have just created above and give it a name. Again, you will get an email acknowledging your request, and an email when it has been approved.
Make a note of the repository id
You have completed Step 1!
Go to [link to AMI]
Choose the region you want the instance deployed in
Go through the setup procedure. We recommend:
- A t2.xlarge instance type
- Placing the instance in its own VPC
- When configuring the security group, adding a rule that allows inbound traffic from any address (0.0.0.0/0) to port 8765, where the repository service listens for queries
Wait for the instance to leave the pending state and reach the running state.
Your instance will have an IP address. At this point we recommend that you:
- Obtain an Amazon Elastic IP address (a permanent IP address; this is a service you pay for) and assign it to your new instance.
- Create a DNS entry in AWS Route 53 that points your preferred subdomain on your site (e.g. repo.mydomain.org) to the Elastic IP address.
More instructions here....
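Once the DNS entry has propagated, it can be worth confirming that the subdomain resolves to the Elastic IP before moving on. The following is a minimal Python sketch, not part of the CH tooling; the function name resolves_to and the optional resolver parameter are illustrative assumptions.

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return True if hostname resolves to expected_ip.

    resolver is injectable so the check can be exercised without
    network access; by default it performs a real DNS lookup.
    """
    try:
        return resolver(hostname) == expected_ip
    except socket.gaierror:
        # The name does not resolve (yet), e.g. DNS has not propagated
        return False
```

Calling resolves_to("repo.mydomain.org", "<your Elastic IP>") from your local machine returns True once the record is live.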
ssh into the machine, using the private key of your key pair (not the .pub file):
ssh -i ~/.ssh/your-key-pair ubuntu@{your-instance-IP}
Edit (using sudo!) the file at /srv/repository/current/config/local.conf
Enter your service ID and client secret from Step 1 above (confusingly, service_id is the client ID), e.g.
service_id = 91b43242eb3e95ecdb178d36204b8f69
client_secret = DSxnM9PRB3CdK9lAV5pTTqD1mDnokj
use_ssl = False
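A quick sanity check of the edited values can catch a missing entry before you restart the service. The sketch below assumes local.conf consists of flat key = value lines, as in the example above; parse_conf and missing_settings are hypothetical helper names, and this is not the parser the repository service itself uses.

```python
def parse_conf(text):
    """Parse flat 'key = value' lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_settings(text, required=("service_id", "client_secret")):
    """Return the required keys that are absent or empty in the config text."""
    settings = parse_conf(text)
    return [key for key in required if not settings.get(key)]
```

Passing the contents of /srv/repository/current/config/local.conf to missing_settings returns an empty list when both values are present.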
Restart the repository service
sudo supervisorctl
restart repository
Edit the nginx config at /etc/nginx/conf.d/opp.conf
Make sure ssl_certificate and ssl_certificate_key match the location of your SSL cert and key, eg:
server {
listen 8765 ssl;
ssl_certificate /etc/ssl/certs/{redacted}.crt;
ssl_certificate_key /srv/{redacted}.key;
Make sure that the certificate files have the right permissions and owners:
sudo chmod 775 {keyfile}
sudo chown ubuntu:ubuntu {keyfile}
sudo chmod 775 {certfile}
sudo chown root:root {certfile}
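To confirm the result, the mode and ownership of each file can be read back. This is a minimal sketch using only the Python standard library (Unix only); describe_file is a hypothetical helper name, and the paths to pass in are whichever ones you used in the nginx config.

```python
import grp
import os
import pwd
import stat


def describe_file(path):
    """Return (octal mode, owner name, group name) for path."""
    info = os.stat(path)
    mode = format(stat.S_IMODE(info.st_mode), "o")
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        # No passwd entry for this uid; fall back to the numeric id
        owner = str(info.st_uid)
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        # No group entry for this gid; fall back to the numeric id
        group = str(info.st_gid)
    return mode, owner, group
```

After the commands above, the key file should report a mode of 775 with owner and group ubuntu, and the certificate a mode of 775 with owner and group root.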
Restart nginx
sudo nginx -s reload
If nginx was not already running (the reload command above returns an error), start it instead:
sudo nginx
Go to yourdomain:8080 (the Blazegraph admin interface)
Click on the "Namespaces" tab and create a namespace named after the repository ID you noted in Step 1 above.
At this point, you should have a repository that is fully connected to the Copyright Hub ecosystem. To test it, you can use the onboarding and query services provided by the Copyright Hub.
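Save the following as a Makefile on your local machine, substituting your own client ID, client secret and repository ID. Note that recipe lines in a Makefile must be indented with a tab character, not spaces: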
.DEFAULT_GOAL := onboard
.PHONY: clean auth data onboard verify

SRV_AUTH = https://auth.copyrighthub.org
SRV_ON = https://on.copyrighthub.org
CLIENT = <your client id here>
SECRET = <your client secret here>
REPO = <your repository id here>
AUTH_FILE = auth.json
DATA_FILE = on.csv

# Remove generated files; the leading "-" tells make to ignore errors
clean:
	-rm $(AUTH_FILE) $(DATA_FILE)

# Request an access token and save the JSON response
$(AUTH_FILE) :
	curl -k $(SRV_AUTH)/v1/auth/token --user $(CLIENT):$(SECRET) --data "grant_type=client_credentials&scope=delegate[https://on.copyrighthub.org]:write[$(REPO)]" -o $@

auth: $(AUTH_FILE)

# Write a one-row sample CSV to onboard
$(DATA_FILE) :
	echo source_id_types,source_ids,offer_ids,description > $@
	echo danpicspictureid,DSC_012344567,,"Leopard eating Gazelle in Africa" >> $@

data: $(DATA_FILE)

# Onboard the sample asset using the token stored in the auth file
onboard: clean auth data
	$(eval TOKEN := $(shell python -c "import sys, json; print(json.loads(open('${AUTH_FILE}').read())['access_token'])"))
	curl -k $(SRV_ON)/v1/onboarding/repositories/$(REPO)/assets --data-binary @$(DATA_FILE) --header "Accept: application/json" --header "Content-Type: text/csv; charset=utf-8" --header "Authorization: $(TOKEN)"

# Ask the auth service to verify the token against the repository
verify: clean auth data
	$(eval TOKEN := $(shell python -c "import sys, json; print(json.loads(open('${AUTH_FILE}').read())['access_token'])"))
	curl -k $(SRV_AUTH)/v1/auth/verify --header "Accept: application/json" --header "Content-Type: application/x-www-form-urlencoded" --header "Authorization: BASIC [$(CLIENT):$(SECRET)]" --data-binary "requested_access=r&token=$(TOKEN)&resource_id=$(REPO)"
Then run
make
If all is well, you should get something like this as a response:
{
  "status": 200,
  "data": [{
    "entity_id": "74c2436fae9e4a13a9d85a6f5a4578e4",
    "source_ids": [{
      "source_id": "DSC_012344567",
      "source_id_type": "danpicspictureid"
    }],
    "hub_key": "https://openpermissions.org/s1/hub1/f74c5de3db2e49c693c152a2da87e6d7/asset/74c2436fae9e4a13a9d85a6f5a4578e4",
    "entity_type": "asset"
  }]
}
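If you script the onboarding step, the hub key of each onboarded asset can be pulled out of this response. The following is a minimal sketch, assuming the response body has the shape shown above; hub_keys is a hypothetical helper, not part of any CH client library.

```python
import json


def hub_keys(response_text):
    """Return {source_id: hub_key} for each asset in an onboarding response."""
    body = json.loads(response_text)
    if body.get("status") != 200:
        raise ValueError("onboarding failed with status %r" % body.get("status"))
    keys = {}
    for asset in body.get("data", []):
        for source in asset.get("source_ids", []):
            # Map every source id of the asset to the asset's hub key
            keys[source["source_id"]] = asset["hub_key"]
    return keys
```

For the sample response above, this maps DSC_012344567 to the hub_key URL of the new asset.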
If changes have been made to the code and pushed to GitHub and you want to integrate them into your installation, take the following steps:
sudo -u deploy -s
cd /srv/repository/current
git checkout master
git pull
exit
sudo supervisorctl restart repository
This will pull the latest version from GitHub and restart the service to incorporate the new code.
- Create an S3 bucket for your backups, e.g. bigdata-backups
- Create an IAM policy named bigdata_backups_S3_Bucket_ReadWrite to grant access to that bucket, with the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::bigdata-backups"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::bigdata-backups/*"
      ]
    }
  ]
}
- Create an EC2 instance role called ec2_blazegraph with the above policy attached
- Assign the new role ec2_blazegraph to your repository instance in the EC2 Management Console by selecting your instance and going to Action -> Instance Settings -> Attach/Replace IAM role
- On the instance, find the file /home/opp-backup/bin/jnl_backup.sh and edit it as below (replace the bucket name with the one you created in the first step of this list)
#!/bin/bash
TIME_NOW=$(date +%s)
service monit stop
service bigdataNSS stop
aws s3 cp --region eu-west-1 --recursive /var/lib/bigdata/var/data/ s3://bigdata-backups/$TIME_NOW/
service bigdataNSS start
service monit start
- Use crontab -e to add the following cron schedule, which runs the backup every day at 23:00:
# Blazegraph (bigdata) backup to Amazon S3 Bucket
0 23 * * * sudo /home/opp-backup/bin/jnl_backup.sh >/dev/null 2>&1
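The IAM policy in the steps above hard-codes the example bucket name bigdata-backups in two places. If your bucket is named differently, the policy can be generated rather than edited by hand. The following is a minimal sketch; backup_policy and backup_policy_json are hypothetical helper names, and the statements simply mirror the policy shown earlier.

```python
import json


def backup_policy(bucket):
    """Build the backup policy shown above for the given bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": ["arn:aws:s3:::%s" % bucket],
            },
            {
                # Reading and writing apply to the objects inside it
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": ["arn:aws:s3:::%s/*" % bucket],
            },
        ],
    }


def backup_policy_json(bucket):
    """Return the policy as indented JSON, ready to paste into the IAM console."""
    return json.dumps(backup_policy(bucket), indent=2)
```

Remember that the same bucket name also appears in the aws s3 cp line of jnl_backup.sh.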