Harvest information & documentation from repositories and images. Easily extensible to work with multiple Git & Image hosts. Store it all as linked data.
Prerequisites: Docker, (git if building from source)
The repo-harvester is built to collect information about mu-semtech repositories, into app-mu-info. This tutorial will show how to get a basic version up and running. It assumes a basic understanding of mu-semtech.
# docker-compose.yml
version: '3.4'
services:
db:
image: redpencil/virtuoso:feature-woodpecker-feature-builds
environment:
SPARQL_UPDATE: "true"
DEFAULT_GRAPH: "http://mu.semte.ch/application"
volumes:
- ./data/db:/data
harvester:
image: repo-harvester:0.1.0
ports:
- "5000:80" # Remove this in production
links:
- db:database
volumes:
- ./config:/usr/src/app/config
Now all you have to do is run the following commands:
docker-compose up # Starts the stack
curl localhost:5000/init # Populates the database
curl localhost:5000/update # Updates the database
And that's it! However, if you wish, you can change the repo/image sources and categories of this app rather easily. For a production-ready example usage, view app-mu-info.
- repos.conf - Where to find repositores
- categories.conf - How to categorise the repositories
- overrides.conf - In case some of your repositories/images do not follow your other conventions
For all of these files, an example is provided in the config/ folder. You can apply the provided ones by simply removing .example
from the filename.
They can also be loaded in through docker-compose. See app-mu-info/docker-compose.yml for an example.
[mu-semtech GitHub + DockerHub] # Section name, arbitrary
repos_host=GitHub # Host on which the repos are hosted. Case-insensitive
repos_username=mu-semtech # Username/org name on the repos' host
images_host=DockerHub # Host on which the repos' relevant iamges are hosted. Case-insensitive
images_username=semtech # Username/org name on the images' host
Currently supported hosts are:
name/id | type | Link to code |
---|---|---|
GitHub | Repositories | reposource/GitHub.py |
DockerHub | Images | imagesource/DockerHub.py |
For info on what categories are, please see the discussions section
Categories are defined as follows:
[templates] # Section name, arbitrary
name=Templates # Human readable name
id=templates # id, will be used to generate a linked data url
regex=.*-template # regex which will be ran against the repository name to check whether it's part of the category
The order in which you add them does matter, as every repo only has one category. If you add a category with a .*
regex on the top of the file, it will subsequently match everything. Make sure to work down from most specific to less specific/catch-all.
You can configure overrides.conf in case you break your own Category naming convention, or want to archive specific repositories without changing anything in the repository itself. The syntax is as follows:
[regex-.*-for-repo-name]
Category=tools # Optional, reassigns to the category with specified
ImageName=mu-login-service # Optional, overrides the container image name for this repo
Thanks to mu-python-template, you can mount the following volumes to save cache, as well as enable live-reload.
version: '3.4'
services:
# ...
harvester:
# ...
volumes:
- /path/to/repo-harvester/:/app
- ./cache/:/tmp/repo-harvester/
ports:
- "5000:80"
environment:
MODE: "development" # mu-python-template variable
LOG_LEVEL: "debug" # mu-python-template variable
RH_CACHE: "True" # Save requested API data to cache
RH_PRINT: "True" # Print to stdout
It is only intended to be used during development.
git clone https://github.com/mu-semtech/repo-harvester.git
cd repo-harvester
docker build -t repo-harvester .
docker run -p 80:80 repo-harvester
Please also check the docstrings and typing included in the code!
For the model which this microservice uses, check the reference in mu-app-info.
Categories are arbitrary. We've added these because a lot of organisations - mu-semtech included - have a lot of different types of repos. More specifically, repos that are no longer in active use, as well as repos that are closely related to eachother. Categories allow you to categorise your repos in any way you want. See it as an html data-
tag.
We've implemented revisions to allow easy traversal through current & past documentation. However, there was no set way to define a revision. Using Image tags would be useless, as there are often projects with multiple images per release (alpine vs regular for example). Using GitHub releases would break this projects rule of being platform-independent. Using solely Git Tags could lead to a lack of relevant Image releases.
Thus revisions are found & added by the following logic: if a Git tag has a corresponding image tag, it is counted as a revision.
This project is licensed under the MIT License.