git_to_rdf

what

A utility, run in a Docker container, that transforms a git repository on disk into RDF. It produces intermediate CSV and JSON representations of the repository, then uses SPARQL Anything to transform them into RDF with SPARQL CONSTRUCT queries.
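In this tool, SPARQL Anything and its CONSTRUCT queries do the real work. The minimal Python sketch below only illustrates the underlying idea: tabular git data goes in and RDF statements come out. It assumes a hypothetical CSV with hash, author and message columns, plus a made-up example.org vocabulary and graph name. It does not reproduce the tool's actual intermediate format, queries, or schema.

```python
import csv
import io

# Hypothetical vocabulary and graph: illustrative only, NOT what git_to_rdf emits.
EX = "http://example.org/git#"
GRAPH = "<http://example.org/graph/summaries>"


def literal(value: str) -> str:
    """Serialize a plain string as an N-Quads literal, escaping \\, ", LF and CR."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def rows_to_nquads(csv_text: str) -> list[str]:
    """Turn each CSV row (one commit, hypothetical columns) into N-Quads lines."""
    quads = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{EX}commit/{row['hash']}>"
        quads.append(f"{subject} <{EX}author> {literal(row['author'])} {GRAPH} .")
        quads.append(f"{subject} <{EX}message> {literal(row['message'])} {GRAPH} .")
    return quads


sample = 'hash,author,message\nabc123,Alice,"Fix ""quoted"" bug"\n'
for quad in rows_to_nquads(sample):
    print(quad)
```

A real CONSTRUCT query expresses the same row-to-triple mapping declaratively. The hand-written loop here is just easier to read in isolation.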

why

I've blogged about using this tool to view a git repository as RDF.

how

Say you cloned this git repo to /home/alice/repo/git_to_rdf, your git repository of interest is at /home/alice/repo/projX, and you want to put the RDF this tool produces at /home/alice/RDF.

First build the Docker image.

  1. have Docker installed
  2. have make and wget installed
  3. cd to /home/alice/repo/git_to_rdf
  4. in a bash shell run: make build
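Before you run make build, check that the prerequisites from steps 1 and 2 are on your PATH. This can save you from a confusing failure partway through the build. The check below is a hypothetical helper and is not part of git_to_rdf. It relies only on Python's standard shutil.which.

```python
import shutil
from typing import Iterable

# Tools the build steps above assume are installed.
REQUIRED_TOOLS = ("docker", "make", "wget")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> list[str]:
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install before running 'make build':", ", ".join(missing))
    else:
        print("docker, make and wget found; ready to run 'make build'")
```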

Now use the Docker image.

  1. cd to /home/alice/
  2. in a bash shell run: /home/alice/repo/git_to_rdf/git_to_rdf.sh --repository /mnt/repo/projX --output /mnt/RDF
  3. find your two output files, summaries.nq and hunks.nq, in /home/alice/RDF, and the log file git_to_rdf.log in your current directory.
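To sanity-check the output, you can count the statements in each file. This minimal sketch assumes summaries.nq and hunks.nq are plain, uncompressed N-Quads. The N-Quads format puts one statement per line, so the sketch counts every line that is neither blank nor a # comment. The names count_statements and count_file are hypothetical and do not come from the tool.

```python
from pathlib import Path
from typing import Iterable


def count_statements(lines: Iterable[str]) -> int:
    """Count N-Quads statements: lines that are neither blank nor comments."""
    return sum(
        1 for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )


def count_file(path: Path) -> int:
    """Count the statements in one .nq file."""
    with path.open(encoding="utf-8") as handle:
        return count_statements(handle)


if __name__ == "__main__":
    output_dir = Path("/home/alice/RDF")  # the --output directory from the example
    for name in ("summaries.nq", "hunks.nq"):
        path = output_dir / name
        print(name, count_file(path) if path.exists() else "not found")
```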

speed

It took 6 minutes to transform this git repository (consisting of 1,122 commits) into 2,207,235 triples.
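Those figures work out to a rough throughput, assuming the 6 minutes was wall-clock time for the whole run:

```python
# Figures from the run described above.
SECONDS = 6 * 60
COMMITS = 1_122
TRIPLES = 2_207_235

triples_per_second = TRIPLES / SECONDS
triples_per_commit = TRIPLES / COMMITS

print(f"{triples_per_second:,.0f} triples/second")  # 6,131 triples/second
print(f"{triples_per_commit:,.0f} triples/commit")  # 1,967 triples/commit
```

Triples per commit will vary from repository to repository, because each commit's hunks contribute to the count.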

notes

I don't provide instructions for using the jar file directly because the utility writes to the /tmp directory and doesn't clean up after itself. Running it in a disposable Docker container means that /tmp directory goes away when the run finishes. The utility also needs splitpatch, which is installed in the Docker image.

Note the use of /mnt in the instructions. When you run git_to_rdf.sh, your current directory is mapped into the Docker container at /mnt, so --repository and --output must be given as paths under /mnt. A drawback is that if you want the produced RDF in /tmp and the git repository of interest lives at /home/alice/repo/projX, you'd have to cd to / and then run /home/alice/repo/git_to_rdf/git_to_rdf.sh --repository /mnt/home/alice/repo/projX --output /mnt/tmp.
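The mapping rule amounts to a simple path translation. You strip your current directory from the front of a host path and put /mnt in its place. The sketch below uses container_path, a hypothetical helper for working out which arguments to pass. It is not something git_to_rdf.sh provides. The helper raises an error when a path lies outside the current directory, since presumably only that directory is visible inside the container.

```python
from pathlib import PurePosixPath


def container_path(host_path: str, cwd: str, mount: str = "/mnt") -> str:
    """Translate a host path into the path seen inside the container,
    assuming the host's current directory is mounted at `mount`."""
    host = PurePosixPath(host_path)
    base = PurePosixPath(cwd)
    try:
        relative = host.relative_to(base)
    except ValueError:
        raise ValueError(
            f"{host_path} is outside {cwd}; cd to a common parent first"
        ) from None
    return str(PurePosixPath(mount) / relative)


print(container_path("/home/alice/repo/projX", "/home/alice"))
print(container_path("/tmp", "/"))
```

For example, container_path("/home/alice/repo/projX", "/home/alice") gives /mnt/repo/projX, which matches the --repository argument in the how section. With a current directory of /, /tmp becomes /mnt/tmp.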

I might address these shortcomings, but my immediate goal is to use the RDF, not to polish the utility for tidy deployments.

Also, you'll find many TODOs in the source. I may get to those one day.