Three scripts to connect Swedish runic inscriptions in FMIS with the literature about them catalogued in Libris, using linked data; intended to enrich K-samsök's user-generated content. Written in brief moments here and there during the Swedish National Heritage Board's ArkHack 2.0 in Umeå, April 2014.
The idea was to produce linked data connecting runic inscriptions to the literature describing them, for the user-generated content part of K-samsök. There are five main sources of data required for this:
- K-samsök, which assigns Swedish monuments URIs and harvests metadata about them from, among other sources…
- …FMIS, which uniquely identifies ancient monuments using a numeric id (used by K-samsök when minting its URIs);
- Libris, which maintains URIs for literary objects;
- Svensk runbibliografi, which contains bibliographic information about works concerning runic inscriptions. For our purposes, we're interested in these works' Libris URIs, and the runic signa of the inscriptions they're about;
- Samnordisk runtextdatabas, which contains masses of useful information about Scandinavian runic inscriptions but is interesting here only insofar as it allows us to create a mapping between Swedish inscriptions' runic signa (which Svensk runbibliografi uses) and their numeric FMIS ids (which K-samsök uses). This enables us to connect FMIS objects in K-samsök to Libris URIs from Svensk runbibliografi.
The resulting scripts assume the admittedly unlikely scenario that you have a copy of Samnordisk runtextdatabas mapped to a structured, normalised relational database that you can query to get a list of all runic signa and their corresponding K-samsök URIs. Creating such a database is left as an exercise for the reader, but without one these scripts are unlikely to be of much use unless you can provide the signa-to-FMIS/URI mapping by other means. Consequently, the final output of these scripts – the interesting part! – is also provided here, so you don't actually have to run them yourself. :)
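If you do want to supply the signa-to-FMIS mapping "by other means", something as simple as a two-column file would do. The following is only a rough Python sketch (the scripts themselves are Perl and query a database instead); the CSV layout, the column names `signum` and `fmis_id`, and the sample values are all made up for illustration.

```python
import csv
import io

# Hypothetical two-column export: runic signum, FMIS id.
# The ids below are placeholders, not real FMIS ids.
SAMPLE_CSV = (
    "signum,fmis_id\n"
    "U 11,10000000000001\n"
    "Sö 106,\n"
    "U 12,10000000000002\n"
)


def load_mapping(handle):
    """Read signum -> FMIS id pairs, skipping inscriptions without an FMIS id."""
    mapping = {}
    for row in csv.DictReader(handle):
        signum = row["signum"].strip()
        fmis_id = row["fmis_id"].strip()
        if signum and fmis_id:
            mapping[signum] = fmis_id
    return mapping


mapping = load_mapping(io.StringIO(SAMPLE_CSV))
```

Rows with an empty FMIS id are dropped here because such inscriptions cannot be connected to anything in K-samsök anyway.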
Svensk runbibliografi sadly has no web API or other similar method of directly querying or accessing its data, so the first order of the day is to scrape it and structure the data (yes, their robots.txt appears to allow this). runlit-fetch.pl queries Samnordisk runtextdatabas for a complete list of signa, and queries Svensk runbibliografi over the web for details of works pertaining to those inscriptions. Because Svensk runbibliografi does not assign any of its listed works URLs (sic) either (everything is addressed indirectly by session-based query URLs) this means that details about the same work are likely to be fetched multiple times. To speed things up, the script is multithreaded, using MCE to do all of this for a number of records in parallel, hashing the results and discarding works which have already been cached. Once it's finished, it outputs the resulting data to srb-lit.yml for use by the other two scripts. (Nb. this output is not included here due to unclear licensing of the data.)
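The fetch-in-parallel-and-deduplicate idea looks roughly like this. It is a Python sketch of the general technique, not a port of runlit-fetch.pl: a thread pool stands in for MCE, `fake_fetch` stands in for the real web query (no actual Svensk runbibliografi URLs or response formats are assumed), and the record layout is invented.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(signum):
    """Stand-in for the web query: returns the works listed for one signum."""
    sample = {
        "U 11": ["Work A", "Work B"],
        "U 12": ["Work B", "Work C"],
    }
    return sample.get(signum, [])


def fetch_all(signa, fetch=fake_fetch, workers=4):
    """Fetch works for all signa in parallel, keeping each distinct work once."""
    cache = {}  # digest of the work's details -> record
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map() returns results in input order, so the output is stable.
        for signum, works in zip(signa, pool.map(fetch, signa)):
            for work in works:
                digest = hashlib.sha1(work.encode("utf-8")).hexdigest()
                # A work seen before is not stored again; only the signum is added.
                record = cache.setdefault(digest, {"work": work, "signa": []})
                record["signa"].append(signum)
    return cache


cache = fetch_all(["U 11", "U 12"])
```

Since the works have no stable URL to key on, hashing their details is what lets the same work be recognised when it turns up again under a different signum.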
runlit2ksam.pl queries Samnordisk runtextdatabas to create a mapping between runic signa and FMIS ids (i.e. K-samsök URIs). It then reads in the cached bibliographic data from srb-lit.yml and proceeds to filter the data, looking only for works with Libris URIs which concern inscriptions which have FMIS ids. Using RDF::Trine, the resulting assertions are collated as RDF triples in a temporary (in-memory) triplestore before being dumped out as Turtle to srb-lit-soch.ttl. Ta-da!
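For a feel of what the filter-then-serialise step amounts to, here is a small Python sketch that writes the Turtle by hand instead of going through RDF::Trine. Several things in it are assumptions and not taken from the script: the `http://kulturarvsdata.se/raa/fmi/` URI prefix, the placeholder Libris URI and FMIS id, the record keys `libris_uri` and `signa`, and above all the predicate, for which `dcterms:isReferencedBy` is used purely as a stand-in.

```python
# Assumed URI pattern for FMIS objects in K-samsök.
KSAMSOK_PREFIX = "http://kulturarvsdata.se/raa/fmi/"
# Placeholder predicate; the one actually used by the script may differ.
PREDICATE = "dcterms:isReferencedBy"


def to_turtle(works, signum_to_fmis):
    """Keep works with a Libris URI about inscriptions with an FMIS id."""
    triples = set()  # a set, so repeated assertions collapse into one
    for work in works:
        libris_uri = work.get("libris_uri")
        if not libris_uri:
            continue  # no Libris URI: nothing to link to
        for signum in work.get("signa", []):
            fmis_id = signum_to_fmis.get(signum)
            if fmis_id:
                triples.add((KSAMSOK_PREFIX + fmis_id, libris_uri))
    lines = ["@prefix dcterms: <http://purl.org/dc/terms/> .", ""]
    lines += ["<%s> %s <%s> ." % (s, PREDICATE, o) for s, o in sorted(triples)]
    return "\n".join(lines) + "\n"


# Hypothetical records, shaped like what the cached data might contain.
works = [
    {"libris_uri": "http://libris.kb.se/bib/0000001", "signa": ["U 11", "U 9999"]},
    {"libris_uri": None, "signa": ["U 11"]},
]
turtle = to_turtle(works, {"U 11": "10000000000001"})
```

Only one triple survives in this example: the second work has no Libris URI, and the signum U 9999 has no FMIS id in the mapping.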
The third script does exactly the same as runlit2ksam.pl, except that it assumes that you have access to K-samsök's actual UGC hub database (or a reasonable facsimile) and inserts the data there directly, rather than producing actual RDF.
TL;DR, here's-one-I-made-earlier:
srb-lit-soch.ttl contains the RDF assertions relating runic inscriptions in FMIS to literature in Libris, as Turtle.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.