issa-pipeline/pipeline/morph-xr2rml/xR2RML at main · issa-project/issa-pipeline


Files in this folder:

README.md
run-transformation.sh
xr2rml_annif_descriptors.tpl.ttl
xr2rml_document_descriptors.tpl.ttl
xr2rml_document_domains.tpl.ttl
xr2rml_document_keywords.tpl.ttl
xr2rml_document_metadata.tpl.ttl
xr2rml_document_text.tpl.ttl
xr2rml_entityfishing_annot.tpl.ttl
xr2rml_geonames_annot.tpl.ttl
xr2rml_pyclinrec_annot.tpl.ttl
xr2rml_rao_stirling_index.tpl.ttl
xr2rml_spotlight_annot.tpl.ttl

README.md

The data collected in the metadata, indexing and annotation steps is generally represented in JSON. To transform it into RDF we use the Morph-xR2RML tool, which takes the following inputs:

a database query as the data input;
mapping templates written in xR2RML (an extension of the W3C R2RML mapping language) as the transformation instructions.

In ISSA, we chose MongoDB as the intermediate, queryable database storage because it is the most suitable for importing JSON; see mongo for details.

This folder provides the scripts, configuration and mapping files needed for the transformation to RDF:

The script run-transformation.sh is the main entry point. Comment or uncomment its lines as needed.
The Java configuration file xR2RML.properties contains the database.name parameter, which defines the name of the database to read for the transformation.

👉 Since each update of the ISSA dataset is stored in a separate database, the database.name parameter of xR2RML.properties is automatically set to the latest update defined in env.sh.
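The automatic assignment of database.name can be pictured with a small Python sketch. It assumes the properties file uses plain key=value lines; the helper set_property and the database names issa_v1 and issa_v2 are hypothetical, and the variable env.sh uses for the latest update is not shown here, so the pipeline's own mechanism may differ.

```python
import re


def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties_text with `key` set to `value` (hypothetical helper).

    Handles simple `key=value` / `key = value` lines only; the escaping and
    line-continuation rules of the full Java .properties format are ignored.
    """
    # Match the key at the start of a line; [ \t]* (not \s*) keeps the match on one line.
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE)
    if pattern.search(properties_text):
        # A function replacement avoids treating backslashes in `value` as group references.
        return pattern.sub(lambda m: m.group(1) + value, properties_text)
    # Key absent: append it on a new line.
    sep = "" if not properties_text or properties_text.endswith("\n") else "\n"
    return f"{properties_text}{sep}{key}={value}\n"


# Hypothetical excerpt of xR2RML.properties and a hypothetical latest-update name.
props = "# other settings ...\ndatabase.name=issa_v1\n"
print(set_property(props, "database.name", "issa_v2"))
```

Only the value after the equals sign is rewritten, so the rest of the configuration file is left untouched.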

The Turtle files *.tpl.ttl map the fields of a database collection to the subjects and objects of RDF triples, building the desired knowledge graph. For simplicity, each template file corresponds to one collection in the database.

Each template file contains placeholders that serve as "parameters":

{{dataset}} - RDF dataset name for provenance
{{collection}} - MongoDB database collection name
{{namespace}} - namespace specific to the ISSA instance, e.g. http://data-issa.cirad.fr

To reduce the size of the RDF annotation files, the *annot.tpl.ttl mappings are applied per article part (title, abstract, body text) and take an extra parameter:

{{documentpart}} - article part name (title, abstract, body_text)
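Filling in these placeholders can be sketched in Python as follows. This is an illustration only: the template fragment, the helpers fill_template and fill_per_part, and the parameter values are hypothetical, and run-transformation.sh may perform the substitution differently. The sketch instantiates an annotation template once per article part, matching the per-part mappings described above.

```python
import re

# Matches {{name}} placeholders such as {{dataset}} or {{documentpart}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

# Article parts handled by the *annot.tpl.ttl mappings.
DOCUMENT_PARTS = ("title", "abstract", "body_text")


def fill_template(template: str, params: dict[str, str]) -> str:
    """Replace every {{name}} with params[name]; raise KeyError if a value is missing."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value given for placeholder '{name}'")
        return params[name]
    return PLACEHOLDER.sub(substitute, template)


def fill_per_part(template: str, params: dict[str, str]) -> dict[str, str]:
    """Instantiate an annotation template once per article part."""
    return {part: fill_template(template, {**params, "documentpart": part})
            for part in DOCUMENT_PARTS}


# Hypothetical one-line template fragment; the real *.tpl.ttl files are longer.
TEMPLATE = 'xrr:query """db.{{collection}}.find( { paper_id: { $exists: true } } )""". # part: {{documentpart}}'

params = {"dataset": "issa-dataset", "collection": "annotations", "namespace": "http://data-issa.cirad.fr"}
for part, mapping in fill_per_part(TEMPLATE, params).items():
    print(part, "->", mapping)
```

Raising an error on a missing value, rather than leaving the placeholder in place, keeps a half-filled template from silently producing an invalid mapping.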

Example of an xR2RML transformation:

Input JSON

{ "paper_id" : "123456", 
  "title" :"The irreversible momentum of clean energy", 
  "authors": ["Obama, Barack"]  }

xR2RML mapping

@prefix rr:  <http://www.w3.org/ns/r2rml#>.
@prefix xrr: <http://i3s.unice.fr/xr2rml#>.
@prefix dct: <http://purl.org/dc/terms/>.

<#LS>
   a xrr:LogicalSource;
   xrr:query """db.metadata.find( { paper_id: { $exists: true } } )""".

<#TM>
   a rr:TriplesMap;
   xrr:logicalSource <#LS>;
   rr:subjectMap [
       rr:template "http://example.org/article/{$.paper_id}" ];
   rr:predicateObjectMap [
       rr:predicate dct:title;
       rr:objectMap [ xrr:reference "$.title" ] ];
   rr:predicateObjectMap [
       rr:predicate dct:creator;
       rr:objectMap [ xrr:reference "$.authors.*" ] ].

Output RDF

<http://example.org/article/123456> dct:title "The irreversible momentum of clean energy".
<http://example.org/article/123456> dct:creator "Obama, Barack".
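To make the semantics of the example mapping concrete, here is a hand-written Python equivalent. It is not Morph-xR2RML and does not read the mapping file; the functions map_documents and to_ntriples are hypothetical, and dct is assumed to be the Dublin Core terms namespace http://purl.org/dc/terms/. It reproduces the three parts of the mapping: the logical source keeps only documents that have a paper_id, the subject map fills the IRI template, and $.authors.* yields one creator triple per array element.

```python
import json

DCT = "http://purl.org/dc/terms/"


def map_documents(docs: list[dict]) -> list[tuple[str, str, str]]:
    """Hand-written equivalent of the example mapping (not Morph-xR2RML itself)."""
    triples = []
    for doc in docs:
        # Logical source: db.metadata.find( { paper_id: { $exists: true } } )
        if "paper_id" not in doc:
            continue
        # Subject map: rr:template "http://example.org/article/{$.paper_id}"
        subject = f"http://example.org/article/{doc['paper_id']}"
        # Object map xrr:reference "$.title": one literal
        if "title" in doc:
            triples.append((subject, DCT + "title", doc["title"]))
        # Object map xrr:reference "$.authors.*": one triple per array element
        for author in doc.get("authors", []):
            triples.append((subject, DCT + "creator", author))
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize (subject IRI, predicate IRI, string literal) triples as N-Triples."""
    # json.dumps adds the double quotes and backslash-escapes the string.
    return "\n".join(f"<{s}> <{p}> {json.dumps(o, ensure_ascii=False)} ." for s, p, o in triples)


input_json = '''{ "paper_id" : "123456",
  "title" :"The irreversible momentum of clean energy",
  "authors": ["Obama, Barack"]  }'''
print(to_ntriples(map_documents([json.loads(input_json)])))
# <http://example.org/article/123456> <http://purl.org/dc/terms/title> "The irreversible momentum of clean energy" .
# <http://example.org/article/123456> <http://purl.org/dc/terms/creator> "Obama, Barack" .
```

Unlike a real R2RML processor, the sketch does not percent-encode template values that are unsafe in IRIs; for the paper_id "123456" this makes no difference.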
