RDFminer is an open web application for automatically discovering SHACL shapes that are representative of an RDF data graph. It is composed of the following components:
RDFminer-core
an API to exploit the evolutionary algorithm implemented in Java. The server relies on an implementation of a RESTful web service using the JAX-RS framework.
RDFminer-front
a VueJS application to control the mining process interactively: it lets the user parameterize and launch the discovery process, monitor its execution, and inspect and analyze its results.
RDFminer-server
a server that provides web services: interaction between front and core, db, ... It relies on an Express server for web services and a socket.io server for WebSocket transport.
RDFminer builds candidate shapes from a BNF grammar written by the user (in the Creation form), which must be compliant with the SHACL W3C recommendation. Grammars are composed of static and dynamic rules.
Consider the following grammar:
Shape := " a " NodeShape
NodeShape := "sh:NodeShape ; " ShapeBody
ShapeBody := "sh:targetClass " Class " ; " ShapeProperty
ShapeProperty := "sh:property [ " PropertyBody " ] . "
PropertyBody := "sh:path rdf:type ; sh:hasValue " Class " ; "
Class := "SPARQL ?x a ?Class ."
Here, the rule Class is dynamic: it includes the SPARQL keyword, which mines all classes from the named RDF data graph. Every class found is associated with this rule:
// It becomes:
Class := "<Class_1>" | "<Class_2>" | ... | "<Class_n>"
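The expansion of a dynamic rule can be pictured with a short Python sketch. It is only an illustration, not RDFminer's implementation: the function names, the example IRIs, and the assumption that the pattern after the SPARQL keyword is wrapped in a `SELECT DISTINCT` query over the rule's own variable are all hypothetical.

```python
def parse_dynamic_rule(line):
    """Split a rule such as: Class := "SPARQL ?x a ?Class ." into name and pattern."""
    name, body = (part.strip() for part in line.split(":=", 1))
    body = body.strip('"')
    if not body.startswith("SPARQL "):
        raise ValueError("not a dynamic rule: " + line)
    return name, body[len("SPARQL "):]


def to_select_query(name, pattern):
    """Assumed query shape: select the distinct bindings of the variable named after the rule."""
    return f"SELECT DISTINCT ?{name} WHERE {{ {pattern} }}"


def expand_dynamic_rule(name, iris):
    """Rewrite the dynamic rule as a static rule with one alternative per IRI."""
    alternatives = " | ".join(f'"<{iri}>"' for iri in iris)
    return f"{name} := {alternatives}"


name, pattern = parse_dynamic_rule('Class := "SPARQL ?x a ?Class ."')
query = to_select_query(name, pattern)

# Hypothetical query results; a real run would obtain them from the SPARQL endpoint.
iris = ["http://example.org/Article", "http://example.org/Person"]
print(query)
print(expand_dynamic_rule(name, iris))
```

Keeping the query construction separate from the rewriting step makes it possible to try the expansion on a fixed list of IRIs without contacting an endpoint.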
Consequently, the rules ShapeBody and PropertyBody will vary from one candidate to the next:
// candidate shape 1
"a sh:NodeShape ; sh:targetClass <Class_21> ; sh:property [ sh:path rdf:type ; sh:hasValue <Class_4> ] ."
// candidate shape 2
"a sh:NodeShape ; sh:targetClass <Class_641> ; sh:property [ sh:path rdf:type ; sh:hasValue <Class_89> ] ."
// ...
As only two elements (the two Class values) vary between candidates, the ChromosomeSize value must be 2.
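To see why two values are enough for this grammar, here is a minimal Python sketch of a grammatical-evolution style mapping. It assumes the common scheme in which a rule with several alternatives consumes one codon and picks the alternative `codon % number_of_alternatives`; whether RDFminer-core maps chromosomes exactly this way is not stated here, and the three `<Class_i>` alternatives are placeholders.

```python
# Simple grammar from above; Class is shown after expansion, with placeholder values.
GRAMMAR = {
    "Shape": [[" a ", "NodeShape"]],
    "NodeShape": [["sh:NodeShape ; ", "ShapeBody"]],
    "ShapeBody": [["sh:targetClass ", "Class", " ; ", "ShapeProperty"]],
    "ShapeProperty": [["sh:property [ ", "PropertyBody", " ] . "]],
    "PropertyBody": [["sh:path rdf:type ; sh:hasValue ", "Class", " ; "]],
    "Class": [["<Class_1>"], ["<Class_2>"], ["<Class_3>"]],
}


def derive(start, codons, grammar=GRAMMAR):
    """Expand `start` left to right; return the derived text and the number of codons used."""
    used = 0

    def expand(symbol):
        nonlocal used
        if symbol not in grammar:  # terminal: emit as is
            return symbol
        alternatives = grammar[symbol]
        if len(alternatives) == 1:  # no choice, no codon consumed
            chosen = alternatives[0]
        else:  # one codon per choice, wrapping around the chromosome if needed
            codon = codons[used % len(codons)]
            chosen = alternatives[codon % len(alternatives)]
            used += 1
        return "".join(expand(s) for s in chosen)

    return expand(start), used


shape, used = derive("Shape", [4, 2])
print(shape)
print("codons used:", used)
```

Only the two occurrences of Class involve a choice, so the derivation consumes exactly two codons under this scheme.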
More complex grammars can be designed to build richer shapes:
Shape := " a " NodeShape
NodeShape := "sh:NodeShape ; " ShapeBody
ShapeBody := ClassTarget | SubjectsOfTarget | ObjectsOfTarget
ClassTarget := "sh:targetClass " Class " ; " ShapeProperty
SubjectsOfTarget := "sh:targetSubjectsOf " Property " ; " ShapeProperty
ObjectsOfTarget := "sh:targetObjectsOf " Property " ; " ValueTypeConstraintComponent " . "
ShapeProperty := "sh:property [ " PropertyBody " ] . "
PropertyBody := "sh:path " Property " ; " ValueTypeConstraintComponent " ; "
ValueTypeConstraintComponent := ClassConstraint | DatatypeConstraint | NodeKindConstraint
ClassConstraint := "sh:class " Class
DatatypeConstraint := "sh:datatype " DataType
NodeKindConstraint := "sh:nodeKind " NodeKind
NodeKind := "sh:BlankNode" | "sh:IRI" | "sh:Literal" | "sh:BlankNodeOrIRI" | "sh:BlankNodeOrLiteral" | "sh:IRIOrLiteral"
Class := "SPARQL ?x a ?Class ."
Property := "SPARQL ?subj ?Property ?obj . FILTER ( isIRI(?Property) ) ."
DataType := "SPARQL { SELECT distinct ?o WHERE { ?s ?p ?o . FILTER ( isLiteral(?o) ) } } BIND( datatype(?o) as ?DataType ) ."
Given the complexity of this grammar, we suggest experimenting with different values for ChromosomeSize, e.g. 20.
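As a rough aid when choosing a value, the following Python sketch counts how many choice points a derivation of the grammar above can go through. It assumes one codon per rule with several alternatives and treats each dynamic rule (Class, Property, DataType) as a single choice among its mined values; both are assumptions about the mapping, not documented RDFminer behaviour. Terminals are omitted because they do not involve a choice.

```python
# Placeholder alternatives standing in for the values mined by a dynamic rule.
DYNAMIC = [["<value_1>"], ["<value_2>"]]

GRAMMAR = {
    "Shape": [["NodeShape"]],
    "NodeShape": [["ShapeBody"]],
    "ShapeBody": [["ClassTarget"], ["SubjectsOfTarget"], ["ObjectsOfTarget"]],
    "ClassTarget": [["Class", "ShapeProperty"]],
    "SubjectsOfTarget": [["Property", "ShapeProperty"]],
    "ObjectsOfTarget": [["Property", "ValueTypeConstraintComponent"]],
    "ShapeProperty": [["PropertyBody"]],
    "PropertyBody": [["Property", "ValueTypeConstraintComponent"]],
    "ValueTypeConstraintComponent": [
        ["ClassConstraint"], ["DatatypeConstraint"], ["NodeKindConstraint"],
    ],
    "ClassConstraint": [["Class"]],
    "DatatypeConstraint": [["DataType"]],
    "NodeKindConstraint": [["NodeKind"]],
    "NodeKind": [[kind] for kind in (
        "sh:BlankNode", "sh:IRI", "sh:Literal",
        "sh:BlankNodeOrIRI", "sh:BlankNodeOrLiteral", "sh:IRIOrLiteral",
    )],
    "Class": DYNAMIC,
    "Property": DYNAMIC,
    "DataType": DYNAMIC,
}


def choice_counts(symbol, grammar=GRAMMAR):
    """Return the set of possible numbers of choice points when deriving `symbol`."""
    if symbol not in grammar:  # terminal
        return {0}
    alternatives = grammar[symbol]
    cost = 1 if len(alternatives) > 1 else 0
    totals = set()
    for alternative in alternatives:
        sums = {0}
        for part in alternative:
            sums = {a + b for a in sums for b in choice_counts(part, grammar)}
        totals |= {cost + s for s in sums}
    return totals


print(sorted(choice_counts("Shape")))
```

The count only describes how many choices one derivation makes under the stated assumptions; it does not replace experimenting with different ChromosomeSize values.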
SPARQL endpoint: http://ns.inria.fr/rdfminer/sparql
The corese-server, used by RDFminer-core, provides distinct named RDF data graphs:
It is an RDF dataset produced from the covid-on-the-web dataset. It describes scientific articles and named entities identified in these articles and linked to Wikidata entities. We consider a subset containing 18.79% of the articles and 0.01% of the named entities.
#RDF triples | #distinct articles | #distinct named entities |
---|---|---|
266,647 | 20,912 | 6,331 |
1% of the full instance of DBPedia 2015.04 (English version) > 6,534,658 RDF triples