Skip to content

Latest commit

 

History

History
412 lines (275 loc) · 20.3 KB

readme.md

File metadata and controls

412 lines (275 loc) · 20.3 KB

Aristotle

An RDF/OWL library for Clojure, providing a data-oriented wrapper for Apache Jena.

Key features:

  • Read/write RDF graphs using idiomatic Clojure data structures.
  • SPARQL queries expressed using Clojure data structures.
  • Pluggable inferencing and reasoners.
  • Pluggable validators.

Rationale

RDF is a powerful framework for working with highly-annotated data in very abstract ways. Although it isn't perfect, it is highly researched, well defined and understood, and the industry standard for "rich" semi-structured, open-ended information modeling.

Most of the existing Clojure tools for RDF are focused mostly on creating and manipulating RDF graphs in pure Clojure at a low level. I desired a more comprehensive library with the specific objective of bridging existing idioms for working with Clojure data to RDF graphs.

Apache Jena is a very capable, well-designed library for working with RDF and the RDF ecosystem. It uses the Apache software license, which unlike many other RDF tools is compatible with Clojure's EPL. However, Jena's core APIs can only be described as agressively object-oriented. Since RDF is at its core highly data-oriented, and Clojure is also data-oriented, using an object-oriented or imperative API seems especially cumbersome. Aristotle attempts to preserve "good parts" of Jena, while replacing the cumbersome APIs with clean data-driven interfaces.

Aristotle does not provide direct access to other RDF frameworks (such as RDF4j, JSONLD, Commons RDF, OWL API, etc.) However, Jena itself is highly pluggable, so if you need to interact with one of these other systems it is highly probably that a Jena adapter already exists or can be easily created.

Index

Data Model

To express RDF data as Clojure, Aristotle provides two protocols. arachne.aristotle.graph/AsNode converts Clojure literals to RDF Nodes of the appropriate type, while arachne.aristotle.graph/AsTriples converts Clojure data structures to sets of RDF triples.

Literals

Clojure primitive values map to Jena Node objects of the appropriate type.

Clojure Type RDF Node
long XSD Long
double XSD Double
boolean XSD Boolean
java.math.BigDecimal XSD Decimal
java.util.Date XSD DateTime
java.util.Calendar XSD DateTime
string enclosed by angle brackets
(e.g, "<http://foo.com/#bar>")
IRI
other strings XSD String
keyword IRI (see explanation of IRI/keyword registry below)
java.net.URL IRI
java.net.URI IRI
symbols starting with ? variable node (for patterns or queries)
the symbol _ unique blank node
symbols starting with _ named blank node
other symbols IRI of the form <urn:clojure:namespace/name>.

IRI/Keyword Registry

Since IRIs are usually long strings, and tend to be used repeatedly, using the full string expression can be cumbersome. Furthermore, Clojure tends to prefer keywords to strings, especially for property/attribute names and enumerated or constant values.

Therefore, Aristotle provides a mechanism to associate a namespace with an IRI prefix. Keywords with a registered namespace will be converted to a corresponding IRI.

Use the arachne.aristotle.registry/prefix function to declare a prefix. For example,

(reg/prefix 'foaf "http://xmlns.com/foaf/0.1/")

Then, keywords with a :foaf namespace will be interpreted as IRI nodes. For example, with the above declaration :foaf/name will be interpreted as <http://xmlns.com/foaf/0.1/name>.

The following common namespace prefixes are defined by default:

Namespace IRI Prefix
rdf <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdfs <http://www.w3.org/2000/01/rdf-schema#>
xsd <http://www.w3.org/2001/XMLSchema#>
owl <http://www.w3.org/2002/07/owl#>
owl2 <http://www.w3.org/2006/12/owl2#>

The registry is stored in the global dynamic Var arachne.aristotle.registry/*registry*, which can be also overridden on a thread-local basis using the arachne.aristotle.registry/with macro, which takes a map of namespaces (as keywords) and IRI prefixes. For example:

(reg/with {'foaf "http://xmlns.com/foaf/0.1/"
           'dc "http://purl.org/dc/elements/1.1/"}
  ;; Code using keywords with :foaf and :dc namespaces
  )

You can also register a prefix in RDF/EDN data, using the #rdf/prefix tagged literal. The prefix will be added to the thread-local binding and is scoped to the same triple expansion. This allows you to define a prefix alongside the data that uses it, without installing it globally or managing it in your code. For example:

[#rdf/prefix [:ex "http://example.com/"]
 {:rdf/about :ex/luke
  :foaf/name "Luke"}]

Wildcard Prefixes

Aristotle now allows you to register a RDF IRI prefix for a namespace prefix, rather than a fully specified namespace. To do so, use an asterisk in the symbol you provide to the prefix function:

(reg/prefix 'arachne.* "http://arachne-framework.org/vocab/1.0/")

This means that keywords with a namespace that starts with an arachne namespace segment will use the supplied prefix. Any additional namespace segments will be appended to the prefix, separated by a forward slash (/).

Given the registration above, for example, the keyword :arachne.http.request/body would be interpreted as the IRI "http://arachne-framework.org/vocab/1.0/http/request/body"

If multiple wildcard prefixes overlap, the system will use whichever is more specific, and will prefer non-wildcard registrations to wildcard registrations in the case of ambiguity.

Using # or any other character as a prefix separator for wildcard prefixes, instead of /, is currently not supported.

Data Structures

You can use the arachne.aristotle.graph/triples function to convert any compatible Clojure data structure to a collection of RDF Triples (usually in practice it isn't necessary to call triples explicitly, as the higher-level APIs do it for you.)

Single Triple

A 3-element vector can be used to represent a single RDF Triple. For example:

(ns arachne.aristotle.example
  (:require [arachne.aristotle.registry :as reg]
            [arachne.aristotle.graph :as g]))

(reg/prefix 'arachne.aristotle.example "http://arachne-framework.org/example#")

(g/triples [::luke :foaf/firstName "Luke"])

The call to g/triples returns a collection containing a single Jena Triple with a subject of <http://arachne-framework.org/example#luke>, a predicate of <http://xmlns.com/foaf/0.1/firstName> an the string literal "Luke" as the object.

Collections of Triples

A collection of multiple triples works the same way.

For example,

(g/triples '[[luke :foaf/firstName "Luke"]
             [luke :foaf/knows nola]
             [nola :foaf/firstName "Nola"]]) 

Note the use of symbols; in this case, the nodes for both Luke and Nola are represented as blank nodes (without explicit IRIs.)

Maps

Maps may be used to represent multiple statements about a single subject, with each key indicating an RDF property. The subject of the map is indicated using the special :rdf/about key, which is not interpreted as a property, but rather as identifying the subject of the map. If no :rdf/about key is present, a blank node will be used as the subject.

For example:

(g/triples {:rdf/about ::luke
            :foaf/firstName "Luke"
            :foaf/lastName "VanderHart"})

This is equivalent to two triples:

<http://arachne-framework.org/example#luke> <http://xmlns.com/foaf/0.1/firstName> "Luke"
<http://arachne-framework.org/example#luke> <http://xmlns.com/foaf/0.1/lastName> "VanderHart"
Multiple Values

If the value for a key is a single literal, it is interpreted as a single triple. If the value is a collection, it is intererpreted as multiple values for the same property. For example:

(g/triples {:rdf/about ::luke
            :foaf/made [::arachne ::aristotle ::quiescent]})

Expands to:

<http://arachne-framework.org/example#luke> <http://xmlns.com/foaf/0.1/made> <http://arachne-framework.org/example#arachne>
<http://arachne-framework.org/example#luke> <http://xmlns.com/foaf/0.1/made> <http://arachne-framework.org/example#quiescent>
<http://arachne-framework.org/example#luke> <http://xmlns.com/foaf/0.1/made> <http://arachne-framework.org/example#aristotle>
Nested Maps

In addition to literals, the values of keys may be additional maps (or collections of maps). The subject of the nested map will be both the object of the property under which it is specified, and the subject if statements in its own map.

(g/triples {:rdf/about ::luke
            :foaf/knows [{:rdf/about ::nola
                          :foaf/name "Nola"
                          :foaf/knows ::luke}}
                         {:rdf/about ::Jim
                          :foaf/name "Jim"}}])

Expressed in expanded triples, this is:

<http://arachne-framework.org/example#luke> <http://xmlns.com/foaf/0.1/knows> <http://arachne-framework.org/example#nola>
<http://arachne-framework.org/example#nola> <http://xmlns.com/foaf/0.1/name> "Nola"
<http://arachne-framework.org/example#nola> <http://xmlns.com/foaf/0.1/knows> <http://arachne-framework.org/example#luke>
<http://arachne-framework.org/example#luke> <http://xmlns.com/foaf/0.1/knows> <http://arachne-framework.org/example#jim>
<http://arachne-framework.org/example#jim> <http://xmlns.com/foaf/0.1/name> "Jim"

API

Aristotle's primary API is exposed in its top-level namespace, arachne.aristotle, which defines functions to create and interact with graphs.

A graph is a collection of RDF data, together with (optionally) logic and/or inferencing engines. Graphs may be stored in memory or be a facade to an external RDF database (although all the graph constructors shipped with Aristotle are for in-memory graphs.)

Graphs are instances of org.apache.jena.graph.Graph, which are stateful mutable objects (mutability is too deeply ingrained in Jena to provide an immutable facade.) However, Aristotle's APIs are consistent in returning the model from any update operations, as if graphs were immutable Clojure-style collections. It is reccomended to rely on the return value of update operations, as if graphs were immutable, so your code does not break if immutable graph representations are ever supported.

Jena Graphs are not thread-safe by default; make sure you limit concurrent graph access.

Creating a Graph

To create a new graph, invoke the arachne.aristotle/graph multimethod. The first argument to graph is a keyword specifying the type of graph to construct, additional arguments vary depending on the type of graph.

Graph constructors provided by Aristotle include:

type model
:simple Basic in-memory triple store with no inferencing capability
:jena-mini In-memory triple store that performs OWL 1 inferencing using Jena's "Mini" inferencer (a subset of OWL Full with restrictions on some of the less useful forward entailments.)
:jena-rules In-memory triple store supporting custom rules, using Jena's hybrid backward/forward rules engine. Takes a collection of org.apache.jena.reasoner.rulesys.Rule objects as an additional argument (the prebuilt collection of rules for Jena Mini is provided at arachne.aristotle.inference/mini-rules)

Clients may wish to provide additional implementations of the graph multimethod to support additional underlying graphy or inference types; the only requirement is that the method return an instance of org.apache.jena.rdf.graph.Graph. For example, for your project, you may wish to create a Graph backed by on-disk or database storag, or which uses the more powerful Pellet reasoner, which has Jena integration but is not shipped with Aristotle due to license restrictions.

Example:

(require '[arachne.aristotle :as aa])
(def m (aa/graph :jena-mini))

Adding data to a graph

In order to do anything useful with a graph, you must add additional facts. Facts may be added either programatically in your code, or by reading serialized data from a file or remote URL.

Adding data programatically

To add data programatically, use the arachne.aristotle/add function, which takes a graph and some data to add. The data is processed into RDF triples using arachne.aristotle.graph/triples, using the data format documented above. For example:

(require '[arachne.aristotle :as aa])

(def g (aa/graph :jena-mini))

(aa/add g {:rdf/about ::luke
           :foaf/firstName "Luke"
           :foaf/lastName "VanderHart"})

Adding data from a file

To add data from a file, use the arachne.aristotle/read function, which takes a graph and a file. The file may be specified by a:

  • String of the absolute or process-relative filename
  • java.net.URI
  • java.net.URL
  • java.io.File

Jena will detect what format the file is in, which may be one of RDF/XML, Turtle, N3, or N-Triples. All of the statements it contains will be added to the graph. Example:

Query

Aristotle provides a data-oriented interface to Jena's SPARQL query engine. Queries themselves are expressed as Clojure data, and can be programatically generated and combined (similar to queries in Datomic.)

To invoke a query, use the arachne.aristotle.query/query function, which takes a query data structure, a graph, and any query inputs. It returns the results of the query.

SPARQL itself is string oriented, with a heavily lexical grammar that does not translate cleanly to data structures. However, SPARQL has an internal algebra that is very clean and composable. Aristotle's query data uses this internal SPARQL alegebra (which is exposed by Jena's ARQ data graph) ignoring SPARQL syntax. All queries expressible in SPARQL syntax are also expressible in Aristotle's query data, modulo some features that are not implemented yet (e.g, query fedration across remote data sources.)

Unfortunately, the SPARQL algebra has no well documented syntax. A rough overview is available, and this readme will document some of the more common forms. For more details, see the query specs with their associated docstrings.

Aristotle queries are expressed as compositions of algebraic operations, using the generalized form [operation expression* sub-operation*] These operation vectors may be nested arbitrarily.

Expressions are specified using a Clojure list form, with the expression type as a symbol. These expressions take the general form (expr-type arg*).

Running Queries

To run a query, use the arachne.aristotle.query/run function. This function takes a graph, an (optional) binding vector, a query, and (optionally) a map of variable bindings which serve as query inputs.

If a binding vector is given, results will be returned as a set of tuples, one for each unique binding of the variables in the binding vector.

If no binding vector is supplied, results will be returned as a sequence of query solutions, with each solution represented as a map of the variables it binds. In this case, solutions may not be unique (unless the query specifically inclues a :distinct operation.)

Some examples follow:

Sample: simple query

(require '[arachne.aristotle.query :as q])

(q/run my-graph '[:bgp [:example/luke :foaf/knows ?person]
                       [?person :foaf/name ?name]])

This query is a single pattern match (using a "basic graph pattern" or "bgp"), binding the :foaf/name property of each entity that is the subject of :foaf/knows for an entity identified by :example/luke.

An example of the results that might be returned by this query is:

({?person <http://example.com/person#james> ?name "Jim"},
 {?person <http://example.com/person#sara> ?name "Sara"},
 {?person <http://example.com/person#jules> ?name "Jules"})

Sample: simple query with result binding

This is the same query, but using a binding vector

(q/run my-graph '[?name]
       '[:bgp [:example/luke :foaf/knows ?person]
              [?person :foaf/name ?name]])

In this case, results would look like:

#{["Jim"]
  ["Sara"]
  ["Jules"]}

Sample: query with filtering expression

This example expands on the previous query, using a :filter operation with an expression to only return acquaintances above the age of 18:

(q/run my-graph '[?name]
       '[:filter (< 18 ?age)
         '[:bgp [:example/luke :foaf/knows ?person]
                [?person :foaf/name ?name]
                [?person :foaf/age ?age]]])

Sample: providing inputs

This example is the same as those above, except instead of hardcoding the base individual as :example/luke, the starting individual is bound in a separate binding map provided to q/run.

(q/run my-graph '[?name]
        [:bgp [?individual :foaf/knows ?person]
              [?person :foaf/name ?name]]
  '{?individual :example/luke})

It is also possible to bind multiple possibilities for the value of ?individual:

(q/run my-graph '[?name]
        [:bgp [?individual :foaf/knows ?person]
              [?person :foaf/name ?name]]
  '{?individual #{:example/luke
                  :example/carin
                  :example/dan}})

This will find the names of all persons who are known by Luke, Carin OR Dan.

Precompiled Queries

Queries can also be precompiled into a Jena Operation object, meaning they do not need to be parsed, interpreted, and optimized again every time they are invoked. To precompile a query, use the arachne.aristotle.query/build function:

(def friends-q (q/build '[:bgp [?individual :foaf/knows ?person]
                               [?person :foaf/name ?name]]))

You can then use the precompiled query object (bound in this case to friends-q in calls to arachne.aristotle.query/run:

(q/run my-graph friends-q '{?individual :example/luke})

The results will be exactly the same as using the inline version.

Validation

One common use case is to take a given Graph and "validate" it, ensuring its internal consistency (including whether entities in it conform to any OWL or RDFS schema that is present.)

To do this, run the arachne.aristotle.validation/validate function. Passed only a graph, it will return any errors returned by the Jena Reasoner that was used when constructing the graph. The :simple reasoner will never return any errors, the :jena-mini reasoner will return OWL inconsistencies, etc.

If validation is successfull, the validator will return nil or an empty list. If there were any errors, each error will be returned as a map containing details about the specific error type.

Closed-World validation

The built-in reasoners use the standard open-world assumption of RDF and OWL. This means that many scenarios that would intuitively be "invalid" to a human (such as a missing min-cardinality attribute) will not be identified, because the reasoner alwas operates under the assumption that it doesn't yet know all the facts.

However, for certain use cases, it can be desirable to assert that yes, the graph actually does contain all pertinent facts, and that we want to make some assertions based on what the graph actually knows at a given moment, never mind what facts may be added in the future.

To do this, you can pass additional validator functions to validate, providing a sequence of optional validators as a second argument.

Each of these validator functions takes a graph as its argument, and returns a sequence of validation error maps. An empty sequence implies that the graph is valid.

The "min-cardinality" situation mentioned above has a built in validator, arachne.aristotle.validators/min-cardinality. It works by running a SPARQL query on the provided graph that detects if any min-cardinality attributes are missing from entities known to be of an OWL class where they are supposed to be present.

To use it, just provide it in the list of custom validators passed to validate:

(v/validate m [v/min-cardinality])

This will return the set not only of built in OWL validation errors, but also any min-cardinality violations that are discovered.

Of course, you can provide any additional validator functions as well.