RDFization Guide
The linked data that forms part of Bio2RDF adheres to a simple set of modeling patterns that permit the different datasets to interoperate seamlessly.
The first step of the RDFization process involves using a consistent identifier scheme so that we can syntactically integrate data across the Bio2RDF network. Bio2RDF identifiers are given by the following URI pattern:
<code>http://bio2rdf.org/''namespace'':''identifier''</code>
where the namespace is a short name listed in our dataset registry that uniquely identifies the source (dataset/database). The identifier is the (alpha)numeric string assigned to identify that entity. For instance, the gene identified by the number 15275 in the NCBI EntrezGene Database (namespace = geneid) has the following identifier:
<code>http://bio2rdf.org/''geneid'':''15275''</code>
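The identifier pattern above can be sketched as a small helper. This is only an illustration and not part of any Bio2RDF tooling: the function name `bio2rdf_uri` and the decision to reject empty values are assumptions made for the example.

```python
BASE = "http://bio2rdf.org/"


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build a URI of the form http://bio2rdf.org/namespace:identifier."""
    namespace = namespace.strip()
    identifier = identifier.strip()
    # Assumption for this sketch: both parts are required.
    if not namespace or not identifier:
        raise ValueError("namespace and identifier must both be non-empty")
    return f"{BASE}{namespace}:{identifier}"


print(bio2rdf_uri("geneid", "15275"))
```

Calling the helper with the namespace `geneid` and the identifier `15275` reproduces the Entrez Gene URI shown above.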
The Bio2RDF URI scheme applies not only to data entries, but also to the vocabulary (types and relations) used to describe these entries:
<code>http://bio2rdf.org/''namespace''_term:''term''</code>
For example, the gene identified by geneid:15275 is a kind of Gene, as defined by Entrez Gene.
<code>http://bio2rdf.org/''geneid''_term:''Gene''</code>
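Because entry and vocabulary URIs share one shape, a consumer can split either kind back into its parts. The following is a minimal sketch under stated assumptions: the function name `parse_bio2rdf_uri` is hypothetical, and the code treats any namespace ending in `_term` as a vocabulary namespace, which would misclassify a source whose registered namespace really ends that way.

```python
from typing import Tuple

PREFIX = "http://bio2rdf.org/"
TERM_SUFFIX = "_term"


def parse_bio2rdf_uri(uri: str) -> Tuple[str, str, bool]:
    """Split a Bio2RDF URI into (namespace, local part, is_vocabulary_term)."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a Bio2RDF URI: {uri}")
    # The first colon after the prefix separates namespace from identifier.
    namespace, sep, local = uri[len(PREFIX):].partition(":")
    if not sep or not namespace or not local:
        raise ValueError(f"expected namespace:identifier in: {uri}")
    is_term = namespace.endswith(TERM_SUFFIX)
    if is_term:
        namespace = namespace[: -len(TERM_SUFFIX)]
    return namespace, local, is_term


print(parse_bio2rdf_uri("http://bio2rdf.org/geneid:15275"))
print(parse_bio2rdf_uri("http://bio2rdf.org/geneid_term:Gene"))
```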
Each resource should contain the following annotations:
<code>http://purl.org/dc/terms/title</code> a human-readable title as it appears in the source data.
<code>http://purl.org/dc/terms/identifier</code> a string that contains the identifier using the following pattern <namespace>:<identifier>
<code>rdfs:label</code> a Bio2RDF-generated label containing the title followed by the identifier, in the form "title [ns:id]". By convention, most RDF browsers use this label to render the name of a resource instead of its URI.
Taken together, the annotations for this gene are:
<code> geneid:15275 rdfs:label "Hk1 [geneid:15275]" ; dc:title "Hk1" ; dc:identifier "geneid:15275" ; rdf:type geneid_term:Gene . </code>
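The same annotations can be generated programmatically. The sketch below builds the Turtle-style statements as plain strings; the function names `turtle_literal` and `core_annotations` are hypothetical, prefix declarations are omitted, and the escaping only handles backslashes and double quotes, so titles containing line breaks would need more care.

```python
def turtle_literal(text: str) -> str:
    """Quote a string as a simple literal (escapes backslash and quote only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def core_annotations(namespace: str, identifier: str, title: str, type_term: str) -> str:
    """Return the label, title, identifier and type statements for one resource."""
    curie = f"{namespace}:{identifier}"
    # Label convention from the guide: "title [ns:id]".
    label = f"{title} [{curie}]"
    lines = [
        f"{curie} rdfs:label {turtle_literal(label)} ;",
        f"    dc:title {turtle_literal(title)} ;",
        f"    dc:identifier {turtle_literal(curie)} ;",
        f"    rdf:type {namespace}_term:{type_term} .",
    ]
    return "\n".join(lines)


print(core_annotations("geneid", "15275", "Hk1", "Gene"))
```

Deriving the label from the title and identifier, rather than storing it separately, keeps the three annotations consistent with one another.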
We recognize at least three kinds of entities in biological information resources: physical entities, records, and datasets.
1. Record
Records are information objects that contain a set of statements, primarily about the subject.
<code> namespace_record:identifier bio2rdf_term:has-primary-subject namespace:identifier . </code>
<code> namespace:identifier bio2rdf_term:is-described-by namespace_record:identifier . </code>
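The two statements above are inverses of each other, so both can be derived from a single namespace and identifier. A minimal sketch follows, with triples represented as plain tuples of prefixed names; the function name `record_links` is an assumption made for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def record_links(namespace: str, identifier: str) -> List[Triple]:
    """Return the record-to-subject link and its inverse."""
    subject = f"{namespace}:{identifier}"
    record = f"{namespace}_record:{identifier}"
    return [
        (record, "bio2rdf_term:has-primary-subject", subject),
        (subject, "bio2rdf_term:is-described-by", record),
    ]


for s, p, o in record_links("geneid", "15275"):
    print(f"{s} {p} {o} .")
```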
2. Dataset
Datasets are collections of records.
<code> bio2rdf_dataset:<namespace> bio2rdf_term:has-item namespace_record:identifier . </code>
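Since datasets can be versioned, we describe each version as its own resource that records the version number and links back to the unversioned dataset: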
<code> bio2rdf_dataset:namespace.version dc:hasVersion "13" ; dc:partOf bio2rdf_dataset:namespace . </code>
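The dataset statements can be assembled in the same way. This sketch rests on assumptions: it reads `namespace.version` as the namespace and version number joined by a dot (for example `geneid.13`), it copies the property names verbatim from the snippets above, and the function name `dataset_triples` is hypothetical.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def dataset_triples(namespace: str, version: str, record_ids: Iterable[str]) -> List[Triple]:
    """Describe a versioned dataset and the records held by its dataset."""
    dataset = f"bio2rdf_dataset:{namespace}"
    # Assumption: the versioned resource is named namespace.version.
    versioned = f"{dataset}.{version}"
    triples: List[Triple] = [
        (versioned, "dc:hasVersion", f'"{version}"'),
        (versioned, "dc:partOf", dataset),
    ]
    for record_id in record_ids:
        triples.append(
            (dataset, "bio2rdf_term:has-item", f"{namespace}_record:{record_id}")
        )
    return triples


for s, p, o in dataset_triples("geneid", "13", ["15275"]):
    print(f"{s} {p} {o} .")
```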
This section describes how to create mappings from your dataset-specific vocabulary to the Semanticscience Integrated Ontology (SIO).