RDFization Guide
The linked data that forms part of Bio2RDF adheres to a simple set of modeling patterns that allow the different datasets to interoperate seamlessly.
==Identifiers==
=== Entities ===
The first step of the RDFization process involves using a consistent identifier scheme so that we can syntactically integrate data across the Bio2RDF network. Bio2RDF identifiers are given by the following URI pattern:
http://bio2rdf.org/''namespace'':''identifier''
where the ''namespace'' is a short name listed in our [http://www.freebase.com/view/base/bio2rdf/bm/-/base/bio2rdf dataset registry] that uniquely identifies the source (dataset or database), and the ''identifier'' is the (alpha)numeric string that the source assigns to the entity. For instance, the gene numbered 15275 in the NCBI Entrez Gene database (namespace geneid) has the following URI:
http://bio2rdf.org/geneid:15275
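Because the URI pattern is a fixed base followed by namespace:identifier, it can be built and taken apart with plain string operations. The following is a minimal sketch, not Bio2RDF's own code: the helper names entity_uri and split_entity_uri are hypothetical, and it assumes a namespace never contains a colon, so splitting on the first colon recovers both parts even when an identifier contains one.

```python
# Hypothetical helpers (not part of any Bio2RDF library) for the
# http://bio2rdf.org/namespace:identifier pattern.
BIO2RDF_BASE = "http://bio2rdf.org/"


def entity_uri(namespace: str, identifier: str) -> str:
    """Build a Bio2RDF entity URI from a namespace and a source identifier."""
    # Assumption: namespaces are short names without colons.
    if not namespace or ":" in namespace:
        raise ValueError(f"invalid namespace: {namespace!r}")
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


def split_entity_uri(uri: str) -> tuple[str, str]:
    """Recover (namespace, identifier) from a Bio2RDF entity URI."""
    if not uri.startswith(BIO2RDF_BASE):
        raise ValueError(f"not a Bio2RDF URI: {uri!r}")
    local = uri[len(BIO2RDF_BASE):]
    # Split on the first colon only, so a colon inside the identifier is kept.
    namespace, sep, identifier = local.partition(":")
    if not sep:
        raise ValueError(f"missing namespace separator: {uri!r}")
    return namespace, identifier


print(entity_uri("geneid", "15275"))  # http://bio2rdf.org/geneid:15275
print(split_entity_uri("http://bio2rdf.org/geneid:15275"))  # ('geneid', '15275')
```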
===Vocabulary===
The Bio2RDF URI scheme applies not only to data entries but also to the vocabulary (types and relations) used to describe them:
http://bio2rdf.org/''namespace''_term:''term''
For example, the gene identified by geneid:15275 is a kind of Gene, as defined by Entrez Gene.
http://bio2rdf.org/geneid_term:Gene
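Since entity and vocabulary URIs share the same base, only the _term suffix on the prefix tells them apart. This hypothetical sketch builds vocabulary URIs and classifies a Bio2RDF URI by its prefix; the function names are illustrative, and it assumes that no dataset namespace itself ends in _term.

```python
# Hypothetical helpers for the http://bio2rdf.org/namespace_term:term pattern.
BIO2RDF_BASE = "http://bio2rdf.org/"
TERM_SUFFIX = "_term"


def term_uri(namespace: str, term: str) -> str:
    """Build a vocabulary (type or relation) URI: namespace_term:term."""
    return f"{BIO2RDF_BASE}{namespace}{TERM_SUFFIX}:{term}"


def uri_kind(uri: str) -> str:
    """Classify a Bio2RDF URI as 'vocabulary' or 'entity' from its prefix."""
    if not uri.startswith(BIO2RDF_BASE):
        raise ValueError(f"not a Bio2RDF URI: {uri!r}")
    # The prefix is everything between the base and the first colon.
    prefix = uri[len(BIO2RDF_BASE):].partition(":")[0]
    # Assumption: only vocabulary prefixes end in "_term".
    return "vocabulary" if prefix.endswith(TERM_SUFFIX) else "entity"


print(term_uri("geneid", "Gene"))  # http://bio2rdf.org/geneid_term:Gene
print(uri_kind("http://bio2rdf.org/geneid_term:Gene"))  # vocabulary
print(uri_kind("http://bio2rdf.org/geneid:15275"))  # entity
```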
==Descriptions==
===Minimum Annotations===
Each resource should contain the following annotations:
http://purl.org/dc/terms/title
a human readable title as it appears in the source data.
http://purl.org/dc/terms/identifier
a string containing the identifier in the form ''namespace'':''identifier'' (e.g. "geneid:15275").
rdfs:label
a label generated by Bio2RDF, consisting of the title followed by the bracketed identifier: "title [ns:id]".
By convention, most RDF browsers display this label in place of the resource's URI.
Taken together, the description of geneid:15275 reads (in Turtle):
geneid:15275
rdfs:label "Hk1 [geneid:15275]" ;
dc:title "Hk1" ;
dc:identifier "geneid:15275" ;
rdf:type geneid_term:Gene .
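Generating these annotations mechanically keeps the label, title and identifier consistent with one another. The following is a minimal sketch with hypothetical function names; it assumes Turtle output in which the rdf, rdfs, dc (Dublin Core Terms) and dataset prefixes are declared elsewhere, and that the identifier is a valid Turtle local name. String literals are escaped so that titles containing quotes or line breaks still yield valid Turtle.

```python
def turtle_literal(text: str) -> str:
    """Quote text as a Turtle string literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def minimum_annotations(namespace: str, identifier: str, title: str, type_term: str) -> str:
    """Return a Turtle block with the minimum annotations for one resource.

    Assumes the prefixes used here are declared elsewhere in the output file.
    """
    curie = f"{namespace}:{identifier}"
    label = f"{title} [{curie}]"  # the "title [ns:id]" label convention
    return (
        f"{curie}\n"
        f"    rdfs:label {turtle_literal(label)} ;\n"
        f"    dc:title {turtle_literal(title)} ;\n"
        f"    dc:identifier {turtle_literal(curie)} ;\n"
        f"    rdf:type {namespace}_term:{type_term} .\n"
    )


print(minimum_annotations("geneid", "15275", "Hk1", "Gene"))
```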
===Datasets, Records and Entities===
We recognize at least three kinds of entity in biological information resources: physical entities, records, and datasets.
- Record
Records are information objects that contain a set of statements, primarily about a single subject entity. A record and its primary subject are linked in both directions:
namespace_record:identifier
bio2rdf_term:has-primary-subject namespace:identifier .
namespace:identifier
bio2rdf_term:is-described-by namespace_record:identifier .
- Dataset
Datasets are collections of records.
bio2rdf_dataset:
bio2rdf_term:has-item namespace_record:identifier .
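Since datasets can be versioned, we also describe each version as a separate resource that states its version number and points back to the dataset: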
bio2rdf_dataset:namespace.version
dc:hasVersion "13" ;
dc:isPartOf bio2rdf_dataset:namespace .
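The record and dataset patterns can be generated together. In this hypothetical Python 3.9+ sketch, triples are (subject, predicate, object) tuples of prefixed names and records reuse the entity's identifier, following the patterns above. Two choices are assumptions: has-item links are attached to the unversioned dataset bio2rdf_dataset:namespace, and the version-to-dataset link uses dc:isPartOf, the Dublin Core Terms property for that relation. The values geneid, 15275 and 13 are illustrative only. A small check flags any has-primary-subject link that lacks its is-described-by counterpart; only that direction is checked, on the assumption that a subject may also be described by records of which it is not the primary subject.

```python
# Hypothetical helpers for the record and dataset patterns (Python 3.9+).
# Triples are (subject, predicate, object) tuples of prefixed names.
Triple = tuple[str, str, str]

HAS_PRIMARY_SUBJECT = "bio2rdf_term:has-primary-subject"
IS_DESCRIBED_BY = "bio2rdf_term:is-described-by"
HAS_ITEM = "bio2rdf_term:has-item"


def record_links(namespace: str, identifier: str) -> list[Triple]:
    """Link a record and its primary subject in both directions."""
    subject = f"{namespace}:{identifier}"
    record = f"{namespace}_record:{identifier}"  # record reuses the entity's identifier
    return [
        (record, HAS_PRIMARY_SUBJECT, subject),
        (subject, IS_DESCRIBED_BY, record),
    ]


def dataset_triples(namespace: str, version: str, record_ids: list[str]) -> list[Triple]:
    """Describe one dataset version and the records the dataset contains."""
    dataset = f"bio2rdf_dataset:{namespace}"
    versioned = f"bio2rdf_dataset:{namespace}.{version}"
    triples: list[Triple] = [
        (versioned, "dc:hasVersion", f'"{version}"'),  # version as a plain string literal
        (versioned, "dc:isPartOf", dataset),  # dc: taken to be Dublin Core Terms
    ]
    # Assumption: has-item links hang off the unversioned dataset.
    triples += [(dataset, HAS_ITEM, f"{namespace}_record:{rid}") for rid in record_ids]
    return triples


def missing_inverses(triples: list[Triple]) -> list[Triple]:
    """Return is-described-by triples implied by has-primary-subject but absent."""
    present = set(triples)
    return [
        (o, IS_DESCRIBED_BY, s)
        for s, p, o in triples
        if p == HAS_PRIMARY_SUBJECT and (o, IS_DESCRIBED_BY, s) not in present
    ]


triples = record_links("geneid", "15275") + dataset_triples("geneid", "13", ["15275"])
for s, p, o in triples:
    print(s, p, o, ".")
print(missing_inverses(triples))  # []
```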
==Mappings==
This section describes how to create mappings from your dataset-specific vocabulary to SIO (the Semanticscience Integrated Ontology).
==Scripts==
See [[:Category:Scripts]].
==Serialization==
==Loading==
Loading the RDF database.