Bio2RDF Dataset Summary Statistics

Michel Dumontier edited this page May 13, 2016 · 1 revision

Bio2RDF Release 3 Summary Statistics

For each Bio2RDF Release 3 dataset, we compute summary statistics by running SPARQL queries against the SPARQL endpoint with our script. You can browse the dataset descriptions and their statistics (e.g. Clinicaltrials) from the Release 3 Datasets.

We combine the VoID vocabulary with our own custom vocabulary (described here). Each dataset is linked to its statistics using void:subset. Each statistic is typed using our custom vocabulary and has additional structure that describes its contents. We use YASGUI to execute the SPARQL queries (e.g. the number of triples for clinicaltrials). All queries below use the following prefixes:

 PREFIX void: <http://rdfs.org/ns/void#>
 PREFIX ds: <http://bio2rdf.org/bio2rdf.dataset_vocabulary:>
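The queries on this page can also be sent to an endpoint programmatically rather than through YASGUI. The following is a minimal sketch using only the Python standard library and the standard SPARQL Protocol (an HTTP GET with a `query` parameter). It is not the script mentioned above; the function names are hypothetical, and the endpoint URL is something you have to supply yourself.

```python
import json
import urllib.parse
import urllib.request

# Prefixes shared by every query on this page.
PREFIXES = (
    "PREFIX void: <http://rdfs.org/ns/void#>\n"
    "PREFIX ds: <http://bio2rdf.org/bio2rdf.dataset_vocabulary:>\n"
)

# The "triples" query from this page, with the prefixes prepended.
TRIPLES_QUERY = PREFIXES + (
    "SELECT * { [] void:subset [ a ds:Dataset-Triples; void:entities ?triples ] }"
)


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint, query, timeout=30):
    """Send the query and return the list of result bindings.

    Assumes the endpoint honours the Accept header and returns the
    standard SPARQL JSON results format.
    """
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(endpoint, TRIPLES_QUERY)` with the URL of a dataset's SPARQL endpoint should return one binding per matching statistic, each holding a `triples` value.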

triples

 SELECT * { [] void:subset [ a ds:Dataset-Triples; void:entities ?triples;]}

distinct entities

 SELECT * { [] void:subset [ a ds:Dataset-Distinct-Entities; void:entities ?entities;]}

distinct subjects

 SELECT * { [] void:subset [ a ds:Dataset-Distinct-Subjects; void:entities ?subjects;]}

distinct objects

 SELECT * { [] void:subset [ a ds:Dataset-Distinct-Objects; void:entities ?objects;]}

distinct types

 SELECT * { [] void:subset [ a ds:Dataset-Distinct-Types; void:entities ?types;]}

distinct properties

 SELECT * { [] void:subset [ a ds:Dataset-Distinct-Properties; void:entities ?properties;]}

distinct literals

 SELECT * { [] void:subset [ a ds:Dataset-Distinct-Literals; void:entities ?literals;]}
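The seven single-value queries above differ only in the statistic class and the name of the result variable, so they can be generated from one template. The sketch below illustrates that; the mapping and the function name `scalar_query` are hypothetical helpers, not part of the Bio2RDF scripts.

```python
PREFIXES = (
    "PREFIX void: <http://rdfs.org/ns/void#>\n"
    "PREFIX ds: <http://bio2rdf.org/bio2rdf.dataset_vocabulary:>\n"
)

# Class-name suffix (after "ds:Dataset-") mapped to the result variable,
# copied from the seven queries listed on this page.
SCALAR_STATISTICS = {
    "Triples": "triples",
    "Distinct-Entities": "entities",
    "Distinct-Subjects": "subjects",
    "Distinct-Objects": "objects",
    "Distinct-Types": "types",
    "Distinct-Properties": "properties",
    "Distinct-Literals": "literals",
}


def scalar_query(suffix):
    """Return the full query text for one single-value statistic."""
    variable = SCALAR_STATISTICS[suffix]  # KeyError for unknown statistics
    body = "SELECT * { [] void:subset [ a ds:Dataset-%s; void:entities ?%s ] }" % (
        suffix,
        variable,
    )
    return PREFIXES + body
```

For example, `scalar_query("Distinct-Types")` produces the "distinct types" query with both prefixes prepended.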

type counts

 SELECT *
 { [] void:subset [ 
       a ds:Dataset-Type-Count; 
       void:class ?type; 
       void:entities ?count; 
       void:distinctEntities ?distinctCount;
   ]
 }
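Results from the type counts query are easier to read once sorted. The sketch below parses a response in the standard SPARQL JSON results format and orders the types by total count. It assumes the counts come back as integer literals; the sample response is made up for illustration and does not contain real Bio2RDF figures.

```python
def type_count_rows(results):
    """Turn a SPARQL JSON result into (type, count, distinct_count) rows,
    sorted by count in descending order."""
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append(
            (
                binding["type"]["value"],
                int(binding["count"]["value"]),
                int(binding["distinctCount"]["value"]),
            )
        )
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Made-up response with hypothetical type URIs and counts.
SAMPLE = {
    "head": {"vars": ["type", "count", "distinctCount"]},
    "results": {
        "bindings": [
            {
                "type": {"type": "uri", "value": "http://example.org/type/A"},
                "count": {"type": "literal", "value": "5"},
                "distinctCount": {"type": "literal", "value": "3"},
            },
            {
                "type": {"type": "uri", "value": "http://example.org/type/B"},
                "count": {"type": "literal", "value": "12"},
                "distinctCount": {"type": "literal", "value": "7"},
            },
        ]
    },
}

for row in type_count_rows(SAMPLE):
    print(row)
```

The variable names `type`, `count` and `distinctCount` match those selected by the query above, so the same function should work on a real response as long as the endpoint returns JSON.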

object property counts

 SELECT *
 { [] void:subset [
      a ds:Dataset-Object-Property-Count;
      void:linkPredicate ?property;
      void:objectsTarget [
        void:entities ?object_count;
        void:distinctEntities ?object_distinct_count;
      ]
   ]
 }

datatype property counts

 SELECT *
 { [] void:subset [
      a ds:Dataset-Datatype-Property-Count;
      void:linkPredicate ?property;
      void:objectsTarget [
        void:entities ?datatype_count;
        void:distinctEntities ?datatype_distinct_count;
      ]
   ]
 }

property object type counts

 SELECT *
 { [] void:subset [
      a ds:Dataset-Property-Object-Type-Count;
      void:linkPredicate ?property;
      void:objectsTarget [
        void:class ?object_type;
        void:entities ?object_type_count;
        void:distinctEntities ?object_type_distinct_count;
      ]
   ]
 }

subject property object counts

 SELECT *
 { [] void:subset [
      a ds:Dataset-Subject-Property-Object-Count;
      void:linkPredicate ?property;
      void:subjectsTarget [
        void:entities ?subject_count;
        void:distinctEntities ?subject_distinct_count;
      ];
      void:objectsTarget [
        void:entities ?object_count;
        void:distinctEntities ?object_distinct_count;
      ]
   ]
 }

subject-type property object-type counts

 SELECT *
 { [] void:subset [
      a ds:Dataset-Type-Property-Type-Count;
      void:linkPredicate ?property;
      void:subjectsTarget [
        void:class ?subject_type;
        void:entities ?subject_count;
        void:distinctEntities ?subject_distinct_count;
      ];
      void:objectsTarget [
        void:class ?object_type;
        void:entities ?object_count;
        void:distinctEntities ?object_distinct_count;
      ]
   ]
 }
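Each row returned by the subject-type property object-type query can be read as a labelled edge from one type to another, which gives a rough schema of the dataset. The sketch below groups result bindings (in the standard SPARQL JSON results format) by subject type. The function name and the sample bindings are hypothetical, and the counts are assumed to be integer literals.

```python
from collections import defaultdict


def schema_edges(bindings):
    """Map each subject type to a list of
    (property, object type, object count) edges."""
    edges = defaultdict(list)
    for binding in bindings:
        edges[binding["subject_type"]["value"]].append(
            (
                binding["property"]["value"],
                binding["object_type"]["value"],
                int(binding["object_count"]["value"]),
            )
        )
    return dict(edges)


# Made-up bindings with hypothetical URIs and counts.
SAMPLE_BINDINGS = [
    {
        "subject_type": {"type": "uri", "value": "http://example.org/type/A"},
        "property": {"type": "uri", "value": "http://example.org/prop/p"},
        "object_type": {"type": "uri", "value": "http://example.org/type/B"},
        "object_count": {"type": "literal", "value": "4"},
    },
    {
        "subject_type": {"type": "uri", "value": "http://example.org/type/A"},
        "property": {"type": "uri", "value": "http://example.org/prop/q"},
        "object_type": {"type": "uri", "value": "http://example.org/type/C"},
        "object_count": {"type": "literal", "value": "9"},
    },
]

for subject_type, outgoing in schema_edges(SAMPLE_BINDINGS).items():
    print(subject_type, outgoing)
```

The other variables selected by the query (`subject_count`, `subject_distinct_count`, `object_distinct_count`) are ignored here and could be added to the tuples in the same way.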